[Wien] parallel lapw0 crashes due to wrong case.clmsum
Jorissen Kevin
Kevin.Jorissen at ua.ac.be
Tue Nov 18 13:34:37 CET 2003
Hi,
If the calculation runs correctly on a single node, then your input files are fine.
case.clmsum is written by mixer, so the problem arises in mixer or earlier. I suspect earlier, in lapw1 or lapw2 of the first iteration. Are all the required output files written by lapw1 and lapw2?
To rule out 'slow NFS' problems, you can increase the sleep/wait parameters in lapw1para and lapw2para. That shouldn't matter much on a shared-memory machine, but it may be worth a try.
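To see where the corruption in case.clmsum begins, and whether any file written by lapw1 or lapw2 is already affected, a small script can report every line that contains a "?" or a non-ASCII byte. This is only a rough sketch: the function name find_bad_records is made up for illustration, and it assumes that a valid formatted file never contains a "?" character.

```python
import sys


def find_bad_records(path, bad_chars="?\ufffd"):
    """Return (line_number, text) for each line holding a suspicious character.

    Assumption: a healthy formatted file contains no '?' characters.
    Bytes that are not valid ASCII are decoded as U+FFFD and flagged too.
    """
    hits = []
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if any(ch in line for ch in bad_chars):
                hits.append((lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage: python find_bad_records.py Cu7.clmsum [other files ...]
    for name in sys.argv[1:]:
        hits = find_bad_records(name)
        if hits:
            first_line, first_text = hits[0]
            print(f"{name}: {len(hits)} suspicious line(s), first at line {first_line}")
            print(f"  {first_text[:70]}")
        else:
            print(f"{name}: no suspicious characters found")
```

Running it on the clmsum file and on the per-k-point files from the first iteration should show which program first writes the bad records.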
Good luck,
Kevin.
-----Original Message-----
From: ARUGA Tetsuya [mailto:aruga at kuchem.kyoto-u.ac.jp]
Sent: Fri 11/14/2003 4:19
To: wien at zeus.theochem.tuwien.ac.at
CC:
Subject: [Wien] parallel lapw0 crashes due to wrong case.clmsum
Dear WIEN users,
I ran into the problem described below during a k-point-parallel job. I searched the past logs of the mailing list but could not find a post about it.
I would really appreciate any suggestion, comment, or hint, anything at all!
(1) The problem: at the _second_ SCF cycle, lapw0 crashes. STDOUT reads:
---begin copy---
Input/Output Error 148: Invalid character
In Procedure: main program
At Line: 403
Statement: Formatted READ
Unit: 8
Connected To: Cu7.clmsum
Form: Formatted
Access: Sequential
Records Read : 7239
Records Written: 0
Current I/O Buffer:
0 0
0????????????????????????????????????????????????????????????
!
End of diagnostics
---end copy---
Indeed, the case.clmsum file written in the first cycle contains runs of "?" characters.
(2) The problem occurs only for particular cases and never for others. This might suggest that something is wrong in the case.struct file or in the other input files, which I prepared with w2web. (One of the case.struct files that fail in a parallel job is attached below.) But the same case.struct and input files give a successful calculation in a single (that is, non-parallel) job.
(3) The problem occurs only for parallel jobs. I therefore suspected a time-out or a similar issue in NFS. But the problem occurs not only in a remote job on two different NFS-networked machines but also in a background job on a single shared-memory machine.
Best regards,
Tetsuya Aruga
__________________________________________________________________
Tetsuya ARUGA, Dr.
Department of Chemistry
Kyoto University
Kyoto 606-8502, Japan
Voice +81-75-753-3977 Fax +81-75-753-4000
---begin Cu7.struct---
Cu7
P LATTICE,NONEQUIV.ATOMS: 4 123 P4/mmm
MODE OF CALC=RELA unit=bohr
4.855323 4.855323 50.000000 90.000000 90.000000 90.000000
ATOM -1: X=0.50000000 Y=0.50000000 Z=0.20468442
MULT= 2 ISPLIT=-2
-1: X=0.50000000 Y=0.50000000 Z=0.79531558
Cu1 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
ATOM -2: X=0.00000000 Y=0.00000000 Z=0.13796769
MULT= 2 ISPLIT=-2
-2: X=0.00000000 Y=0.00000000 Z=0.86203231
Cu2 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
ATOM -3: X=0.50000000 Y=0.50000000 Z=0.06889081
MULT= 2 ISPLIT=-2
-3: X=0.50000000 Y=0.50000000 Z=0.93110919
Cu3 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
ATOM -4: X=0.00000000 Y=0.00000000 Z=0.00000000
MULT= 1 ISPLIT=-2
Cu4 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
16 NUMBER OF SYMMETRY OPERATIONS
1 0 0 0.0000000
0 1 0 0.0000000
0 0 1 0.0000000
1
-1 0 0 0.0000000
0-1 0 0.0000000
0 0 1 0.0000000
2
0-1 0 0.0000000
1 0 0 0.0000000
0 0 1 0.0000000
3
0 1 0 0.0000000
-1 0 0 0.0000000
0 0 1 0.0000000
4
-1 0 0 0.0000000
0 1 0 0.0000000
0 0-1 0.0000000
5
1 0 0 0.0000000
0-1 0 0.0000000
0 0-1 0.0000000
6
0 1 0 0.0000000
1 0 0 0.0000000
0 0-1 0.0000000
7
0-1 0 0.0000000
-1 0 0 0.0000000
0 0-1 0.0000000
8
-1 0 0 0.0000000
0-1 0 0.0000000
0 0-1 0.0000000
9
1 0 0 0.0000000
0 1 0 0.0000000
0 0-1 0.0000000
10
0 1 0 0.0000000
-1 0 0 0.0000000
0 0-1 0.0000000
11
0-1 0 0.0000000
1 0 0 0.0000000
0 0-1 0.0000000
12
1 0 0 0.0000000
0-1 0 0.0000000
0 0 1 0.0000000
13
-1 0 0 0.0000000
0 1 0 0.0000000
0 0 1 0.0000000
14
0-1 0 0.0000000
-1 0 0 0.0000000
0 0 1 0.0000000
15
0 1 0 0.0000000
1 0 0 0.0000000
0 0 1 0.0000000
16
---end Cu7.struct---