[Wien] parallel lapw0 crashes due to wrong case.clmsum
ARUGA Tetsuya
aruga at kuchem.kyoto-u.ac.jp
Fri Nov 21 04:26:54 CET 2003
Thank you, Kevin.
Since I have been away for a week, I have not solved the problem yet.
I have already tried increasing the sleep and wait parameters, which
did not help. I also checked (not very carefully, though) the other
case.clm* files from which case.clmsum is constructed, but did not
find anything wrong.
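
A more careful check of the case.clm* files could be scripted. The
following is only a minimal sketch in Python, assuming that "?" never
occurs legitimately in these formatted files and that non-ASCII bytes
are equally suspect. The names find_suspect_lines and scan_case are
hypothetical helpers and not part of WIEN.

```python
from pathlib import Path


def find_suspect_lines(path, bad_chars="?"):
    """Return (line_number, text) for every line holding a suspect character.

    Assumption: '?' is never valid in these formatted files. Bytes that are
    not ASCII are decoded to U+FFFD and reported as suspect as well.
    """
    hits = []
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if "\ufffd" in line or any(ch in line for ch in bad_chars):
                hits.append((lineno, line.rstrip("\n")))
    return hits


def scan_case(directory, pattern="*.clm*"):
    """Scan all files matching `pattern` and return {file_name: hits}."""
    results = {}
    for p in sorted(Path(directory).glob(pattern)):
        if p.is_file():
            hits = find_suspect_lines(p)
            if hits:
                results[p.name] = hits
    return results


if __name__ == "__main__":
    # Run from the case directory; prints the first suspect line per file.
    for name, hits in scan_case(".").items():
        lineno, text = hits[0]
        print(f"{name}: first suspect line {lineno}: {text!r}")
```

Knowing the first damaged line in each file may help to tell which
program wrote the bad data.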
I may compare all the output files written in the first scf cycle
with those written in a single (non-parallel) job.
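
Such a comparison might be sketched as below. This is a rough sketch
in Python, assuming the single and the parallel runs are kept in two
separate directories (serial_dir and parallel_dir are hypothetical
names) and that only files with the same name in both are compared.
Small numerical differences between the runs would also be reported,
so each hit still needs to be inspected by eye.

```python
from pathlib import Path


def first_difference(file_a, file_b):
    """Return (line_number, line_a, line_b) at the first differing line,
    or None if the two files are identical line by line."""
    with open(file_a, encoding="ascii", errors="replace") as fa, \
         open(file_b, encoding="ascii", errors="replace") as fb:
        lineno = 0
        while True:
            la = fa.readline()
            lb = fb.readline()
            lineno += 1
            if la != lb:
                return (lineno, la.rstrip("\n"), lb.rstrip("\n"))
            if la == "":  # both files ended at the same point
                return None


def compare_runs(serial_dir, parallel_dir):
    """Compare same-named files of two runs; return {name: first difference}."""
    serial = {p.name: p for p in Path(serial_dir).iterdir() if p.is_file()}
    parallel = {p.name: p for p in Path(parallel_dir).iterdir() if p.is_file()}
    report = {}
    for name in sorted(serial.keys() & parallel.keys()):
        diff = first_difference(serial[name], parallel[name])
        if diff is not None:
            report[name] = diff
    return report
```

Calling compare_runs("single", "parallel") returns, for each file that
differs, the line number and the two versions of the first differing
line.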
Tetsuya Aruga
On 18-Nov-2003, at 21:34, Jorissen Kevin wrote:
> Hi,
> If your calculation works on a single node, then your input files are
> correct.
> Case.clmsum is written by mixer. So you have a problem in mixer or
> earlier. I suspect earlier, in lapw1 or lapw2 of the first iteration.
> Are all required output files written by lapw1 and lapw2?
> To eliminate 'slow NFS' problems, you can increase the sleep/wait
> parameters in lapw1para and lapw2para. Even though that shouldn't
> matter much on the shared memory machine, maybe it's worth a try?
>
> Good luck,
>
> Kevin.
>
>
>
> -----Original message-----
> From: ARUGA Tetsuya [mailto:aruga at kuchem.kyoto-u.ac.jp]
> Sent: Fri 11/14/2003 4:19
> To: wien at zeus.theochem.tuwien.ac.at
> CC:
> Subject: [Wien] parallel lapw0 crashes due to wrong case.clmsum
>
>
>
> Dear WIEN users,
>
> I ran into the problem described below during a k-point-parallel
> job. I searched the past logs of the mailing list but could not
> find a post on this problem.
>
> I would really appreciate any suggestion, comment, hint,..., anything!
>
> (1) The trouble: At the _second_ scf cycle, lapw0 crashes. STDOUT
> reads:
>
> ---begin copy---
> Input/Output Error 148: Invalid character
>
> In Procedure: main program
> At Line: 403
>
> Statement: Formatted READ
> Unit: 8
> Connected To: Cu7.clmsum
> Form: Formatted
> Access: Sequential
> Records Read : 7239
> Records Written: 0
>
> Current I/O Buffer:
>
> 0 0
> 0????????????????????????????????????????????????????????????
> !
>
>
> End of diagnostics
> ---end copy---
>
> In fact, the case.clmsum file written in the first cycle already
> contains long runs of "?" characters.
>
> (2) The problem occurs only for particular cases and never for the
> others. This may suggest that something is wrong in the case.struct
> file or in the other input files, which I prepared using w2web. (I am
> attaching below one of the case.struct files that fail in a parallel
> job.) But the same case.struct and input files give a successful
> calculation in a single (I mean, non-parallel) job.
>
> (3) The problem occurs only for parallel jobs. I therefore suspected
> that it might be caused by a time-out or something similar in NFS.
> But the problem occurs not only in a remote job on two different
> NFS-networked machines but also in a background job on a single
> shared-memory machine.
>
> Best regards,
>
> Tetsuya Aruga
> __________________________________________________________________
> Tetsuya ARUGA, Dr.
> Department of Chemistry
> Kyoto University
> Kyoto 606-8502, Japan
> Voice +81-75-753-3977 Fax +81-75-753-4000
>
>
>
>
> ---begin Cu7.struct---
> Cu7
> P LATTICE,NONEQUIV.ATOMS: 4 123 P4/mmm
> MODE OF CALC=RELA unit=bohr
> 4.855323 4.855323 50.000000 90.000000 90.000000 90.000000
> ATOM -1: X=0.50000000 Y=0.50000000 Z=0.20468442
> MULT= 2 ISPLIT=-2
> -1: X=0.50000000 Y=0.50000000 Z=0.79531558
> Cu1 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -2: X=0.00000000 Y=0.00000000 Z=0.13796769
> MULT= 2 ISPLIT=-2
> -2: X=0.00000000 Y=0.00000000 Z=0.86203231
> Cu2 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -3: X=0.50000000 Y=0.50000000 Z=0.06889081
> MULT= 2 ISPLIT=-2
> -3: X=0.50000000 Y=0.50000000 Z=0.93110919
> Cu3 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -4: X=0.00000000 Y=0.00000000 Z=0.00000000
> MULT= 1 ISPLIT=-2
> Cu4 NPT= 781 R0=0.00010000 RMT= 2.3000 Z: 29.0
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> 16 NUMBER OF SYMMETRY OPERATIONS
> 1 0 0 0.0000000
> 0 1 0 0.0000000
> 0 0 1 0.0000000
> 1
> -1 0 0 0.0000000
> 0-1 0 0.0000000
> 0 0 1 0.0000000
> 2
> 0-1 0 0.0000000
> 1 0 0 0.0000000
> 0 0 1 0.0000000
> 3
> 0 1 0 0.0000000
> -1 0 0 0.0000000
> 0 0 1 0.0000000
> 4
> -1 0 0 0.0000000
> 0 1 0 0.0000000
> 0 0-1 0.0000000
> 5
> 1 0 0 0.0000000
> 0-1 0 0.0000000
> 0 0-1 0.0000000
> 6
> 0 1 0 0.0000000
> 1 0 0 0.0000000
> 0 0-1 0.0000000
> 7
> 0-1 0 0.0000000
> -1 0 0 0.0000000
> 0 0-1 0.0000000
> 8
> -1 0 0 0.0000000
> 0-1 0 0.0000000
> 0 0-1 0.0000000
> 9
> 1 0 0 0.0000000
> 0 1 0 0.0000000
> 0 0-1 0.0000000
> 10
> 0 1 0 0.0000000
> -1 0 0 0.0000000
> 0 0-1 0.0000000
> 11
> 0-1 0 0.0000000
> 1 0 0 0.0000000
> 0 0-1 0.0000000
> 12
> 1 0 0 0.0000000
> 0-1 0 0.0000000
> 0 0 1 0.0000000
> 13
> -1 0 0 0.0000000
> 0 1 0 0.0000000
> 0 0 1 0.0000000
> 14
> 0-1 0 0.0000000
> -1 0 0 0.0000000
> 0 0 1 0.0000000
> 15
> 0 1 0 0.0000000
> 1 0 0 0.0000000
> 0 0 1 0.0000000
> 16
> ---end Cu7.struct---
>
>
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
>
>