[Wien] parallel lapw0 crashes due to wrong case.clmsum

ARUGA Tetsuya aruga at kuchem.kyoto-u.ac.jp
Fri Nov 21 04:26:54 CET 2003


Thank you, Kevin.
Since I have been away for a week, I have not yet solved the problem. I have
already tried increasing the sleep and wait parameters, which did not help.
I also checked (not very carefully, though) the other case.clm* files from
which case.clmsum is constructed, but did not find anything wrong.
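
A more careful check of the case.clm* files could be scripted. The sketch
below is only an illustration, not part of WIEN: the function name
find_suspect_lines and the choice of "??" as a marker are assumptions. It
reports lines that contain non-printable bytes (which an error report may
display as "?") or a literal run of "?" characters.

```python
# Bytes expected in a formatted (text) file: printable ASCII plus tab, LF, CR.
ALLOWED = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def find_suspect_lines(path, marker=b"??"):
    """Return (line_number, reason) pairs for lines that look corrupted.

    Hypothetical helper: a line is flagged if it contains any byte outside
    ALLOWED, or otherwise if it contains the marker (a run of '?').
    """
    hits = []
    with open(path, "rb") as fh:  # binary mode so odd bytes are not decoded
        for lineno, raw in enumerate(fh, start=1):
            if any(byte not in ALLOWED for byte in raw):
                hits.append((lineno, "non-printable byte"))
            elif marker in raw:
                hits.append((lineno, "run of '?'"))
    return hits
```

Calling find_suspect_lines("Cu7.clmsum") and the same for each case.clm*
file should show in which file, and at which record, the bad characters
first appear.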

I will probably compare all the output files written in the first scf cycle
with those written by a single (non-parallel) job.
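
A rough sketch of how that comparison might be done, assuming the parallel
and the single run are kept in two separate directories. The names
first_difference and compare_runs and the "*.clm*" pattern are only
illustrative. The last digits may differ between the two runs even when
nothing is wrong, so a reported difference is a place to look, not proof of
an error.

```python
from pathlib import Path


def first_difference(file_a, file_b):
    """Return the 1-based number of the first differing line, or None."""
    with open(file_a, "r", errors="replace") as fa, \
         open(file_b, "r", errors="replace") as fb:
        lineno = 0
        while True:
            line_a, line_b = fa.readline(), fb.readline()
            lineno += 1
            if line_a != line_b:
                return lineno
            if not line_a:  # both files ended at the same point
                return None


def compare_runs(dir_parallel, dir_serial, pattern="*.clm*"):
    """Map each matching file of the parallel run to its first difference.

    Files with no counterpart in the serial run are marked "missing".
    """
    results = {}
    for path_a in sorted(Path(dir_parallel).glob(pattern)):
        path_b = Path(dir_serial) / path_a.name
        if path_b.exists():
            results[path_a.name] = first_difference(path_a, path_b)
        else:
            results[path_a.name] = "missing"
    return results
```

Per-process files that exist only in the parallel run will show up as
"missing", which is expected.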

Tetsuya Aruga


On 18-Nov-2003, at 21:34, Jorissen Kevin wrote:

> Hi,
> If your calculation works on a single node, then your input files are
> correct.
> case.clmsum is written by mixer, so you have a problem in mixer or
> earlier. I suspect earlier, in lapw1 or lapw2 of the first iteration.
> Are all required output files written by lapw1 and lapw2?
> To eliminate 'slow NFS' problems, you can increase the sleep/wait
> parameters in lapw1para and lapw2para. Even though that shouldn't
> matter much on the shared memory machine, maybe it's worth a try?
>
> Good luck,
>
> Kevin.
>
>
>
> 	-----Original message-----
> 	From: ARUGA Tetsuya [mailto:aruga at kuchem.kyoto-u.ac.jp]
> 	Sent: Fri 11/14/2003 4:19
> 	To: wien at zeus.theochem.tuwien.ac.at
> 	CC:
> 	Subject: [Wien] parallel lapw0 crashes due to wrong case.clmsum
> 	
> 	
>
> 	Dear WIEN users,
> 	
> 	I ran into the trouble described below during a k-point-parallel job.
> 	I searched the past logs of the mailing list but could not find a post
> 	on this problem.
> 	
> 	I would really appreciate any suggestion, comment, hint,..., anything!
> 	
> 	(1) The trouble: At the _second_ scf cycle, lapw0 crashes. The STDOUT
> 	reads:
> 	
> 	---begin copy---
> 	  Input/Output Error 148: Invalid character
> 	
> 	    In Procedure: main program
> 	         At Line: 403
> 	
> 	       Statement: Formatted READ
> 	            Unit: 8
> 	    Connected To: Cu7.clmsum
> 	            Form: Formatted
> 	          Access: Sequential
> 	Records Read   : 7239
> 	Records Written: 0
> 	
> 	Current I/O Buffer:
> 	
> 	        0    0
> 	0????????????????????????????????????????????????????????????
> 	                   !
> 	
> 	
> 	End of diagnostics
> 	---end copy---
> 	
> 	Actually, the case.clmsum file written in the first cycle contains
> 	bunches of "?" marks.
> 	
> 	(2) The problem occurs only for particular cases and never in the
> 	others. This may suggest that something is wrong in the case.struct
> 	file or in the other input files, which I prepared using w2web. (I am
> 	attaching below one of the case.struct files that does not run in a
> 	parallel job.) But the same case.struct and input files give a
> 	successful calculation in a single (I mean, non-parallel) job.
> 	
> 	(3) The problem occurs only for parallel jobs. I therefore suspected
> 	that it may be caused by a time-out or something similar in NFS. But
> 	the problem occurs not only in a remote job on two different
> 	NFS-networked machines but also in a background job on a single
> 	shared-memory machine.
> 	
> 	Best regards,
> 	
> 	Tetsuya Aruga
> 	__________________________________________________________________
> 	  Tetsuya ARUGA, Dr.
> 	  Department of Chemistry
> 	  Kyoto University
> 	  Kyoto 606-8502, Japan
> 	  Voice +81-75-753-3977  Fax +81-75-753-4000
> 	
> 	
> 	
> 	
> 	---begin Cu7.struct---
> 	Cu7
> 	P   LATTICE,NONEQUIV.ATOMS:  4 123 P4/mmm
> 	MODE OF CALC=RELA unit=bohr
> 	   4.855323  4.855323 50.000000 90.000000 90.000000 90.000000
> 	ATOM  -1: X=0.50000000 Y=0.50000000 Z=0.20468442
> 	           MULT= 2          ISPLIT=-2
> 	       -1: X=0.50000000 Y=0.50000000 Z=0.79531558
> 	Cu1        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
> 	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
> 	                      0.0000000 1.0000000 0.0000000
> 	                      0.0000000 0.0000000 1.0000000
> 	ATOM  -2: X=0.00000000 Y=0.00000000 Z=0.13796769
> 	           MULT= 2          ISPLIT=-2
> 	       -2: X=0.00000000 Y=0.00000000 Z=0.86203231
> 	Cu2        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
> 	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
> 	                      0.0000000 1.0000000 0.0000000
> 	                      0.0000000 0.0000000 1.0000000
> 	ATOM  -3: X=0.50000000 Y=0.50000000 Z=0.06889081
> 	           MULT= 2          ISPLIT=-2
> 	       -3: X=0.50000000 Y=0.50000000 Z=0.93110919
> 	Cu3        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
> 	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
> 	                      0.0000000 1.0000000 0.0000000
> 	                      0.0000000 0.0000000 1.0000000
> 	ATOM  -4: X=0.00000000 Y=0.00000000 Z=0.00000000
> 	           MULT= 1          ISPLIT=-2
> 	Cu4        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
> 	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
> 	                      0.0000000 1.0000000 0.0000000
> 	                      0.0000000 0.0000000 1.0000000
> 	   16      NUMBER OF SYMMETRY OPERATIONS
> 	  1 0 0 0.0000000
> 	  0 1 0 0.0000000
> 	  0 0 1 0.0000000
> 	        1
> 	-1 0 0 0.0000000
> 	  0-1 0 0.0000000
> 	  0 0 1 0.0000000
> 	        2
> 	  0-1 0 0.0000000
> 	  1 0 0 0.0000000
> 	  0 0 1 0.0000000
> 	        3
> 	  0 1 0 0.0000000
> 	-1 0 0 0.0000000
> 	  0 0 1 0.0000000
> 	        4
> 	-1 0 0 0.0000000
> 	  0 1 0 0.0000000
> 	  0 0-1 0.0000000
> 	        5
> 	  1 0 0 0.0000000
> 	  0-1 0 0.0000000
> 	  0 0-1 0.0000000
> 	        6
> 	  0 1 0 0.0000000
> 	  1 0 0 0.0000000
> 	  0 0-1 0.0000000
> 	        7
> 	  0-1 0 0.0000000
> 	-1 0 0 0.0000000
> 	  0 0-1 0.0000000
> 	        8
> 	-1 0 0 0.0000000
> 	  0-1 0 0.0000000
> 	  0 0-1 0.0000000
> 	        9
> 	  1 0 0 0.0000000
> 	  0 1 0 0.0000000
> 	  0 0-1 0.0000000
> 	       10
> 	  0 1 0 0.0000000
> 	-1 0 0 0.0000000
> 	  0 0-1 0.0000000
> 	       11
> 	  0-1 0 0.0000000
> 	  1 0 0 0.0000000
> 	  0 0-1 0.0000000
> 	       12
> 	  1 0 0 0.0000000
> 	  0-1 0 0.0000000
> 	  0 0 1 0.0000000
> 	       13
> 	-1 0 0 0.0000000
> 	  0 1 0 0.0000000
> 	  0 0 1 0.0000000
> 	       14
> 	  0-1 0 0.0000000
> 	-1 0 0 0.0000000
> 	  0 0 1 0.0000000
> 	       15
> 	  0 1 0 0.0000000
> 	  1 0 0 0.0000000
> 	  0 0 1 0.0000000
> 	       16
> 	---end Cu7.struct---
> 	
> 	
> 	
> 	
> 	_______________________________________________
> 	Wien mailing list
> 	Wien at zeus.theochem.tuwien.ac.at
> 	http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> 	
> 	
>