[Wien] parallel lapw0 crashes due to wrong case.clmsum

Jorissen Kevin Kevin.Jorissen at ua.ac.be
Tue Nov 18 13:34:37 CET 2003


Hi,
If your calculation works on a single node, then your input files are correct.
case.clmsum is written by the mixer, so the problem lies either in the mixer or in an earlier step. I suspect an earlier step, lapw1 or lapw2 of the first iteration. Are all the required output files actually written by lapw1 and lapw2?
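The question of whether lapw1 and lapw2 wrote all their output files can be checked quickly by listing the files each parallel step should leave behind and testing that they exist and are not empty. A minimal Python sketch follows. The file names in the example list (Cu7.vector_1, Cu7.clmval_1, ...) are hypothetical placeholders, so substitute the per-k-point output names your own WIEN2k run actually produces.

```python
import os
from typing import Iterable, List


def missing_or_empty(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not exist or have zero size."""
    bad = []
    for path in paths:
        # A file that was never created, or created but never filled,
        # both indicate that the producing step did not finish cleanly.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Hypothetical file names: replace them with the outputs that
    # lapw1 and lapw2 are expected to write in your setup.
    expected = ["Cu7.vector_1", "Cu7.vector_2", "Cu7.clmval_1", "Cu7.clmval_2"]
    for path in missing_or_empty(expected):
        print("missing or empty:", path)
```

A non-empty file can still be truncated, so this only catches the grossest failures.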
To rule out 'slow NFS' problems, you can increase the sleep/wait parameters in lapw1para and lapw2para. That shouldn't matter much on a shared-memory machine, but it may be worth a try.
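The reasoning behind the sleep/wait advice is that on a slow NFS mount a file can become visible on one node before its full contents have arrived, so the next step may read a half-written file. The Python sketch below illustrates the general idea of waiting until a file's size stops changing before using it. It is a generic technique, not the mechanism lapw1para or lapw2para actually use, and the polling interval and timeout defaults are arbitrary assumptions.

```python
import os
import time


def wait_until_stable(path: str, interval: float = 1.0, timeout: float = 60.0) -> bool:
    """Return True once `path` exists and its (non-zero) size is unchanged
    between two consecutive polls; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            # Two equal, non-zero readings in a row suggest the writer is done.
            if size > 0 and size == last_size:
                return True
            last_size = size
        time.sleep(interval)
    return False
```

A stable size is only a heuristic: a writer that pauses longer than `interval` would be mistaken for a finished one, which is why longer waits make such failures less likely but cannot rule them out.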
 
Good luck,
 
Kevin.
 
 

	-----Original message----- 
	From: ARUGA Tetsuya [mailto:aruga at kuchem.kyoto-u.ac.jp] 
	Sent: Fri 11/14/2003 4:19 
	To: wien at zeus.theochem.tuwien.ac.at 
	CC: 
	Subject: [Wien] parallel lapw0 crashes due to wrong case.clmsum
	
	

	Dear WIEN users,
	
	I ran into the problem described below during a k-point-parallel job.
	I searched the mailing-list archives but could not find a post on
	this problem.
	
	I would really appreciate any suggestion, comment, hint,..., anything!
	
	(1) The problem: in the _second_ SCF cycle, lapw0 crashes. STDOUT
	reads:
	
	---begin copy---
	  Input/Output Error 148: Invalid character
	
	    In Procedure: main program
	         At Line: 403
	
	       Statement: Formatted READ
	            Unit: 8
	    Connected To: Cu7.clmsum
	            Form: Formatted
	          Access: Sequential
	Records Read   : 7239
	Records Written: 0
	
	Current I/O Buffer:
	
	        0    0   
	0????????????????????????????????????????????????????????????
	                   !
	
	
	End of diagnostics
	---end copy---
	
	In fact, the case.clmsum file written in the first cycle contains
	runs of "?" characters.
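One way to see how far the damage in case.clmsum extends is to scan the file for characters that should not occur in a formatted density file. The Python sketch below reports every line containing a "?" or a non-printable or non-ASCII byte. Treating "?" as invalid is an assumption drawn from the diagnostic above, where lapw0 stopped on a record full of "?" marks; the file name Cu7.clmsum is simply the one from this report.

```python
import string
from typing import List, Tuple

# Characters expected in a formatted text file: printable ASCII minus '?'.
# Excluding '?' is an assumption based on the corrupted record above.
ALLOWED = set(string.printable) - {"?"}


def find_bad_lines(path: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing unexpected characters."""
    bad = []
    # errors="replace" turns undecodable bytes into U+FFFD, which is not in ALLOWED.
    with open(path, encoding="ascii", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if any(ch not in ALLOWED for ch in line):
                bad.append((lineno, line.rstrip("\n")))
    return bad
```

For example, find_bad_lines("Cu7.clmsum")[:5] lists the first few damaged lines; a first hit just after record 7239 would agree with the lapw0 diagnostic.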
	
	(2) The problem occurs only for certain cases and never for others.
	This might suggest that something is wrong in the case.struct file or
	in the other input files, which I prepared with w2web. (One of the
	case.struct files that fails in a parallel job is attached below.)
	However, the same case.struct and input files give a successful
	calculation in a single (i.e., non-parallel) job.
	
	(3) The problem occurs only for parallel jobs. I therefore suspected
	that it might be caused by an NFS time-out or something similar. But
	the problem occurs not only in a remote job on two different
	NFS-networked machines but also in a background job on a single
	shared-memory machine....
	
	Best regards,
	
	Tetsuya Aruga
	__________________________________________________________________
	  Tetsuya ARUGA, Dr.
	  Department of Chemistry
	  Kyoto University
	  Kyoto 606-8502, Japan
	  Voice +81-75-753-3977  Fax +81-75-753-4000
	
	
	
	
	---begin Cu7.struct---
	Cu7
	P   LATTICE,NONEQUIV.ATOMS:  4 123 P4/mmm
	MODE OF CALC=RELA unit=bohr
	   4.855323  4.855323 50.000000 90.000000 90.000000 90.000000
	ATOM  -1: X=0.50000000 Y=0.50000000 Z=0.20468442
	           MULT= 2          ISPLIT=-2
	       -1: X=0.50000000 Y=0.50000000 Z=0.79531558
	Cu1        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
	                      0.0000000 1.0000000 0.0000000
	                      0.0000000 0.0000000 1.0000000
	ATOM  -2: X=0.00000000 Y=0.00000000 Z=0.13796769
	           MULT= 2          ISPLIT=-2
	       -2: X=0.00000000 Y=0.00000000 Z=0.86203231
	Cu2        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
	                      0.0000000 1.0000000 0.0000000
	                      0.0000000 0.0000000 1.0000000
	ATOM  -3: X=0.50000000 Y=0.50000000 Z=0.06889081
	           MULT= 2          ISPLIT=-2
	       -3: X=0.50000000 Y=0.50000000 Z=0.93110919
	Cu3        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
	                      0.0000000 1.0000000 0.0000000
	                      0.0000000 0.0000000 1.0000000
	ATOM  -4: X=0.00000000 Y=0.00000000 Z=0.00000000
	           MULT= 1          ISPLIT=-2
	Cu4        NPT=  781  R0=0.00010000 RMT=    2.3000   Z: 29.0
	LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
	                      0.0000000 1.0000000 0.0000000
	                      0.0000000 0.0000000 1.0000000
	   16      NUMBER OF SYMMETRY OPERATIONS
	  1 0 0 0.0000000
	  0 1 0 0.0000000
	  0 0 1 0.0000000
	        1
	-1 0 0 0.0000000
	  0-1 0 0.0000000
	  0 0 1 0.0000000
	        2
	  0-1 0 0.0000000
	  1 0 0 0.0000000
	  0 0 1 0.0000000
	        3
	  0 1 0 0.0000000
	-1 0 0 0.0000000
	  0 0 1 0.0000000
	        4
	-1 0 0 0.0000000
	  0 1 0 0.0000000
	  0 0-1 0.0000000
	        5
	  1 0 0 0.0000000
	  0-1 0 0.0000000
	  0 0-1 0.0000000
	        6
	  0 1 0 0.0000000
	  1 0 0 0.0000000
	  0 0-1 0.0000000
	        7
	  0-1 0 0.0000000
	-1 0 0 0.0000000
	  0 0-1 0.0000000
	        8
	-1 0 0 0.0000000
	  0-1 0 0.0000000
	  0 0-1 0.0000000
	        9
	  1 0 0 0.0000000
	  0 1 0 0.0000000
	  0 0-1 0.0000000
	       10
	  0 1 0 0.0000000
	-1 0 0 0.0000000
	  0 0-1 0.0000000
	       11
	  0-1 0 0.0000000
	  1 0 0 0.0000000
	  0 0-1 0.0000000
	       12
	  1 0 0 0.0000000
	  0-1 0 0.0000000
	  0 0 1 0.0000000
	       13
	-1 0 0 0.0000000
	  0 1 0 0.0000000
	  0 0 1 0.0000000
	       14
	  0-1 0 0.0000000
	-1 0 0 0.0000000
	  0 0 1 0.0000000
	       15
	  0 1 0 0.0000000
	  1 0 0 0.0000000
	  0 0 1 0.0000000
	       16
	---end Cu7.struct---
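Since point (2) raises the possibility that the struct file is at fault, one cheap sanity check is to confirm that the rotation parts of the listed symmetry operations form a group, i.e. that the product of any two of them is again in the list. The Python sketch below extracts the 3x3 integer rotation matrices from the symmetry block and tests closure. It ignores the translation column (all zero here) and assumes each operation is written as in Cu7.struct: three rows of three signed single-digit integers followed by a translation, then an index line.

```python
import re
from itertools import product
from typing import List, Tuple

Matrix = Tuple[Tuple[int, int, int], ...]

# One matrix row as printed in Cu7.struct: three signed single-digit
# integers (possibly run together, e.g. "0-1 0"), then a translation.
ROW = re.compile(r"^\s*(-?\d)\s*(-?\d)\s*(-?\d)\s+(-?\d+\.\d+)\s*$")


def parse_rotations(struct_text: str) -> List[Matrix]:
    """Collect the 3x3 rotation parts of the symmetry operations."""
    lines = struct_text.splitlines()
    start = next((i for i, l in enumerate(lines)
                  if "NUMBER OF SYMMETRY OPERATIONS" in l), None)
    if start is None:
        raise ValueError("no symmetry-operation block found")
    # Keep only matrix rows; the operation index lines ("1", "2", ...) do not match.
    rows = [m.groups()[:3] for m in map(ROW.match, lines[start + 1:]) if m]
    return [tuple(tuple(int(x) for x in row) for row in rows[k:k + 3])
            for k in range(0, len(rows) - len(rows) % 3, 3)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain 3x3 integer matrix product."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))


def is_closed(mats: List[Matrix]) -> bool:
    """True if the product of every ordered pair is again in the list."""
    group = set(mats)
    return all(matmul(a, b) in group for a, b in product(mats, repeat=2))
```

Run on the Cu7.struct text quoted above, parse_rotations should return 16 matrices and is_closed should return True, since those sixteen operations make up the full point group 4/mmm; that is consistent with the same file working in a serial run.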
	
	
	
	
	_______________________________________________
	Wien mailing list
	Wien at zeus.theochem.tuwien.ac.at
	http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
	
	
