[Wien] LAPW2 crashed with k-points parallel

zhylimin@sohu.com zhylimin at sohu.com
Wed Sep 20 09:23:34 CEST 2006


Dear Prof. Blaha,

Several weeks ago I sent an email reporting my problem, and you suggested that I try the latest version of WIEN2k.

I have downloaded and run the latest version of the code (WIEN2k_06), but the problem persists. My OS is Red Hat 9.0 and I use the PGI compiler. I am calculating a supercell of AlN doped with Si using k-point parallelization on a Linux (Red Hat 9.0) cluster with six nodes, but LAPW2 crashed in the 10th iteration.

I checked the terminal output:

=======================================================================
[zhy at IO-1 AlN] run_lapw -p
....
in cycle 10  ETEST: .004857000000000  CTEST: .0128807
LAPW0 END
LAPW1 END
real 6m43.65s
user 6m38.668s
sys  0m0.666s
LAPW1 END
real 6m44.817s
user 6m39.350s
sys  0m0.648s
LAPW1 END
real 6m43.667s
user 6m37.658s
sys  0m0.658s
LAPW1 END
real 6m14.778s
user 6m37.350s
sys  0m0.668s
LAPW1 END
real 6m54.787s
user 6m50.469s
sys  0m0.723s
LAPW1 END
real 7m0.235s
user 6m55.918s
sys  0m0.723s
LAPW2 -FERMI; weighs written
LAPW2 END
real 0m18.553s
user 0m15.516s
sys  0m0.313s
LAPW2 END
real 0m17.896s
user 0m15.584s
sys  0m0.361s
LAPW2 END
real 0m17.459s
user 0m15.633s
sys  0m0.361s
LAPW2 END
real 0m14.059s
user 0m15.408s
sys  0m0.387s
LAPW2 END
real 0m17.688s
user 0m15.586s
sys  0m0.361s
LAPW2 END
real 0m20.086s
user 0m15.828s
sys  0m0.293s
cp: cannot stat '.in.tmp': No such file or directory
rm: cannot remove '.in.tmp': No such file or directory
rm: cannot remove '.in.tmp1': No such file or directory
=======================================================================

Some useful files are given below.

.machines:
=======================================================================
1:c3-s11:1
1:c3-s12:1
1:c3-s13:1
1:c3-s14:1
1:c3-s15:1
1:c3-s16:1
granularity:1
extrafine
=======================================================================

AlN.dayfile:
=======================================================================
Calculating AlN in /rdisk1/zhy/AlMgN/AlN
on IO-1 with PID 16420

    start   (Wed Sep 20 11:00:46 HKT 2006) with lapw0 (20/20 to go)

    cycle 1   (Wed Sep 20 11:00:46 HKT 2006)   (20/20 to go)

>   lapw0 -p   (11:00:46) starting parallel lapw0 at Wed Sep 20 11:00:46 HKT 2006
--------
running lapw0 in single mode
54.490u 0.270s 0:54.97 99.6% 0+0k 0+0io 2698pf+0w
>   lapw1  -c -p   (11:01:41) starting parallel lapw1 at Wed Sep 20 11:01:41 HKT 2006
->  starting parallel LAPW1 jobs at Wed Sep 20 11:01:41 HKT 2006
running LAPW1 in parallel mode (using .machines)
6 number_of_parallel_jobs
     c3-s11(1)      c3-s12(1)      c3-s13(1)      c3-s14(1)      c3-s15(1)      c3-s16(1)    Summary of lapw1para:
   c3-s11   k=1   user=0   wallclock=1
   c3-s12   k=1   user=0   wallclock=1
   c3-s13   k=1   user=0   wallclock=1
   c3-s14   k=1   user=0   wallclock=1
   c3-s15   k=1   user=0   wallclock=1
   c3-s16   k=1   user=0   wallclock=1
1.340u 2.030s 7:09.42 0.7% 0+0k 0+0io 81217pf+0w
>   lapw2 -c  -p   (11:08:51) running LAPW2 in parallel mode
      c3-s11
      c3-s12
      c3-s13
      c3-s14
      c3-s15
      c3-s16
   Summary of lapw2para:
   c3-s11   user=0   wallclock=0
   c3-s12   user=0   wallclock=0
   c3-s13   user=0   wallclock=0
   c3-s14   user=0   wallclock=0
   c3-s15   user=0   wallclock=0
   c3-s16   user=0   wallclock=0
18.210u 0.800s 0:46.79 40.6% 0+0k 0+0io 28813pf+0w
>   lcore   (11:09:37) 1.300u 0.030s 0:01.37 97.0% 0+0k 0+0io 196pf+0w
>   mixer   (11:09:40) 7.080u 0.480s 0:08.10 93.3% 0+0k 0+0io 231pf+0w
:ENERGY convergence:  0 0.0001 0
:CHARGE convergence:  0 0.0000 0
ec cc and fc_conv 0 1 1

    cycle 2   (Wed Sep 20 11:09:48 HKT 2006)   (19/19 to go)
... ...

    cycle 9   (Wed Sep 20 12:13:31 HKT 2006)   (12/12 to go)

>   lapw0 -p   (12:13:31) starting parallel lapw0 at Wed Sep 20 12:13:31 HKT 2006
--------
running lapw0 in single mode
54.210u 0.330s 0:54.93 99.2% 0+0k 0+0io 2286pf+0w
>   lapw1  -c -p   (12:14:26) starting parallel lapw1 at Wed Sep 20 12:14:26 HKT 2006
->  starting parallel LAPW1 jobs at Wed Sep 20 12:14:27 HKT 2006
running LAPW1 in parallel mode (using .machines)
6 number_of_parallel_jobs
     c3-s11(1)      c3-s12(1)      c3-s13(1)      c3-s14(1)      c3-s15(1)      c3-s16(1)    Summary of lapw1para:
   c3-s11   k=1   user=0   wallclock=1
   c3-s12   k=1   user=0   wallclock=1
   c3-s13   k=1   user=0   wallclock=1
   c3-s14   k=1   user=0   wallclock=1
   c3-s15   k=1   user=0   wallclock=1
   c3-s16   k=1   user=0   wallclock=1
0.830u 1.170s 7:11.48 0.4% 0+0k 0+0io 78521pf+0w
>   lapw2 -c  -p   (12:21:38) running LAPW2 in parallel mode
      c3-s11
      c3-s12
      c3-s13
      c3-s14
      c3-s15
      c3-s16
   Summary of lapw2para:
   c3-s11   user=0   wallclock=0
   c3-s12   user=0   wallclock=0
   c3-s13   user=0   wallclock=0
   c3-s14   user=0   wallclock=0
   c3-s15   user=0   wallclock=0
   c3-s16   user=0   wallclock=0
17.540u 0.720s 0:46.10 39.6% 0+0k 0+0io 29023pf+0w
>   lcore   (12:22:24) 1.310u 0.000s 0:01.38 94.9% 0+0k 0+0io 197pf+0w
>   mixer   (12:22:27) 7.120u 0.380s 0:08.55 87.7% 0+0k 0+0io 232pf+0w
:ENERGY convergence:  0 0.0001 .0048570000000000
:CHARGE convergence:  0 0.0000 .0128807
ec cc and fc_conv 0 1 1

    cycle 10   (Wed Sep 20 12:22:36 HKT 2006)   (11/11 to go)

>   lapw0 -p   (12:22:36) starting parallel lapw0 at Wed Sep 20 12:22:36 HKT 2006
--------
running lapw0 in single mode
54.950u 0.310s 0:55.66 99.2% 0+0k 0+0io 2286pf+0w
>   lapw1  -c -p   (12:23:32) starting parallel lapw1 at Wed Sep 20 12:23:32 HKT 2006
->  starting parallel LAPW1 jobs at Wed Sep 20 12:23:32 HKT 2006
running LAPW1 in parallel mode (using .machines)
6 number_of_parallel_jobs
     c3-s11(1)      c3-s12(1)      c3-s13(1)      c3-s14(1)      c3-s15(1)      c3-s16(1)    Summary of lapw1para:
   c3-s11   k=1   user=0   wallclock=1
   c3-s12   k=1   user=0   wallclock=1
   c3-s13   k=1   user=0   wallclock=1
   c3-s14   k=1   user=0   wallclock=1
   c3-s15   k=1   user=0   wallclock=1
   c3-s16   k=1   user=0   wallclock=1
0.830u 1.190s 7:08.89 0.4% 0+0k 0+0io 78590pf+0w
>   lapw2 -c  -p   (12:30:41) running LAPW2 in parallel mode
**  LAPW2 crashed!
0.900u 0.210s 0:29.50 3.7% 0+0k 0+0io 21527pf+0w
error: command   /rdisk1/zhy/Wien2k/lapw2cpara -c lapw2.def   failed

>   stop error
=======================================================================

lapw2.def:
=======================================================================
 2,'AlN.nsh',    'unknown','formatted',0
 3,'AlN.in1c',   'unknown','formatted',0
 4,'AlN.inso',           'unknown','formatted',0
 5,'AlN.in2c',   'old',    'formatted',0
 6,'AlN.output2','unknown','formatted',0
 8,'AlN.clmval','unknown','formatted',0
10,'./AlN.vector', 'unknown','unformatted',9000
11,'AlN.weight',    'unknown','formatted',0
13,'AlN.recprlist',      'unknown','unformatted',9000
14,'AlN.kgen',        'unknown','formatted',0
15,'AlN.tmp',       'unknown','formatted',0
16,'AlN.qtl',       'unknown','formatted',0
17,'AlN.weightaver','unknown','formatted',0
18,'AlN.vsp',       'old',    'formatted',0
19,'AlN.vns',       'unknown','formatted',0
20,'AlN.struct',         'old',    'formatted',0
21,'AlN.scf2',   'unknown','formatted',0
22,'AlN.rotlm',   'unknown',    'formatted',0
23,'AlN.radwf',   'unknown',    'formatted',0
24,'AlN.almblm',   'unknown',    'formatted',0
26,'AlN.weigh',   'unknown','unformatted',0
27,'AlN.weighdn',   'unknown','unformatted',0
28,'AlN.vrespval',   'unknown','formatted',0
29,'AlN.energydn','unknown','formatted',0
30,'AlN.energy', 'unknown','formatted',0
31,'./AlN.help', 'unknown','formatted',0
=======================================================================

lapw2_1.def:
=======================================================================
 2,'AlN.nsh',    'unknown','formatted',0
 3,'AlN.in1c',   'unknown','formatted',0
 4,'AlN.inso',           'unknown','formatted',0
 5,'AlN.in2c',   'old',    'formatted',0
 6,'AlN.output2_1','unknown','formatted',0
 8,'AlN.clmval_1','unknown','formatted',0
10,'./AlN.vector_1', 'unknown','unformatted',9000
11,'AlN.weight',    'unknown','formatted',0
13,'AlN.recprlist',      'unknown','unformatted',9000
14,'AlN.kgen',        'unknown','formatted',0
15,'AlN.tmp',       'unknown','formatted',0
16,'AlN.qtl',       'unknown','formatted',0
17,'AlN.weightaver','unknown','formatted',0
18,'AlN.vsp',       'old',    'formatted',0
19,'AlN.vns',       'unknown','formatted',0
20,'AlN.struct',         'old',    'formatted',0
21,'AlN.scf2_1',   'unknown','formatted',0
22,'AlN.rotlm',   'unknown',    'formatted',0
23,'AlN.radwf',   'unknown',    'formatted',0
24,'AlN.almblm',   'unknown',    'formatted',0
26,'AlN.weigh_1',   'unknown','unformatted',0
27,'AlN.weighdn_1',   'unknown','unformatted',0
28,'AlN.vrespval_1',   'unknown','formatted',0
29,'AlN.energydn_1','unknown','formatted',0
30,'AlN.energy_1',  'unknown','formatted',0
31,'./AlN.help_1', 'unknown','formatted',0
=======================================================================

lapw2.error:
=======================================================================
**  testerror: Error in Parallel LAPW2

=======================================================================

Could you tell me how to deal with this problem?

Thank you very much.

Yours truly,
Zhenhua Zhang
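
For anyone triaging a similar failure in a long dayfile, here is a minimal sketch that reports which SCF cycle and which program crashed. It is not part of WIEN2k: the function name `find_crashes` and both regular expressions are assumptions inferred only from the dayfile excerpt quoted in this message ("cycle N" header lines and "**  LAPW2 crashed!" lines), so other versions may need adjusted patterns.

```python
import re

def find_crashes(dayfile_text):
    """Return (cycle, program) pairs for each '**  <PROGRAM> crashed' line.

    Hypothetical helper: the patterns follow the dayfile excerpt above,
    not any documented WIEN2k file format.
    """
    cycle = None
    crashes = []
    for line in dayfile_text.splitlines():
        # Cycle headers look like "    cycle 10  (Wed Sep 20 ...)".
        header = re.match(r"\s*cycle\s+(\d+)", line)
        if header:
            cycle = int(header.group(1))
            continue
        # Crash markers look like "**  LAPW2 crashed!".
        crash = re.search(r"\*\*\s+(\S+)\s+crashed", line)
        if crash:
            crashes.append((cycle, crash.group(1)))
    return crashes

# Shortened sample taken from the dayfile quoted in this message.
sample = """\
    cycle 9 \t(Wed Sep 20 12:13:31 HKT 2006) \t(12/12 to go)
>   lapw2 -c  -p\t(12:21:38) running LAPW2 in parallel mode
    cycle 10 \t(Wed Sep 20 12:22:36 HKT 2006) \t(11/11 to go)
>   lapw2 -c  -p\t(12:30:41) running LAPW2 in parallel mode
**  LAPW2 crashed!
"""

print(find_crashes(sample))
```

In practice the text would be read from the case dayfile (here AlN.dayfile) instead of the inline sample.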