[Wien] LAPW2 crashed with k-points parallel
zhylimin@sohu.com
zhylimin at sohu.com
Wed Sep 20 09:23:34 CEST 2006
Dear Prof. Blaha,

Several weeks ago I sent a mail to report my problem, and you suggested that I try the latest version of WIEN2k.

I have downloaded and run the latest version of the code (Wien2k_06), but the problem still exists. My OS is Red Hat 9.0 and the PGI compiler is used. I am calculating a supercell of AlN doped with Si in k-point parallel mode on a Linux (Red Hat 9.0) cluster using six nodes, but LAPW2 crashed in the 10th iteration.

I checked the terminal output:
=======================================================================
[zhy at IO-1 AlN] run_lapw -p
....
in cycle 10 ETEST: .004857000000000 CTEST: .0128807
LAPW0 END
LAPW1 END
real 6m43.65s
user 6m38.668s
sys 0m0.666s
LAPW1 END
real 6m44.817s
user 6m39.350s
sys 0m0.648s
LAPW1 END
real 6m43.667s
user 6m37.658s
sys 0m0.658s
LAPW1 END
real 6m14.778s
user 6m37.350s
sys 0m0.668s
LAPW1 END
real 6m54.787s
user 6m50.469s
sys 0m0.723s
LAPW1 END
real 7m0.235s
user 6m55.918s
sys 0m0.723s
LAPW2 -FERMI; weighs written
LAPW2 END
real 0m18.553s
user 0m15.516s
sys 0m0.313s
LAPW2 END
real 0m17.896s
user 0m15.584s
sys 0m0.361s
LAPW2 END
real 0m17.459s
user 0m15.633s
sys 0m0.361s
LAPW2 END
real 0m14.059s
user 0m15.408s
sys 0m0.387s
LAPW2 END
real 0m17.688s
user 0m15.586s
sys 0m0.361s
LAPW2 END
real 0m20.086s
user 0m15.828s
sys 0m0.293s
cp: cannot stat '.in.tmp': No such file or directory
rm: cannot remove '.in.tmp': No such file or directory
rm: cannot remove '.in.tmp1': No such file or directory
=======================================================================

Some relevant files are given below.

.machines:
=======================================================================
1:c3-s11:1
1:c3-s12:1
1:c3-s13:1
1:c3-s14:1
1:c3-s15:1
1:c3-s16:1
granularity:1
extrafine
=======================================================================

AlN.dayfile:
=======================================================================
Calculating AlN in /rdisk1/zhy/AlMgN/AlN
on IO-1 with PID 16420

 start (Wed Sep 20 11:00:46 HKT 2006) with lapw0 (20/20 to go)

 cycle 1 (Wed Sep 20 11:00:46 HKT 2006) (20/20 to go)

> lapw0 -p (11:00:46) starting parallel lapw0 at Wed Sep 20 11:00:46 HKT 2006
--------
running lapw0 in single mode
54.490u 0.270s 0:54.97 99.6% 0+0k 0+0io 2698pf+0w
> lapw1 -c -p (11:01:41) starting parallel lapw1 at Wed Sep 20 11:01:41 HKT 2006
-> starting parallel LAPW1 jobs at Wed Sep 20 11:01:41 HKT 2006
running LAPW1 in parallel mode (using .machines)
6 number_of_parallel_jobs
 c3-s11(1) c3-s12(1) c3-s13(1) c3-s14(1) c3-s15(1) c3-s16(1) Summary of lapw1para:
 c3-s11 k=1 user=0 wallclock=1
 c3-s12 k=1 user=0 wallclock=1
 c3-s13 k=1 user=0 wallclock=1
 c3-s14 k=1 user=0 wallclock=1
 c3-s15 k=1 user=0 wallclock=1
 c3-s16 k=1 user=0 wallclock=1
1.340u 2.030s 7:09.42 0.7% 0+0k 0+0io 81217pf+0w
> lapw2 -c -p (11:08:51) running LAPW2 in parallel mode
 c3-s11
 c3-s12
 c3-s13
 c3-s14
 c3-s15
 c3-s16
 Summary of lapw2para:
 c3-s11 user=0 wallclock=0
 c3-s12 user=0 wallclock=0
 c3-s13 user=0 wallclock=0
 c3-s14 user=0 wallclock=0
 c3-s15 user=0 wallclock=0
 c3-s16 user=0 wallclock=0
18.210u 0.800s 0:46.79 40.6% 0+0k 0+0io 28813pf+0w
> lcore (11:09:37) 1.300u 0.030s 0:01.37 97.0% 0+0k 0+0io 196pf+0w
> mixer (11:09:40) 7.080u 0.480s 0:08.10 93.3% 0+0k 0+0io 231pf+0w
:ENERGY convergence: 0 0.0001 0
:CHARGE convergence: 0 0.0000 0
ec cc and fc_conv 0 1 1

 cycle 2 (Wed Sep 20 11:09:48 HKT 2006) (19/19 to go)
... ...

 cycle 9 (Wed Sep 20 12:13:31 HKT 2006) (12/12 to go)

> lapw0 -p (12:13:31) starting parallel lapw0 at Wed Sep 20 12:13:31 HKT 2006
--------
running lapw0 in single mode
54.210u 0.330s 0:54.93 99.2% 0+0k 0+0io 2286pf+0w
> lapw1 -c -p (12:14:26) starting parallel lapw1 at Wed Sep 20 12:14:26 HKT 2006
-> starting parallel LAPW1 jobs at Wed Sep 20 12:14:27 HKT 2006
running LAPW1 in parallel mode (using .machines)
6 number_of_parallel_jobs
 c3-s11(1) c3-s12(1) c3-s13(1) c3-s14(1) c3-s15(1) c3-s16(1) Summary of lapw1para:
 c3-s11 k=1 user=0 wallclock=1
 c3-s12 k=1 user=0 wallclock=1
 c3-s13 k=1 user=0 wallclock=1
 c3-s14 k=1 user=0 wallclock=1
 c3-s15 k=1 user=0 wallclock=1
 c3-s16 k=1 user=0 wallclock=1
0.830u 1.170s 7:11.48 0.4% 0+0k 0+0io 78521pf+0w
> lapw2 -c -p (12:21:38) running LAPW2 in parallel mode
 c3-s11
 c3-s12
 c3-s13
 c3-s14
 c3-s15
 c3-s16
 Summary of lapw2para:
 c3-s11 user=0 wallclock=0
 c3-s12 user=0 wallclock=0
 c3-s13 user=0 wallclock=0
 c3-s14 user=0 wallclock=0
 c3-s15 user=0 wallclock=0
 c3-s16 user=0 wallclock=0
17.540u 0.720s 0:46.10 39.6% 0+0k 0+0io 29023pf+0w
> lcore (12:22:24) 1.310u 0.000s 0:01.38 94.9% 0+0k 0+0io 197pf+0w
> mixer (12:22:27) 7.120u 0.380s 0:08.55 87.7% 0+0k 0+0io 232pf+0w
:ENERGY convergence: 0 0.0001 .0048570000000000
:CHARGE convergence: 0 0.0000 .0128807
ec cc and fc_conv 0 1 1

 cycle 10 (Wed Sep 20 12:22:36 HKT 2006) (11/11 to go)

> lapw0 -p (12:22:36) starting parallel lapw0 at Wed Sep 20 12:22:36 HKT 2006
--------
running lapw0 in single mode
54.950u 0.310s 0:55.66 99.2% 0+0k 0+0io 2286pf+0w
> lapw1 -c -p (12:23:32) starting parallel lapw1 at Wed Sep 20 12:23:32 HKT 2006
-> starting parallel LAPW1 jobs at Wed Sep 20 12:23:32 HKT 2006
running LAPW1 in parallel mode (using .machines)
6 number_of_parallel_jobs
 c3-s11(1) c3-s12(1) c3-s13(1) c3-s14(1) c3-s15(1) c3-s16(1) Summary of lapw1para:
 c3-s11 k=1 user=0 wallclock=1
 c3-s12 k=1 user=0 wallclock=1
 c3-s13 k=1 user=0 wallclock=1
 c3-s14 k=1 user=0 wallclock=1
 c3-s15 k=1 user=0 wallclock=1
 c3-s16 k=1 user=0 wallclock=1
0.830u 1.190s 7:08.89 0.4% 0+0k 0+0io 78590pf+0w
> lapw2 -c -p (12:30:41) running LAPW2 in parallel mode
** LAPW2 crashed!
0.900u 0.210s 0:29.50 3.7% 0+0k 0+0io 21527pf+0w
error: command /rdisk1/zhy/Wien2k/lapw2cpara -c lapw2.def failed

> stop error
=======================================================================

lapw2.def:
=======================================================================
 2,'AlN.nsh', 'unknown','formatted',0
 3,'AlN.in1c', 'unknown','formatted',0
 4,'AlN.inso', 'unknown','formatted',0
 5,'AlN.in2c', 'old', 'formatted',0
 6,'AlN.output2','unknown','formatted',0
 8,'AlN.clmval','unknown','formatted',0
10,'./AlN.vector', 'unknown','unformatted',9000
11,'AlN.weight', 'unknown','formatted',0
13,'AlN.recprlist', 'unknown','unformatted',9000
14,'AlN.kgen', 'unknown','formatted',0
15,'AlN.tmp', 'unknown','formatted',0
16,'AlN.qtl', 'unknown','formatted',0
17,'AlN.weightaver','unknown','formatted',0
18,'AlN.vsp', 'old', 'formatted',0
19,'AlN.vns', 'unknown','formatted',0
20,'AlN.struct', 'old', 'formatted',0
21,'AlN.scf2', 'unknown','formatted',0
22,'AlN.rotlm', 'unknown', 'formatted',0
23,'AlN.radwf', 'unknown', 'formatted',0
24,'AlN.almblm', 'unknown', 'formatted',0
26,'AlN.weigh', 'unknown','unformatted',0
27,'AlN.weighdn', 'unknown','unformatted',0
28,'AlN.vrespval', 'unknown','formatted',0
29,'AlN.energydn','unknown','formatted',0
30,'AlN.energy', 'unknown','formatted',0
31,'./AlN.help', 'unknown','formatted',0
=======================================================================

lapw2_1.def:
=======================================================================
 2,'AlN.nsh', 'unknown','formatted',0
 3,'AlN.in1c', 'unknown','formatted',0
 4,'AlN.inso', 'unknown','formatted',0
 5,'AlN.in2c', 'old', 'formatted',0
 6,'AlN.output2_1','unknown','formatted',0
 8,'AlN.clmval_1','unknown','formatted',0
10,'./AlN.vector_1', 'unknown','unformatted',9000
11,'AlN.weight', 'unknown','formatted',0
13,'AlN.recprlist', 'unknown','unformatted',9000
14,'AlN.kgen', 'unknown','formatted',0
15,'AlN.tmp', 'unknown','formatted',0
16,'AlN.qtl', 'unknown','formatted',0
17,'AlN.weightaver','unknown','formatted',0
18,'AlN.vsp', 'old', 'formatted',0
19,'AlN.vns', 'unknown','formatted',0
20,'AlN.struct', 'old', 'formatted',0
21,'AlN.scf2_1', 'unknown','formatted',0
22,'AlN.rotlm', 'unknown', 'formatted',0
23,'AlN.radwf', 'unknown', 'formatted',0
24,'AlN.almblm', 'unknown', 'formatted',0
26,'AlN.weigh_1', 'unknown','unformatted',0
27,'AlN.weighdn_1', 'unknown','unformatted',0
28,'AlN.vrespval_1', 'unknown','formatted',0
29,'AlN.energydn_1','unknown','formatted',0
30,'AlN.energy_1','unknown','formatted',0
31,'./AlN.help_1', 'unknown','formatted',0
=======================================================================

lapw2.error:
=======================================================================
** testerror: Error in Parallel LAPW2

=======================================================================

Could you tell me how to deal with this problem?
Thank you very much.

Yours truly,
Zhenhua Zhang
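
For anyone triaging a similar failure, the dayfile excerpt above can be scanned mechanically to see which cycle and which step failed. The following is only a minimal Python sketch under stated assumptions: it is not part of WIEN2k, the function name `find_crashes` is hypothetical, and it assumes the dayfile layout shown in this message (a `cycle N` header, step lines beginning with `>`, and a line containing the word `crashed`).

```python
import re


def find_crashes(dayfile_text):
    """Return (cycle, command) pairs for every step the dayfile marks as crashed.

    Assumes the layout seen in the posted AlN.dayfile; other layouts
    may need different patterns.
    """
    cycle = None    # number from the most recent "cycle N" header
    command = None  # most recent step line, e.g. "lapw2 -c -p"
    crashes = []
    for line in dayfile_text.splitlines():
        header = re.match(r"\s*cycle\s+(\d+)\b", line)
        if header:
            cycle = int(header.group(1))
            continue
        # Step lines start with ">" followed by the program and its flags.
        step = re.match(r"\s*>\s*(\S+(?:\s+-\S+)*)", line)
        if step:
            command = step.group(1)
            continue
        if "crashed" in line:
            crashes.append((cycle, command))
    return crashes


if __name__ == "__main__":
    # Short excerpt in the same shape as the dayfile posted above.
    excerpt = """\
 cycle 10 (Wed Sep 20 12:22:36 HKT 2006) (11/11 to go)
> lapw1 -c -p (12:23:32) starting parallel lapw1 at Wed Sep 20 12:23:32 HKT 2006
> lapw2 -c -p (12:30:41) running LAPW2 in parallel mode
** LAPW2 crashed!
> stop error
"""
    print(find_crashes(excerpt))
```

On the excerpt this reports cycle 10 and the `lapw2 -c -p` step, which matches what the dayfile shows by eye. It only locates the failing step; it says nothing about why that step failed.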