<div dir="ltr">Dear Wien2k community,<div><br></div><div style> I have Wien2k 13.1 installed on an SGI cluster using ifort, icc and Open MPI. The installation was hard work (I would like to thank Prof. Lawrence Marks again for his help), but since then I have used Wien2k without problems for several months.</div>
<div style> I completed the first step of a long calculation and saved it in a different directory. When I tried to run the next step in the original directory, Wien2k crashed. After some tests, I decided to reinitialize the calculation from the beginning (in other words, to repeat the first step). To my surprise, even this failed, and I would like to know whether anyone has faced a similar problem.</div>
<div style> Please find below the output files that I consider most relevant.</div><div style> Finally, I would like to stress some points:</div><div style><br></div><div style>1) lapw0 stops after about 7 minutes, whereas it took about 2 hours in the successful calculation.</div>
<div style><br></div><div style>2) lapw1 stops after 5 seconds without generating the case.energy_* files, and case.dayfile does not contain the timing statistics for each processor.</div><div style><br></div><div style>3) OMP_NUM_THREADS is overwritten to 12 by the system (my .bashrc sets OMP_NUM_THREADS=1), but even when I export this variable with the value 1 in the submission script, I get the same crash.</div>
<div style><br></div><div style> Thank you very much for your attention,</div><div style> Luis</div><div style>===========================================================</div><div style>:log file</div><div style>
<br></div><div style><div>> (init_lapw) options: </div><div>Wed Apr 2 14:07:30 BRT 2014> (x_lapw) nn -f InPzb15InPwurt3-V2</div><div>Wed Apr 2 14:07:46 BRT 2014> (x) nn</div><div>Wed Apr 2 14:08:03 BRT 2014> (x) sgroup</div>
<div>Wed Apr 2 14:08:23 BRT 2014> (x) symmetry</div><div>Wed Apr 2 14:08:48 BRT 2014> (x) lstart</div><div>Wed Apr 2 14:09:38 BRT 2014> (x) kgen</div><div>Wed Apr 2 14:09:58 BRT 2014> (x) dstart -c -p</div>
<div>> (initso_lapw) options: </div><div>Tue May 27 16:07:00 BRT 2014> (x) Machines2W</div><div>> (run_lapw) options: -p -NI -ec 0.0001 -cc 0.0001 -i 150 -it</div><div>Tue May 27 16:07:00 BRT 2014> (x) lapw0 -p</div>
<div>Tue May 27 16:14:10 BRT 2014> (x) lapw1 -it -p -c</div><div>Tue May 27 16:14:15 BRT 2014> (x) lapw2 -p -c</div><div><br></div><div>===========================================================<br></div><div style>
case.dayfile</div><div style><br></div><div style><div>Calculating InPzb15InPwurt3-V2 in /home/ice/proj/proj546/ogando/Wien/Calculos/InP/InPzbInPwurt/15camadasZB+3WZ/InPzb15InPwurt3-V2</div><div>on r1i0n15 with PID 6538</div>
<div>using WIEN2k_13.1 (Release 17/6/2013) in /home/ice/proj/proj546/ogando/Wien/Executaveis-13-OpenMPI</div><div><br></div><div><br></div><div> start <span class="" style="white-space:pre">        </span>(Tue May 27 16:07:00 BRT 2014) with lapw0 (150/99 to go)</div>
<div><br></div><div> cycle 1 <span class="" style="white-space:pre">        </span>(Tue May 27 16:07:00 BRT 2014) <span class="" style="white-space:pre">        </span>(150/99 to go)</div><div><br></div><div>> lapw0 -p<span class="" style="white-space:pre">        </span>(16:07:00) starting parallel lapw0 at Tue May 27 16:07:00 BRT 2014</div>
<div>-------- .machine0 : 12 processors</div><div>2540.314u 12.204s 7:09.36 594.4%<span class="" style="white-space:pre">        </span>0+0k 180672+52736io 5pf+0w</div><div>> lapw1 -it -p -c <span class="" style="white-space:pre">        </span>(16:14:10) starting parallel lapw1 at Tue May 27 16:14:10 BRT 2014</div>
<div>-> starting parallel LAPW1 jobs at Tue May 27 16:14:10 BRT 2014</div><div>running LAPW1 in parallel mode (using .machines)</div><div>12 number_of_parallel_jobs</div><div> r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) r1i0n15(1) Summary of lapw1para:</div>
<div> r1i0n15<span class="" style="white-space:pre">        </span> k=1<span class="" style="white-space:pre">        </span> user=0<span class="" style="white-space:pre">        </span> wallclock=1</div><div>0.132u 0.136s 0:04.75 5.4%<span class="" style="white-space:pre">        </span>0+0k 4104+1688io 5pf+0w</div>
<div>> lapw2 -p -c <span class="" style="white-space:pre">        </span>(16:14:15) running LAPW2 in parallel mode</div><div>** LAPW2 crashed!</div><div>0.396u 0.016s 0:00.66 60.6%<span class="" style="white-space:pre">        </span>0+0k 6424+11472io 1pf+0w</div>
<div>error: command /home/ice/proj/proj546/ogando/Wien/Executaveis-13-OpenMPI/lapw2cpara -c lapw2.def failed</div><div><br></div><div>> stop error</div><div><br></div><div>===========================================================<br>
</div><div style>lapw2.error (the only non-empty case.error file)</div><div style><br></div><div style><div>Error in LAPW2</div><div> 'LAPW2' - can't open unit: 30 </div>
<div> 'LAPW2' - filename: InPzb15InPwurt3-V2.energy_1 </div><div>** testerror: Error in Parallel LAPW2</div><div><br></div><div>===========================================================<br>
</div><div style>The standard output file</div><div style><br></div><div style><div><br></div><div>OMP_NUM_THREADS = 12</div><div><br></div><div>-----------------------------------------</div><div>Inicio do job: Tue May 27 16:07:00 BRT 2014</div>
<div>Hostname: r1i0n15</div><div>PWD: /home/ice/proj/proj546/ogando/Wien/Calculos/InP/InPzbInPwurt/15camadasZB+3WZ/InPzb15InPwurt3-V2</div><div>0.000u 0.000s 0:00.05 0.0%<span class="" style="white-space:pre">        </span>0+0k 8216+24io 1pf+0w</div>
<div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div><div> LAPW0 END</div>
<div> LAPW0 END</div><div>grep: .processes: No such file or directory</div><div>InPzb15InPwurt3-V2.scf1_1: No such file or directory.</div><div>grep: No match.</div><div>FERMI - Error</div><div>cp: cannot stat `.in.tmp': No such file or directory</div>
<div><br></div><div>> stop error</div><div>Final do job: Tue May 27 16:14:15 BRT 2014</div><div>-----------------------------------------</div><div><br></div><div>OMP_NUM_THREADS = 12</div><div><br></div><div>=======================================</div>
<div style>My parallel_options file</div><div style><br></div><div style><div>setenv TASKSET "no"</div><div>setenv USE_REMOTE 1</div><div>setenv MPI_REMOTE 0</div><div>setenv WIEN_GRANULARITY 1</div><div>setenv WIEN_MPIRUN "/home/ice/proj/proj546/ogando/OpenMPIexec/bin/mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"</div>
<div><br></div></div><div><br></div></div></div></div></div></div>