<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Maxim,<div><br></div><div>Thanks for your reply!</div><div>We tried MPIRUN=mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_, but the problem persists. The only difference is that stdout now reports "… MPI: invalid option -hostfile …" instead of the -machinefile error.</div><div><br></div><div>Thanks,</div><div>Wei</div><div><br></div><div><br><div><div>On Oct 31, 2010, at 10:40 PM, Maxim Rakitin wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<div bgcolor="#ffffff" text="#000000">
Hi,<br>
<br>
It looks like Intel's mpirun doesn't have a '-machinefile' option.
Instead, it has a '-hostfile' option (from here:
<a class="moz-txt-link-freetext" href="http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt">http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt</a>).<br>
<br>
Try 'mpirun -h' for information about the available options and use the appropriate one.<br>
<pre class="moz-signature" cols="72">Best regards,
Maxim Rakitin
email: <a class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
web: <a class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
<br>
01.11.2010 4:56, Wei Xie wrote:
<blockquote cite="mid:2C0098E9-D05E-46B8-9BED-983152FB7772@wisc.edu" type="cite">
<div>Dear WIEN2k community members,</div>
<div><br>
</div>
<div>We encountered a problem when running in parallel
(k-point, MPI, or both): the calculations crashed at LAPW2. Note
that we had no problem running in serial. We have tried to
diagnose the problem: we recompiled the code with different options
and tested different cases and parameters based on similar
problems reported on the mailing list, but the problem persists. So
we are writing here in the hope that someone can offer a suggestion.
We have included the relevant files below for reference. Thanks in
advance for your replies!</div>
<div><br>
</div>
<div>This is a TiC example running in both k-point and MPI parallel mode
on two nodes, <i>r1i0n0</i> and <i>r1i0n1</i> (8 cores/node):</div>
<div><br>
</div>
<div><b>1. stdout (abridged)</b></div>
<div>MPI: invalid option -machinefile</div>
<div>real<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.004s</div>
<div>user<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</div>
<div>sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</div>
<div>...</div>
<div>MPI: invalid option -machinefile</div>
<div>real<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.003s</div>
<div>user<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</div>
<div>sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.004s</div>
<div>TiC.scf1up_1: No such file or directory.</div>
<div><br>
</div>
<div>LAPW2 - Error. Check file lapw2.error</div>
<div>cp: cannot stat `.in.tmp': No such file or directory</div>
<div>rm: cannot remove `.in.tmp': No such file or directory</div>
<div>rm: cannot remove `.in.tmp1': No such file or directory</div>
<div><br>
</div>
<div><b>2. TiC.dayfile (abridged)</b></div>
<div>...</div>
<div> start <span class="Apple-tab-span" style="white-space:
pre;"> </span>(Sun Oct 31 16:25:06 MDT 2010) with lapw0
(40/99 to go)</div>
<div> cycle 1 <span class="Apple-tab-span" style="white-space:
pre;"> </span>(Sun Oct 31 16:25:06 MDT 2010) <span class="Apple-tab-span" style="white-space: pre;"> </span>(40/99
to go)</div>
<div><br>
</div>
<div>> lapw0 -p<span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:06) starting
parallel lapw0 at Sun Oct 31 16:25:07 MDT 2010</div>
<div>-------- .machine0 : 16 processors</div>
<div>invalid "local" arg: -machinefile</div>
<div><br>
</div>
<div>0.436u 0.412s 0:04.63 18.1%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k 2600+0io 1pf+0w</div>
<div>> lapw1 -up -p <span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:12) starting
parallel lapw1 at Sun Oct 31 16:25:12 MDT 2010</div>
<div>-> starting parallel LAPW1 jobs at Sun Oct 31 16:25:12
MDT 2010</div>
<div>running LAPW1 in parallel mode (using .machines)</div>
<div>2 number_of_parallel_jobs</div>
<div> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
r1i0n0(1) r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1
r1i0n1(1) r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
r1i0n0(1)</div>
<div> Summary of lapw1para:</div>
<div> r1i0n0<span class="Apple-tab-span" style="white-space:
pre;"> </span> k=0<span class="Apple-tab-span" style="white-space: pre;"> </span> user=0<span class="Apple-tab-span" style="white-space: pre;"> </span> wallclock=0</div>
<div> r1i0n1<span class="Apple-tab-span" style="white-space:
pre;"> </span> k=0<span class="Apple-tab-span" style="white-space: pre;"> </span> user=0<span class="Apple-tab-span" style="white-space: pre;"> </span> wallclock=0</div>
<div>...</div>
<div>0.116u 0.316s 0:10.48 4.0%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k 0+0io 0pf+0w</div>
<div>> lapw2 -up -p <span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:34) running LAPW2 in
parallel mode</div>
<div>** LAPW2 crashed!</div>
<div>0.032u 0.104s 0:01.13 11.5%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k 82304+0io 8pf+0w</div>
<div>error: command /home/xiew/WIEN2k_10/lapw2para -up
uplapw2.def failed</div>
<div><br>
</div>
<div><b>3. uplapw2.error </b></div>
<div>Error in LAPW2</div>
<div> 'LAPW2' - can't open unit: 18
</div>
<div> 'LAPW2' - filename: TiC.vspup
</div>
<div> 'LAPW2' - status: old form: formatted
</div>
<div>** testerror: Error in Parallel LAPW2</div>
<div><br>
</div>
<div>
<div>
<div><b>4. .machines</b></div>
<div>#</div>
<div>1:r1i0n0:8</div>
<div>1:r1i0n1:8</div>
<div>lapw0:r1i0n0:8 r1i0n1:8 </div>
<div>granularity:1</div>
<div>extrafine:1</div>
</div>
</div>
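<div>Under the usual WIEN2k reading of this file, each numbered line defines one k-point parallel job (a weight followed by host:count entries), and the lapw0: line lists the hosts and process counts for MPI-parallel lapw0. The following Python sketch applies that reading and reproduces the two figures reported in the dayfile, "2 number_of_parallel_jobs" and ".machine0 : 16 processors". It is only an illustration under that assumption: parse_machines and parse_entries are hypothetical helpers, not part of WIEN2k, and they do not replicate lapw1para's handling of granularity or extrafine.</div>
<pre>
```python
# Hypothetical sketch (not part of WIEN2k): summarize a .machines file
# the way the dayfile reports it. Assumes "weight:host:n host:n ..." for
# k-point jobs and "lapw0:host:n host:n ..." for MPI-parallel lapw0.

MACHINES = """#
1:r1i0n0:8
1:r1i0n1:8
lapw0:r1i0n0:8 r1i0n1:8
granularity:1
extrafine:1
"""


def parse_entries(field):
    """Split 'host:n host2:m' into [(host, n), ...]; n defaults to 1."""
    entries = []
    for item in field.split():
        host, _, count = item.partition(":")
        entries.append((host, int(count) if count else 1))
    return entries


def parse_machines(text):
    jobs = []      # one (weight, [(host, n), ...]) tuple per k-point job
    lapw0 = []     # [(host, n), ...] for lapw0 (cf. .machine0 in the dayfile)
    options = {}   # remaining "key:value" settings such as granularity
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, rest = line.partition(":")
        if key.isdigit():
            jobs.append((int(key), parse_entries(rest)))
        elif key == "lapw0":
            lapw0 = parse_entries(rest)
        else:
            options[key] = rest.strip()
    return jobs, lapw0, options


jobs, lapw0, options = parse_machines(MACHINES)
print(len(jobs), "number_of_parallel_jobs")                   # 2
print(".machine0 :", sum(n for _, n in lapw0), "processors")  # 16
```
</pre>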
<div><br>
</div>
<div>
<div><b>5. compilers, MPI and options</b></div>
<div>Intel Compilers and MKL 11.1.046</div>
<div>Intel MPI 3.2.0.011</div>
<div><br>
</div>
<div>current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip
-DINTEL_VML -traceback</div>
<div>current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip
-DINTEL_VML -traceback</div>
<div>current:LDFLAGS:$(FOPT)
-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -pthread</div>
<div>current:DPARALLEL:'-DParallel'</div>
<div>current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide</div>
<div>current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
-lmkl_scalapack_lp64
/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a
-Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread
-lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp
-lpthread -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw
$(R_LIBS)</div>
<div>current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_</div>
</div>
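<div>For reference, the MPIRUN entry above is a template: the parallel scripts substitute the process count, the generated host file, and the executable for the placeholders _NP_, _HOSTS_, and _EXEC_, and then run the resulting command. The following Python sketch shows that expansion for the lapw0 step. expand_mpirun is a hypothetical helper, and the executable string is only an illustrative assumption. Because the expanded string, including -machinefile or -hostfile, is handed to whichever mpirun is found first in the PATH, an option that this launcher does not accept is rejected before any WIEN2k program starts.</div>
<pre>
```python
# Hypothetical sketch (not WIEN2k code): expand an MPIRUN template by
# replacing the _NP_, _HOSTS_ and _EXEC_ placeholders.


def expand_mpirun(template, nprocs, hostfile, executable):
    """Return the command line produced from an MPIRUN template."""
    return (template.replace("_NP_", str(nprocs))
                    .replace("_HOSTS_", hostfile)
                    .replace("_EXEC_", executable))


template = "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
# 16 processes and .machine0 match the lapw0 step in the dayfile;
# the executable string is only an illustrative assumption.
print(expand_mpirun(template, 16, ".machine0", "lapw0_mpi lapw0.def"))
# mpirun -np 16 -machinefile .machine0 lapw0_mpi lapw0.def
```
</pre>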
<div><br>
</div>
<div>Best regards,</div>
<div>Wei Xie</div>
<div>Computational Materials Group</div>
<div>University of Wisconsin-Madison</div>
<div><br>
</div>
<pre wrap="">
_______________________________________________
Wien mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a>
<a class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
</pre>
</blockquote>
</div>
</blockquote></div><br></div></body></html>