<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Maxim,<div><br></div><div>Thanks for the follow-up!</div><div><br></div><div>I think -machinefile is the appropriate option. Here is the relevant line from the help:</div><div>-machinefile # file mapping procs to machine</div><div><br></div><div>The help does not mention a -hostfile option for my current version of MPI.</div><div><br></div><div>Yes, the .machine0/1/2 files are exactly as you described.</div><div><br></div><div>The parallel_options file contains: </div><div><div><div>setenv USE_REMOTE 1</div><div>setenv MPI_REMOTE 1</div><div>setenv WIEN_GRANULARITY 1</div><div>setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"</div></div><div><br></div><div>I think the problem is probably due to my MPI. However, even if I disable MPI parallelization, the problem persists (no evident difference in the output files, including case.dayfile, stdout and :log). Note that we can run exactly the same set of input files in serial mode with no problem. </div><div><br></div><div>Again, thanks for your help!</div><div><br></div><div>Cheers,</div><div>Wei</div><div><br></div><div><br></div><div><div>On Oct 31, 2010, at 11:27 PM, Maxim Rakitin wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<div bgcolor="#ffffff" text="#000000">
Dear Wei,<br>
<br>
Maybe -machinefile is OK for your mpirun. Which options are
appropriate for it? What does the help say?<br>
<br>
Try to restore your MPIRUN variable with -machinefile and rerun the
calculation. Then check what is in the .machine0/1/2 files and let us
know. It should contain 8 lines for the r1i0n0 node and 8 lines for the r1i0n1
node.<br>
<br>
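As a minimal sketch of such a check, assuming a POSIX shell and one hostname per line in each .machine file (the helper name count_hosts is purely illustrative and not part of WIEN2k), the hosts could be tallied like this:<br>
<pre>
```shell
# Count how many times each host appears in a machine file.
# Assumption: the file lists one hostname per line.
count_hosts() {
    # sort groups identical hostnames; uniq -c prefixes each with its count
    sort "$1" | uniq -c
}

# Hypothetical usage from the case directory:
#   for f in .machine0 .machine1 .machine2; do echo "$f"; count_hosts "$f"; done
```
</pre>
<br>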
One more thing you should check is the $WIENROOT/parallel_options file.
What are its contents?<br>
<pre class="moz-signature" cols="72">Best regards,
Maxim Rakitin
email: <a class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
web: <a class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
<br>
01.11.2010 9:06, Wei Xie wrote:
<blockquote cite="mid:524CB9BF-DC7E-4688-B113-89C81F6272B1@wisc.edu" type="cite">Hi Maxim,
<div><br>
</div>
<div>Thanks for your reply! </div>
<div>We tried MPIRUN=mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_, but
the problem persists. The only difference is that stdout changes
to ''… MPI: invalid option -hostfile …''.</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Wei</div>
<div><br>
</div>
<div><br>
<div>
<div>On Oct 31, 2010, at 10:40 PM, Maxim Rakitin wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div bgcolor="#ffffff" text="#000000"> Hi,<br>
<br>
              It looks like Intel's mpirun doesn't have a '-machinefile'
              option. Instead, it has a '-hostfile' option (from
              here: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt">http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt</a>).<br>
              <br>
              Try 'mpirun -h' for information about the available options and use
              the appropriate one.<br>
<pre class="moz-signature" cols="72">Best regards,
Maxim Rakitin
email: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
web: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
<br>
              01.11.2010 4:56, Wei Xie wrote:
<blockquote cite="mid:2C0098E9-D05E-46B8-9BED-983152FB7772@wisc.edu" type="cite">
<div>Dear all WIEN2k community members:</div>
<div><br>
</div>
                <div>We encountered a problem when running in
                  parallel (k-point, MPI or both): the calculations
                  crash at LAPW2. Note that we had no problem running in
                  serial. We have tried to diagnose the problem,
                  recompiled the code with different options, and tested
                  different cases and parameters based on similar
                  problems reported on the mailing list, but the problem
                  persists. So we are writing here in the hope that someone
                  can offer a suggestion. We have attached the related files below
                  for your reference. Thank you in advance for your
                  replies! </div>
<div><br>
</div>
                <div>This is a TiC example running in both k-point and
                  MPI parallel mode on two nodes <i>r1i0n0</i> and <i>r1i0n1</i> (8 cores/node):</div>
<div><br>
</div>
<div><b>1. </b><b>stdout </b><b>(abridged) </b></div>
<div>MPI: invalid option -machinefile</div>
<div>real<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.004s</div>
<div>user<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</div>
<div>sys<span class="Apple-tab-span" style="white-space:
pre;"> </span>0m0.000s</div>
<div>...</div>
<div>MPI: invalid option -machinefile</div>
<div>real<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.003s</div>
<div>user<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</div>
<div>sys<span class="Apple-tab-span" style="white-space:
pre;"> </span>0m0.004s</div>
<div>TiC.scf1up_1: No such file or directory.</div>
<div><br>
</div>
<div>LAPW2 - Error. Check file lapw2.error</div>
<div>cp: cannot stat `.in.tmp': No such file or
directory</div>
<div>rm: cannot remove `.in.tmp': No such file or
directory</div>
<div><b>rm: cannot remove
`.in.tmp1': No such file or directory</b></div>
<div><b><br>
</b></div>
<div><b>2. TiC.dayfile
(abridged) </b></div>
<div>...</div>
<div> start <span class="Apple-tab-span" style="white-space: pre;"> </span>(Sun Oct 31
16:25:06 MDT 2010) with lapw0 (40/99 to go)</div>
<div> cycle 1 <span class="Apple-tab-span" style="white-space: pre;"> </span>(Sun Oct 31
16:25:06 MDT 2010) <span class="Apple-tab-span" style="white-space: pre;"> </span>(40/99 to go)</div>
<div><br>
</div>
<div>> lapw0 -p<span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:06)
starting parallel lapw0 at Sun Oct 31 16:25:07 MDT
2010</div>
<div>-------- .machine0 : 16 processors</div>
<div>invalid "local" arg: -machinefile</div>
<div><br>
</div>
<div>0.436u 0.412s 0:04.63 18.1%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k
2600+0io 1pf+0w</div>
<div>> lapw1 -up -p <span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:12)
starting parallel lapw1 at Sun Oct 31 16:25:12 MDT
2010</div>
<div>-> starting parallel LAPW1 jobs at Sun Oct 31
16:25:12 MDT 2010</div>
<div>running LAPW1 in parallel mode (using .machines)</div>
<div>2 number_of_parallel_jobs</div>
<div> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
r1i0n0 r1i0n0(1) r1i0n1 r1i0n1 r1i0n1 r1i0n1
r1i0n1 r1i0n1 r1i0n1 r1i0n1(1) r1i0n0 r1i0n0
r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
Summary of lapw1para:</div>
<div> r1i0n0<span class="Apple-tab-span" style="white-space: pre;"> </span> k=0<span class="Apple-tab-span" style="white-space: pre;"> </span> user=0<span class="Apple-tab-span" style="white-space: pre;"> </span> wallclock=0</div>
<div> r1i0n1<span class="Apple-tab-span" style="white-space: pre;"> </span> k=0<span class="Apple-tab-span" style="white-space: pre;"> </span> user=0<span class="Apple-tab-span" style="white-space: pre;"> </span> wallclock=0</div>
<div>...</div>
<div>0.116u 0.316s 0:10.48 4.0%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k
0+0io 0pf+0w</div>
<div>> lapw2 -up -p <span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:34)
running LAPW2 in parallel mode</div>
<div>** LAPW2 crashed!</div>
<div>0.032u 0.104s 0:01.13 11.5%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k
82304+0io 8pf+0w</div>
<div>error: command /home/xiew/WIEN2k_10/lapw2para -up
uplapw2.def failed</div>
<div><br>
</div>
<div><b>3. uplapw2.error </b></div>
<div>Error in LAPW2</div>
<div> 'LAPW2' - can't open unit: 18
</div>
<div> 'LAPW2' - filename: TiC.vspup
</div>
<div> 'LAPW2' - status: old form:
formatted </div>
<div>** testerror: Error in Parallel LAPW2</div>
<div><br>
</div>
<div>
<div>
<div><b>4. .machines</b></div>
<div>#</div>
<div>1:r1i0n0:8</div>
<div>1:r1i0n1:8</div>
<div>lapw0:r1i0n0:8 r1i0n1:8 </div>
<div>granularity:1</div>
<div>extrafine:1</div>
</div>
</div>
<div><br>
</div>
<div>
<div><b>5. compilers, MPI and options</b></div>
<div>Intel Compilers and MKL 11.1.046</div>
<div>Intel MPI 3.2.0.011</div>
<div><br>
</div>
<div>current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip
-DINTEL_VML -traceback</div>
<div>current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad
-ip -DINTEL_VML -traceback</div>
<div>current:LDFLAGS:$(FOPT)
-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
-pthread</div>
<div>current:DPARALLEL:'-DParallel'</div>
<div>current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core -openmp -lpthread
-lguide</div>
<div>current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
-lmkl_scalapack_lp64
/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a
-Wl,--start-group -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core
-lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp
-lpthread -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi
-lfftw $(R_LIBS)</div>
<div>current:MPIRUN:mpirun -np _NP_ -machinefile
_HOSTS_ _EXEC_</div>
</div>
<div><br>
</div>
<div>Best regards,</div>
<div>Wei Xie</div>
<div>Computational Materials Group</div>
<div>University of Wisconsin-Madison</div>
<div><br>
</div>
<pre wrap=""><fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
Wien mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
</blockquote></div><br></div></body></html>