<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Wei,<br>
<br>
The parallel_options file manages how parallel programs run, so
change the following line in it:<br>
<blockquote><tt>setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile
_HOSTS_ _EXEC_"</tt><br>
</blockquote>
to<br>
<blockquote><tt>setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile
_HOSTS_ _EXEC_"</tt><br>
</blockquote>
Your .machine0/1/2 files are correct.<br>
<br>
Also, I believe that setting the 'USE_REMOTE' variable to 1 makes the
parallel scripts (I mean lapw[012]para_lapw) launch their jobs via
ssh/rsh, so switch it to '0'. I'm not sure about the 'MPI_REMOTE'
option, as it is a new one; try both values (0 and 1) for it.<br>
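<br>
As a minimal sketch, parallel_options might look like this with both changes applied. The WIEN_GRANULARITY line is copied unchanged from your message, and the MPI_REMOTE value shown is only one of the two values to try, not a confirmed setting:<br>
<blockquote>
<pre># parallel_options, edited as suggested above (sketch, not a tested configuration)
setenv USE_REMOTE 0
# MPI_REMOTE value is an assumption; try both 0 and 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"</pre>
</blockquote>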
<br>
Hope this helps.<br>
<pre class="moz-signature" cols="72">Best regards,
Maxim Rakitin
email: <a class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
web: <a class="moz-txt-link-freetext" href="http://www.susu.ac.ru">http://www.susu.ac.ru</a></pre>
<br>
01.11.2010 21:35, Wei Xie wrote:
<blockquote cite="mid:D9E3DEA1-E905-4FA6-B1A9-834B76512C16@wisc.edu"
type="cite">Hi Maxim,
<div><br>
</div>
<div>Thanks for the follow-up!</div>
<div><br>
</div>
<div>I think -machinefile is the appropriate option. Here's
the help:</div>
<div>-machinefile # file mapping procs to machine</div>
<div><br>
</div>
<div>The help for my current version of MPI does not mention a
-hostfile option.</div>
<div><br>
</div>
<div>Yes, the machine0/1/2 files are exactly like what you
described.</div>
<div><br>
</div>
<div>The parallel_options is: </div>
<div>
<div>
<div>setenv USE_REMOTE 1</div>
<div>setenv MPI_REMOTE 1</div>
<div>setenv WIEN_GRANULARITY 1</div>
<div>setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile _HOSTS_
_EXEC_"</div>
</div>
<div><br>
</div>
<div>I think the problem is probably due to my MPI. However, even
if I disable MPI parallelization, the problem persists (no
evident difference in the output files, including
case.dayfile, stdout and :log). Note that we can run exactly
the same set of input files in serial mode with no problem. </div>
<div><br>
</div>
<div>Again, thanks for your help!</div>
<div><br>
</div>
<div>Cheers,</div>
<div>Wei</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>On Oct 31, 2010, at 11:27 PM, Maxim Rakitin wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div bgcolor="#ffffff" text="#000000"> Dear Wei,<br>
<br>
Maybe -machinefile is OK for your mpirun. Which options
does it accept? What does the help say?<br>
<br>
Try restoring your MPIRUN variable with -machinefile and
rerun the calculation. Then check what is in the .machine0/1/2
files and let us know. They should contain 8 lines for the r1i0n0
node and 8 lines for the r1i0n1 node.<br>
<br>
One more thing you should check is the
$WIENROOT/parallel_options file. What are its contents?<br>
<pre class="moz-signature" cols="72">Best regards,
Maxim Rakitin
email: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
web: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
<br>
01.11.2010 9:06, Wei Xie wrote:
<blockquote
cite="mid:524CB9BF-DC7E-4688-B113-89C81F6272B1@wisc.edu"
type="cite">Hi Maxim,
<div><br>
</div>
<div>Thanks for your reply! </div>
<div>We tried MPIRUN=mpirun -np _NP_ -hostfile _HOSTS_
_EXEC_, but the problem persists. The only difference
is that stdout changes to "… MPI: invalid option
-hostfile …".</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Wei</div>
<div><br>
</div>
<div><br>
<div>
<div>On Oct 31, 2010, at 10:40 PM, Maxim Rakitin
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div bgcolor="#ffffff" text="#000000"> Hi,<br>
<br>
It looks like Intel's mpirun doesn't have a
'-machinefile' option. Instead, it has a
'-hostfile' option (from here: <a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt">http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt</a>).<br>
<br>
Try 'mpirun -h' for information about the available options
and use the appropriate one.<br>
<pre class="moz-signature" cols="72">Best regards,
Maxim Rakitin
email: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
web: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
<br>
01.11.2010 4:56, Wei Xie wrote:
<blockquote
cite="mid:2C0098E9-D05E-46B8-9BED-983152FB7772@wisc.edu"
type="cite">
<div>Dear all WIEN2k community members:</div>
<div><br>
</div>
<div>We encountered a problem when running
in parallel (k-point, MPI or both): the
calculations crashed at LAPW2. Note that we had
no problem running in serial. We have
tried to diagnose the problem, recompiled the
code with different options, and tested
different cases and parameters based on
similar problems reported on the mailing list,
but the problem persists. So we are writing here
in the hope that someone can offer us some
suggestions. We have attached the related files
below for your reference. Thank you in advance
for your replies! </div>
<div><br>
</div>
<div>This is a TiC example running in both
k-point and MPI parallel mode on two nodes, <i>r1i0n0</i>
and <i>r1i0n1</i> (8 cores/node):</div>
<div><br>
</div>
<div><b>1. </b><b>stdout </b><b>(abridged) </b></div>
<div>MPI: invalid option -machinefile</div>
<div>real<span class="Apple-tab-span"
style="white-space: pre;"> </span>0m0.004s</div>
<div>user<span class="Apple-tab-span"
style="white-space: pre;"> </span>0m0.000s</div>
<div>sys<span class="Apple-tab-span"
style="white-space: pre;"> </span>0m0.000s</div>
<div>...</div>
<div>MPI: invalid option -machinefile</div>
<div>real<span class="Apple-tab-span"
style="white-space: pre;"> </span>0m0.003s</div>
<div>user<span class="Apple-tab-span"
style="white-space: pre;"> </span>0m0.000s</div>
<div>sys<span class="Apple-tab-span"
style="white-space: pre;"> </span>0m0.004s</div>
<div>TiC.scf1up_1: No such file or directory.</div>
<div><br>
</div>
<div>LAPW2 - Error. Check file lapw2.error</div>
<div>cp: cannot stat `.in.tmp': No such file
or directory</div>
<div>rm: cannot remove `.in.tmp': No such file
or directory</div>
<div><b>rm: cannot remove `.in.tmp1': No such
file or directory</b></div>
<div><b><br>
</b></div>
<div><b>2. TiC.dayfile (abridged) </b></div>
<div>...</div>
<div> start <span class="Apple-tab-span"
style="white-space: pre;"> </span>(Sun
Oct 31 16:25:06 MDT 2010) with lapw0 (40/99
to go)</div>
<div> cycle 1 <span class="Apple-tab-span"
style="white-space: pre;"> </span>(Sun
Oct 31 16:25:06 MDT 2010) <span
class="Apple-tab-span" style="white-space:
pre;"> </span>(40/99 to go)</div>
<div><br>
</div>
<div>> lapw0 -p<span
class="Apple-tab-span" style="white-space:
pre;"> </span>(16:25:06) starting
parallel lapw0 at Sun Oct 31 16:25:07 MDT
2010</div>
<div>-------- .machine0 : 16 processors</div>
<div>invalid "local" arg: -machinefile</div>
<div><br>
</div>
<div>0.436u 0.412s 0:04.63 18.1%<span
class="Apple-tab-span" style="white-space:
pre;"> </span>0+0k 2600+0io 1pf+0w</div>
<div>> lapw1 -up -p <span
class="Apple-tab-span" style="white-space:
pre;"> </span>(16:25:12) starting
parallel lapw1 at Sun Oct 31 16:25:12 MDT
2010</div>
<div>-> starting parallel LAPW1 jobs at
Sun Oct 31 16:25:12 MDT 2010</div>
<div>running LAPW1 in parallel mode (using
.machines)</div>
<div>2 number_of_parallel_jobs</div>
<div> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
r1i0n0 r1i0n0 r1i0n0(1) r1i0n1 r1i0n1
r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)
r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
r1i0n0 r1i0n0 r1i0n0(1) Summary of
lapw1para:</div>
<div> r1i0n0<span class="Apple-tab-span"
style="white-space: pre;"> </span> k=0<span
class="Apple-tab-span" style="white-space:
pre;"> </span> user=0<span
class="Apple-tab-span" style="white-space:
pre;"> </span> wallclock=0</div>
<div> r1i0n1<span class="Apple-tab-span"
style="white-space: pre;"> </span> k=0<span
class="Apple-tab-span" style="white-space:
pre;"> </span> user=0<span
class="Apple-tab-span" style="white-space:
pre;"> </span> wallclock=0</div>
<div>...</div>
<div>0.116u 0.316s 0:10.48 4.0%<span
class="Apple-tab-span" style="white-space:
pre;"> </span>0+0k 0+0io 0pf+0w</div>
<div>> lapw2 -up -p <span
class="Apple-tab-span" style="white-space:
pre;"> </span>(16:25:34) running LAPW2 in
parallel mode</div>
<div>** LAPW2 crashed!</div>
<div>0.032u 0.104s 0:01.13 11.5%<span
class="Apple-tab-span" style="white-space:
pre;"> </span>0+0k 82304+0io 8pf+0w</div>
<div>error: command
/home/xiew/WIEN2k_10/lapw2para -up
uplapw2.def failed</div>
<div><br>
</div>
<div><b>3. uplapw2.error </b></div>
<div>Error in LAPW2</div>
<div> 'LAPW2' - can't open unit: 18
</div>
<div> 'LAPW2' - filename: TiC.vspup
</div>
<div> 'LAPW2' - status: old
form: formatted </div>
<div>** testerror: Error in Parallel LAPW2</div>
<div><br>
</div>
<div>
<div>
<div><b>4. .machines</b></div>
<div>#</div>
<div>1:r1i0n0:8</div>
<div>1:r1i0n1:8</div>
<div>lapw0:r1i0n0:8 r1i0n1:8 </div>
<div>granularity:1</div>
<div>extrafine:1</div>
</div>
</div>
<div><br>
</div>
<div>
<div><b>5. compilers, MPI and options</b></div>
<div>Intel Compilers and MKL 11.1.046</div>
<div>Intel MPI 3.2.0.011</div>
<div><br>
</div>
<div>current:FOPT:-FR -mp1 -w -prec_div
-pc80 -pad -ip -DINTEL_VML -traceback</div>
<div>current:FPOPT:-FR -mp1 -w -prec_div
-pc80 -pad -ip -DINTEL_VML -traceback</div>
<div>current:LDFLAGS:$(FOPT)
-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
-pthread</div>
<div>current:DPARALLEL:'-DParallel'</div>
<div>current:R_LIBS:-lmkl_lapack
-lmkl_intel_lp64 -lmkl_intel_thread
-lmkl_core -openmp -lpthread -lguide</div>
<div>current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
-lmkl_scalapack_lp64
/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a
-Wl,--start-group -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core
-lmkl_blacs_intelmpi_lp64 -Wl,--end-group
-openmp -lpthread
-L/home/xiew/fftw-2.1.5/lib -lfftw_mpi
-lfftw $(R_LIBS)</div>
<div>current:MPIRUN:mpirun -np _NP_
-machinefile _HOSTS_ _EXEC_</div>
</div>
<div><br>
</div>
<div>Best regards,</div>
<div>Wei Xie</div>
<div>Computational Materials Group</div>
<div>University of Wisconsin-Madison</div>
<div><br>
</div>
<pre wrap=""><fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
Wien mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</body>
</html>