<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Maxim,<div><br></div><div>Thanks for the follow-up!</div><div><br></div><div>I think -machinefile is the appropriate option. Here is what the help says:</div><div>-machinefile &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # file mapping procs to machine</div><div><br></div><div>The help does not mention a -hostfile option for my current version of MPI.</div><div><br></div><div>Yes, the .machine0/1/2 files are exactly as you described.</div><div><br></div><div>The content of parallel_options is:&nbsp;</div><div><div><div>setenv USE_REMOTE 1</div><div>setenv MPI_REMOTE 1</div><div>setenv WIEN_GRANULARITY 1</div><div>setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"</div></div><div><br></div><div>I think the problem is probably due to my MPI. However, even if we disable MPI parallelization, the problem persists (there is no evident difference in the output files, including case.dayfile, stdout and :log). Note that we can run exactly the same set of input files in serial mode with no problem.&nbsp;</div><div><br></div><div>Again, thanks for your help!</div><div><br></div><div>Cheers,</div><div>Wei</div><div><br></div><div><br></div><div><div>On Oct 31, 2010, at 11:27 PM, Maxim Rakitin wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<div bgcolor="#ffffff" text="#000000">
    Dear Wei,<br>
    <br>
    Maybe -machinefile is fine for your mpirun. Which options does it
    accept? What does its help say?<br>
    <br>
    Try restoring your MPIRUN variable with -machinefile and rerun the
    calculation. Then check what is in the .machine0/1/2 files and let us
    know. It should contain 8 lines for the r1i0n0 node and 8 lines for the r1i0n1
    node.<br>
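    <br>
    The check on the .machine0/1/2 files described above can be sketched
    as a small POSIX shell helper. This is only an illustration: the
    function name count_hosts is hypothetical, and it assumes the file
    lists one hostname per line.<br>
    <pre>
```shell
# count_hosts (hypothetical helper): print "host count" for each distinct
# hostname in a machine file that lists one hostname per line.
count_hosts() {
  # NF skips blank lines; n[] tallies occurrences of the first field.
  awk 'NF { n[$1]++ } END { for (h in n) print h, n[h] }' "$1" | sort
}
```
</pre>
    Calling it as 'count_hosts .machine0' would then be expected to print
    one line per node, each with a count of 8, if the file matches the
    description above.<br>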
    <br>
    One more thing you should check is the $WIENROOT/parallel_options file.
    What are its contents?<br>
    <pre class="moz-signature" cols="72">Best regards,
   Maxim Rakitin
   email: <a class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
   web: <a class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
    <br>
    01.11.2010 9:06, Wei Xie wrote:
    <blockquote cite="mid:524CB9BF-DC7E-4688-B113-89C81F6272B1@wisc.edu" type="cite">Hi Maxim,
      <div><br>
      </div>
      <div>Thanks for your reply!&nbsp;</div>
      <div>We tried&nbsp;MPIRUN=mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_, but
        the problem persists. The only difference is that stdout changes
        to ''…&nbsp;MPI: invalid option -hostfile …''.</div>
      <div><br>
      </div>
      <div>Thanks,</div>
      <div>Wei</div>
      <div><br>
      </div>
      <div><br>
        <div>
          <div>On Oct 31, 2010, at 10:40 PM, Maxim Rakitin wrote:</div>
          <br class="Apple-interchange-newline">
          <blockquote type="cite">
            <div bgcolor="#ffffff" text="#000000"> Hi,<br>
              <br>
              It looks like Intel's mpirun doesn't have a '-machinefile'
              option. Instead, it has a '-hostfile' option (from
              here: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt">http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt</a>).<br>
              <br>
              Try 'mpirun -h' for information about the options and use
              the appropriate one.<br>
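              <br>
              One way to act on the 'mpirun -h' suggestion above is
              sketched below as a small POSIX shell helper. It is only an
              illustration: the function name pick_hostflag is
              hypothetical, it only inspects help text passed on standard
              input, and a flag listed in the help is not guaranteed to be
              accepted by the launcher at run time.<br>
              <pre>
```shell
# pick_hostflag (hypothetical helper): read mpirun help text on stdin and
# print the first host-list flag it mentions.
pick_hostflag() {
  help_text=$(cat)
  for flag in -machinefile -hostfile; do
    # -F: match as a fixed string; -e: the pattern begins with a dash.
    if printf '%s\n' "$help_text" | grep -q -F -e "$flag"; then
      printf '%s\n' "$flag"
      return 0
    fi
  done
  # Neither flag appears in the help text.
  return 1
}
```
</pre>
              The output of 'mpirun -h' could be piped into pick_hostflag,
              and the printed flag then used in the MPIRUN line in place of
              the one that was rejected.<br>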
              <pre class="moz-signature" cols="72">Best regards,
   Maxim Rakitin
   email: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
   web: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
              <br>
              01.11.2010 4:56, Wei Xie wrote:
              <blockquote cite="mid:2C0098E9-D05E-46B8-9BED-983152FB7772@wisc.edu" type="cite">
                <div>Dear all WIEN2k community members:</div>
                <div><br>
                </div>
                <div>We encountered a problem when running in
                  parallel (k-point,&nbsp;MPI&nbsp;or both): the calculations
                  crash at LAPW2. Note that we have no problem running in
                  serial. We have tried to diagnose the problem,
                  recompiled the code with different options, and tested
                  different cases and parameters based on similar
                  problems reported on the mailing list, but the problem
                  persists. So we are writing here in the hope that someone
                  can offer some suggestions.&nbsp;We have included the related
                  files below for your reference.&nbsp;Thank you in advance for
                  your replies!&nbsp;</div>
                <div><br>
                </div>
                <div>This is a TiC example running with both k-point and
                  MPI parallelization on two nodes,&nbsp;<i>r1i0n0</i> and&nbsp;<i>r1i0n1</i>&nbsp;(8 cores/node):</div>
                <div><br>
                </div>
                <div><b>1.&nbsp;</b><b>stdout&nbsp;</b><b>(abridged)&nbsp;</b></div>
                <div>MPI: invalid option -machinefile</div>
                <div>real<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.004s</div>
                <div>user<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</div>
                <div>sys<span class="Apple-tab-span" style="white-space:
                    pre;"> </span>0m0.000s</div>
                <div>...</div>
                <div>MPI: invalid option -machinefile</div>
                <div>real<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.003s</div>
                <div>user<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</div>
                <div>sys<span class="Apple-tab-span" style="white-space:
                    pre;"> </span>0m0.004s</div>
                <div>TiC.scf1up_1: No such file or directory.</div>
                <div><br>
                </div>
                <div>LAPW2 - Error. Check file lapw2.error</div>
                <div>cp: cannot stat `.in.tmp': No such file or
                  directory</div>
                <div>rm: cannot remove `.in.tmp': No such file or
                  directory</div>
                <div><b>rm: cannot remove
                      `.in.tmp1': No such file or directory</b></div>
                <div><b><br>
                  </b></div>
                <div><b>2. TiC.dayfile
                    (abridged)&nbsp;</b></div>
                <div>...</div>
                <div>&nbsp;&nbsp; &nbsp;start&nbsp;<span class="Apple-tab-span" style="white-space: pre;"> </span>(Sun Oct 31
                  16:25:06 MDT 2010) with lapw0 (40/99 to go)</div>
                <div>&nbsp;&nbsp; &nbsp;cycle 1&nbsp;<span class="Apple-tab-span" style="white-space: pre;"> </span>(Sun Oct 31
                  16:25:06 MDT 2010)&nbsp;<span class="Apple-tab-span" style="white-space: pre;"> </span>(40/99 to go)</div>
                <div><br>
                </div>
                <div>&gt; &nbsp; lapw0 -p<span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:06)
                  starting parallel lapw0 at Sun Oct 31 16:25:07 MDT
                  2010</div>
                <div>-------- .machine0 : 16 processors</div>
                <div>invalid "local" arg: -machinefile</div>
                <div><br>
                </div>
                <div>0.436u 0.412s 0:04.63 18.1%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k
                  2600+0io 1pf+0w</div>
                <div>&gt; &nbsp; lapw1 &nbsp;-up -p &nbsp;&nbsp;<span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:12)
                  starting parallel lapw1 at Sun Oct 31 16:25:12 MDT
                  2010</div>
                <div>-&gt; &nbsp;starting parallel LAPW1 jobs at Sun Oct 31
                  16:25:12 MDT 2010</div>
                <div>running LAPW1 in parallel mode (using .machines)</div>
                <div>2 number_of_parallel_jobs</div>
                <div>&nbsp;&nbsp; &nbsp; r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
                  r1i0n0 r1i0n0(1) &nbsp; &nbsp; &nbsp;r1i0n1 r1i0n1 r1i0n1 r1i0n1
                  r1i0n1 r1i0n1 r1i0n1 r1i0n1(1) &nbsp; &nbsp; &nbsp;r1i0n0 r1i0n0
                  r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1) &nbsp;
                  &nbsp;Summary of lapw1para:</div>
                <div>&nbsp;&nbsp; r1i0n0<span class="Apple-tab-span" style="white-space: pre;"> </span>&nbsp;k=0<span class="Apple-tab-span" style="white-space: pre;"> </span>&nbsp;user=0<span class="Apple-tab-span" style="white-space: pre;"> </span>&nbsp;wallclock=0</div>
                <div>&nbsp;&nbsp; r1i0n1<span class="Apple-tab-span" style="white-space: pre;"> </span>&nbsp;k=0<span class="Apple-tab-span" style="white-space: pre;"> </span>&nbsp;user=0<span class="Apple-tab-span" style="white-space: pre;"> </span>&nbsp;wallclock=0</div>
                <div>...</div>
                <div>0.116u 0.316s 0:10.48 4.0%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k
                  0+0io 0pf+0w</div>
                <div>&gt; &nbsp; lapw2 -up -p &nbsp;<span class="Apple-tab-span" style="white-space: pre;"> </span>(16:25:34)
                  running LAPW2 in parallel mode</div>
                <div>** &nbsp;LAPW2 crashed!</div>
                <div>0.032u 0.104s 0:01.13 11.5%<span class="Apple-tab-span" style="white-space: pre;"> </span>0+0k
                  82304+0io 8pf+0w</div>
                <div>error: command &nbsp; /home/xiew/WIEN2k_10/lapw2para -up
                  uplapw2.def &nbsp; failed</div>
                <div><br>
                </div>
                <div><b>3.&nbsp;uplapw2.error&nbsp;</b></div>
                <div>Error in LAPW2</div>
                <div>&nbsp;'LAPW2' - can't open unit: 18 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
                  &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;</div>
                <div>&nbsp;'LAPW2' - &nbsp; &nbsp; &nbsp; &nbsp;filename: TiC.vspup &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
                  &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp;</div>
                <div>&nbsp;'LAPW2' - &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;status: old &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;form:
                  formatted &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;</div>
                <div>** &nbsp;testerror: Error in Parallel LAPW2</div>
                <div><br>
                </div>
                <div>
                  <div>
                    <div><b>4. .machines</b></div>
                    <div>#</div>
                    <div>1:r1i0n0:8</div>
                    <div>1:r1i0n1:8</div>
                    <div>lapw0:r1i0n0:8 r1i0n1:8&nbsp;</div>
                    <div>granularity:1</div>
                    <div>extrafine:1</div>
                  </div>
                </div>
                <div><br>
                </div>
                <div>
                  <div><b>5. compilers, MPI and options</b></div>
                  <div>Intel Compilers &nbsp;and MKL 11.1.046</div>
                  <div>Intel MPI&nbsp;3.2.0.011</div>
                  <div><br>
                  </div>
                  <div>current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip
                    -DINTEL_VML -traceback</div>
                  <div>current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad
                    -ip -DINTEL_VML -traceback</div>
                  <div>current:LDFLAGS:$(FOPT)
                    -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
                    -pthread</div>
                  <div>current:DPARALLEL:'-DParallel'</div>
                  <div>current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64
                    -lmkl_intel_thread -lmkl_core -openmp -lpthread
                    -lguide</div>
                  <div>current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t

                    -lmkl_scalapack_lp64
                    /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a
                    -Wl,--start-group -lmkl_intel_lp64
                    -lmkl_intel_thread -lmkl_core
                    -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp
                    -lpthread -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi
                    -lfftw $(R_LIBS)</div>
                  <div>current:MPIRUN:mpirun -np _NP_ -machinefile
                    _HOSTS_ _EXEC_</div>
                </div>
                <div><br>
                </div>
                <div>Best regards,</div>
                <div>Wei Xie</div>
                <div>Computational Materials Group</div>
                <div>University of Wisconsin-Madison</div>
                <div><br>
                </div>
                <pre wrap=""><fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
Wien mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
</pre>
              </blockquote>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
  </div>

</blockquote></div><br></div></body></html>