<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#ffffff" text="#000000">
    Hi Wei,<br>
    <br>
    The parallel_options file controls how parallel programs are
    launched, so change the following line in it:<br>
    <blockquote><tt>setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile
        _HOSTS_ _EXEC_"</tt><br>
    </blockquote>
    to<br>
    <blockquote><tt>setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile
        _HOSTS_ _EXEC_"</tt><br>
    </blockquote>
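If you prefer to script that edit, here is a minimal sketch, demonstrated on a throwaway copy in /tmp; on a real system you would point sed at $WIENROOT/parallel_options instead (path assumed):<br>

```shell
# Throwaway demo copy; on a real system the file would be
# $WIENROOT/parallel_options (assumed path, adjust as needed).
printf 'setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"\n' \
  > /tmp/parallel_options

# Replace the unsupported -hostfile flag with -machinefile in place.
sed -i 's/-hostfile/-machinefile/' /tmp/parallel_options
cat /tmp/parallel_options
# -> setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
```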
    Your .machine0/1/2 files are correct.<br>
    <br>
    Also, I believe the 'USE_REMOTE' variable, when set to 1, causes the
    parallel scripts (I mean lapw[012]para_lapw) to be launched via
    ssh/rsh, so switch it to '0'. I'm not sure about the 'MPI_REMOTE'
    option, it's a new one; try setting it to 0 or 1 and see which works.<br>
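With those changes, the whole file would look something like this (a sketch; MPI_REMOTE is shown as 0 here, but try 1 as well if 0 doesn't work):<br>

```csh
setenv USE_REMOTE 0
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
```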
    <br>
    Hope this will help.<br>
    <pre class="moz-signature" cols="72">Best regards,
   Maxim Rakitin
   email: <a class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
   web: <a class="moz-txt-link-freetext" href="http://www.susu.ac.ru">http://www.susu.ac.ru</a></pre>
    <br>
    01.11.2010 21:35, Wei Xie wrote:
    <blockquote cite="mid:D9E3DEA1-E905-4FA6-B1A9-834B76512C16@wisc.edu"
      type="cite">Hi Maxim,
      <div><br>
      </div>
      <div>Thanks for the follow-up!</div>
      <div><br>
      </div>
      <div>I think -machinefile is the appropriate option. Here's the
        relevant line from the help:</div>
      <div>-machinefile                 # file mapping procs to machine</div>
      <div><br>
      </div>
      <div>No -hostfile option is mentioned in the help for my current
        version of MPI.</div>
      <div><br>
      </div>
      <div>Yes, the machine0/1/2 files are exactly like what you
        described.</div>
      <div><br>
      </div>
        <div>The parallel_options file is:</div>
      <div>
        <div>
          <div>setenv USE_REMOTE 1</div>
          <div>setenv MPI_REMOTE 1</div>
          <div>setenv WIEN_GRANULARITY 1</div>
          <div>setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile _HOSTS_
            _EXEC_"</div>
        </div>
        <div><br>
        </div>
        <div>I suspect the problem is due to my MPI. However, even if I
          disable MPI parallelization, the problem persists (no evident
          difference in the output files, including case.dayfile,
          stdout and :log). Note that we can run the exact same set of
          input files in serial mode with no problem.</div>
        <div><br>
        </div>
        <div>Again, thanks for your help!</div>
        <div><br>
        </div>
        <div>Cheers,</div>
        <div>Wei</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>
          <div>On Oct 31, 2010, at 11:27 PM, Maxim Rakitin wrote:</div>
          <br class="Apple-interchange-newline">
          <blockquote type="cite">
            <div bgcolor="#ffffff" text="#000000"> Dear Wei,<br>
              <br>
              Maybe -machinefile is OK for your mpirun. Which options
              are appropriate for it? What does the help say?<br>
              <br>
              Try restoring your MPIRUN variable with -machinefile and
              rerun the calculation. Then look at the .machine0/1/2
              files and let us know what they contain. Each should
              contain 8 lines with the r1i0n0 node and 8 lines with the
              r1i0n1 node.<br>
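That is, each .machine* file should look roughly like this sketch (16 lines in total, one hostname per line):<br>

```text
r1i0n0
r1i0n0
(… 8 lines of r1i0n0 in total …)
r1i0n1
r1i0n1
(… 8 lines of r1i0n1 in total …)
```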
              <br>
              One more thing you should check is the
              $WIENROOT/parallel_options file. What are its contents?<br>
              <pre class="moz-signature" cols="72">Best regards,
   Maxim Rakitin
   email: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
   web: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
              <br>
              01.11.2010 9:06, Wei Xie wrote:
              <blockquote
                cite="mid:524CB9BF-DC7E-4688-B113-89C81F6272B1@wisc.edu"
                type="cite">Hi Maxim,
                <div><br>
                </div>
                <div>Thanks for your reply! </div>
                <div>We tried MPIRUN="mpirun -np _NP_ -hostfile _HOSTS_
                  _EXEC_", but the problem persists. The only
                  difference is that stdout now shows "… MPI: invalid
                  option -hostfile …".</div>
                <div><br>
                </div>
                <div>Thanks,</div>
                <div>Wei</div>
                <div><br>
                </div>
                <div><br>
                  <div>
                    <div>On Oct 31, 2010, at 10:40 PM, Maxim Rakitin
                      wrote:</div>
                    <br class="Apple-interchange-newline">
                    <blockquote type="cite">
                      <div bgcolor="#ffffff" text="#000000"> Hi,<br>
                        <br>
                        It looks like Intel's mpirun doesn't have a
                        '-machinefile' option. Instead it has a
                        '-hostfile' option (from here: <a
                          moz-do-not-send="true"
                          class="moz-txt-link-freetext"
                          href="http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt">http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt</a>).<br>
                        <br>
                        Try 'mpirun -h' for information about the
                        available options and apply the appropriate one.<br>
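To see quickly which spelling your mpirun accepts, a grep over the help output may be enough (a sketch; the exact help text varies between MPI implementations):<br>

```shell
# List any host/machine-file options this mpirun advertises in its help;
# fall back to a short note when mpirun is not on PATH.
if command -v mpirun >/dev/null 2>&1; then
  mpirun -h 2>&1 | grep -iE 'machinefile|hostfile' || echo "neither option listed"
else
  echo "mpirun not found"
fi
```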
                        <pre class="moz-signature" cols="72">Best regards,
   Maxim Rakitin
   email: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:rms85@physics.susu.ac.ru">rms85@physics.susu.ac.ru</a>
   web: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.susu.ac.ru/">http://www.susu.ac.ru</a></pre>
                        <br>
                        01.11.2010 4:56, Wei Xie wrote:
                        <blockquote
                          cite="mid:2C0098E9-D05E-46B8-9BED-983152FB7772@wisc.edu"
                          type="cite">
                          <div>Dear all WIEN2k community members:</div>
                          <div><br>
                          </div>
                          <div>We encountered a problem when running in
                            parallel (k-point, MPI or both): the
                            calculations crash at LAPW2. Note that we
                            had no problem running in serial. We have
                            tried to diagnose the problem, recompile
                            the code with different options, and test
                            with different cases and parameters based
                            on similar problems reported on the mailing
                            list, but the problem persists. So we write
                            here hoping someone can offer us some
                            suggestions. We have attached the related
                            files below for your reference. Your
                            replies are appreciated in advance!</div>
                          <div><br>
                          </div>
                          <div>This is a TiC example running in both
                            k-point and MPI parallel mode on two nodes <i>r1i0n0</i>
                            and <i>r1i0n1</i> (8 cores/node):</div>
                          <div><br>
                          </div>
                          <div><b>1. </b><b>stdout </b><b>(abridged) </b></div>
                          <div>MPI: invalid option -machinefile</div>
                          <div>real<span class="Apple-tab-span"
                              style="white-space: pre;"> </span>0m0.004s</div>
                          <div>user<span class="Apple-tab-span"
                              style="white-space: pre;"> </span>0m0.000s</div>
                          <div>sys<span class="Apple-tab-span"
                              style="white-space: pre;"> </span>0m0.000s</div>
                          <div>...</div>
                          <div>MPI: invalid option -machinefile</div>
                          <div>real<span class="Apple-tab-span"
                              style="white-space: pre;"> </span>0m0.003s</div>
                          <div>user<span class="Apple-tab-span"
                              style="white-space: pre;"> </span>0m0.000s</div>
                          <div>sys<span class="Apple-tab-span"
                              style="white-space: pre;"> </span>0m0.004s</div>
                          <div>TiC.scf1up_1: No such file or directory.</div>
                          <div><br>
                          </div>
                          <div>LAPW2 - Error. Check file lapw2.error</div>
                          <div>cp: cannot stat `.in.tmp': No such file
                            or directory</div>
                          <div>rm: cannot remove `.in.tmp': No such file
                            or directory</div>
                          <div><b>rm: cannot remove `.in.tmp1': No such
                              file or directory</b></div>
                          <div><b><br>
                            </b></div>
                          <div><b>2. TiC.dayfile (abridged) </b></div>
                          <div>...</div>
                          <div>    start <span class="Apple-tab-span"
                              style="white-space: pre;"> </span>(Sun
                            Oct 31 16:25:06 MDT 2010) with lapw0 (40/99
                            to go)</div>
                          <div>    cycle 1 <span class="Apple-tab-span"
                              style="white-space: pre;"> </span>(Sun
                            Oct 31 16:25:06 MDT 2010) <span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span>(40/99 to go)</div>
                          <div><br>
                          </div>
                          <div>&gt;   lapw0 -p<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span>(16:25:06) starting
                            parallel lapw0 at Sun Oct 31 16:25:07 MDT
                            2010</div>
                          <div>-------- .machine0 : 16 processors</div>
                          <div>invalid "local" arg: -machinefile</div>
                          <div><br>
                          </div>
                          <div>0.436u 0.412s 0:04.63 18.1%<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span>0+0k 2600+0io 1pf+0w</div>
                          <div>&gt;   lapw1  -up -p   <span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span>(16:25:12) starting
                            parallel lapw1 at Sun Oct 31 16:25:12 MDT
                            2010</div>
                          <div>-&gt;  starting parallel LAPW1 jobs at
                            Sun Oct 31 16:25:12 MDT 2010</div>
                          <div>running LAPW1 in parallel mode (using
                            .machines)</div>
                          <div>2 number_of_parallel_jobs</div>
                          <div>     r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
                            r1i0n0 r1i0n0 r1i0n0(1)      r1i0n1 r1i0n1
                            r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)
                                 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0
                            r1i0n0 r1i0n0 r1i0n0(1)    Summary of
                            lapw1para:</div>
                          <div>   r1i0n0<span class="Apple-tab-span"
                              style="white-space: pre;"> </span> k=0<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span> user=0<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span> wallclock=0</div>
                          <div>   r1i0n1<span class="Apple-tab-span"
                              style="white-space: pre;"> </span> k=0<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span> user=0<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span> wallclock=0</div>
                          <div>...</div>
                          <div>0.116u 0.316s 0:10.48 4.0%<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span>0+0k 0+0io 0pf+0w</div>
                          <div>&gt;   lapw2 -up -p  <span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span>(16:25:34) running LAPW2 in
                            parallel mode</div>
                          <div>**  LAPW2 crashed!</div>
                          <div>0.032u 0.104s 0:01.13 11.5%<span
                              class="Apple-tab-span" style="white-space:
                              pre;"> </span>0+0k 82304+0io 8pf+0w</div>
                          <div>error: command  
                            /home/xiew/WIEN2k_10/lapw2para -up
                            uplapw2.def   failed</div>
                          <div><br>
                          </div>
                          <div><b>3. uplapw2.error </b></div>
                          <div>Error in LAPW2</div>
                          <div> 'LAPW2' - can't open unit: 18</div>
                          <div> 'LAPW2' -        filename: TiC.vspup</div>
                          <div> 'LAPW2' -          status: old    form: formatted</div>
                          <div>**  testerror: Error in Parallel LAPW2</div>
                          <div><br>
                          </div>
                          <div>
                            <div>
                              <div><b>4. .machines</b></div>
                              <div>#</div>
                              <div>1:r1i0n0:8</div>
                              <div>1:r1i0n1:8</div>
                              <div>lapw0:r1i0n0:8 r1i0n1:8 </div>
                              <div>granularity:1</div>
                              <div>extrafine:1</div>
                            </div>
                          </div>
                          <div><br>
                          </div>
                          <div>
                            <div><b>5. compilers, MPI and options</b></div>
                            <div>Intel Compilers and MKL 11.1.046</div>
                            <div>Intel MPI 3.2.0.011</div>
                            <div><br>
                            </div>
                            <div>current:FOPT:-FR -mp1 -w -prec_div
                              -pc80 -pad -ip -DINTEL_VML -traceback</div>
                            <div>current:FPOPT:-FR -mp1 -w -prec_div
                              -pc80 -pad -ip -DINTEL_VML -traceback</div>
                            <div>current:LDFLAGS:$(FOPT)
                              -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
                              -pthread</div>
                            <div>current:DPARALLEL:'-DParallel'</div>
                            <div>current:R_LIBS:-lmkl_lapack
                              -lmkl_intel_lp64 -lmkl_intel_thread
                              -lmkl_core -openmp -lpthread -lguide</div>
                            <div>current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t
                              -lmkl_scalapack_lp64
                              /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a
                              -Wl,--start-group -lmkl_intel_lp64
                              -lmkl_intel_thread -lmkl_core
                              -lmkl_blacs_intelmpi_lp64 -Wl,--end-group
                              -openmp -lpthread
                              -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi
                              -lfftw $(R_LIBS)</div>
                            <div>current:MPIRUN:mpirun -np _NP_
                              -machinefile _HOSTS_ _EXEC_</div>
                          </div>
                          <div><br>
                          </div>
                          <div>Best regards,</div>
                          <div>Wei Xie</div>
                          <div>Computational Materials Group</div>
                          <div>University of Wisconsin-Madison</div>
                          <div><br>
                          </div>
                          <pre wrap=""><fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
Wien mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
</pre>
                        </blockquote>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
                <pre wrap=""><fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
Wien mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
</pre>
              </blockquote>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <pre wrap="">
<fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
Wien mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a>
<a class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
</pre>
    </blockquote>
  </body>
</html>