<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><br></div><div>Hi,</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>I have Wien2K running on a cluster of Linux boxes, each with 32 cores and connected by 10 Gb Ethernet. I compiled Wien2K with the 3.174 version of the Intel compiler (I learned the hard way that bugs in the newer versions of the Intel compiler lead to crashes in Wien2K). I have also installed Intel's MPI. First, single-process Wien2K, let's say for the TiC case, works fine. It also works fine when I use a .machines file like</div><div><br></div><div>granularity:1</div><div>localhost:1</div><div>localhost:1</div><div>… (24 times).</div><div><br></div><div>This file leads to parallel execution without error. I can vary the number of processes by changing the number of localhost:1 lines in the file, and everything still works fine. When I use MPI with a single process, it works as well:</div><div><br></div><div>1:localhost:1 </div><div><br></div><div></div><blockquote type="cite"><div><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">starting parallel lapw1 at Mon Jan 23 06:49:16 JST 2012</span></div><pre>-> starting parallel LAPW1 jobs at Mon Jan 23 06:49:16 JST 2012
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
[1] 22417
LAPW1 END
[1] + Done ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
localhost(111) 179.004u 4.635s 0:32.73 561.0%        0+0k 0+26392io 0pf+0w
Summary of lapw1para:
localhost         k=111         user=179.004         wallclock=32.73
179.167u 4.791s 0:35.61 516.5%        0+0k 0+26624io 0pf+0w
</pre></blockquote><div><br></div><div>Changing the .machines file to use more than one MPI process (the same kind of error occurs for more than 2)</div><div><br></div><div>1:localhost:2</div><div><br></div><div>leads to a runtime error in the MPI subsystem.</div><div><br></div><div><pre></pre><blockquote type="cite"><pre>starting parallel lapw1 at Mon Jan 23 06:51:04 JST 2012
-> starting parallel LAPW1 jobs at Mon Jan 23 06:51:04 JST 2012
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
[1] 22673
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(123): MPI_Comm_size(comm=0x5b, size=0x7ed20c) failed
MPI_Comm_size(76).: Invalid communicator
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(123): MPI_Comm_size(comm=0x5b, size=0x7ed20c) failed
MPI_Comm_size(76).: Invalid communicator
[1] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
localhost localhost(111) APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
0.037u 0.036s 0:00.06 100.0%        0+0k 0+0io 0pf+0w
TiC.scf1_1: No such file or directory.
Summary of lapw1para:
localhost         k=0         user=111         wallclock=0
0.105u 0.168s 0:03.21 8.0%        0+0k 0+216io 0pf+0w
</pre></blockquote><div><br></div></div><div>I have properly sourced the appropriate runtime environment for the Intel tools. For example, compiling (with mpiifort) and running Intel's f90 MPI test program produces:</div><div><div><br></div></div><blockquote type="cite"><div><div>mpirun -np 32 /home/paulfons/mpitest/testf90</div><div> Hello world: rank 0 of 32 running on </div><div> asccmp177 </div><div> </div><div> Hello world: rank 1 of 32 running on (32 times)</div></div></blockquote><div><br></div><div>Does anyone have any suggestions as to what to try next? I am not sure how to debug things from here. I have about 512 nodes available for larger calculations that can only be accessed via MPI (the ssh setup works fine as well, by the way). It would be great to figure out what is wrong.</div><div><br></div><div>Thanks.</div><div><br></div><br><div apple-content-edited="true">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">Dr. Paul Fons</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; ">Functional Nano-phase-change Research Team</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; ">Team Leader</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; ">Nanodevice Innovation Research Center (NIRC)</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">National Institute for Advanced Industrial Science & Technology</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">METI</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">AIST Central 4, Higashi 1-1-1</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">Tsukuba, Ibaraki JAPAN 305-8568</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 12px/normal Helvetica; min-height: 14px; 
font-size: 12px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">tel. +81-298-61-5636</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">fax. +81-298-61-2939</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 12px/normal Helvetica; min-height: 14px; font-size: 12px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; ">email: </font><font face="Helvetica" size="3" color="#1919ff" style="font: normal normal normal 12px/normal Helvetica; color: rgb(25, 25, 255); "><u style="color: rgb(25, 25, 255); -webkit-text-decorations-in-effect: underline; "><a href="mailto:paul-fons@aist.go.jp">paul-fons@aist.go.jp</a></u></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 12px/normal Helvetica; min-height: 14px; font-size: 12px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font class="Apple-style-span" face="Hiragino Kaku Gothic Pro">The following lines are in a Japanese font</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font class="Apple-style-span" face="Hiragino Kaku Gothic Pro"><br class="khtml-block-placeholder"></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font class="Apple-style-span" face="Hiragino Kaku Gothic Pro"><span class="Apple-style-span" 
style="font-family: 'Hiragino Kaku Gothic Pro'; ">〒305-8562 茨城県つくば市つくば中央東 1-1-1</span></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font class="Apple-style-span" face="Hiragino Kaku Gothic Pro"><span class="Apple-style-span" style="font-family: 'Hiragino Kaku Gothic Pro'; ">産業技術総合研究所</span></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font class="Apple-style-span" face="Hiragino Kaku Gothic Pro"><span class="Apple-style-span" style="font-family: 'Hiragino Kaku Gothic Pro'; ">ナノ電子デバイス研究センター</span></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><font class="Apple-style-span" face="'Hiragino Kaku Gothic Pro'">相変化新規機能デバイス研究チーム チームリーダー</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><span class="Apple-style-span" style="font-family: 'Hiragino Kaku Gothic Pro'; ">ポール・フォンス</span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 12px; "><br class="khtml-block-placeholder"></div></div></div><br class="Apple-interchange-newline"><br class="Apple-interchange-newline">
</div>
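<div>For reference, the two .machines layouts described above could be generated with a short script along these lines. This is only a sketch: the function names and the idea of scripting the file are assumptions for illustration, and the line formats are copied from the examples in this message, not taken from the Wien2K documentation.</div>
<pre>
```python
def kpoint_machines(n_jobs, host="localhost"):
    """Return .machines text for the layout that runs without error.

    Hypothetical helper: one 'host:1' line per parallel job, preceded by
    the granularity line, exactly as in the example quoted in this message.
    """
    lines = ["granularity:1"]
    lines += [f"{host}:1" for _ in range(n_jobs)]
    return "\n".join(lines) + "\n"


def mpi_machines(n_procs, host="localhost"):
    """Return the single-line MPI layout from this message (1:host:N).

    Hypothetical helper: n_procs=1 is the case that works,
    n_procs=2 is the case that fails in MPI_Comm_size.
    """
    return f"1:{host}:{n_procs}\n"


if __name__ == "__main__":
    # Print both layouts so they can be compared or redirected to a file.
    print(kpoint_machines(24), end="")
    print(mpi_machines(2), end="")
```
</pre>
<div>Writing the returned text to .machines in the case directory would reproduce either configuration; only the second one (with more than one MPI process) triggers the error shown above.</div>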
<br></body></html>