<div dir="ltr"><p>Thanks for Prof. Marks' comment.</p><p>1. In the previous email, I have missed to copy the line</p><p>setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"</p><div>It was in the parallel_option. Sorry about that.</div><p>2. I have checked that the running program was lapw1c_mpi. Besides, when the mpi calculation was done on a single node for some other system, the results are consistent with the literature. So I believe that the mpi code has been setup and compiled properly. <br></p><p>Would there be something wrong with my option in siteconfig..? Do I have to set some command to bind the job? Any other possible cause of the error?</p><p>Any suggestions or comments would be appreciated. Thanks.</p><p><br></p><p>Regards,</p><p>Fermin</p><p>----------------------------------------------------------------------------------------------------<br></p><p><span lang="EN-US">You appear to be missing the line</span></p>
<p><span lang="EN-US">setenv WIEN_MPIRUN=...</span></p>
<p><span lang="EN-US">This is setup when you run siteconfig, and provides the
information on how mpi is run on your system.</span></p>
<p><span lang="EN-US">N.B., did you setup and compile the mpi code?</span></p>
<p><span lang="EN-US">___________________________<br>
Professor Laurence Marks<br>
Department of Materials Science and Engineering<br>
Northwestern University<br>
<a href="http://www.numis.northwestern.edu">www.numis.northwestern.edu</a><br>
<a href="http://MURI4D.numis.northwestern.edu">MURI4D.numis.northwestern.edu</a><br>
Co-Editor, Acta Cryst A<br>
"Research is to see what everybody else has seen, and to think what nobody
else has thought"<br>
Albert Szent-Gyorgi</span></p>
<p class="MsoNormal"><span lang="EN-US">On Apr 28, 2015 4:22 AM, "lung
Fermin" <<a href="mailto:ferminlung@gmail.com">ferminlung@gmail.com</a>>
wrote:</span></p>
<p class="MsoNormal"><span lang="EN-US">Dear Wien2k community,</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">I am trying to perform calculation on a
system of ~100 in-equivalent atoms using mpi+k point parallelization on a
cluster. Everything goes fine when the program was run on a single node.
However, if I perform the calculation across different nodes, the follow error
occurs. How to solve this problem? I am a newbie to mpi programming, any help
would be appreciated. Thanks.</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">The error message (MVAPICH2 2.0a):</span></p>
<p class="MsoNormal"><span lang="EN-US">---------------------------------------------------------------------------------------------------</span></p>
<p class="MsoNormal"><span lang="EN-US">Warning: no access to tty (Bad file
descriptor).</span></p>
<p class="MsoNormal"><span lang="EN-US">Thus no job control in this shell.</span></p>
<p class="MsoNormal"><span lang="EN-US">z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
z1-13 z1-13 z1-13 z1-13 z1</span></p>
<p class="MsoNormal"><span lang="EN-US">-13 z1-13 z1-13 z1-13 z1-13 z1-13</span></p>
<p class="MsoNormal"><span lang="EN-US">number of processors: 32</span></p>
<p class="MsoNormal"><span lang="EN-US"> LAPW0 END</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-2:mpirun_rsh][process_mpispawn_connection]
mpispawn_0 from node z1-13 aborted: Error while reading a PMI socket (4)</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-13:mpispawn_0][child_handler] MPI
process (rank: 11, pid: 8546) terminated with signal 9 -> abort job</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-13:mpispawn_0][readline] Unexpected
End-Of-File on file descriptor 8. MPI process died?</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-13:mpispawn_0][mtpmi_processops] Error
while reading PMI socket. MPI process died?</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-2:mpispawn_0][readline] Unexpected
End-Of-File on file descriptor 12. MPI process died?</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-2:mpispawn_0][mtpmi_processops] Error
while reading PMI socket. MPI process died?</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-2:mpispawn_0][child_handler] MPI
process (rank: 0, pid: 35454) terminated with signal 9 -> abort job</span></p>
<p class="MsoNormal"><span lang="EN-US">[z1-2:mpirun_rsh][process_mpispawn_connection]
mpispawn_0 from node z1-2 aborted: MPI process error (1)</span></p>
<p class="MsoNormal"><span lang="EN-US">[cli_15]: aborting job:</span></p>
<p class="MsoNormal"><span lang="EN-US">application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 15</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">> stop error</span></p>
<p class="MsoNormal"><span lang="EN-US">------------------------------------------------------------------------------------------------------</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">The .machines file:</span></p>
<p class="MsoNormal"><span lang="EN-US">#</span></p>
<p class="MsoNormal"><span lang="EN-US">1:z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2</span></p>
<p class="MsoNormal"><span lang="EN-US">1:z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13</span></p>
<p class="MsoNormal"><span lang="EN-US">granularity:1</span></p>
<p class="MsoNormal"><span lang="EN-US">extrafine:1</span></p>
<p class="MsoNormal"><span lang="EN-US">--------------------------------------------------------------------------------------------------------</span></p>
<p class="MsoNormal"><span lang="EN-US">The parallel_options:</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">setenv TASKSET "no"</span></p>
<p class="MsoNormal"><span lang="EN-US">setenv USE_REMOTE 0</span></p>
<p class="MsoNormal"><span lang="EN-US">setenv MPI_REMOTE 1</span></p>
<p class="MsoNormal"><span lang="EN-US">setenv WIEN_GRANULARITY 1</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">--------------------------------------------------------------------------------------------------------</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks.</span></p>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
<p class="MsoNormal"><span lang="EN-US">Regards,</span></p>
<p class="MsoNormal"><span lang="EN-US">Fermin</span></p></div>