[Wien] Error in mpi+k point parallelization across multiple nodes
lung Fermin
ferminlung at gmail.com
Wed Apr 29 05:17:31 CEST 2015
Thanks for Prof. Marks' comment.
1. In the previous email I forgot to copy the line
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"
It is in my parallel_options file; sorry about that. (A sketch of the complete file is given after this list.)
2. I have checked that the running program was lapw1c_mpi. Moreover, when the
mpi calculation was run on a single node for another system, the results were
consistent with the literature, so I believe the mpi code has been set up and
compiled properly.
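For completeness, here is a sketch of how the relevant part of my parallel_options then reads, with the WIEN_MPIRUN line included (the mpirun path is specific to our cluster; as far as I understand, _NP_, _HOSTS_ and _EXEC_ are the placeholders that the WIEN2k parallel scripts substitute at run time):
----------------------------------------------------------------------------------------------------
setenv TASKSET "no"
setenv USE_REMOTE 0
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"
----------------------------------------------------------------------------------------------------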
Could there be something wrong with my options in siteconfig? Do I have to set
some command to bind the job? Is there any other possible cause of the error?
(A quick cross-node test I can try outside WIEN2k is sketched below.)
Any suggestions or comments would be appreciated. Thanks.
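The test would launch a plain command across both nodes with the same mpirun and the same flags as in WIEN_MPIRUN (path and hostnames as above; hosts.test and the hostname command are only an example):
----------------------------------------------------------------------------------------------------
# throwaway hostfile, one host per MPI process
cat > hosts.test << EOF
z1-2
z1-2
z1-13
z1-13
EOF
/usr/local/mvapich2-icc/bin/mpirun -np 4 -hostfile hosts.test hostname
----------------------------------------------------------------------------------------------------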
Regards,
Fermin
----------------------------------------------------------------------------------------------------
You appear to be missing the line
setenv WIEN_MPIRUN=...
This is set up when you run siteconfig, and it provides the information on how
mpi is run on your system.
N.B., did you set up and compile the mpi code?
___________________________
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
On Apr 28, 2015 4:22 AM, "lung Fermin" <ferminlung at gmail.com> wrote:
Dear Wien2k community,
I am trying to perform a calculation on a system of ~100 inequivalent atoms
using mpi + k-point parallelization on a cluster. Everything goes fine when
the program is run on a single node. However, when I perform the calculation
across different nodes, the following error occurs. How can I solve this problem?
I am a newbie to mpi programming, so any help would be appreciated. Thanks.
The error message (MVAPICH2 2.0a):
---------------------------------------------------------------------------------------------------
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
z1-2 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
number of processors: 32
LAPW0 END
[z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-13
aborted: Error while reading a PMI socket (4)
[z1-13:mpispawn_0][child_handler] MPI process (rank: 11, pid: 8546)
terminated with signal 9 -> abort job
[z1-13:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 8.
MPI process died?
[z1-13:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
process died?
[z1-2:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 12.
MPI process died?
[z1-2:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
process died?
[z1-2:mpispawn_0][child_handler] MPI process (rank: 0, pid: 35454)
terminated with signal 9 -> abort job
[z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-2
aborted: MPI process error (1)
[cli_15]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 15
> stop error
------------------------------------------------------------------------------------------------------
The .machines file:
#
1:z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
1:z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
granularity:1
extrafine:1
--------------------------------------------------------------------------------------------------------
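(What I intend with this file, if I understand the WIEN2k conventions correctly, is two k-parallel jobs of 16 MPI processes each, one job per node, i.e. 32 processors in total. Schematically, not as a literal file:)
----------------------------------------------------------------------------------------------------
# 1:<host list>  -> one k-point job, run as an MPI job on the listed slots
1:z1-2  x16      -> k-point job 1: 16-process lapw1c_mpi on node z1-2
1:z1-13 x16      -> k-point job 2: 16-process lapw1c_mpi on node z1-13
granularity:1
extrafine:1
----------------------------------------------------------------------------------------------------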
The parallel_options:
setenv TASKSET "no"
setenv USE_REMOTE 0
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
--------------------------------------------------------------------------------------------------------
Thanks.
Regards,
Fermin