[Wien] Error in mpi+k point parallelization across multiple nodes

Laurence Marks L-marks at northwestern.edu
Tue Apr 28 13:37:03 CEST 2015


You appear to be missing the line

setenv WIEN_MPIRUN "..."

This is set up when you run siteconfig, and it provides the information on
how mpi is run on your system.
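For reference, on a typical installation the corresponding entry in
$WIENROOT/parallel_options looks something like the line below (only a
sketch; the exact command depends on your MPI flavor, and _NP_, _HOSTS_
and _EXEC_ are placeholders that the WIEN2k scripts fill in at run time):

setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

With MVAPICH2 and mpirun_rsh the launcher and its options will differ,
e.g. something along the lines of "mpirun_rsh -np _NP_ -hostfile _HOSTS_ _EXEC_".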

N.B., did you set up and compile the mpi code?
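(A quick check, assuming a standard installation, is whether the MPI
binaries were actually built, e.g.:

ls $WIENROOT/lapw0_mpi $WIENROOT/lapw1_mpi $WIENROOT/lapw2_mpi

If they are missing, rerun siteconfig and compile the MPI versions.)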

___________________________
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
On Apr 28, 2015 4:22 AM, "lung Fermin" <ferminlung at gmail.com> wrote:

>  Dear Wien2k community,
>
>  I am trying to perform a calculation on a system of ~100 inequivalent
> atoms using mpi + k-point parallelization on a cluster. Everything goes fine
> when the program is run on a single node. However, if I perform the
> calculation across different nodes, the following error occurs. How can I
> solve this problem? I am a newbie to mpi programming, and any help would be
> appreciated. Thanks.
>
>  The error message (MVAPICH2 2.0a):
>
> ---------------------------------------------------------------------------------------------------
>  Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.
> z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
> z1-2 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
> z1-13 z1-13 z1-13 z1-13 z1-13
> number of processors: 32
>  LAPW0 END
> [z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-13
> aborted: Error while reading a PMI socket (4)
> [z1-13:mpispawn_0][child_handler] MPI process (rank: 11, pid: 8546)
> terminated with signal 9 -> abort job
> [z1-13:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 8.
> MPI process died?
> [z1-13:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
> process died?
> [z1-2:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 12.
> MPI process died?
> [z1-2:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
> process died?
> [z1-2:mpispawn_0][child_handler] MPI process (rank: 0, pid: 35454)
> terminated with signal 9 -> abort job
> [z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-2
> aborted: MPI process error (1)
> [cli_15]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 15
>
>  >   stop error
>
> ------------------------------------------------------------------------------------------------------
>
>  The .machines file:
>  #
> 1:z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
> z1-2 z1-2
> 1:z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
> z1-13 z1-13 z1-13 z1-13
> granularity:1
> extrafine:1
>
> --------------------------------------------------------------------------------------------------------
> The parallel_options:
>
>  setenv TASKSET "no"
> setenv USE_REMOTE 0
> setenv MPI_REMOTE 1
> setenv WIEN_GRANULARITY 1
>
>
> --------------------------------------------------------------------------------------------------------
>
>  Thanks.
>
>  Regards,
> Fermin
>

