[Wien] Error in mpi+k point parallelization across multiple nodes

lung Fermin ferminlung at gmail.com
Wed Apr 29 05:17:31 CEST 2015


Thanks for Prof. Marks' comment.

1. In the previous email, I forgot to copy the line

setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"

It is in the parallel_options file. Sorry about that.
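
For reference, with that line included, the complete parallel_options on my system reads:

setenv TASKSET "no"
setenv USE_REMOTE 0
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"

As far as I understand, _NP_, _HOSTS_ and _EXEC_ are substituted by the WIEN2k scripts at run time, so for one 16-core machine line the command actually launched should look roughly like the following (the .machine1 hostfile and the lapw1_1.def argument are my guesses at what the scripts generate):

/usr/local/mvapich2-icc/bin/mpirun -np 16 -hostfile .machine1 lapw1c_mpi lapw1_1.def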

2. I have checked that the running program was lapw1c_mpi. Moreover, when the mpi calculation was run on a single node for another system, the results were consistent with the literature. So I believe that the mpi code has been set up and compiled properly.

Could there be something wrong with my options in siteconfig? Do I have to set some command to bind the jobs? Is there any other possible cause of the error?
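
(On the binding question, one thing I am considering trying, as a guess rather than a known fix, is MVAPICH2's CPU affinity, which is enabled by default and can be switched off with

setenv MV2_ENABLE_AFFINITY 0

before launching the job; with mpirun_rsh the variable may instead have to be passed on the mpirun command line. I do not know whether this is actually related to the crash.)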

Any suggestions or comments would be appreciated. Thanks.


Regards,

Fermin

----------------------------------------------------------------------------------------------------

You appear to be missing the line

setenv WIEN_MPIRUN=...

This is set up when you run siteconfig, and it provides the information on how
mpi is run on your system.
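
For mvapich2 it typically ends up as something like this (the exact path depends on your installation):

setenv WIEN_MPIRUN "/path/to/mvapich2/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"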

N.B., did you set up and compile the mpi code?

___________________________
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Györgyi

On Apr 28, 2015 4:22 AM, "lung Fermin" <ferminlung at gmail.com> wrote:

Dear Wien2k community,



I am trying to perform a calculation on a system of ~100 inequivalent atoms
using mpi + k-point parallelization on a cluster. Everything goes fine when
the program is run on a single node. However, if I run the calculation
across different nodes, the following error occurs. How can I solve this problem?
I am a newbie to MPI programming, so any help would be appreciated. Thanks.



The error message (MVAPICH2 2.0a):

---------------------------------------------------------------------------------------------------

Warning: no access to tty (Bad file descriptor).

Thus no job control in this shell.

z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13

number of processors: 32

 LAPW0 END

[z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-13
aborted: Error while reading a PMI socket (4)

[z1-13:mpispawn_0][child_handler] MPI process (rank: 11, pid: 8546)
terminated with signal 9 -> abort job

[z1-13:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 8.
MPI process died?

[z1-13:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
process died?

[z1-2:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 12.
MPI process died?

[z1-2:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
process died?

[z1-2:mpispawn_0][child_handler] MPI process (rank: 0, pid: 35454)
terminated with signal 9 -> abort job

[z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-2
aborted: MPI process error (1)

[cli_15]: aborting job:

application called MPI_Abort(MPI_COMM_WORLD, 0) - process 15



>   stop error

------------------------------------------------------------------------------------------------------



The .machines file:

#

1:z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2

1:z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13

granularity:1

extrafine:1
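
(For reference, a .machines file of this form can be generated automatically from the scheduler's node list. The script below is only a sketch: it assumes a PBS-style $PBS_NODEFILE that lists one hostname per allocated core, and would have to be adapted to the actual queueing system on the cluster.)

#!/bin/csh -f
# Write one "1:" line per node, repeating the hostname once per core,
# so each line becomes one mpi job using all cores of that node and
# the k-points are distributed across the lines.
echo '#' > .machines
foreach host (`sort -u $PBS_NODEFILE`)
    set n = `grep -cx $host $PBS_NODEFILE`   # cores allocated on this node
    set line = "1:"
    set i = 1
    while ($i <= $n)
        set line = "$line$host "
        @ i++
    end
    echo "$line" >> .machines
end
echo 'granularity:1' >> .machines
echo 'extrafine:1' >> .machines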

--------------------------------------------------------------------------------------------------------

The parallel_options:



setenv TASKSET "no"

setenv USE_REMOTE 0

setenv MPI_REMOTE 1

setenv WIEN_GRANULARITY 1



--------------------------------------------------------------------------------------------------------



Thanks.



Regards,

Fermin