[Wien] Error in mpi+k point parallelization across multiple nodes
lung Fermin
ferminlung at gmail.com
Wed May 6 04:24:43 CEST 2015
Thanks for all the information and suggestions.
I changed -lmkl_blacs_intelmpi_lp64 to -lmkl_blacs_lp64 and recompiled.
However, I now get the following error message in the screen output:
LAPW0 END
[cli_14]: [cli_15]: [cli_6]: aborting job:
Fatal error in PMPI_Comm_size:
Invalid communicator, error stack:
PMPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0x7f190c) failed
PMPI_Comm_size(69).: Invalid communicator
aborting job:
Fatal error in PMPI_Comm_size:
Invalid communicator, error stack:
PMPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0x7f190c) failed
PMPI_Comm_size(69).: Invalid communicator
.......
[z0-5:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 20.
MPI process died?
[z0-5:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
process died?
[z0-5:mpispawn_0][child_handler] MPI process (rank: 14, pid: 11260) exited
with status 1
[z0-5:mpispawn_0][child_handler] MPI process (rank: 3, pid: 11249) exited
with status 1
[z0-5:mpispawn_0][child_handler] MPI process (rank: 6, pid: 11252) exited
with status 1
.....
Previously I compiled the program with -lmkl_blacs_intelmpi_lp64, and the
mpi parallelization on a single node seemed to work. I noticed that during
the run the *.error files had non-zero sizes, but when I re-examined them
after the job finished there were no errors written inside (the files are
0 kB now). Does this indicate that mpi is not running properly at all, even
on a single node? I have checked the output, and for some simple cases it
agrees with the non-mpi results.
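The check of the *.error files described above can be repeated after each run with a small helper. This is only a sketch: the function name nonempty_error_files is made up for illustration, and it assumes the *.error files sit directly in the case directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): print every *.error file in a
# directory that still has content. No output means all of them are empty.
nonempty_error_files() {
    dir="${1:-.}"
    for f in "$dir"/*.error; do
        # -s is true only for an existing file with size greater than zero;
        # an unmatched glob stays literal and simply fails this test.
        [ -s "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Calling nonempty_error_files . in the case directory once the job has finished lists only the files that still contain a message. Empty files show that no step left an error behind; they do not prove by themselves that the mpi binaries were the ones executed.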
I also tried changing mpirun to mpiexec, as suggested by Prof. Marks, by
setting:
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpiexec -np _NP_ -f _HOSTS_ _EXEC_"
in parallel_options. In this case the program neither runs nor terminates
(qstat on the cluster shows the job as running, with a time of 00:00:00).
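To see what command line the WIEN_MPIRUN template above would turn into, the placeholders can be filled in by hand. The sketch below only mimics the substitution of _NP_, _HOSTS_ and _EXEC_; it is not the code used by the WIEN2k parallel scripts. The function name expand_mpirun and the sample values (16 processes, a hosts file .machine1, the executable lapw1_mpi lapw1_1.def) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: fill the _NP_, _HOSTS_ and _EXEC_ placeholders of a
# WIEN_MPIRUN-style template and print the resulting command line.
# Assumes the substituted values contain no '|', '&' or '\' characters,
# since they are passed straight into sed replacement strings.
expand_mpirun() {
    template=$1
    np=$2
    hosts=$3
    exec_cmd=$4
    printf '%s\n' "$template" | sed \
        -e "s|_NP_|$np|g" \
        -e "s|_HOSTS_|$hosts|g" \
        -e "s|_EXEC_|$exec_cmd|g"
}

# Illustrative values only; take the real ones from the actual job.
expand_mpirun \
    "/usr/local/mvapich2-icc/bin/mpiexec -np _NP_ -f _HOSTS_ _EXEC_" \
    16 .machine1 "lapw1_mpi lapw1_1.def"
```

Running the printed command by hand inside the batch job may show whether mpiexec itself hangs (for example while waiting for a process manager) or whether the problem lies in the executable.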
Regards,
Fermin