[Wien] Mpirun Errors

Gavin Abo gsabo at crimson.ua.edu
Mon Jul 6 06:21:00 CEST 2020


Yu,

In addition to the user's guide [1], which describes using "run_lapw -p"
with an appropriately set up .machines file, don't forget about the
WIEN2k notes of the University of Texas [2], the workshop video on
parallelization [3], and the mailing list archive for previous posts on
MPI parallelization (for example, [4-11] are just a few of the many
posts in the archive).

[1] http://susi.theochem.tuwien.ac.at/reg_user/textbooks/usersguide.pdf
[2] http://susi.theochem.tuwien.ac.at/reg_user/faq/pbs.html
[3] http://susi.theochem.tuwien.ac.at/onlineworkshop/
[4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg08702.html
[5] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg00985.html
[6] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18967.html
[7] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html
[8] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15984.html
[9] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg05622.html
[10] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09334.html
[11] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17627.html

Kind Regards,

Gavin

On 7/5/2020 9:02 PM, Laurence Marks wrote:
> Please carefully read the user guide -- "mpirun -np 4 run_lapw" is not 
> how it works.
>
> Also, to use MPI you need ScaLAPACK, which you did not mention. If you
> only have 4 cores, you do not want to use MPI.
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what 
> nobody else has thought", Albert Szent-Györgyi
> www.numis.northwestern.edu
>
> On Sun, Jul 5, 2020, 21:57 晨晨 <chiniku at qq.com> wrote:
>
>     Dear W2k developers and users,
>
>       I am running WIEN2k 19.2 on Linux with gfortran, OpenBLAS, and
>     Open MPI. Running run_lapw in parallel produces errors.
>
>     When I run the command “run_lapw”, there are no errors. When I
>     run the command “mpirun -np 4 run_lapw”, errors occur. I have
>     tested that Open MPI was installed successfully.
>
>     1. The following errors occur when running run_lapw in parallel on
>     four processors:
>
>     [YG_cheny@yg TiC]$ mpirun -np 4 run_lapw
>     STOP  LAPW0 END
>     STOP  LAPW0 END
>     mv: cannot move `.tmp' to `TiC.dayfile': No such file or directory
>     STOP  LAPW0 END
>     STOP  LAPW0 END
>     printf: write error: No such file or directory
>     >   stop error
>     -------------------------------------------------------
>     Primary job terminated normally, but 1 process returned
>     a non-zero exit code. Per user-direction, the job has been aborted.
>     -------------------------------------------------------
>     >   stop error
>     --------------------------------------------------------------------------
>     mpirun detected that one or more processes exited with non-zero status, thus causing
>     the job to be terminated. The first process to do so was:
>       Process name: [[45157,1],1]
>       Exit code:    9
>
>     2. Since the file TiC.dayfile exists, I ran the dos2unix command
>     to solve the problem. After that, the "No such file or directory"
>     message disappeared.
>
>     3. However, when I run the command “mpirun -np 4 run_lapw” again,
>     the following error message still appears:
>
>     [YG_cheny@yg TiC]$ mpirun -np 4 run_lapw
>     STOP  LAPW0 END
>     STOP  LAPW0 END
>     STOP  LAPW0 END
>     >   stop error
>     -------------------------------------------------------
>     Primary job terminated normally, but 1 process returned
>     a non-zero exit code. Per user-direction, the job has been aborted.
>     -------------------------------------------------------
>     >   stop error
>     --------------------------------------------------------------------------
>     mpirun detected that one or more processes exited with non-zero status, thus causing
>     the job to be terminated. The first process to do so was:
>       Process name: [[46028,1],3]
>       Exit code:    9
>     --------------------------------------------------------------------------
>
>     Sincerely yours,
>     Yu
>