[Wien] Issues with parallel runs

Gavin Abo gsabo at crimson.ua.edu
Fri Jan 11 05:10:46 CET 2019


As Prof. Marks hinted, it looks to me like mpirun is correctly set in 
the PATH of your current terminal, which is why mpirun works there.

However, when you run "run_lapw -p", the lapw1para_lapw script that it 
executes probably uses ssh to reach the nodes you have set in your 
.machines file. It is likely on those nodes that the mpirun command 
cannot be found.

You can likely test if that is the case in the terminal by trying 
commands like:

ssh localhost
which mpirun
exit

where localhost should be replaced by the hostname (or IP address) of 
one of the local (e.g. https://en.wikipedia.org/wiki/Localhost ) or 
remote nodes listed in your hand-edited .machines file.  If your 
.machines file is instead created on the fly by your job script 
[ http://susi.theochem.tuwien.ac.at/reg_user/faq/pbs.html ], which is 
usually the case on the clusters used for mpi parallel calculations, 
you should be able to find the hostnames in the .machines file that 
the failed calculation tried to use.
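
To avoid checking each node by hand, the test above can be wrapped in 
a small loop. This is only a sketch: the function names are made up 
for illustration, and it assumes .machines lines of the form "1:host", 
"1:host:4" or "lapw0:host1:2 host2:2". It also runs the check as a 
non-interactive ssh command rather than a login session, which may see 
a different PATH than an interactive login does.

```shell
# List the unique hostnames named in a .machines file.
# Assumption: host lines start with "lapw0:" or "<number>:" and each
# host may carry a ":<cores>" suffix.
machines_hosts() {
    grep -E '^(lapw0|[0-9]+):' "$1" |
        cut -d: -f2- |
        tr -s ' ' '\n' |
        cut -d: -f1 |
        grep -v '^$' |
        sort -u
}

# Ask every host, through a non-interactive ssh command, whether it
# can find mpirun. BatchMode makes ssh fail instead of prompting.
check_mpirun() {
    for host in $(machines_hosts "$1"); do
        if ssh -o BatchMode=yes "$host" 'command -v mpirun' >/dev/null 2>&1; then
            echo "$host: mpirun found"
        else
            echo "$host: mpirun NOT found"
        fi
    done
}
```

After sourcing this in the case directory, "check_mpirun .machines" 
should print one line per node.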

If you add the directory containing mpirun to the PATH in your .bashrc 
(or .cshrc) [e.g., 
https://www.open-mpi.org/faq/?category=running#adding-ompi-to-path ] and 
your system makes that setting available on all nodes, that might 
resolve the problem.  If you are using a job script, then depending on 
your queuing system you might have to add an option that exports the 
environment to the nodes [ 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15985.html 
].
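
For bash, the PATH change could look roughly like the lines below in 
~/.bashrc. The directory is the one that "which mpirun" reported in 
your message; the variable name MPI_BIN is just a placeholder. Some 
default .bashrc files return early for non-interactive shells, so it 
may be safer to put these lines near the top of the file.

```shell
# Directory that holds mpirun (taken from the "which mpirun" output
# quoted in this thread; adjust it for your installation).
MPI_BIN=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin

# Prepend the directory only if it is not already in PATH, then
# export PATH so that child processes inherit it.
case ":$PATH:" in
    *":$MPI_BIN:"*) ;;
    *) PATH="$MPI_BIN:$PATH" ;;
esac
export PATH
```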

If your system doesn't access remote nodes with ssh, see section "11.4 
Environment Variables" in the WIEN2k 18.2 userguide 
[http://susi.theochem.tuwien.ac.at/reg_user/textbooks/usersguide.pdf] 
about setting "USE_REMOTE 0" in parallel_options so that it does not use 
ssh.
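
Before editing, it may help to see what parallel_options currently 
contains. The file uses csh syntax, so the entries should look like 
"setenv USE_REMOTE 1". A minimal sketch (the function name is 
hypothetical) that prints the two settings mentioned in this thread:

```shell
# Print the USE_REMOTE and WIEN_MPIRUN lines of a parallel_options
# file. With no argument it falls back to $WIENROOT/parallel_options.
show_parallel_options() {
    grep -E 'USE_REMOTE|WIEN_MPIRUN' "${1:-$WIENROOT/parallel_options}"
}
```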

Also keep in mind, as mentioned on the list before, that if your 
workstation is a general-purpose computer rather than a high 
performance computing (HPC) cluster [ 
https://en.wikipedia.org/wiki/Supercomputer ], k-point parallel might 
work better than mpi parallel, depending on the hardware and the 
calculation case:

http://susi.theochem.tuwien.ac.at/reg_user/benchmark/
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg03793.html
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg08301.html
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09334.html
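
If you want to try k-point parallel on a single workstation, a 
.machines file for it could be generated with something like the 
sketch below. The function name is made up, and the layout (one 
"1:host" line per parallel job, followed by granularity and extrafine 
lines) is my assumption of the usual format, so please compare it with 
the examples in the usersguide before relying on it.

```shell
# Write a k-point parallel .machines file to stdout.
# $1 = number of parallel jobs, $2 = hostname (default: localhost).
write_kpoint_machines() {
    n=$1
    host=${2:-localhost}
    i=0
    # One "1:host" line per k-point parallel job (no core count,
    # so no mpi is requested).
    while [ "$i" -lt "$n" ]; do
        echo "1:$host"
        i=$((i + 1))
    done
    echo "granularity:1"
    echo "extrafine:1"
}
```

For example, "write_kpoint_machines 4 > .machines" in the case 
directory would request four k-point parallel jobs on localhost.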

On 1/10/2019 12:44 PM, Laurence Marks wrote:
> Most probably you forgot to export your PATH (e.g. from bash do 
> "export PATH") so the information is not making it beyond your shell. 
> You might have a bad csh/tcsh. Try adding "which lapw1_mpi" to 
> $WIENROOT/parallel_options, and check that file has correct setenv 
> for  WIEN_MPIRUN.
>
> On Thu, Jan 10, 2019 at 12:00 PM Matthew D Redell 
> <mredell1 at binghamton.edu> wrote:
>
>     Hello,
>
>     I am running WIEN2k_2018.2 on CentOs7 and have come across the
>     following problem that I cannot seem to resolve.
>
>     After successfully initializing the calculation and setting up the
>     .machines for a single host (local workstation), I run: run_lapw -p
>
>     lapw0 ends fine, but the lapw1 returns
>     bash: mpirun: command not found
>
>     The same error occurs if I just try
>     x lapw1 -p
>
>     However, which mpirun
>     returns
>     /opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpirun
>
>     I also did a little troubleshooting to see if I could run lapw1 in
>     parallel via
>
>     mpirun -n 4 lapw1_mpi lapw1_1.def
>
>
>     which ran without any issues. Also, checking more…
>     grep MPIRUN $WIENROOT/WIEN2k_OPTIONS
>     returns
>     current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>
>     So, the only possibility I am able to deduce is that either the
>     run_lapw script or the lapw1para script is not locating the mpirun
>     command, but I do not know how to begin sorting out this issue.
>     Any help would be greatly appreciated.
>
>     Best,
>     Matt
>
>     ------------------------
>     Matthew D Redell
>     /Graduate Student/Teaching Assistant/
>     /Department of Physics, Applied Physics, and Astronomy/
>     *Binghamton University-State University of New York*
>     E-mail: mredell1 at binghamton.edu
>     Office: SN-2011D
>
>
> -- 
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what 
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu <http://www.numis.northwestern.edu> ; 
> Corrosion in 4D: MURI4D.numis.northwestern.edu 
> <http://MURI4D.numis.northwestern.edu>
> Partner of the CFW 100% program for gender equity, 
> www.cfw.org/100-percent <http://www.cfw.org/100-percent>
> Co-Editor, Acta Cryst A