[Wien] configuring parallel options using ssh

Laurence Marks L-marks at northwestern.edu
Tue Sep 11 04:35:18 CEST 2018


You state "However, a .machines file with several machines will run using all
required CPUs on the machine where launched (ignoring hosts)."

That implies that you have not correctly configured the command used to
launch the MPI tasks. Without knowing what this command is on your system
(mpirun, srun, or something else), it is impossible to say more.
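The following minimal Python sketch illustrates why the launch command matters. It assumes a WIEN2k-style template in the manner of the WIEN_MPIRUN setting, with the placeholders _NP_, _HOSTS_ and _EXEC_. It only expands and inspects the string and never runs anything. The host file name .machine1 and the executable line "lapw1_mpi lapw1_1.def" are hypothetical values. The point is that a template which never passes the host file to the launcher leaves mpirun with no list of remote hosts, so processes will typically start on the local machine.

```python
import shlex

# Assumed WIEN_MPIRUN-style template; _NP_, _HOSTS_ and _EXEC_ are
# placeholders that are replaced before the command is executed.
TEMPLATE = "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"


def expand_mpirun(template, nproc, hostfile, executable):
    """Substitute the placeholders and return the command as an argv list."""
    cmd = (template.replace("_NP_", str(nproc))
                   .replace("_HOSTS_", hostfile)
                   .replace("_EXEC_", executable))
    return shlex.split(cmd)


def passes_host_list(template):
    """True if the template hands the host file to the launcher at all.

    Without it, mpirun has no list of remote hosts and, outside a batch
    scheduler, will usually start every process on the local machine.
    """
    return "_HOSTS_" in template


# Hypothetical values: 4 processes, host file .machine1, executable + def file
argv = expand_mpirun(TEMPLATE, 4, ".machine1", "lapw1_mpi lapw1_1.def")
print(" ".join(argv))  # mpirun -np 4 -machinefile .machine1 lapw1_mpi lapw1_1.def
print(passes_host_list(TEMPLATE), passes_host_list("mpirun -np _NP_ _EXEC_"))  # True False
```

The same check applies to an srun-based template. Whatever launcher is used, the host information has to reach it somehow, whether through a host file or through the scheduler's allocation.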


On Mon, Sep 10, 2018, 13:57 Luc Fruchter <luc.fruchter at u-psud.fr> wrote:

> Dear users,
>
> I have not managed to configure the parallel options to run cases on
> several machines, each with several CPUs, using the ssh protocol.
>
> - Configuring the parallel options with: shared memory, MPI = 0, ssh
> protocol, allows parallel jobs to run on several CPUs of the same
> machine. However, a .machines file with several machines will run using
> all required CPUs on the machine where launched (ignoring hosts).
>
> - Configuring with: no shared memory, MPI = 0, ssh protocol, runs no
> parallel jobs at all, either on the same machine or on different
> machines (the error output for this case is below).
>
> All machines communicate over ssh without problems and without a
> password, and have identical file paths.
>
> Thanks for your help
>
> ------------------------------------------------------------------
>
>  >   lapw0  -p  (20:33:36) starting parallel lapw0 at Mon Sep 10 20:33:36 CEST 2018
> -------- .machine0 : processors
> running lapw0 in single mode
> 6.793u 0.073s 0:06.86 100.0%    0+0k 0+5152io 0pf+0w
>  >   lapw1  -p          (20:33:43) starting parallel lapw1 at Mon Sep 10 20:33:43 CEST 2018
> ->  starting parallel LAPW1 jobs at Mon Sep 10 20:33:43 CEST 2018
> running LAPW1 in parallel mode (using .machines)
> 1 number_of_parallel_jobs
>       localhost(48)    Summary of lapw1para:
>     localhost    k=48    user=0  wallclock=0
> 0.112u 0.158s 0:02.28 11.4%     0+0k 0+224io 0pf+0w
>  >   lapw2 -p           (20:33:45) running LAPW2 in parallel mode
> **  LAPW2 crashed!
> 0.085u 0.062s 0:00.13 107.6%    0+0k 0+872io 0pf+0w
> error: command   /root/Documents/WIEN2KROOT/lapw2para lapw2.def   failed
>
>  >   stop error
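The report states that all machines accept passwordless ssh. The minimal Python sketch below checks that claim against the hosts a .machines file actually names. It makes several assumptions. K-point lines are taken to have the form weight:host, optionally host:N, possibly with several hosts per line. Other keys such as granularity, extrafine or lapw0 are skipped. node1 and node2 are hypothetical host names. ssh_ok uses ssh's BatchMode option, so a host that would prompt for a password counts as a failure instead of hanging.

```python
import subprocess


def kpoint_hosts(machines_text):
    """Return host names, in order, from k-point lines of a .machines-style file.

    Assumed format: "weight:host1 host2:N ...", where weight is an integer and
    an optional ":N" after a host is a process count. Other lines are ignored.
    """
    hosts = []
    for raw in machines_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        weight, sep, rest = line.partition(":")
        if not sep or not weight.isdigit():
            continue  # not a k-point line (granularity, lapw0, blank, ...)
        for token in rest.split():
            host = token.split(":", 1)[0]  # strip an optional ":N" count
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def ssh_ok(host, timeout=10):
    """True if 'ssh host true' succeeds without any interactive prompt."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
           host, "true"]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout + 5)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


# Hypothetical .machines content with two k-point hosts
sample = """granularity:1
1:node1
1:node2:4
lapw0:node1:4
extrafine:1
"""
print(kpoint_hosts(sample))  # ['node1', 'node2']
```

On a real system, you would call ssh_ok(host) for each name returned by kpoint_hosts(open(".machines").read()). If the list comes back empty or contains only localhost, the jobs cannot be spread over several machines regardless of how ssh is configured.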