[Wien] SLURM support "no ssh" for WIEN2k?

Peter Blaha pblaha at theochem.tuwien.ac.at
Thu Nov 12 13:42:21 CET 2015


Hi,

WIEN2k has a user's guide in which the different parallelization modes are
described extensively.

On a cluster with a queuing system (like SLURM), it should not even be
possible to access the nodes (except the frontend) via ssh without going
through SLURM. On our SLURM machine, ssh is possible only to nodes that
have been assigned to a user by salloc or an sbatch job, so overloading
can be prevented.

We ALWAYS run our jobs using SLURM and typically submit them using

sbatch slurm.job

As you correctly noted, WIEN2k needs a ".machines" file, so "slurm.job"
has to create it on the fly.

I've provided an example script (which you may need to adapt to your user
and resource specifications) at

http://www.wien2k.at/reg_user/faq/pbs.html
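A minimal sketch of such a job script follows. It assumes bash, pure k-point parallelism with one k-point job per allocated task slot, and USE_REMOTE=1, so the k-point jobs reach the other allocated nodes via ssh. It is not the example script linked above: the helper name make_kpoint_machines is hypothetical, and the #SBATCH values are placeholders. "scontrol show hostnames" expands SLURM's compact node list (e.g. node[1-2]) into one host name per line.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
# Placeholder resource request: adapt nodes/tasks to your cluster.

# Hypothetical helper: print a k-point-parallel .machines file.
# Arguments: slots per host, then the host names of the allocation.
# Each slot becomes one "1:host" line, i.e. one k-point job of weight 1.
make_kpoint_machines() {
  local slots=$1
  shift
  local host i
  echo "granularity:1"
  for host in "$@"; do
    for ((i = 0; i < slots; i++)); do
      echo "1:$host"
    done
  done
}

# Only inside a SLURM job: expand the node list, write .machines,
# and start the parallel SCF cycle.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
  mapfile -t hosts < <(scontrol show hostnames "$SLURM_JOB_NODELIST")
  make_kpoint_machines "${SLURM_NTASKS_PER_NODE:-1}" "${hosts[@]}" > .machines
  run_lapw -p
fi
```

Because the host names come from the job's own allocation, the ssh sessions that WIEN2k starts land only on nodes that SLURM has assigned to this job.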

One more hint: the file $WIENROOT/parallel_options specifies whether
k-point-parallel (USE_REMOTE) and MPI-parallel (MPI_REMOTE) jobs are
started via ssh (1) or not (0):

setenv USE_REMOTE 1     # or 0
setenv MPI_REMOTE 0

With modern MPI versions, always use MPI_REMOTE=0.
Usually, k-point parallelism is meaningful only up to 8 cores (then set
OMP_NUM_THREADS=2) or 16 cores; otherwise the overhead is too big. In
such cases one uses only ONE node, which runs as a "shared-memory
machine" (USE_REMOTE=0), without MPI at all.
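For this single-node case, the cores split as (k-point jobs) times (OpenMP threads per job); for example, 16 cores = 8 jobs times 2 threads. Below is a minimal bash sketch. It assumes USE_REMOTE=0 and uses a hypothetical helper name, make_shared_memory_machines. It prints one "1:host" line per k-point job:

```shell
#!/bin/bash
# Hypothetical helper: k-point-only .machines for ONE shared-memory node.
# Arguments: total cores on the node, OpenMP threads per k-point job, host.
# Prints one "1:host" line per k-point job (cores / threads jobs in total).
make_shared_memory_machines() {
  local cores=$1 threads=$2 host=$3
  local jobs=$(( cores / threads )) i
  echo "granularity:1"
  for ((i = 0; i < jobs; i++)); do
    echo "1:$host"
  done
}
```

In a job script this could be used as "export OMP_NUM_THREADS=2" followed by "make_shared_memory_machines 16 2 localhost > .machines". With USE_REMOTE=0, all jobs are started locally without ssh.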
For medium-sized cases with a few k-points and larger matrices, a "mixed"
k-point- and MPI-parallel setup is best; this is what the slurm.job
example above uses.
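A mixed k-point and MPI setup can be sketched in bash. For illustration, assume one k-point group per allocated node. Each node then gets one k-point job that runs MPI-parallel over that node's cores (a "1:host:n" line), and lapw0 runs MPI-parallel across all nodes. This requires MPI_REMOTE=0 and an MPI build of WIEN2k. The helper name make_mixed_machines is hypothetical, and this split need not match what the linked slurm.job example does.

```shell
#!/bin/bash
# Hypothetical helper: mixed k-point + MPI-parallel .machines file.
# Arguments: MPI processes per node, then the host names of the allocation.
# Each host gets one k-point job of weight 1 that runs MPI-parallel on it
# ("1:host:n"); lapw0 runs MPI-parallel across all hosts.
make_mixed_machines() {
  local ncores=$1
  shift
  local host lapw0_hosts=""
  echo "granularity:1"
  for host in "$@"; do
    echo "1:$host:$ncores"
    lapw0_hosts+=" $host:$ncores"
  done
  # Strip the leading space before writing the lapw0 line.
  echo "lapw0:${lapw0_hosts# }"
}
```

Inside a SLURM job this could be called as make_mixed_machines "$SLURM_NTASKS_PER_NODE" $(scontrol show hostnames "$SLURM_JOB_NODELIST") > .machines, followed by run_lapw -p.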


PS: I'm also sending this to our WIEN2k mailing list, because it is of
general interest and I don't want to write the same email over and over
again.
PPS: In general, please use the mailing list (see www.wien2k.at), as I
normally do not answer questions sent directly to me.

Best regards
Peter Blaha

On 11/11/2015 07:08 PM, Robb III, George B. wrote:
> Hi Dr. Schwarz / Dr. Blaha,
>
> We have noticed on our SLURM-based <http://schedmd.com/#index> research
> cluster that WIEN2k suite commands use a .machines file to spawn ssh
> sessions to individual nodes instead of going through the scheduler.
>
> We have SLURM configured to control cluster resource allocations, and
> we see resource collisions when the WIEN2k suite launches processes
> via ssh.
>
> For example, SLURM controls node2 and has allocated 95% of its
> resources to jobs. When a WIEN2k process is launched from the head
> node, it will ssh to node2 (because 5% of the resources are free) and
> spawn additional processes on node2 that SLURM does not manage. Node2
> is now oversubscribed.
>
> Does the WIEN2k suite have an administrative guide or documentation?
> Does the WIEN2k suite have a recommended cluster manager (e.g. PBS,
> SLURM, LSF, etc.)?
>
> Thanks again for any assistance; we look forward to having the labs on
> our campus use the WIEN2k suite.
>
> Thanks,
>
> George B. Robb III
> Systems Administrator
> Research Computing Support Services - (RCSS)
> University of Missouri System

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------

