[Wien] SLURM support "no ssh" for WIEN2k?
Peter Blaha
pblaha at theochem.tuwien.ac.at
Thu Nov 12 13:42:21 CET 2015
Hi,
WIEN2k has a usersguide in which the different parallelization modes are
described extensively.
On a cluster with a queuing system (like SLURM) it should not even be
possible to access nodes (other than the frontend) via ssh outside of
SLURM (on our SLURM machine ssh works only to nodes which have been
assigned to a user by salloc or an sbatch job), so oversubscription can
be prevented.
We ALWAYS run our jobs through SLURM and typically submit them with
sbatch slurm.job
As you correctly mention, WIEN2k needs a ".machines" file, so
"slurm.job" has to create it on the fly.
I've provided an example script (which you may need to adapt for user or
resource specifications) at
http://www.wien2k.at/reg_user/faq/pbs.html
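As a rough sketch only (this is not the FAQ script itself; the node
count, cores per node and the one-mpi-job-per-node layout are
assumptions you must adapt), the relevant part of such a slurm.job could
look like:

#!/bin/bash
#SBATCH --nodes=2                 # assumed: 2 nodes
#SBATCH --ntasks-per-node=16      # assumed: 16 cores per node

# build .machines from the nodes SLURM actually assigned to this job
rm -f .machines
for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do
  echo "1:$host:16" >> .machines  # one mpi-parallel k-job per node
done
echo "granularity:1" >> .machines
echo "extrafine:1"   >> .machines

run_lapw -p                       # parallel scf cycle

Since .machines is generated inside the job, it always matches the
allocation SLURM has granted, and no process ends up on a node you do
not own.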
One more hint: in the file $WIENROOT/parallel_options one specifies
whether k-point parallel (USE_REMOTE) and mpi-parallel (MPI_REMOTE) jobs
are started using ssh (1) or not (0):
setenv USE_REMOTE 1   (or 0)
setenv MPI_REMOTE 0
With modern mpi versions always use MPI_REMOTE=0.
Usually k-point parallelism is meaningful only for up to 8 (then set
OMP_NUM_THREADS=2) or 16 cores; beyond that the overhead is too big. In
such cases one would use only ONE node, which then runs as a
"shared-memory machine" (USE_REMOTE=0), without mpi at all.
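For such a purely k-point parallel run on a single node, .machines is
just one line per k-parallel job, all pointing to the same host. A
minimal sketch for 4 k-jobs ("localhost" is an assumption; use whatever
node SLURM assigned to you):

1:localhost
1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1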
For medium-sized cases with a few k-points and larger matrices, a
"mixed" k-parallel and mpi-parallel setup is best; this is what the
slurm.job example above uses.
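A sketch of such a mixed .machines file, assuming two nodes (node001,
node002) with 16 cores each, i.e. two k-parallel jobs which are each
mpi-parallel over 16 cores (host names and core counts are
placeholders):

lapw0:node001:16 node002:16
1:node001:16
1:node002:16
granularity:1
extrafine:1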
PS: I'm sending this also to our WIEN2k mailing list, because it is of
general interest and I don't want to write the same email over and over
again.
PPS: In general, please use the mailing list (see www.wien2k.at), as I
normally do not answer questions sent to me directly.
Best regards
Peter Blaha
On 11/11/2015 07:08 PM, Robb III, George B. wrote:
> Hi Dr. Schwartz / Dr. Blaha-
>
> We have noticed on our SLURM <http://schedmd.com/#index> based research
> cluster that WIEN2k suite commands take advantage of a .machines file to
> spawn off ssh sessions to individual nodes instead of using the scheduler.
>
> We have SLURM configured to control cluster resource allocations and
> have collisions of resources when ssh processes are called from the
> WIEN2k suite.
>
> e.g. SLURM controls node2 and has allocated 95% of its resources to
> jobs, but when a WIEN2k process is launched from the head node it will
> ssh to node2 (due to the 5% free resources) and spawn additional,
> un-SLURM-managed processes there. Node2 is now oversubscribed.
>
> Does the WIEN2k suite have an administrative guide or documentation?
> Does the WIEN2k suite have a recommended cluster manager (i.e. PBS,
> SLURM, LSF, etc.)?
>
> Thanks again for any assistance; we are looking forward to having the
> labs on our campus use the WIEN2k suite.
>
> Thanks,
>
> George B. Robb III
> Systems Administrator
> Research Computing Support Services - (RCSS)
> University of Missouri System
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------