[Wien] SLURM cluster issues

Tue Apr 16 08:57:47 CEST 2024

Hi,

> I am trying to set up WIEN2k ver 23.2 to run on a SLURM cluster. I 
> have gotten it to work with SCALAPACK, running with a slurm batch 
> submission script through w2web by following the examples.
>
> I have two issues.
>
>  1. Is it possible to make the “x dstart” button in the initialize web
>     interface submit its job with a script like is possible for
>     running SCF? Right now, it runs on the w2web node, and I can’t use
>     a preexisting .machines file because all jobs are submitted
>     through SLURM so the .machines needs to be generated on the fly.
>
If you really want to run this in parallel on a compute node (which only 
makes sense for really big cases with 100 or more atoms/unit cell), you 
have to do it the same way as for the scf calculation and execute it via 
a script.

A user could activate the   -nodstart   option in the initialization and 
run dstart as "single program" with the submission method.
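
As a rough sketch of the "generate .machines on the fly" step inside such a 
submission script: the helper below builds the file content from the node 
list that SLURM hands to the job. The function names (build_machines, 
slurm_hosts) are hypothetical, and the "dstart:" and "omp_global:" lines 
are my assumption about the .machines syntax (modelled on the "lapw0:" 
line); please check them against the usersguide of your WIEN2k version.

```python
import os
import subprocess


def slurm_hosts():
    """Expand SLURM_JOB_NODELIST into a list of host names.

    Uses `scontrol show hostnames`, so it only works inside a SLURM job.
    """
    nodelist = os.environ["SLURM_JOB_NODELIST"]
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def build_machines(hosts, cores_per_node, omp_threads=1):
    """Return the text of a .machines file for an MPI-parallel run.

    hosts          -- list of node names allocated to the job
    cores_per_node -- MPI processes to start on each node
    omp_threads    -- OpenMP threads per process (assumed omp_global syntax)
    """
    if not hosts:
        raise ValueError("need at least one host")
    # One "host:ncores" entry per node, e.g. "node01:16 node02:16"
    host_spec = " ".join(f"{h}:{cores_per_node}" for h in hosts)
    lines = [
        f"dstart:{host_spec}",  # assumed syntax, mirrors the lapw0 line
        f"lapw0:{host_spec}",
        f"1:{host_spec}",       # one k-point job spread over all cores
        "granularity:1",
        "extrafine:1",
        f"omp_global:{omp_threads}",
    ]
    return "\n".join(lines) + "\n"
```

In the batch script one would write build_machines(slurm_hosts(), n) to 
.machines in the case directory and then start dstart in parallel mode 
(presumably "x dstart -p", again to be verified for your version).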

>  2. I cannot seem to get ELPA to work. It compiles fine and passes the
>     ELPA make check tests. However, when I try to run the TiC example
>     in the usersguide, I always get an error. I have a feeling that
>     ELPA is not using the correct kernel. Is there a way to specify
>     that through WIEN2k, or should I set a default ELPA kernel through
>     ELPA ./configure?
>
> Here is a link to a zip file with what I hope are the relevant files: 
> https://tulane.box.com/s/ozohfwe0xyoipb8jzxeh3ec15imq1eam. My email 
> got blocked when I attached them directly.
>
I guess the only problem is that the TiC case is too small to run in 
MPI-parallel mode.

In WIEN2k one has to adapt the parallelization to the case one is studying.

The OpenMP parallelization (on one node) always works, but is efficient 
only up to 4 cores. It should always be used.

The k-point parallelization is the second option; it is helpful if one 
has many k-points in the input and the lapw1 step takes more than e.g. 20 
seconds.

The last parallelization is via MPI using SCALAPACK and/or ELPA. The 
latter is much more efficient, but works only if the problem size 
(number of atoms ---> size of the eigenvalue problem) exceeds a few 
thousand. Even with such intermediate sizes (NMAT between 3000 and 20000) 
the number of cores should not be too large, otherwise communication 
overhead dominates and the run takes longer than on fewer cores.

For really large problems (matrix size up to 100000x100000) many cores 
(e.g. 512) can be used efficiently.
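
The rules of thumb above can be condensed into a small decision helper. 
This is only an illustration: the function name is hypothetical, and the 
exact cut-offs (3000 for "a few thousand", more than one k-point for 
"many k-points") are my reading of the numbers given above, not fixed 
WIEN2k limits.

```python
OPENMP = "openmp (up to 4 threads)"
KPOINT = "k-point parallel"
MPI_MODERATE = "mpi (moderate number of cores)"
MPI_LARGE = "mpi (many cores)"


def suggest_parallelization(nmat, n_kpoints, lapw1_seconds):
    """Suggest parallelization modes for a case.

    nmat          -- dimension of the eigenvalue problem (NMAT)
    n_kpoints     -- number of k-points in the input
    lapw1_seconds -- wall time of one serial lapw1 step
    """
    # OpenMP on one node should always be used.
    modes = [OPENMP]

    # k-point parallelization pays off with several k-points and a
    # lapw1 step that takes longer than roughly 20 seconds.
    if n_kpoints > 1 and lapw1_seconds > 20:
        modes.append(KPOINT)

    # MPI (SCALAPACK/ELPA) needs a matrix size of a few thousand or more.
    if nmat >= 3000:
        if nmat <= 20000:
            # Intermediate size: too many cores make communication dominate.
            modes.append(MPI_MODERATE)
        else:
            modes.append(MPI_LARGE)
    return modes
```

For a tiny case like TiC (NMAT of order 100, lapw1 finished within 
seconds) this returns only the OpenMP entry, which is the point made 
above.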

PS: I always compile ELPA without machine-specific options and let it 
decide at runtime what is available on the specific hardware.

PPS: I'd also suggest running the w2web interface ONLY on a local 
workstation and setting up the case there. Only after the initialization 
would I migrate the case (migrate_lapw) to a supercomputer and run the 
scf there.

Best regards

-- 
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email:peter.blaha at tuwien.ac.at           
WWW:http://www.imc.tuwien.ac.at       WIEN2k:http://www.wien2k.at
-------------------------------------------------------------------------