[Wien] SLURM cluster issues

Straus, Daniel B dstraus at tulane.edu
Fri Apr 26 23:43:38 CEST 2024


Hi Peter,

Thanks for the response. In case this is helpful to anyone in the future:

To get w2web working on the cluster without risking a ban for accidentally running jobs on the login node, I modified the w2web source so it does not create a new configuration for every node: I fixed the host name in the configuration part of the script. I then run w2web on a compute node inside an interactive job with port forwarding enabled. This way, everything either runs on the compute node hosting w2web or gets submitted to the queue as a batch job when I use a submission script.
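In outline, the workflow looks like this (host names, the port number, and the SLURM options below are placeholders, not my exact settings):

    # 1. get an interactive job on a compute node
    salloc --nodes=1 --ntasks=4 --time=08:00:00
    # 2. from the workstation, tunnel the w2web port through the login node
    ssh -L 7890:node123:7890 user@login.cluster.example.edu
    # 3. on the compute node, start w2web and choose port 7890 when asked
    w2web
    # then browse to http://localhost:7890 on the workstation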


Daniel Straus
Assistant Professor
Department of Chemistry
Tulane University
5088 Percival Stern Hall
6400 Freret Street
New Orleans, LA 70118
(504) 862-3585
http://straus.tulane.edu/


From: Peter Blaha <peter.blaha at tuwien.ac.at>
Sent: Tuesday, April 16, 2024 1:58 AM
To: wien at zeus.theochem.tuwien.ac.at
Subject: Re: [Wien] SLURM cluster issues


Hi,


I am trying to set up WIEN2k ver. 23.2 to run on a SLURM cluster. I have gotten it to work with SCALAPACK, running through w2web with a SLURM batch submission script by following the examples.

I have two issues.

1. Is it possible to make the “x dstart” button in the initialization web interface submit its job with a script, as is possible for running SCF? Right now it runs on the w2web node, and I can’t use a preexisting .machines file because all jobs are submitted through SLURM, so the .machines file needs to be generated on the fly.

If you really want to run this in parallel on a compute node (it only makes sense for really big cases with 100 or more atoms per unit cell), you have to do it similarly to the scf calculation and execute it via a script.

A user could activate the -nodstart option in the initialization and then run dstart as a "single program" with the submission method.
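A sketch of such a submission script, generating .machines on the fly from the node SLURM assigned to the job (the SBATCH options are placeholders, and the dstart: line follows the .machines syntax described in the usersguide; check the exact form for your WIEN2k version):

    #!/bin/bash
    #SBATCH --job-name=dstart
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=01:00:00

    # build .machines on the fly from the node SLURM assigned to this job
    rm -f .machines
    host=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
    # mpi-parallel dstart on all cores of that node
    echo "dstart:$host:$SLURM_NTASKS_PER_NODE" >> .machines

    # run dstart in parallel using the freshly generated .machines
    x dstart -p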


2. I cannot seem to get ELPA to work. It compiles fine and passes the ELPA make check tests. However, when I try to run the TiC example in the usersguide, I always get an error. I have a feeling that ELPA is not using the correct kernel. Is there a way to specify that through WIEN2k, or should I set a default ELPA kernel through ELPA's ./configure?

Here is a link to a zip file with what I hope are the relevant files: https://tulane.box.com/s/ozohfwe0xyoipb8jzxeh3ec15imq1eam. My email got blocked when I attached them directly.


I guess the only problem is that the TiC case is too small to run in mpi parallel.

In WIEN2k one has to adapt the parallelization to the case one is studying.

The openMP parallelization (on one node) always works, but is efficient only up to about 4 cores. It should always be used.

The k-point parallelization is the second option; it is helpful if one has many k-points in the input and the lapw1 step takes more than, e.g., 20 seconds.

The last parallelization is via mpi, using SCALAPACK and/or ELPA. The latter is much more efficient, but pays off only if the problem size (number of atoms ---> size of the eigenvalue problem) exceeds a few thousand. Even at such intermediate sizes (NMAT between 3000 and 20000) the number of cores should not be too large, otherwise communication dominates and it takes longer than on fewer cores.

For really large problems (NMAT up to 100000, i.e. 100000 x 100000 matrices) many cores (e.g. 512) can be used efficiently.
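To make the three modes concrete, here is an illustrative .machines file (host names and core counts are invented; see the usersguide for the full syntax):

    # OpenMP: 4 threads for all programs (omp_global needs a recent WIEN2k)
    omp_global:4
    # k-point parallelization: two independent lapw1/lapw2 jobs, one per node
    1:node01
    1:node02
    granularity:1
    extrafine:1

For mpi parallelization one would instead put several cores on one line, e.g. 1:node01:64, which runs lapw1/lapw2 with SCALAPACK/ELPA on 64 cores.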



PS: I always compile ELPA without machine-specific options and let it decide at runtime what is available on the specific hardware.
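For reference, a generic build along those lines could look like this (compilers, flags, and the install prefix are site-specific placeholders, not a prescription):

    # configure without fixing a kernel, so ELPA picks the best kernel
    # available on the actual hardware at run time
    ./configure CC=mpicc FC=mpif90 --enable-openmp --prefix=$HOME/elpa
    make
    make check      # runs the ELPA test suite
    make install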



PPS: I'd also suggest running the w2web interface ONLY on a local workstation and setting up the case there. Only after the initialization would I migrate (migrate_lapw) the case to a supercomputer and run the scf there.
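In terms of commands, that workflow is roughly the following (the remote path is a placeholder and the invocation is only an assumed sketch; migrate_lapw -h prints the exact syntax):

    # on the local workstation, inside the initialized case directory
    migrate_lapw user@cluster:/scratch/user/TiC   # assumed invocation; check -h
    # then log in on the cluster and submit the scf job from there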



Best regards

--

-----------------------------------------------------------------------

Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna

Phone: +43-158801165300

Email: peter.blaha at tuwien.ac.at

WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at

-------------------------------------------------------------------------