<div dir="ltr">Dear All,<div><br></div><div>As I am currently trying to get Wien2k running on Stampede (also SLURM), let me add a little clarification without disagreeing with anything Peter said.</div><div><br></div><div>A typical workflow in Wien2k is (very simplified) an iterative loop controlled by csh scripts:</div><div><br></div><div>1) A single serial multithreaded or mpi task</div><div>2) A number of parallel multithreaded or mpi tasks at the same time</div><div>3) A single multithreaded task</div><div><br></div><div>Wien2k uses a file .machines constructed by the user to control these. Peter like to do this with csh scripts, I prefer a different approach using an unsupported utility Machines2W. Both need a list of the nodes available to the user, for instance with PBS/QSUB this is in PBS_NODEFILE. There are various ways to generate this with SLURM.</div><div><br></div><div>Wien2k does not have any internal code that allows it to interrogate the batch control system to know whether the .machines file is correct. Unfortunately on all OS that I can think of it is possible to incorrectly construct the file .machines and as a consequence oversubscribe nodes.</div><div><br></div><div>In principle one can manipulate at the OS level how parallel tasks are executed, for instance by changing (for mpi) how mpirun bootstraps tasks. This works fine if users are only going to launch a single program, e.g.</div><div><br></div><div>mpirun -np 64 program</div><div><br></div><div>when one can replace mpirun with something else. However, this can be inappropriate for Wien2k and prevent it from working. I have seen many cases on supercomputers around the world where changes have been made at the OS level that are specialized and non-standard, and as a consequence made Wien2k inoperable. I strongly suggest care with customizations.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 12, 2015 at 6:42 AM, Peter Blaha <span dir="ltr"><<a href="mailto:pblaha@theochem.tuwien.ac.at" target="_blank">pblaha@theochem.tuwien.ac.at</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

WIEN2k has a usersguide, where the different parallelization modes are
extensively described.

On a cluster with a queuing system (like SLURM) it should not even be
possible to access nodes (except the frontend) via ssh without using
SLURM (on our SLURM machine ssh is possible only to nodes which are
assigned to a user by salloc or an sbatch job), thus overloading can be
prevented.
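
As a rough sketch of how such an ssh restriction is commonly set up, assuming
the pam_slurm.so / pam_slurm_adopt.so modules that ship with SLURM are
installed (paths and details vary with the distribution and SLURM version):

   # /etc/pam.d/sshd -- deny ssh logins to a compute node unless the user
   # has a job running there; pam_slurm_adopt additionally "adopts" the ssh
   # session into that job's cgroup so it cannot exceed the allocation
   account    required     pam_slurm_adopt.so

   # slurm.conf -- pam_slurm_adopt needs job containment via cgroups
   PrologFlags=contain
   TaskPlugin=task/cgroup

With this in place, an ssh from the frontend to a node the user has no job on
is simply refused, which already prevents the oversubscription described in
the message quoted below.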

We ALWAYS run our jobs using SLURM and typically submit them using

sbatch slurm.job

Now, as you correctly mention, WIEN2k needs a ".machines" file, and thus
"slurm.job" has to create it on the fly.

I've provided an example script (which you may need to adapt to your user
and resource specifications) at

http://www.wien2k.at/reg_user/faq/pbs.html
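
To give a feel for what such a script does, the following is only a
stripped-down sketch (NOT the script from the link above; node counts, cores
per node and the choice of parallelization are placeholders you must adapt,
and the .machines syntax should be checked against the usersguide):

   #!/bin/bash
   #SBATCH -J wien2k
   #SBATCH -N 2                    # 2 nodes (placeholder)
   #SBATCH --ntasks-per-node=16
   #SBATCH -t 24:00:00

   # list of nodes SLURM assigned to this job -- this plays the role that
   # PBS_NODEFILE plays under PBS
   scontrol show hostnames $SLURM_JOB_NODELIST > hostlist

   # build .machines: every node becomes one k-point group that runs
   # lapw1/lapw2 as an mpi job over its 16 cores
   rm -f .machines
   while read host; do
      echo "1:${host}:16" >> .machines
   done < hostlist
   echo "granularity:1" >> .machines
   echo "extrafine:1"   >> .machines

   run_lapw -p -i 40               # parallel SCF cycle, at most 40 iterations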

One more hint: in the file $WIENROOT/parallel_options one specifies whether
k-point parallel (USE_REMOTE) and mpi-parallel (MPI_REMOTE) jobs are
started using ssh (1) or not (0):

setenv USE_REMOTE 1   (or 0)
setenv MPI_REMOTE 0
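
For orientation only: a parallel_options file, as siteconfig typically writes
it, looks roughly like the following (the exact set of variables depends on
the WIEN2k version and on your mpi installation, so check your own file):

   setenv TASKSET "no"
   setenv USE_REMOTE 0       # k-point parallel jobs started locally, no ssh
   setenv MPI_REMOTE 0       # mpirun itself distributes the mpi tasks
   setenv WIEN_GRANULARITY 1
   setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

The WIEN_MPIRUN line tells the parallel scripts how mpi jobs are launched on
your system; the _NP_, _HOSTS_ and _EXEC_ placeholders are filled in at run
time.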

With modern mpi versions always use MPI_REMOTE=0.
Usually k-point parallelism is meaningful only for up to 8 cores (then set
OMP_NUM_THREADS=2) or 16 cores, otherwise the overhead is too big. In such
cases one would use only ONE node and run it as a "shared-memory machine"
(USE_REMOTE=0), without mpi at all.
For medium-sized cases with a few k-points and larger matrices, a "mixed"
k-parallel and mpi-parallel setup is best, and this is what the slurm.job
example above uses.
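
Purely as an illustration of such a mixed setup (hostnames and core counts
are placeholders, and the exact .machines syntax, in particular the optional
lapw0 line, is documented in the usersguide): with 2 nodes of 16 cores and a
handful of k-points, a .machines file could look roughly like

   # mpi-parallel lapw0 over both nodes (optional)
   lapw0:node001:16 node002:16
   # two k-point groups, each an mpi job on the 16 cores of one node
   1:node001:16
   1:node002:16
   granularity:1
   extrafine:1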

PS: I'm sending this also to our WIEN2k mailing list, because it is of
general interest and I don't want to write the same email over and over
again.
PPS: Please use the mailing list in general (see www.wien2k.at), as I
normally do not answer questions sent to me directly.

Best regards
Peter Blaha
On 11/11/2015 07:08 PM, Robb III, George B. wrote:
> Hi Dr. Schwartz / Dr. Blaha-
>
> We have noticed on our SLURM <http://schedmd.com/#index> based research
> cluster that WIEN2k suite commands take advantage of a .machines file to
> spawn off ssh sessions to individual nodes instead of using a scheduler.
>
> We have SLURM configured to control cluster resource allocations and
> have collisions of resources when ssh processes are called from the
> WIEN2k suite.
>
> e.g. SLURM controls node2 and has 95% of its resources allocated to jobs,
> but when a WIEN2k process is launched from the head node it will ssh to
> node2 (due to the 5% free resources) and spawn additional,
> un-SLURM-managed processes on node2. Node2 is now oversubscribed.
>
> Does the WIEN2k suite have an administrative guide or documentation?
> Does the WIEN2k suite have a recommended cluster manager (e.g. PBS,
> SLURM, LSF, etc.)?
>
> Thanks again for any assistance; we are looking forward to having the
> labs on our campus use the WIEN2k suite.
>
> Thanks,
>
> George B. Robb III
> Systems Administrator
> Research Computing Support Services - (RCSS)
> University of Missouri System

--

P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody else has thought"
Albert Szent-Gyorgi