Dear Prof. Blaha,

Thank you!

The description of the job scripts for the cluster is here:
https://redmine.mcia.univ-bordeaux.fr/projects/cluster-curta/wiki/Slurm
(unfortunately it is in French, and I am not strong in cluster structures)

Yes, the cluster uses the "module" system. I have used commands like "module load ..." in .bashrc and in slurm.job (in addition, I include the direct path to the compiler and MPI with the "source" command in .bashrc; a sketch follows at the end of this message).
To compile WIEN2k I used Intel 2019.3.199.
FFTW 3.3.8 I compiled myself.
The WIEN2k compilation finished with no errors.
lapw1_mpi was compiled with the default options; only the direct paths to the libraries were specified.

P.S. I can't reproduce the previous errors. Now, running mpi, I get a "permission denied" error with MPI_REMOTE=0.
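A minimal sketch of that environment setup (the module names and paths below are placeholders, not the actual Curta ones):

    # in .bashrc and/or at the top of slurm.job -- module names are hypothetical
    module load compiler/intel/2019.3.199
    module load mpi/intel
    # direct paths to the compiler and MPI environment scripts (paths are assumptions):
    source /opt/intel/2019/bin/compilervars.sh intel64
    source /opt/intel/2019/impi/bin/mpivars.sh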

--- Original message ---
From: "Peter Blaha" <pblaha@theochem.tuwien.ac.at>
Date: 7 May 2019, 13:08:58

So it seems that your cluster forbids the use of ssh (even on assigned
nodes). If this is the case, you MUST use USE_REMOTE=0, and in
k-parallel mode you can use only one node (32 cores).
For mpi I do not know. There should be some "userguide" (web site,
wiki, ...) for your cluster, where all the details of how to use the
cluster are listed. In particular it should say:
which mpi + mkl + fftw you should use during compilation (maybe you
have a "module" system?). (You did not say anything about how you
compiled lapw1_mpi.)
How to execute an mpi job. On some clusters the standard "mpirun"
command is no longer supported, and on our cluster we have to use srun
instead.
I don't know about your cluster; this depends on the SLURM version and
the specific setup of the cluster.
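If srun is needed, it is set via the WIEN_MPIRUN variable in
$WIENROOT/parallel_options. A minimal sketch (the exact srun options
are cluster-specific, so the flags here are only an assumption; _NP_
and _EXEC_ are placeholders that WIEN2k substitutes itself):

    # $WIENROOT/parallel_options (csh syntax, as used by WIEN2k)
    setenv USE_REMOTE 0
    setenv MPI_REMOTE 0
    setenv WIEN_MPIRUN "srun -n _NP_ _EXEC_"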
PS: A possibility for the lapw1_mpi problems is always a mismatch
between mpi, BLACS, and ScaLAPACK. Did you ever try to run dstart
or lapw0 in mpi mode? These are "simpler" mpi programs, as they do
not use ScaLAPACK.
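Such a test is quick to run (a sketch; the node name and core count in
the lapw0 line of .machines are just examples):

    # in the case directory, inside a SLURM job, with a .machines
    # containing a lapw0 line, e.g.:
    #   lapw0:n270:32
    x dstart -p
    x lapw0 -p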
On 5/7/19 11:33 AM, webfinder@ukr.net wrote:
> Dear Prof. Blaha
>
> Thank you for the explanation!
> Sorry, I should have put "hostname" in quotes. The script I use is based
> on the one in the WIEN FAQ and produces .machines from the nodes provided
> by SLURM (a sketch of such a generator follows the two examples below):
> for k-points:
> #
> 1:n270
> 1:n270
> 1:n270
> 1:n270
> 1:n270
> ....
> granularity:1
> extrafine:1
>
> for mpi:
> #
> 1:n270 n270 n270 n270 n270 ....
> granularity:1
> extrafine:1
>
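> The generating part of my slurm.job looks roughly like this (a bash
> sketch based on the FAQ idea; the exact variable handling is simplified):
>
>     #!/bin/bash
>     echo '#' > .machines
>     # k-point version: one "1:host" line per task on each node
>     for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do
>       for i in $(seq 1 $SLURM_NTASKS_PER_NODE); do
>         echo "1:$host" >> .machines
>       done
>     done
>     echo 'granularity:1' >> .machines
>     echo 'extrafine:1' >> .machines
>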
> After I changed USE_REMOTE to 1, the "Permission denied, please try
> again" error appears also for k-point parallelization.
> As stated in the userguide, I did things like "ssh-keygen" and copied the
> key to "authorized_keys", but the result is the same.
> As a "low-level" user on a cluster I dont have any permission to login
> to the nodes.
>
> For k-point parallelization with USE_REMOTE=1 the *.out file has the lines:
>
> Got 96 cores nodelist n[270-272] tasks_per_node 32 jobs_per_node 32
> because OMP_NUM_THREADS = 1
> 96 nodes for this job: n270 n270 n270 n270 n270 n270 ....
> 10:04:01 up 18 days, 58 min, 0 users, load average: 0.04, 0.04, 0.07
> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
> ...
> -------- .machine0 : processors
> running dstart in single mode
> C T F
> DSTART ENDS
> 22.030u 0.102s 0:22.20 99.6% 0+0k 0+0io 0pf+0w
> LAPW0 END
> full diagonalization forced
> Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
> [1] + Done ( ( $remote $machine[$p] "cd $PWD;$t $taskset0 $exe
> ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p]
> ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
> ...
>
>
> For mpi parallelization with USE_REMOTE=1, MPI_REMOTE=0, and
> WIEN_MPIRUN set to "srun ...",
> the output is:
> LAPW0 END
> Abort(0) on node 0 (rank 0 in comm 0): application called
> MPI_Abort(MPI_COMM_WORLD, 0) - process 0
> Abort(0) on node 0 (rank 0 in comm 0): application called
> MPI_Abort(MPI_COMM_WORLD, 0) - process 0
> ...
> [1] + Done ( cd $PWD; $t $ttt; rm -f
> .lock_$lockfile[$p] ) >> .time1_$loop
> bccTi54Htet.scf1up_1: No such file or directory.
> grep: No match.
> grep: No match.
> grep: No match.
>
> If WIEN_MPIRUN is set to "mpirun -n _NP_ -machinefile _HOSTS_ _EXEC_",
> the output is:
> LAPW0 END
> Abort(0) on node 0 (rank 0 in comm 0): application called
> MPI_Abort(MPI_COMM_WORLD, 0) - process 0
> w2k_dispatch_signal(): received: Terminated
> w2k_dispatch_signal(): received: Terminated
> Abort(9) on node 0 (rank 0 in comm 0): application called
> MPI_Abort(MPI_COMM_WORLD, 9) - process 0
> w2k_dispatch_signal(): received: Terminated
> ...
> Abort(-1694629136) on node 11 (rank 11 in comm 0): application called
> MPI_Abort(MPI_COMM_WORLD, -1694629136) - process 11
> [cli_11]: readline failed
> Abort(2118074352) on node 2 (rank 2 in comm 0): application called
> MPI_Abort(MPI_COMM_WORLD, 2118074352) - process 2
> [cli_2]: readline failed
> WIEN2K ABORTING
> [cli_1]: readline failed
> WIEN2K ABORTING
>
>
>
> --- Original message ---
> From: "Peter Blaha" <pblaha@theochem.tuwien.ac.at>
> Date: 7 May 2019, 09:14:44
>
> Setting USE_REMOTE=0 means that you do not use "ssh" in
> k-parallel mode.
> This has the following consequences:
> What you write for "hostname" in .machines is not important; only the
> number of lines counts. It will spawn as many k-parallel jobs as you
> have lines (1:hostname), but they will all run ONLY on the "masternode",
> i.e. you can use only ONE node within your slurm job.
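> For example, with USE_REMOTE=0 this .machines spawns four k-parallel
> jobs, all on the masternode (the text after "1:" is ignored):
>
>     1:anything
>     1:anything
>     1:anything
>     1:anything
>     granularity:1
>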
>
> When you use mpi-parallel (with MPI_REMOTE=0 AND the MPIRUN command set
> to the "srun ..." command), it will use a srun command to spawn the mpi
> job, not the usual mpirun command. In this case, however, "hostname" must
> be the real names of the nodes where you want to run. The slurm script
> has to find out the node names and insert them properly.
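> A sketch of how a slurm script can get the real node names (bash, using
> standard SLURM tools; repeat each name once per core for the mpi line):
>
>     hosts=$(scontrol show hostnames $SLURM_JOB_NODELIST | tr '\n' ' ')
>     echo "1:$hosts" >> .machines    # e.g. "1:n270 n271 n272"
>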
>
> On 06.05.2019 at 14:23, webfinder@ukr.net wrote:
> > Dear wien2k users,
> >
> > wien2k_18.2
> > I'm trying to run a test task on a cluster with the slurm batch system
> > using mpi parallelization.
> >
> > In "parallel_options" USE_REMOTE=0, MPI_REMOTE=0.
> > (during the siteconfig_lapw the slurm option was chosen)
> >
> > The k-point parallelization works well. But if I change the "slurm.job"
> > script to produce a .machines file for an mpi run
> > (e.g. from
> > 1: hostname
> > 1: hostname
> > ....
> > to
> > 1: hostname hostname ....)
> >
> > there is always an error message:
> > Permission denied, please try again.
> > Permission denied, please try again.
> > Permission denied, please try again. (....)
> >
> > How can I solve this?
> > How could it be that k-point parallelization works but mpi does not?
> >
> > P.S. After getting the "nodelist" from the batch system, I have also
> > tried to include an ssh-copy-id command in the slurm.job script to copy
> > the keys, but the result is the same.
> >
> > Thank you for the answers!
> >
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------
_______________________________________________
Wien mailing list
<a href="mailto:Wien@zeus.theochem.tuwien.ac.at" target="_self" rel="noreferrer noopener">Wien@zeus.theochem.tuwien.ac.at</a>
<a href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien" target="_blank" rel="noreferrer noopener">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a>
SEARCH the MAILING-LIST at: <a href="http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html" target="_blank" rel="noreferrer noopener">http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html</a>