[Wien] Parallel execution of SCF cycle
Laurence Marks
laurence.marks at gmail.com
Tue Jan 31 18:10:55 CET 2023
Please run "cat $WIENROOT/parallel_options", as I suspect the issue is
there.
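For reference, parallel_options is a csh fragment sourced by the parallel scripts. A typical file looks roughly like the sketch below; the exact variables and values are site-dependent, so treat this only as an illustration of what to compare against:

```csh
# $WIENROOT/parallel_options -- illustrative example; your site's file will differ
setenv TASKSET "no"
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv DELAY 0.1
setenv SLEEPY 1
# _NP_, _HOSTS_ and _EXEC_ are placeholders that the parallel scripts
# replace with the process count, machine file and executable at run time:
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
setenv CORES_PER_NODE 16
```

If the WIEN_MPIRUN line does not match the mpirun syntax your cluster actually provides, the parallel steps will fail in exactly the non-obvious ways you are seeing.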
Do you have a "normal" mpirun, or does your cluster require something
different? Which mpirun are you using?
Also, I doubt you need "lapw2_vector_split:2", and you do not appear to
have set the "omp_XXX" variables in .machines, which are needed for recent
versions.
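As a sketch, a .machines file for one 16-core node along the following lines is usually sufficient; the host name is taken from your example, the omp_global switch is assumed to be the right knob for your version, and the MPI/OpenMP split (8 x 2 here) is an assumption you should adapt so that MPI processes times OpenMP threads does not exceed the physical cores:

```csh
# illustrative .machines for one 16-core node
# 8 MPI processes x 2 OpenMP threads = 16 cores
1:sqg1cintr16.bullx:8
granularity:1
extrafine:1
lapw0: sqg1cintr16.bullx:8
dstart: sqg1cintr16.bullx:8
nlvdw: sqg1cintr16.bullx:8
# OpenMP thread count for all steps (recent WIEN2k versions):
omp_global:2
```

Note there is no lapw2_vector_split line; that option is only useful in special large-memory cases.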
On Tue, Jan 31, 2023 at 10:59 AM Calum Cunningham <
Calum.Cunningham at uknnl.com> wrote:
> Dear WIEN2k users,
>
> My colleagues and I are having some trouble running SCF calculations in
> parallel mode; serial mode works without issue. We are using version 21.1
> on a computer cluster that runs the LSF queuing system.
>
> As an example, I will describe an attempted parallel run of the TiO2
> (rutile) test case, using the default values of RKmax, k-points, VXC, etc.
>
> The .machines file was created by a bespoke script that inserts the names
> of the processors assigned to the current job. In this case I am using 16
> cores on a single node. The .machines file is below:
>
> # .machines file for Wien2k
> #
> 1:sqg1cintr16.bullx:16
> granularity:1
> extrafine:1
> lapw0: sqg1cintr16.bullx:16
> dstart: sqg1cintr16.bullx:16
> nlvdw: sqg1cintr16.bullx:16
> lapw2_vector_split:2
>
> After initialising the calculation interactively via the w2web GUI (i.e.
> not in parallel), I attempted to execute the SCF cycle in w2web with the
> parallel option selected. I received the following error in STDOUT:
>
> LAPW0 END
> [1]    Done    mpirun -np 16 /lustre/scafellpike/local/apps/intel/wien2k/21.1/lapw0_mpi lapw0.def >> .time00
> LAPW1 END
> [1]  + Done    ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
> tmpmach: Subscript out of range.
> grep: lapw2*.error: No such file or directory
>
> >   stop error
>
> Note that I consistently receive this “grep: lapw2*.error” message when
> attempting to run SCF calculations in parallel. After this, I tested each
> of lapw0, lapw1 and lapw2 as single programmes (in parallel) to narrow
> down the problem. lapw1 appears to have run correctly, but I have included
> its output below in case there is a problem there too. lapw2, however,
> fails with an obvious error (see below).
>
> starting parallel lapw1 at Tue Jan 31 15:00:07 GMT 2023
> ->  starting parallel LAPW1 jobs at Tue Jan 31 15:00:07 GMT 2023
> running LAPW1 in parallel mode (using .machines)
> granularity set to 1 because of nonlocal SCRATCH variable
> 1 number_of_parallel_jobs
> [1] 46212
> LAPW1 END
> [1]  + Done    ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
> (70) 0.011u 0.027s 0:14.52 0.2% 0+0k 0+8io 0pf+0w
> Summary of lapw1para:
> sqg1cintr16.bullx  k=  user=  wallclock=
> 0.100u 0.299s 0:16.85 2.3% 0+0k 616+248io 0pf+0w
>
> # lapw2 as a single programme (parallel):
> running LAPW2 in parallel mode
> tmpmach: Subscript out of range.
> 0.016u 0.043s 0:00.06 83.3% 0+0k 32+24io 0pf+0w
> error: command /lustre/scafellpike/local/apps/intel/wien2k/21.1/lapw2para lapw2.def failed
>
> Please let me know if you need any more information. In particular, I
> would like to know why the errors occur at lapw2 (e.g. what does the
> “tmpmach: Subscript out of range.” error mean?).
>
> Many thanks,
>
> Calum Cunningham
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
"Research is to see what everybody else has seen, and to think what nobody
else has thought", Albert Szent-Györgyi