[Wien] ** testerror: Error in Parallel LAPW
pluto
pluto at physics.ucdavis.edu
Tue Jun 20 14:09:25 CEST 2023
Dear Miro,
It is hard to give you a meaningful answer with so little info, but I will
try my best guess because I needed to set this up recently. I assume
that you want to use k-parallel and you don't have mpi.
With a serial job you automatically run on a single node. A single node
is a physical computer with a physical CPU, but typically with 4 memory
channels, so it can run 8 jobs in parallel.
With k-parallel you need to define the nodes on which k-points are
calculated. With slurm, maybe things will work if you create 8
"localhost" lines in the .machines file, because this will still run on
a single node that is assigned automatically. But things probably won't
work if you create lines such as "node001", "node002" etc. (depending on
the names of the nodes in your cluster). And to take advantage of the
cluster you need to use as many nodes as possible.
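
To illustrate the "localhost" variant, here is a minimal Python sketch
that prints a .machines file for k-point parallelization without mpi. It
assumes the usual layout of one "weight:hostname" line per parallel job,
followed by the granularity and extrafine settings. The helper name
machines_text and the choice of 8 jobs are only for illustration and are
not taken from Miro's scripts.

```python
def machines_text(hosts, jobs_per_host=1):
    """Return .machines content for k-point parallel runs (no mpi)."""
    lines = []
    for host in hosts:
        # One "weight:hostname" line per parallel k-point job on this host.
        lines.extend(f"1:{host}" for _ in range(jobs_per_host))
    lines.append("granularity:1")
    lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumption: 8 jobs on the single node that slurm assigned.
    print(machines_text(["localhost"], jobs_per_host=8), end="")
```

Redirecting the output into .machines in the case directory before
calling run_lapw -p should be enough for a single-node test. For several
nodes one would pass the real node names instead of "localhost", which
is exactly where the ssh issue below comes in.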
Now the problem is that k-parallel works assuming you can ssh to every
node without a password. This is typically forbidden in the slurm
environment. Prof. Blaha provides workarounds, but to me their
implementation seems complicated (I am not an expert):
http://www.wien2k.at/reg_user/faq/pbs.html
I am using an older cluster where it is possible to allocate nodes, and
with this allocation comes automatically passwordless ssh to these
nodes. Then the slurm workarounds are not needed. Maybe you can talk to
your administrator if this is possible in your cluster, because I think
typically this is blocked.
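
Before asking the administrator, it may help to check from inside an
allocation whether passwordless ssh already works. Below is a small
Python sketch of such a check. It assumes the standard OpenSSH client is
installed and uses BatchMode=yes so that ssh fails instead of prompting
for a password. The function name and the run parameter (only there so
the check can be tried without a real cluster) are hypothetical.

```python
import subprocess
import sys


def can_ssh_without_password(host, run=subprocess.run, timeout=10):
    """Return True if 'ssh host true' succeeds without any prompt."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]
    try:
        result = run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing, or the node did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Node names are given on the command line, e.g. node001 node002.
    for host in sys.argv[1:]:
        status = "ok" if can_ssh_without_password(host) else "blocked"
        print(f"{host}: {status}")
```

If every allocated node reports ok, the node names can go directly into
.machines. If they report blocked, the workarounds from the FAQ above or
a talk with the administrator are probably needed.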
Best,
Lukasz
On 2023-06-20 10:18, Ilias Miroslav, doc. RNDr., PhD. wrote:
> Hello,
>
> I am able to run serial SCF via SLURM
>
>
> https://github.com/miroi/open-collection/blob/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg/virgo_slurm_wien2kgnupar_fromdstart.01
>
>
> but when trying parallel
>
> https://github.com/miroi/open-collection/blob/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg/virgo_slurm_wien2kgnupar_fromdstart.02
>
>
> I get lapw2.error
>
> 'LAPW2' - can't open unit: 30
>
> 'LAPW2' - filename: LvO2onQg.energy_1
>
> ** testerror: Error in Parallel LAPW2
>
> The file "LvO2onQg.energy" is correct in serial mode.
>
> It seems that the LvO2onQg.energy_1 file is not produced in the parallel run?
>
> All files are
> https://github.com/miroi/open-collection/tree/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg
>
> Best,
>
> Miro