[Wien] ** testerror: Error in Parallel LAPW

pluto pluto at physics.ucdavis.edu
Tue Jun 20 14:09:25 CEST 2023


Dear Miro,

It is hard to give you a meaningful answer with so little info, but I 
will give my best guess because I needed to set this up recently. I 
assume that you want to use k-parallel and you don't have mpi.

With a serial job you automatically run on a single node. A single node 
is a physical computer with a physical CPU, but typically with 4 memory 
channels, so it can run 8 jobs in parallel.

With k-parallel you need to define nodes on which k-points are 
calculated. With slurm, maybe things will work if you create 8 
"localhost" lines in .machines file, because this will still run on a 
single node that is assigned automatically. But things probably won't 
work if you create lines such as "node001", "node002" etc (depending on 
the names of the nodes in your cluster). And to take advantage of the 
cluster you need to use as many nodes as possible.
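
To make the "8 localhost lines" idea concrete, here is a minimal sketch 
in Python that prints such a .machines file. The "1:" weight prefix and 
the trailing granularity/extrafine lines are assumptions based on how I 
remember the k-parallel examples in the WIEN2k user's guide, so please 
check them against your version. The function name machines_text is 
only illustrative and is not part of WIEN2k.

```python
def machines_text(hosts, weight=1):
    """Build the text of a k-point-parallel .machines file.

    One 'weight:host' line is written per parallel job. The weight
    prefix and the granularity/extrafine lines are assumptions taken
    from typical WIEN2k examples; verify them for your installation.
    """
    lines = [f"{weight}:{host}" for host in hosts]
    lines.append("granularity:1")
    lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Eight jobs on the single node that slurm assigned to the job.
    print(machines_text(["localhost"] * 8), end="")
```

Redirect the output into .machines in the case directory before 
starting the parallel run. For several nodes you would pass the node 
names instead of "localhost", but that only helps if the ssh issue 
described in this message is solved.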

Now the problem is that k-parallel works assuming you can ssh to every 
node without a password. This is typically forbidden in the slurm 
environment. Prof. Blaha provides workarounds, but to me their 
implementation seems complicated (I am not an expert): 
http://www.wien2k.at/reg_user/faq/pbs.html
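
A quick way to see whether passwordless ssh works from inside your job 
is sketched below. It is not part of WIEN2k; the function names are 
hypothetical. It relies only on the standard OpenSSH options BatchMode 
and ConnectTimeout, so ssh fails instead of asking for a password. An 
unknown host key that needs confirmation will also count as a failure.

```python
import subprocess
import sys


def ssh_check_command(host, timeout_s=5):
    """Return an ssh command that runs 'true' on host without prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never ask for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly if unreachable
        host,
        "true",
    ]


def passwordless_ssh_works(host, timeout_s=5):
    """True if the ssh command above exits with status 0."""
    try:
        result = subprocess.run(
            ssh_check_command(host, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Usage: python check_ssh.py node001 node002 ...
    for host in sys.argv[1:]:
        status = "ok" if passwordless_ssh_works(host) else "FAILED"
        print(f"{host}: {status}")
```

Run it from within the slurm job with the node names you intend to put 
in .machines. If any node reports FAILED, k-parallel across nodes will 
most likely not start there.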

I am using an older cluster where it is possible to allocate nodes, and 
with this allocation comes automatically passwordless ssh to these 
nodes. Then the slurm workarounds are not needed. Maybe you can talk to 
your administrator if this is possible in your cluster, because I think 
typically this is blocked.

Best,
Lukasz





On 2023-06-20 10:18, Ilias Miroslav, doc. RNDr., PhD. wrote:
> Hello,
> 
>  I am able to run serial SCF via SLURM
> 
> 
> https://github.com/miroi/open-collection/blob/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg/virgo_slurm_wien2kgnupar_fromdstart.01
> 
> 
>  but when trying parallel
> 
> https://github.com/miroi/open-collection/blob/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg/virgo_slurm_wien2kgnupar_fromdstart.02
> 
> 
>  I get lapw2.error
> 
>  'LAPW2' - can't open unit: 30
> 
> 'LAPW2' -        filename: LvO2onQg.energy_1
> 
> **  testerror: Error in Parallel LAPW2
> 
>  The file "LvO2onQg.energy" is correct in serial mode.
> 
>  It seems that the LvO2onQg.energy_1 file is not produced in the parallel run?
> 
>  All files are
> https://github.com/miroi/open-collection/tree/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg
> 
>  Best,
> 
>  Miro
