[Wien] Problem with k-parallel in version 24.1?

Yichen Zhang zycforphysics at gmail.com
Wed Oct 16 00:49:42 CEST 2024


Dear WIEN2k developers and users,

I'm running WIEN2k 24.1 on a SLURM cluster. In this case only k-point
parallelization is used (no OpenMP or MPI). For this set of calculations I
typically divide the klist into 64 groups, distributed over 64 cores.
Hyperthreading is turned off.
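For reference, the k-parallel setup is driven by a .machines file. A minimal sketch of how one might generate it for 64 k-parallel jobs on a single node is below; the hostname "localhost" and the job count are assumptions for illustration (on a real SLURM allocation the hostnames would come from the node list):

```shell
# Sketch only: write a .machines file with 64 k-parallel entries
# of equal speed weight "1" on the local node, plus a granularity line.
# Hostname and counts are placeholders; adapt to the actual allocation.
{
  i=1
  while [ "$i" -le 64 ]; do
    echo "1:localhost"
    i=$((i+1))
  done
  echo "granularity:1"
} > .machines
```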

I encounter this error from time to time. Sometimes all SCF cycles finish
successfully, but there is perhaps a 20-40% chance that the SCF stops in
sumpara, after lapw2, during some cycle. Restarting the SCF may then run
fine until convergence, or may hit the same problem again in a later cycle;
sometimes the error simply does not appear at all. The error is that a file
case.scf2up_XX (or case.scf2dn_XX) is not found, with XX between 1 and 64
when running 64 k-parallel processes.

One example of such an error in the SLURM standard output is:

forrtl: No such file or directory
forrtl: severe (29): file not found, unit 21, file
/scratch/yz155/UUD_U6p25eV/UUD_U6p25eV.scf2dn_62

Image              PC                Routine            Line        Source
sumpara            000000000042876C  Unknown               Unknown  Unknown
sumpara            000000000041303A  scfsum_                   128  scfsum.f
sumpara            0000000000410F92  MAIN__                    242  sumpara.F
sumpara            000000000040434D  Unknown               Unknown  Unknown
libc.so.6          000014D975829590  Unknown               Unknown  Unknown
libc.so.6          000014D975829640  __libc_start_main     Unknown  Unknown
sumpara            0000000000404265  Unknown               Unknown  Unknown

cp: cannot stat '.in.tmp': No such file or directory
grep: No match.

>   stop error


The missing scf2 file sometimes comes from scf2up and sometimes from scf2dn,
and the "62" seems random among the k-parallel indices.
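As a quick diagnostic, one can check from the shell whether all per-k-job scf2 files are present before sumpara is launched. This is a hypothetical sketch, not part of WIEN2k; the case name is taken from the error message above, the job count is assumed to be 64, and it deliberately runs in a fresh temporary directory so that the demonstration output is deterministic (all files "missing"):

```shell
# Hypothetical check: count how many case.scf2{up,dn}_XX files are absent.
# Case name and job count are assumptions; run from the real scratch
# directory in practice (here we use an empty temp dir for demonstration).
case_name=UUD_U6p25eV
njobs=64
tmp=$(mktemp -d) && cd "$tmp" || exit 1
missing=0
i=1
while [ "$i" -le "$njobs" ]; do
  for sp in up dn; do
    # Spin-up and spin-down scf2 fragments for k-parallel job $i
    [ -f "${case_name}.scf2${sp}_${i}" ] || missing=$((missing+1))
  done
  i=$((i+1))
done
echo "missing scf2 files: $missing"
```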


I noticed a previous thread from 2016 in which Maciej Polak asked about a
"Problem with k-parallel", but I assume much has been updated since then.


Does this still come from slow I/O? I already run in /scratch, which has the
fastest I/O on the cluster. Any insights or suggestions would be appreciated.
Thank you very much in advance.


Best regards

Yichen

-- 
Yichen Zhang
Department of Physics and Astronomy
Rice University
6100 Main St., Houston, TX 77005-1892
Email: zycforphysics at gmail.com

