[Wien] lapw0 stuck/drained with seemingly no error message upon launching SCF

Yichen Zhang zycforphysics at gmail.com
Sat Jun 22 06:51:48 CEST 2024


Dear WIEN2k developers and users,

I am running WIEN2k 23.2 on an Apple silicon (M3 Max) MacBook Pro under
macOS (ARM architecture, so basically no Intel support). The WIEN2k code
compiled successfully with gfortran 11 + OpenBLAS. The parallel part also
compiles well with the mpif90 wrapper, Open MPI, and ScaLAPACK. LIBXC is
compiled and included in the makefiles for the WIEN2k compilation.
compile.msg shows no errors, and all the tcsh scripts run OK. w2web and the
supporting software for the GUI (xcrysden, octave, gnuplot, xmgrace, and so
on) are installed, and I have verified that xcrysden works well from within
w2web. All environment variables are set in the shell rc file in $HOME.

Since I'm new to WIEN2k (my previous experience is with KKR calculations),
my goal is to explore WIEN2k by doing the TiC tutorial and to test its
performance on this Mac (16 core CPU, 64 GB unified memory, 40 core GPU,
4TB SSD). My previous experience with the KKR code is that it compiles just
as well with this gfortran 11 and Open MPI, together with some math
libraries, and runs extremely fast on this machine.

In the TiC calculation, all the input setup before starting the SCF cycle
seemed fine. I tried setting up the inputs both from the command line (e.g.
init_lapw -b -rkmax 6.60 -numk 1000) and from w2web. Both generated the
required input files (as specified in the user's guide) and reported no
errors. For example, for lapw0, the TiC.in0 file reads:
*********
TOT   XC_PBE        (XC_LDA,XC_PBESOL,XC_WC,XC_MBJ,XC_SCAN)

NR2V   IFFT    (R2V)

  45     45     45     2.00    1     NCON 9  #some comments I omit here
**********
However, when I execute run_lapw, the program gets stuck at lapw0
(seemingly forever) in the first cycle and never makes it to lapw1. No
error message is written: the only trace is the lapw0.error flag file,
which contains just "Error in LAPW0" (I assume this is a standard
placeholder written in preparation for an error report by the f77 code).
This happens in both sequential and parallel mode, so we can probably rule
out parallelization as the origin. In either mode, lapw0 occupies the
CPU(s) indefinitely but seems to do nothing (it stops writing anything more
to TiC.output0). To narrow down the troubleshooting, I discuss only the
sequential mode below and attach only the last few lines of TiC.output0 to
show where it gets stuck. In one run, TiC.output0 stops while generating
the pseudo multipole moments of Ti:
***********
L=6   M=4   PSEUDO MULTIPOLMOMENT = 0.000531906 0.000000000
L=6   M=0   PSEUDO MULTIPOLMOMENT =-0.000995105 -0.000000000
***********
One sees that it didn't get to the part of generating the pseudo
multipole-moment of carbon, not even the ones for negative M of the Ti atom.
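
The check described above (whether TiC.output0 has really stopped growing)
can be sketched as a small polling script. This is only an illustration:
the function name file_is_growing and its interval/checks parameters are
hypothetical, and a file that is not growing does not by itself prove that
lapw0 is idle, since output may still be sitting in a buffer.

```python
import os
import time


def file_is_growing(path, interval=5.0, checks=3):
    """Return True if the file's size or mtime changes while we poll it."""
    last = os.stat(path)
    for _ in range(checks):
        time.sleep(interval)
        now = os.stat(path)
        if (now.st_size, now.st_mtime_ns) != (last.st_size, last.st_mtime_ns):
            return True
        last = now
    return False


if __name__ == "__main__":
    # "TiC.output0" is the file named in this post; adjust to your case name.
    name = "TiC.output0"
    if os.path.exists(name):
        print(name, "still growing:", file_is_growing(name))
```

Run from the case directory while lapw0 is active, it reports whether the
file changed during the polling window.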

My naive first thought was that it might be stuck in one of the DO loops
that generate the pseudo multipole moments of the Ti atom. However, looking
at the code in lapw0.F, I couldn't find any reason why it could possibly
get trapped in the DO loops there.

The next test I did was changing IPRINT from 1 to 0, so that fewer details
are written to TiC.output0. lapw0 still hangs forever, and the output0 file
again stops growing at some point. However, with IPRINT=0 it gets past
generating all the pseudo multipole moments and reaches the third step,
"C O N V E R G E N C E PARAMETERS". The last line of the TiC.output0 file
is (again in sequential mode):
************
    XC-potentials inside spheres (XCPOT1)
************
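
For reference, the IPRINT change described above can be scripted. The
sketch below assumes that IPRINT is the fifth field on the line holding the
three IFFT integers and the enhancement factor (the "45 45 45 2.00 1" line
in the TiC.in0 listing earlier in this message). The helper name set_iprint
is hypothetical and not part of WIEN2k.

```python
import re

# Matches "nx ny nz factor iprint rest", e.g. "  45  45  45  2.00  1  NCON 9".
_IFFT_LINE = re.compile(r"^(\s*\d+\s+\d+\s+\d+\s+\d+(?:\.\d*)?\s+)(\d+)(.*)$")


def set_iprint(in0_text, iprint):
    """Return in0_text with the fifth field of the first IFFT line replaced."""
    out = []
    changed = False
    for line in in0_text.splitlines():
        m = _IFFT_LINE.match(line)
        if m and not changed:
            # Keep the original spacing; swap only the IPRINT token.
            line = f"{m.group(1)}{iprint}{m.group(3)}"
            changed = True
        out.append(line)
    if not changed:
        raise ValueError("no IFFT line found in the in0 text")
    return "\n".join(out) + "\n"
```

Only the one token is replaced, so the column layout of the rest of the
file is left untouched.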

In all tests, the scf0 file and the spherical and non-spherical potential
files are all created but empty.

The TiC.dayfile stops at:
************
>     lapw0     (17:53:39)
************

Therefore, this issue is particularly confusing to me. Why does lapw0 hang
there? And why does writing fewer lines to the output file (with IPRINT=0)
move the point where lapw0 appears to hang further along?
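
One possible reading of the second question, untested here and stated only
as an assumption, is output buffering: if writes to the output file are
buffered, the last line visible on disk can lag behind the point the
program has actually reached, and the size of that lag depends on how much
was written. The Python sketch below illustrates only this general effect.
It is not WIEN2k code, and the helper name is hypothetical.

```python
import os
import tempfile


def size_before_and_after_flush(lines, buffer_size=8192):
    """Write lines through a buffered handle; return on-disk size before/after flush."""
    fd, path = tempfile.mkstemp(suffix=".output0")
    os.close(fd)
    try:
        f = open(path, "w", buffering=buffer_size)
        for line in lines:
            f.write(line + "\n")
        before = os.path.getsize(path)  # data may still sit in the buffer
        f.flush()
        after = os.path.getsize(path)   # buffer has now been written out
        f.close()
        return before, after
    finally:
        os.remove(path)
```

For gfortran-built programs, the runtime documents the environment variable
GFORTRAN_UNBUFFERED_ALL, which disables I/O buffering. Whether setting it
changes where TiC.output0 stops in this case has not been checked.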

I apologize in advance if this turns out to be something very silly that I
have overlooked because of my inexperience with WIEN2k. I just really hope
to figure out this strange issue and get WIEN2k running on my Mac.

I don't think "a Mac issue" is a straightforward or reasonable lead, since
the compilation was successful and all the scripts and the interface run.
The calculations are indeed initiated (all steps in init_lapw finished OK).

I also noticed that a small number of source files have no object files
after a full compilation using ./siteconfig_lapw. They are not included in
the original Makefiles. For example, dlu.f and dtau.f in SRC_lapw0. Were
they intentionally left out by the authors? (I haven't looked at what they
do.)

Finally, please do not suggest that I switch to Linux or a cluster :)
WIEN2k is well tested and should work there. The goal is to get it working
on an Apple silicon Mac, which could offer high PC performance, surpassing
cluster performance for systems that are not too large.

Thank you and best regards,
Yichen

-- 
Yichen Zhang
Department of Physics and Astronomy
Rice University
6100 Main St., Houston, TX 77005-1892
Email: zycforphysics at gmail.com