[Wien] ‘lapw2 -so’ hangs

Thu Nov 7 09:21:36 CET 2013

Hi List,

I have two sp+SO calculations which are identical apart from the
magnetization direction.  Both have worked fine so far, but now
‘lapw2’ does not finish in one of them.

In the ‘output2’ file, RECPR says

  generate new recprlist
   KXMAX,KYMAX,KZMAX          17          17          15
          3605 PLANE WAVES GENERATED (INCLUDING FORBIDDEN H,K,L)

  nwav1,kn       11111        3605

but then the k-vector list only runs from KVEC( 1) to KVEC( 3484).

An ‘strace’ shows that ‘lapw2’ goes on to read the ‘energydum’ and
‘energysodn’ files (fd 26 is ‘energydum’, fd 27 is ‘energysodn’), with
some seeking in between:

   write(9, " Running LAPW2 in single process"..., 7926) = 7926
   write(9, "       KVEC(       125) =    -4 "..., 7980) = 7980
   …
   read(27, "199.25000200.20750198.72842  0.2"..., 8192) = 8192
   read(26, "   199.25000   200.20814   198.7"..., 8192) = 8192
   …
   read(27, "", 8192)                      = 0
   lseek(26, 0, SEEK_CUR)                  = 5105419
   lseek(26, -7859, SEEK_CUR)              = 5097560
   lseek(26, 0, SEEK_SET)                  = 0
   lseek(27, 0, SEEK_CUR)                  = 5163641
   lseek(27, 0, SEEK_CUR)                  = 5163641
   lseek(27, 0, SEEK_SET)                  = 0
   mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32cfd73000
   mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32cffae000
   mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32d01e9000
   mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32d0424000
   read(27, "199.25000200.20750198.72842  0.2"..., 8192) = 8192
   read(26, "   199.25000   200.20814   198.7"..., 8192) = 8192
   …
   read(26, "956434490995     \n          50  "..., 8173) = 8173
   read(26, "  37   1.97297910179952     \n   "..., 8184) = 8184
   read(26, "  25   1.96075450056876     \n   "..., 8184) = 8184
   read(26, "  13   1.96281528875338     \n   "..., 8184) = 3777
   read(26, "", 8192)                      = 0

After reaching EOF, it opens the Intel Fortran runtime's message
catalog, but note that no message is ever printed:

   open("/opt/intel/Compiler/11.1/046/lib/intel64/locale/en_US/ifcore_msg.cat", O_RDONLY) = 4
   fstat(4, {st_mode=S_IFREG|0664, st_size=30244, ...}) = 0
   mmap(NULL, 30244, PROT_READ, MAP_PRIVATE, 4, 0) = 0x2b32d065f000
   close(4)                                = 0

Then it receives a SIGSEGV:

   --- SIGSEGV (Segmentation fault) @ 0 (0) ---
   rt_sigreturn(0xb)                       = 47497226185184

but it does not die (SIGSEGV was trapped earlier).  Instead, the last
two lines repeat ad infinitum, presumably because the handler returns
and the faulting instruction is executed again.  This happens without
parallelization, and whether I run it with ‘-fermi’, with ‘-qtl’, or
without either flag.
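For anyone who wants to capture a trace like the one above from a hung
run, here is a minimal sketch.  It assumes a Linux machine with procps
‘pgrep’ and ‘strace’ installed; ‘trace_running’ is a hypothetical helper
name, and NAME must be whatever ‘ps’ shows for the hung executable.  The
/proc/PID/fd listing is one way to map descriptor numbers such as 26 and
27 to the files they refer to.

```shell
#!/bin/sh
# Hypothetical helper: trace_running NAME
# Attaches strace to the newest running process whose name is exactly NAME.
# Assumes Linux (/proc), procps pgrep and strace are available.
trace_running() {
    pid=$(pgrep -n -x "$1") || {
        echo "no running process named $1" >&2
        return 1
    }
    # Show which file each descriptor points to (e.g. fd 26/27 in the trace).
    ls -l "/proc/$pid/fd"
    # Log only file-related system calls to NAME.strace; stop with Ctrl-C.
    strace -p "$pid" -e trace=open,openat,read,lseek,mmap -o "$1.strace"
}
```

Attaching with ‘-p’ may require running as the same user and can be
blocked by kernel ptrace restrictions (e.g. Yama's ptrace_scope).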

This seems to be caused by an incomplete ‘energysodn’ file.  In the
problematic case:

$ wc Bi100.energy{dn,sodn,dum}
    1421538  2903129 53209476 Bi100.energydn
->   136400   284967  5163641 Bi100.energysodn
     673789  1407639 25506805 Bi100.energydum

while in the case with the other magnetization direction:

$ wc Bi010.energy{dn,sodn,dum}
    1421538  2903129 53209476 Bi010.energydn
->   673786  1407624 25506619 Bi010.energysodn
     673789  1407639 25506805 Bi010.energydum
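A quick sanity check along these lines could be scripted before running
‘lapw2’.  This is only a sketch: it assumes, based solely on the two
cases above, that a complete ‘energysodn’ (and ‘energysoup’, if present)
has nearly as many lines as ‘energydum’.  ‘check_so_energy’ and the 99 %
threshold are hypothetical choices, not part of WIEN2k.

```shell
#!/bin/sh
# Hypothetical helper: check_so_energy CASE
# Compares the line count of CASE.energysoup/CASE.energysodn with
# CASE.energydum. Assumption: a complete SO energy file is at least 99 %
# as long as energydum (heuristic taken from the Bi010 vs Bi100 counts).
# Returns 0 if all present files pass, 1 if any is short, 2 on missing input.
check_so_energy() {
    case_name=$1
    if [ ! -f "$case_name.energydum" ]; then
        echo "missing $case_name.energydum" >&2
        return 2
    fi
    ref=$(wc -l < "$case_name.energydum" | tr -d ' ')
    status=0
    for f in "$case_name.energysoup" "$case_name.energysodn"; do
        [ -f "$f" ] || continue          # skip files that do not exist
        n=$(wc -l < "$f" | tr -d ' ')
        # Integer form of: n < 0.99 * ref
        if [ $((n * 100)) -lt $((ref * 99)) ]; then
            echo "WARNING: $f has $n lines, $case_name.energydum has $ref"
            status=1
        else
            echo "ok: $f ($n lines)"
        fi
    done
    return $status
}
```

Run in the case directory as ‘check_so_energy Bi100’; with the counts
above it would flag Bi100.energysodn (136400 vs 673789 lines) and pass
Bi010.energysodn (673786 lines).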

But I have no idea why this happens.  I have rerun “lapw0; lapw1;
lapwso” several times, even starting from different ‘clm’ files taken
from various saved calculations.

The ‘outputso’ files in both cases run up to “K=12008” and end with
“TOTAL NUMBER OF K-POINTS:       12008”.

Any pointers?

	Elias