[Wien] ‘lapw2 -so’ hangs
Elias Assmann
elias.assmann at gmail.com
Thu Nov 7 09:21:36 CET 2013
Hi List,
I have two sp+SO calculations that are essentially identical apart from
the magnetization direction. Both have worked fine, but now ‘lapw2’
does not finish in one of them.
In the ‘output2’ file, RECPR says
generate new recprlist
KXMAX,KYMAX,KZMAX 17 17 15
3605 PLANE WAVES GENERATED (INCLUDING FORBIDDEN H,K,L)
nwav1,kn 11111 3605
but then the k-vector list only runs from KVEC( 1) to KVEC( 3484).
An ‘strace’ shows that ‘lapw2’ goes on to read the ‘energydum’ and
‘energyso’ files (fd 26 is ‘energydum’, fd 27 is ‘energysodn’; there is
some seeking in between):
write(9, " Running LAPW2 in single process"..., 7926) = 7926
write(9, " KVEC( 125) = -4 "..., 7980) = 7980
…
read(27, "199.25000200.20750198.72842 0.2"..., 8192) = 8192
read(26, " 199.25000 200.20814 198.7"..., 8192) = 8192
…
read(27, "", 8192) = 0
lseek(26, 0, SEEK_CUR) = 5105419
lseek(26, -7859, SEEK_CUR) = 5097560
lseek(26, 0, SEEK_SET) = 0
lseek(27, 0, SEEK_CUR) = 5163641
lseek(27, 0, SEEK_CUR) = 5163641
lseek(27, 0, SEEK_SET) = 0
mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32cfd73000
mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32cffae000
mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32d01e9000
mmap(NULL, 2338816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b32d0424000
read(27, "199.25000200.20750198.72842 0.2"..., 8192) = 8192
read(26, " 199.25000 200.20814 198.7"..., 8192) = 8192
…
read(26, "956434490995 \n 50 "..., 8173) = 8173
read(26, " 37 1.97297910179952 \n "..., 8184) = 8184
read(26, " 25 1.96075450056876 \n "..., 8184) = 8184
read(26, " 13 1.96281528875338 \n "..., 8184) = 3777
read(26, "", 8192) = 0
That is EOF on ‘energydum’. Now it opens the Intel Fortran runtime's
error-message catalogue, but note that no message is printed:
open("/opt/intel/Compiler/11.1/046/lib/intel64/locale/en_US/ifcore_msg.cat", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=30244, ...}) = 0
mmap(NULL, 30244, PROT_READ, MAP_PRIVATE, 4, 0) = 0x2b32d065f000
close(4) = 0
Then it gets a SIGSEGV:
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigreturn(0xb) = 47497226185184
but it does not die (SIGSEGV was trapped earlier). Instead, the last
two lines are repeated ad infinitum. The same happens in a serial
(non-parallel) run, and regardless of whether I run it with ‘-fermi’,
‘-qtl’, or neither flag.
This seems to be caused by an incomplete ‘energysodn’ file. In the
problematic case:
$ wc Bi100.energy{dn,sodn,dum}
1421538 2903129 53209476 Bi100.energydn
-> 136400 284967 5163641 Bi100.energysodn
673789 1407639 25506805 Bi100.energydum
while in the case with the other magnetization direction:
$ wc Bi010.energy{dn,sodn,dum}
1421538 2903129 53209476 Bi010.energydn
-> 673786 1407624 25506619 Bi010.energysodn
673789 1407639 25506805 Bi010.energydum
But I have no idea why this happens. I have tried rerunning “lapw0;
lapw1; lapwso” several times, even starting from different ‘clm’ files
(from various saves).
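The ‘wc’ comparison above can be automated with a small script. This is only a sketch under stated assumptions: it assumes that a complete SO energy file has roughly as many lines as ‘energydum’ (as in the Bi010 case), that the files are named ‘<case>.energyso<spin>’ and ‘<case>.energydum’ as in the listing, and the 1% tolerance is an arbitrary choice, not a WIEN2k rule.

```python
import sys


def count_lines(path):
    """Count newline characters in a file, as `wc -l` does."""
    n = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large energy files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            n += chunk.count(b"\n")
    return n


def looks_truncated(so_lines, dum_lines, tolerance=0.01):
    """Heuristic (an assumption, not a WIEN2k rule): flag the SO energy file
    if it is more than `tolerance` (a fraction) shorter than 'energydum'."""
    return so_lines < dum_lines * (1.0 - tolerance)


def check_case(case, spin="dn"):
    """Compare <case>.energyso<spin> with <case>.energydum and print a report.
    The file-name pattern is assumed from the `wc` listing in this post."""
    so = count_lines(f"{case}.energyso{spin}")
    dum = count_lines(f"{case}.energydum")
    flag = "  <-- possibly truncated" if looks_truncated(so, dum) else ""
    print(f"{case}: energyso{spin}={so} lines, energydum={dum} lines{flag}")
    return so, dum


if __name__ == "__main__":
    # Usage: python check_energyso.py Bi100 Bi010
    for case in sys.argv[1:]:
        check_case(case)
```

With the line counts quoted above, Bi100 (136400 vs. 673789) would be flagged and Bi010 (673786 vs. 673789) would not.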
The ‘outputso’ files in both cases run up to “K=12008” and print
“TOTAL NUMBER OF K-POINTS: 12008” at the end.
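Since both ‘outputso’ files report the same k-point total while only one ‘energysodn’ is short, the summary line alone does not prove the energy file is complete. A minimal helper to pull that total out of an ‘outputso’ file for cross-checking, assuming the line looks exactly as quoted in this post (the regex is a guess at the format, not taken from the WIEN2k sources):

```python
import re
import sys

# Assumed format, based only on the line quoted in this post.
TOTAL_RE = re.compile(r"TOTAL NUMBER OF K-POINTS:\s*(\d+)")


def total_kpoints(text):
    """Return the last 'TOTAL NUMBER OF K-POINTS:' value in text, or None."""
    matches = TOTAL_RE.findall(text)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    # Usage: python kpoints.py Bi100.outputso Bi010.outputso
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, total_kpoints(f.read()))
```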
Any pointers?
Elias