[Wien] Spin-orbit coupling crash
Luigi Maduro - TNW
L.A.Maduro at tudelft.nl
Tue Oct 1 10:31:51 CEST 2019
Dear WIEN2k users,
I am trying to carry out a calculation with spin-orbit coupling on a MoS2 supercell in parallel mode, using WIEN2k_19.1. The calculation runs fine through lapw0 and lapw1, but when it reaches lapwso it crashes with the following error:
---------------------------------------------------------------------------------------------------------------------------------------------------------------
LAPW0 END
[1] Done mpirun -np 120 -machinefile .machine0 /home/WIEN2k_19_2/lapw0_mpi lapw0.def >> .time00
LAPW1 END
LAPW1 END
[4] Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
[6] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[5] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[3] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[2] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[1] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
forrtl: severe (39): error during read, unit 9, file /home/Data/MoS2_SO/MoS2_SO.vector_1
Image              PC                Routine            Line        Source
lapwso_mpi         000000000046BC13  Unknown               Unknown  Unknown
lapwso_mpi         0000000000490934  Unknown               Unknown  Unknown
lapwso_mpi         0000000000429158  kptin_                     60  kptin.F
lapwso_mpi         000000000042F7EE  MAIN__                    570  lapwso.F
lapwso_mpi         0000000000405C5E  Unknown               Unknown  Unknown
libc.so.6          00002B04C2A12B35  Unknown               Unknown  Unknown
lapwso_mpi         0000000000405B69  Unknown               Unknown  Unknown
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source
lapwso_mpi         0000000000523F95  Unknown               Unknown  Unknown
lapwso_mpi         0000000000521BB7  Unknown               Unknown  Unknown
lapwso_mpi         00000000004D8084  Unknown               Unknown  Unknown
lapwso_mpi         00000000004D7E96  Unknown               Unknown  Unknown
lapwso_mpi         000000000046C929  Unknown               Unknown  Unknown
lapwso_mpi         000000000047140E  Unknown               Unknown  Unknown
libpthread.so.0    00002B2A5349B370  Unknown               Unknown  Unknown
libmpi.so.12       00002B2A58D16455  Unknown               Unknown  Unknown
libmpi.so.12       00002B2A58F52D74  Unknown               Unknown  Unknown
libmkl_blacs_inte  00002B2A547FC015  Unknown               Unknown  Unknown
libmkl_blacs_inte  00002B2A547FF9A9  Unknown               Unknown  Unknown
libmkl_blacs_inte  00002B2A547DDF96  Unknown               Unknown  Unknown
lapwso_mpi         0000000000429FFB  kptin_                    108  kptin.F
lapwso_mpi         000000000042F7EE  MAIN__                    570  lapwso.F
lapwso_mpi         0000000000405C5E  Unknown               Unknown  Unknown
libc.so.6          00002B2A595F5B35  Unknown               Unknown  Unknown
lapwso_mpi         0000000000405B69  Unknown               Unknown  Unknown
---------------------------------------------------------------------------------------------------------------------------------------------------------------
I compiled WIEN2k_19.1 with the intel_xe_2016 compiler. I am using a Beowulf-style cluster in which each node is a shared-memory machine running CentOS 7. A scheduler (Maui) and a resource manager (Torque) both run on the master node. I have written a script that creates the .machines file on the fly; for this calculation it looks like this:
1:n05-07:20
1:n05-08:20
1:n05-09:20
1:n05-10:20
1:n05-11:20
1:n05-12:20
lapw0:n05-07:20 n05-08:20 n05-09:20 n05-10:20 n05-11:20 n05-12:20
dstart:n05-07:20 n05-08:20 n05-09:20 n05-10:20 n05-11:20 n05-12:20
nlvdw:n05-07:20 n05-08:20 n05-09:20 n05-10:20 n05-11:20 n05-12:20
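The .machines file above follows a regular pattern: one `1:host:cores` line per node, which WIEN2k reads as one k-point group running as a 20-process MPI job on that node, followed by `lapw0`, `dstart` and `nlvdw` lines that list all six nodes (6 × 20 = 120 cores, matching the `mpirun -np 120` in the log). The Python sketch below is a hypothetical generator for that layout, not the script actually used for this run. It assumes the Torque node file (`$PBS_NODEFILE`) lists each hostname once per allocated core, and the function name `build_machines` is made up for illustration.

```python
def build_machines(nodefile_lines, mpi_programs=("lapw0", "dstart", "nlvdw")):
    """Return .machines text with one k-point group per host.

    Hypothetical helper: assumes the node file lists each hostname once per
    allocated core (the usual layout of Torque's $PBS_NODEFILE).
    """
    # Count cores per host; dicts keep first-seen order (Python 3.7+).
    cores = {}
    for line in nodefile_lines:
        host = line.strip()
        if host:
            cores[host] = cores.get(host, 0) + 1

    # "1:host:n" -> one k-point parallel job with n MPI processes on that host.
    lines = [f"1:{host}:{n}" for host, n in cores.items()]
    # Programs listed by name run as a single MPI job across all hosts.
    all_hosts = " ".join(f"{host}:{n}" for host, n in cores.items())
    lines += [f"{prog}:{all_hosts}" for prog in mpi_programs]
    return "\n".join(lines) + "\n"
```

Inside a Torque job one would pass it the opened `$PBS_NODEFILE` and write the returned string to `.machines`; with six 20-core nodes it reproduces the nine lines shown above.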
Any suggestions for finding and fixing the cause of the crash would be greatly appreciated. :)
Kind regards,
Luigi Maduro
PhD candidate
Kavli Institute of Nanoscience
Department of Quantum Nanoscience
Faculty of Applied Sciences
Delft University of Technology
More information about the Wien mailing list