[Wien] Spin-orbit coupling crash

Luigi Maduro - TNW L.A.Maduro at tudelft.nl
Tue Oct 1 10:31:51 CEST 2019


Dear WIEN2k users,

I am trying to run a spin-orbit coupling calculation on a MoS2 supercell in parallel mode with WIEN2k_19.1. lapw0 and lapw1 complete without problems, but the calculation crashes when it reaches lapwso, with the following error:


---------------------------------------------------------------------------------------------------------------------------------------------------------------
LAPW0 END
[1]    Done                          mpirun -np 120 -machinefile .machine0 /home/WIEN2k_19_2/lapw0_mpi lapw0.def >> .time00
LAPW1 END
LAPW1 END
[4]    Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
[6]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[5]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[3]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[2]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
[1]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
forrtl: severe (39): error during read, unit 9, file /home/Data/MoS2_SO/MoS2_SO.vector_1
Image              PC                Routine            Line        Source
lapwso_mpi         000000000046BC13  Unknown               Unknown  Unknown
lapwso_mpi         0000000000490934  Unknown               Unknown  Unknown
lapwso_mpi         0000000000429158  kptin_                     60  kptin.F
lapwso_mpi         000000000042F7EE  MAIN__                    570  lapwso.F
lapwso_mpi         0000000000405C5E  Unknown               Unknown  Unknown
libc.so.6          00002B04C2A12B35  Unknown               Unknown  Unknown
lapwso_mpi         0000000000405B69  Unknown               Unknown  Unknown
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source
lapwso_mpi         0000000000523F95  Unknown               Unknown  Unknown
lapwso_mpi         0000000000521BB7  Unknown               Unknown  Unknown
lapwso_mpi         00000000004D8084  Unknown               Unknown  Unknown
lapwso_mpi         00000000004D7E96  Unknown               Unknown  Unknown
lapwso_mpi         000000000046C929  Unknown               Unknown  Unknown
lapwso_mpi         000000000047140E  Unknown               Unknown  Unknown
libpthread.so.0    00002B2A5349B370  Unknown               Unknown  Unknown
libmpi.so.12       00002B2A58D16455  Unknown               Unknown  Unknown
libmpi.so.12       00002B2A58F52D74  Unknown               Unknown  Unknown
libmkl_blacs_inte  00002B2A547FC015  Unknown               Unknown  Unknown
libmkl_blacs_inte  00002B2A547FF9A9  Unknown               Unknown  Unknown
libmkl_blacs_inte  00002B2A547DDF96  Unknown               Unknown  Unknown
lapwso_mpi         0000000000429FFB  kptin_                    108  kptin.F
lapwso_mpi         000000000042F7EE  MAIN__                    570  lapwso.F
lapwso_mpi         0000000000405C5E  Unknown               Unknown  Unknown
libc.so.6          00002B2A595F5B35  Unknown               Unknown  Unknown
lapwso_mpi         0000000000405B69  Unknown               Unknown  Unknown
---------------------------------------------------------------------------------------------------------------------------------------------------------------

I compiled WIEN2k_19.1 with the intel_xe_2016 compiler. The cluster is Beowulf-style: each node is a shared-memory machine running CentOS 7, and the master node runs both a scheduler (Maui) and a resource manager (Torque). A script creates the .machines file on the fly; for this calculation it looks like this:

1:n05-07:20
1:n05-08:20
1:n05-09:20
1:n05-10:20
1:n05-11:20
1:n05-12:20

lapw0:n05-07:20 n05-08:20 n05-09:20 n05-10:20 n05-11:20 n05-12:20
dstart:n05-07:20 n05-08:20 n05-09:20 n05-10:20 n05-11:20 n05-12:20
nlvdw:n05-07:20 n05-08:20 n05-09:20 n05-10:20 n05-11:20 n05-12:20
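
For illustration, here is a minimal sketch of how a .machines file with this layout could be generated. It is not the actual script used on the cluster: the function names build_machines and unique_hosts are hypothetical, and it assumes the host list comes from a Torque-style node file (such as the one pointed to by $PBS_NODEFILE) that repeats each hostname once per allocated core.

```python
def unique_hosts(nodefile_lines):
    """Collapse a Torque-style node list (one line per core) to unique hosts,
    preserving the order in which the hosts first appear."""
    seen = []
    for line in nodefile_lines:
        host = line.strip()
        if host and host not in seen:
            seen.append(host)
    return seen


def build_machines(hosts, cores_per_host, extra_tags=("dstart", "nlvdw")):
    """Return .machines text in the layout shown above (hypothetical helper).

    One k-point-parallel line per host, each with `cores_per_host` MPI
    processes, followed by lapw0/dstart/nlvdw lines spanning all hosts.
    """
    kpoint_lines = [f"1:{host}:{cores_per_host}" for host in hosts]
    host_list = " ".join(f"{host}:{cores_per_host}" for host in hosts)
    tagged_lines = [f"{tag}:{host_list}" for tag in ("lapw0", *extra_tags)]
    return "\n".join(kpoint_lines + [""] + tagged_lines) + "\n"


if __name__ == "__main__":
    # Simulated node file: each host repeated once per core (assumption).
    nodefile = ["n05-07\n"] * 20 + ["n05-08\n"] * 20
    print(build_machines(unique_hosts(nodefile), 20), end="")
```

Writing the returned string to .machines in the case directory would reproduce the layout above for the allocated hosts.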


Any suggestions for finding or fixing the cause of the crash would be highly appreciated. :)

Kind regards,
Luigi Maduro
PhD candidate
Kavli Institute of Nanoscience
Department of Quantum Nanoscience
Faculty of Applied Sciences
Delft University of Technology
