[Wien] addendum: Problems when running in a cluster

Daniel Fernández Hevia dhevia at physics.usyd.edu.au
Mon Apr 19 09:20:58 CEST 2004


Dear WIEN2k users and developers,

Sincere thanks for the responses; I am working on it. Also, sorry for the 
lack of attachments in my previous e-mail. It seems that, in my despair, 
I forgot the attachments, my address, and all that! Here they are.

Best wishes to you all,

Daniel


--------------------------------------
Dr. Daniel Fernández Hevia
School of Physics
The University of Sydney
Sydney 2006, Australia
Phone: +61 2 9036 5301
email: dhevia at physics.usyd.edu.au
---------------------------------------



***************************************************************************************************************************************
Dear WIEN2k users and developers,

I have been trying different things for three weeks now, and I am still 
unable to run moderately big cases on a Linux cluster. Before giving up, 
I am sending as many details as possible, in the hope that someone can 
offer help or a suggestion.

I am able to run calculations for simple cases (bulk AlN in wurtzite phase) 
but, as soon as I go to a moderately big supercell (35 atoms), I obtain a 
"segmentation fault" error immediately after trying to run lapw0 (using 
lapw0 lapw0.def). I know this topic has been discussed several times in the 
mailing lists: I have read all the mails about this topic since 2002, and 
tried all the possible solutions without success.

The details of the operating system and compiler are:

The OS is Redhat 9 (glibc 2.3.2)
The fortran compiler is Intel Fortran 8.0, version l_fc_pc_8.0.039_pe044.1 
(20040318)
The flags I have tried are:
    -FR -mp -w -O2 -xNW -ip
    -FR -mp -w -O3 -ip

MKL version 6.0 is being used.

I am not using the w2web interface, i.e., I just run the initialization by 
typing instgen and then init_lapw. Then I try to run lapw0 with "x lapw0 
-d" followed by "lapw0 lapw0.def", and I get a segmentation fault after 
just a couple of seconds. I suspect that the compiler or MKL may be 
responsible for the errors.

The memory parameters of the system where the code is running are as follows:

**********************************************************************************************************************
[dhevia at barossa AlN_Super-10]$ ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) unlimited
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited

[dhevia at barossa AlN_Super-10]$ free
              total       used       free     shared    buffers     cached
Mem:       3874188    3862252      11936          0     195808    3034768
-/+ buffers/cache:     631676    3242512
Swap:      2096472     120692    1975780
**********************************************************************************************************************
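
As a rough cross-check of the free output above, the memory that is actually 
available to applications can be estimated as free + buffers + cached. The 
sketch below only redoes that arithmetic with the numbers copied from the 
"Mem:" row; the variable names are made up for illustration and are not part 
of WIEN2k or of free itself.

```shell
# Values in kB, copied from the "Mem:" row of the free output above.
used_kb=3862252
free_kb=11936
buffers_kb=195808
cached_kb=3034768

# Memory held by applications once buffers and page cache are excluded.
in_use_kb=$((used_kb - buffers_kb - cached_kb))

# Memory the kernel can hand to applications: free + buffers + cached.
available_kb=$((free_kb + buffers_kb + cached_kb))

echo "in use by applications: ${in_use_kb} kB"      # 631676 kB
echo "available to applications: ${available_kb} kB" # 3242512 kB
```

Both results agree with the "-/+ buffers/cache" row, i.e. roughly 3.1 GB of 
RAM is available even though the "free" column of the "Mem:" row shows only 
about 12 MB.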

Everything regarding available memory, stack size, swap space and the like 
seems to be fine. I am sending the compile.msg for the lapw0 code and the 
structure file I am using. Could anyone run the initialization and a 
single SCF cycle with this file to check that everything is right? I am 
almost 99% sure that this is not the problem, but just in case ...

I would like to know whether you think it is worthwhile to keep trying 
variations of the compiler/linking options. I really do not know what to 
do: I am afraid this goes far beyond my very limited compiling skills!

Thanks in advance for your help  
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compile.msg
Type: application/octet-stream
Size: 7252 bytes
Desc: not available
Url : http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20040419/9cbcb850/compile.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AlN_Super.struct
Type: application/octet-stream
Size: 5944 bytes
Desc: not available
Url : http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20040419/9cbcb850/AlN_Super.obj

