[Wien] Problems when running in a cluster

Daniel Fernández Hevia dhevia at physics.usyd.edu.au
Mon Apr 19 03:23:55 CEST 2004


Dear WIEN2k users and developers,

I have been trying different things for three weeks now, and I am still 
unable to run moderately large cases on a Linux cluster. Before giving up, 
I am sending as many details as possible in the hope that someone can offer 
help or a suggestion.

I am able to run calculations for simple cases (bulk AlN in wurtzite phase) 
but, as soon as I go to a moderately big supercell (35 atoms), I obtain a 
"segmentation fault" error immediately after trying to run lapw0 (using 
lapw0 lapw0.def). I know this topic has been discussed several times on the 
mailing list: I have read all the mails about it since 2002 and tried all 
the suggested solutions without success.

The details of the operating system and compiler are:

The OS is Redhat 9 (glibc 2.3.2)
The Fortran compiler is Intel Fortran 8.0, version l_fc_pc_8.0.039_pe044.1 
(20040318)
The flag sets I have tried are:  -FR -mp -w -O2 -xNW -ip
                                 -FR -mp -w -O3 -ip

and MKL version 6.0 is being used.

I am not using the w2web interface, i.e., I just run the initialization by 
typing instgen and then init_lapw. Then I try to run lapw0 with "x lapw0 
-d" followed by "lapw0 lapw0.def", and I get a segmentation fault after 
just a couple of seconds. I suspect that the compiler or MKL may be 
responsible for the error.

The memory parameters of the system where the code is running are as follows:

**********************************************************************************************************************
[dhevia@barossa AlN_Super-10]$ ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) unlimited
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited

[dhevia@barossa AlN_Super-10]$ free
              total       used       free     shared    buffers     cached
Mem:       3874188    3862252      11936          0     195808    3034768
-/+ buffers/cache:     631676    3242512
Swap:      2096472     120692    1975780
**********************************************************************************************************************

Available memory, stack size, swap space and the like all seem to be fine. 
I am sending the compile.msg for lapw0 and the structure file I am using. 
Could anyone run the initialization and a single SCF cycle with this file 
to check that everything is right? I am 99% sure that the structure file is 
not the problem, but just in case ...
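
As a rough cross-check of the memory figures in the "free" output above: 
most of the "used" memory in the "Mem:" row is buffers and page cache, 
which the kernel can hand back to applications. The sketch below just redoes 
that arithmetic with the numbers copied from the transcript; the variable 
names are illustrative only, and it assumes a shell with POSIX arithmetic 
expansion.

```shell
# Values in kB, copied from the "Mem:" row of the free output above.
used_kb=3862252
free_kb=11936
buffers_kb=195808
cached_kb=3034768

# Memory actually held by applications: used minus buffers and cache.
app_used_kb=$((used_kb - buffers_kb - cached_kb))

# Memory available to a new process: free plus buffers and cache.
avail_kb=$((free_kb + buffers_kb + cached_kb))

echo "used by applications: ${app_used_kb} kB"
echo "available:            ${avail_kb} kB"
```

Both results agree with the "-/+ buffers/cache" row (631676 used, 3242512 
free), so about 3.2 GB of RAM is available before lapw0 starts.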

I would like to know whether you think it is worth continuing to try 
variations of the compiler/linking options. I really do not know what to 
do: I am afraid this goes well beyond my very limited compiling skills!

Thanks in advance for your help.