[Wien] Problems when running in a cluster
Daniel Fernández Hevia
dhevia at physics.usyd.edu.au
Mon Apr 19 03:23:55 CEST 2004
Dear WIEN2k users and developers,
I have been trying different things for three weeks now, and I am still
unable to run moderately large cases on a Linux cluster. Before giving up,
I am sending as many details as possible, in the hope that someone can
offer help or a suggestion.
I am able to run calculations for simple cases (bulk AlN in the wurtzite
phase) but, as soon as I go to a moderately large supercell (35 atoms), I get
a "segmentation fault" error immediately after trying to run lapw0 (using
lapw0 lapw0.def). I know this topic has been discussed several times on the
mailing list: I have read all the mails about it since 2002 and tried all
the suggested solutions, without success.
The details of the operating system and compiler are:
The OS is Red Hat 9 (glibc 2.3.2).
The Fortran compiler is Intel Fortran 8.0, version l_fc_pc_8.0.039_pe044.1
(20040318).
The two sets of compiler flags I have tried are:
-FR -mp -w -O2 -xNW -ip
-FR -mp -w -O3 -ip
MKL version 6.0 is being used.
I am not using the w2web interface, i.e., I run the initialization from the
command line by typing instgen and then init_lapw. I then try to run lapw0
with "x lapw0 -d" followed by "lapw0 lapw0.def", and I get a segmentation
fault after just a couple of seconds. I suspect that the compiler or MKL
may be responsible for the errors.
The memory parameters of the system where the code is running are as follows:
**********************************************************************************************************************
[dhevia at barossa AlN_Super-10]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 7168
virtual memory (kbytes, -v) unlimited
[dhevia at barossa AlN_Super-10]$ free
             total       used       free     shared    buffers     cached
Mem:       3874188    3862252      11936          0     195808    3034768
-/+ buffers/cache:     631676    3242512
Swap:      2096472     120692    1975780
**********************************************************************************************************************
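As a sanity check on the numbers above, the "-/+ buffers/cache" row can be reproduced from the "Mem:" row: the memory actually available to programs is free + buffers + cached, and the memory actually held by programs is used - buffers - cached. The short Python sketch below is only illustrative; the function names are my own and are not part of free or of WIEN2k, and the values are copied by hand from the output above.

```python
# Values in kB, copied by hand from the "Mem:" row of the free output.
MEM_USED = 3862252
MEM_FREE = 11936
BUFFERS = 195808
CACHED = 3034768


def available_kb(free, buffers, cached):
    """Free memory plus buffers and page cache, which the kernel can reclaim."""
    return free + buffers + cached


def used_by_programs_kb(used, buffers, cached):
    """Used memory minus buffers and page cache, i.e. what programs really hold."""
    return used - buffers - cached


if __name__ == "__main__":
    # These two numbers should match the "-/+ buffers/cache" row.
    print("used by programs (kB):", used_by_programs_kb(MEM_USED, BUFFERS, CACHED))
    print("available (kB):       ", available_kb(MEM_FREE, BUFFERS, CACHED))
```

Both results agree with the "-/+ buffers/cache" row (631676 and 3242512 kB), so about 3.2 GB should be available even though the "free" column of the "Mem:" row shows only about 12 MB.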
Everything regarding available memory, stack size, swap space and the like
seems to be fine. I am sending the compile.msg for the lapw0 code and the
structure file I am using. Could anyone run the initialization and a single
SCF cycle with this file to check that everything is right? I am 99% sure
that the structure file is not the problem, but just in case ...
I would like to know whether you think it is worth continuing to try
variations of the compiler/linking options. I really do not know what to
do: I am afraid this goes far beyond my very limited compiling skills!
Thanks in advance for your help