[Wien] Problems when running in a cluster
Yang, Jinbo
jinbo at umr.edu
Mon Apr 19 05:14:14 CEST 2004
Hi Daniel Fernández Hevia,
I have used the same configuration and got the same error for a big case on a Linux computer. I believed the compiler was the problem, even though it compiled the code successfully. I therefore copied the compiled executables from another configuration (compiler 6.0, MKL 6.1 and Linux 7.3) to this computer, and they work fine. Maybe you can try this method.
Good luck,
Jinbo Yang
-----Original Message-----
From: wien-admin at zeus.theochem.tuwien.ac.at [mailto:wien-admin at zeus.theochem.tuwien.ac.at] On Behalf Of Daniel Fernández Hevia
Sent: Sunday, April 18, 2004 8:24 PM
To: wien at zeus.theochem.tuwien.ac.at
Subject: [Wien] Problems when running in a cluster
Dear WIEN2k users and developers,
I have been trying different things for three weeks now, and I am still unable to run moderately big cases on a Linux cluster. Before giving up, I am sending as many details as possible, in the hope that someone can offer help or a suggestion.
I am able to run calculations for simple cases (bulk AlN in the wurtzite phase) but, as soon as I go to a moderately big supercell (35 atoms), I get a "segmentation fault" error immediately after trying to run lapw0 (using lapw0 lapw0.def). I know this topic has been discussed several times on the mailing list: I have read all the mails about it since 2002 and tried all the suggested solutions without success.
The details of the operating system and compiler are:
The OS is Red Hat 9 (glibc 2.3.2).
The Fortran compiler is Intel Fortran 8.0, version l_fc_pc_8.0.039_pe044.1 (20040318).
The flag sets I have tried are:
-FR -mp -w -O2 -xNW -ip
-FR -mp -w -O3 -ip
MKL version 6.0 is being used.
I am not using the w2web interface, i.e., I just run the initialization by typing instgen and then init_lapw. Then I try to run lapw0 with "x lapw0 -d" followed by "lapw0 lapw0.def", and I get a segmentation fault after just a couple of seconds. I suspect that the compiler or MKL may be responsible for the errors.
The memory parameters of the system where the code is running are as follows:
**********************************************************************************************************************
[dhevia at barossa AlN_Super-10]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 7168
virtual memory (kbytes, -v) unlimited
[dhevia at barossa AlN_Super-10]$ free
             total       used       free     shared    buffers     cached
Mem:       3874188    3862252      11936          0     195808    3034768
-/+ buffers/cache:     631676    3242512
Swap:      2096472     120692    1975780
**********************************************************************************************************************
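The small "free" value in the Mem: row can look alarming, so here is a minimal Python sketch of how the "-/+ buffers/cache" row follows from it. It assumes the old-style `free` layout shown above (all values in kB); the function names are hypothetical and chosen only for illustration.

```python
# Sketch only: recompute the "-/+ buffers/cache" row from the "Mem:" row of
# old-style `free` output. Function names are illustrative assumptions.

def parse_free_mem_line(line):
    """Parse the 'Mem:' row into a dict of integer kB values."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "Mem:":
        raise ValueError("expected a 'Mem:' row with six numeric columns")
    names = ("total", "used", "free", "shared", "buffers", "cached")
    return dict(zip(names, (int(value) for value in fields[1:])))


def used_by_programs_kb(mem):
    """Memory held by programs, excluding kernel buffers and page cache."""
    return mem["used"] - mem["buffers"] - mem["cached"]


def reclaimable_free_kb(mem):
    """Memory free now plus buffers and cache the kernel can give back."""
    return mem["free"] + mem["buffers"] + mem["cached"]


# The Mem: row copied from the output above.
mem = parse_free_mem_line("Mem: 3874188 3862252 11936 0 195808 3034768")
print(used_by_programs_kb(mem))   # 631676
print(reclaimable_free_kb(mem))   # 3242512
```

Both numbers match the "-/+ buffers/cache" row, so roughly 3.2 GB was available to applications when the snapshot was taken. This says nothing about what lapw0 itself tries to allocate; it only shows how to read the figures.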
Everything regarding available memory, stack size, swap space and the like seems to be fine. I am sending the compile.msg for the lapw0 code and the structure file I am using (could anyone run the initialization and a single SCF cycle with this file to check that everything is right? I am 99% sure that this is not the problem, but just in case ...).
I would like to know whether you think it is worth continuing to try variations of the compiler and linker options. I really do not know what to do: I am afraid this goes well beyond my very limited compiling skills!
Thanks in advance for your help.