[Wien] Segmentation fault in LAPW1 on Opteron / gentoo
wiener at arcscluster.caltech.edu
Fri Oct 15 03:48:11 CEST 2004
Dear Wien2k team,
We are trying to use Wien2k_04.7 on a dual Opteron cluster running Gentoo
Linux.
I am compiling with the PGI-5.1 version of pgf90 and linking against
ACML-2.0/pgi64_mp.
I can get the package to compile with no errors, and I can run the
test_case in both serial and parallel mode without problems. I can also
run a small case of mine, in both serial and parallel mode, through a full
run_lapw cycle until convergence. However, for an intermediate 16-atom
supercell with a central impurity, I get a segmentation fault in both
serial and parallel mode. Memory usage seems to remain low, however, when
I track it with 'top'.
Does anyone have any suggestions? I have appended below the output I get
before lapw1 crashes.
Also, regarding the benchmark posted on the Wien2k site for dual Opterons
with pgf90/ACML-2.0: Was the code compiled in 64-bit mode? Was it built
with mp support? Did it run fine on cases other than the 'test_case'?
Thanks,
Olivier.
=================================================
olivier at strongbad VNi_1e3k $ x lapw1 -p
starting parallel lapw1 at Thu Oct 14 18:34:26 PDT 2004
-> starting parallel LAPW1 jobs at Thu Oct 14 18:34:26 PDT 2004
Thu Oct 14 18:34:26 PDT 2004 -> Setting up case VNi_1e3k for parallel execution
Thu Oct 14 18:34:26 PDT 2004 -> of LAPW1
Thu Oct 14 18:34:26 PDT 2004 ->
running LAPW1 in parallel mode (using .machines)
Granularity set to 1
Extrafine unset
Thu Oct 14 18:34:26 PDT 2004 -> klist: 4
Thu Oct 14 18:34:26 PDT 2004 -> machines: node002 node003 node004 node005
Thu Oct 14 18:34:26 PDT 2004 -> procs: 4
Thu Oct 14 18:34:26 PDT 2004 -> weigh(old): 1 1 1 1
Thu Oct 14 18:34:26 PDT 2004 -> sumw: 4
Thu Oct 14 18:34:26 PDT 2004 -> granularity: 1
Thu Oct 14 18:34:26 PDT 2004 -> weigh(new): 1 1 1 1
Thu Oct 14 18:34:26 PDT 2004 -> Splitting VNi_1e3k.klist.tmp into junks
node002
node003
node004
node005
.machinetmp222
4 number_of_parallel_jobs
prepare 1 on node002
Thu Oct 14 18:34:26 PDT 2004 -> Creating klist 1
[1] 13562
prepare 2 on node003
Thu Oct 14 18:34:27 PDT 2004 -> Creating klist 2
[2] 13577
prepare 3 on node004
Thu Oct 14 18:34:28 PDT 2004 -> Creating klist 3
[3] 13592
prepare 4 on node005
Thu Oct 14 18:34:29 PDT 2004 -> Creating klist 4
[4] 13607
real 0m3.912s
user 0m3.733s
sys 0m0.095s
[1] Done ( $remote $machine[$p] ...
real 0m3.691s
user 0m3.565s
sys 0m0.110s
[2] Done ( $remote $machine[$p] ...
waiting for all processes to complete
real 0m3.826s
user 0m3.712s
sys 0m0.096s
real 0m4.136s
user 0m4.010s
sys 0m0.106s
[4] Done ( $remote $machine[$p] ...
[3] + Done ( $remote $machine[$p] ...
Thu Oct 14 18:34:33 PDT 2004 -> all processes done.
** LAPW1 crashed!
0.070u 0.137s 0:08.43 2.3% 0+0k 0+0io 0pf+0w