[Wien] Segmentation fault in LAPW1 on Opteron / gentoo

wiener at arcscluster.caltech.edu
Fri Oct 15 03:48:11 CEST 2004


Dear Wien2k team,

We are trying to use Wien2k_04.7 on a dual Opteron cluster running Gentoo 
Linux. 
I am compiling with pgf90 from PGI 5.1 and linking against 
ACML-2.0/pgi64_mp. 
The package compiles with no errors, and I can run the test_case both in 
serial and in parallel without problems. I can also run a small case of 
mine, serial and parallel, through a full run_lapw cycle until 
convergence. However, for an intermediate 16-atom supercell with a 
central impurity, lapw1 crashes with a segmentation fault, both in serial 
and in parallel. Memory usage nevertheless seems to remain low when I 
track it with 'top'.
Does anyone have any suggestions? I have appended below the output I get 
before lapw1 crashes.
Also, regarding the benchmark posted on the Wien2k site for dual Opterons 
with pgf90/ACML-2.0: was the code compiled in 64-bit mode? Was it built 
with mp support? Did it run fine on cases other than the 'test_case'?
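
For reference, watching 'top' can miss a short-lived memory spike, so the 
peak usage of a single run can be recorded instead. The following is only 
a minimal sketch in Python, assuming a Linux node (where ru_maxrss is 
reported in kilobytes). The helper name peak_rss_kb is hypothetical, and 
the child command is a harmless placeholder, not the actual lapw1 
invocation.

```python
import resource
import subprocess
import sys


def peak_rss_kb(cmd):
    """Run cmd to completion; return (exit code, peak RSS of waited-for children).

    On Linux ru_maxrss is in kilobytes (assumption: other systems may use
    different units). A negative exit code means the child was killed by a
    signal, e.g. -11 for SIGSEGV on Linux.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


# Placeholder child process: a Python interpreter that exits immediately.
code, rss = peak_rss_kb([sys.executable, "-c", "pass"])
print(f"exit code {code}, peak RSS {rss} kB")
```

Replacing the placeholder list with the real command line would give the 
exit status and the peak resident memory of that run in one place, which 
makes it easier to tell a crash at low memory from one at the memory 
limit.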

Thanks,
Olivier.

=================================================
olivier at strongbad VNi_1e3k $ x lapw1 -p 
starting parallel lapw1 at Thu Oct 14 18:34:26 PDT 2004
->  starting parallel LAPW1 jobs at Thu Oct 14 18:34:26 PDT 2004
Thu Oct 14 18:34:26 PDT 2004 -> Setting up case VNi_1e3k for parallel execution
Thu Oct 14 18:34:26 PDT 2004 -> of LAPW1
Thu Oct 14 18:34:26 PDT 2004 -> 
running LAPW1 in parallel mode (using .machines)
Granularity set to 1
Extrafine unset
Thu Oct 14 18:34:26 PDT 2004 -> klist:       4
Thu Oct 14 18:34:26 PDT 2004 -> machines:    node002 node003 node004 node005
Thu Oct 14 18:34:26 PDT 2004 -> procs:       4
Thu Oct 14 18:34:26 PDT 2004 -> weigh(old):  1 1 1 1
Thu Oct 14 18:34:26 PDT 2004 -> sumw:        4
Thu Oct 14 18:34:26 PDT 2004 -> granularity: 1
Thu Oct 14 18:34:26 PDT 2004 -> weigh(new):  1 1 1 1
Thu Oct 14 18:34:26 PDT 2004 -> Splitting VNi_1e3k.klist.tmp into junks
node002
node003
node004
node005
.machinetmp222
4 number_of_parallel_jobs
prepare 1 on node002
Thu Oct 14 18:34:26 PDT 2004 -> Creating klist 1 
[1] 13562
prepare 2 on node003
Thu Oct 14 18:34:27 PDT 2004 -> Creating klist 2 
[2] 13577
prepare 3 on node004
Thu Oct 14 18:34:28 PDT 2004 -> Creating klist 3 
[3] 13592
prepare 4 on node005
Thu Oct 14 18:34:29 PDT 2004 -> Creating klist 4 
[4] 13607

real    0m3.912s
user    0m3.733s
sys     0m0.095s
[1]    Done                          ( $remote $machine[$p]  ...

real    0m3.691s
user    0m3.565s
sys     0m0.110s
[2]    Done                          ( $remote $machine[$p]  ...
waiting for all processes to complete

real    0m3.826s
user    0m3.712s
sys     0m0.096s

real    0m4.136s
user    0m4.010s
sys     0m0.106s
[4]    Done                          ( $remote $machine[$p]  ...
[3]  + Done                          ( $remote $machine[$p]  ...
Thu Oct 14 18:34:33 PDT 2004 -> all processes done.
**  LAPW1 crashed!
0.070u 0.137s 0:08.43 2.3%      0+0k 0+0io 0pf+0w




