[Wien] Segmentation fault in LAPW1 on Opteron / gentoo
Torsten Andersen
thor at physik.uni-kl.de
Fri Oct 15 08:13:59 CEST 2004
Dear Olivier,
I experienced the same thing on Opteron when I used the -O5 switch in
the compiler. Something is buggy in either the compiler or the ACML. Compiling
with -O3 removed the problem for me, although the speed on the test_case
then went down by a factor of two. However, a fast test_case is not
enough; the code also has to work for real systems.
Does anyone have optimized compiler options for the Opteron/PGI that
also work beyond the test case?
Best regards,
Torsten Andersen.
wiener at arcscluster.caltech.edu wrote:
> Dear Wien2k team,
>
> We are trying to use Wien2k_04.7 on a dual Opteron cluster running Gentoo
> Linux.
> I am compiling with the PGI-5.1 version of pgf90 and linking against
> ACML-2.0/pgi64_mp.
> I can get the package to compile with no errors and I can run the
> test_case both in serial and in parallel without problems. I can also run a
> small case of mine, both serial and parallel, through a full run_lapw cycle
> until convergence. However, for an intermediate 16-atom supercell with a
> central impurity, I get a segmentation fault, both in serial and in parallel.
> The memory usage seems to remain low, however, as I track it with 'top'.
> Does anyone have any suggestion? I appended below the output I get before
> lapw1 crashes.
> Also, regarding the benchmark posted on the Wien2k site for dual Opterons
> with pgf90/ACML-2.0: Was the code compiled in 64-bit mode? Did it allow
> for mp support? Did it run fine beyond the 'test_case'?
>
> Thanks,
> Olivier.
>
> =================================================
> olivier at strongbad VNi_1e3k $ x lapw1 -p
> starting parallel lapw1 at Thu Oct 14 18:34:26 PDT 2004
> -> starting parallel LAPW1 jobs at Thu Oct 14 18:34:26 PDT 2004
> Thu Oct 14 18:34:26 PDT 2004 -> Setting up case VNi_1e3k for parallel execution
> Thu Oct 14 18:34:26 PDT 2004 -> of LAPW1
> Thu Oct 14 18:34:26 PDT 2004 ->
> running LAPW1 in parallel mode (using .machines)
> Granularity set to 1
> Extrafine unset
> Thu Oct 14 18:34:26 PDT 2004 -> klist: 4
> Thu Oct 14 18:34:26 PDT 2004 -> machines: node002 node003 node004 node005
> Thu Oct 14 18:34:26 PDT 2004 -> procs: 4
> Thu Oct 14 18:34:26 PDT 2004 -> weigh(old): 1 1 1 1
> Thu Oct 14 18:34:26 PDT 2004 -> sumw: 4
> Thu Oct 14 18:34:26 PDT 2004 -> granularity: 1
> Thu Oct 14 18:34:26 PDT 2004 -> weigh(new): 1 1 1 1
> Thu Oct 14 18:34:26 PDT 2004 -> Splitting VNi_1e3k.klist.tmp into junks
> node002
> node003
> node004
> node005
> .machinetmp222
> 4 number_of_parallel_jobs
> prepare 1 on node002
> Thu Oct 14 18:34:26 PDT 2004 -> Creating klist 1
> [1] 13562
> prepare 2 on node003
> Thu Oct 14 18:34:27 PDT 2004 -> Creating klist 2
> [2] 13577
> prepare 3 on node004
> Thu Oct 14 18:34:28 PDT 2004 -> Creating klist 3
> [3] 13592
> prepare 4 on node005
> Thu Oct 14 18:34:29 PDT 2004 -> Creating klist 4
> [4] 13607
>
> real 0m3.912s
> user 0m3.733s
> sys 0m0.095s
> [1] Done ( $remote $machine[$p] ...
>
> real 0m3.691s
> user 0m3.565s
> sys 0m0.110s
> [2] Done ( $remote $machine[$p] ...
> waiting for all processes to complete
>
> real 0m3.826s
> user 0m3.712s
> sys 0m0.096s
>
> real 0m4.136s
> user 0m4.010s
> sys 0m0.106s
> [4] Done ( $remote $machine[$p] ...
> [3] + Done ( $remote $machine[$p] ...
> Thu Oct 14 18:34:33 PDT 2004 -> all processes done.
> ** LAPW1 crashed!
> 0.070u 0.137s 0:08.43 2.3% 0+0k 0+0io 0pf+0w
--
Dr. Torsten Andersen TA-web: http://deep.at/myspace/
AG Hübner, Department of Physics, Kaiserslautern University
http://cmt.physik.uni-kl.de http://www.physik.uni-kl.de/