[Wien] [Fwd: Re: parallel job error]
mino yang
yangmino at samsung.com
Tue Feb 27 02:02:56 CET 2007
This is just my case. Please refer to the installation article in the FAQ.
Sure. It works nicely.
There is just one problem: x lapw -p -qtl fails every time. So I run the scf cycle again in single mode (one processor) and then run x lapw -qtl. The final scf cycle in single mode converges immediately, because the energy had already converged during the previous parallel run.
------- Original Message -------
Sender : jadhikari at clarku.edu<jadhikari at clarku.edu>
Date : 2007-02-27 09:04
Title : [Wien] [Fwd: Re: parallel job error]
Hi,
Thank you very much for the reply.
Were you able to use multiple processors after switching to static?
How often did it crash?
Just curious about it.
:)
Subin
Dear Subin,
I'm not an expert, but hopefully this can help you.
It is not exactly my case, but I faced similar problems.
I solved them by changing the compile option from floating to static.
These are my static compile options in siteconfig:
O Compiler options: -FR -mp1 -w -prec_div -pad -ip
L Linker Flags: -L/opt/intel/intel_fce_90/lib -i-static -lguide_stats -lsvml -lpthread
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -L/opt/intel/mkl/8.0.2/lib/em64t -lmkl_lapack64 -lmkl_em64t -lguide -lguide_stats
I also followed the installation article in the FAQ.
Have a nice day.
------- Original Message -------
Sender : jadhikari at clarku.edu<jadhikari at clarku.edu>
Date : 2007-02-27 01:47
Title : [Wien] parallel job error
Dear Wien users,
The calculation seems to stop with "( cd $PWD; $t $exe ${def}_$loop.def;
rm -f .lock_$lockfile[$p] ) >> ..." as the last line in the dayfile.
Single mode runs fine but with 4 processors it never works.
I understand this error but cannot fix it. Please let me know about this.
Thank you.
Subin
The following is the dayfile for a parallel job:
_______________________________________________________________________
start (Mon Feb 26 09:59:35 EST 2007) with lapw0 (80/20 to go)
cycle 1 (Mon Feb 26 09:59:35 EST 2007) (80/20 to go)
> lapw0 -p (09:59:35) starting parallel lapw0 at Mon Feb 26 09:59:35 EST 2007
--------
running lapw0 in single mode
80.915u 0.224s 1:21.45 99.6% 0+0k 0+0io 5pf+0w
> lapw1 -p (10:00:56) starting parallel lapw1 at Mon Feb 26 10:00:56 EST 2007
-> starting parallel LAPW1 jobs at Mon Feb 26 10:00:56 EST 2007
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
[1] 20280
[2] 20295
[3] 20310
[4] 20325
[2] Done ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> ...
_____________________________________________________________________
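
The dayfile above lists four background jobs, but only job [2] prints a "Done" line. As a rough aid for reading such output, here is a minimal Python sketch that compares the job numbers that were started with those that reported "Done". The function name job_status and the embedded excerpt are only illustrative assumptions, not part of WIEN2k, and a missing "Done" line only hints that a job did not finish; it does not say why.

```python
import re

# Excerpt of the dayfile quoted above, with wrapped lines joined.
DAYFILE = """\
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
[1] 20280
[2] 20295
[3] 20310
[4] 20325
[2] Done ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> ...
"""


def job_status(dayfile_text):
    """Return (started, done): sets of shell job numbers seen in the text.

    Hypothetical helper: it relies only on the "[n] pid" and "[n] Done"
    line shapes visible in the quoted dayfile.
    """
    started, done = set(), set()
    for line in dayfile_text.splitlines():
        line = line.strip()
        # "[1] 20280": a background job was launched with this process id.
        m = re.match(r"\[(\d+)\]\s+(\d+)$", line)
        if m:
            started.add(int(m.group(1)))
            continue
        # "[2] Done ( ... )": the shell reports that the job has ended.
        m = re.match(r"\[(\d+)\]\s+[+-]?\s*Done\b", line)
        if m:
            done.add(int(m.group(1)))
    return started, done


started, done = job_status(DAYFILE)
unfinished = sorted(started - done)
print("started:", sorted(started))
print("done:", sorted(done))
print("no Done line:", unfinished)
```

For the excerpt above this reports jobs 1, 3 and 4 as having no "Done" line, which matches what can be seen by eye in the dayfile.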
_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien