[Wien] [Fwd: Re: parallel job error]

mino yang yangmino at samsung.com
Tue Feb 27 02:02:56 CET 2007


This is just my case.  Please refer to the installation article in the FAQ.

Sure.  It works nicely.
The only problem is that x lapw -p -qtl fails every time.  So I run the scf cycle again in single mode (one processor) and then run x lapw -qtl.  The final scf cycle in single mode converges immediately, because the energy had already converged during the previous parallel run.
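
As a rough sketch, the workaround above corresponds to the sequence below. It assumes the WIEN2k scripts run_lapw and x are on the PATH, that it is run inside the case directory, and that the program meant by "x lapw ... -qtl" is lapw2. The function name qtl_after_parallel_scf is made up for this illustration.

```shell
#!/bin/sh
# Sketch only: parallel scf, then a short serial scf, then qtl in serial.
# Assumption: "x lapw -qtl" in the text refers to lapw2.
qtl_after_parallel_scf() {
    # 1. Converge the scf cycle in parallel (uses the .machines file).
    run_lapw -p || return 1
    # 2. Repeat the scf cycle on one processor; it should finish quickly
    #    because the parallel run has already converged.
    run_lapw || return 1
    # 3. Compute the partial charges serially, i.e. without -p.
    x lapw2 -qtl
}
```

Calling qtl_after_parallel_scf in the case directory runs the three steps in order and stops at the first scf step that fails.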



------- Original Message -------
Sender : jadhikari at clarku.edu<jadhikari at clarku.edu> 
Date   : 2007-02-27 09:04
Title  : [Wien] [Fwd: Re:  parallel job error]

Hi,
Thank you very much for the reply.
Were you able to use multiple processors after switching to static?
How often did it crash?
Just curious about it.
:)
Subin


Dear Subin.

I'm not an expert, but hopefully this can help you.

It is not exactly my case, but I faced similar problems.
I solved them by changing the compile options from dynamic to static linking.

These are my static compile options in siteconfig:
     O   Compiler options:        -FR -mp1 -w -prec_div -pad -ip
     L   Linker Flags:            -L/opt/intel/intel_fce_90/lib -i-static -lguide_stats -lsvml -lpthread
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -L/opt/intel/mkl/8.0.2/lib/em64t -lmkl_lapack64 -lmkl_em64t -lguide -lguide_stats
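
One way to check whether the static link flags took effect is to see which Intel runtime libraries an executable still loads dynamically. The sketch below is only an illustration: the function name intel_libs_dynamic is hypothetical, the library name pattern is a guess at the usual Intel Fortran runtime libraries, and the path $WIENROOT/lapw1 in the usage line assumes the standard WIEN2k layout.

```shell
#!/bin/sh
# Print the Intel runtime libraries that a binary still resolves at run time.
# Returns 0 if at least one is found, non-zero otherwise.
# Assumption: the pattern below covers the relevant library names.
intel_libs_dynamic() {
    ldd "$1" 2>/dev/null | grep -E 'lib(guide|svml|imf|ifcore|ifport)'
}
```

For example, intel_libs_dynamic "$WIENROOT/lapw1" || echo "no Intel runtime libraries loaded dynamically" should print the message when the Intel libraries were linked statically.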

I also followed the installation article in the FAQ.

Have a nice day.



------- Original Message -------
Sender : jadhikari at clarku.edu<jadhikari at clarku.edu>
Date   : 2007-02-27 01:47
Title  : [Wien] parallel job error

Dear Wien users,

The calculation seems to stop with "( cd $PWD; $t $exe ${def}_$loop.def;
rm -f .lock_$lockfile[$p] ) >>  ..." as the last line in the dayfile.
Single mode runs fine but with 4 processors it never works.

I understand this error but cannot fix it. Please let me know about this.

Thank you.

Subin


Following is the dayfile for a parallel job-
_______________________________________________________________________
    start       (Mon Feb 26 09:59:35 EST 2007) with lapw0 (80/20 to go)

    cycle 1     (Mon Feb 26 09:59:35 EST 2007)  (80/20 to go)

>   lapw0 -p    (09:59:35) starting parallel lapw0 at Mon Feb 26 09:59:35 EST 2007
--------
running lapw0 in single mode
80.915u 0.224s 1:21.45 99.6%    0+0k 0+0io 5pf+0w
>   lapw1  -p   (10:00:56) starting parallel lapw1 at Mon Feb 26 10:00:56 EST 2007
->  starting parallel LAPW1 jobs at Mon Feb 26 10:00:56 EST 2007
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
[1] 20280
[2] 20295
[3] 20310
[4] 20325
[2]    Done                          ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >>  ...
_____________________________________________________________________
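
When a parallel run stops like this, a common first step is to look for non-empty *.error files in the case directory, since the dayfile alone does not show why a job died. The following is a minimal sketch under stated assumptions: the helper name nonempty_error_files is hypothetical, and it assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
#!/bin/sh
# List the *.error files in a case directory that are not empty.
# $1 is the case directory; it defaults to the current directory.
nonempty_error_files() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.error' -size +0c | sort
}
```

Running nonempty_error_files in the case directory and reading whatever files it lists usually narrows down which program failed.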
_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien


