[Wien] incorrect distribution over parallel nodes
Peter Blaha
pblaha at zeus.theochem.tuwien.ac.at
Wed Nov 24 23:04:29 CET 2004
This is probably due to an attempted load balancing, which is fine and
desirable when no local scratch disks are used.
Include the line granularity:1 in .machines.
This should fix the distribution: k-points 1+2 go to node 1, 3+4 to node 2, ...
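
For illustration only, here is a minimal Python sketch of the contiguous
distribution described above (k-points 1+2 on node 1, 3+4 on node 2, ...),
together with a helper that writes a .machines text containing the
granularity:1 line. The function names block_assignment and render_machines
are hypothetical and are not part of WIEN2k; the sketch does not reproduce
the actual logic of the parallel scripts, and the host list is cut down to
the first two names from the quoted .machines file.

```python
def block_assignment(n_kpoints, n_nodes):
    """Split k-points 1..n_kpoints into contiguous, near-equal blocks, one per node."""
    base, extra = divmod(n_kpoints, n_nodes)
    blocks = []
    start = 1
    for node in range(n_nodes):
        # The first `extra` nodes take one additional k-point each.
        size = base + (1 if node < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def render_machines(hosts, granularity=1):
    """Return .machines text: a granularity line, then one '1:host' line per host."""
    lines = [f"granularity:{granularity}"]
    lines += [f"1:{host}" for host in hosts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    blocks = block_assignment(50, 25)
    print(blocks[0], blocks[1], blocks[-1])  # [1, 2] [3, 4] [49, 50]

    # Only two of the 25 hosts are listed here, purely as an illustration.
    hosts = ["compute-3-35.local", "compute-3-34.local"]
    print(render_machines(hosts), end="")
```

With 50 k-points on 25 nodes every block has exactly two consecutive
k-points, which is the layout the granularity:1 setting is expected to give.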
> I found that with a relatively large number of nodes
> in simple parallelization, the script sometimes
> distributes the k-points over the nodes incorrectly,
> which later results in a crash. Here is an example.
> 50 k-points were executed on 25 machines. This is the
> .machines file:
> 1:compute-3-35.local
> 1:compute-3-34.local
> 1:compute-3-33.local
> 1:compute-3-32.local
> <skipped>
> 1:compute-3-11.local
> 1:compute-3-10.local
>
> Now see the result of running the following command:
> >foreach f ( `sed 's/1://' .machines` )
> > echo $f
> > ssh $f "ls -trs /scratch/mazin/RUN5*"
> > echo ""
> >end
>
> Note below that
>
> 1680 /scratch/mazin/RUN5.vectorup_1
> 1780 /scratch/mazin/RUN5.vectorup_26
> 1680 /scratch/mazin/RUN5.vectordn_1
> 1780 /scratch/mazin/RUN5.vectordn_26
> 6608 /scratch/mazin/RUN5.vectorsoup_1
> 6608 /scratch/mazin/RUN5.vectorsodn_1
> 7012 /scratch/mazin/RUN5.vectorsoup_26
> 7012 /scratch/mazin/RUN5.vectorsodn_26
>
> compute-3-34.local
> 1716 /scratch/mazin/RUN5.vectorup_2
> 1800 /scratch/mazin/RUN5.vectorup_27
> 1716 /scratch/mazin/RUN5.vectordn_2
> 1800 /scratch/mazin/RUN5.vectordn_27
> 6748 /scratch/mazin/RUN5.vectorsoup_2
> 6748 /scratch/mazin/RUN5.vectorsodn_2
> 7092 /scratch/mazin/RUN5.vectorsoup_27
> 7092 /scratch/mazin/RUN5.vectorsodn_27
>
> <skipped 1 "correct" node >
>
> compute-3-32.local
> 1768 /scratch/mazin/RUN5.vectorup_4
> 1744 /scratch/mazin/RUN5.vectorup_35
> 1768 /scratch/mazin/RUN5.vectordn_4
> 1800 /scratch/mazin/RUN5.vectordn_29
> 6960 /scratch/mazin/RUN5.vectorsoup_4
> 6960 /scratch/mazin/RUN5.vectorsodn_4
> 0 /scratch/mazin/RUN5.vectorup_29
> 0 /scratch/mazin/RUN5.vectorsoup_29
> 0 /scratch/mazin/RUN5.vectorsodn_29
>
> <skipped 6 "incorrect" nodes>
>
> compute-3-25.local
> 1792 /scratch/mazin/RUN5.vectorup_11
> 1744 /scratch/mazin/RUN5.vectorup_36
> 1792 /scratch/mazin/RUN5.vectordn_11
> 1744 /scratch/mazin/RUN5.vectordn_36
> 7056 /scratch/mazin/RUN5.vectorsoup_11
> 7056 /scratch/mazin/RUN5.vectorsodn_11
> 6860 /scratch/mazin/RUN5.vectorsoup_36
> 6860 /scratch/mazin/RUN5.vectorsodn_36
>
> <the rest is correct>
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671 FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------
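
A listing like the one quoted above can also be checked automatically. The
following Python sketch is only an illustration: it assumes that every
k-list index on a node should have four non-empty files (vectorup,
vectordn, vectorsoup, vectorsodn), which is inferred from the "correct"
nodes in the quoted output rather than from any WIEN2k documentation. The
function name check_listing is hypothetical. The sample text is the
compute-3-32.local listing from the quoted message.

```python
import re
from collections import defaultdict

# Assumption: these four file types are expected for each k-list index.
EXPECTED = {"vectorup", "vectordn", "vectorsoup", "vectorsodn"}

# Matches "ls -s" style lines such as "1768 /scratch/mazin/RUN5.vectorup_4".
LINE = re.compile(r"^\s*(\d+)\s+\S*\.(vector[a-z]+)_(\d+)\s*$")

SAMPLE = """
1768 /scratch/mazin/RUN5.vectorup_4
1744 /scratch/mazin/RUN5.vectorup_35
1768 /scratch/mazin/RUN5.vectordn_4
1800 /scratch/mazin/RUN5.vectordn_29
6960 /scratch/mazin/RUN5.vectorsoup_4
6960 /scratch/mazin/RUN5.vectorsodn_4
0 /scratch/mazin/RUN5.vectorup_29
0 /scratch/mazin/RUN5.vectorsoup_29
0 /scratch/mazin/RUN5.vectorsodn_29
"""


def check_listing(text):
    """Map each k-list index to the sorted file types that are missing or empty."""
    non_empty = defaultdict(set)
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip blank lines, host names, and anything unrecognized
        size, kind, index = int(match.group(1)), match.group(2), int(match.group(3))
        kinds = non_empty[index]  # registers the index even if the file is empty
        if size > 0:
            kinds.add(kind)
    problems = {}
    for index in sorted(non_empty):
        missing = sorted(EXPECTED - non_empty[index])
        if missing:
            problems[index] = missing
    return problems


if __name__ == "__main__":
    for index, missing in check_listing(SAMPLE).items():
        print(index, "missing or empty:", ", ".join(missing))
```

For the sample, index 4 is complete, while indices 29 and 35 are each
reported with three missing or empty file types.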