[Wien] incorrect distribution over parallel nodes

Igor Mazin wienuser at yahoo.com
Wed Nov 24 21:52:39 CET 2004


Dear All

Has anybody experienced the following problem and is
there any way of fixing it?

I found that with a relatively large number of nodes
in simple parallelization, the script sometimes
incorrectly distributes the k-points over the nodes,
which later results in a crash. Here is an example:
50 k-points were executed on 25 machines. This is the
.machines file:
1:compute-3-35.local
1:compute-3-34.local
1:compute-3-33.local
1:compute-3-32.local
<skipped>
1:compute-3-11.local
1:compute-3-10.local
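
For reference, the listings further down suggest that the node on line n of .machines is meant to hold the files numbered n and n+25. Below is a minimal Python sketch of that expected mapping. The round-robin rule is an assumption inferred from the output in this message, not taken from the WIEN2k scripts, and parse_machines and expected_suffixes are hypothetical helper names.

```python
def parse_machines(text):
    """Return host names from '.machines' lines of the form 'weight:host'."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        weight, sep, host = line.partition(":")
        # Keep only lines whose prefix before ':' is a number.
        if sep and weight.isdigit():
            hosts.append(host.strip())
    return hosts


def expected_suffixes(hosts, n_jobs):
    """Assumed round-robin: job j (1-based) goes to host (j - 1) % len(hosts)."""
    plan = {host: [] for host in hosts}
    for job in range(1, n_jobs + 1):
        plan[hosts[(job - 1) % len(hosts)]].append(job)
    return plan


if __name__ == "__main__":
    # Hypothetical host names, used only for illustration.
    demo = parse_machines("1:node-a.local\n1:node-b.local\n1:node-c.local\n")
    for host, jobs in expected_suffixes(demo, 6).items():
        print(host, jobs)
```

With 25 hosts and 50 jobs, this rule gives the first host the suffixes 1 and 26, which matches the first listing below.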

Now see the result of running the following command:
>foreach f ( `sed 's/1://' .machines` )
> echo $f
> ssh $f "ls -trs /scratch/mazin/RUN5*"
> echo ""
>end

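Note below that some nodes do not follow the expected
pattern: compute-3-32.local, for example, holds
RUN5.vectorup_35, while its RUN5.vectorup_29,
RUN5.vectorsoup_29 and RUN5.vectorsodn_29 have zero size.

compute-3-35.local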
1680 /scratch/mazin/RUN5.vectorup_1
1780 /scratch/mazin/RUN5.vectorup_26
1680 /scratch/mazin/RUN5.vectordn_1
1780 /scratch/mazin/RUN5.vectordn_26
6608 /scratch/mazin/RUN5.vectorsoup_1
6608 /scratch/mazin/RUN5.vectorsodn_1
7012 /scratch/mazin/RUN5.vectorsoup_26
7012 /scratch/mazin/RUN5.vectorsodn_26

compute-3-34.local
1716 /scratch/mazin/RUN5.vectorup_2
1800 /scratch/mazin/RUN5.vectorup_27
1716 /scratch/mazin/RUN5.vectordn_2
1800 /scratch/mazin/RUN5.vectordn_27
6748 /scratch/mazin/RUN5.vectorsoup_2
6748 /scratch/mazin/RUN5.vectorsodn_2
7092 /scratch/mazin/RUN5.vectorsoup_27
7092 /scratch/mazin/RUN5.vectorsodn_27

<skipped 1 "correct" node >

compute-3-32.local
1768 /scratch/mazin/RUN5.vectorup_4
1744 /scratch/mazin/RUN5.vectorup_35
1768 /scratch/mazin/RUN5.vectordn_4
1800 /scratch/mazin/RUN5.vectordn_29
6960 /scratch/mazin/RUN5.vectorsoup_4
6960 /scratch/mazin/RUN5.vectorsodn_4
   0 /scratch/mazin/RUN5.vectorup_29
   0 /scratch/mazin/RUN5.vectorsoup_29
   0 /scratch/mazin/RUN5.vectorsodn_29

<skipped 6 "incorrect" nodes>

compute-3-25.local
1792 /scratch/mazin/RUN5.vectorup_11
1744 /scratch/mazin/RUN5.vectorup_36
1792 /scratch/mazin/RUN5.vectordn_11
1744 /scratch/mazin/RUN5.vectordn_36
7056 /scratch/mazin/RUN5.vectorsoup_11
7056 /scratch/mazin/RUN5.vectorsodn_11
6860 /scratch/mazin/RUN5.vectorsoup_36
6860 /scratch/mazin/RUN5.vectorsodn_36

<the rest is correct>
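
The check done by eye above could be automated. The following Python sketch parses one node's `ls -s` output and flags files with zero size or with a suffix the node should not hold. The name find_anomalies, the regular expression, and the idea of passing in the expected suffixes (for example 4 and 29 for compute-3-32.local, under the assumed n and n+25 pattern) are illustrative assumptions, not part of WIEN2k.

```python
import re

# Matches lines such as '1744 /scratch/mazin/RUN5.vectorup_35':
# block count, path, and the numeric suffix after the last underscore.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+_(\d+))\s*$")


def find_anomalies(listing, expected):
    """Return (path, reason) pairs for suspicious files in one node's listing.

    listing  -- text output of 'ls -s' for one node
    expected -- set of job suffixes that node is supposed to hold
    """
    problems = []
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # host names, blank lines, and other non-file lines
        blocks = int(match.group(1))
        path = match.group(2)
        suffix = int(match.group(3))
        if suffix not in expected:
            problems.append((path, "unexpected suffix"))
        elif blocks == 0:
            problems.append((path, "zero size"))
    return problems


if __name__ == "__main__":
    sample = """compute-3-32.local
1768 /scratch/mazin/RUN5.vectorup_4
1744 /scratch/mazin/RUN5.vectorup_35
   0 /scratch/mazin/RUN5.vectorup_29
"""
    for path, reason in find_anomalies(sample, {4, 29}):
        print(path, reason)
```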


		