[Wien] machines file ... again

Jorissen Kevin Kevin.Jorissen at ua.ac.be
Tue Oct 5 23:06:56 CEST 2004


I recall from a discussion with P. Blaha on the mailing list that the MPI version of WIEN is of little use on duals ... The connection between different duals is probably too slow anyway (perhaps not if you are using Myrinet or the like, but plain Gbit Ethernet is definitely too slow), and since the MPI version consumes more memory than the sequential one, it only becomes advantageous for more than two processors ...
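To illustrate the difference (a sketch only, following the .machines syntax from the user's guide; the fisnodeX names are taken from the mail below, and I am assuming lines starting with # are accepted as comments): all hostnames on a single line form ONE fine-grained (MPI) job, which is why testpara below reports procs: 1 for your second file.

granularity:1
# k-point parallelization on duals: one job per CPU, so list each node twice
1:fisnode2
1:fisnode2
1:fisnode3
1:fisnode3
# fine-grained alternative: one MPI job on both CPUs of a single dual ...
# 1:fisnode4:2
# ... or one MPI job spanning two duals over the (slow) Gbit interconnect:
# 1:fisnode4:2 fisnode5:2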
 
 
 
Kevin Jorissen
 
EMAT - Electron Microscopy for Materials Science   (http://webhost.ua.ac.be/emat/)
Dept. of Physics
 
UA - Universiteit Antwerpen
Groenenborgerlaan 171
B-2020 Antwerpen
Belgium
 
tel +32 3 2653249
fax +32 3 2653257
e-mail kevin.jorissen at ua.ac.be
 

________________________________

From: wien-admin at zeus.theochem.tuwien.ac.at on behalf of Griselda Garcia
Sent: Tue 5-10-2004 18:28
To: wien at zeus.theochem.tuwien.ac.at
Subject: [Wien] machines file ... again



Hello all!

I am trying to set up a parallel calculation on a PC cluster (10 dual
machines, each one called fisnodeX). When I use the k-point
parallelization and run the program as run_lapw -p, everything is OK.

[griselda at clustersvr si-bulk]$ more .machines
granularity:1
1:fisnode2
1:fisnode3
1:fisnode4
1:fisnode5

[griselda at clustersvr si-bulk]$ testpara_lapw

#####################################################
#                     TESTPARA                      #
#####################################################

Test: LAPW1 in parallel mode (using .machines)
Granularity set to 1
Extrafine unset

    klist:       8
    machines:    fisnode2 fisnode3 fisnode4 fisnode5
    procs:       4
    weigh(old):  1 1 1 1
    sumw:        4
    granularity: 1
    weigh(new):  2 2 2 2

Distribution of k-point (under ideal conditions)
will be:

1 : fisnode2(2) 2k
2 : fisnode3(2) 2k
3 : fisnode4(2) 2k
4 : fisnode5(2) 2k
[griselda at clustersvr si-bulk]$


Then I try to use the fine-grained version of the parallelization; my
.machines file (using one processor of each node) is now:

[griselda at clustersvr si-bulk]$ more .machines
granularity:1
1:fisnode2 fisnode3 fisnode4 fisnode5 fisnode6 fisnode7 fisnode8 fisnode9
lapw0:fisnode1:2

[griselda at clustersvr si-bulk]$ testpara_lapw

#####################################################
#                     TESTPARA                      #
#####################################################

Test: LAPW1 in parallel mode (using .machines)
Granularity set to 1
Extrafine unset

    klist:       8
    machines:    fisnode2 fisnode3 fisnode4 fisnode5 fisnode6 fisnode7 fisnode8 fisnode9
    procs:       1
    weigh(old):  1
    sumw:        1
    granularity: 1
    weigh(new):  8

Distribution of k-point (under ideal conditions)
will be:

1 : fisnode2(8) 8k
[griselda at clustersvr si-bulk]$

I do not see what is wrong in the .machines file. Why will just one
processor be used to calculate the 8 k-points?

I have read the user's guide several times, but I cannot get the MPI
version running.

lapw0 works fine with two processors, but lapw1 does not run. The
dayfile is:

[griselda at clustersvr si-bulk]$ more case.dayfile


Calculating case in /home/griselda/WIEN/case/case
on clustersvr

    start       (Tue Oct  5 12:27:53 EDT 2004) with lapw0 (20/20 to go)
>   lapw0 -p    (12:27:53) starting parallel lapw0 at Tue Oct  5 12:27:53 EDT 2004
-------- .machine1 : 2 processors
fisnode1:2
--------
12.160u 12.530s 0:18.92 130.4%  0+0k 0+0io 11341pf+0w
>   lapw1  -p   (12:28:12) starting parallel lapw1 at Tue Oct  5 12:28:12 EDT 2004
->  starting parallel LAPW1 jobs at Tue Oct  5 12:28:12 EDT 2004
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
**  LAPW1 crashed!
0.150u 0.180s 0:03.27 10.0%     0+0k 0+0io 12563pf+0w

>   stop error

I cannot find any clue about the error.

I would really appreciate your help.

Griselda.



