[Wien] I still have a problem with wien2k in parallel mode

Nilton nilton.dantas at gmail.com
Wed Dec 28 22:33:45 CET 2011


Dear L. Marks,
Thanks a lot for the answer. Here are my comments:

2011/12/27 Laurence Marks <L-marks at northwestern.edu>

> It is hard to know as you have not provided us with enough
> information, so we can only guess. Most likely you have set up
> the problem incorrectly, for instance bad RMTs, a bad case.in1c or similar. Read
> the file lapw1.error to see if it has anything, and also the various
> output files. Beyond this:
>

The setup is correct, because I can run wien2k in sequential mode.


>
> a) Did you compile the mpi versions? If not, then what you are using
> will not work. There are two ways to run Wien2k in parallel, one uses
> mpi and is needed for big jobs, the other does not use mpi and is
> often simpler for small jobs.
>

Yes, I am using wien2k10.1. I tried to compile wien2k11, but I got errors
while compiling lapw2(c)_mpi, so I gave up.


> b) Edit parallel_options and put "setenv debug 1" in (remove it later)
> then do "x lapw1 -p" from the terminal. This will give you more
> output.
>

I did. It seems OK, but it is running in single mode. Please see the output below.


> c) Check that you have ssh enabled to the compute nodes (I don't think
> you need the .local at the end)
>

SSH is working: I can log in to the nodes of my cluster.

>
> A comment: you have set up your .machines file to run 5 tasks for
> lapw1, each using 4 CPUs. Some MPI versions are not smart, and with
> what you have they will run both tasks on compute-0-0 using the same cores.
>
granularity:1
1:bodesking.uefs.br:1
1:bodesking.uefs.br:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
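A minimal Python sketch of how the numbered lines of the .machines file above could be read, to see why nothing in it would trigger the MPI binaries. It assumes (my reading of the WIEN2k user guide, to be checked for version 10.1) that each numbered line has the form `weight:host:n` or `weight:host1:n1 host2:n2`, and that lapw1para/lapw2para start lapw1_mpi/lapw2_mpi only when one line requests more than one process in total; `granularity:` and `lapw0:` lines are skipped. `parse_machines` and `Job` are hypothetical helpers, not part of WIEN2k, and this is not the logic of the real scripts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    """One k-point-parallel line of .machines (hypothetical helper)."""
    weight: int
    hosts: list[tuple[str, int]]  # (hostname, number of processes)

    @property
    def nprocs(self) -> int:
        return sum(n for _, n in self.hosts)

    @property
    def uses_mpi(self) -> bool:
        # Assumption: only jobs asking for more than one process
        # are started through WIEN_MPIRUN as lapw1_mpi/lapw2_mpi.
        return self.nprocs > 1


def parse_machines(text: str) -> list[Job]:
    """Parse the numbered (k-point) lines of a .machines file."""
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        weight, sep, rest = line.partition(":")
        if not sep or not weight.isdigit():
            # granularity:, lapw0:, etc. are not k-point jobs
            continue
        hosts = []
        for token in rest.split():
            # "host:4" -> ("host", 4); a bare "host" counts as one process
            name, _, count = token.partition(":")
            hosts.append((name, int(count) if count else 1))
        jobs.append(Job(int(weight), hosts))
    return jobs


if __name__ == "__main__":
    current = """granularity:1
1:bodesking.uefs.br:1
1:bodesking.uefs.br:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
"""
    jobs = parse_machines(current)
    print(len(jobs), any(j.uses_mpi for j in jobs))  # 10 False

    # Hypothetical MPI layout: one 4-process job per compute node.
    mpi_jobs = parse_machines("1:compute-0-0.local:4\n1:compute-0-1.local:4\n")
    print([(j.nprocs, j.uses_mpi) for j in mpi_jobs])  # [(4, True), (4, True)]
```

Treating a bare hostname as one process means the alternative spelling `1:host host host host` also counts as a 4-process job under this assumption.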

With this file, if I type run_lapw -p I get 11 processes for lapw1, and 2 on all
the computers listed, but not lapw1_mpi or lapw2_mpi. This is the point: how
can I set up .machines in order to run wien2k with the MPI libraries? Below is
the configuration in my parallel_options file:

setenv USE_REMOTE 1
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
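The WIEN_MPIRUN line is a template. A minimal sketch of how its placeholders could be filled in, assuming the parallel scripts do a plain textual substitution of `_NP_` (number of processes), `_HOSTS_` (a per-job host file) and `_EXEC_` (the MPI executable plus its arguments). `expand_mpirun` is a hypothetical helper, and the host-file name `.machine1` and the argument `lapw1_1.def` are illustrative values, not taken from the scripts.

```python
def expand_mpirun(template: str, nprocs: int, hostfile: str, executable: str) -> str:
    """Fill the WIEN_MPIRUN placeholders (assumed literal substitution)."""
    return (
        template.replace("_NP_", str(nprocs))
        .replace("_HOSTS_", hostfile)
        .replace("_EXEC_", executable)
    )


if __name__ == "__main__":
    template = "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
    # Hypothetical values for one 4-process lapw1 job.
    print(expand_mpirun(template, 4, ".machine1", "lapw1_mpi lapw1_1.def"))
    # mpirun -np 4 -machinefile .machine1 lapw1_mpi lapw1_1.def
```

If, as I assume, the template is only used for .machines lines that request more than one process, a file with `:1` on every line never reaches this command.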


------------------------------ Output of x lapw0 -p and x lapw1 -p
[nilton at bodesking case]$ x lapw0 -p
starting parallel lapw0 at Wed Dec 28 18:29:26 BRT 2011
-------- .machine0 : processors
running lapw0 in single mode
 LAPW0 END
14.599u 0.400s 0:15.01 99.8%    0+0k 0+0io 0pf+0w
[nilton at bodesking case]$ x lapw1 -p
starting parallel lapw1 at Wed Dec 28 18:29:46 BRT 2011
->  starting parallel LAPW1 jobs at Wed Dec 28 18:29:46 BRT 2011
running LAPW1 in parallel mode (using .machines)
10 number_of_parallel_jobs
[1] 11587
[2] 11724
[3] 11856
[4] 11887
[5] 11917
[6] 11944
[7] 11976
[8] 12002
[9] 12033
 LAPW1 END
[1]    Done                          ( ( $remote $machine[$p]  ...
[1] 12066
 LAPW1 END
[2]    Done                          ( ( $remote $machine[$p]  ...
[2] 12108
 LAPW1 END
[3]    Done                          ( ( $remote $machine[$p]  ...
[3] 12249
 LAPW1 END
[4]    Done                          ( ( $remote $machine[$p]  ...
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
[3]    Done                          ( ( $remote $machine[$p]  ...
[2]  + Done                          ( ( $remote $machine[$p]  ...
[1]  + Done                          ( ( $remote $machine[$p]  ...
[9]  + Done                          ( ( $remote $machine[$p]  ...
[8]  + Done                          ( ( $remote $machine[$p]  ...
[7]  + Done                          ( ( $remote $machine[$p]  ...
[6]  + Done                          ( ( $remote $machine[$p]  ...
[5]  + Done                          ( ( $remote $machine[$p]  ...
     bodesking.uefs.br(3) 7.766u 0.476s 8.26 99.76%      0+0k 0+0io 0pf+0w
     bodesking.uefs.br(3) 7.916u 0.225s 8.18 99.46%      0+0k 0+0io 0pf+0w
     compute-0-0.local(3) 8.529u 0.300s 8.92 98.97%      0+0k 0+0io 0pf+0w
     compute-0-0.local(3) 8.899u 0.185s 9.2 98.74%      0+0k 0+0io 0pf+0w
     compute-0-0.local(3) 8.640u 0.260s 9.00 98.82%      0+0k 0+0io 0pf+0w
     compute-0-0.local(3) 8.335u 0.249s 8.90 96.35%      0+0k 0+0io 0pf+0w
     compute-0-1.local(3) 10.687u 0.250s 11.08 98.69%      0+0k 0+0io 0pf+0w
     compute-0-1.local(3) 10.632u 0.294s 11.03 98.99%      0+0k 0+0io 0pf+0w
     compute-0-1.local(3) 10.708u 0.206s 11.07 98.51%      0+0k 0+0io 0pf+0w
     compute-0-1.local(3) 10.573u 0.310s 11.18 97.27%      0+0k 0+0io 0pf+0w
     bodesking.uefs.br(3) 7.794u 0.343s 8.19 99.35%      0+0k 0+0io 0pf+0w
     bodesking.uefs.br(3) 8.336u 0.209s 8.59 99.48%      0+0k 0+0io 0pf+0w
   Summary of lapw1para:
   bodesking.uefs.br     k=12    user=31.812     wallclock=2391.25
   compute-0-0.local     k=12    user=34.403     wallclock=2554.08
   compute-0-1.local     k=12    user=42.6       wallclock=3055.06
0.272u 0.446s 0:22.32 3.1%      0+0k 0+0io 0pf+0w
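As a consistency check on the output above, a short Python sketch that rebuilds the `k=` and `user=` columns of the lapw1para summary by summing the per-job timing lines. It assumes the timing-line layout shown here (`host(nkpoints) user_seconds u ...`); the regular expression, `SAMPLE_LOG` and `summarize` are my own illustrative names, not part of WIEN2k, and the sketch does not try to reproduce the `wallclock=` column.

```python
from __future__ import annotations

import re
from collections import defaultdict

# host(nkpoints) followed by the user time, e.g. "compute-0-0.local(3) 8.529u"
LINE_RE = re.compile(r"^\s*(\S+)\((\d+)\)\s+([\d.]+)u")

SAMPLE_LOG = """\
     bodesking.uefs.br(3) 7.766u 0.476s 8.26 99.76%
     bodesking.uefs.br(3) 7.916u 0.225s 8.18 99.46%
     compute-0-0.local(3) 8.529u 0.300s 8.92 98.97%
     compute-0-0.local(3) 8.899u 0.185s 9.2 98.74%
     compute-0-0.local(3) 8.640u 0.260s 9.00 98.82%
     compute-0-0.local(3) 8.335u 0.249s 8.90 96.35%
     compute-0-1.local(3) 10.687u 0.250s 11.08 98.69%
     compute-0-1.local(3) 10.632u 0.294s 11.03 98.99%
     compute-0-1.local(3) 10.708u 0.206s 11.07 98.51%
     compute-0-1.local(3) 10.573u 0.310s 11.18 97.27%
     bodesking.uefs.br(3) 7.794u 0.343s 8.19 99.35%
     bodesking.uefs.br(3) 8.336u 0.209s 8.59 99.48%
"""


def summarize(log: str) -> dict[str, tuple[int, float]]:
    """Total k-points and user seconds per host from the timing lines."""
    kpts: dict[str, int] = defaultdict(int)
    user: dict[str, float] = defaultdict(float)
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:  # summary and job-control lines do not match
            host = m.group(1)
            kpts[host] += int(m.group(2))
            user[host] += float(m.group(3))
    # round to the 3 decimals printed by lapw1para
    return {h: (kpts[h], round(user[h], 3)) for h in kpts}


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
    # {'bodesking.uefs.br': (12, 31.812), 'compute-0-0.local': (12, 34.403), 'compute-0-1.local': (12, 42.6)}
```

The totals agree with the `k=12` and `user=` values in the summary above for all three hosts.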


Nilton
-- 
Nilton S. Dantas
Universidade Estadual de Feira de Santana
Departamento de Ciências Exatas
Área de Informática
Av. Transnordestina, S/N, Bairro Novo Horizonte
CEP 44036900 - Feira de Santana, Bahia, Brasil
Tel./Fax +55 75 31618086
http://www2.ecomp.uefs.br/ <http://www.uefs.br/portal>