[Wien] I still have problem with wienk in parallel mode
Nilton
nilton.dantas at gmail.com
Wed Dec 28 22:33:45 CET 2011
Dear L. Marks
Thanks a lot for the answer. My comments are inline below.
2011/12/27 Laurence Marks <L-marks at northwestern.edu>
> It is hard to know as you have not provided us with enough
> information, so we can only guess. Most likely is that you have setup
> the problem wrong, for instance bad RMTs, bad case.in1c or other. Read
> the file lapw1.error to see if it has anything, and also the various
> output files. Beyond this:
>
The setup is correct, because I can run WIEN2k in the sequential version.
>
> a) Did you compile the mpi versions? If not, then what you are using
> will not work. There are two ways to run Wien2k in parallel, one uses
> mpi and is needed for big jobs, the other does not use mpi and is
> often simpler for small jobs.
>
Yes, I am using wien2k10.1. I tried to compile wien2k11, but I got some
errors in the lapw2(c)_mpi compilation, so I gave up.
> b) Edit parallel_options and put "setenv debug 1" in (remove it later)
> then do "x lapw1 -p" from the terminal. This will give you more
> output.
>
I did. It seems OK, but it is running in single mode. Please see the output below.
> c) Check that you have ssh enabled to the compute nodes (I don't think
> you need the .local at the end)
>
My ssh is working. I can log on the nodes of my cluster.
>
> A comment. You have setup your .machines file to run 5 tasks for
> lapw1, each using 4 cpu's. Some mpi versions are not smart and with
> what you have will run both tasks on compute-0-0 using the same cores.
>
granularity:1
1:bodesking.uefs.br:1
1:bodesking.uefs.br:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
With this file, if I type run_lapw -p, I get 11 processes for lapw1, and 2 on
all the computers listed, but no lapw1_mpi or lapw2_mpi. This is the point: how
can I set up .machines in order to run WIEN2k with the MPI libraries? Below you
can see the contents of my parallel_options file:
setenv USE_REMOTE 1
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
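
To make the .machines listing above easier to reason about, here is a minimal Python sketch that counts what the file requests. It assumes the `weight:host:nprocs` layout described in the WIEN2k user guide, where a job line that names more than one process in total (for example `1:compute-0-0.local:4`) asks for an MPI job and a line with a single process asks for a plain k-point-parallel job. The function name `parse_machines` is hypothetical and is not part of WIEN2k.

```python
# Sketch only: classify .machines entries, assuming "weight:host:nprocs".
MACHINES = """\
granularity:1
1:bodesking.uefs.br:1
1:bodesking.uefs.br:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-0.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
1:compute-0-1.local:1
"""


def parse_machines(text):
    """Return one (hosts, total_processes) tuple per k-point job line."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(":")
        if not key.isdigit():
            # Keywords such as granularity or lapw0 are not k-point job lines.
            continue
        hosts, total = [], 0
        for entry in rest.split():
            host, _, count = entry.partition(":")
            hosts.append(host)
            total += int(count) if count else 1
        jobs.append((hosts, total))
    return jobs


jobs = parse_machines(MACHINES)
# Under the stated assumption, only lines with more than one process use MPI.
mpi_jobs = [job for job in jobs if job[1] > 1]
print(len(jobs), "k-point jobs,", len(mpi_jobs), "with more than one process")
```

Under that assumption, every line here asks for a single process, so the file describes ten independent k-point jobs and no MPI jobs.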
------------------------------
The output of x lapw0 -p and x lapw1 -p:
[nilton@bodesking case]$ x lapw0 -p
starting parallel lapw0 at Wed Dec 28 18:29:26 BRT 2011
-------- .machine0 : processors
running lapw0 in single mode
LAPW0 END
14.599u 0.400s 0:15.01 99.8% 0+0k 0+0io 0pf+0w
[nilton@bodesking case]$ x lapw1 -p
starting parallel lapw1 at Wed Dec 28 18:29:46 BRT 2011
-> starting parallel LAPW1 jobs at Wed Dec 28 18:29:46 BRT 2011
running LAPW1 in parallel mode (using .machines)
10 number_of_parallel_jobs
[1] 11587
[2] 11724
[3] 11856
[4] 11887
[5] 11917
[6] 11944
[7] 11976
[8] 12002
[9] 12033
LAPW1 END
[1] Done ( ( $remote $machine[$p] ...
[1] 12066
LAPW1 END
[2] Done ( ( $remote $machine[$p] ...
[2] 12108
LAPW1 END
[3] Done ( ( $remote $machine[$p] ...
[3] 12249
LAPW1 END
[4] Done ( ( $remote $machine[$p] ...
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
[3] Done ( ( $remote $machine[$p] ...
[2] + Done ( ( $remote $machine[$p] ...
[1] + Done ( ( $remote $machine[$p] ...
[9] + Done ( ( $remote $machine[$p] ...
[8] + Done ( ( $remote $machine[$p] ...
[7] + Done ( ( $remote $machine[$p] ...
[6] + Done ( ( $remote $machine[$p] ...
[5] + Done ( ( $remote $machine[$p] ...
bodesking.uefs.br(3) 7.766u 0.476s 8.26 99.76% 0+0k 0+0io 0pf+0w
bodesking.uefs.br(3) 7.916u 0.225s 8.18 99.46% 0+0k 0+0io 0pf+0w
compute-0-0.local(3) 8.529u 0.300s 8.92 98.97% 0+0k 0+0io 0pf+0w
compute-0-0.local(3) 8.899u 0.185s 9.2 98.74% 0+0k 0+0io 0pf+0w
compute-0-0.local(3) 8.640u 0.260s 9.00 98.82% 0+0k 0+0io 0pf+0w
compute-0-0.local(3) 8.335u 0.249s 8.90 96.35% 0+0k 0+0io 0pf+0w
compute-0-1.local(3) 10.687u 0.250s 11.08 98.69% 0+0k 0+0io 0pf+0w
compute-0-1.local(3) 10.632u 0.294s 11.03 98.99% 0+0k 0+0io 0pf+0w
compute-0-1.local(3) 10.708u 0.206s 11.07 98.51% 0+0k 0+0io 0pf+0w
compute-0-1.local(3) 10.573u 0.310s 11.18 97.27% 0+0k 0+0io 0pf+0w
bodesking.uefs.br(3) 7.794u 0.343s 8.19 99.35% 0+0k 0+0io 0pf+0w
bodesking.uefs.br(3) 8.336u 0.209s 8.59 99.48% 0+0k 0+0io 0pf+0w
Summary of lapw1para:
bodesking.uefs.br k=12 user=31.812 wallclock=2391.25
compute-0-0.local k=12 user=34.403 wallclock=2554.08
compute-0-1.local k=12 user=42.6 wallclock=3055.06
0.272u 0.446s 0:22.32 3.1% 0+0k 0+0io 0pf+0w
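
As a cross-check of the summary above, the per-job lines can be added up per host. The Python sketch below does only that arithmetic on the numbers copied from the output. Reading the `(3)` after each host name as the number of k-points handled by that job is an assumption, and `totals_per_host` is a hypothetical helper name.

```python
from collections import defaultdict

# (host, k-points per job [assumed meaning of "(3)"], user seconds),
# copied from the lapw1 output above in the order it was printed.
TIMINGS = [
    ("bodesking.uefs.br", 3, 7.766),
    ("bodesking.uefs.br", 3, 7.916),
    ("compute-0-0.local", 3, 8.529),
    ("compute-0-0.local", 3, 8.899),
    ("compute-0-0.local", 3, 8.640),
    ("compute-0-0.local", 3, 8.335),
    ("compute-0-1.local", 3, 10.687),
    ("compute-0-1.local", 3, 10.632),
    ("compute-0-1.local", 3, 10.708),
    ("compute-0-1.local", 3, 10.573),
    ("bodesking.uefs.br", 3, 7.794),
    ("bodesking.uefs.br", 3, 8.336),
]


def totals_per_host(timings):
    """Sum k-points and user time for each host."""
    kpoints = defaultdict(int)
    user = defaultdict(float)
    for host, k, seconds in timings:
        kpoints[host] += k
        user[host] += seconds
    return {host: (kpoints[host], round(user[host], 3)) for host in user}


totals = totals_per_host(TIMINGS)
for host, (k, seconds) in totals.items():
    print(f"{host} k={k} user={seconds}")
```

The sums agree with the k and user columns of the lapw1para summary: 12 k-points per host, with user times of 31.812, 34.403 and 42.6 seconds. The wallclock column is not reproduced by this sketch.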
Nilton
--
Nilton S. Dantas
Universidade Estadual de Feira de Santana
Departamento de Ciências Exatas
Área de Informática
Av. Transnordestina, S/N, Bairro Novo Horizonte
CEP 44036900 - Feira de Santana, Bahia, Brasil
Tel./Fax +55 75 31618086
http://www2.ecomp.uefs.br/ <http://www.uefs.br/portal>