[Wien] [please pay attention] query for mpi job file
Dr. K. C. Bhamu
kcbhamu85 at gmail.com
Thu Jan 19 09:23:30 CET 2017
Thank you very much Prof. Lyudmila
Please see my updated reduced query.
> I do not use mpi, only simple parallelization over k-points, so I will
> answer only some of your questions.
> > (1) is it ok with mpiifort or mpicc or it should have mpifort or
> mpicc??
>
> I do not know and I even do not understand the question.
>
I compiled WIEN2k_16 with mpiifort and mpiicc, so my question is whether
mpiifort and mpiicc are correct or whether I should use mpifort and mpicc
(note the double "i").
I hope the question is now clearly framed.
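One way to see what each wrapper really calls is to ask the wrapper itself. The sketch below is only an illustration: the helper name show_wrapper is made up, and it assumes the installed wrappers accept the "-show" option (MPICH-derived wrappers such as Intel MPI's normally do). With Intel MPI the double-"i" wrappers are usually the ones tied to the Intel compilers, but that is worth verifying on the actual cluster.

```shell
# Report whether an MPI compiler wrapper is on PATH and which
# underlying compiler command it would run.
# Assumption: the wrapper understands "-show" (MPICH-style wrappers do).
show_wrapper() {
    # $1 = wrapper name, e.g. mpiifort
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: ' "$1"
        "$1" -show 2>/dev/null || echo "(no -show support)"
    else
        echo "$1: not found"
    fi
}

# Check the four wrapper names discussed in this thread.
for w in mpiifort mpiicc mpifort mpicc; do
    show_wrapper "$w"
done
```

If mpiifort reports ifort and mpifort reports gfortran, the two sets of wrappers are not interchangeable for a build that was configured for the Intel compilers.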
>
> > (2) how to know that job is running with mpi parallelization?
>
> IMHO, the simplest way is from dayfile:
>
It is a good idea to check case.dayfile.
> cycle 1 (Wed Sep 21 21:59:09 SAMT 2016) (60/99 to go)
> > lapw0 -p (21:59:09) starting parallel lapw0 at Wed Sep 21
> 21:59:09 SAMT 2016
> -------- .machine0 : processors
> running lapw0 in single mode <-----***this is no mpi--)
> 10.221u 0.064s 0:10.35 99.3% 0+0k 0+28016io 0pf+0w
> > lapw1 -up -p -c (21:59:19) starting parallel lapw1 at Wed
> Sep 21 21:59:19 SAMT 2016
> -> starting parallel LAPW1 jobs at Wed Sep 21 21:59:19 SAMT 2016
> running LAPW1 in parallel mode (using .machines) <---***this is k-point
> parallel.--)
> 9 number_of_parallel_jobs <-----***this is k-point parallel.--)
> localhost(12) 131.805u 1.038s 2:13.24 99.6% 0+0k 0+94072io 0pf+0w
> ...
> localhost(12) 122.034u 1.234s 2:03.67 99.6% 0+0k 0+81472io 0pf+0w
> Summary of lapw1para: <------***this is k-point parallel.--)
>
Thank you very much for the detailed answer.
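The dayfile check can be reduced to a single grep. This is a minimal sketch: the function name dayfile_modes is hypothetical, and the patterns are taken only from the dayfile excerpt quoted above ("running ... in single mode", "running ... in parallel mode", "number_of_parallel_jobs"). The wording printed by a real mpi run is not shown in this thread, so the pattern does not try to match it.

```shell
# Print the dayfile lines that report how each program was executed.
# Patterns come from the excerpt quoted in this thread only.
dayfile_modes() {
    # $1 = path to the dayfile, e.g. CuGaO2.dayfile
    grep -E 'running .* in (single|parallel) mode|number_of_parallel_jobs' "$1"
}

# Example: dayfile_modes CuGaO2.dayfile
```

If lapw0 shows up only as "single mode" and lapw1 only as "parallel mode (using .machines)", the run matches the k-point-parallel case marked in the quoted excerpt.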
>
> > the *.err file seems as:
>
>> cp: cannot stat `CuGaO2.scfdmup': No such file or directory >>>
>>
> I don't know, and I am afraid nobody knows without info
>
This is not a problem. It is the default behaviour for the runsp_c_lapw
case, set by Prof. Peter to save computational time. I found this in a
three-year-old answer by Prof. Peter.
> Mond. Sept 19 15:10:29 SAMT 2016> (x) lapw1 -up -p -c
> Mond. Sept 19 15:12:52 SAMT 2016> (x) lapw1 -dn -p -c
> Mond. Sept 19 15:15:09 SAMT 2016> (x) lapw2 -up -p -c ...
Okay, because you are running run_lapw -c case.
>> (3) I want to know how to change the variables below in the job file so
>> that the mpi run is more efficient
>> # the following number / 4 = number of nodes
>> #$ -pe mpich 32
>> set mpijob=1 ??
>> set jobs_per_node=4 ??
>> #### the definition above requests 32 cores and we have 4 cores /node.
>> #### We request only k-point parallel, thus mpijob=1
>> #### the resulting machines names are in $TMPDIR/machines
>> setenv OMP_NUM_THREADS 1 ???????
>>
>
> I don't know.
>
Okay, maybe someone else can look into this.
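To make the role of mpijob concrete: "-pe mpich 32" requests 32 slots, which at 4 cores per node is 8 nodes, and mpijob decides how many of those slots are grouped into one line of .machines. The sketch below is written in plain sh for illustration (the job file itself is csh) and rests on assumptions: that $TMPDIR/machines lists one hostname per slot, and that a .machines line of the form "1:host host" starts one job on the listed slots. The function name make_machines is hypothetical.

```shell
# Group a list of slots (one hostname per line on stdin) into
# .machines job lines, mpijob slots per line.
# Assumption: "1:hostA hostA" means one job using both slots,
# while "1:hostA" alone is a plain k-point-parallel job.
make_machines() {
    # $1 = mpijob (slots per job line)
    awk -v n="$1" '
        { line = (cnt == 0 ? "1:" $1 : line " " $1); cnt++ }
        cnt == n { print line; cnt = 0 }
        END { if (cnt > 0) print line }
    '
}

# Example: make_machines 2 < "$TMPDIR/machines" > .machines
```

With mpijob=1 every slot gets its own "1:host" line, which is the k-point-only setup described in the job file comments. With mpijob=2 the same 32 slots would give 16 lines of two slots each.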
>
> (4) The job with 32 core and with 64 core (with "set mpijob=2") taking
>> ~equal time for scf cycles.
>>
>
> From your log file it looks like you do not have any parallelization, so
> in both cases you have equal time.
>
Yeah, it may be. But if I use "set mpijob=1" then it runs well for k-point
parallelization.
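When the SCF time does not change between mpijob=1 and mpijob=2, one thing that may be worth ruling out is that the mpi executables are missing from the installation. The sketch below assumes the mpi binaries sit in $WIENROOT and carry an "_mpi" suffix (lapw0_mpi, lapw1_mpi, lapw1c_mpi, lapw2_mpi, lapw2c_mpi). The function name missing_mpi_binaries is hypothetical.

```shell
# List the mpi executables that are NOT present (or not executable)
# in the given directory.
# Assumption: mpi builds are named <program>_mpi inside $WIENROOT.
missing_mpi_binaries() {
    # $1 = directory to inspect, normally "$WIENROOT"
    for b in lapw0_mpi lapw1_mpi lapw1c_mpi lapw2_mpi lapw2c_mpi; do
        [ -x "$1/$b" ] || echo "$b"
    done
}

# Example: missing_mpi_binaries "$WIENROOT"
```

An empty result only shows that the binaries exist. It does not prove that the job actually starts them, so the dayfile remains the place to confirm it.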
Thank you very much.
Sincerely
Bhamu