[Wien] [please pay attention] query for mpi job file

Dr. K. C. Bhamu kcbhamu85 at gmail.com
Thu Jan 19 09:23:30 CET 2017


Thank you very much, Prof. Lyudmila.
Please see my updated, shortened query below.


> I do not use mpi, only simple parallelization over k-points, so I will
> answer only some of your questions.
> >     (1) is it ok with mpiifort or mpicc or it should have mpifort or
> mpicc??
>
> I do not know and I even do not understand the question.
>

I compiled WIEN2k_16 with mpiifort and mpiicc, so my question is whether
mpiifort and mpiicc are correct or whether I should use mpifort and mpicc
(note the double "i").
I hope the question is now clearly framed.
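
To make the question concrete, here is a minimal Python sketch that lists
which of the four wrapper names are available on PATH. My understanding,
which nobody in this thread has confirmed, is that mpiifort/mpiicc are the
Intel MPI wrappers around the Intel compilers, while mpifort/mpicc are the
generic wrapper names (for example from Open MPI or MPICH). The function
name find_wrappers is only illustrative.

```python
import shutil

# Wrapper names from the question. Assumption: mpiifort/mpiicc belong to
# Intel MPI, mpifort/mpicc are the generic wrapper names.
WRAPPERS = ["mpiifort", "mpiicc", "mpifort", "mpicc"]


def find_wrappers(names=WRAPPERS):
    """Return {wrapper name: full path, or None if it is not on PATH}."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_wrappers().items():
        print(f"{name:10s} {path or 'not found'}")
```

Whichever pair is found should match the MPI library that the rest of the
build links against; the sketch only reports what is installed.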


>
> >     (2) how to know that job is running with mpi parallelization?
>
> IMHO, the simplest way is from dayfile:
>

It is a good idea to check case.dayfile.


>     cycle 1     (Ср. сент. 21 21:59:09 SAMT 2016)       (60/99 to go)
> >   lapw0 -p    (21:59:09) starting parallel lapw0 at Ср. сент. 21
> 21:59:09 SAMT 2016
> -------- .machine0 : processors
> running lapw0 in single mode  <-----***this is no mpi--)
> 10.221u 0.064s 0:10.35 99.3%    0+0k 0+28016io 0pf+0w
> >   lapw1  -up -p    -c         (21:59:19) starting parallel lapw1 at Ср.
> сент. 21 21:59:19 SAMT 2016
> ->  starting parallel LAPW1 jobs at Ср. сент. 21 21:59:19 SAMT 2016
> running LAPW1 in parallel mode (using .machines) <---***this is k-point
> parallel.--)
> 9 number_of_parallel_jobs <-----***this is k-point parallel.--)
> localhost(12) 131.805u 1.038s 2:13.24 99.6% 0+0k 0+94072io 0pf+0w
> ...
> localhost(12) 122.034u 1.234s 2:03.67 99.6% 0+0k 0+81472io 0pf+0w
>    Summary of lapw1para: <------***this is k-point parallel.--)
>


Thank you very much for the detailed answer.
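
The dayfile check quoted above could be automated roughly as follows. This
is only a sketch: it relies solely on the strings visible in the quoted
dayfile ("running ... in single mode", "running ... in parallel mode" and
"number_of_parallel_jobs"), and the function names are my own. Note that
"parallel mode" by itself does not say whether the run is k-point parallel
or mpi parallel.

```python
import re


def classify_dayfile(text):
    """Map each program name to 'single' or 'parallel' as the dayfile reports it."""
    modes = {}
    for line in text.splitlines():
        m = re.search(r"running\s+(\w+)\s+in\s+(single|parallel)\s+mode", line)
        if m:
            modes[m.group(1).lower()] = m.group(2)
    return modes


def parallel_job_count(text):
    """Return the number printed before 'number_of_parallel_jobs', or None."""
    m = re.search(r"(\d+)\s+number_of_parallel_jobs", text)
    return int(m.group(1)) if m else None


# Lines copied from the dayfile excerpt quoted in this message.
sample = """\
running lapw0 in single mode
running LAPW1 in parallel mode (using .machines)
9 number_of_parallel_jobs
"""

if __name__ == "__main__":
    print(classify_dayfile(sample))
    print(parallel_job_count(sample))
```

For a real run, the text would be read from case.dayfile instead of the
sample string.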


>
> >     the *.err file seems as:
>
>>     cp: cannot stat `CuGaO2.scfdmup': No such file or directory      >>>
>>
> I don't know, and I am afraid nobody knows without info
>

This is not a problem; it is the default behaviour for the runsp_c_lapw case,
set by Prof. Peter to save computational time. I found this in a
three-year-old answer by Prof. Peter.


> Mond. Sept 19 15:10:29 SAMT 2016> (x) lapw1 -up -p -c
> Mond. Sept 19 15:12:52 SAMT 2016> (x) lapw1 -dn -p -c
> Mond. Sept 19 15:15:09 SAMT 2016> (x) lapw2 -up -p -c ...


Okay, because you are running run_lapw -c case.

>>     (3) I want to know how to change the variables below in the job file
>>     so that the mpi run is more efficient
>>     # the following number / 4 = number of nodes
>>     #$ -pe mpich 32
>>     set mpijob=1                        ??
>>     set jobs_per_node=4                    ??
>>     #### the definition above requests 32 cores and we have 4 cores /node.
>>     #### We request only k-point parallel, thus mpijob=1
>>     #### the resulting machines names are in $TMPDIR/machines
>>     setenv OMP_NUM_THREADS 1    ???????
>>
>
> I don't know.
>


Okay, maybe someone else can look into this.
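
For reference, the arithmetic in the job-file comments (32 requested cores,
4 cores per node) can be sketched in Python as below. The host names are
hypothetical, and the "1:host" lines reflect my assumption about how a
k-point-parallel .machines file is laid out (one line per k-point job); how
the job script itself uses mpijob and jobs_per_node is not shown here, so
this is not a statement of what the script does.

```python
def node_count(total_cores, cores_per_node):
    """Number of nodes implied by '#$ -pe mpich N' with a fixed core count per node."""
    if total_cores % cores_per_node != 0:
        raise ValueError("total_cores must be a multiple of cores_per_node")
    return total_cores // cores_per_node


def kpoint_machines(hosts, jobs_per_node):
    """Assumed k-point-parallel layout: one '1:host' line per job on each host."""
    return [f"1:{host}" for host in hosts for _ in range(jobs_per_node)]


if __name__ == "__main__":
    print(node_count(32, 4))  # 32 cores / 4 cores per node
    # Hypothetical host names, for illustration only.
    for line in kpoint_machines(["node01", "node02"], 4):
        print(line)
```

With 32 cores and 4 cores per node this gives 8 nodes, which matches the
comment "the following number / 4 = number of nodes".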


>
>>     (4) The jobs with 32 cores and with 64 cores (with "set mpijob=2")
>> take approximately equal time for the scf cycles.
>>
>
> From your log file it looks like you do not have any parallelization, so
> in both cases you have equal time.
>

Yes, that may be the case. But if I use "set mpijob=1", then the k-point
parallelization runs well.



Thank you very much.


Sincerely

Bhamu

