[Wien] Job distribution problem in MPI+k point parallelization

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Jan 28 07:45:22 CET 2015


Now it is rather clear why you had 8 mpi jobs running previously.

The new definition of WIEN_MPIRUN and also your pbs script seem ok,
and the jobs are now distributed as expected.

I do not know why you get only 50% CPU in this test. Maybe the test
case is not suitable and requires so much communication that the CPUs
cannot run at full speed.

As I said before, a setup with two mpi jobs and two k-parallel jobs on a
4-core machine is a "useless" setup. Parallelization is not a task that
works in an "arbitrary" way; it needs to be adapted to the hardware AND
the physical problem.

Your task is now to compare "timings" and find out the optimal setup for
the specific problem and the available hardware.

Run the same job with a .machines file containing:
1:host:4

or
1:host
1:host
1:host
1:host

or, with setenv OMP_NUM_THREADS 2,
1:host
1:host

and check which run is the fastest.
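
To compare them you can, for example, grep the lapw1 timings after each
run (a rough sketch; it assumes the standard WIEN2k output files of your
case, here called "case"):

grep "TIME HAMILT" case.output1*

or compare the elapsed times reported for lapw1 in case.dayfile. The
setup with the smallest wall time is the one to use for the production
runs.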


Am 28.01.2015 um 04:36 schrieb lung Fermin:
> I have checked with the MPIRUN option. I used
>
>
> setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -hostfile $PBS_NODEFILE _EXEC_"
>
> before. Now I changed the hostfile to _HOSTS_ instead of $PBS_NODEFILE. I can get 4 lapw1_mpi running. However, the CPU usage of each of the jobs is still only 50% (I use
> "top" to check this). Why is this the case? What could I do in order to get a CPU usage of 100%? (OMP_NUM_THREADS=1, and the .machine1 and .machine2 files each contain two
> lines of node1)
>
>
> In the pure MPI case, using the .machines file as
>
> #
> 1:node1 node1 node1 node1
> granularity:1
> extrafine:1
> #
>
> I can get 4 lapw1_mpi running with 100% CPU usage. How shall I understand this situation?
>
>
> The following are some details on the options and the system I used:
>
> 1. Wien2k_14.2, mpif90 (compiled with ifort) for MVAPICH2 version 2.0
>
>
> 2. The batch system is PBS and the script I used for qsub:
>
> #
> #!/bin/tcsh
> #PBS -l nodes=1:ppn=4
> #PBS -l walltime=00:30:00
> #PBS -q node1
> #PBS -o wien2k_output
> #PBS -j oe
>
> cd $PBS_O_WORKDIR
> limit vmemoryuse unlimited
>
> # set how many cores to be used for each mpi job
> set mpijob=2
> set proclist=`cat $PBS_NODEFILE`
> echo $proclist
> set nproc=$#proclist
> echo number of processors: $nproc
>
> #---------- writing .machines file ------------------
> echo '#' > .machines
>
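> # split the processor list into chunks of $mpijob hosts; each chunk
> # becomes one '1:host host ...' line, i.e. one k-parallel job running
> # lapw1_mpi on $mpijob cores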
>   set i=1
>   while ($i <= $nproc )
>   echo -n '1:' >>.machines
>    @ i1 = $i + $mpijob
>    @ i2 = $i1 - 1
>    echo $proclist[$i-$i2] >>.machines
>   set i=$i1
>   end
> echo 'granularity:1' >>.machines
> echo 'extrafine:1' >>.machines
> # --------- end of .machines file
>
> run_lapw -p -i 40 -cc 0.0001 -ec 0.00001
> ###
>
> 3. The .machines file:
> #
> 1:node1 node1
> 1:node1 node1
> granularity:1
> extrafine:1
>
> and .machine1 and .machine2 files are both
> node1
> node1
>
>
> 4. The parallel_options:
> setenv TASKSET "no"
> setenv USE_REMOTE 1
> setenv MPI_REMOTE 0
> setenv WIEN_GRANULARITY 1
> setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"
>
>
> 5. The compiling options:
>
> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback
> current:FFTW_OPT:-DFFTW3 -I/usr/local/include
> current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/usr/local/lib
> current:LDFLAGS:$(FOPT) -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
> current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)
> current:MPIRUN:/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_
> current:MKL_TARGET_ARCH:intel64
>
>
> Thanks,
>
> Fermin
>
> -----Original Message-----
> From: wien-bounces at zeus.theochem.tuwien.ac.at On Behalf Of Peter Blaha
> Sent: Tuesday, January 27, 2015 11:55 PM
> To: A Mailing list for WIEN2k users
> Subject: Re: [Wien] Job distribution problem in MPI+k point parallelization
>
> It should actually be only 4 lapw1_mpi jobs running with this setup.
>
> How did you find this: using "top" or ps  ???
>
> Do you have thread-parallelization on? (OMP_NUM_THREADS=2 ???) Then it doubles the processes (but you gain nothing ...)
>
> It could also be that your mpirun definition is not ok with respect to your version of mpi, ...
>
> PS: I hope it is clear that such a setup is useful only for testing:
>
> the mpi program on 2 cores is "slower, or at least not faster" than the sequential program on 1 core.
>
> On 01/27/2015 04:41 PM, lung Fermin wrote:
>
>> Dear Wien2k community,
>>
>> Recently, I am trying to set up a calculation of a system with ~40
>> atoms using MPI + k-point parallelization. Suppose, in one single node, I
>> want to calculate 2 k-points, with each k-point using 2 processors to
>> run mpi in parallel. The .machines file I used was
>> #
>> 1:node1 node1
>> 1:node1 node1
>> granularity:1
>> extrafine:1
>> #
>>
>> When I ssh into node1, I saw that there were 8 lapw1_mpi running, each
>> with a CPU usage of 50%. Is this natural or have I done something wrong?
>> What I expected was having 4 lapw1_mpi running, each with a CPU usage of
>> 100% instead. I am a newbie to mpi parallelization. Please point it out
>> if I have misunderstood anything.
>>
>> Thanks in advance,
>> Fermin
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>

-- 
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------

