[Wien] Job distribution problem in MPI+k point parallelization
lung Fermin
ferminlung at gmail.com
Wed Jan 28 04:36:24 CET 2015
I have checked with the MPIRUN option. I used
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -hostfile $PBS_NODEFILE _EXEC_"
before. Now I changed the hostfile to _HOSTS_ instead of $PBS_NODEFILE. I
can get 4 lapw1_mpi running. However, the CPU usage of each job is
still only 50% (I use "top" to check this). Why is this the case? What
could I do to get 100% CPU usage? (OMP_NUM_THREADS=1, and the .machine1
and .machine2 files each contain two lines of node1.)
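One guess on my side (untested, just an assumption): MVAPICH2 enables
CPU core binding by default, so the two concurrent 2-core mpirun jobs
might both be pinned to the same two cores, which would match the
constant 50% in "top". If that is the cause, disabling the binding in
the job script before run_lapw should help:

# assumption: MVAPICH2's default core pinning makes the two concurrent
# lapw1_mpi jobs share cores; turn it off so they can spread out
setenv MV2_ENABLE_AFFINITY 0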
In the pure MPI case, using the .machines file as
#
1:node1 node1 node1 node1
granularity:1
extrafine:1
#
I can get 4 lapw1_mpi running with 100% CPU usage. How shall I understand
this situation?
The following are some details on the options and the system I used:
1. Wien2k_14.2, mpif90 (compiled with ifort) for MVAPICH2 version 2.0
2. The batch system is PBS and the script I used for qsub:
#
#!/bin/tcsh
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
#PBS -q node1
#PBS -o wien2k_output
#PBS -j oe
cd $PBS_O_WORKDIR
limit vmemoryuse unlimited
#set how many cores to be used for each mpi job
set mpijob=2
set proclist=`cat $PBS_NODEFILE `
echo $proclist
set nproc=$#proclist
echo number of processors: $nproc
#---------- writing .machines file ------------------
echo '#' > .machines
set i=1
while ($i <= $nproc )
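# each pass writes one '1:host1 host2 ...' line to .machines, taking
# $mpijob consecutive entries from the $proclist array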
echo -n '1:' >>.machines
@ i1 = $i + $mpijob
@ i2 = $i1 - 1
echo $proclist[$i-$i2] >>.machines
set i=$i1
end
echo 'granularity:1' >>.machines
echo 'extrafine:1' >>.machines
# --------- end of .machines file
run_lapw -p -i 40 -cc 0.0001 -ec 0.00001
###
3. The .machines file:
#
1:node1 node1
1:node1 node1
granularity:1
extrafine:1
and .machine1 and .machine2 files are both
node1
node1
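For reference, if I read the parallel scripts correctly, WIEN2k
substitutes _NP_, _HOSTS_ and _EXEC_ separately for each k-point group,
so the first group should effectively be started as something like
(hand-expanded for illustration, not copied from an actual log):

/usr/local/mvapich2-icc/bin/mpirun -np 2 -hostfile .machine1 $WIENROOT/lapw1_mpi lapw1_1.def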
4. The parallel_options:
setenv TASKSET "no"
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"
5. The compiling options:
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback
current:FFTW_OPT:-DFFTW3 -I/usr/local/include
current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/usr/local/lib
current:LDFLAGS:$(FOPT) -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)
current:MPIRUN:/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_
current:MKL_TARGET_ARCH:intel64
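A side note on these options (again an assumption, not verified):
R_LIBS links the threaded MKL (-lmkl_intel_thread -openmp), so each
lapw1_mpi can spawn additional MKL threads. To rule that out I now set
both thread counts explicitly in the job script:

# keep OpenMP and threaded MKL from oversubscribing the cores
setenv OMP_NUM_THREADS 1
setenv MKL_NUM_THREADS 1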
Thanks,
Fermin
*-------------------------------------------------------------*
-----Original Message-----
From: wien-bounces at zeus.theochem.tuwien.ac.at On Behalf Of Peter Blaha
Sent: Tuesday, January 27, 2015 11:55 PM
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] Job distribution problem in MPI+k point parallelization
It should actually be only 4 lapw1_mpi jobs running with this setup.
How did you find this: using "top" or ps ???
Do you have thread-parallelization on? (OMP_NUM_THREADS=2 ???) Then it
doubles the processes (but you gain nothing ...)
It could also be that your mpirun definition is not ok with respect to
your version of mpi, ...
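To see whether you really have 8 separate MPI processes or 4 processes
with 2 threads each, you can check with standard Linux tools (nothing
WIEN2k-specific):

ps -eLf | grep lapw1_mpi     # one line per thread (LWP column)
top -H                       # per-thread view instead of per-process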
PS: I hope it is clear that such a setup is useful only for testing.
The mpi-program on 2 cores is "slower/at least not faster" than the
sequential program on 1 core.
On 01/27/2015 04:41 PM, lung Fermin wrote:
> Dear Wien2k community,
>
> Recently, I am trying to set up a calculation of a system with ~40
> atoms using MPI+k point parallelization. Suppose in one single node, I
> want to calculate 2 k points, with each k point using 2 processors to
> run mpi parallel. The .machines file I used was:
> #
> 1:node1 node1
> 1:node1 node1
> granularity:1
> extrafine:1
> #
>
> When I ssh into node1, I saw that there were 8 lapw1_mpi running, each
> with CPU usage of 50%. Is this natural or have I done something wrong?
> What I expect was having 4 lapw1_mpi running each with CPU usage of
> 100% instead. I am a newbie to mpi parallelization. Please point it
> out if I have misunderstood anything.
>
> Thanks in advance,
> Fermin