[Wien] query for mpi job file

Dr. K. C. Bhamu kcbhamu85 at gmail.com
Tue Jan 17 18:20:52 CET 2017


Dear Experts

I have just installed WIEN2k_16 on an SGE cluster (linuxifc) with 40 nodes;
each node has 16 cores with 4 GB RAM per core (~2 GB/processor) and a 40
Gbps InfiniBand interconnect. I compiled with the "mpiifort" and "mpiicc"
compilers against the ScaLAPACK, BLAS, FFTW3 and BLACS libraries (without
ELPA and LIBXC-3.0.0). During configuration I also set the number of cores
per node to 16 (the compiler options are listed at the bottom of this email).

I have now submitted the job using the SGE script:

http://susi.theochem.tuwien.ac.at/reg_user/faq/sge.job

with set mpijob=2 instead of set mpijob=1.
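
As far as I understand (please correct me if I am wrong), with mpijob=2 the
script should write a .machines file in which every k-point job becomes an
MPI job over two hosts, roughly of this form (the node names are only
placeholders):

      granularity:1
      1:node01 node02
      1:node03 node04
      lapw0:node01 node02 node03 node04

and so on for the remaining hosts.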


I specified
      PARAMETER          (NMATMAX=   19000)
      PARAMETER          (NUME=   6000)
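
For context, the rough memory estimate behind these values (please correct
me if the reasoning is wrong) is that one full Hamiltonian or overlap
matrix of dimension NMATMAX needs about

      19000 x 19000 x 16 bytes  ~  5.8 GB   (complex case)
      19000 x 19000 x  8 bytes  ~  2.9 GB   (real case)

so in the complex case a single matrix would not fit into the ~4 GB per
core, and my assumption was that ScaLAPACK/MPI distributes it over several
cores.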

Now I have a few queries:
(1) Is it OK to use mpiifort and mpiicc, or should it be mpifort and mpicc?
(2) How can I tell whether the job is actually running with MPI
parallelization?
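
My naive idea for checking (2) was to inspect the generated .machines file
and, while the cycle is running, to check on one of the allocated nodes
whether the MPI binaries are the ones executing (please correct me if this
is not the proper way), e.g.

      cat .machines
      ssh <some_allocated_node> 'ps -ef | grep lapw1'

expecting to see lapw1_mpi / lapw1c_mpi processes for an MPI-parallel run,
but I am not sure whether this is the proper check.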


The basic outputs are:

[bhamu@gu CuGaO2]$ testpara1_lapw
.processes: No such file or directory.
(standard_in) 1: syntax error

#####################################################
#                     TESTPARA1                     #
#####################################################

Tue Jan 17 22:14:57 IST 2017

   lapw1para was not yet executed

The *.err file looks like this:
LAPW0 END
 ORB   END
 ORB   END
 LAPW1 END
 LAPW2 END
cp: cannot stat `CuGaO2.scfdmup': No such file or directory      >>> why is
this an error? How can I overcome it?
 CORE  END
 CORE  END
 MIXER END

The :log file shows:

Tue Jan 17 22:16:14 IST 2017> (x) lapw0
Tue Jan 17 22:16:17 IST 2017> (x) orb -up
Tue Jan 17 22:16:17 IST 2017> (x) orb -dn
Tue Jan 17 22:16:17 IST 2017> (x) lapw1 -up -orb
Tue Jan 17 22:17:26 IST 2017> (x) lapw2 -up -orb
Tue Jan 17 22:17:44 IST 2017> (x) lcore -up
Tue Jan 17 22:17:44 IST 2017> (x) lcore -dn
Tue Jan 17 22:17:45 IST 2017> (x) mixer -orb


(3) I want to know how to change the variables below in the job file so
that the MPI run is more effective (my tentative guess is sketched after
the snippet).

# the following number / 4 = number of nodes
#$ -pe mpich 32
set mpijob=1                        ??
set jobs_per_node=4                    ??

#### the definition above requests 32 cores and we have 4 cores /node.
#### We request only k-point parallel, thus mpijob=1
#### the resulting machines names are in $TMPDIR/machines

setenv OMP_NUM_THREADS 1    ???????
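
My tentative adaptation for our 16-core nodes would be something like the
following (the numbers are only my guess, please correct me):

# with 16 cores per node, requesting 32 slots should mean 2 full nodes
#$ -pe mpich 32

# one k-point job per node, each running as a 16-process MPI job
set mpijob=16
set jobs_per_node=1

# rely on MPI only, no additional OpenMP threading on top of it
setenv OMP_NUM_THREADS 1

(or, if pure k-point parallelization turns out to be faster here, keep
mpijob=1 and set jobs_per_node=16 instead).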


(4) The job with 32 cores and the job with 64 cores (with "set mpijob=2")
take roughly the same time per SCF cycle.



The other compiler options are set as follows:


   Recommended options for system linuxifc are:

         RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64
-lmkl_blacs_intelmpi_lp64 $(R_LIBS)
         FPOPT(par.comp.options): -O1 -FR -mp1 -w -prec_div -pc80 -pad
-ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
         MPIRUN command         : mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

   Current settings:

         FFTW_LIB + FFTW_OPT    : -lfftw3_mpi -lfftw3 -L/usr/local/lib
 +  -DFFTW3 -I/usr/local/include (already set)
         ELPA_LIB + ELPA_OPT    :   +   (already set)
     RP  RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64
-lmkl_blacs_intelmpi_lp64 $(R_LIBS)
     FP  FPOPT(par.comp.options): -O1 -FR -mp1 -w -prec_div -pc80 -pad
-ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
     MP  MPIRUN command         : mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
     CN  CORES_PER_NODE         : 16


If any other supporting information is needed, please let me know.


Sincerely

Bhamu