[Wien] Compiling lapw1_mpi with HP-mpi and MKL

Oleg Rubel rubelo at tbh.net
Thu Jul 9 17:41:54 CEST 2009


My working configuration is mvapich2-1.2 / Intel Fortran 11 / MKL 10. I use SGE for job submission, which takes care of booting and shutting down MPD. My experience with IntelMPI was not good. (I cannot say anything about m(va)pich.) To make Wien2k work with m(va)pich2 on 64-bit machines, a minor modification of lapw0 is necessary, as reported on the mailing list (http://zeus.theochem.tuwien.ac.at/pipermail/wien/2009-March/012271.html). Please find some details below. I am ready to share additional information if necessary.

I hope this will help.

Best regards,

Oleg

--
Oleg Rubel, PhD
Scientist
Thunder Bay Regional Research Institute
290 Munro St
Thunder Bay, ON
P7A  7T1, Canada
Phone: +1-807-7663350
Fax: +1-807-3441948
E-mail: rubelo at tbh.net
Homepage: http://www.tbrri.com/~orubel/


In the submission script, it is important to include the following:
   ### This is important to disable CPU affinity and get 100% CPU usage
   ### ================================================================
   setenv MV2_ENABLE_AFFINITY 0

Further details of the configuration:
[oleg@feynman mvapich2-1.2]$ cat /etc/*release*
CentOS release 5.2 (Final)
[oleg@feynman mvapich2-1.2]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           E5462  @ 2.80GHz
stepping        : 6
cpu MHz         : 2793.004
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5589.78
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:
...
[oleg@feynman WIEN2k_v08.mkl_10_IB_MVAPICH2]$ cat VERSION
WIEN2k_08.3 (Release 18/9/2008)
[oleg@feynman WIEN2k_v08.mkl_10_IB_MVAPICH2]$ cat OPTIONS
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -align -DINTEL_VML
current:FPOPT:$(FOPT)
current:LDFLAGS:$(FOPT) -L/act/intel/mkl/10.1.0.015/lib/em64t -i-static
current:DPARALLEL:'-DParallel'
current:R_LIBS:-Bstatic -lmkl_lapack -lmkl_em64t -lguide -Bdynamic -lpthread
current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_core -lmkl_intel_lp64 -lmkl_solver_lp64_sequential -lmkl_em64t -lmkl_sequential -lmkl_intel_thread -lmkl_lapack -lmkl_cdft -lmkl_cdft_core -lmkl_solver_lp64 -lguide -lpthread
current:MPIRUN:mpiexec -machinefile _HOSTS_ -n _NP_ _EXEC_
[oleg@feynman WIEN2k_v08.mkl_10_IB_MVAPICH2]$ cat parallel_options
setenv USE_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpiexec -machinefile _HOSTS_ -n _NP_ _EXEC_"
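
The _HOSTS_, _NP_ and _EXEC_ tokens in WIEN_MPIRUN are placeholders that get filled in for each MPI job. A minimal Python sketch of that substitution, for illustration only: the function name, the host-file name and the executable string below are hypothetical and are not taken from the WIEN2k scripts.

```python
# Illustration only: expands the placeholders used in WIEN_MPIRUN above.
# expand_mpirun, ".machine1" and "lapw1_mpi lapw1.def" are hypothetical
# example values, not WIEN2k code.

WIEN_MPIRUN = "mpiexec -machinefile _HOSTS_ -n _NP_ _EXEC_"


def expand_mpirun(template, hosts_file, nprocs, executable):
    """Return the command line obtained by filling in the three placeholders."""
    return (
        template.replace("_HOSTS_", hosts_file)
        .replace("_NP_", str(nprocs))
        .replace("_EXEC_", executable)
    )


if __name__ == "__main__":
    print(expand_mpirun(WIEN_MPIRUN, ".machine1", 8, "lapw1_mpi lapw1.def"))
```

Printing the expanded command like this can be a quick way to check that the launcher line matches what the installed MPI expects before submitting a job.
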
[oleg@feynman ~]$ ifort -v
Version 11.0
[oleg@feynman ~]$ env
MKLROOT=/act/intel/mkl/10.1.0.015
MODULE_VERSION_STACK=3.2.6
MANPATH=/act/intel/mkl/10.1.0.015/man:/act/intel/itac/7.2.0.011/man:/act/intel/fc/11.0.074/man:/act/intel/cc/11.0.074/man:/usr/man:/usr/share/man:/usr/local/man:/usr/local/share/man:/usr/X11R6/man:/act/Modules/3.2.6/man
HOSTNAME=feynman.tbrri.com
INTEL_LICENSE_FILE=/home/act/intel/licenses:/home/act/intel/licenses:/act/intel/fc/11.0.074/licenses:/opt/intel/licenses:/home/oleg/intel/licenses:/act/intel/cc/11.0.074/licenses:/opt/intel/licenses:/home/oleg/intel/licenses
TERM=vt100
SHELL=/bin/bash
XCRYSDEN_SCRATCH=/gtmp/xcrysden_scratch
HISTSIZE=1000
SSH_CLIENT=10.67.195.1 1147 22
LIBRARY_PATH=/act/intel/mkl/10.1.0.015/lib/em64t
SGE_CELL=default
FPATH=/act/intel/mkl/10.1.0.015/include
OCTAVE_EXEC_PATH=/act/WIEN2k_v08.mkl_10_IB_MVAPICH2:/act/WIEN2k_v08.mkl_10_IB_MVAPICH2/SRC_structeditor/bin
SSH_TTY=/dev/pts/6
XCRYSDEN_TOPDIR=/act/XCrySDen-1.5.17-bin-semishared
USER=oleg
LD_LIBRARY_PATH=/act/intel/impi/3.2.0.011/lib64:/act/intel/mkl/10.1.0.015/lib/em64t:/act/intel/itac/7.2.0.011/itac/slib_impi3:/act/intel/fc/11.0.074/lib/intel64:/act/intel/cc/11.0.074/lib/intel64
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
CPATH=/act/intel/mkl/10.1.0.015/include
OCTAVE_PATH=/act/WIEN2k_v08.mkl_10_IB_MVAPICH2/SRC_structeditor/bin
NLSPATH=/act/intel/mkl/10.1.0.015/lib/em64t/locale/%l_%t/%N:/act/intel/fc/11.0.074/lib/intel64/locale/%l_%t/%N:/act/intel/cc/11.0.074/idb/intel64/locale/%l_%t/%N:/act/intel/cc/11.0.074/lib/intel64/locale/%l_%t/%N
VT_ADD_LIBS=-ldwarf -lelf -lvtunwind -lnsl -lm -ldl -lpthread
MODULE_VERSION=3.2.6
MAIL=/var/spool/mail/oleg
PATH=/act/XCrySDen-1.5.17-bin-semishared:/act/WIEN2k_v08.mkl_10_IB_MVAPICH2:/act/WIEN2k_v08.mkl_10_IB_MVAPICH2/SRC_structeditor/bin:/act/intel/ictce/3.2.0.020/bin:/act/intel/impi/3.2.0.011/bin64:/act/intel/itac/7.2.0.011/bin:/usr/kerberos/bin:/act/intel/fc/11.0.074/bin/intel64:/act/intel/cc/11.0.074/bin/intel64:/act/ge/bin/lx24-amd64:/usr/local/bin:/bin:/usr/bin:/act/bin:/act/bin:/home/oleg/bin
STRUCTEDIT_PATH=/act/WIEN2k_v08.mkl_10_IB_MVAPICH2/SRC_structeditor/bin
INPUTRC=/etc/inputrc
PWD=/home/oleg
_LMFILES_=/act/Modules/3.2.6/modulefiles/ict:/act/Modules/3.2.6/modulefiles/wien2k_mvapich2
EDITOR=vi
IDB_HOME=/act/intel/cc/11.0.074/bin/intel64
LANG=en_US.UTF-8
MODULEPATH=/act/Modules/versions:/act/Modules/$MODULE_VERSION/modulefiles:/act/Modules/modulefiles:
VT_LIB_DIR=/act/intel/itac/7.2.0.011/itac/lib_impi3
LOADEDMODULES=ict:wien2k_mvapich2
SGE_ROOT=/act/ge
VT_ROOT=/act/intel/itac/7.2.0.011
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SHLVL=1
HOME=/home/oleg
VT_SLIB_DIR=/act/intel/itac/7.2.0.011/itac/slib_impi3
LOGNAME=oleg
CVS_RSH=ssh
CLASSPATH=/act/intel/itac/7.2.0.011/itac/lib_impi3
SSH_CONNECTION=10.67.195.1 1147 10.60.196.1 22
MODULESHOME=/act/Modules/3.2.6
OMP_NUM_THREADS=1
LESSOPEN=|/usr/bin/lesspipe.sh %s
WIENROOT=/act/WIEN2k_v08.mkl_10_IB_MVAPICH2
SGE_CLUSTER_NAME=feynman
INCLUDE=/act/intel/mkl/10.1.0.015/include
G_BROKEN_FILENAMES=1
_=/bin/env

>>> "Lombardi, Enrico" <Lombaeb at unisa.ac.za> 07/07/09 10:03 AM >>>

Dear Wien2k users and authors

We are trying to compile MPI-parallel Wien2k lapw1/2 on an InfiniBand system, but have not been successful so far.

We would appreciate an indication of which combinations of MPI library, math library and compiler are known to work on InfiniBand systems. Also, what scaling has been achieved on such systems so far?

Currently, we are compiling using different scenarios:

1. HP-MPI v2.3.1, Intel Fortran v11.0 and MKL: In this case the code compiles without error messages, but lapw1 crashes immediately with numerous segfaults.

2. Still using HP-MPI with Intel Fortran v11.0, but with self-compiled ScaLAPACK+BLAS in addition to the Intel MKL: this also compiles smoothly. However, the runtime behaviour of lapw1_mpi depends on how the parallelization is done (mix of MPI and k-point parallelization). In some cases lapw1 seems to run smoothly, but lapw2 then crashes, with the dnlapw2_XX.error files containing "'l2main' - QTL-B.GT.15., Ghostbands, check scf files". Other combinations of k-point and MPI parallelization result in hanging lapw1_mpi jobs which never complete (0% CPU usage) and later segfault.

Note that 'serial' Wien2k (k-point parallelization) always works smoothly.

It would be appreciated if we could obtain known working link/compile options for MPI-parallel lapwX on InfiniBand systems:
1. Which MPI libraries were used?
2. Which ScaLAPACK/BLAS, and version?
3. Which Compiler and version?
4. Linking options and mpirun options?

Please let me know if there are any additional details which are needed.

Any assistance would be appreciated.

Thank you
Regards
Enrico Lombardi

NOTES ON INPUT:
In all cases the tests are based on the standard mpi-parallel benchmark, but with the number of k-points increased to match the number of nodes (and with the calculation first initialized in the usual way, so that complete SCF cycles can be run, not just lapw1).

.machines files used:
K-point parallelization only:
1:node1
1:node1
...
1:node2
1:node2
...

mpi-parallelization only:
1:node1:8 node2:8 node3:8  node4:8 .....

mixture of mpi and k-point parallelization:
1:node1:8 node2:8 node3:8 .....
1:node9:8 node10:8 node11:8 ....
....
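
The three layouts above can be generated mechanically from a node list. The following Python sketch is only an illustration: the function names and host names are hypothetical, and the line format (a leading "1:" followed by "host" or "host:ncores" entries) is simply copied from the examples in this message.

```python
# Illustration only: builds .machines lines in the three layouts listed above.
# Function names and host names are hypothetical; the line format is copied
# from the examples in this message.


def kpoint_lines(nodes, jobs_per_node):
    """One '1:host' line per k-point job, jobs_per_node lines per node."""
    return ["1:" + node for node in nodes for _ in range(jobs_per_node)]


def mpi_line(nodes, cores_per_node):
    """A single line: one MPI job spanning all listed nodes."""
    return "1:" + " ".join(f"{node}:{cores_per_node}" for node in nodes)


def mixed_lines(nodes, cores_per_node, nodes_per_job):
    """Several MPI jobs, one per line, each using nodes_per_job nodes."""
    groups = [
        nodes[i:i + nodes_per_job] for i in range(0, len(nodes), nodes_per_job)
    ]
    return [mpi_line(group, cores_per_node) for group in groups]


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 5)]
    print("\n".join(kpoint_lines(nodes[:2], 2)))  # k-point parallelization only
    print(mpi_line(nodes, 8))                     # mpi-parallelization only
    print("\n".join(mixed_lines(nodes, 8, 2)))    # mixture of both
```

Generating the file this way makes it easier to vary the split between k-point and MPI parallelization systematically when trying to isolate which combinations hang or crash.
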

--
Dr E B Lombardi
Physics Department
University of South Africa
P.O. Box 392
UNISA 0003
Pretoria
South Africa

Tel: 012 429 8654 / 8027
Fax: 012 429 3643
E-mail: lombaeb at unisa.ac.za



