[Wien] problem with parallel version

Laurence Marks L-marks at northwestern.edu
Fri Jul 23 13:49:19 CEST 2010


Probably you do not have an appropriate .machines file setup; these
things can differ between different flavors of MPI. Several things to check:

1) For SGI MPT you need to ensure that you link against the correct SGI
libraries (check with ldd and the Intel MKL link line advisor), and the
format of the command in $WIENROOT/parallel_options is different. I
used

setenv WIEN_MPIRUN "sgi_mpirun.sh _HOSTS_ _EXEC_"

where sgi_mpirun.sh contains

#!/bin/bash
#
# This file needs to be in the users path
MPI_HOSTS=$(sort $1 | uniq -c | awk '{print $2 " " $1}' | tr "\n" "," | sed 's/.$//')
#
#if needed
#ulimit -l unlimited
export MKL_NUM_THREADS=1 OMP_NUM_THREADS=1
export MKL_DOMAIN_NUM_THREADS="MKL_VML=1, MKL_ALL=1"
mpirun $MPI_HOSTS $2 $3 $4 $5 $6
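
For what it's worth, the sort/uniq/awk pipeline above converts the
machines file passed as $1 (one hostname per line, repeated once per MPI
slot, as PBS writes it) into the comma-separated "host nprocs" list that
SGI MPT's mpirun expects. A minimal illustration, with made-up hostnames:

```shell
# Turn a PBS-style hosts file (one hostname per MPI slot) into
# SGI MPT's "host nprocs,host nprocs" list.
# The hostnames are invented for illustration only.
printf 'r14i0n12\nr14i0n12\nr14i0n13\nr14i0n13\n' \
  | sort | uniq -c \
  | awk '{print $2 " " $1}' \
  | tr "\n" "," \
  | sed 's/.$//'
# -> r14i0n12 2,r14i0n13 2
```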

2) mvapich 1.0.1 is very old; I would update it.

3) In the file lapw0para, after the lines (near line 183 in my version)
#insert $WIENROOT, since some stupid mpi clusters don't propagate the PATH
set ttt=(`echo $mpirun | sed -e "s^_NP_^$number_per_job[1]^" -e "s^_EXEC_

add an "echo $ttt" before the line that executes $ttt. This will print
the mpirun command actually being used, so you can check it and also run
it by hand as-is.
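
Concretely, the edited region of lapw0para would look roughly like the
sketch below (line numbers and the full sed expression differ between
WIEN2k versions; the sed arguments are truncated here as in my copy):

# in $WIENROOT/lapw0para (csh), where $ttt is built:
set ttt=(`echo $mpirun | sed -e "s^_NP_^$number_per_job[1]^" -e "s^_EXEC_
echo $ttt    # added line: print the expanded mpirun command before it runs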

2010/7/23 pascal boulet <pascal.boulet at univ-provence.fr>:
> Dear all,
>
>
> Sorry for this long post. I tried to be as complete as possible...
>
>
> I have problems running Wien2k (v.10) in parallel. Note that the serial
> executables work fine.
> I have compiled Wien2k successfully (at least, I think) on an SGI with
> Intel processors.
> The configuration I use is the following:
> intel/10.1.017    mkl/10.0.3.020      mvapich/1.0.1     fftw/2.1.5
>
> First, I compiled with a more recent configuration:
> intel/11.1.059   mkl/11.1.059   and   SGI MPT 1.26 libraries
> but it seems that the mpirun command does not accept the -machinefile
> option...
>
>
> The cluster consists of nodes with two quad-core processors, i.e. 8
> cores per node.
>
>
> The queuing system is PBS. My script is:
>
> ##### PBS SCRIPT
> #!/usr/bin/csh -v
> #PBS -S /usr/bin/csh
> #PBS -l select=1:ncpus=8:mpiprocs=8
> #PBS -l walltime=00:10:00
> #PBS -N scf_wien2k
> #PBS -j eo
>
> module load intel/10.1.017 mvapich/1.0.1 fftw/2.1.5
> cd $PBS_O_WORKDIR
>
> cat $PBS_NODEFILE > .machines_current
> set aa=`wc .machines_current`
>
> # for MPI
> echo -n 'lapw0:' > .machines
> set i=1
> while ($i < $aa[1] )
>   echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' ' >> .machines
>   @ i ++
> end
> echo `cat $PBS_NODEFILE |head -$i|tail -1` ' ' >> .machines
>
> # for k-points
> set i=1
> while ($i <= $aa[1] )
>   echo -n '1:' >> .machines
>   head -$i .machines_current |tail -1 >> .machines
>   @ i ++
> end
> echo 'granularity:1' >> .machines
> echo 'extrafine:1' >> .machines
>
> setenv WIENROOT /work/cdeck/wien2k_10
> setenv W2WEB_CASE_BASEDIR /work/cdeck/WIEN2k
> setenv STRUCTEDIT_PATH $WIENROOT/SRC_structeditor/bin
>
> $WIENROOT/run_lapw -p -i 60 -ec 0.0001 >& wien2k.log
>
> ##### END OF PBS SCRIPT
>
> The .machines file created by the script looks like this:
>
> lapw0:r14i0n12  r14i0n12  r14i0n12  r14i0n12  r14i0n12  r14i0n12  r14i0n12
> r14i0n12
> 1:r14i0n12
> 1:r14i0n12
> 1:r14i0n12
> 1:r14i0n12
> 1:r14i0n12
> 1:r14i0n12
> 1:r14i0n12
> 1:r14i0n12
> granularity:1
> extrafine:1
>
> I also tried the following:
>
> lapw0:r14i0n12:8
> 1:r14i0n12:8
> granularity:1
> extrafine:1
>
> But it does not work either.
>
> Now here is the content of TiC.dayfile:
>
> Calculating TiC in /work/cdeck/WIEN2k/TiC
> on r14i0n12 with PID 13222
> using WIEN2k_10.1 (Release 7/6/2010) in /work/cdeck/wien2k_10
>
>
>     start     (Thu Jul 22 22:49:58 CEST 2010) with lapw0 (60/99 to go)
>
>     cycle 1     (Thu Jul 22 22:49:58 CEST 2010)     (60/99 to go)
>
>>   lapw0 -p    (22:49:58) starting parallel lapw0 at Thu Jul 22 22:49:58
>> CEST 2010
> -------- .machine0 : 8 processors
> 0.032u 0.080s 0:04.95 2.2%    0+0k 0+0io 37pf+0w
> error: command   /work/cdeck/wien2k_10/lapw0para lapw0.def   failed
>
>>   stop error
>
> And here is the content of the log file, which I called "wien2k.log" (in the PBS script):
>
> Exit code -3 signaled from r14i0n12
> Killing remote processes...2 - MPI_COMM_RANK : Null communicator
> [2] [] Aborting Program!
> [6] [] Aborting Program!
> 5 - MPI_COMM_RANK : Null communicator
> 4 - MPI_COMM_RANK : Null communicator
> 3 - MPI_COMM_RANK : Null communicator
> [4] [] Aborting Program!
> [5] [] Aborting Program!
> [3] [] Aborting Program!
> 7 - MPI_COMM_RANK : Null communicator
> [7] [] Aborting Program!
> Abort signaled by rank 6:  Aborting program !
> Abort signaled by rank 2:  Aborting program !
> Abort signaled by rank 3:  Aborting program !
> Abort signaled by rank 4:  Aborting program !
> Abort signaled by rank 5:  Aborting program !
> Abort signaled by rank 7:  Aborting program !
> 1 - MPI_COMM_RANK : Null communicator
> [1] [] Aborting Program!
> Abort signaled by rank 1:  Aborting program !
> 0 - MPI_COMM_RANK : Null communicator
> [0] [] Aborting Program!
> Abort signaled by rank 0:  Aborting program !
> MPI process terminated unexpectedly
> DONE
>
>>   stop error
>
> The lapw0.error file contains: Error in LAPW0
>
>
> Could anyone give me a hint on what is wrong?
>
> Thank you
> Pascal
>
>
>
> --
> Dr. pascal Boulet, computational chemist
> University of Aix-Marseille I
> Laboratoire Chimie Provence, UMR 6264
> Group of Theoretical Chemistry
> Avenue Normandie-Niemen
> 13397 Marseille Cedex 20
> France
> **********
> Tel. (+33) (0)491.63.71.17
> Fax. (+33) (0)491.63.71.11
> **********
> http://www.lc-provence.fr
> https://sites.google.com/a/univ-provence.fr/pb-comput-chem
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
>



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.

