[Wien] Problem when running MPI-parallel version of LAPW0

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Oct 22 14:22:13 CEST 2014


Usually the "crucial" point for lapw0  is the fftw3-library.

I noticed you have fftw-3.3.4, which I never tested. Since fftw is 
incompatible between fftw2 and 3, maybe they have done something again ...
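
For what it's worth, one can check directly which fftw (and MPI) libraries
the mpi binary actually picks up; this is only a generic sketch, assuming
$WIENROOT points to the 14.1 installation and that fftw was built as a
shared library:

   # list the shared libraries lapw0_mpi resolves at run time
   ldd $WIENROOT/lapw0_mpi | grep -i -E 'fftw|mpi'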

Besides that, I assume you have installed fftw using the same ifort and
mpi versions ...
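
If fftw has to be rebuilt, a configure line along the following lines should
do it; this is only a sketch: the mpiicc wrapper name is an assumption for
your Intel MPI setup, and the prefix is taken from your FFTW_OPT setting:

   cd fftw-3.3.4
   ./configure --prefix=/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI \
               --enable-mpi CC=icc MPICC=mpiicc F77=ifort
   make && make install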



On 10/22/2014 01:29 PM, Rémi Arras wrote:
> Dear Prof. Blaha, dear Wien2k users,
>
> We tried to install the latest version of Wien2k (14.1) on a supercomputer
> and we are running into trouble with the MPI-parallel version.
>
> 1) lapw0 runs correctly in sequential mode, but crashes systematically
> when the parallel option is activated (independently of the number of
> cores we use):
>
>> lapw0 -p    (16:08:13) starting parallel lapw0 at Mon Sep 29 16:08:13
> CEST 2014
> -------- .machine0 : 4 processors
> Child id 1 SIGSEGV
> Child id 2 SIGSEGV
> Child id 3 SIGSEGV
> Child id 0 SIGSEGV
> **lapw0 crashed!
> 0.029u 0.036s 0:50.91 0.0% 0+0k 5248+104io 17pf+0w
> error: command /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0para -up -c
> lapw0.def failed
>> stop error
>
> w2k_dispatch_signal(): received: Segmentation fault
> w2k_dispatch_signal(): received: Segmentation fault
> Child with myid of 1 has an error
> 'Unknown' - SIGSEGV
> Child id 1 SIGSEGV
> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
> **lapw0 crashed!
> cat: No match.
> 0.027u 0.034s 1:33.13 0.0% 0+0k 5200+96io 16pf+0w
> error: command /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0para -up -c
> lapw0.def failed
>
>
> 2) lapw2 also crashes sometimes when MPI parallelization is used.
> Sequential or k-point-parallel runs are fine, and unlike lapw0, the error
> does not occur in all cases (we did not notice any problem when testing
> the MPI benchmark with lapw1):
>
> w2k_dispatch_signal(): received: Segmentation fault
> application called MPI_Abort(MPI_COMM_WORLD, 768) - process 0
>
> Our system is a Bullx DLC cluster (Red Hat Linux + Intel Ivy Bridge), and
> we use the Intel compiler (+ MKL) intel/14.0.2.144 with intelmpi/4.1.3.049.
> The batch scheduler is SLURM.
>
> Here are the settings and options we used for the installation:
>
> OPTIONS:
> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
> -Dmkl_scalapack -traceback -xAVX
> current:FFTW_OPT:-DFFTW3
> -I/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/include
> current:FFTW_LIBS:-lfftw3_mpi -lfftw3
> -L/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/lib
> current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread
> -lmkl_core -openmp -lpthread
> current:RP_LIBS:-mkl=cluster -lfftw3_mpi -lfftw3
> -L/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/lib
> current:MPIRUN:mpirun -np _NP_ _EXEC_
> current:MKL_TARGET_ARCH:intel64
>
> PARALLEL_OPTIONS:
> setenv TASKSET "no"
> setenv USE_REMOTE 1
> setenv MPI_REMOTE 1
> setenv WIEN_GRANULARITY 1
> setenv WIEN_MPIRUN "mpirun -np _NP_ _EXEC_"
>
> Any suggestions that could help us solve this problem would be
> greatly appreciated.
>
> Best regards,
> Rémi Arras
>
>

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------

