[Wien] LAPW2 crashed when running in parallel

Maxim Rakitin rms85 at physics.susu.ac.ru
Mon Nov 1 04:40:50 CET 2010


Hi,

It looks like Intel's mpirun doesn't have a '-machinefile' option. It 
has a '-hostfile' option instead (from here: 
http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt).

Try 'mpirun -h' for a list of the available options and apply the 
appropriate one.
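Concretely, the mpirun command template is the MPIRUN definition chosen at 
siteconfig time (it ends up in $WIENROOT/parallel_options). A minimal sketch 
of the change, assuming your Intel MPI accepts '-hostfile' with the same 
host-list file that WIEN2k generates for _HOSTS_:

```shell
# $WIENROOT/parallel_options (sketch -- verify the option name
# against your own 'mpirun -h' output first)
# old: setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
setenv WIEN_MPIRUN "mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"
```

After editing, rerun the parallel job; the "MPI: invalid option 
-machinefile" messages should disappear.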

Best regards,
    Maxim Rakitin
    email: rms85 at physics.susu.ac.ru
    web: http://www.susu.ac.ru


01.11.2010 4:56, Wei Xie wrote:
> Dear all WIEN2k community members:
>
> We encountered a problem when running in parallel (k-point, MPI or 
> both): the calculations crashed at LAPW2. Note that we had no problem 
> running in serial. We have tried to diagnose the problem, recompiled 
> the code with different options, and tested different cases and 
> parameters based on similar problems reported on the mailing list, but 
> the problem persists. So we write here hoping someone can offer us 
> some suggestions. We have attached the related files below for your 
> reference. Your replies are appreciated in advance!
>
> This is a TiC example running in both k-point and MPI parallel on two 
> nodes, r1i0n0 and r1i0n1 (8 cores/node):
>
> 1. stdout (abridged)
> MPI: invalid option -machinefile
> real 0m0.004s
> user 0m0.000s
> sys 0m0.000s
> ...
> MPI: invalid option -machinefile
> real 0m0.003s
> user 0m0.000s
> sys 0m0.004s
> TiC.scf1up_1: No such file or directory.
>
> LAPW2 - Error. Check file lapw2.error
> cp: cannot stat `.in.tmp': No such file or directory
> rm: cannot remove `.in.tmp': No such file or directory
> rm: cannot remove `.in.tmp1': No such file or directory
>
> 2. TiC.dayfile (abridged)
> ...
>     start (Sun Oct 31 16:25:06 MDT 2010) with lapw0 (40/99 to go)
>     cycle 1 (Sun Oct 31 16:25:06 MDT 2010) (40/99 to go)
>
> >   lapw0 -p(16:25:06) starting parallel lapw0 at Sun Oct 31 16:25:07 
> MDT 2010
> -------- .machine0 : 16 processors
> invalid "local" arg: -machinefile
>
> 0.436u 0.412s 0:04.63 18.1% 0+0k 2600+0io 1pf+0w
> >   lapw1  -up -p (16:25:12) starting parallel lapw1 at Sun Oct 31 
> 16:25:12 MDT 2010
> ->  starting parallel LAPW1 jobs at Sun Oct 31 16:25:12 MDT 2010
> running LAPW1 in parallel mode (using .machines)
> 2 number_of_parallel_jobs
>      r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
>      r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)
>      r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
> Summary of lapw1para:
>    r1i0n0 k=0 user=0 wallclock=0
>    r1i0n1 k=0 user=0 wallclock=0
> ...
> 0.116u 0.316s 0:10.48 4.0% 0+0k 0+0io 0pf+0w
> >   lapw2 -up -p (16:25:34) running LAPW2 in parallel mode
> **  LAPW2 crashed!
> 0.032u 0.104s 0:01.13 11.5% 0+0k 82304+0io 8pf+0w
> error: command   /home/xiew/WIEN2k_10/lapw2para -up uplapw2.def   failed
>
> 3. uplapw2.error
> Error in LAPW2
>  'LAPW2' - can't open unit: 18
>  'LAPW2' -        filename: TiC.vspup
>  'LAPW2' -          status: old          form: formatted
> **  testerror: Error in Parallel LAPW2
>
> 4. .machines
> #
> 1:r1i0n0:8
> 1:r1i0n1:8
> lapw0:r1i0n0:8 r1i0n1:8
> granularity:1
> extrafine:1
>
> 5. compilers, MPI and options
> Intel Compilers and MKL 11.1.046
> Intel MPI 3.2.0.011
>
> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
> current:LDFLAGS:$(FOPT) 
> -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread 
> -lmkl_core -openmp -lpthread -lguide
> current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t 
> -lmkl_scalapack_lp64 
> /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a 
> -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
> -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread 
> -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)
> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>
> Best regards,
> Wei Xie
> Computational Materials Group
> University of Wisconsin-Madison
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

