[Wien] Problems with lapw1_mpi: Could not convert index 1140850688 into a pointer

Oleg Rubel rubel at Physik.Uni-Marburg.de
Tue Feb 26 11:53:30 CET 2008


Dear Wien2k Users,

I am trying to run the MPI version of WIEN2k_08.1 (release 14/12/2007), 
compiled with /opt/intel/mpich/bin/mpif90 v10 + MKL 10, on a cluster of 
dual-core AMD Opteron processors. My test runs of the serial version of the 
program and of the k-point parallel version were successful. The MPI 
test run with 2 nodes generates the following error:

     marc-hn:~/wien_work/GaAs2_mpi> nice x dstart -c ; rm *.broy* ; nice run_lapw -p -I -i 40 -ec 0.0001 -cc 0.001
     DSTART ENDS
     2.1u 0.0s 0:02.25 100.0% 0+0k 0+0io 0pf+0w
     rm: No match.
     hup: Command not found.
     Invalid null command.
     LAPW0 END
     LAPW0 END
     0 - <NO ERROR MESSAGE> : Could not convert index 1140850688 into a pointer
     The index may be an incorrect argument.
     Possible sources of this problem are a missing "include 'mpif.h'",
     a misspelled MPI object (e.g., MPI_COM_WORLD instead of MPI_COMM_WORLD)
     or a misspelled user variable for an MPI object (e.g.,
     com instead of comm).
     [0]  Aborting program !
     [0] Aborting program!
     cat: No match.
     >   stop error
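As a side note, the index in the message above is easier to read in hexadecimal. The short Python sketch below only does that conversion. Reading 0x44000000 as the value that MPICH2-derived libraries assign to MPI_COMM_WORLD in their mpif.h is my assumption about what this number means here; the log itself does not confirm it.

```python
# Decode the handle value reported by the MPI runtime.
index = 1140850688  # copied from the error message above

hex_form = hex(index)
print(hex_form)

# Assumption: in MPICH2-style handles the top byte carries a kind/type tag,
# so it is shown separately from the low 24 bits.
top_byte = index >> 24
low_bits = index & 0xFFFFFF
print(hex(top_byte), low_bits)
```

If that reading is right, it may be worth checking whether the mpif.h seen at compile time and the MPI library used at link time and run time come from the same MPI installation.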

There was a discussion of a very similar, or even the same, problem two 
years ago, but it did not end with a real solution. I added some marker 
output to the program in order to localize the error. Here is the result:

     marc-hn:~/wien_work/GaAs2_mpi> cat *.day*
     Calculating GaAs2_mpi in /home/rubel/wien_work/GaAs2_mpi
     on marc-hn with PID 16908
         start       (Tue Feb 26 10:49:43 CET 2008) with lapw0 (40/99 to go)
         cycle 1     (Tue Feb 26 10:49:43 CET 2008)  (40/99 to go)
     >   lapw0 -p    (10:49:43) starting parallel lapw0 at Tue Feb 26 10:49:43 CET 2008
     -------- .machine1 : 2 processors
     node0:1 node0:1
     --------
     3.5u 0.3s 0:06.46 60.2% 0+0k 0+0io 0pf+0w
     >   lapw1  -c -p        (10:49:49) starting parallel lapw1 at Tue Feb 26 10:49:49 CET 2008
     ->  starting parallel LAPW1 jobs at Tue Feb 26 10:49:49 CET 2008
     running LAPW1 in parallel mode (using .machines)
     1 number_of_parallel_jobs
         node0 node0(8)  In lapw1
     In lapw1: start GTFNAM
     In GTFNAM
     In GTFNAM: call INIT_PARALLEL
     In lapw1
     In lapw1: start GTFNAM
     In GTFNAM
     In GTFNAM: call INIT_PARALLEL
     Using    2 processors
     p0_17262:  p4_error: : 9039
     0.0u 0.0s 0:01.37 11.6% 0+0k 0+0io 0pf+0w
     **  LAPW1 crashed!
     0.1u 0.1s 0:03.27 8.5% 0+0k 0+0io 0pf+0w
     error: command   /home/rubel/WIEN2k_v08.mkl_10_mpi/lapw1cpara -c lapw1.def   failed

It seems that the program crashes at CALL INIT_PARALLEL in 
SRC_lapw1/GTFNAM.F. However, the same CALL INIT_PARALLEL in 
SRC_lapw0/GTFNAM.F goes through (???). My questions are: (1) What is 
INIT_PARALLEL supposed to do, and what does it belong to? (2) Has anyone 
else suffered from this kind of error?

Thank you in advance.


P.S. Here are the compiler options for reference (they look like a mess, 
but I did not find anything better than to include all possible link options):

     FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
     FPOPT:-I/opt/intel/mpich/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
     LDFLAGS:$(FOPT) -L/opt/intel/mkl/10/lib/em64t -static-libcxa
     DPARALLEL:'-DParallel'
     R_LIBS:-Bstatic -lmkl_lapack -lmkl_em64t -lguide -Bdynamic -lpthread
     RP_LIBS:-L /opt/intel/mkl/10/lib/em64t -lmkl_scalapack -lmkl_blacs_intelmpi_lp64 -lmkl_core -lmkl_intel_lp64 -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -liomp5 -lmkl_blacs_lp64 -lmkl_em64t -lmkl_intel_sp2dp -lmkl_sequential -lmkl_blacs_ilp64 -lmkl_blacs_openmpi_ilp64 -lmkl_gf_ilp64 -lmkl_intel_thread -lmkl_solver -lmkl_blacs_intelmpi20_ilp64 -lmkl_blacs_openmpi_lp64 -lmkl_gf_lp64 -lmkl_lapack -lmkl_solver_ilp64 -lmkl_blacs_intelmpi20_lp64 -lmkl_cdft -lmkl_scalapack -lmkl_solver_ilp64_sequential -lmkl_blacs_intelmpi_ilp64 -lmkl_cdft_core -lmkl_intel_ilp64 -lmkl_scalapack_ilp64 -lmkl_solver_lp64 -lpthread
     MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
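To see how mixed the RP_LIBS line above is, a small Python sketch can list the BLACS libraries it pulls in. This is not part of WIEN2k; the helper names `blacs_libs` and `split_variant` are made up for illustration, and reading each suffix as "MPI tag + integer interface" is my assumption based only on the library names.

```python
# RP_LIBS copied verbatim from the options listed above.
RP_LIBS = (
    "-L /opt/intel/mkl/10/lib/em64t -lmkl_scalapack -lmkl_blacs_intelmpi_lp64 "
    "-lmkl_core -lmkl_intel_lp64 -lmkl_scalapack_lp64 "
    "-lmkl_solver_lp64_sequential -liomp5 -lmkl_blacs_lp64 -lmkl_em64t "
    "-lmkl_intel_sp2dp -lmkl_sequential -lmkl_blacs_ilp64 "
    "-lmkl_blacs_openmpi_ilp64 -lmkl_gf_ilp64 -lmkl_intel_thread "
    "-lmkl_solver -lmkl_blacs_intelmpi20_ilp64 -lmkl_blacs_openmpi_lp64 "
    "-lmkl_gf_lp64 -lmkl_lapack -lmkl_solver_ilp64 "
    "-lmkl_blacs_intelmpi20_lp64 -lmkl_cdft -lmkl_scalapack "
    "-lmkl_solver_ilp64_sequential -lmkl_blacs_intelmpi_ilp64 "
    "-lmkl_cdft_core -lmkl_intel_ilp64 -lmkl_scalapack_ilp64 "
    "-lmkl_solver_lp64 -lpthread"
)

PREFIX = "-lmkl_blacs_"


def blacs_libs(libs):
    """Return the distinct BLACS link options, in link order."""
    found = []
    for token in libs.split():
        if token.startswith(PREFIX) and token not in found:
            found.append(token)
    return found


def split_variant(token):
    """Split a BLACS option into (MPI tag, integer interface) by name only."""
    parts = token[len(PREFIX):].split("_")
    return "_".join(parts[:-1]) or "(no MPI tag)", parts[-1]


blacs = blacs_libs(RP_LIBS)
flavours = sorted({split_variant(t)[0] for t in blacs})
interfaces = sorted({split_variant(t)[1] for t in blacs})

for token in blacs:
    print(token, split_variant(token))
print(len(blacs), "BLACS libraries; MPI tags:", flavours, "interfaces:", interfaces)
```

For the line above this reports eight different BLACS libraries, covering several MPI tags and both lp64 and ilp64. Presumably only the variant matching the MPI installation behind mpif90 should stay on the link line, but I have not verified which one that is.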


Oleg Rubel

===========================
Faculty of Physics
Philipps University Marburg
Renthof 5, 35032 Marburg, Germany
E-mail: Oleg.Rubel at physik.uni-marburg.de
Homepage: http://www.staff.uni-marburg.de/~rubel/

