[Wien] Problems with lapw1_mpi: Could not convert index 1140850688 into a pointer
Oleg Rubel
rubel at Physik.Uni-Marburg.de
Tue Feb 26 11:53:30 CET 2008
Dear Wien2k Users,
I am trying to run the MPI version of WIEN2k_08.1 (Release 14/12/2007),
compiled with /opt/intel/mpich/bin/mpif90 v10 and MKL 10, on a cluster of
dual-core AMD Opteron processors. Test runs of the serial version of the
program and of the k-point parallel version were successful. The MPI
test run with 2 nodes produces the following error:
marc-hn:~/wien_work/GaAs2_mpi> nice x dstart -c ; rm *.broy* ; nice run_lapw -p -I -i 40 -ec 0.0001 -cc 0.001
DSTART ENDS
2.1u 0.0s 0:02.25 100.0% 0+0k 0+0io 0pf+0w
rm: No match.
hup: Command not found.
Invalid null command.
LAPW0 END
LAPW0 END
0 - <NO ERROR MESSAGE> : Could not convert index 1140850688 into a pointer
The index may be an incorrect argument.
Possible sources of this problem are a missing "include 'mpif.h'",
a misspelled MPI object (e.g., MPI_COM_WORLD instead of MPI_COMM_WORLD)
or a misspelled user variable for an MPI object (e.g.,
com instead of comm).
[0] Aborting program !
[0] Aborting program!
cat: No match.
> stop error
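The index in the error message is easier to read in hexadecimal. The minimal Python sketch below converts it and compares it with the MPI_COMM_WORLD handle values assumed here for MPICH2-derived libraries (including Intel MPI) and for MPICH1. Both constants are assumptions and should be checked against the mpi.h/mpif.h headers actually installed on the cluster:

```python
# Decode the "index" reported by lapw1_mpi into hexadecimal and compare it
# with assumed MPI_COMM_WORLD handle values.
# ASSUMPTION: MPICH2-derived MPIs (e.g. Intel MPI) define MPI_COMM_WORLD as
# 0x44000000, while MPICH1 uses the small integer 91. Verify in your headers.

BAD_INDEX = 1140850688  # value from "Could not convert index ... into a pointer"

MPICH2_COMM_WORLD = 0x44000000  # assumed MPICH2 / Intel MPI handle
MPICH1_COMM_WORLD = 91          # assumed MPICH1 handle


def describe(index):
    """Say which MPI family's MPI_COMM_WORLD the given handle matches."""
    if index == MPICH2_COMM_WORLD:
        return "MPI_COMM_WORLD as defined by MPICH2/Intel MPI"
    if index == MPICH1_COMM_WORLD:
        return "MPI_COMM_WORLD as defined by MPICH1"
    return "unrecognized handle"


if __name__ == "__main__":
    print(hex(BAD_INDEX))        # 0x44000000
    print(describe(BAD_INDEX))
```

If the assumed constants are right, the reported index is exactly the MPICH2-style MPI_COMM_WORLD handle, while the p4_error line in the log below points to an MPICH1 (ch_p4) runtime. That would fit a situation where some linked library was built for a different MPI than the mpif90 wrapper in /opt/intel/mpich; this is a hypothesis, not a confirmed diagnosis.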
There was a discussion about a very similar, or even the same, problem two
years ago, but it did not end with a real solution. I added some debug
print statements to the program in order to localize the error. Here is
the result:
marc-hn:~/wien_work/GaAs2_mpi> cat *.day*
Calculating GaAs2_mpi in /home/rubel/wien_work/GaAs2_mpi
on marc-hn with PID 16908
start (Tue Feb 26 10:49:43 CET 2008) with lapw0 (40/99 to go)
cycle 1 (Tue Feb 26 10:49:43 CET 2008) (40/99 to go)
> lapw0 -p (10:49:43) starting parallel lapw0 at Tue Feb 26 10:49:43 CET 2008
-------- .machine1 : 2 processors
node0:1 node0:1
--------
3.5u 0.3s 0:06.46 60.2% 0+0k 0+0io 0pf+0w
> lapw1 -c -p (10:49:49) starting parallel lapw1 at Tue Feb 26 10:49:49 CET 2008
-> starting parallel LAPW1 jobs at Tue Feb 26 10:49:49 CET 2008
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
node0 node0(8) In lapw1
In lapw1: start GTFNAM
In GTFNAM
In GTFNAM: call INIT_PARALLEL
In lapw1
In lapw1: start GTFNAM
In GTFNAM
In GTFNAM: call INIT_PARALLEL
Using 2 processors
p0_17262: p4_error: : 9039
0.0u 0.0s 0:01.37 11.6% 0+0k 0+0io 0pf+0w
** LAPW1 crashed!
0.1u 0.1s 0:03.27 8.5% 0+0k 0+0io 0pf+0w
error: command /home/rubel/WIEN2k_v08.mkl_10_mpi/lapw1cpara -c lapw1.def failed
It seems that the program crashes at CALL INIT_PARALLEL in
SRC_lapw1/GTFNAM.F. However, the same CALL INIT_PARALLEL in
SRC_lapw0/GTFNAM.F goes through (???). My questions are: (1) What is
INIT_PARALLEL supposed to do, and which part of the code does it belong
to? (2) Has anyone else suffered from this kind of error?
Thank you in advance.
P.S. Here are the compiler options for reference (it looks like a mess,
but I did not find anything better than to link all possible libraries):
FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
FPOPT:-I/opt/intel/mpich/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
LDFLAGS:$(FOPT) -L/opt/intel/mkl/10/lib/em64t -static-libcxa
DPARALLEL:'-DParallel'
R_LIBS:-Bstatic -lmkl_lapack -lmkl_em64t -lguide -Bdynamic -lpthread
RP_LIBS:-L /opt/intel/mkl/10/lib/em64t -lmkl_scalapack -lmkl_blacs_intelmpi_lp64 -lmkl_core -lmkl_intel_lp64 -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -liomp5 -lmkl_blacs_lp64 -lmkl_em64t -lmkl_intel_sp2dp -lmkl_sequential -lmkl_blacs_ilp64 -lmkl_blacs_openmpi_ilp64 -lmkl_gf_ilp64 -lmkl_intel_thread -lmkl_solver -lmkl_blacs_intelmpi20_ilp64 -lmkl_blacs_openmpi_lp64 -lmkl_gf_lp64 -lmkl_lapack -lmkl_solver_ilp64 -lmkl_blacs_intelmpi20_lp64 -lmkl_cdft -lmkl_scalapack -lmkl_solver_ilp64_sequential -lmkl_blacs_intelmpi_ilp64 -lmkl_cdft_core -lmkl_intel_ilp64 -lmkl_scalapack_ilp64 -lmkl_solver_lp64 -lpthread
MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
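Because RP_LIBS links "all possible" MKL libraries, a quick sanity check of that line may help. The Python sketch below lists the -l entries and reports groups that normally should not be mixed in one link: several BLACS libraries (each is built for a particular MPI), both LP64 and ILP64 integer interfaces, and both MKL threading layers. Classifying libraries by name tokens is an assumption based on MKL naming; which BLACS variant matches this particular MPICH build should be checked in the MKL documentation:

```python
import re

# RP_LIBS copied verbatim from the compiler options above.
RP_LIBS = (
    "-L /opt/intel/mkl/10/lib/em64t -lmkl_scalapack -lmkl_blacs_intelmpi_lp64 "
    "-lmkl_core -lmkl_intel_lp64 -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential "
    "-liomp5 -lmkl_blacs_lp64 -lmkl_em64t -lmkl_intel_sp2dp -lmkl_sequential "
    "-lmkl_blacs_ilp64 -lmkl_blacs_openmpi_ilp64 -lmkl_gf_ilp64 -lmkl_intel_thread "
    "-lmkl_solver -lmkl_blacs_intelmpi20_ilp64 -lmkl_blacs_openmpi_lp64 -lmkl_gf_lp64 "
    "-lmkl_lapack -lmkl_solver_ilp64 -lmkl_blacs_intelmpi20_lp64 -lmkl_cdft "
    "-lmkl_scalapack -lmkl_solver_ilp64_sequential -lmkl_blacs_intelmpi_ilp64 "
    "-lmkl_cdft_core -lmkl_intel_ilp64 -lmkl_scalapack_ilp64 -lmkl_solver_lp64 "
    "-lpthread"
)


def libraries(link_line):
    """Library names passed with -l, in link order (-L paths are ignored)."""
    return re.findall(r"-l(\S+)", link_line)


def integer_model(lib):
    """Return 'ilp64' or 'lp64' if that token appears in the name (naming assumption)."""
    parts = lib.split("_")
    if "ilp64" in parts:
        return "ilp64"
    if "lp64" in parts:
        return "lp64"
    return None


def report(link_line):
    """Group the linked libraries into categories that should hold one entry each."""
    libs = libraries(link_line)
    return {
        "blacs": sorted({lib for lib in libs if lib.startswith("mkl_blacs")}),
        "integer_models": sorted({m for m in map(integer_model, libs) if m}),
        "threading": sorted({lib for lib in libs
                             if lib in ("mkl_sequential", "mkl_intel_thread")}),
        "duplicates": sorted({lib for lib in libs if libs.count(lib) > 1}),
    }


if __name__ == "__main__":
    result = report(RP_LIBS)
    for key in ("blacs", "integer_models", "threading"):
        status = "more than one" if len(result[key]) > 1 else "ok"
        print(f"{key}: {status}: {', '.join(result[key])}")
    print("listed twice:", ", ".join(result["duplicates"]) or "none")
```

For the line above, the check reports eight different BLACS libraries, both integer interfaces and both threading layers, which suggests trimming RP_LIBS to one consistent set before looking further into INIT_PARALLEL.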
Oleg Rubel
===========================
Faculty of Physics
Philipps University Marburg
Renthof 5, 35032 Marburg, Germany
E-mail: Oleg.Rubel at physik.uni-marburg.de
Homepage: http://www.staff.uni-marburg.de/~rubel/