[Wien] There are no allocated resources for the application

贾亚磊 jia_yalei at 163.com
Fri Aug 2 22:49:03 CEST 2013


Dear all,
     I compiled WIEN2k 11 on Linux (CentOS 5.5) with icc and ifort 11.1, Open MPI's mpif90, and Intel MKL, using the following settings:
K1   Linux (Intel ifort 11.1 compiler + mkl )
 O   Compiler options:        -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
 L   Linker Flags:            $(FOPT) -L/home/yljia/intel/Compiler/11.1/072/mkl/lib/em64t -pthread
 P   Preprocessor flags       '-DParallel'
 R   R_LIB (LAPACK+BLAS):     -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
RP  RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_openmpi_lp64 -L/home/yljia/compiler_library/fftw-2.1.5/lib/ -lfftw_mpi -lfftw $(R_LIBS)
FP  FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
MP  MPIRUN commando        : mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_
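
To relate the MPIRUN template to the error messages below, here is a minimal sketch of how I understand the placeholders to be filled in. It assumes that _NP_, _HOSTS_ and _EXEC_ are replaced by plain text substitution; the function name expand_mpirun, the process count 1 and the argument lapw1_5.def are only illustrative assumptions, not taken from the WIEN2k scripts.

```shell
#!/bin/sh
# Sketch only: expand the MPIRUN template from the settings above by
# textual substitution (an assumption about how the placeholders are used).
expand_mpirun() {
    # $1 = process count, $2 = hostfile, $3 = executable plus its arguments
    printf '%s\n' 'mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_' |
        sed -e "s|_NP_|$1|" -e "s|_HOSTS_|$2|" -e "s|_EXEC_|$3|"
}

# Hypothetical values: one process, hostfile .machine5, and an assumed
# definition-file argument (lapw1_5.def) after the executable.
expand_mpirun 1 .machine5 "/home/yljia/software/wien2k_11/lapw1_mpi lapw1_5.def"
```

If this reading is right, each .machineN file is handed to mpirun as the hostfile, which would be why the error text names .machine5 to .machine8 as the "requested mapping".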
   The program runs in serial mode and in k-point parallel mode. In MPI parallel mode, however, it fails with the error messages shown in the following two files:
1. STDOUT:
     LAPW0 END
     LAPW0 END
    .........
     LAPW0 END
     LAPW1 END
     LAPW1 END
     LAPW1 END
     LAPW1 END
    --------------------------------------------------------------------------
    There are no allocated resources for the application
      /home/yljia/software/wien2k_11/lapw1_mpi
    that match the requested mapping:
      .machine5

    Verify that you have mapped the allocated resources properly using the
    --host or --hostfile specification.
    --------------------------------------------------------------------------
     LAPW1 END
     LAPW1 END
    --------------------------------------------------------------------------
    There are no allocated resources for the application
      /home/yljia/software/wien2k_11/lapw1_mpi
    that match the requested mapping:
      .machine6
    ...........
    ...........
      .machine8
                                              
    Verify that you have mapped the allocated resources properly using the
    --host or --hostfile specification.
    --------------------------------------------------------------------------
    FERMI - Error
    cp: cannot stat `.in.tmp': No such file or directory
    rm: cannot remove `.in.tmp': No such file or directory
    rm: cannot remove `.in.tmp1': No such file or directory

    >   stop error
2. TiC.dayfile:
    Calculating TiC in /home/yljia/wien2k/TiC/testqsub/TiC
    on compute-0-12.local with PID 16027
    using WIEN2k_11.1 (Release 14/6/2011) in /home/yljia/software/wien2k_11

    start   (Sat Aug  3 00:42:07 CST 2013) with lapw0 (40/99 to go)

        cycle 1     (Sat Aug  3 00:42:07 CST 2013)  (40/99 to go)

    >   lapw0 -p    (00:42:07) starting parallel lapw0 at Sat Aug  3 00:42:07 CST 2013
    -------- .machine0 : 16 processors
    5.812u 22.540s 0:04.23 670.2%   0+0k 0+0io 205pf+0w
    >   lapw1  -p   (00:42:11) starting parallel lapw1 at Sat Aug  3 00:42:12 CST 2013
    ->  starting parallel LAPW1 jobs at Sat Aug  3 00:42:12 CST 2013
    running LAPW1 in parallel mode (using .machines)
    8 number_of_parallel_jobs
         compute-0-12 compute-0-12(32) 3.181u 0.181s 0:02.77 121.2% 0+0k 0+0io 33pf+0w
         compute-0-12 compute-0-12(32) 2.781u 0.117s 0:02.58 112.0% 0+0k 0+0io 18pf+0w
         compute-0-12 compute-0-12(32) 2.343u 0.089s 0:02.28 106.1% 0+0k 0+0io 17pf+0w
         compute-0-12 compute-0-12(32) 2.818u 0.126s 0:02.52 116.2% 0+0k 0+0io 17pf+0w
         compute-0-2 compute-0-2(32) 0.010u 0.012s 0:00.03 66.6%    0+0k 0+0io 0pf+0w
         compute-0-2 compute-0-2(32) 0.009u 0.014s 0:00.03 33.3%    0+0k 0+0io 0pf+0w
         compute-0-2 compute-0-2(32) 0.010u 0.020s 0:00.04 75.0%    0+0k 0+0io 0pf+0w
         compute-0-2 compute-0-2(32) 0.012u 0.020s 0:00.04 75.0%    0+0k 0+0io 0pf+0w
       Summary of lapw1para:
       compute-0-12  k=0     user=128    wallclock=30.78
    11.349u 1.617s 0:10.77 120.2%   0+0k 0+0io 85pf+0w
    >   lapw2 -p    (00:42:22) running LAPW2 in parallel mode
    **  LAPW2 crashed!
    0.076u 0.108s 0:00.20 85.0% 0+0k 0+0io 9pf+0w
    error: command   /home/yljia/software/wien2k_11/lapw2para lapw2.def   failed

    >   stop error 
  The following is the job script I submit. I have 2 nodes, each with 8 cores (not counting the host node):
#!/bin/tcsh
#$ -S /bin/tcsh
#$ -N W2web_Job
# MPIR_HOME from submitting environment
#$ -v MPIR_HOME
# needs in
#   $NSLOTS
#       the number of tasks to be used
#   $TMPDIR/machines
#       a valid machine file to be passed to mpirun
#$ -cwd
#$ -o job.out
#$ -e job.err
#$ -q  parallel.q
#$ -pe mpich 8
# mpich / jobs_per_node = number of nodes 

set mpijob=1
set jobs_per_node=8
setenv OMP_NUM_THREADS 1
setenv USE_REMOTE 0

echo "Got $NSLOTS slots." > job.out
echo "Got $NSLOTS slots." > job.err

pwd

set proclist=`cat $TMPDIR/machines`
set nproc=$NSLOTS
echo $nproc nodes for this job: $proclist
if( -e .proclist_tmp)  rm .proclist_tmp
if ($jobs_per_node != 8 ) then
set j=1
while ($j <= $nproc )
@ j1 = $j + $jobs_per_node
@ j1 = $j1 - 1
echo $proclist[$j-$j1] >>.proclist_tmp
@ j = $j + 8
end
set proclist=`cat .proclist_tmp`
rm .proclist_tmp
set nproc=$#proclist
endif
echo $nproc nodes for this job: $proclist

echo '#' > .machines

# example for an MPI parallel lapw0
echo -n 'lapw0:' >> .machines
echo $proclist >>.machines
#example for k-point and mpi parallel lapw1/2
#set j=1
#while ($j <= $jobs_per_node )
set i=1
while ($i <= $nproc )
echo -n '1:' >>.machines
@ i1 = $i + $mpijob
@ i2 = $i1 - 1
echo $proclist[$i-$i2] >>.machines
set i=$i1
end
echo 'granularity:1' >>.machines
echo 'extrafine:1' >>.machines

date

run_lapw -p  -ec 0.0001 -NI  >& STDOUT
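
For reference, here is a minimal sketch of what the loop in the script above should write to .machines. It is written in POSIX sh rather than tcsh so that it can be run on its own; the function name build_machines and the four-entry host list are assumptions for illustration (the real list comes from $TMPDIR/machines). Unlike the tcsh loop, it simply stops at the end of the list instead of failing on an out-of-range subscript.

```shell
#!/bin/sh
# Sketch only: print the .machines content that the tcsh loop above would
# produce for a given mpijob value and a host list (one entry per slot).
build_machines() {
    mpijob=$1
    shift
    echo '#'
    echo "lapw0:$*"                 # lapw0 gets the complete host list
    while [ $# -gt 0 ]; do
        line='1:'
        n=0
        # take the next $mpijob entries for one lapw1/lapw2 job
        while [ "$n" -lt "$mpijob" ] && [ $# -gt 0 ]; do
            line="$line$1 "
            shift
            n=$((n + 1))
        done
        echo "${line% }"            # drop the trailing space
    done
    echo 'granularity:1'
    echo 'extrafine:1'
}

# Hypothetical host list: two slots on each of the two nodes seen in the dayfile.
build_machines 1 compute-0-12 compute-0-12 compute-0-2 compute-0-2
```

With mpijob=1 this prints one lapw0: line listing all four entries, followed by four single-host lines of the form 1:compute-0-12 or 1:compute-0-2, so every slot becomes its own job with its own .machineN hostfile.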
Any comments are welcome! Thanks in advance!

Have a nice weekend!
Jia Yalei