[Wien] There are no allocated resources for the application
贾亚磊
jia_yalei at 163.com
Fri Aug 2 22:49:03 CEST 2013
Dear all,
I compiled WIEN2k 11 on Linux (CentOS 5.5) with icc and ifort 11.1, Open MPI (mpif90), and Intel MKL, using the following settings:
K1 Linux (Intel ifort 11.1 compiler + mkl )
O Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
L Linker Flags: $(FOPT) -L/home/yljia/intel/Compiler/11.1/072/mkl/lib/em64t -pthread
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
RP RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_openmpi_lp64 -L/home/yljia/compiler_library/fftw-2.1.5/lib/ -lfftw_mpi -lfftw $(R_LIBS)
FP FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
MP MPIRUN commando : mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_
The program runs in non-parallel mode and in k-point parallel mode. In MPI parallel mode, however, it fails with error messages in the following two files:
1. STDOUT:
LAPW0 END
LAPW0 END
.........
LAPW0 END
LAPW1 END
LAPW1 END
LAPW1 END
LAPW1 END
--------------------------------------------------------------------------
There are no allocated resources for the application
/home/yljia/software/wien2k_11/lapw1_mpi
that match the requested mapping:
.machine5
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
LAPW1 END
LAPW1 END
--------------------------------------------------------------------------
There are no allocated resources for the application
/home/yljia/software/wien2k_11/lapw1_mpi
that match the requested mapping:
.machine6
...........
...........
.machine8
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
FERMI - Error
cp: cannot stat `.in.tmp': No such file or directory
rm: cannot remove `.in.tmp': No such file or directory
rm: cannot remove `.in.tmp1': No such file or directory
> stop error
2. TiC.dayfile:
Calculating TiC in /home/yljia/wien2k/TiC/testqsub/TiC
on compute-0-12.local with PID 16027
using WIEN2k_11.1 (Release 14/6/2011) in /home/yljia/software/wien2k_11
start (Sat Aug 3 00:42:07 CST 2013) with lapw0 (40/99 to go)
cycle 1 (Sat Aug 3 00:42:07 CST 2013) (40/99 to go)
> lapw0 -p (00:42:07) starting parallel lapw0 at Sat Aug 3 00:42:07 CST 2013
-------- .machine0 : 16 processors
5.812u 22.540s 0:04.23 670.2% 0+0k 0+0io 205pf+0w
> lapw1 -p (00:42:11) starting parallel lapw1 at Sat Aug 3 00:42:12 CST 2013
-> starting parallel LAPW1 jobs at Sat Aug 3 00:42:12 CST 2013
running LAPW1 in parallel mode (using .machines)
8 number_of_parallel_jobs
compute-0-12 compute-0-12(32) 3.181u 0.181s 0:02.77 121.2% 0+0k 0+0io 33pf+0w
compute-0-12 compute-0-12(32) 2.781u 0.117s 0:02.58 112.0% 0+0k 0+0io 18pf+0w
compute-0-12 compute-0-12(32) 2.343u 0.089s 0:02.28 106.1% 0+0k 0+0io 17pf+0w
compute-0-12 compute-0-12(32) 2.818u 0.126s 0:02.52 116.2% 0+0k 0+0io 17pf+0w
compute-0-2 compute-0-2(32) 0.010u 0.012s 0:00.03 66.6% 0+0k 0+0io 0pf+0w
compute-0-2 compute-0-2(32) 0.009u 0.014s 0:00.03 33.3% 0+0k 0+0io 0pf+0w
compute-0-2 compute-0-2(32) 0.010u 0.020s 0:00.04 75.0% 0+0k 0+0io 0pf+0w
compute-0-2 compute-0-2(32) 0.012u 0.020s 0:00.04 75.0% 0+0k 0+0io 0pf+0w
Summary of lapw1para:
compute-0-12 k=0 user=128 wallclock=30.78
11.349u 1.617s 0:10.77 120.2% 0+0k 0+0io 85pf+0w
> lapw2 -p (00:42:22) running LAPW2 in parallel mode
** LAPW2 crashed!
0.076u 0.108s 0:00.20 85.0% 0+0k 0+0io 9pf+0w
error: command /home/yljia/software/wien2k_11/lapw2para lapw2.def failed
> stop error
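As far as I understand the parallel scripts, the MPIRUN setting above is a template whose placeholders are replaced for each parallel job, so the ".machine5" in the error message would be the file passed as _HOSTS_. Below is a minimal Python sketch of that substitution. The helper name expand_mpirun and the argument lapw1_5.def are illustrative assumptions only; the real substitution happens inside the WIEN2k scripts.

```python
def expand_mpirun(template, nproc, hostfile, command):
    """Fill the _NP_, _HOSTS_ and _EXEC_ placeholders of an MPIRUN template.

    Illustrative helper (hypothetical name), not part of WIEN2k.
    """
    return (template.replace("_NP_", str(nproc))
                    .replace("_HOSTS_", hostfile)
                    .replace("_EXEC_", command))


# Template exactly as entered in the MPIRUN setting above.
template = "mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_"

# The lapw1_mpi path is taken from the error message; "lapw1_5.def" is an
# assumed argument, used here only to make the example complete.
command = "/home/yljia/software/wien2k_11/lapw1_mpi lapw1_5.def"

print(expand_mpirun(template, 1, ".machine5", command))
```

If this reading is right, the hosts listed in .machine5 to .machine8 would have to be among the hosts that the queuing system allocated to the job.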
Below is the shell script I submit. I have 2 nodes with 8 cores each (not counting the host node):
#!/bin/tcsh
#$ -S /bin/tcsh
#$ -N W2web_Job
# MPIR_HOME from submitting environment
#$ -v MPIR_HOME
# needs in
# $NSLOTS
# the number of tasks to be used
# $TMPDIR/machines
# a valid machine file to be passed to mpirun
#$ -cwd
#$ -o job.out
#$ -e job.err
#$ -q parallel.q
#$ -pe mpich 8
# mpich / jobs_per_node = number of nodes
set mpijob=1
set jobs_per_node=8
setenv OMP_NUM_THREADS 1
setenv USE_REMOTE 0
echo "Got $NSLOTS slots." > job.out
echo "Got $NSLOTS slots." > job.err
pwd
set proclist=`cat $TMPDIR/machines`
set nproc=$NSLOTS
echo $nproc nodes for this job: $proclist
if( -e .proclist_tmp) rm .proclist_tmp
if ($jobs_per_node != 8 ) then
set j=1
while ($j <= $nproc )
@ j1 = $j + $jobs_per_node
@ j1 = $j1 - 1
echo $proclist[$j-$j1] >>.proclist_tmp
@ j = $j + 8
end
set proclist=`cat .proclist_tmp`
rm .proclist_tmp
set nproc=$#proclist
endif
echo $nproc nodes for this job: $proclist
echo '#' > .machines
# example for an MPI parallel lapw0
echo -n 'lapw0:' >> .machines
echo $proclist >>.machines
#example for k-point and mpi parallel lapw1/2
#set j=1
#while ($j <= $jobs_per_node )
set i=1
while ($i <= $nproc )
echo -n '1:' >>.machines
@ i1 = $i + $mpijob
@ i2 = $i1 - 1
echo $proclist[$i-$i2] >>.machines
set i=$i1
end
echo 'granularity:1' >>.machines
echo 'extrafine:1' >>.machines
date
run_lapw -p -ec 0.0001 -NI >& STDOUT
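To make it easier to check what the script above writes to .machines, here is a minimal Python sketch of the same logic for the case jobs_per_node=8, where the regrouping block is skipped. The function name build_machines and the host list are illustrative assumptions; the real list comes from $TMPDIR/machines, whose content is not shown here.

```python
def build_machines(proclist, nproc, mpijob=1):
    """Return the text the job script would write to .machines.

    proclist: host names as read from $TMPDIR/machines (one entry per slot)
    nproc:    number of slots ($NSLOTS in the script)
    mpijob:   number of hosts grouped into each "1:" line
    """
    lines = ["#", "lapw0:" + " ".join(proclist)]
    i = 0
    while i < nproc:
        # Corresponds to: echo -n '1:' ; echo $proclist[$i-$i2]
        lines.append("1:" + " ".join(proclist[i:i + mpijob]))
        i += mpijob
    lines += ["granularity:1", "extrafine:1"]
    return "\n".join(lines) + "\n"


# Assumed host list for illustration (host names taken from the dayfile).
hosts = ["compute-0-12", "compute-0-12", "compute-0-2", "compute-0-2"]
print(build_machines(hosts, nproc=len(hosts), mpijob=1), end="")
```

With mpijob=1 every "1:" line contains a single host; with mpijob=2 the same list would be grouped in pairs.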
Any comments are welcome! Thanks in advance!
Have a nice weekend!
Jia Yalei