[Wien] forrtl: severe (41): insufficient virtual memory (file attached!!)

Peter Blaha pblaha at theochem.tuwien.ac.at
Sun Apr 15 09:06:25 CEST 2012


The dayfile indicates that you are doing a k-point parallel calculation without MPI, using
8 k-parallel lapw1 jobs per node (only lapw0 runs MPI-parallel).

However, the timing is strange:
tachyon1218(1) 527.132u 2.121s 25:49.23 34.1%
indicating that a job which should run about 530 seconds (9 minutes) actually needs 3 times as long
(25:49 of wall-clock time).
This usually means that i) your memory is insufficient, or ii) somebody else is using the same node too,
or iii) it is not a real 8-core node but e.g. only a 4-core node.
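
To make the arithmetic behind that timing line explicit, here is a minimal Python sketch. It assumes the usual csh "time" layout (user seconds, system seconds, elapsed wall-clock time, CPU percentage); the function name parse_time_line is only illustrative and is not part of WIEN2k.

```python
def parse_time_line(line):
    """Parse a csh-style timing line from the dayfile, e.g.
    'tachyon1218(1) 527.132u 2.121s 25:49.23 34.1%'.
    Returns (host, cpu_seconds, wall_seconds)."""
    host, user, system, elapsed, _percent = line.split()
    user_s = float(user.rstrip("u"))    # CPU time spent in user mode
    sys_s = float(system.rstrip("s"))   # CPU time spent in the kernel
    # Elapsed time is given as [hh:]mm:ss.ss; fold it into seconds.
    wall_s = 0.0
    for part in elapsed.split(":"):
        wall_s = wall_s * 60 + float(part)
    return host, user_s + sys_s, wall_s


host, cpu_s, wall_s = parse_time_line(
    "tachyon1218(1) 527.132u 2.121s 25:49.23 34.1%"
)
efficiency = cpu_s / wall_s   # fraction of wall-clock time spent computing
slowdown = wall_s / cpu_s     # how much longer the job took than its CPU time

print(f"{host}: CPU {cpu_s:.0f} s, wall {wall_s:.0f} s, "
      f"efficiency {efficiency:.1%}, slowdown {slowdown:.1f}x")
```

An efficiency far below 100% for a job that has a core to itself means it spent most of its time waiting (swapping, or sharing the core with other processes) rather than computing.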

In any case, the error is in lapwso (which is never MPI-parallel), and it seems rather clear that
you do not have enough memory to run 8 parallel lapwso jobs on one node.

Modify your script such that you are using only 4 parallel jobs per node. That should be much
faster and the memory should probably be sufficient.
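
As a rough illustration of that change, the following Python sketch writes the k-point lines of a .machines file with a chosen number of jobs per node. It assumes the usual k-point parallel layout (one "1:hostname" line per job, followed by "granularity:1" and "extrafine:1"); the node names, the helper build_machines and the omission of the lapw0 line are assumptions for the example, so check the result against your own job script and the WIEN2k user's guide.

```python
def build_machines(nodes, jobs_per_node):
    """Return .machines content with `jobs_per_node` k-point parallel
    jobs on every node (lapw0 line deliberately left out)."""
    lines = []
    for node in nodes:
        # One "1:<host>" line per k-point parallel job on this node.
        lines.extend([f"1:{node}"] * jobs_per_node)
    lines.append("granularity:1")
    lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


# Hypothetical node names; a real job script would take them from the
# queuing system's host list.
nodes = ["node01", "node02", "node03"]
machines_text = build_machines(nodes, jobs_per_node=4)
print(machines_text, end="")
```

With 4 instead of 8 jobs per node, each lapw1 or lapwso process has roughly twice as much memory available, at the cost of the k-points being handled in more rounds.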


Am 15.04.2012 02:49, schrieb hyunjung kim:
> Dear all,
>
>
> (I'm sorry, I forgot to attach the file that includes the error messages and the job script files.)
>
> I constantly get the following error messages when the parallel job is submitted.
>
> I attach it.
> The generated .machines file is also attached; please check whether it is properly generated. I intended to run a job parallelized over 24 k-points.
>
> The compiler version is
> fortran : ifort, 12.0 (2011.3.174), mpif90 [ I got same error message within ifort 11.1 version, so I guess that fortran version is not the origin of this problem..]
> openmpi : 1.4.5
> FFTW2 : 2.1.5
> CC : icc, 12.0 (2011.3.174)
> compiler option
> O Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -mcmodel=medium -i-dynamic -traceback -I$(MKLROOT)/include
> L Linker Flags: $(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
> P Preprocessor flags '-DParallel'
> R R_LIB (LAPACK+BLAS): -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
>
> RP RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_lp64 -L$(FFTWPATH)/lib -lfftw_mpi -lfftw $(R_LIBS)
> FP FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -mcmodel=medium -i-dynamic -traceback -I$(MKLROOT)/include
> MP MPIRUN commando : mpirun -mca btl self,openib -mca plm_rsh_num_concurrent 400 -mca oob_tcp_listen_mode listen_thread -mca plm_rsh_tree_spawn 1 -np _NP_ -machinefile _HOSTS_ _EXEC_
>
>
> The error messages are:
> ~~~~~~~~~~ abbreviation ~~~~~
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> forrtl: severe (41): insufficient virtual memory
> Image PC Routine Line Source
> libintlc.so.5 00002B0540E88F7A Unknown Unknown Unknown
> libintlc.so.5 00002B0540E87AF5 Unknown Unknown Unknown
> libifcoremt.so.5 00002B0540058CF2 Unknown Unknown Unknown
> libifcoremt.so.5 00002B053FFCAAAB Unknown Unknown Unknown
> libifcoremt.so.5 00002B054001AFBA Unknown Unknown Unknown
> libifcoremt.so.5 00002B054001AE11 Unknown Unknown Unknown
> lapwso 00000000004281C0 MAIN__ 131 lapwso.f
> lapwso 0000000000402A9C Unknown Unknown Unknown
> libc.so.6 0000003CFA61D974 Unknown Unknown Unknown
> lapwso 00000000004029A9 Unknown Unknown Unknown
> forrtl: severe (41): insufficient virtual memory
> Image PC Routine Line Source
> libintlc.so.5 00002B5D32256F7A Unknown Unknown Unknown
> libintlc.so.5 00002B5D32255AF5 Unknown Unknown Unknown
> libifcoremt.so.5 00002B5D31426CF2 Unknown Unknown Unknown
> libifcoremt.so.5 00002B5D31398AAB Unknown Unknown Unknown
> libifcoremt.so.5 00002B5D313E8FBA Unknown Unknown Unknown
> libifcoremt.so.5 00002B5D313E8E11 Unknown Unknown Unknown
> lapwso 0000000000409A6A hmsout_mp_init_hm 78 modules.f
> lapwso 00000000004280E2 MAIN__ 130 lapwso.f
> lapwso 0000000000402A9C Unknown Unknown Unknown
> libc.so.6 0000003CFA61D974 Unknown Unknown Unknown
> ~~~~~~ abbreviation ~~~~~~
>
> I note that the compilation was done without any error messages.
>
> Any advice will be greatly appreciated!
>
> ------------------------------------------------------------------------
> Hyun-Jung Kim (Ph.D student)| phone : ++82 10 7335 7889
> Department of Physics|
> Hanyang University| e-mail: angpangmokjang at hanmail.net
> 17 Haengdang-Dong|
> 133-791 Seongdong-Ku,Seoul/Korea|
> ------------------------------------------------------------------------
> www: http://physics.hanyang.ac.kr/~sst/
> ------------------------------------------------------------------------
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

-- 
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------

