[Wien] commlib error
Peter Blaha
pblaha at theochem.tuwien.ac.at
Thu Jul 9 09:42:48 CEST 2015
The "commlib" error is certainly a system error: the communication
between the nodes is somehow broken.
From WIEN2k you got the error that, in the sumpara step (after lapw2), it
could not find the file Pr-Af.scf2up_31.
So the first question to ask yourself is: does this file exist, and is
it OK?
ls -alsrp *scf2up_*
You should find many of these files (as many as k-parallel jobs are
submitted) and ALL of them should have a reasonable length (at least
non-zero).
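The check above can also be scripted. The following is only a sketch: the function name check_scf2_parts and its arguments are made up for illustration and are not part of WIEN2k. It counts the case.scf2up_* (or scf2dn_*) files in a directory and flags any that are empty.

```shell
#!/bin/sh
# Hypothetical helper (not a WIEN2k script): report empty scf2 fragments.
# Usage: check_scf2_parts <case> <spin> [directory]
check_scf2_parts() {
    case_name=$1
    spin=$2
    dir=${3:-.}
    n=0
    empty=0
    for f in "$dir/$case_name.scf2${spin}_"*; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$f" ] || continue
        n=$((n + 1))
        # -s is true only if the file exists and has non-zero size.
        if [ ! -s "$f" ]; then
            echo "EMPTY: $f"
            empty=$((empty + 1))
        fi
    done
    echo "checked $n file(s), $empty empty"
    # Succeed only if at least one file was found and none is empty.
    [ "$n" -gt 0 ] && [ "$empty" -eq 0 ]
}
```

In the case directory one would call it as "check_scf2_parts Pr-Af up" and compare the reported count with the number of k-parallel jobs that were submitted. A missing file (such as _31 here) shows up as a count that is too low, not as an EMPTY line.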
My suspicion is that the network filesystem on your system is a bit
slow in updating files across the different nodes, and that the errors
therefore occur randomly after a few iterations.
You did not say how you parallelize nor what the cputime is, but a few tips:
- reduce the number of k-point parallel jobs (I hope you did NOT
  distribute the 200 k-points onto 200 cores!). Depending on the matrix
  size, you may try some (higher) mpi-parallelism.
- make sure you are using a local "SCRATCH" directory to reduce network
  load (AND a compatible k-parallelism, i.e. (num-kpt / n-core) must be an
  integer)
- increase the "sleep" times in $WIENROOT/lapw2para (and maybe
  lapw1para) from the defaults to larger values like
    setenv DELAY 0.5 # delay launching of processes by n seconds
    setenv SLEEPY 4 # additional sleep before checking
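To illustrate the "(num-kpt / n-core) must be an integer" condition, here is a small sketch that lists the job counts which divide a given number of k-points evenly. The function name even_kpar_counts is hypothetical, and the k-point count you pass should be the number actually in your k-list, which may be smaller than 200 after symmetry reduction.

```shell
#!/bin/sh
# Hypothetical helper (not a WIEN2k script): list k-parallel job counts
# up to <max_jobs> that split <nkpt> k-points with no remainder.
# Usage: even_kpar_counts <nkpt> <max_jobs>
even_kpar_counts() {
    nkpt=$1
    max_jobs=$2
    j=1
    while [ "$j" -le "$max_jobs" ]; do
        # Keep only job counts that divide the k-point count exactly.
        if [ $((nkpt % j)) -eq 0 ]; then
            echo "$j jobs x $((nkpt / j)) k-points"
        fi
        j=$((j + 1))
    done
}
```

For example, "even_kpar_counts 200 16" lists 1, 2, 4, 5, 8 and 10 jobs as the evenly balanced choices on up to 16 cores.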
On 07/09/2015 07:51 AM, Imran Khan wrote:
> Dear wien2k experts and users,
> I am using WIEN2k version 14.2 on a queuing system (SGE), with Intel
> compiler 11.1, MPI libraries mpi/openmpi-1.6.3 and math libraries
> fftw-3.3.4. With these options I installed WIEN2k without any
> compile-time errors.
> The purpose of my calculation is to find the stable site for different
> substituents in NdFeB intermetallics.
> I am running the case.struct given in the attachment, using 200 (6 6 4)
> k-points. My RKmax value is 7 and Gmax is 12, and I am using the LDA+U
> method. I am using the following command:
> runsp_lapw -p -orb -i 80 -ec 0.0001 -cc 0.001
> Every time I submit my job, it is terminated after a few scf cycles
> with the following error in the error tag file.
>
> error: commlib error: got select error (Connection reset by peer)
> error: executing task of job 2424636 failed: failed sending task to
> execd at tachyon1478: can't find connection
> .
> .
> .
> LAPW2 END
> LAPW2 END
> LAPW2 END
> LAPW2 END
> real 0m53.638s
> forrtl: No such file or directory
> forrtl: severe (29): file not found, unit 21, file
> /home01/x1030imr/khan/Wien2K/Neomagnet/Pr-doped/f-site/AFM/Pr-Af/Pr-Af.scf2up_31
> Image PC Routine Line Source
> sumpara 00000000004A671D Unknown Unknown Unknown
> sumpara 00000000004A5225 Unknown Unknown Unknown
> sumpara 0000000000456259 Unknown Unknown Unknown
> sumpara 0000000000416A5A Unknown Unknown Unknown
> sumpara 0000000000416250 Unknown Unknown Unknown
> sumpara 0000000000421E3D Unknown Unknown Unknown
> sumpara 0000000000410771 scfsum_ 126 scfsum.f
> sumpara 000000000040EE82 MAIN__ 219 sumpara.f
> sumpara 00000000004033DC Unknown Unknown Unknown
> libc.so.6 00000035AA81D974 Unknown Unknown Unknown
> sumpara 00000000004032E9 Unknown Unknown Unknown
> cp: cannot stat `.in.tmp': No such file or directory
>
> I have discussed this error with the engineers of that queuing system
> (tachyon), and I have searched the mailing list as well but could not
> find any solutions.
> Your guidance in solving this issue will be greatly appreciated.
> Imran
>
>
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------