[Wien] commlib error
Gavin Abo
gsabo at crimson.ua.edu
Fri Jul 10 16:24:09 CEST 2015
An additional comment: in the post at
https://arc.liv.ac.uk/pipermail/gridengine-users/2010-October/032729.html
you can see that they had an error of the same form:
error: commlib error: got select error (Connection reset by peer)
error: executing task of job x failed: failed sending task to
execd at hostname: can't find connection
It looks like they may have traced the problem to the master
daemon (qmaster), as seen in the post at:
https://arc.liv.ac.uk/pipermail/gridengine-users/2010-October/032758.html
So the error might be caused by a daemon problem (on the
tachyon1478 node).
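If error files from several failed runs are available, one way to check whether the failures always involve the same node (tachyon1478 here) is to extract the job ID and host name from each "failed sending task to execd" message. Below is a minimal Python sketch. It assumes the message text matches the SGE form quoted above, and the function name failed_execd_hosts is hypothetical, not part of SGE or WIEN2k.

```python
import re
from typing import List, Tuple

# Assumption: the SGE message looks like the one quoted in this thread.
# \s+ tolerates the line wrapping that mail clients insert in the message.
_EXECD_FAILURE = re.compile(
    r"executing task of job\s+(\d+)\s+failed:\s+failed sending task\s+"
    r"to\s+execd at\s+([\w.-]+):"
)


def failed_execd_hosts(log_text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (job_id, host) for each failed task."""
    return _EXECD_FAILURE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "error: commlib error: got select error (Connection reset by peer)\n"
        "error: executing task of job 2424636 failed: failed sending task\n"
        "to execd at tachyon1478: can't find connection\n"
    )
    for job_id, host in failed_execd_hosts(sample):
        print(f"job {job_id}: could not reach execd on {host}")
```

Running this over the error files of several jobs and tallying the hosts would show whether the same node keeps appearing. That pattern would point to a node or daemon problem rather than a compilation problem.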
On 7/10/2015 5:01 AM, Laurence Marks wrote:
>
> From a brief Google search, this is an MPI error.
>
> How did you compile? It is easy to use the wrong BLACS combinations.
>
> Have you run simpler cases, such as TiC, first?
>
> ---
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> http://www.numis.northwestern.edu
> Corrosion in 4D http://MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Györgyi
>
> On Jul 10, 2015 03:05, "Imran Khan" <imrankhanswati80 at gmail.com>
> wrote:
>
> Dear WIEN2k experts and users,
> I am using WIEN2k version 14.2 on a queuing system (SGE) with the
> Intel compiler 11.1, the MPI library mpi/openmpi-1.6.3, and the math
> library fftw-3.3.4. With these options, WIEN2k installed without
> any compile-time errors.
> The purpose of my calculation is to find the stable site for
> different substituents in NdFeB intermetallics.
> I am running the case.struct given in the attachment, using 200
> k-points (a 6 6 4 mesh). My RKmax value is 7 and Gmax is 12, and I am
> using the LDA+U method.
> I am using the following command:
> runsp_lapw -p -orb -i 80 -ec 0.0001 -cc 0.001
> Every time I submit my job, it is terminated after a few SCF cycles
> with the following error in the job's error file:
>
> error: commlib error: got select error (Connection reset by peer)
> error: executing task of job 2424636 failed: failed sending task
> to execd at tachyon1478: can't find connection
> .
> .
> .
> LAPW2 END
> LAPW2 END
> LAPW2 END
> LAPW2 END
> real 0m53.638s
> forrtl: No such file or directory
> forrtl: severe (29): file not found, unit 21, file
> /home01/x1030imr/khan/Wien2K/Neomagnet/Pr-doped/f-site/AFM/Pr-Af/Pr-Af.scf2up_31
> Image PC Routine Line Source
> sumpara 00000000004A671D Unknown Unknown Unknown
> sumpara 00000000004A5225 Unknown Unknown Unknown
> sumpara 0000000000456259 Unknown Unknown Unknown
> sumpara 0000000000416A5A Unknown Unknown Unknown
> sumpara 0000000000416250 Unknown Unknown Unknown
> sumpara 0000000000421E3D Unknown Unknown Unknown
> sumpara 0000000000410771 scfsum_ 126 scfsum.f
> sumpara 000000000040EE82 MAIN__ 219 sumpara.f
> sumpara 00000000004033DC Unknown Unknown Unknown
> libc.so.6 00000035AA81D974 Unknown Unknown Unknown
> sumpara 00000000004032E9 Unknown Unknown Unknown
> cp: cannot stat `.in.tmp': No such file or directory
>
> I have discussed this error with the engineers of that queuing
> system (tachyon), and I have also searched the mailing list, but
> could not find a solution.
> Your guidance in solving this issue would be greatly appreciated.
> Best regards
> Imran.
>
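The sumpara crash at the end of the quoted log (file not found for Pr-Af.scf2up_31) may be a downstream effect. If the task sent to one node never ran, that job's lapw2 output file would be missing when sumpara tries to combine the files. Below is a minimal Python sketch for listing which numbered files are absent after a crash. It assumes the case.scf2up_N naming visible in the error message and that N runs from 1 to the number of parallel jobs. The function names and the n_jobs value are hypothetical.

```python
import os
from typing import Iterable, List


def expected_scf2_files(case: str, spin: str, n_jobs: int) -> List[str]:
    """Assumed naming pattern: <case>.scf2<spin>_1 ... <case>.scf2<spin>_<n_jobs>."""
    return [f"{case}.scf2{spin}_{i}" for i in range(1, n_jobs + 1)]


def missing_scf2_files(present: Iterable[str], case: str, spin: str,
                       n_jobs: int) -> List[str]:
    """Hypothetical helper: expected per-job files not found in `present`."""
    present_set = set(present)
    return [name for name in expected_scf2_files(case, spin, n_jobs)
            if name not in present_set]


if __name__ == "__main__":
    # Hypothetical values: run in the case directory and set n_jobs to the
    # number of parallel lapw2 jobs actually started.
    print(missing_scf2_files(os.listdir("."), case="Pr-Af", spin="up", n_jobs=31))
```

If the missing files always map to jobs that ran on the same host, that would match the host reported in the commlib error.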