[Wien] lapw1 hangs over nfs
Peter Blaha
pblaha at theochem.tuwien.ac.at
Fri Dec 20 08:44:05 CET 2013
This can happen on slow networks or on poorly configured ones (not
enough NFS daemons, ...)
a) k-parallelization only makes sense up to a certain granularity. This
means that beyond a certain number of processors you cannot expect the
parallelization to get any "faster". For instance, when you have 100
k-points and lapw1 takes 20 seconds when parallelized over 20 cores
(i.e. 5 k-points take just 20 sec/core), then in most setups
parallelization with 50 cores will certainly be even slower, or will
even fail from time to time because of network problems.
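To put numbers on the example in a), here is a minimal sketch of the arithmetic. It assumes every k-point costs the same (4 s, from 5 k-points in 20 s) and it does not model startup or NFS overhead at all; the function name is only for illustration and is not part of WIEN2k.

```python
def compute_time_per_core(n_kpoints, n_cores, seconds_per_kpoint):
    """Return (k-points on the busiest core, pure compute seconds on that core).

    Assumption: all k-points cost the same; network and startup
    overhead are deliberately NOT included.
    """
    kpoints_on_busiest_core = -(-n_kpoints // n_cores)  # ceiling division
    return kpoints_on_busiest_core, kpoints_on_busiest_core * seconds_per_kpoint


SECONDS_PER_KPOINT = 20 / 5  # from the example: 5 k-points take 20 s on one core

for cores in (20, 50):
    kpts, secs = compute_time_per_core(100, cores, SECONDS_PER_KPOINT)
    print(f"{cores} cores: {kpts} k-points/core, {secs:.0f} s of compute per core")
```

Going from 20 to 50 cores can save at most about 12 s of compute per core in this picture, while the number of processes that open files over NFS at the same time more than doubles. That is why the overhead can easily outweigh the gain.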
b) One can always reduce the network load by defining a SCRATCH
directory on the local nodes. These directories must exist, and in that
case your k-list and processor-list must be "compatible" (e.g. 100
k-points and 20 cores, but not 16).
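A small sketch of the "compatible" check in b). It assumes that "compatible" means the number of k-points divides evenly by the number of cores, which is how the 100/20 versus 100/16 example reads; the helper names are hypothetical and not part of WIEN2k.

```python
def is_compatible(n_kpoints, n_cores):
    """True if the k-points can be spread evenly over the cores.

    Assumption: 'compatible' is read here as even divisibility.
    """
    return n_cores > 0 and n_kpoints % n_cores == 0


def compatible_core_counts(n_kpoints, max_cores):
    """List every core count up to max_cores that divides n_kpoints evenly."""
    return [n for n in range(1, max_cores + 1) if is_compatible(n_kpoints, n)]


print(is_compatible(100, 20))           # 20 cores, 5 k-points each
print(is_compatible(100, 16))           # 100 / 16 leaves a remainder of 4
print(compatible_core_counts(100, 20))  # core counts that would fit 100 k-points
```

With 16 cores, four of them would have to take an extra k-point, so the k-list no longer maps cleanly onto the processor-list.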
On 12/19/2013 11:16 PM, Oliver Albertini wrote:
> Hello,
>
> I am running k-point parallel over nfs, and every few iterations, a
> k-point process will hang, leaving 'ghost processes' visible under the
> top command. These processes have 0% cpu utilization.
>
> Looking at the error files, the k-point in question will have this type
> of error:
>
> $ cat dnlapw1_22.error
> Error in LAPW1
> 'INILPW' - can't open unit: 11
> 'INILPW' - filename: AgMgOCo.energydn_22
> 'INILPW' - status: unknown form: formatted
> 'LAPW1' - INILPW aborted unsuccessfully.
> 'Unknow' - Unknown signal received
>
>
> However, case.energydn_22 is present, but empty.
>
> I suspect that this could be related to network speed. Has anyone had a
> similar experience?
>
> Sincerely,
>
> Oliver Albertini
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WWW:
http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------