[Wien] LAPW1 doesn't show error in parallel calculation

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Sep 9 16:10:31 CEST 2020


The lapw1.error files are written by lapw1 itself (at the very beginning)
and truncated to zero length at the end of lapw1.

Of course, if lapw1 cannot be started (maybe because the executable is 
missing, etc.), then there are no error files.

Thus, Laurence's suggestion is a good one: the para scripts can write 
something into the error files up front, which will then be overwritten 
when lapw1 starts.
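
A minimal sketch of this idea, written in POSIX sh for brevity (the WIEN2k
para scripts themselves are csh, so this is not their actual code). The
helper names run_with_seeded_error and has_error are hypothetical. The
wrapper seeds the error file with a marker before launching the program. A
program that starts correctly is assumed to manage the file itself, as
lapw1 does. If the program never starts, the marker stays in place, so a
"non-empty error file" test still fires.

```shell
#!/bin/sh
# Sketch only: seed an error file before launching a program, so that a
# failed startup leaves a non-empty file behind.
# Both helpers are hypothetical and not part of WIEN2k.

run_with_seeded_error() {
    errfile=$1
    shift
    # Pre-seed the error file. A program that really starts is expected
    # to overwrite it and truncate it to zero length on success.
    echo "Startup Error" > "$errfile"
    # Launch the program. Its exit status is deliberately ignored,
    # because the check below relies only on the error file.
    "$@" || true
}

has_error() {
    # True if the error file exists and is non-empty
    # (sh "-s" is the counterpart of csh "! -z").
    [ -s "$1" ]
}
```

For example, calling run_with_seeded_error with an error file name followed
by a command that cannot be started leaves "Startup Error" in that file, and
has_error then returns true for it.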


On 9/9/20 2:57 PM, Laurence Marks wrote:
> Unfortunately the structure of *.error files which are zero length when 
> the task runs correctly can easily be broken if there is remote 
> execution/ssh/mpi which does not work. I think in the cases you sent 
> there is sufficient information to debug; I suspect an issue with 
> directory names and/or mount.
> 
> Suggestion to Peter: perhaps add a "echo Startup Error > 
> lapw1[0-2].error" in lapw1[0-2]para to catch this?
> 
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what 
> nobody else has thought", Albert Szent-Gyorgi
> http://www.numis.northwestern.edu
> 
> On Wed, Sep 9, 2020, 06:48 Lyudmila Dobysheva <lyuka17 at mail.ru> wrote:
> 
>     09.09.2020 00:01, Peter Blaha wrote:
>      > alias   testerror       'if (! -z \!:1.error) goto error'
>      > you can catch a problem.
> 
>      > Am 08.09.2020 um 20:38 schrieb Yundi Quan:
>      >> The simplest way that I can think of is to check whether the
>      >> lapw1.error file is empty or not after executing x lapw1.
> 
>      >> On Tue, Sep 8, 2020 at 2:23 PM Rubel, Oleg <rubelo at mcmaster.ca>
>      >> wrote:
>      >>     I wonder if there is a _simple_ alternative way for sensing an
>      >>     error? Also message is not always "XXXXX - Error". It can be
> 
>     I am currently trying to run a calculation on a supercomputer with a
>     random structure for testing. I have already solved some problems, but
>     I still sometimes get errors, and there are no nonzero error files. I
>     am attaching three files:
>     1. slurm*out, where the errors are shown. The first one, before lapw0,
>     had no effect (I do not know why): lapw0 ran and all its output files
>     are good. lapw1 did not run.
> 
>     2. *.dayfile, where the only sign that lapw1 did not run is the very
>     short times:
>     tesla46(6) 0.006u 0.010s 0.75 2.11%      0+0k 0+0io 0pf+0w
>     (the next lines are my additional output inserted into lapw1para:
>     1 t taskset0 exe def_loop.def time srun 0 lapw1 lapw1_1.def)
> 
>     3. ls-l.output, which shows that all the *.error files have zero
>     length, and the files that lapw1 should have produced are absent.
> 
>     It doesn't matter why the task did not run, but why are the
>     lapw1*.error files zero length?
>     For testing I submitted run -e lapw1; otherwise it would have continued
>     to lapw2 without stopping.
> 
>     Best regards
>     Lyudmila Dobysheva
>     ------------------
>     http://ftiudm.ru/content/view/25/103/lang,english/
> 
>     Physics-Techn.Institute,
>     Udmurt Federal Research Center, Ural Br. of Rus.Ac.Sci.
>     426000 Izhevsk Kirov str. 132
>     Russia
>     ---
>     Tel. +7 (34I2)43-24-59 (office), +7 (9I2)OI9-795O (home)
>     Skype: lyuka18 (office), lyuka17 (home)
>     E-mail: lyuka17 at mail.ru (office),
>     lyuka17 at gmail.com (home)
> 
>     _______________________________________________
>     Wien mailing list
>     Wien at zeus.theochem.tuwien.ac.at
>     http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> 
>     SEARCH the MAILING-LIST at:
>     http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> 

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------

