[Wien] Parallel run problems with version 19.1

Ricardo Moreira ricardopachecomoreira at gmail.com
Mon Jul 22 17:24:42 CEST 2019


Hi and thanks for the reply,

Regarding serial calculations: yes, in both the non-spin-polarized and
the spin-polarized cases you described everything runs properly. In
parallel, however, it fails in both cases, with the error I indicated
in my previous email.

Best Regards,
Ricardo Moreira

On Mon, 22 Jul 2019 at 16:09, <tran at theochem.tuwien.ac.at> wrote:

> Hi,
>
> What you should never do is mix spin-polarized and
> non-spin-polarized calculations in the same directory.
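>
> (A minimal sketch of that separation, using the batch-mode init flags
> and hypothetical directory names; in WIEN2k the case name must match
> the directory name, so TiC_nsp needs a TiC_nsp.struct, etc.:)
>
>     # non-spin-polarized case in its own directory
>     mkdir TiC_nsp && cd TiC_nsp
>     # ... put TiC_nsp.struct here ...
>     init_lapw -b        # batch mode; interactive init_lapw also works
>     run_lapw
>
>     # spin-polarized case in a separate directory
>     mkdir ../TiC_sp && cd ../TiC_sp
>     # ... put TiC_sp.struct here ...
>     init_lapw -b -sp
>     runsp_lapw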
>
> Since your explanations about spin-polarized/non-spin-polarized are a
> bit confusing, the question is:
>
> Does the calculation run properly (in parallel and serial) if everything
> (init_lapw and run_lapw) in a directory is done from the beginning in
> non-spin-polarized? Same question with spin-polarized.
>
> F. Tran
>
> On Monday 2019-07-22 16:37, Ricardo Moreira wrote:
>
> >Dear Wien2k users,
> >I am running Wien2k on a computer cluster, compiled with the GNU
> >compilers version 7.2.3 and OpenMPI, under Scientific Linux release
> >7.4. Since changing from version 18.2 to 19.1 I have been unable to
> >run Wien2k in parallel (neither MPI nor simple k-point parallel runs
> >work), with calculations aborting with the following message:
> >
> >    start       (Mon Jul 22 14:49:31 WEST 2019) with lapw0 (40/99 to go)
> >
> >    cycle 1     (Mon Jul 22 14:49:31 WEST 2019)         (40/99 to go)
> >
> >>   lapw0   -p  (14:49:31) starting parallel lapw0 at Mon Jul 22 14:49:31 WEST 2019
> >-------- .machine0 : 8 processors
> >0.058u 0.160s 0:03.50 6.0%      0+0k 48+344io 5pf+0w
> >>   lapw1  -up -p       (14:49:35) starting parallel lapw1 at Mon Jul 22 14:49:35 WEST 2019
> >->  starting parallel LAPW1 jobs at Mon Jul 22 14:49:35 WEST 2019
> >running LAPW1 in parallel mode (using .machines)
> >2 number_of_parallel_jobs
> >     ava01 ava01 ava01 ava01(8)      ava21 ava21 ava21 ava21(8)
> > Summary of lapw1para:
> >   ava01         k=8     user=0  wallclock=0
> >   ava21         k=16    user=0  wallclock=0
> >**  LAPW1 crashed!
> >0.164u 0.306s 0:03.82 12.0%     0+0k 112+648io 1pf+0w
> >error: command   /homes/fc-up201202493/WIEN2k_19.1/lapw1para -up uplapw1.def   failed
> >
> >>   stop error
> >
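> >(For reference, a .machines file consistent with the hosts echoed in
> >this log (one lapw0 line covering 8 cores, plus two k-parallel jobs
> >of four processes each) would look like the sketch below; the
> >granularity and extrafine values are typical choices, not copied from
> >my actual file:)
> >
> >    lapw0: ava01 ava01 ava01 ava01 ava21 ava21 ava21 ava21
> >    1:ava01 ava01 ava01 ava01
> >    1:ava21 ava21 ava21 ava21
> >    granularity:1
> >    extrafine:1
> >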
> >Inspecting the error files, I find that the error printed to
> >uplapw1.error is:
> >
> >**  Error in Parallel LAPW1
> >**  LAPW1 STOPPED at Mon Jul 22 14:49:39 WEST 2019
> >**  check ERROR FILES!
> > 'INILPW' - can't open unit:  18
> > 'INILPW' -        filename: TiC.vspup
> > 'INILPW' -          status: old          form: formatted
> > 'LAPW1' - INILPW aborted unsuccessfully.
> > 'INILPW' - can't open unit:  18
> > 'INILPW' -        filename: TiC.vspup
> > 'INILPW' -          status: old          form: formatted
> > 'LAPW1' - INILPW aborted unsuccessfully.
> >
> >Since this error message in previous posts to the mailing list is
> >often attributed to running init_lapw for a non-spin-polarized case
> >and then using runsp_lapw, I should clarify that it also occurs when
> >attempting a non-spin-polarized run; in that case the error message
> >reports TiC.vsp instead of TiC.vspup.
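> >(A quick sanity check here, using standard shell commands: lapw0
> >writes TiC.vsp in a non-spin-polarized run and TiC.vspup/TiC.vspdn in
> >a spin-polarized one, and unit 18 in uplapw1.def should point at the
> >spin-up file:)
> >
> >    ls -l TiC.vsp TiC.vspup TiC.vspdn   # which potential files exist?
> >    grep vsp uplapw1.def                # which file lapw1 tries to open
> >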
> >I should also point out, since it may be related to this issue, that
> >serial runs show a similar problem: after the first simulation in a
> >folder, if I start with a spin-polarized case, then do another
> >init_lapw for a non-spin-polarized case and attempt run_lapw, I get
> >the same "can't open unit: 18" errors as before (this also occurs if
> >I first run a non-spin-polarized simulation and then attempt a
> >spin-polarized one in the same folder). The workaround I found was to
> >make a new folder, but since the error message again involves
> >TiC.vsp/TiC.vspup I thought I would still point it out.
> >Lastly, I should mention that I deleted the line
> >"15,'$file.tmp$updn', 'scratch','unformatted',0" from x_lapw, because
> >I previously had an error in lapw2, reported elsewhere on the mailing
> >list, that Professor Blaha indicated was solved by deleting that line
> >(and indeed it was). Whether this could be related to the issues I am
> >having now I have no idea, so I felt it right to point it out.
> >Thanks in advance for any assistance that might be provided.
> >
> >Best Regards,
> >Ricardo Moreira
> >