[Wien] Parallel run problems with version 19.1
tran at theochem.tuwien.ac.at
Mon Jul 22 17:09:44 CEST 2019
Hi,
What you should never do is to mix spin-polarized and non-spin-polarized
calculations in the same directory.
Since your explanations about spin-polarized/non-spin-polarized are a bit
confusing, the question is: does the calculation run properly (in parallel and
in serial) if everything (init_lapw and run_lapw) in a directory is done from
the beginning as non-spin-polarized? Same question for spin-polarized.
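For example (just a sketch; case names, paths and the init_lapw options below
are only placeholders), a clean test in two fresh directories would be:

    # non-spin-polarized test in a fresh directory
    mkdir test_nsp && cd test_nsp
    cp /path/to/TiC.struct test_nsp.struct   # the case name must match the directory name
    init_lapw -b                             # batch initialization; adjust options as needed
    run_lapw                                 # serial; add -p (with a .machines file) for the parallel test

    # spin-polarized test in a separate fresh directory
    mkdir ../test_sp && cd ../test_sp
    cp /path/to/TiC.struct test_sp.struct
    init_lapw -b -sp
    runsp_lapw                               # or runsp_lapw -p for the parallel test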
F. Tran
On Monday 2019-07-22 16:37, Ricardo Moreira wrote:
>Date: Mon, 22 Jul 2019 16:37:30
>From: Ricardo Moreira <ricardopachecomoreira at gmail.com>
>Reply-To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>To: wien at zeus.theochem.tuwien.ac.at
>Subject: [Wien] Parallel run problems with version 19.1
>
>Dear Wien2k users,
>I am running Wien2k on a computer cluster, compiled with the GNU compilers version 7.2.3 and OpenMPI, on Scientific Linux release
>7.4. Since changing from version 18.2 to 19.1 I have been unable to run Wien2k in parallel (neither MPI nor simple k-point
>parallelism seems to work); calculations abort with the following message:
>
> start (Mon Jul 22 14:49:31 WEST 2019) with lapw0 (40/99 to go)
>
> cycle 1 (Mon Jul 22 14:49:31 WEST 2019) (40/99 to go)
>
>> lapw0 -p (14:49:31) starting parallel lapw0 at Mon Jul 22 14:49:31 WEST 2019
>-------- .machine0 : 8 processors
>0.058u 0.160s 0:03.50 6.0% 0+0k 48+344io 5pf+0w
>> lapw1 -up -p (14:49:35) starting parallel lapw1 at Mon Jul 22 14:49:35 WEST 2019
>-> starting parallel LAPW1 jobs at Mon Jul 22 14:49:35 WEST 2019
>running LAPW1 in parallel mode (using .machines)
>2 number_of_parallel_jobs
> ava01 ava01 ava01 ava01(8) ava21 ava21 ava21 ava21(8) Summary of lapw1para:
> ava01 k=8 user=0 wallclock=0
> ava21 k=16 user=0 wallclock=0
>** LAPW1 crashed!
>0.164u 0.306s 0:03.82 12.0% 0+0k 112+648io 1pf+0w
>error: command /homes/fc-up201202493/WIEN2k_19.1/lapw1para -up uplapw1.def failed
>
>> stop error
>
>Inspecting the error files I find that the error printed to uplapw1.error is:
>
>** Error in Parallel LAPW1
>** LAPW1 STOPPED at Mon Jul 22 14:49:39 WEST 2019
>** check ERROR FILES!
> 'INILPW' - can't open unit: 18
> 'INILPW' - filename: TiC.vspup
> 'INILPW' - status: old form: formatted
> 'LAPW1' - INILPW aborted unsuccessfully.
> 'INILPW' - can't open unit: 18
> 'INILPW' - filename: TiC.vspup
> 'INILPW' - status: old form: formatted
> 'LAPW1' - INILPW aborted unsuccessfully.
>
>Since this error message is often attributed in previous posts to the mailing list to running init_lapw for a non-spin-polarized case
>and then using runsp_lapw, I should clarify that the error also occurs when attempting a non-spin-polarized run; the only difference
>is that the error message then refers to TiC.vsp instead of TiC.vspup.
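>For reference, a quick check of whether lapw0 actually wrote the potential files that INILPW tries to open (case name TiC as above;
>the up/dn files exist only in the spin-polarized case) would be something like:
>
>    ls -l TiC.vsp TiC.vns         # spherical/non-spherical potentials, non-spin-polarized
>    ls -l TiC.vspup TiC.vspdn     # their spin-polarized counterparts written by lapw0
>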
>I should also point out, since it may be related to this issue, that serial runs show a similar problem: after I perform a first
>simulation in a folder, if I start with a spin-polarized case, then do another init_lapw for a non-spin-polarized case and attempt
>run_lapw, I get the same "can't open unit: 18" errors as before (the same happens if I first run a non-spin-polarized simulation and
>then attempt a spin-polarized one in the same folder). The workaround I found was to create a new folder, but since the error message
>also involves TiC.vsp/vspup I thought I would mention it.
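>The files left over from a previous spin-polarized run, which then sit in the same folder as the non-spin-polarized one, can be
>listed with something like:
>
>    ls TiC.*up* TiC.*dn*    # e.g. TiC.vspup, TiC.clmup, ... from the earlier spin-polarized run
>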
>Lastly, I should mention that I deleted the line "15,'$file.tmp$updn', 'scratch','unformatted',0" from x_lapw, because I previously
>had an error in lapw2, reported elsewhere on the mailing list, that Professor Blaha indicated was solved by deleting that line (and
>indeed it was). I have no idea whether this could be related to the issues I am having now, so I felt it right to point it out.
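>As for the x_lapw edit, one way to rule it in or out (assuming a pristine copy of the script is still available, e.g. from the 19.1
>source package) would be:
>
>    diff /path/to/pristine/x_lapw $WIENROOT/x_lapw    # should show only the removed unit-15 line
>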
>Thanks in advance for any assistance that might be provided.
>
>Best Regards,
>Ricardo Moreira
>
>