[Wien] lapwso_mpi error
Gavin Abo
gsabo at crimson.ua.edu
Sun Nov 13 01:57:51 CET 2016
If you use the terminal command: echo $SCRATCH
does it return: ./ ?

If so, it looks like there may still be a problem with how SCRATCH is
defined, or with how "./" is resolved on your system.
In the error message, you can see:
/lunarc/nobackup/users/eishfh/WIEN2k/GaAs_ZB/David_project/3Mn001/ALL/test-so/3Mn/./3Mn.vectordn_1
The "./" may be the cause of the problem, because I would expect the
path to be:
/lunarc/nobackup/users/eishfh/WIEN2k/GaAs_ZB/David_project/3Mn001/ALL/test-so/3Mn/3Mn.vectordn_1
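As a minimal sketch of why that happens: if SCRATCH is "./", a path built by
concatenating the case directory, SCRATCH, and the file name picks up the extra
"./" component. The variable names below are illustrative only, not WIEN2k's
actual script logic:

```shell
# Illustrative sketch: a SCRATCH value of "./" inserted between the
# case directory and the file name yields the doubled path component.
# (case_dir and the concatenation pattern are hypothetical here.)
SCRATCH="./"
case_dir="3Mn"
echo "${case_dir}/${SCRATCH}3Mn.vectordn_1"
# prints: 3Mn/./3Mn.vectordn_1

# A common fix is to point SCRATCH at a real scratch directory in your
# shell startup files, e.g. (example path, adjust for your cluster):
# export SCRATCH=/tmp
```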
On 11/12/2016 5:33 PM, Md. Fhokrul Islam wrote:
>
> Hi Prof. Blaha,
>
> I wasn't aware of the bug, but I will check the updates. I have
> repeated the calculation with 16 cores (a square processor grid), as
> you suggested, but I still got the same error.
>
> As before, the job crashes at lapwso. I don't see any missing file,
> as you can see from the list of vector files:
>
> -rw-r--r--. 1 eishfh kalmar 12427583862 Nov 12 10:04 3Mn.vectordn_1
> -rw-r--r--. 1 eishfh kalmar       77760 Nov 12 10:26 3Mn.vectorsodn_1
> -rw-r--r--. 1 eishfh kalmar       77760 Nov 12 10:26 3Mn.vectorsoup_1
> -rw-r--r--. 1 eishfh kalmar 12428559726 Nov 12 04:17 3Mn.vectorup_1
>
> Here are the dayfile and output error files. These are the only error
> messages I got.
>
> case.dayfile:
>
>     cycle 1     (Sat Nov 12 01:21:39 CET 2016)  (100/99 to go)
>
> >   lapw0 -p    (01:21:39) starting parallel lapw0 at Sat Nov 12 01:21:39 CET 2016
> -------- .machine0 : 16 processors
> 14031.329u 15.362s 14:40.87 1594.6% 0+0k 90152+1974560io 175pf+0w
>
> >   lapw1 -up -p -c (01:36:20) starting parallel lapw1 at Sat Nov 12 01:36:20 CET 2016
> ->  starting parallel LAPW1 jobs at Sat Nov 12 01:36:20 CET 2016
> running LAPW1 in parallel mode (using .machines)
> 1 number_of_parallel_jobs
> au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188(1) 121331.481u 33186.223s 2:41:04.62 1598.7% 0+0k 0+29485672io 118pf+0w
>    Summary of lapw1para:
>    au188   k=0   user=0   wallclock=0
> 121367.583u 33215.702s 2:41:06.83 1599.1% 0+0k 288+29487024io 121pf+0w
>
> >   lapw1 -dn -p -c (04:17:27) starting parallel lapw1 at Sat Nov 12 04:17:27 CET 2016
> ->  starting parallel LAPW1 jobs at Sat Nov 12 04:17:27 CET 2016
> running LAPW1 in parallel mode (using .machines.help)
> 1 number_of_parallel_jobs
> au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188 au188(1) 233187.228u 100041.449s 5:47:30.00 1598.2% 0+0k 5832+35169304io 116pf+0w
>    Summary of lapw1para:
>    au188   k=0   user=0   wallclock=0
> 233263.580u 100102.639s 5:47:31.69 1598.7% 0+0k 6296+35170640io 118pf+0w
>
> >   lapwso -up -p -c (10:04:59) running LAPWSO in parallel mode
> ** LAPWSO crashed!
> 1233.319u 23.612s 21:29.72 97.4% 0+0k 13064+7712io 17pf+0w
> error: command /lunarc/nobackup/users/eishfh/SRC/Wien2k14.2-iomkl/lapwsopara -up -c lapwso.def failed
>
> >   stop error
>
> -----------------------
>
> lapwso.error file:
>
> ** Error in Parallel LAPWSO
> ** Error in Parallel LAPWSO
>
> -----------------------
>
> output error file:
>
> LAPW0 END
> LAPW1 END
> LAPW1 END
>
> forrtl: severe (39): error during read, unit 9, file /lunarc/nobackup/users/eishfh/WIEN2k/GaAs_ZB/David_project/3Mn001/ALL/test-so/3Mn/./3Mn.vectordn_1
>
> Image              PC                Routine   Line     Source
> lapwso_mpi         00000000004634E3  Unknown   Unknown  Unknown
> lapwso_mpi         000000000047F3C4  Unknown   Unknown  Unknown
> lapwso_mpi         000000000042BA1F  kptin_    56       kptin.F
> lapwso_mpi         0000000000431566  MAIN__    523      lapwso.F
> lapwso_mpi         000000000040B3EE  Unknown   Unknown  Unknown
> libc.so.6          00002BA34EDECB15  Unknown   Unknown  Unknown
> lapwso_mpi         000000000040B2E9  Unknown   Unknown  Unknown
>
> -----------------------
>
>
> Thanks,
> Fhokrul