[Wien] Bug with lapwso_mpi and ifort 18
MCDERMOTT Eamon 250772
Eamon.MCDERMOTT at cea.fr
Mon Oct 16 11:20:19 CEST 2017
It seems Intel has been trying to improve its compliance with more recent Fortran specs, at the expense of breaking older file I/O code that does "undefined" things (such as reading past the end of a record, as the lapwso get_nloat code does).
For now, I have found two possible workarounds:
1) Using the attached get_nloat.f, which determines nloat from case.in1 in the same way as lapw2 does. I haven't tested it on very many cases, but it solves the problem for my single-threaded and MPI test jobs.
2) Compiling all WIEN2k binaries with the ifort flag "-fpscomp ioformat" (a Microsoft Fortran PowerStation compatibility flag). The resulting lapw1 produces vector files that are incompatible with binaries compiled without this flag, so this method is probably not of interest to anyone who has been doing calculations for a while. I also haven't tested all of WIEN2k with this flag, so I wouldn't be surprised if it creates more I/O bugs elsewhere.
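
For illustration, the kind of over-long read mentioned above can be reproduced in isolation. The following is a minimal free-form Fortran sketch, not the WIEN2k code: the program name, the array sizes, and the scratch file are assumptions made for the example. It writes 4-value records and then asks for 8 values from one of them. Compilers typically flag this with a nonzero iostat (ifort reports 67 in the output quoted further down), and under the Fortran standard the file position is indeterminate after an I/O error, so any read that follows without repositioning relies on compiler-specific behaviour.

```fortran
program overread_demo
  ! Sketch only: all names and sizes here are hypothetical, not taken from WIEN2k.
  implicit none
  integer, parameter :: u = 9
  integer :: ios
  double precision :: short_rec(4), long_buf(8), exact_buf(4)

  short_rec = [1.0d0, 2.0d0, 3.0d0, 4.0d0]

  ! Write two unformatted sequential records of 4 values each.
  open(u, status='scratch', form='unformatted', access='sequential')
  write(u) short_rec
  write(u) short_rec
  rewind(u)

  ! Hazardous pattern: the input list needs 8 values, the record holds 4.
  ! The iostat value (and what the next read would see) is compiler dependent.
  long_buf = 0.0d0
  read(u, iostat=ios) long_buf
  print *, 'over-long read:  ios =', ios

  ! Safer pattern: reposition explicitly, then read no more than the record holds.
  rewind(u)
  read(u, iostat=ios) exact_buf
  print *, 'exact-size read: ios =', ios

  close(u)
end program overread_demo
```

Building this with two compiler versions and comparing the first ios value is a quick way to check whether a given ifort release treats the over-long read differently.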
--
Eamon
-----Original Message-----
From: Wien [mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of Peter Blaha
Sent: Thursday, October 12, 2017 08:02
To: wien at zeus.theochem.tuwien.ac.at
Subject: Re: [Wien] Bug with lapwso_mpi and ifort 18
Try to put either a rewind statement or a close/open statement before
that. And of course always use -assume nobufferedio.
Somehow Intel seems unable to fix the file-handling bugs that were introduced in 2016/17.
PS: I think the same file is already read previously in this code.
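
As a rough sketch of the two repositioning options suggested above (rewind, or close followed by open), assuming a small stand-alone test file rather than a real case.vector: the program name, the file name reposition_demo.tmp, and the record contents are all made up for the example, and the thread does not report whether either option cures the lapwso problem.

```fortran
program reposition_demo
  ! Sketch only: file name and record layout are hypothetical.
  implicit none
  integer, parameter :: u = 9
  character(len=*), parameter :: fname = 'reposition_demo.tmp'
  integer :: ios, header(2), payload(3)

  ! Create a file with two records and leave it positioned at its end,
  ! as it would be after an earlier pass over the file.
  open(u, file=fname, form='unformatted', access='sequential', status='replace')
  write(u) 10, 20
  write(u) 1, 2, 3

  ! Option A: rewind, skip the first record with an empty input list,
  ! then read the second record.
  rewind(u)
  read(u, iostat=ios)
  if (ios == 0) read(u, iostat=ios) payload
  print *, 'after rewind:     ios =', ios, ' payload =', payload

  ! Option B: close and reopen the unit, which also starts from the first record.
  close(u)
  open(u, file=fname, form='unformatted', access='sequential', &
       status='old', action='read')
  read(u, iostat=ios) header
  if (ios == 0) read(u, iostat=ios) payload
  print *, 'after close/open: ios =', ios, ' payload =', payload

  close(u, status='delete')
end program reposition_demo
```

Both options only guarantee a known starting position. They do not make a read that asks for more data than a record contains any better defined.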
On 11.10.2017 at 20:11, MCDERMOTT Eamon 250772 wrote:
> Neither of these suggestions help (I had tried error trapping as Prof. Marks had suggested already). More debugging just gets me more and more confused. I've inserted a few print statements, and now in a properly working binary built with ifort 15, I get something like (when running with lapwso_mpi lapwso_1.def):
>
> call to get_nloat( 3 192 1000 )
> i= 1 ii= 1 nloat= 0 mist= -106685467
> ios= 67 -> exit
> i= 2 ii= 4 nloat= 4 mist= -106685467
> ios= 67 -> exit
> i= 3 ii= 4 nloat= 4 mist= -106685467
> ios= 67 -> exit
> i= 4 ii= 4 nloat= 4 mist= -106685467
> ios= 67 -> exit
> i= 5 ii= 4 nloat= 4 mist= -106685467
> ios= 67 -> exit
> i= 6 ii= 4 nloat= 4 mist= -106685467
> ios= 67 -> exit
> i= 7 ii= 4 nloat= 4 mist= -106685467
> ios= 67 -> exit
> ...
>
> With the broken version, however, it always terminates after the first set of reads:
>
> call to get_nloat( 3 192 1000 )
> i= 1 ii= 1 nloat= 0 mist= -106685467
> ios= -1 -> exit
> forrtl: severe (24): end-of-file during read, unit 9, file /path/to/case/./case.vector_1
> Image PC Routine Line Source
> lapwso_mpi 0000000000490758 Unknown Unknown Unknown
> lapwso_mpi 00000000004B4BF5 Unknown Unknown Unknown
> lapwso_mpi 000000000048AAA2 get_nloat_ 16 get_nloat.f
> lapwso_mpi 000000000044A127 MAIN__ 144 lapwso.F
> lapwso_mpi 0000000000406B9E Unknown Unknown Unknown
> libc-2.12.so 0000003352C1ED5D __libc_start_main Unknown Unknown
> lapwso_mpi 0000000000406AA9 Unknown Unknown Unknown
>
> Iostat = -1 indicates EOF, whereas 67 is "Input statement requires too much data". So somehow the second read in get_nloat.f:
>
> read(9,iostat=ios) elo(0:lomax,1:ii)
>
> is just running off the end of a 659MB vector file with ifort 18. Maybe they've changed the concept of how records are delimited, but this vector file was written by an ifort 18-compiled lapw1! This is very surprising to me, since ifort 18 is otherwise working very well for me with respect to optimization: lapw1 works fine with -O3 -mavx, for example.
>
> I get the same error with an ifort 18 non-mpi lapwso:
>
> TiC x lapwso
> forrtl: severe (24): end-of-file during read, unit 9, file /path/to/TiC/./TiC.vector
> Image PC Routine Line Source
> lapwso 0000000000457328 Unknown Unknown Unknown
> lapwso 000000000047BCB5 Unknown Unknown Unknown
> lapwso 0000000000450926 get_nloat_ 16 get_nloat.f
> lapwso 000000000042779C MAIN__ 144 lapwso.F
> lapwso 000000000040595E Unknown Unknown Unknown
> libc-2.12.so 0000003352C1ED5D __libc_start_main Unknown Unknown
> lapwso 0000000000405869 Unknown Unknown Unknown
>
> So it's something special about the way the vector is being read in get_nloat.f...
>
> --
> Eamon
>
> -----Original Message-----
> From: Wien [mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf
> Of Laurence Marks
> Sent: Wednesday, October 11, 2017 17:12
> To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
> Subject: Re: [Wien] Bug with lapwso_mpi and ifort 18
>
> Also, maybe try
>
> read(9,err=999)mist
> 999 continue
>
> On Wed, Oct 11, 2017 at 9:59 AM, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>> Looks very strange.
>>
>> Does it happen only in MPI mode?
>>
>> Just one try:
>>
>> replace
>>
>> read(9)
>>
>> with
>>
>> read(9) mist
>>
>> In fact, all that statement should do is skip one line
>> (record) in this unformatted vector file (which contains the same
>> numbers (E-parameters for LAPW) as the first line in the energy files).
>>
>> You may also print*, i,mist
>>
>> to see if it happens already for the first atom.
>>
>> PS: Naively, I would have expected that it has to do with this
>> -assume bufferedio which is (sometimes) broken in ifort17. And
>> clearly, this file has been read before, then rewound, and then is read again.
>>
>> On 10/11/2017 04:38 PM, MCDERMOTT Eamon 250772 wrote:
>>> Dear all,
>>>
>>> I have noticed a bug with the combination of lapwso_mpi (WIEN2k
>>> 17.1) and ifort 18.0.0 (20170811).
>>>
>>> On a well-formed case that works properly when lapwso_mpi is
>>> compiled with ifort 15.0.6, I get crashes on each process shortly
>>> after startup like the following:
>>>
>>> forrtl: severe (24): end-of-file during read, unit 9, file /path/case/./case.vector_1
>>> Image PC Routine Line Source
>>> lapwso_mpi 00000000004916D8 Unknown Unknown Unknown
>>> lapwso_mpi 00000000004B6065 Unknown Unknown Unknown
>>> lapwso_mpi 000000000048ABB6 get_nloat_ 16 get_nloat.f
>>> lapwso_mpi 000000000044A361 MAIN__ 144 lapwso.F
>>> lapwso_mpi 0000000000406C5E Unknown Unknown Unknown
>>> libc-2.12.so 0000003FA401ED5D __libc_start_main Unknown Unknown
>>> lapwso_mpi 0000000000406B69 Unknown Unknown Unknown
>>>
>>> In get_nloat.f this line is simply “read(9)”, so I’m guessing there
>>> has been some change in raw file access in this ifort version.
>>>
>>> This crash does not seem to be dependent on optimizations, as I can
>>> reproduce it with
>>>
>>> FPOPT=-O0 -g -FR -traceback -I$(MKLROOT)/include
>>>
>>> Adding -assume bufferedio (or -assume nobufferedio) does not make a
>>> difference.
>>>
>>> Trapping the IO error on this line and exiting simply causes a later
>>> segfault, so there is something more complicated happening here than
>>> just reading off the end of the file at the end of the routine:
>>>
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> Image PC Routine Line Source
>>> lapwso_mpi 000000000047358D Unknown Unknown Unknown
>>> libpthread-2.12.s 0000003BE880F7E0 Unknown Unknown Unknown
>>> libiomp5.so 00002AAEF045B23D Unknown Unknown Unknown
>>> libiomp5.so 00002AAEF045B040 Unknown Unknown Unknown
>>> libiomp5.so 00002AAEF045AF6E Unknown Unknown Unknown
>>> libiomp5.so 00002AAEF045C039 Unknown Unknown Unknown
>>> libiomp5.so 00002AAEF045D7CB Unknown Unknown Unknown
>>> libiomp5.so 00002AAEF0454F6E Unknown Unknown Unknown
>>> libiomp5.so 00002AAEF0455B6C Unknown Unknown Unknown
>>> lapwso_mpi 000000000049F07B Unknown Unknown Unknown
>>> lapwso_mpi 000000000040C4CC rotmat_mp_init_ro 229 modules.F
>>> lapwso_mpi 000000000043027E MAIN__ 146 lapwso.F
>>> lapwso_mpi 0000000000406D5E Unknown Unknown Unknown
>>> libc-2.12.so 0000003BE7C1ED5D __libc_start_main Unknown Unknown
>>> lapwso_mpi 0000000000406C69 Unknown Unknown Unknown
>>>
>>> Any ideas? I may be forced to upgrade soon (some Intel cluster
>>> license SNAFU)…
>>>
>>> --
>>> Eamon McDermott
>>> CEA Grenoble
>>> DRT/LETI/DTSI/SCMC
>>>
>>> _______________________________________________
>>> Wien mailing list
>>> Wien at zeus.theochem.tuwien.ac.at
>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>> SEARCH the MAILING-LIST at:
>>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>>
>>
>> --
>>
>> P.Blaha
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
>> Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
>> WWW: http://www.imc.tuwien.ac.at/TC_Blaha
>> --------------------------------------------------------------------------
>
>
>
> --
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu ; Corrosion in 4D:
> MURI4D.numis.northwestern.edu Partner of the CFW 100% program for
> gender equity, www.cfw.org/100-percent Co-Editor, Acta Cryst A
>
--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/tc_blaha
--------------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_nloat.f
Type: application/octet-stream
Size: 1774 bytes
Desc: not available
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20171016/69fdcf49/attachment.obj>