[Wien] Bug with lapwso_mpi and ifort 18

MCDERMOTT Eamon 250772 Eamon.MCDERMOTT at cea.fr
Wed Oct 11 20:11:39 CEST 2017


Neither of these suggestions helps (I had already tried error trapping, as Prof. Marks suggested). Further debugging only leaves me more confused. I've inserted a few print statements, and with a correctly working binary built with ifort 15 I get output like the following (running lapwso_mpi lapwso_1.def):

call to get_nloat(           3         192        1000 )
 i=           1 ii=           1 nloat=           0 mist=  -106685467
 ios=          67 -> exit
 i=           2 ii=           4 nloat=           4 mist=  -106685467
 ios=          67 -> exit
 i=           3 ii=           4 nloat=           4 mist=  -106685467
 ios=          67 -> exit
 i=           4 ii=           4 nloat=           4 mist=  -106685467
 ios=          67 -> exit
 i=           5 ii=           4 nloat=           4 mist=  -106685467
 ios=          67 -> exit
 i=           6 ii=           4 nloat=           4 mist=  -106685467
 ios=          67 -> exit
 i=           7 ii=           4 nloat=           4 mist=  -106685467
 ios=          67 -> exit
...

With the broken version, by contrast, it always terminates after the first set of reads:

call to get_nloat(           3         192        1000 )
 i=           1 ii=           1 nloat=           0 mist=  -106685467
 ios=          -1 -> exit
forrtl: severe (24): end-of-file during read, unit 9, file /path/to/case/./case.vector_1
Image              PC                Routine            Line        Source
lapwso_mpi         0000000000490758  Unknown               Unknown  Unknown
lapwso_mpi         00000000004B4BF5  Unknown               Unknown  Unknown
lapwso_mpi         000000000048AAA2  get_nloat_                 16  get_nloat.f
lapwso_mpi         000000000044A127  MAIN__                    144  lapwso.F
lapwso_mpi         0000000000406B9E  Unknown               Unknown  Unknown
libc-2.12.so       0000003352C1ED5D  __libc_start_main     Unknown  Unknown
lapwso_mpi         0000000000406AA9  Unknown               Unknown  Unknown

An iostat of -1 indicates end of file, whereas 67 means "input statement requires too much data". So somehow the second read in get_nloat.f:

read(9,iostat=ios) elo(0:lomax,1:ii) 

is simply running off the end of a 659 MB vector file with ifort 18. Perhaps Intel has changed how records are delimited, but this vector file was written by an ifort 18-compiled lapw1! This surprises me, since ifort 18 otherwise works very well for me, including with optimization: lapw1 runs fine with -O3 -mavx, for example.
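One way to test the "record delimiters changed" idea independently of either compiler is to walk the record markers of the vector file directly. The sketch below is a hypothetical helper (not part of WIEN2k); it assumes the usual x86-64 layout for sequential unformatted files, a little-endian 4-byte record length written before and after each record, and it simply refuses negative markers, which as far as I know signal split subrecords of very large (>2 GB) records.

```python
import io
import struct
from typing import BinaryIO, List

MARKER = struct.Struct("<i")  # assumed: little-endian signed 4-byte record marker


def write_record(f: BinaryIO, payload: bytes) -> None:
    """Frame one record the way a sequential unformatted WRITE is assumed to."""
    f.write(MARKER.pack(len(payload)))
    f.write(payload)
    f.write(MARKER.pack(len(payload)))


def record_lengths(f: BinaryIO) -> List[int]:
    """Return the payload length of every record, checking both markers agree."""
    lengths: List[int] = []
    while True:
        head = f.read(MARKER.size)
        if not head:
            return lengths  # clean end of file after the last complete record
        rec = len(lengths) + 1
        if len(head) < MARKER.size:
            raise ValueError(f"record {rec}: truncated leading marker")
        (n,) = MARKER.unpack(head)
        if n < 0:
            raise ValueError(f"record {rec}: subrecords are not handled here")
        payload = f.read(n)
        tail = f.read(MARKER.size)
        if len(payload) < n or len(tail) < MARKER.size:
            raise ValueError(f"record {rec}: runs past end of file")
        if MARKER.unpack(tail)[0] != n:
            raise ValueError(f"record {rec}: leading/trailing markers differ")
        lengths.append(n)


if __name__ == "__main__":
    # In-memory demo; for a real case use open("case.vector_1", "rb") instead.
    buf = io.BytesIO()
    write_record(buf, struct.pack("<4d", 0.3, 0.3, 0.3, 0.3))  # arbitrary 32-byte record
    write_record(buf, b"")  # an empty record is legal
    buf.seek(0)
    print(record_lengths(buf))
```

If the whole file scans cleanly under this assumed layout, the framing written by lapw1 is probably intact, which would point at the reading side in lapwso rather than at the file itself.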

I get the same error with an ifort 18 non-MPI lapwso:

TiC x lapwso
forrtl: severe (24): end-of-file during read, unit 9, file /path/to/TiC/./TiC.vector
Image              PC                Routine            Line        Source
lapwso             0000000000457328  Unknown               Unknown  Unknown
lapwso             000000000047BCB5  Unknown               Unknown  Unknown
lapwso             0000000000450926  get_nloat_                 16  get_nloat.f
lapwso             000000000042779C  MAIN__                    144  lapwso.F
lapwso             000000000040595E  Unknown               Unknown  Unknown
libc-2.12.so       0000003352C1ED5D  __libc_start_main     Unknown  Unknown
lapwso             0000000000405869  Unknown               Unknown  Unknown

So it's something specific to the way the vector file is read in get_nloat.f...
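The two iostat values can be compared against record sizes from such a scan. The sketch below is a rough model, not ifort's actual logic: a read of elo(0:lomax,1:ii) requests (lomax+1)*ii elements, and if elo is REAL*8 (assumed here) a record shorter than that gives 67, while no record at all gives -1. Read this way, the ifort 15 output means a record was found but was shorter than the request, whereas -1 under ifort 18 suggests the file position was already at the end of the file when the read started.

```python
from typing import Optional


def predicted_iostat(record_len: Optional[int], lomax: int, ii: int, kind: int = 8) -> int:
    """Rough model of the iostat of `read(9,iostat=ios) elo(0:lomax,1:ii)`.

    record_len is the payload size in bytes of the next record, or None when no
    record is left. kind is the assumed byte size of one elo element (REAL*8).
    """
    if record_len is None:
        return -1  # end of file: nothing left to read (the ifort 18 symptom above)
    requested = (lomax + 1) * ii * kind  # elements in elo(0:lomax,1:ii) times bytes each
    if requested > record_len:
        return 67  # "input statement requires too much data" (the ifort 15 output above)
    return 0  # an unformatted read may consume fewer bytes than the record holds


if __name__ == "__main__":
    # Hypothetical numbers: lomax=3, ii=4 asks for 4*4*8 = 128 bytes.
    for rec in (None, 96, 128):
        print(rec, predicted_iostat(rec, lomax=3, ii=4))
```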

--
Eamon

-----Original Message-----
From: Wien [mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of Laurence Marks
Sent: Wednesday, October 11, 2017 17:12
To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
Subject: Re: [Wien] Bug with lapwso_mpi and ifort 18

Also, maybe try

read(9,err=999)mist
999  continue

On Wed, Oct 11, 2017 at 9:59 AM, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
> Looks very strange.
>
> Does it happen only in MPI mode?
>
> One thing to try:
>
> replace
>
> read(9)
>
> by
>
> read(9) mist
>
> In fact, all that statement should do is skip one line (record) in
> this unformatted vector file (which contains the same numbers
> (E-parameters for LAPW) as the first line in the energy files).
>
> You may also print*, i,mist
>
> to see if it happens already for the first atom.
>
> PS: Naively, I would have expected it to be related to -assume
> bufferedio, which is (sometimes) broken in ifort 17. And clearly, this
> file has been read before, then rewound, and is then read again.
>
> On 10/11/2017 04:38 PM, MCDERMOTT Eamon 250772 wrote:
>> Dear all,
>>
>> I have noticed a bug with the combination of lapwso_mpi (WIEN2k 17.1) 
>> and ifort 18.0.0 (20170811).
>>
>> On a well-formed case that works properly when lapwso_mpi is compiled
>> with ifort 15.0.6, I get crashes like the following on each process
>> shortly after startup:
>>
>> forrtl: severe (24): end-of-file during read, unit 9, file /path/case/./case.vector_1
>>
>> Image              PC                Routine            Line        Source
>> lapwso_mpi         00000000004916D8  Unknown               Unknown  Unknown
>> lapwso_mpi         00000000004B6065  Unknown               Unknown  Unknown
>> lapwso_mpi         000000000048ABB6  get_nloat_                 16  get_nloat.f
>> lapwso_mpi         000000000044A361  MAIN__                    144  lapwso.F
>> lapwso_mpi         0000000000406C5E  Unknown               Unknown  Unknown
>> libc-2.12.so       0000003FA401ED5D  __libc_start_main     Unknown  Unknown
>> lapwso_mpi         0000000000406B69  Unknown               Unknown  Unknown
>>
>> In get_nloat.f this line is simply “read(9)”, so I’m guessing there
>> has been some change in raw file access in this ifort version.
>>
>> This crash does not seem to be dependent on optimizations, as I can 
>> reproduce it with
>>
>> FPOPT=-O0 -g -FR -traceback -I$(MKLROOT)/include
>>
>> Adding -assume bufferedio (or -assume nobufferedio) does not make a
>> difference.
>>
>> Trapping the IO error on this line and exiting simply causes a later 
>> segfault, so there is something more complicated happening here than 
>> just reading off the end of the file at the end of the routine:
>>
>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>
>> Image              PC                Routine            Line        Source
>> lapwso_mpi         000000000047358D  Unknown               Unknown  Unknown
>> libpthread-2.12.s  0000003BE880F7E0  Unknown               Unknown  Unknown
>> libiomp5.so        00002AAEF045B23D  Unknown               Unknown  Unknown
>> libiomp5.so        00002AAEF045B040  Unknown               Unknown  Unknown
>> libiomp5.so        00002AAEF045AF6E  Unknown               Unknown  Unknown
>> libiomp5.so        00002AAEF045C039  Unknown               Unknown  Unknown
>> libiomp5.so        00002AAEF045D7CB  Unknown               Unknown  Unknown
>> libiomp5.so        00002AAEF0454F6E  Unknown               Unknown  Unknown
>> libiomp5.so        00002AAEF0455B6C  Unknown               Unknown  Unknown
>> lapwso_mpi         000000000049F07B  Unknown               Unknown  Unknown
>> lapwso_mpi         000000000040C4CC  rotmat_mp_init_ro         229  modules.F
>> lapwso_mpi         000000000043027E  MAIN__                    146  lapwso.F
>> lapwso_mpi         0000000000406D5E  Unknown               Unknown  Unknown
>> libc-2.12.so       0000003BE7C1ED5D  __libc_start_main     Unknown  Unknown
>> lapwso_mpi         0000000000406C69  Unknown               Unknown  Unknown
>>
>> Any ideas? I may be forced to upgrade soon (some Intel cluster 
>> license SNAFU)…
>>
>> --
>> Eamon McDermott
>> CEA Grenoble
>> DRT/LETI/DTSI/SCMC
>
> --
>
>                                        P.Blaha
> --------------------------------------------------------------------------
> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
> Email: blaha at theochem.tuwien.ac.at    WIEN2k: https://urldefense.proofpoint.com/v2/url?u=http-3A__www.wien2k.at&d=DwIF-g&c=yHlS04HhBraes5BQ9ueu5zKhE7rtNXt_d012z2PA6ws&r=U_T4PL6jwANfAy4rnxTj8IUxm818jnvqKFdqWLwmqg0&m=EDg4I6trPWK2mIBefpClkOk5bKR-5w9NGNYvXcx69ao&s=ric8wkLdJdc0FN9zt4YPHp6i1WfhBUF1QOIXAIeeMQ0&e=
> WWW:   https://urldefense.proofpoint.com/v2/url?u=http-3A__www.imc.tuwien.ac.at_TC-5FBlaha&d=DwIF-g&c=yHlS04HhBraes5BQ9ueu5zKhE7rtNXt_d012z2PA6ws&r=U_T4PL6jwANfAy4rnxTj8IUxm818jnvqKFdqWLwmqg0&m=EDg4I6trPWK2mIBefpClkOk5bKR-5w9NGNYvXcx69ao&s=MycRai8BRbleMX1fjKUKQmPeptkO9Qb5u8p4bNmu5fw&e=
> --------------------------------------------------------------------------



--
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what nobody else has thought", Albert Szent-Gyorgi
www.numis.northwestern.edu ; Corrosion in 4D: MURI4D.numis.northwestern.edu
Partner of the CFW 100% program for gender equity, www.cfw.org/100-percent
Co-Editor, Acta Cryst A
_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html