[Wien] $DEC! NOOPTIMIZE equivalents in gfortran?

Pavel Ondračka pavel.ondracka at email.cz
Mon Aug 13 17:01:44 CEST 2018


Dear Prof. Marks,

sorry for getting to this so late. Unfortunately, I'm still quite
unsure how to use your test case. It sort of works, although it takes a
lot of iterations to converge. So instead I tried to look at the
generated code.

I looked at one specific place, a randomly chosen Kahan loop at
charge.f:142-148. ifort (IFORT) 17.0.1 20161005 with the default Wien2k
flags produces code like the following. I've added some comments, so
hopefully this is understandable:
-------snip------
        .loc    1  144  is_stmt 1
        movsd     -8(%rdx,%rcx,8), %xmm2    # copy the F(J) value to the %xmm2 register
..LN397:
        .loc    1  145  is_stmt 1
        movaps    %xmm1, %xmm3              # copy the sum value to the %xmm3 register
..LN398:
        .loc    1  148  is_stmt 1
        incq      %rcx                      # this is likely the loop counter, scheduled here for better latency
..LN399:
        .loc    1  144  is_stmt 1
        subsd     %xmm4, %xmm2              # %xmm2 now contains F(J) - C, i.e. Y
..LN400:
        .loc    1  145  is_stmt 1
        addsd     %xmm2, %xmm3              # %xmm3 now contains sum + Y, i.e. T, so far so good
..LN401:
        .loc    1  146  is_stmt 1
        movaps    %xmm3, %xmm4              # copy %xmm3 (T) to %xmm4 (the original C)
..LN402:
        subsd     %xmm2, %xmm4              # %xmm4 now contains T - Y!
..LN403:
        subsd     %xmm1, %xmm4              # only now is the sum subtracted from %xmm4!
------snip---------
The last two instructions are IMO (but I'm no assembly expert)
problematic. The parentheses are not honored, and line charge.f:146
seems to be evaluated as C=(T-Y)-sum rather than C=(T-sum)-Y.
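
To illustrate why the order matters, here is a minimal Python sketch of
the two evaluation orders. Python floats are IEEE doubles and the
interpreter does not reorder the arithmetic, so each expression is
evaluated exactly as written. This is not the charge.f source: the
function names and the test data are made up for illustration, and the
loop body is only a reading of the assembly comments above.

```python
def naive_sum(values):
    """Plain left-to-right summation, for comparison."""
    total = 0.0
    for f in values:
        total += f
    return total


def kahan_sum(values, reorder=False):
    """Compensated (Kahan) summation.

    reorder=False: C = (T - SUM) - Y, the parentheses are honored.
    reorder=True:  C = (T - Y) - SUM, the order seen in the ifort assembly.
    """
    total = 0.0  # SUM
    c = 0.0      # running compensation C
    for f in values:
        y = f - c        # Y = F(J) - C
        t = total + y    # T = SUM + Y
        if reorder:
            c = (t - y) - total
        else:
            c = (t - total) - y
        total = t        # SUM = T
    return total


# Hypothetical data: one large term followed by many tiny ones, each
# too small to change 1.0 when added on its own.
data = [1.0] + [1e-17] * 1000

plain = naive_sum(data)
honored = kahan_sum(data)
reordered = kahan_sum(data, reorder=True)

print(plain, honored, reordered)
```

With this data the naive and the reordered sums both come out as
exactly 1.0, because T - Y rounds back to SUM and the compensation
stays at zero. The honored order recovers roughly the 1e-14 contributed
by the small terms.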

This is how the assembly looks with gfortran and the default flags:
-----snip-----
        .loc 1 144 0
        movsd   24(%rax), %xmm2 # copy F(J) to %xmm2
        .loc 1 145 0
        movapd  %xmm3, %xmm4    # copy sum from %xmm3 to %xmm4
        addq    $8, %rax        # this is likely the loop counter, scheduled here for better latency
        .loc 1 144 0
        subsd   %xmm1, %xmm2    # F(J) - C, i.e. Y, stored in %xmm2
.LVL8:
        .loc 1 145 0
        addsd   %xmm2, %xmm4    # SUM + Y, i.e. T, stored in %xmm4
.LVL9:
        .loc 1 146 0
        movapd  %xmm4, %xmm1    # move T to %xmm1
.LVL10:
        subsd   %xmm3, %xmm1    # %xmm1 is now T - SUM
        .loc 1 145 0
        movapd  %xmm4, %xmm3    # SUM = T
        .loc 1 146 0
        subsd   %xmm2, %xmm1    # %xmm1 (which ATM contains T - SUM) - Y
-----snip-----
In other words, gfortran does it properly and is unaffected this
time. ;-) I can force behavior similar to ifort's if I add the
-ffast-math switch, which forces unsafe math optimizations (but
otherwise even the -O3 -march=native level is unaffected).

I'm unsure whether this is an ifort bug or whether the default Wien2k
flags are just too aggressive. Unfortunately I'm no Fortran
specification expert. On the other hand, I was expecting the
instructions to be optimized away. They are not, but the order is
wrong, so this might be just a bug?

BTW, I've already seen some signs that ifort is quite aggressive when
I experimented with libmvec to speed up Hamilt in lapw1. With gfortran
I needed the -ffast-math switch to achieve performance levels similar
to ifort (and gfortran with -ffast-math, although less precise,
actually produces results that are more similar to ifort with the
default Wien2k flags).
Hope this helps a little; if you need more help, let me know.

Best regards
Pavel

Laurence Marks wrote on Thu 09. 08. 2018 at 08:11 -0500:
> I changed the version to include a pre-converged MSR1
> A fresh pair of eyes would be useful. The minimum is somewhere in the
> range for atom 7 z 0.2610-0.2616, but it is very ill-conditioned and
> noisy. Why is unclear.
> 
> On Thu, Aug 9, 2018 at 7:11 AM, Laurence Marks <
> L-marks at northwestern.edu> wrote:
> > Test case is not a simple question, as the Kahan summations in
> > charge.f are used in lapw0/1/2 mixer and a few other places. I have
> > noticed minor changes in energies (1E-5 Ryd) with preventing ifort
> > from optimizing charge.f. However, this is very tricky as ifort
> > does 80 bit operations and only later truncates to 64, so it could
> > be that without optimization the results are less accurate. In an
> > ideal world the compiler will use 80 bit carries and not optimize
> > out the summation.
> > If useful, a nasty ill-conditioned case is at
> > https://drive.google.com/file/d/18xlI3-qf4RKOS8mWdR38bfo4a9Yq_t4u/view?usp=sharing
> > 
> > You will need to do lstart/dstart then a few (15-25) MSR1 before
> > switching to MSR1a. The convergence is somewhat random, some of
> > this may be in the mixer and some elsewhere. By ill-conditioned I
> > mean that if you first do 24 MSR1 versus 25, the number of
> > iterations to convergence, or even whether it converges will be
> > different. Sometimes two runs from the same starting points
> > converge (or not) in a different number of iterations.
> > 
> > Where the ill-conditioning comes from is unclear to me.
> > 
> > 
> > On Thu, Aug 9, 2018 at 3:21 AM, Pavel Ondračka <
> > pavel.ondracka at email.cz> wrote:
> > > I can look at the gfortran, what is your testcase?
> > > 
> > > I tried to take a quick look with the full mixer using one random
> > > TiO2 case. I put a breakpoint after some random Kahan sum
> > > (specifically this was at charge.f:150 in Wien2k 18.2) and I
> > > looked for the differences between O0 and O2. I was actually
> > > looking for small differences, but the value of sum was 0 with
> > > -O2 vs 739.29 with -O0!
> > > 
> > > Hence in this case it looks like either the different
> > > optimization levels influence the program flow, or the
> > > optimizations caused the shift of the breakpoint to some other
> > > place.
> > > 
> > > It might also be possible that this is a gdb problem since there
> > > is a lot of
> > > 
> > > ** On entry to DHSEQR parameter number  4 had an illegal value
> > > ** On entry to DGEBAL parameter number  3 had an illegal value
> > > ** On entry to DGEHRD  parameter number  2 had an illegal value
> > > 
> > > spam which I have no idea about and <error reading variable ....
> > > (access outside bounds of object referenced via synthetic
> > > pointer)>
> > > 
> > > BTW valgrind is also not happy with the mixer (even at -O0 there
> > > are lot of "Use of uninitialised value ... and On entry to DHSEQR
> > > parameter number  4 had an illegal value )
> > > 
> > > If you can produce a simple testcase, I'd be happy to look into
> > > the Kahan sum problem, but at the moment I can't reproduce with
> > > the full mixer due to the aforementioned problems.
> > > 
> > > Best regards
> > > Pavel
> > > 
> > > Laurence Marks wrote on Wed 08. 08. 2018 at 11:44 -0500:
> > > > I am testing adding the compiler directive !DEC$ NOOPTIMIZE to
> > > > the Kahan summations in charge.f in order to prevent ifort from
> > > > optimizing the summation away. It seems to help.
> > > > 
> > > > Does anyone know if there are equivalents in gfortran or other
> > > > compilers? (I can't find anything for gfortran.)
> > > > 
> > > > N.B., if anyone has experience with directives and wants to
> > > > suggest others that may be faster but will avoid optimizing away
> > > > the summation I am open to suggestions.
> > > > 
> > > > _______________________________________________
> > > > Wien mailing list
> > > > Wien at zeus.theochem.tuwien.ac.at
> > > > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > > > SEARCH the MAILING-LIST at:
> > > > http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> > 
> > 
> > 
> > -- 
> > Professor Laurence Marks
> > "Research is to see what everybody else has seen, and to think what
> > nobody else has thought", Albert Szent-Gyorgi
> > www.numis.northwestern.edu ; Corrosion in 4D:
> > MURI4D.numis.northwestern.edu
> > Partner of the CFW 100% program for gender equity,
> > www.cfw.org/100-percent
> > Co-Editor, Acta Cryst A
> > 