[Wien] compile wien2k_11.1 in mkl8.1, fortcom error
Peter Blaha
pblaha at theochem.tuwien.ac.at
Fri Apr 22 16:16:58 CEST 2011
I include a couple of routines, where I noticed problem with very old
compilers. I concerns the use of functions like dsqrt or acos in
"parameter" statements as well as some subroutines which are not
included in old mkl/vml versions.
You can replace the original ones by them.
PS: The list of subroutines may not be complete.
> Dear all,
> Recently, I have compiled the latest version, wien2k_11.1(downloaded on April 21, 2011), and got the 'fortcom errors' below.
> The errors mainly come from lapw2, so I also present the content in SRC_lapw2/compile.msg file below.
>
> Our system is a two nodes cluster, every machine has two EM64T Intel(R) Xeon(TM) Processor 3.6GHz/1M CPUs, and the system is the Redhat enterprise 9.0 linux distribution (somewhat too old, maybe). I used intel mkl 8.1 and intel ifort 9.1.032 compiler. All earlier versions of wien2k (from wien2k_05 to wien2k_10) have been successfully compiled and worked well in the last six years.
> The compiler options used for the newest version are
>
> O Compiler options: -I/opt/intel/mkl/8.1/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
> L Linker Flags: $(FOPT) -L/opt/intel/mkl/8.1/lib/em64t -lpthread
> P Preprocessor flags '-DParallel'
> R R_LIB (LAPACK+BLAS): -lmkl_lapack64 -lmkl_em64t -lguide -lvml -pthread
>
> By the way, no other errors occured except the list below. Are they bugs or my improper chosen compiler options? Any comment or response is appreciated!
>
>
>
> Compile time errors (if any) were:
> SRC_lapw1/compile.msg:make[1]: *** [lapw1c] Error 1
> SRC_lapw1/compile.msg:make: *** [complex] Error 2
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 680: This intrinsic function is invalid in constant expressions. [DSQRT]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 779: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 779: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 780: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 780: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 999: This intrinsic function is invalid in constant expressions. [DSQRT]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1097: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1097: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1098: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1098: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:make[1]: *** [c3fft.o] Error 1
> SRC_lapw2/compile.msg:make: *** [real] Error 2
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 680: This intrinsic function is invalid in constant expressions. [DSQRT]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 779: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 779: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 780: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 780: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 999: This intrinsic function is invalid in constant expressions. [DSQRT]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1097: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1097: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1098: This intrinsic function is invalid in constant expressions. [COS]
> SRC_lapw2/compile.msg:fortcom: Error: c3fft_tmp_.F, line 1098: This intrinsic function is invalid in constant expressions. [SIN]
> SRC_lapw2/compile.msg:make[1]: *** [c3fft.o] Error 1
> SRC_lapw2/compile.msg:make: *** [complex] Error 2
> SRC_mixer/compile.msg:make: *** [mixer] Error 1
>
>
> The content in the SRC_lapw2/compile.msg :
>
> fortcom: Error: c3fft_tmp_.F, line 680: This intrinsic function is invalid in constant expressions. [DSQRT]
> parameter (TAUR=-0.5D0, TAUI=dsqrt(0.75D0))
> -----------------------------------^
> fortcom: Error: c3fft_tmp_.F, line 779: This intrinsic function is invalid in constant expressions. [COS]
> parameter (TR11= cos(acos(-1.D0)*0.4D0), TI11=sin(acos(-1.D0)*0.4D0), &
> -----------------------^
> fortcom: Error: c3fft_tmp_.F, line 779: This intrinsic function is invalid in constant expressions. [SIN]
> parameter (TR11= cos(acos(-1.D0)*0.4D0), TI11=sin(acos(-1.D0)*0.4D0), &
> ----------------------------------------------------^
> fortcom: Error: c3fft_tmp_.F, line 780: This intrinsic function is invalid in constant expressions. [COS]
> TR12=-cos(acos(-1.D0)*0.2D0), TI12=sin(acos(-1.D0)*0.2D0) )
> -----------------------^
> fortcom: Error: c3fft_tmp_.F, line 780: This intrinsic function is invalid in constant expressions. [SIN]
> TR12=-cos(acos(-1.D0)*0.2D0), TI12=sin(acos(-1.D0)*0.2D0) )
> ----------------------------------------------------^
> fortcom: Error: c3fft_tmp_.F, line 999: This intrinsic function is invalid in constant expressions. [DSQRT]
> parameter (TAUR=-0.5D0, TAUI=-dsqrt(0.75D0))
> ------------------------------------^
> fortcom: Error: c3fft_tmp_.F, line 1097: This intrinsic function is invalid in constant expressions. [COS]
> parameter (TR11= cos(acos(-1.D0)*0.4D0), TI11=-sin(acos(-1.D0)*0.4D0), &
> -----------------------^
> fortcom: Error: c3fft_tmp_.F, line 1097: This intrinsic function is invalid in constant expressions. [SIN]
> parameter (TR11= cos(acos(-1.D0)*0.4D0), TI11=-sin(acos(-1.D0)*0.4D0), &
> -----------------------------------------------------^
> fortcom: Error: c3fft_tmp_.F, line 1098: This intrinsic function is invalid in constant expressions. [COS]
> TR12=-cos(acos(-1.D0)*0.2D0), TI12=-sin(acos(-1.D0)*0.2D0) )
> -----------------------^
> fortcom: Error: c3fft_tmp_.F, line 1098: This intrinsic function is invalid in constant expressions. [SIN]
> TR12=-cos(acos(-1.D0)*0.2D0), TI12=-sin(acos(-1.D0)*0.2D0) )
> -----------------------------------------------------^
> compilation aborted for c3fft_tmp_.F (code 1)
> make[1]: *** [c3fft.o] Error 1
>
>
> PhD. candidate, Zhen Chen
> ------------------------------
> Beijing Laboratory of Electron Microscopy
> Institute of Physics
> Chinese Academy of Sciences
> P. O. Box 603
> Beijing 100190, China
> Tel: 86-10-82648001
> zchen at blem.ac.cn
>
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671 FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------
-------------- next part --------------
SUBROUTINE C3FFT(N1,N2,N3,C,LDC1,LDC2,ISIG,CWORK,DWORK,IERR)
!
! ..................................................................
! 1. PROGRAM UNIT 'C3FFT'
! 3-dimensional Fast Fourier Transform of a COMPLEX*16 array
! FORTRAN 77 SUBROUTINE
!
! 2. PURPOSE
! This routine performs either a forward or a backward FFT of
! three-dimensional double complex (COMPLEX*16) data.
! The primefactors of the number of datapoints along each
! dimension of the data array should be small to yield better
! performance.
!
! 3. USAGE
! PARAMETER-DESCRIPTION
! C - COMPLEX*16 array (input/output)
! Input: The (three-dimensional) array containing the
! data to be transformed.
! Output: the transformed (three-dimensional) data
! LDC1 - INTEGER variable (input)
! The exact dimension of the first index of C as
! defined in the calling routine.
! LDC2 - INTEGER variable (input)
! The exact dimension of the second index of C as
! defined in the calling routine.
! N1 - INTEGER variable (input)
! The number of datapoints in the first dimension of
! array C.
! Constraint: N1 .LE. LDC1 (not checked here)
! N2 - INTEGER variable (input)
! The number of datapoints in the second dimension of
! array C.
! Constraint: N2 .LE. LDC2 (not checked here)
! N3 - INTEGER variable (input)
! The number of datapoints in the third dimension of
! array C.
! IERR - INTEGER variable (output)
! Error indicator.
! IERR .EQ. 0 ... no error
! IERR .NE. 0 ... currently not supported
! ISIG - INTEGER variable (input)
! Specifies the direction of the transform:
! ISIG .LT. 0 ... forward FFT
! ISIG .GT. 0 ... backward FFT
! ISIG = 0 ...... nothing will be done
! CWORK - COMPLEX*16 array (input/output)
! A work array which must be dimensioned at least
! max(N1,N2,N3) in the calling routine.
! DWORK - DOUBLE PRECISION array (input/output)
! A work array which must be dimensioned at least
! 4*(max(N1,N2,N3))+15 in the calling routine.
!
! USED SUBROUTINES (DIRECTLY CALLED)
! CFFTB - backward Fast Fourier Transform (from FFTPACK)
! CFFTF - forward Fast Fourier Transform (from FFTPACK)
! CFFTI - initialization for CFFTB and CFFTF (from FFTPACK)
!
! INDIRECTLY CALLED SUBROUTINES
! PASSB - (from FFTPACK)
! PASSB2 - (from FFTPACK)
! PASSB3 - (from FFTPACK)
! PASSB4 - (from FFTPACK)
! PASSB5 - (from FFTPACK)
! PASSF - (from FFTPACK)
! PASSF2 - (from FFTPACK)
! PASSF3 - (from FFTPACK)
! PASSF4 - (from FFTPACK)
! PASSF5 - (from FFTPACK)
! PIMACH - (from FFTPACK)
!
! UTILITY-SUBROUTINES (USE BEFOREHAND OR AFTERWARDS)
! none
!
! INPUT/OUTPUT (READ/WRITE)
! none
!
! MACHINENDEPENDENT PROGRAMPARTS
! The array containing the three-dimensional data to be
! transformed is declared as COMPLEX*16.
!
! 4. REMARKS
! For repeated usage of this routine no separate set-up
! routine is provided, because the time needed for this
! set-up is neglectable compared to the actual transformation.
!
! 5. METHOD
! The three-dimensional FFT is done using one-dimensional
! complex FFT routines (from FFTPACK). This is done by:
! 1. transforming data values along the 1. dimension of C
! 2. transforming data values along the 2. dimension of C
! 3. transforming data values along the 3. dimension of C
! Because the data elements along the 2. and 3. dimension of
! C are not stored in contiguous storage locations and
! FFTPACK-routines don't provide means for specifying a
! stride, the involved elements are copied into a temporary
! one-dimensional field before transformation. Afterwards
! the data values are stored back to their original
! location in C.
!
! 6. DATE
! 28. June 1993 Version 0.91
!
! INSTITUT FUER TECHNISCHE ELEKTROCHEMIE -- TU VIENNA
! ..................................................................
!
INTEGER LDC1, LDC2, N1, N2, N3, ISIG, IERR
DOUBLE PRECISION DWORK(*)
COMPLEX*16 CWORK(*)
COMPLEX*16 C(LDC1,LDC2,N3)
!
INTEGER I, J, K
#ifdef FFTW2
call exec_fftw(isig,n1,n2,n3,C,ierr)
return
#else
!
IERR = 0
IF (ISIG .LT. 0) THEN
!
! forward transform in 1. dimension
!
CALL CFFTI(N1,DWORK)
DO 20 K = 1, N3
DO 10 J = 1, N2
CALL CFFTF(N1,C(1,J,K),DWORK)
10 CONTINUE
20 CONTINUE
!
! forward transform in 2. dimension
!
CALL CFFTI(N2,DWORK)
DO 60 K = 1, N3
DO 50 I = 1, N1
! DO 30 J = 1, N2
! CWORK(J) = C(I,J,K)
! 30 CONTINUE
call zcopy(N2,C(I,1,K),LDC1,CWORK,1)
CALL CFFTF(N2,CWORK,DWORK)
call zcopy(N2,CWORK,1,C(I,1,K),LDC1)
! DO 40 J = 1, N2
! C(I,J,K) = CWORK(J)
! 40 CONTINUE
50 CONTINUE
60 CONTINUE
!
! forward transform in 3. dimension
!
CALL CFFTI(N3,DWORK)
DO 100 J = 1, N2
DO 90 I = 1, N1
! DO 70 K = 1, N3
! CWORK(K) = C(I,J,K)
! 70 CONTINUE
call zcopy(N3,C(I,J,1),LDC1*LDC2,CWORK,1)
CALL CFFTF(N3,CWORK,DWORK)
call zcopy(N3,CWORK,1,C(I,J,1),LDC1*LDC2)
! DO 80 K = 1, N3
! C(I,J,K) = CWORK(K)
! 80 CONTINUE
90 CONTINUE
100 CONTINUE
ELSEIF (ISIG .GT. 0) THEN
!
! backward transform in 1. dimension
!
CALL CFFTI(N1,DWORK)
DO 120 K = 1, N3
DO 110 J = 1, N2
CALL CFFTB(N1,C(1,J,K),DWORK)
110 CONTINUE
120 CONTINUE
!
! backward transform in 2. dimension
!
CALL CFFTI(N2,DWORK)
DO 160 K = 1, N3
DO 150 I = 1, N1
! DO 130 J = 1, N2
! CWORK(J) = C(I,J,K)
! 130 CONTINUE
call zcopy(N2,C(I,1,K),LDC1,CWORK,1)
CALL CFFTB(N2,CWORK,DWORK)
call zcopy(N2,CWORK,1,C(I,1,K),LDC1)
! DO 140 J = 1, N2
! C(I,J,K) = CWORK(J)
! 140 CONTINUE
150 CONTINUE
160 CONTINUE
!
! backward transform in 3. dimension
!
CALL CFFTI(N3,DWORK)
DO 200 J = 1, N2
DO 190 I = 1, N1
! DO 170 K = 1, N3
! CWORK(K) = C(I,J,K)
! 170 CONTINUE
call zcopy(N3,C(I,J,1),LDC1*LDC2,CWORK,1)
CALL CFFTB(N3,CWORK,DWORK)
call zcopy(N3,CWORK,1,C(I,J,1),LDC1*LDC2)
! DO 180 K = 1, N3
! C(I,J,K) = CWORK(K)
! 180 CONTINUE
190 CONTINUE
200 CONTINUE
ENDIF
!
! End of 'C3FFT'
!
#endif
END
! SUBROUTINE CFFTB(N,C,WSAVE)
!
! SUBROUTINE CFFTB COMPUTES THE BACKWARD COMPLEX DISCRETE FOURIER
! TRANSFORM (THE FOURIER SYNTHESIS). EQUIVALENTLY , CFFTB COMPUTES
! A COMPLEX PERIODIC SEQUENCE FROM ITS FOURIER COEFFICIENTS.
! THE TRANSFORM IS DEFINED BELOW AT OUTPUT PARAMETER C.
!
! A CALL OF CFFTF FOLLOWED BY A CALL OF CFFTB WILL MULTIPLY THE
! SEQUENCE BY N.
!
! THE ARRAY WSAVE WHICH IS USED BY SUBROUTINE CFFTB MUST BE
! INITIALIZED BY CALLING SUBROUTINE CFFTI(N,WSAVE).
!
! INPUT PARAMETERS
!
!
! N THE LENGTH OF THE COMPLEX SEQUENCE C. THE METHOD IS
! MORE EFFICIENT WHEN N IS THE PRODUCT OF SMALL PRIMES.
!
! C A COMPLEX ARRAY OF LENGTH N WHICH CONTAINS THE SEQUENCE
!
! WSAVE A REAL WORK ARRAY WHICH MUST BE DIMENSIONED AT LEAST 4N+15
! IN THE PROGRAM THAT CALLS CFFTB. THE WSAVE ARRAY MUST BE
! INITIALIZED BY CALLING SUBROUTINE CFFTI(N,WSAVE) AND A
! DIFFERENT WSAVE ARRAY MUST BE USED FOR EACH DIFFERENT
! VALUE OF N. THIS INITIALIZATION DOES NOT HAVE TO BE
! REPEATED SO LONG AS N REMAINS UNCHANGED THUS SUBSEQUENT
! TRANSFORMS CAN BE OBTAINED FASTER THAN THE FIRST.
! THE SAME WSAVE ARRAY CAN BE USED BY CFFTF AND CFFTB.
!
! OUTPUT PARAMETERS
!
! C FOR J=1,...,N
!
! C(J)=THE SUM FROM K=1,...,N OF
!
! C(K)*EXP(I*(J-1)*(K-1)*2*PI/N)
!
! WHERE I=SQRT(-1)
!
! WSAVE CONTAINS INITIALIZATION CALCULATIONS WHICH MUST NOT BE
! DESTROYED BETWEEN CALLS OF SUBROUTINE CFFTF OR CFFTB
!
SUBROUTINE CFFTB (N,C,WSAVE)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION C(*) ,WSAVE(*)
!
IF (N .EQ. 1) RETURN
IW1 = N+N+1
IW2 = IW1+N+N
CALL CFFTB1 (N,C,WSAVE,WSAVE(IW1),WSAVE(IW2))
RETURN
END
SUBROUTINE CFFTB1 (N,C,CH,WA,IFAC)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CH(*) ,C(*) ,WA(*) ,IFAC(*)
NF = IFAC(2)
NA = 0
L1 = 1
IW = 1
DO 116 K1=1,NF
IP = IFAC(K1+2)
L2 = IP*L1
IDO = N/L2
IDOT = IDO+IDO
IDL1 = IDOT*L1
IF (IP .NE. 4) GO TO 103
IX2 = IW+IDOT
IX3 = IX2+IDOT
IF (NA .NE. 0) GO TO 101
CALL PASSB4 (IDOT,L1,C,CH,WA(IW),WA(IX2),WA(IX3))
GO TO 102
101 CALL PASSB4 (IDOT,L1,CH,C,WA(IW),WA(IX2),WA(IX3))
102 NA = 1-NA
GO TO 115
103 IF (IP .NE. 2) GO TO 106
IF (NA .NE. 0) GO TO 104
CALL PASSB2 (IDOT,L1,C,CH,WA(IW))
GO TO 105
104 CALL PASSB2 (IDOT,L1,CH,C,WA(IW))
105 NA = 1-NA
GO TO 115
106 IF (IP .NE. 3) GO TO 109
IX2 = IW+IDOT
IF (NA .NE. 0) GO TO 107
CALL PASSB3 (IDOT,L1,C,CH,WA(IW),WA(IX2))
GO TO 108
107 CALL PASSB3 (IDOT,L1,CH,C,WA(IW),WA(IX2))
108 NA = 1-NA
GO TO 115
109 IF (IP .NE. 5) GO TO 112
IX2 = IW+IDOT
IX3 = IX2+IDOT
IX4 = IX3+IDOT
IF (NA .NE. 0) GO TO 110
CALL PASSB5 (IDOT,L1,C,CH,WA(IW),WA(IX2),WA(IX3),WA(IX4))
GO TO 111
110 CALL PASSB5 (IDOT,L1,CH,C,WA(IW),WA(IX2),WA(IX3),WA(IX4))
111 NA = 1-NA
GO TO 115
112 IF (NA .NE. 0) GO TO 113
CALL PASSB (NAC,IDOT,IP,L1,IDL1,C,C,C,CH,CH,WA(IW))
GO TO 114
113 CALL PASSB (NAC,IDOT,IP,L1,IDL1,CH,CH,CH,C,C,WA(IW))
114 IF (NAC .NE. 0) NA = 1-NA
115 L1 = L2
IW = IW+(IP-1)*IDOT
116 CONTINUE
IF (NA .EQ. 0) RETURN
N2 = N+N
DO 117 I=1,N2
C(I) = CH(I)
117 CONTINUE
RETURN
END
! SUBROUTINE CFFTF(N,C,WSAVE)
!
! SUBROUTINE CFFTF COMPUTES THE FORWARD COMPLEX DISCRETE FOURIER
! TRANSFORM (THE FOURIER ANALYSIS). EQUIVALENTLY , CFFTF COMPUTES
! THE FOURIER COEFFICIENTS OF A COMPLEX PERIODIC SEQUENCE.
! THE TRANSFORM IS DEFINED BELOW AT OUTPUT PARAMETER C.
!
! THE TRANSFORM IS NOT NORMALIZED. TO OBTAIN A NORMALIZED TRANSFORM
! THE OUTPUT MUST BE DIVIDED BY N. OTHERWISE A CALL OF CFFTF
! FOLLOWED BY A CALL OF CFFTB WILL MULTIPLY THE SEQUENCE BY N.
!
! THE ARRAY WSAVE WHICH IS USED BY SUBROUTINE CFFTF MUST BE
! INITIALIZED BY CALLING SUBROUTINE CFFTI(N,WSAVE).
!
! INPUT PARAMETERS
!
!
! N THE LENGTH OF THE COMPLEX SEQUENCE C. THE METHOD IS
! MORE EFFICIENT WHEN N IS THE PRODUCT OF SMALL PRIMES. N
!
! C A COMPLEX ARRAY OF LENGTH N WHICH CONTAINS THE SEQUENCE
!
! WSAVE A REAL WORK ARRAY WHICH MUST BE DIMENSIONED AT LEAST 4N+15
! IN THE PROGRAM THAT CALLS CFFTF. THE WSAVE ARRAY MUST BE
! INITIALIZED BY CALLING SUBROUTINE CFFTI(N,WSAVE) AND A
! DIFFERENT WSAVE ARRAY MUST BE USED FOR EACH DIFFERENT
! VALUE OF N. THIS INITIALIZATION DOES NOT HAVE TO BE
! REPEATED SO LONG AS N REMAINS UNCHANGED THUS SUBSEQUENT
! TRANSFORMS CAN BE OBTAINED FASTER THAN THE FIRST.
! THE SAME WSAVE ARRAY CAN BE USED BY CFFTF AND CFFTB.
!
! OUTPUT PARAMETERS
!
! C FOR J=1,...,N
!
! C(J)=THE SUM FROM K=1,...,N OF
!
! C(K)*EXP(-I*(J-1)*(K-1)*2*PI/N)
!
! WHERE I=SQRT(-1)
!
! WSAVE CONTAINS INITIALIZATION CALCULATIONS WHICH MUST NOT BE
! DESTROYED BETWEEN CALLS OF SUBROUTINE CFFTF OR CFFTB
!
SUBROUTINE CFFTF (N,C,WSAVE)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION C(*) ,WSAVE(*)
!
IF (N .EQ. 1) RETURN
IW1 = N+N+1
IW2 = IW1+N+N
CALL CFFTF1 (N,C,WSAVE,WSAVE(IW1),WSAVE(IW2))
RETURN
END
SUBROUTINE CFFTF1 (N,C,CH,WA,IFAC)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CH(*) ,C(*) ,WA(*) ,IFAC(*)
NF = IFAC(2)
NA = 0
L1 = 1
IW = 1
DO 116 K1=1,NF
IP = IFAC(K1+2)
L2 = IP*L1
IDO = N/L2
IDOT = IDO+IDO
IDL1 = IDOT*L1
IF (IP .NE. 4) GO TO 103
IX2 = IW+IDOT
IX3 = IX2+IDOT
IF (NA .NE. 0) GO TO 101
CALL PASSF4 (IDOT,L1,C,CH,WA(IW),WA(IX2),WA(IX3))
GO TO 102
101 CALL PASSF4 (IDOT,L1,CH,C,WA(IW),WA(IX2),WA(IX3))
102 NA = 1-NA
GO TO 115
103 IF (IP .NE. 2) GO TO 106
IF (NA .NE. 0) GO TO 104
CALL PASSF2 (IDOT,L1,C,CH,WA(IW))
GO TO 105
104 CALL PASSF2 (IDOT,L1,CH,C,WA(IW))
105 NA = 1-NA
GO TO 115
106 IF (IP .NE. 3) GO TO 109
IX2 = IW+IDOT
IF (NA .NE. 0) GO TO 107
CALL PASSF3 (IDOT,L1,C,CH,WA(IW),WA(IX2))
GO TO 108
107 CALL PASSF3 (IDOT,L1,CH,C,WA(IW),WA(IX2))
108 NA = 1-NA
GO TO 115
109 IF (IP .NE. 5) GO TO 112
IX2 = IW+IDOT
IX3 = IX2+IDOT
IX4 = IX3+IDOT
IF (NA .NE. 0) GO TO 110
CALL PASSF5 (IDOT,L1,C,CH,WA(IW),WA(IX2),WA(IX3),WA(IX4))
GO TO 111
110 CALL PASSF5 (IDOT,L1,CH,C,WA(IW),WA(IX2),WA(IX3),WA(IX4))
111 NA = 1-NA
GO TO 115
112 IF (NA .NE. 0) GO TO 113
CALL PASSF (NAC,IDOT,IP,L1,IDL1,C,C,C,CH,CH,WA(IW))
GO TO 114
113 CALL PASSF (NAC,IDOT,IP,L1,IDL1,CH,CH,CH,C,C,WA(IW))
114 IF (NAC .NE. 0) NA = 1-NA
115 L1 = L2
IW = IW+(IP-1)*IDOT
116 CONTINUE
IF (NA .EQ. 0) RETURN
N2 = N+N
DO 117 I=1,N2
C(I) = CH(I)
117 CONTINUE
RETURN
END
! SUBROUTINE CFFTI(N,WSAVE)
!
! SUBROUTINE CFFTI INITIALIZES THE ARRAY WSAVE WHICH IS USED IN
! BOTH CFFTF AND CFFTB. THE PRIME FACTORIZATION OF N TOGETHER WITH
! A TABULATION OF THE TRIGONOMETRIC FUNCTIONS ARE COMPUTED AND
! STORED IN WSAVE.
!
! INPUT PARAMETER
!
! N THE LENGTH OF THE SEQUENCE TO BE TRANSFORMED
!
! OUTPUT PARAMETER
!
! WSAVE A WORK ARRAY WHICH MUST BE DIMENSIONED AT LEAST 4*N+15
! THE SAME WORK ARRAY CAN BE USED FOR BOTH CFFTF AND CFFTB
! AS LONG AS N REMAINS UNCHANGED. DIFFERENT WSAVE ARRAYS
! ARE REQUIRED FOR DIFFERENT VALUES OF N. THE CONTENTS OF
! WSAVE MUST NOT BE CHANGED BETWEEN CALLS OF CFFTF OR CFFTB.
!
SUBROUTINE CFFTI (N,WSAVE)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION WSAVE(*)
!
IF (N .EQ. 1) RETURN
IW1 = N+N+1
IW2 = IW1+N+N
CALL CFFTI1 (N,WSAVE(IW1),WSAVE(IW2))
RETURN
END
SUBROUTINE CFFTI1 (N,WA,IFAC)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION WA(*) ,IFAC(*) ,NTRYH(4)
DATA NTRYH(1),NTRYH(2),NTRYH(3),NTRYH(4)/3,4,2,5/
NL = N
NF = 0
J = 0
101 J = J+1
IF (J-4) 102,102,103
102 NTRY = NTRYH(J)
GO TO 104
103 NTRY = NTRY+2
104 NQ = NL/NTRY
NR = NL-NTRY*NQ
IF (NR) 101,105,101
105 NF = NF+1
IFAC(NF+2) = NTRY
NL = NQ
IF (NTRY .NE. 2) GO TO 107
IF (NF .EQ. 1) GO TO 107
DO 106 I=2,NF
IB = NF-I+2
IFAC(IB+2) = IFAC(IB+1)
106 CONTINUE
IFAC(3) = 2
107 IF (NL .NE. 1) GO TO 104
IFAC(1) = N
IFAC(2) = NF
TPI = 2.0D+0*PIMACH(DUM)
ARGH = TPI/DFLOAT(N)
I = 2
L1 = 1
DO 110 K1=1,NF
IP = IFAC(K1+2)
LD = 0
L2 = L1*IP
IDO = N/L2
IDOT = IDO+IDO+2
IPM = IP-1
DO 109 J=1,IPM
I1 = I
WA(I-1) = 1.0D+0
WA(I) = 0.0D+0
LD = LD+L1
FI = 0.0D+0
ARGLD = DFLOAT(LD)*ARGH
DO 108 II=4,IDOT,2
I = I+2
FI = FI+1.0D+0
ARG = FI*ARGLD
WA(I-1) = COS(ARG)
WA(I) = SIN(ARG)
108 CONTINUE
IF (IP .LE. 5) GO TO 109
WA(I1-1) = WA(I-1)
WA(I1) = WA(I)
109 CONTINUE
L1 = L2
110 CONTINUE
RETURN
END
SUBROUTINE PASSB (NAC,IDO,IP,L1,IDL1,CC,C1,C2,CH,CH2,WA)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CH(IDO,L1,IP) ,CC(IDO,IP,L1) , &
C1(IDO,L1,IP) ,WA(*) ,C2(IDL1,IP), &
CH2(IDL1,IP)
IDOT = IDO/2
NT = IP*IDL1
IPP2 = IP+2
IPPH = (IP+1)/2
IDP = IP*IDO
!
IF (IDO .LT. L1) GO TO 106
DO 103 J=2,IPPH
JC = IPP2-J
DO 102 K=1,L1
DO 101 I=1,IDO
CH(I,K,J) = CC(I,J,K)+CC(I,JC,K)
CH(I,K,JC) = CC(I,J,K)-CC(I,JC,K)
101 CONTINUE
102 CONTINUE
103 CONTINUE
DO 105 K=1,L1
DO 104 I=1,IDO
CH(I,K,1) = CC(I,1,K)
104 CONTINUE
105 CONTINUE
GO TO 112
106 DO 109 J=2,IPPH
JC = IPP2-J
DO 108 I=1,IDO
DO 107 K=1,L1
CH(I,K,J) = CC(I,J,K)+CC(I,JC,K)
CH(I,K,JC) = CC(I,J,K)-CC(I,JC,K)
107 CONTINUE
108 CONTINUE
109 CONTINUE
DO 111 I=1,IDO
DO 110 K=1,L1
CH(I,K,1) = CC(I,1,K)
110 CONTINUE
111 CONTINUE
112 IDL = 2-IDO
INC = 0
DO 116 L=2,IPPH
LC = IPP2-L
IDL = IDL+IDO
DO 113 IK=1,IDL1
C2(IK,L) = CH2(IK,1)+WA(IDL-1)*CH2(IK,2)
C2(IK,LC) = WA(IDL)*CH2(IK,IP)
113 CONTINUE
IDLJ = IDL
INC = INC+IDO
DO 115 J=3,IPPH
JC = IPP2-J
IDLJ = IDLJ+INC
IF (IDLJ .GT. IDP) IDLJ = IDLJ-IDP
WAR = WA(IDLJ-1)
WAI = WA(IDLJ)
DO 114 IK=1,IDL1
C2(IK,L) = C2(IK,L)+WAR*CH2(IK,J)
C2(IK,LC) = C2(IK,LC)+WAI*CH2(IK,JC)
114 CONTINUE
115 CONTINUE
116 CONTINUE
DO 118 J=2,IPPH
DO 117 IK=1,IDL1
CH2(IK,1) = CH2(IK,1)+CH2(IK,J)
117 CONTINUE
118 CONTINUE
DO 120 J=2,IPPH
JC = IPP2-J
DO 119 IK=2,IDL1,2
CH2(IK-1,J) = C2(IK-1,J)-C2(IK,JC)
CH2(IK-1,JC) = C2(IK-1,J)+C2(IK,JC)
CH2(IK,J) = C2(IK,J)+C2(IK-1,JC)
CH2(IK,JC) = C2(IK,J)-C2(IK-1,JC)
119 CONTINUE
120 CONTINUE
NAC = 1
IF (IDO .EQ. 2) RETURN
NAC = 0
DO 121 IK=1,IDL1
C2(IK,1) = CH2(IK,1)
121 CONTINUE
DO 123 J=2,IP
DO 122 K=1,L1
C1(1,K,J) = CH(1,K,J)
C1(2,K,J) = CH(2,K,J)
122 CONTINUE
123 CONTINUE
IF (IDOT .GT. L1) GO TO 127
IDIJ = 0
DO 126 J=2,IP
IDIJ = IDIJ+2
DO 125 I=4,IDO,2
IDIJ = IDIJ+2
DO 124 K=1,L1
C1(I-1,K,J) = WA(IDIJ-1)*CH(I-1,K,J)-WA(IDIJ)*CH(I,K,J)
C1(I,K,J) = WA(IDIJ-1)*CH(I,K,J)+WA(IDIJ)*CH(I-1,K,J)
124 CONTINUE
125 CONTINUE
126 CONTINUE
RETURN
127 IDJ = 2-IDO
DO 130 J=2,IP
IDJ = IDJ+IDO
DO 129 K=1,L1
IDIJ = IDJ
DO 128 I=4,IDO,2
IDIJ = IDIJ+2
C1(I-1,K,J) = WA(IDIJ-1)*CH(I-1,K,J)-WA(IDIJ)*CH(I,K,J)
C1(I,K,J) = WA(IDIJ-1)*CH(I,K,J)+WA(IDIJ)*CH(I-1,K,J)
128 CONTINUE
129 CONTINUE
130 CONTINUE
RETURN
END
SUBROUTINE PASSB2 (IDO,L1,CC,CH,WA1)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,2,L1) ,CH(IDO,L1,2) , &
WA1(*)
IF (IDO .GT. 2) GO TO 102
DO 101 K=1,L1
CH(1,K,1) = CC(1,1,K)+CC(1,2,K)
CH(1,K,2) = CC(1,1,K)-CC(1,2,K)
CH(2,K,1) = CC(2,1,K)+CC(2,2,K)
CH(2,K,2) = CC(2,1,K)-CC(2,2,K)
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
CH(I-1,K,1) = CC(I-1,1,K)+CC(I-1,2,K)
TR2 = CC(I-1,1,K)-CC(I-1,2,K)
CH(I,K,1) = CC(I,1,K)+CC(I,2,K)
TI2 = CC(I,1,K)-CC(I,2,K)
CH(I,K,2) = WA1(I-1)*TI2+WA1(I)*TR2
CH(I-1,K,2) = WA1(I-1)*TR2-WA1(I)*TI2
103 CONTINUE
104 CONTINUE
RETURN
END
SUBROUTINE PASSB3 (IDO,L1,CC,CH,WA1,WA2)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,3,L1) ,CH(IDO,L1,3) , &
WA1(*) ,WA2(*)
! DATA TAUR,TAUI /-0.5D+0,0.866025403784439D+0/
parameter (TAUR=-0.5D0)
TAUI=dsqrt(0.75D0) ! dsqrt not allowed in PARAMETER on aurora-ifort
IF (IDO .NE. 2) GO TO 102
DO 101 K=1,L1
TR2 = CC(1,2,K)+CC(1,3,K)
CR2 = CC(1,1,K)+TAUR*TR2
CH(1,K,1) = CC(1,1,K)+TR2
TI2 = CC(2,2,K)+CC(2,3,K)
CI2 = CC(2,1,K)+TAUR*TI2
CH(2,K,1) = CC(2,1,K)+TI2
CR3 = TAUI*(CC(1,2,K)-CC(1,3,K))
CI3 = TAUI*(CC(2,2,K)-CC(2,3,K))
CH(1,K,2) = CR2-CI3
CH(1,K,3) = CR2+CI3
CH(2,K,2) = CI2+CR3
CH(2,K,3) = CI2-CR3
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
TR2 = CC(I-1,2,K)+CC(I-1,3,K)
CR2 = CC(I-1,1,K)+TAUR*TR2
CH(I-1,K,1) = CC(I-1,1,K)+TR2
TI2 = CC(I,2,K)+CC(I,3,K)
CI2 = CC(I,1,K)+TAUR*TI2
CH(I,K,1) = CC(I,1,K)+TI2
CR3 = TAUI*(CC(I-1,2,K)-CC(I-1,3,K))
CI3 = TAUI*(CC(I,2,K)-CC(I,3,K))
DR2 = CR2-CI3
DR3 = CR2+CI3
DI2 = CI2+CR3
DI3 = CI2-CR3
CH(I,K,2) = WA1(I-1)*DI2+WA1(I)*DR2
CH(I-1,K,2) = WA1(I-1)*DR2-WA1(I)*DI2
CH(I,K,3) = WA2(I-1)*DI3+WA2(I)*DR3
CH(I-1,K,3) = WA2(I-1)*DR3-WA2(I)*DI3
103 CONTINUE
104 CONTINUE
RETURN
END
SUBROUTINE PASSB4 (IDO,L1,CC,CH,WA1,WA2,WA3)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,4,L1) ,CH(IDO,L1,4) , &
WA1(*) ,WA2(*) ,WA3(*)
IF (IDO .NE. 2) GO TO 102
DO 101 K=1,L1
TI1 = CC(2,1,K)-CC(2,3,K)
TI2 = CC(2,1,K)+CC(2,3,K)
TR4 = CC(2,4,K)-CC(2,2,K)
TI3 = CC(2,2,K)+CC(2,4,K)
TR1 = CC(1,1,K)-CC(1,3,K)
TR2 = CC(1,1,K)+CC(1,3,K)
TI4 = CC(1,2,K)-CC(1,4,K)
TR3 = CC(1,2,K)+CC(1,4,K)
CH(1,K,1) = TR2+TR3
CH(1,K,3) = TR2-TR3
CH(2,K,1) = TI2+TI3
CH(2,K,3) = TI2-TI3
CH(1,K,2) = TR1+TR4
CH(1,K,4) = TR1-TR4
CH(2,K,2) = TI1+TI4
CH(2,K,4) = TI1-TI4
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
TI1 = CC(I,1,K)-CC(I,3,K)
TI2 = CC(I,1,K)+CC(I,3,K)
TI3 = CC(I,2,K)+CC(I,4,K)
TR4 = CC(I,4,K)-CC(I,2,K)
TR1 = CC(I-1,1,K)-CC(I-1,3,K)
TR2 = CC(I-1,1,K)+CC(I-1,3,K)
TI4 = CC(I-1,2,K)-CC(I-1,4,K)
TR3 = CC(I-1,2,K)+CC(I-1,4,K)
CH(I-1,K,1) = TR2+TR3
CR3 = TR2-TR3
CH(I,K,1) = TI2+TI3
CI3 = TI2-TI3
CR2 = TR1+TR4
CR4 = TR1-TR4
CI2 = TI1+TI4
CI4 = TI1-TI4
CH(I-1,K,2) = WA1(I-1)*CR2-WA1(I)*CI2
CH(I,K,2) = WA1(I-1)*CI2+WA1(I)*CR2
CH(I-1,K,3) = WA2(I-1)*CR3-WA2(I)*CI3
CH(I,K,3) = WA2(I-1)*CI3+WA2(I)*CR3
CH(I-1,K,4) = WA3(I-1)*CR4-WA3(I)*CI4
CH(I,K,4) = WA3(I-1)*CI4+WA3(I)*CR4
103 CONTINUE
104 CONTINUE
RETURN
END
SUBROUTINE PASSB5 (IDO,L1,CC,CH,WA1,WA2,WA3,WA4)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,5,L1) ,CH(IDO,L1,5) , &
WA1(*) ,WA2(*) ,WA3(*) ,WA4(*)
DATA TR11,TI11,TR12,TI12 /0.309016994374947D+0, &
0.951056516295154D+0,-0.809016994374947D+0,0.587785252292473D+0/
! TR12 is cos(pi/5)
! TR11 is cos(2*pi/5) (and similar)
! parameter (TR11= cos(acos(-1.D0)*0.4D0), TI11=sin(acos(-1.D0)*0.4D0), &
! TR12=-cos(acos(-1.D0)*0.2D0), TI12=sin(acos(-1.D0)*0.2D0) )
IF (IDO .NE. 2) GO TO 102
DO 101 K=1,L1
TI5 = CC(2,2,K)-CC(2,5,K)
TI2 = CC(2,2,K)+CC(2,5,K)
TI4 = CC(2,3,K)-CC(2,4,K)
TI3 = CC(2,3,K)+CC(2,4,K)
TR5 = CC(1,2,K)-CC(1,5,K)
TR2 = CC(1,2,K)+CC(1,5,K)
TR4 = CC(1,3,K)-CC(1,4,K)
TR3 = CC(1,3,K)+CC(1,4,K)
CH(1,K,1) = CC(1,1,K)+TR2+TR3
CH(2,K,1) = CC(2,1,K)+TI2+TI3
CR2 = CC(1,1,K)+TR11*TR2+TR12*TR3
CI2 = CC(2,1,K)+TR11*TI2+TR12*TI3
CR3 = CC(1,1,K)+TR12*TR2+TR11*TR3
CI3 = CC(2,1,K)+TR12*TI2+TR11*TI3
CR5 = TI11*TR5+TI12*TR4
CI5 = TI11*TI5+TI12*TI4
CR4 = TI12*TR5-TI11*TR4
CI4 = TI12*TI5-TI11*TI4
CH(1,K,2) = CR2-CI5
CH(1,K,5) = CR2+CI5
CH(2,K,2) = CI2+CR5
CH(2,K,3) = CI3+CR4
CH(1,K,3) = CR3-CI4
CH(1,K,4) = CR3+CI4
CH(2,K,4) = CI3-CR4
CH(2,K,5) = CI2-CR5
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
TI5 = CC(I,2,K)-CC(I,5,K)
TI2 = CC(I,2,K)+CC(I,5,K)
TI4 = CC(I,3,K)-CC(I,4,K)
TI3 = CC(I,3,K)+CC(I,4,K)
TR5 = CC(I-1,2,K)-CC(I-1,5,K)
TR2 = CC(I-1,2,K)+CC(I-1,5,K)
TR4 = CC(I-1,3,K)-CC(I-1,4,K)
TR3 = CC(I-1,3,K)+CC(I-1,4,K)
CH(I-1,K,1) = CC(I-1,1,K)+TR2+TR3
CH(I,K,1) = CC(I,1,K)+TI2+TI3
CR2 = CC(I-1,1,K)+TR11*TR2+TR12*TR3
CI2 = CC(I,1,K)+TR11*TI2+TR12*TI3
CR3 = CC(I-1,1,K)+TR12*TR2+TR11*TR3
CI3 = CC(I,1,K)+TR12*TI2+TR11*TI3
CR5 = TI11*TR5+TI12*TR4
CI5 = TI11*TI5+TI12*TI4
CR4 = TI12*TR5-TI11*TR4
CI4 = TI12*TI5-TI11*TI4
DR3 = CR3-CI4
DR4 = CR3+CI4
DI3 = CI3+CR4
DI4 = CI3-CR4
DR5 = CR2+CI5
DR2 = CR2-CI5
DI5 = CI2-CR5
DI2 = CI2+CR5
CH(I-1,K,2) = WA1(I-1)*DR2-WA1(I)*DI2
CH(I,K,2) = WA1(I-1)*DI2+WA1(I)*DR2
CH(I-1,K,3) = WA2(I-1)*DR3-WA2(I)*DI3
CH(I,K,3) = WA2(I-1)*DI3+WA2(I)*DR3
CH(I-1,K,4) = WA3(I-1)*DR4-WA3(I)*DI4
CH(I,K,4) = WA3(I-1)*DI4+WA3(I)*DR4
CH(I-1,K,5) = WA4(I-1)*DR5-WA4(I)*DI5
CH(I,K,5) = WA4(I-1)*DI5+WA4(I)*DR5
103 CONTINUE
104 CONTINUE
RETURN
END
SUBROUTINE PASSF (NAC,IDO,IP,L1,IDL1,CC,C1,C2,CH,CH2,WA)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CH(IDO,L1,IP) ,CC(IDO,IP,L1) , &
C1(IDO,L1,IP) ,WA(*) ,C2(IDL1,IP), &
CH2(IDL1,IP)
IDOT = IDO/2
NT = IP*IDL1
IPP2 = IP+2
IPPH = (IP+1)/2
IDP = IP*IDO
!
IF (IDO .LT. L1) GO TO 106
DO 103 J=2,IPPH
JC = IPP2-J
DO 102 K=1,L1
DO 101 I=1,IDO
CH(I,K,J) = CC(I,J,K)+CC(I,JC,K)
CH(I,K,JC) = CC(I,J,K)-CC(I,JC,K)
101 CONTINUE
102 CONTINUE
103 CONTINUE
DO 105 K=1,L1
DO 104 I=1,IDO
CH(I,K,1) = CC(I,1,K)
104 CONTINUE
105 CONTINUE
GO TO 112
106 DO 109 J=2,IPPH
JC = IPP2-J
DO 108 I=1,IDO
DO 107 K=1,L1
CH(I,K,J) = CC(I,J,K)+CC(I,JC,K)
CH(I,K,JC) = CC(I,J,K)-CC(I,JC,K)
107 CONTINUE
108 CONTINUE
109 CONTINUE
DO 111 I=1,IDO
DO 110 K=1,L1
CH(I,K,1) = CC(I,1,K)
110 CONTINUE
111 CONTINUE
112 IDL = 2-IDO
INC = 0
DO 116 L=2,IPPH
LC = IPP2-L
IDL = IDL+IDO
DO 113 IK=1,IDL1
C2(IK,L) = CH2(IK,1)+WA(IDL-1)*CH2(IK,2)
C2(IK,LC) = -WA(IDL)*CH2(IK,IP)
113 CONTINUE
IDLJ = IDL
INC = INC+IDO
DO 115 J=3,IPPH
JC = IPP2-J
IDLJ = IDLJ+INC
IF (IDLJ .GT. IDP) IDLJ = IDLJ-IDP
WAR = WA(IDLJ-1)
WAI = WA(IDLJ)
DO 114 IK=1,IDL1
C2(IK,L) = C2(IK,L)+WAR*CH2(IK,J)
C2(IK,LC) = C2(IK,LC)-WAI*CH2(IK,JC)
114 CONTINUE
115 CONTINUE
116 CONTINUE
DO 118 J=2,IPPH
DO 117 IK=1,IDL1
CH2(IK,1) = CH2(IK,1)+CH2(IK,J)
117 CONTINUE
118 CONTINUE
DO 120 J=2,IPPH
JC = IPP2-J
DO 119 IK=2,IDL1,2
CH2(IK-1,J) = C2(IK-1,J)-C2(IK,JC)
CH2(IK-1,JC) = C2(IK-1,J)+C2(IK,JC)
CH2(IK,J) = C2(IK,J)+C2(IK-1,JC)
CH2(IK,JC) = C2(IK,J)-C2(IK-1,JC)
119 CONTINUE
120 CONTINUE
NAC = 1
IF (IDO .EQ. 2) RETURN
NAC = 0
DO 121 IK=1,IDL1
C2(IK,1) = CH2(IK,1)
121 CONTINUE
DO 123 J=2,IP
DO 122 K=1,L1
C1(1,K,J) = CH(1,K,J)
C1(2,K,J) = CH(2,K,J)
122 CONTINUE
123 CONTINUE
IF (IDOT .GT. L1) GO TO 127
IDIJ = 0
DO 126 J=2,IP
IDIJ = IDIJ+2
DO 125 I=4,IDO,2
IDIJ = IDIJ+2
DO 124 K=1,L1
C1(I-1,K,J) = WA(IDIJ-1)*CH(I-1,K,J)+WA(IDIJ)*CH(I,K,J)
C1(I,K,J) = WA(IDIJ-1)*CH(I,K,J)-WA(IDIJ)*CH(I-1,K,J)
124 CONTINUE
125 CONTINUE
126 CONTINUE
RETURN
127 IDJ = 2-IDO
DO 130 J=2,IP
IDJ = IDJ+IDO
DO 129 K=1,L1
IDIJ = IDJ
DO 128 I=4,IDO,2
IDIJ = IDIJ+2
C1(I-1,K,J) = WA(IDIJ-1)*CH(I-1,K,J)+WA(IDIJ)*CH(I,K,J)
C1(I,K,J) = WA(IDIJ-1)*CH(I,K,J)-WA(IDIJ)*CH(I-1,K,J)
128 CONTINUE
129 CONTINUE
130 CONTINUE
RETURN
END
SUBROUTINE PASSF2 (IDO,L1,CC,CH,WA1)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,2,L1) ,CH(IDO,L1,2) , &
WA1(*)
IF (IDO .GT. 2) GO TO 102
DO 101 K=1,L1
CH(1,K,1) = CC(1,1,K)+CC(1,2,K)
CH(1,K,2) = CC(1,1,K)-CC(1,2,K)
CH(2,K,1) = CC(2,1,K)+CC(2,2,K)
CH(2,K,2) = CC(2,1,K)-CC(2,2,K)
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
CH(I-1,K,1) = CC(I-1,1,K)+CC(I-1,2,K)
TR2 = CC(I-1,1,K)-CC(I-1,2,K)
CH(I,K,1) = CC(I,1,K)+CC(I,2,K)
TI2 = CC(I,1,K)-CC(I,2,K)
CH(I,K,2) = WA1(I-1)*TI2-WA1(I)*TR2
CH(I-1,K,2) = WA1(I-1)*TR2+WA1(I)*TI2
103 CONTINUE
104 CONTINUE
RETURN
END
SUBROUTINE PASSF3 (IDO,L1,CC,CH,WA1,WA2)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,3,L1) ,CH(IDO,L1,3) , &
WA1(*) ,WA2(*)
DATA TAUR,TAUI /-0.5D+0,-0.866025403784439D+0/
! TAUR=-0.5D0
! TAUI=-sqrt(0.75D0)
! parameter (TAUR=-0.5D0, TAUI=-dsqrt(0.75D0))
IF (IDO .NE. 2) GO TO 102
DO 101 K=1,L1
TR2 = CC(1,2,K)+CC(1,3,K)
CR2 = CC(1,1,K)+TAUR*TR2
CH(1,K,1) = CC(1,1,K)+TR2
TI2 = CC(2,2,K)+CC(2,3,K)
CI2 = CC(2,1,K)+TAUR*TI2
CH(2,K,1) = CC(2,1,K)+TI2
CR3 = TAUI*(CC(1,2,K)-CC(1,3,K))
CI3 = TAUI*(CC(2,2,K)-CC(2,3,K))
CH(1,K,2) = CR2-CI3
CH(1,K,3) = CR2+CI3
CH(2,K,2) = CI2+CR3
CH(2,K,3) = CI2-CR3
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
TR2 = CC(I-1,2,K)+CC(I-1,3,K)
CR2 = CC(I-1,1,K)+TAUR*TR2
CH(I-1,K,1) = CC(I-1,1,K)+TR2
TI2 = CC(I,2,K)+CC(I,3,K)
CI2 = CC(I,1,K)+TAUR*TI2
CH(I,K,1) = CC(I,1,K)+TI2
CR3 = TAUI*(CC(I-1,2,K)-CC(I-1,3,K))
CI3 = TAUI*(CC(I,2,K)-CC(I,3,K))
DR2 = CR2-CI3
DR3 = CR2+CI3
DI2 = CI2+CR3
DI3 = CI2-CR3
CH(I,K,2) = WA1(I-1)*DI2-WA1(I)*DR2
CH(I-1,K,2) = WA1(I-1)*DR2+WA1(I)*DI2
CH(I,K,3) = WA2(I-1)*DI3-WA2(I)*DR3
CH(I-1,K,3) = WA2(I-1)*DR3+WA2(I)*DI3
103 CONTINUE
104 CONTINUE
RETURN
END
SUBROUTINE PASSF4 (IDO,L1,CC,CH,WA1,WA2,WA3)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,4,L1) ,CH(IDO,L1,4) , &
WA1(*) ,WA2(*) ,WA3(*)
IF (IDO .NE. 2) GO TO 102
DO 101 K=1,L1
TI1 = CC(2,1,K)-CC(2,3,K)
TI2 = CC(2,1,K)+CC(2,3,K)
TR4 = CC(2,2,K)-CC(2,4,K)
TI3 = CC(2,2,K)+CC(2,4,K)
TR1 = CC(1,1,K)-CC(1,3,K)
TR2 = CC(1,1,K)+CC(1,3,K)
TI4 = CC(1,4,K)-CC(1,2,K)
TR3 = CC(1,2,K)+CC(1,4,K)
CH(1,K,1) = TR2+TR3
CH(1,K,3) = TR2-TR3
CH(2,K,1) = TI2+TI3
CH(2,K,3) = TI2-TI3
CH(1,K,2) = TR1+TR4
CH(1,K,4) = TR1-TR4
CH(2,K,2) = TI1+TI4
CH(2,K,4) = TI1-TI4
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
TI1 = CC(I,1,K)-CC(I,3,K)
TI2 = CC(I,1,K)+CC(I,3,K)
TI3 = CC(I,2,K)+CC(I,4,K)
TR4 = CC(I,2,K)-CC(I,4,K)
TR1 = CC(I-1,1,K)-CC(I-1,3,K)
TR2 = CC(I-1,1,K)+CC(I-1,3,K)
TI4 = CC(I-1,4,K)-CC(I-1,2,K)
TR3 = CC(I-1,2,K)+CC(I-1,4,K)
CH(I-1,K,1) = TR2+TR3
CR3 = TR2-TR3
CH(I,K,1) = TI2+TI3
CI3 = TI2-TI3
CR2 = TR1+TR4
CR4 = TR1-TR4
CI2 = TI1+TI4
CI4 = TI1-TI4
CH(I-1,K,2) = WA1(I-1)*CR2+WA1(I)*CI2
CH(I,K,2) = WA1(I-1)*CI2-WA1(I)*CR2
CH(I-1,K,3) = WA2(I-1)*CR3+WA2(I)*CI3
CH(I,K,3) = WA2(I-1)*CI3-WA2(I)*CR3
CH(I-1,K,4) = WA3(I-1)*CR4+WA3(I)*CI4
CH(I,K,4) = WA3(I-1)*CI4-WA3(I)*CR4
103 CONTINUE
104 CONTINUE
RETURN
END
SUBROUTINE PASSF5 (IDO,L1,CC,CH,WA1,WA2,WA3,WA4)
IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION CC(IDO,5,L1) ,CH(IDO,L1,5) , &
WA1(*) ,WA2(*) ,WA3(*) ,WA4(*)
DATA TR11,TI11,TR12,TI12 /0.309016994374947D+0, &
-0.951056516295154D+0,-0.809016994374947D+0, &
-0.587785252292473D+0/
! parameter (TR11= cos(acos(-1.D0)*0.4D0), TI11=-sin(acos(-1.D0)*0.4D0), &
! TR12=-cos(acos(-1.D0)*0.2D0), TI12=-sin(acos(-1.D0)*0.2D0) )
IF (IDO .NE. 2) GO TO 102
DO 101 K=1,L1
TI5 = CC(2,2,K)-CC(2,5,K)
TI2 = CC(2,2,K)+CC(2,5,K)
TI4 = CC(2,3,K)-CC(2,4,K)
TI3 = CC(2,3,K)+CC(2,4,K)
TR5 = CC(1,2,K)-CC(1,5,K)
TR2 = CC(1,2,K)+CC(1,5,K)
TR4 = CC(1,3,K)-CC(1,4,K)
TR3 = CC(1,3,K)+CC(1,4,K)
CH(1,K,1) = CC(1,1,K)+TR2+TR3
CH(2,K,1) = CC(2,1,K)+TI2+TI3
CR2 = CC(1,1,K)+TR11*TR2+TR12*TR3
CI2 = CC(2,1,K)+TR11*TI2+TR12*TI3
CR3 = CC(1,1,K)+TR12*TR2+TR11*TR3
CI3 = CC(2,1,K)+TR12*TI2+TR11*TI3
CR5 = TI11*TR5+TI12*TR4
CI5 = TI11*TI5+TI12*TI4
CR4 = TI12*TR5-TI11*TR4
CI4 = TI12*TI5-TI11*TI4
CH(1,K,2) = CR2-CI5
CH(1,K,5) = CR2+CI5
CH(2,K,2) = CI2+CR5
CH(2,K,3) = CI3+CR4
CH(1,K,3) = CR3-CI4
CH(1,K,4) = CR3+CI4
CH(2,K,4) = CI3-CR4
CH(2,K,5) = CI2-CR5
101 CONTINUE
RETURN
102 DO 104 K=1,L1
DO 103 I=2,IDO,2
TI5 = CC(I,2,K)-CC(I,5,K)
TI2 = CC(I,2,K)+CC(I,5,K)
TI4 = CC(I,3,K)-CC(I,4,K)
TI3 = CC(I,3,K)+CC(I,4,K)
TR5 = CC(I-1,2,K)-CC(I-1,5,K)
TR2 = CC(I-1,2,K)+CC(I-1,5,K)
TR4 = CC(I-1,3,K)-CC(I-1,4,K)
TR3 = CC(I-1,3,K)+CC(I-1,4,K)
CH(I-1,K,1) = CC(I-1,1,K)+TR2+TR3
CH(I,K,1) = CC(I,1,K)+TI2+TI3
CR2 = CC(I-1,1,K)+TR11*TR2+TR12*TR3
CI2 = CC(I,1,K)+TR11*TI2+TR12*TI3
CR3 = CC(I-1,1,K)+TR12*TR2+TR11*TR3
CI3 = CC(I,1,K)+TR12*TI2+TR11*TI3
CR5 = TI11*TR5+TI12*TR4
CI5 = TI11*TI5+TI12*TI4
CR4 = TI12*TR5-TI11*TR4
CI4 = TI12*TI5-TI11*TI4
DR3 = CR3-CI4
DR4 = CR3+CI4
DI3 = CI3+CR4
DI4 = CI3-CR4
DR5 = CR2+CI5
DR2 = CR2-CI5
DI5 = CI2-CR5
DI2 = CI2+CR5
CH(I-1,K,2) = WA1(I-1)*DR2+WA1(I)*DI2
CH(I,K,2) = WA1(I-1)*DI2-WA1(I)*DR2
CH(I-1,K,3) = WA2(I-1)*DR3+WA2(I)*DI3
CH(I,K,3) = WA2(I-1)*DI3-WA2(I)*DR3
CH(I-1,K,4) = WA3(I-1)*DR4+WA3(I)*DI4
CH(I,K,4) = WA3(I-1)*DI4-WA3(I)*DR4
CH(I-1,K,5) = WA4(I-1)*DR5+WA4(I)*DI5
CH(I,K,5) = WA4(I-1)*DI5-WA4(I)*DR5
103 CONTINUE
104 CONTINUE
RETURN
END
FUNCTION PIMACH (DUM)
IMPLICIT REAL*8 (A-H,O-Z)
! PI=3.1415926535897932384626433832795028841971693993751058209749446
!
PIMACH = 4.0D+0*ATAN(1.0D+0)
RETURN
END
-------------- next part --------------
REAL*8 FUNCTION GAUNT(L1,L2,L3,MM1,MM2,MM3)
IMPLICIT REAL*8 (A-H,O-Z)
DATA FPI/12.56637061436D0/
! parameter (FPI=4.D0*acos(-1.D0) )
J1=2*L1
J2=2*L2
J3=2*L3
M1=-2*MM1
M2=2*MM2
M3=2*MM3
! FAC=(-1.D0)**MM1*SQRT( (J1+1)*(J2+1)*(J3+1)/FPI)
FAC=SQRT(DBLE((J1+1)*(J2+1)*(J3+1))/FPI)
IF( 2*(MM1/2) .ne. MM1)FAC=-FAC
GAUNT=FAC*T3J0(J1,J2,J3)*T3J(J1,J2,J3,M1,M2,M3)
RETURN
END
-------------- next part --------------
Subroutine W2kinit ()
!
! Generic initialization with Wien2k
!
! This code sets the stack limit to the hard limit
! and setups up error-handlers.
!
! The intent with the handlers is to ensure that a
! mpi job will stop if one processor encounters an
! error, but signalling an MPI_ABORT (via abort_parallel)
! to all the other processors. Almost certainly this
! will not catch network failures and some other issues.
!
! It is probably only needed for mpi codes since in
! other cases the fortran rtl routines will give more
! information.
!
! The handlers are in C, as this is
! more portable than Intel's ifport routines, plus some
! handlers as well using ifport if -DINTEL_VML is in
! the compile line. It appears that some fortran issues
! are not going to the c rtl, so these are added as
! well although this could be dangerous!
!
! L. D. Marks, January 2010
!
#ifdef INTEL_VML
! include 'mkl_vml.fi'
#endif
call w2k_extend_limits()
#ifdef Parallel
! Register C signal dispatcher, needed for parallel only
call w2k_register_signals()
#endif
#ifdef INTEL_VML
! Initialize Intel Vector Math Library (part of MKL)
! oldmode = vmlsetmode( IOR(VML_HA, VML_DOUBLE_CONSISTENT) )
#endif
return
end
Subroutine W2k_catch_signal(wsig)
#ifdef Parallel
use parallel, only: abort_parallel, myid
#else
parameter (myid = 0)
#endif
integer wsig(*)
! Signal messages, for lack of Fortran psignal().
! The sequence is W2k-internal and should match
! W2kutils.c:w2k_dispatch_signal()
character *50 signal_message(0:12)
data signal_message / &
& 'Unknown signal received' & ! 0
& ,'Received an Interrupt' & ! 1
& ,'Found an Illegal Instruction' & ! 2
& ,'Abort' & ! 3
& ,'Bus Error' & ! 4
& ,'Floating-Point Exception' & ! 5
& ,'SIGSEGV, contact developers' & ! 6
& ,'Stack Fault, contact developers' & ! 7
& ,'Run out of CPU time allotted, sorry' & ! 8
& ,'Process kill signal received' & ! 9
& ,'Process termination signal received' & ! 10
& ,'Hangup signal received' & ! 11
& ,'IOT trap (abort)' & ! 12
& /
!
i = wsig(1)
if (i .lt. 1 .or. i .gt. 11) i = 0
call outerr('Unknown', trim(signal_message(i)))
write(*,*)'Child id',myid, trim(signal_message(i))
#ifdef Parallel
! Ensure all mpi processes know something has gone wrong
call abort_parallel
#endif
! Just in case we are still alive
stop 'WIEN2K ABORTING'
return
end
-------------- next part --------------
SUBROUTINE HAMILT(LTOP,NAT,NV,U,UP,DU,DUP,MULT,R, &
POS,V,VOLUME,CHN,E,INS)
!
#ifdef Parallel
use matrices, only : H, S,HSROWS, KZZ, XK, YK, ZK, RK
#else
use matrices, only : HS, HSDIAG, HSROWS, KZZ, XK, YK, ZK, RK
#endif
use loabc, only : ALO, BLO, CLO, ELO, PLO, DPLO, PELO, &
DPELO, PEILO, PI12LO, PE12LO
use parallel, only : ICTXTALL, DESCHS,DESCZ, NPE, MYID, &
NPROW, NPCOL, MYROWHS, MYCOLHS, &
BLOCKSIZE, LDHS, LCOLHS, LCOLZ, &
INIT_PARALLEL, BARRIER, CLOSE_PARALLEL
use lapw_timer, only : time_albl, time_loop260, time_phase, &
time_legendre, time_step_function, time_h, &
time_us, time_loop210, time_loop230, &
time_loop240, time_overlap, time_lo, &
START_TIMER, STOP_TIMER, READ_CPU_TIME, READ_wall_TIME, &
init_all_timer,time_distiouter
use lolog, only : nlo, loor, lapw, ilo
use rotmat, only: ROTIJ, ROTLOC
use struk, only : NDF,multmax
use out, only : WARP
use albl
! use out, only : WARP1, WARP2, WARP3
IMPLICIT NONE
INCLUDE 'param.inc'
!
! Parameters
!
DOUBLE PRECISION ONE, TOL, UTOP, ZERO, HALF
DOUBLE PRECISION TWO, THREE, FOUR
PARAMETER (ONE=1.0D+0, TOL=1.0D-8)
PARAMETER (TWO=2.0D+0, THREE=3.0D0, FOUR=4.0D0)
PARAMETER (HALF=0.5D0)
PARAMETER (UTOP=0.9999999D+0, ZERO=0.0D+0)
DOUBLE COMPLEX CIMAG
PARAMETER (CIMAG = (0.0D0,1.0D0))
DOUBLE COMPLEX CZERO
PARAMETER (CZERO=(0.0D0,0.0D0))
DOUBLE COMPLEX CONE
PARAMETER (CONE=(1.0d0,0.0d0))
!
! Arguments
!
INTEGER INS, LTOP, NAT, NV
INTEGER MULT(NAT)
DOUBLE PRECISION VOLUME
DOUBLE PRECISION R(NAT)
! DOUBLE PRECISION XK(HSROWS+1), YK(HSROWS+1), ZK(HSROWS+1)
DOUBLE PRECISION CHN(0:LMAX-1,NAT), DU(0:LMAX-1,NAT)
DOUBLE PRECISION DUP(0:LMAX-1,NAT), E(0:LMAX-1,NAT)
DOUBLE PRECISION POS(3,NDF)
DOUBLE PRECISION U(0:LMAX-1,NAT), UP(0:LMAX-1,NAT), V(NAT)
! YL(lm,:,i) - temporary storage of spherical harmonics for
! local orbitals
!
DOUBLE COMPLEX, allocatable :: YL(:,:,:)
!
! Locals
!
DOUBLE PRECISION V3
INTEGER jlo,jlop
DOUBLE PRECISION atmp1, tmp3, stmp, htmp, btmp1
DOUBLE PRECISION akinlor,akinlorup,akinloiup,akinbr,akinbi
INTEGER I, INDATM, indatm0, J, JEQ, JEQO, JNEQ, K, L
INTEGER L0M0, MO1, MO2, N, NNLO, IATM, IHELP, IOUTER
INTEGER IMIN, IMAX, II
DOUBLE PRECISION ARGX, ARGY, ARGZ, ATMP, BESR, BTMP
DOUBLE PRECISION ARGXX, ARGYY, ARGZZ
DOUBLE PRECISION C1R, C1I, C2R, C2I, C3R, C3I
DOUBLE PRECISION C11R, C11I, C12R, C12I, C13R, C13I
DOUBLE PRECISION C1RUP, C1IUP, C2RUP, C2IUP, C3RUP, C3IUP
DOUBLE PRECISION C11RUP,C11IUP,C12RUP,C12IUP,C13RUP,C13IUP
DOUBLE PRECISION C6R, C6I, C7R, C7I, C8R, C8I
DOUBLE PRECISION CABR, CABI, CABCR, CABCI, CLR, CLI
DOUBLE PRECISION CLYLR, CLYLI, CLYLNR, CLYLNI
DOUBLE PRECISION DLL1, PI4R2V
DOUBLE PRECISION DTIME1, DTIME2, DTIMH, DTIMLG, DTIMMA, DTIMPH,DTIMDIST
DOUBLE PRECISION DTIME3, dtime240, dtime241, dtime242
DOUBLE PRECISION dtime260, dtime211, dtime231, dtime210,dtime230,time_albl_tot
DOUBLE PRECISION time_albl_totw, DTIMLGw, DTIMMAw, DTIMPHw,DTIMDISTw,DTIMUSw
DOUBLE PRECISION DTIMS, DTIMUS, PI, Q, RKN, TMP, VIFPR4, rk_tmp1
DOUBLE PRECISION, allocatable :: DEN(:), DFJ(:), FJ(:), FK(:), rk_tmp(:)
DOUBLE PRECISION, allocatable :: PPLX(:), PPLY(:), ROTV1(:)
DOUBLE PRECISION, allocatable :: ROTV2(:), SLEN(:,:), TMP1(:)
DOUBLE PRECISION, allocatable :: TMP2(:), VEC(:)
DOUBLE PRECISION, allocatable :: X2(:,:), XL(:),x(:)
DOUBLE PRECISION, allocatable :: P(:,:,:)
DOUBLE PRECISION, allocatable :: help1(:), help_cos(:)
DOUBLE PRECISION :: xk_tmp_c,yk_tmp_c,zk_tmp_c
DOUBLE PRECISION, allocatable :: xk_tmp_r(:),yk_tmp_r(:),zk_tmp_r(:)
!_COMPLEX DOUBLE COMPLEX, allocatable :: help_exp(:)
DOUBLE PRECISION, allocatable :: help_sin(:), tmp_y(:)
#if defined (INTEL_VML)
!_COMPLEX DOUBLE PRECISION, allocatable :: help_tmpsin(:), help_tmpcos(:)
DOUBLE PRECISION, allocatable :: tmp_ytmp(:)
#endif
!_REAL DOUBLE PRECISION :: PHS
!_REAL DOUBLE PRECISION, allocatable :: PHASE(:),spanelus(:,:)
!_COMPLEX DOUBLE COMPLEX :: PHS
!_COMPLEX DOUBLE COMPLEX, allocatable :: PHASE(:),spanelus(:,:)
DOUBLE COMPLEX, allocatable :: PHSC(:)
!_REAL DOUBLE PRECISION, allocatable :: HPANEL(:,:)
!_REAL DOUBLE PRECISION, allocatable :: SPANEL(:,:)
!_COMPLEX DOUBLE COMPLEX, allocatable :: HPANEL(:, :)
!_COMPLEX DOUBLE COMPLEX, allocatable :: SPANEL(:, :)
integer, allocatable:: kzz_tmp(:,:), kzz_ihelp(:,:)
!
! External Functions
!
!_REAL DOUBLE PRECISION USTPHX, WARPIN
!_COMPLEX DOUBLE COMPLEX USTPHX, WARPIN
EXTERNAL USTPHX, WARPIN
!
! External Subroutines
!
EXTERNAL CPUTIM, DVBES1, ROTATE, SPHBES, YLM
!
! Intrinsic Functions
!
INTRINSIC ABS, ATAN, DCONJG, COS, DBLE, DCMPLX, EXP, SIN
INTRINSIC SQRT
integer i_g,j_g,hsdim_r,hsdim_c,i_col,ipr,ipc,ii_g
integer global2local, jnnlo
integer, allocatable :: j_end(:), j_end_1(:)
CALL BARRIER
!
allocate( DFJ(0:LMAX-1), FJ(0:LMAX-1), FK(3) )
allocate( ROTV1(3),ROTV2(3),VEC(3) )
allocate( XL(LMAX-2) )
#ifdef Parallel
HSdim_r=LDHS
HSdim_c=LCOLHS
#else
HSdim_c=HSROWS
HSdim_r=HSROWS
#endif
allocate( HPANEL( HSdim_r, BLOCKSIZE) )
allocate( SPANEL( HSdim_r, BLOCKSIZE) )
allocate( DEN(HSdim_r), rk_tmp(HSdim_r) )
allocate( PPLX(HSdim_r), PPLY(HSdim_r) )
allocate( SLEN(HSdim_r,blocksize), TMP1(HSdim_r) )
allocate( TMP2(HSdim_r) )
allocate( kzz_TMP(3,HSdim_r) )
allocate( kzz_ihelp(3,blocksize) )
allocate( X2(HSdim_r,blocksize), X(HSdim_r) )
allocate( P(HSdim_r,0:LTOP-1,blocksize) )
allocate( help1(HSdim_r), help_cos(HSdim_r), help_sin(HSdim_r) )
#if defined (INTEL_VML)
!_COMPLEX allocate( help_tmpcos(HSdim_r), help_tmpsin(HSdim_r) )
allocate( tmp_ytmp(HSdim_r) )
#endif
allocate( tmp_y(HSdim_r) )
!_COMPLEX allocate( help_exp(HSdim_r) )
allocate( PHASE(HSdim_r) )
allocate(spanelus(HSdim_r,blocksize))
allocate(j_end(blocksize),j_end_1(blocksize))
allocate(xk_tmp_r(HSdim_r),yk_tmp_r(HSdim_r),zk_tmp_r(HSdim_r))
CALL init_albl(ltop-1,'HAM')
!_REAL write(6,'(a20,f12.1,a,a,2i6)') 'allocate spanel',HSdim_r*blocksize*8.0/(1024.0*1024.0),' MB ',&
!_REAL ' dimensions',HSdim_r,blocksize
!_REAL write(6,'(a20,f12.1,a,a,2i6)') 'allocate hpanel',HSdim_r*blocksize*8.0/(1024.0*1024.0),' MB ',&
!_REAL ' dimensions',HSdim_r,blocksize
!_COMPLEX write(6,'(a20,f12.1,a,a,2i6)') 'allocate spanel',HSdim_r*blocksize*16.0/(1024.0*1024.0),' MB ',&
!_COMPLEX ' dimensions',HSdim_r,blocksize
!_COMPLEX write(6,'(a20,f12.1,a,a,2i6)') 'allocate hpanel',HSdim_r*blocksize*16.0/(1024.0*1024.0),' MB ',&
!_COMPLEX ' dimensions',HSdim_r,blocksize
!_REAL write(6,'(a20,f12.1,a,a,2i6)') 'allocate spanelus',HSdim_r*blocksize*8.0/(1024.0*1024.0),' MB ',&
!_REAL ' dimensions',HSdim_r,blocksize
!_COMPLEX write(6,'(a20,f12.1,a,a,2i6)') 'allocate spanelus',HSdim_r*blocksize*16.0/(1024.0*1024.0),' MB ',&
!_COMPLEX ' dimensions',HSdim_r,blocksize
write(6,'(a20,f12.1,a,a,2i6)') 'allocate slen',HSdim_r*blocksize*8.0/(1024.0*1024.0),' MB ',&
' dimensions',HSdim_r,blocksize
write(6,'(a20,f12.1,a,a,2i6)') 'allocate x2',HSdim_r*blocksize*8.0/(1024.0*1024.0),' MB ',&
' dimensions',HSdim_r,blocksize
write(6,'(a20,f12.1,a,a,3i6)') 'allocate legendre',dble((HSdim_r)*LMAX*blocksize*8)/(1024.0*1024.0),' MB ',&
' dimensions',HSdim_r,lmax,blocksize
write(6,'(a20,f12.1,a,a,2i6)') 'allocate al,bl (row)',HSdim_r*ltop*2.0*2*8.0/(1024.0*1024.0),' MB ',&
' dimensions',HSdim_r,ltop
write(6,'(a20,f12.1,a,a,2i6)') 'allocate al,bl (col)',blocksize*ltop*2.0*2*8.0/(1024.0*1024.0),' MB ',&
' dimensions',blocksize,ltop
write(6,'(a20,f12.1,a,a,3i6)') 'allocate YL',dble((LOMAX+1)*(LOMAX+1)*HSdim_r*multmax*16)/(1024.0*1024.0),' MB ',&
' dimensions',(LOMAX+1)*(LOMAX+1)-1,HSdim_r,multmax
PI = FOUR*ATAN(ONE)
DO I = 1, LMAX - 2
XL(I) = DBLE(I)/DBLE(I+1)
enddo
DTIMPH = ZERO
DTIMUS = ZERO
DTIMLG = ZERO
DTIMMA = ZERO
DTIMS = ZERO
DTIMH = ZERO
DTIMDIST=ZERO
time_albl_tot=zero
DTIMPHw = ZERO
DTIMUSw = ZERO
DTIMLGw = ZERO
DTIMMAw = ZERO
DTIMDISTw=ZERO
time_albl_totw=zero
!
! compute Overlap-matrix
!
CALL START_TIMER(time_loop260)
j=0
DO J_g = 1,NV
#ifdef Parallel
ipr=mod((j_g-1)/blocksize,nprow)
if (ipr.ne.myrowhs) cycle
j=j+1
#else
j=j_g
#endif
kzz_tmp(1:3,j)=KZZ(1:3,J_g)
xk_tmp_r(j)=XK(J_g)
yk_tmp_r(j)=YK(J_g)
zk_tmp_r(j)=ZK(J_g)
rk_tmp(j)=RK(J_g)
enddo
iouter_loop: DO IOUTER = 0, NV/(BLOCKSIZE)
#ifdef Parallel
ipc=mod(iouter,npcol)
if (mycolhs.ne.ipc) cycle
#endif
IMIN = IOUTER*BLOCKSIZE + 1
IMAX = MIN(NV, (IOUTER + 1)*BLOCKSIZE)
!
! precompute Legendre-Polynomials
!
CALL START_TIMER(time_legendre)
IHELP = 0
!DEC$ PARALLEL
legrende: DO I_g = IMIN, IMAX
IHELP = IHELP + 1
#ifdef Parallel
j=0
DO J_g = 1, I_g-1
ipr=mod((j_g-1)/blocksize,nprow)
if (ipr.ne.myrowhs) cycle
j=j+1
enddo
j_end_1(ihelp)=j
ipr=mod((i_g-1)/blocksize,nprow)
if (ipr.eq.myrowhs) j=j+1
j_end(ihelp)=j
#else
j_end(ihelp)=i_g
j_end_1(ihelp)=i_g-1
#endif
kzz_ihelp(1:3,ihelp)=KZZ(1:3,I_g)
xk_tmp_c=xk(i_g)
yk_tmp_c=yk(i_g)
zk_tmp_c=zk(i_g)
rk_tmp1=RK(I_g)
DO J = 1,j_end(ihelp)
X2(J,ihelp) = XK_tmp_c*XK_tmp_r(J) + &
YK_tmp_c*YK_tmp_r(J) + &
ZK_tmp_c*ZK_tmp_r(J)
SLEN(J,ihelp) = SQRT((XK_tmp_r(J)-XK_tmp_c)*(XK_tmp_r(J)-XK_tmp_c)+ &
(YK_tmp_r(J)-YK_tmp_c)*(YK_tmp_r(J)-YK_tmp_c)+ &
(ZK_tmp_r(J)-ZK_tmp_c)*(ZK_tmp_r(J)-ZK_tmp_c))
x(j)=X2(J,ihelp)
den(j)=rk_tmp1*rk_tmp(j)
enddo
DO J = 1,j_end(ihelp)
IF (DEN(J) .GT. TOL) X(J) = X(J)/DEN(J)
PPLX(J) = X(J)*X(J)
PPLY(J) = PPLX(J) - ONE
P(J,0,ihelp) = ONE
P(J,1,ihelp) = X(J)
enddo
DO K = 2, LTOP - 2
DO J = 1, j_end(ihelp)
IF (ABS(X(J)) .LT. UTOP) THEN
P(J,K,ihelp) = PPLX(J) + XL(K-1)*PPLY(J)
PPLX(J) = X(J)*P(J,K,ihelp)
PPLY(J) = PPLX(J) - P(J,K-1,ihelp)
P(J,LTOP-1,ihelp) = PPLX(J) + XL(LTOP-2)*PPLY(J)
ELSE
P(J,K,ihelp) = X(J)*P(J,K-1,ihelp)
ENDIF
enddo
enddo
DO J = 1, j_end(ihelp)
IF (ABS(X(J)) .LT. UTOP) THEN
P(J,LTOP-1,ihelp) = PPLX(J) + XL(LTOP-2)*PPLY(J)
ELSE
P(J,LTOP-1,ihelp) = X(J)*P(J,LTOP-2,ihelp)
ENDIF
enddo
enddo legrende
CALL STOP_TIMER(time_legendre)
DTIMLG = DTIMLG + READ_CPU_TIME(time_legendre)
! DTIMLGw = DTIMLGw + READ_wall_TIME(time_legendre)
!_REAL SPANEL(:,:) = ZERO
!_COMPLEX SPANEL(:,:) = CZERO
!_REAL spanelus(:,:)=ZERO
!_COMPLEX spanelus(:,:)=CZERO
!_REAL HPANEL(:,:) = ZERO
!_COMPLEX HPANEL(:,:) = CZERO
jneq_loop: DO JNEQ = 1, NAT
if (jneq.eq.1) then
INDATM0 = 0
else
indatm0= indatm0+mult(jneq-1)
endif
VIFPR4 = FOUR*PI*VOLUME*R(JNEQ)**4
!
! precompute al(kn) and bl(kn)
!
CALL START_TIMER(time_albl)
call make_albl(jneq,r(jneq),ltop-1,NV,'HAM',imin,imax)
CALL STOP_TIMER(time_albl)
time_albl_tot=time_albl_tot+READ_CPU_TIME(time_albl)
! time_albl_totw=time_albl_totw+READ_wall_TIME(time_albl)
IMIN = IOUTER*BLOCKSIZE + 1
IMAX = MIN(NV, (IOUTER + 1) * BLOCKSIZE)
IHELP = 0
!DEC$ PARALLEL
imin1: DO I_g = IMIN, IMAX
IHELP = IHELP + 1
!
! calculate phase-factors
!
CALL START_TIMER(time_phase)
!_REAL PHASE(:) = ZERO
!_COMPLEX PHASE(:) = CZERO
DO JEQ = 1, MULT(JNEQ)
INDATM = INDATM0 + jeq
ARGXX = POS(1,INDATM)*KZZ_ihelp(1,ihelp)
ARGXX = ARGXX + POS(2,INDATM)*KZZ_ihelp(2,ihelp)
ARGXX = ARGXX + POS(3,INDATM)*KZZ_ihelp(3,ihelp)
DO J = 1,j_end(ihelp)
ARGX = POS(1,INDATM)*KZZ_tmp(1,J) - ARGXX
ARGX = POS(2,INDATM)*KZZ_tmp(2,J) + ARGX
ARGX = POS(3,INDATM)*KZZ_tmp(3,J) + ARGX
help1(j)=TWO*PI*(ARGX)
enddo
!MASS-Lib http://www.rs6000.ibm.com/resource/technology/MASS
!MASS-Lib help_cos(j)-cos(help1(j)), j=1,i
! or vmllib from Intel
!
#if defined (INTEL_VML)
!_REAL call vdcos(j_end(ihelp),help1,help_cos)
! in case of older mk/vml you may miss vzcis (IPO Error: unresolved : vzcis_)
! Either upgrade the libraries (ifort+mkl) or:
! Comment the line below and uncomment the other ones.
! Uncomment also all other lines above where help_tmpcos occurs
!!_COMPLEX call vzcis(j_end(ihelp),help1,help_exp)
!_COMPLEX call vdcos(j_end(ihelp),help1,help_tmpcos)
!_COMPLEX call vdsin(j_end(ihelp),help1,help_tmpsin)
!_COMPLEX do j = 1, j_end(ihelp)
!_COMPLEX help_exp(j) = dcmplx(help_tmpcos(j),help_tmpsin(j))
!_COMPLEX end do
#else
!_REAL call vcos(help_cos,help1,j_end(ihelp))
!_COMPLEX call vcosisin(help_exp,help1,j_end(ihelp))
#endif
do j = 1,j_end(ihelp)
!_REAL PHASE(j) = PHASE(j) + help_cos(j)
!_COMPLEX PHASE(j) = PHASE(j) + help_exp(j)
enddo
enddo
CALL STOP_TIMER(time_phase)
DTIMPH = DTIMPH + READ_CPU_TIME(time_phase)
! DTIMPHw = DTIMPHw + READ_wall_TIME(time_phase)
CALL START_TIMER(time_us)
!
! initialize Overlap-matrixelements with step-function values
!
ins=0 !fixed PB 20.5.08. because alternative (rm case.vns) does not work anymore
IF (INS .EQ. 0) THEN
CALL START_TIMER(time_step_function)
do j=1,j_end_1(ihelp)
help1(j) = slen(j,ihelp)*r(jneq)
tmp_y(j) = help1(j)*help1(j)*help1(j)
end do
!MASS-Lib http://www.rs6000.ibm.com/resource/technology/MASS
!MASS-Lib help_sin(j)=sin(help1(j)), help_cos(j)-cos(help1(j))
#if defined (INTEL_VML)
call vdsincos(j_end_1(ihelp),help1,help_sin,help_cos)
#else
call vsincos(help_sin,help_cos,help1,j_end_1(ihelp))
#endif
!MASS-Lib http://www.rs6000.ibm.com/resource/technology/MASS
!MASS-Lib: tmp_y(j)=1/tmp_y(j), j=1,i-1
#if defined (INTEL_VML)
call vdinv(j_end_1(ihelp), tmp_y, tmp_ytmp)
call dcopy(j_end_1(ihelp), tmp_ytmp, 1, tmp_y, 1)
#else
call vrec(tmp_y,tmp_y,j_end_1(ihelp))
#endif
v3=THREE*V(JNEQ)
DO J = 1, j_end_1(ihelp)
BESR = (help1(j)*help_COS(j)-help_SIN(j))*tmp_y(j)
spanelus(j,ihelp)=spanelus(j,ihelp)+v3*PHASE(j)*BESR
enddo
#ifdef Parallel
ipr=mod((i_g-1)/blocksize,nprow)
if (ipr.eq.myrowhs) then
i=global2local(i_g,nprow,blocksize)
spanelus(i,ihelp)=spanelus(i,ihelp)+ONE/DBLE(NAT) - V(JNEQ)*DBLE(MULT(JNEQ))
endif
#else
spanelus(i_g,ihelp)=spanelus(i_g,ihelp)+ONE/DBLE(NAT) - V(JNEQ)*DBLE(MULT(JNEQ))
#endif
CALL STOP_TIMER(time_step_function)
DTIMS = DTIMS + READ_CPU_TIME(time_step_function)
CALL START_TIMER(time_h)
if (jneq.eq.nat) then
j=0
DO J = 1, j_end(ihelp)
SPANEL(J,IHELP) = SPANEL(J,IHELP)+spanelus(j,ihelp)
HPANEL(J,IHELP) = HPANEL(J,IHELP)+X2(j,ihelp)*spanelus(j,ihelp)
HPANEL(J,IHELP) = HPANEL(J,IHELP) + &
WARPIN(KZZ_tmp(1,J)-KZZ_ihelp(1,ihelp), &
KZZ_tmp(2,J)-KZZ_ihelp(2,ihelp), &
KZZ_tmp(3,J)-KZZ_ihelp(3,ihelp))
enddo
endif
CALL STOP_TIMER(time_h)
DTIMH = DTIMH + READ_CPU_TIME(time_h)
ELSE
if (jneq.eq.nat) then
j=0
DO J_g = 1, I_g
#ifdef Parallel
ipr=mod((j_g-1)/blocksize,nprow)
if (ipr.ne.myrowhs) cycle
#endif
j=j+1
FK(1) = XK(J_g) - XK(I_g)
FK(2) = YK(J_g) - YK(I_g)
FK(3) = ZK(J_g) - ZK(I_g)
SPANEL(J,IHELP) = USTPHX(FK,PHASE,j,NAT,V,R,MULT)
HPANEL(J,IHELP) = X2(j,ihelp)*SPANEL(J,IHELP)
enddo
endif
ENDIF
!
CALL STOP_TIMER(time_us)
DTIMUS = DTIMUS + READ_CPU_TIME(time_us)
! DTIMUSw = DTIMUSw + READ_wall_TIME(time_us)
CALL START_TIMER(time_overlap)
!
! calculate Overlap-matrixelement
!
TMP1(:) = ZERO
TMP2(:) = ZERO
DO L = 0, LTOP - 1
if (.not.lapw(l,jneq)) then
ATMP = AL_c(ihelp,L)
atmp1 = ATMP*U(L,JNEQ)*DU(L,JNEQ)*R(JNEQ)**2
DLL1 = DBLE(L+L+1)
DO J = 1, j_end(ihelp)
TMP3 = DLL1*P(J,L,ihelp)
STMP = ATMP*AL_r(J,L)
STMP = STMP*TMP3
TMP2(J) = TMP2(J) + STMP
HTMP = STMP*E(L,JNEQ)
TMP1(J) = TMP1(J) + HTMP
! surface part of kinetic energy
tmp1(j) = tmp1(j)+TMP3*ATMP1*AL_r(J,L)
end do
else
!....for lapw all BL lines must be kept
ATMP = AL_c(ihelp,L)
BTMP = BL_c(ihelp,L)*CHN(L,JNEQ)
atmp1 = R(JNEQ)**2* (ATMP*U(L,JNEQ)*DU(L,JNEQ) + &
BL_c(ihelp,L)*UP(L,JNEQ)*DU(L,JNEQ))
btmp1 = R(JNEQ)**2*(BL_c(Ihelp,L)*UP(L,JNEQ)*DUP(L,JNEQ) + &
AL_c(ihelp,L)*U(L,JNEQ)*DUp(L,JNEQ))
DLL1 = DBLE(L+L+1)
DO J = 1, j_end(ihelp)
TMP3 = DLL1*P(J,L,ihelp)
STMP = ATMP*AL_r(J,L)
STMP = STMP + BTMP*BL_r(J,L)
STMP = STMP*TMP3
TMP2(J) = TMP2(J) + STMP
HTMP = ATMP*BL_r(J,L)
HTMP = HTMP*TMP3
HTMP = HTMP + STMP*E(L,JNEQ)
TMP1(J) = TMP1(J) + HTMP
! surface part of kinetic energy
tmp1(j) = tmp1(j)+&
TMP3*(ATMP1*AL_r(J,L)+Btmp1*BL_r(J,L))
end do
endif
enddo
DO J = 1, j_end(ihelp)
TMP1(J) = TMP1(J)*VIFPR4
TMP2(J) = TMP2(J)*VIFPR4
SPANEL(J,IHELP) = SPANEL(J,IHELP) + PHASE(J)*TMP2(J)
HPANEL(J,IHELP) = HPANEL(J,IHELP) + PHASE(J)*TMP1(J)
enddo
CALL STOP_TIMER(time_overlap)
DTIMMA = DTIMMA + READ_CPU_TIME(time_overlap)
! DTIMMAw = DTIMMAw + READ_wall_TIME(time_overlap)
END DO imin1
enddo jneq_loop
CALL start_TIMER(time_distiouter)
IHELP = 0
IMIN = IOUTER*BLOCKSIZE + 1
IMAX = MIN(NV, (IOUTER + 1)* BLOCKSIZE)
DO I_g = IMIN, IMAX
IHELP = IHELP + 1
#ifdef Parallel
i=global2local(i_g,npcol,blocksize)
!_REAL CALL DCOPY(HSdim_r, HPANEL(1,IHELP), 1, H(1,I), 1 )
!_REAL CALL DCOPY(HSdim_r, SPANEL(1,IHELP), 1, S(1,I), 1 )
!_COMPLEX CALL ZCOPY(HSdim_r, HPANEL(1,IHELP), 1, H(1,I), 1 )
!_COMPLEX CALL ZCOPY(HSdim_r, SPANEL(1,IHELP), 1, S(1,I), 1 )
#else
!_REAL CALL DCOPY(I_g-1, HPANEL(1,IHELP), 1, HS(I_g, 1), HSrows )
!_REAL CALL DCOPY(I_g, SPANEL(1,IHELP), 1, HS(1,I_g), 1 )
!_COMPLEX CALL ZCOPY(I_g-1, HPANEL(1,IHELP), 1, HS(I_g, 1), HSrows )
!_COMPLEX CALL ZCOPY(I_g, SPANEL(1,IHELP), 1, HS(1,I_g), 1 )
HSDIAG(I_g)=HPANEL(I_g,IHELP)
#endif
END DO
CALL STOP_TIMER(time_distiouter)
DTIMDIST = DTIMDIST + READ_CPU_TIME(time_distiouter)
! DTIMDISTw = DTIMDISTw + READ_wall_TIME(time_distiouter)
END DO iouter_loop
CALL STOP_TIMER(time_loop260)
call end_albl('HAM')
deallocate( PHASE )
deallocate( spanelus )
!_COMPLEX deallocate( help_exp )
deallocate( DEN, DFJ, FJ, FK, rk_tmp )
deallocate( PPLX, PPLY )
deallocate( SLEN, TMP1,TMP2)
deallocate( X, XL, x2 )
deallocate( P )
deallocate( help1, help_cos, help_sin )
#if defined (INTEL_VML)
!!_COMPLEX deallocate( help_tmpcos, help_tmpsin )
deallocate( tmp_ytmp )
#endif
deallocate( tmp_y )
deallocate( kzz_TMP,kzz_ihelp )
deallocate( j_end, j_end_1)
deallocate( xk_tmp_r,yk_tmp_r,zk_tmp_r)
WRITE (6,'(a,2f12.1)') 'Time for al,bl (hamilt, cpu/wall) : ', time_albl_tot, time_albl_totw
WRITE (6,'(a,2f12.1)') 'Time for legendre (hamilt, cpu/wall) : ', DTIMLG, DTIMLGw
WRITE (6,'(a,2f12.1)') 'Time for phase (hamilt, cpu/wall) : ', DTIMPH, DTIMPHw
WRITE (6,'(a,2f12.1)') 'Time for us (hamilt, cpu/wall) : ', DTIMUS, DTIMUSw
WRITE (6,'(a,2f12.1)') 'Time for overlaps (hamilt, cpu/wall) : ', DTIMMA, DTIMMAw
WRITE (6,'(a,2f12.1)') 'Time for distrib (hamilt, cpu/wall) : ', DTIMDIST, DTIMDISTw
! WRITE (6,'(a,2f12.1)') 'Time sum iouter (hamilt, cpu/wall) : ', &
! READ_CPU_TIME(time_loop260), READ_wall_TIME(time_loop260)
CALL BARRIER
WRITE (6,'(a,i8)') ' number of local orbitals, nlo (hamilt) ', NLO
IF (NLO .NE. 0) THEN
allocate( PHSC(HSrows) )
allocate( YL(0:(LOMAX+1)*(LOMAX+1)-1,HSrows,multmax) )
write(6,'(a20,f12.1,a,a,3i6)') 'allocate YL ',dble((LOMAX+1)*(LOMAX+1)*HSrows*multmax*16)/(1024.0*1024.0),' MB ',&
' dimensions',(LOMAX+1)*(LOMAX+1)-1,HSrows,multmax
write(6,'(a20,f12.1,a,a,i6)') 'allocate phsc',dble(HSrows*16)/(1024.0*1024.0),' MB ',&
' dimensions',HSrows
CALL init_albl(lomax,'HAM')
!
!------ calculate LO-Overlap-matrixelements ---------------------------
!
CALL START_TIMER(time_lo)
INDATM = 0
IATM = 0
NNLO = 0
!
! initialize matrixelements
!
!_REAL HPANEL(1:HSdim_r, 1:BLOCKSIZE)=ZERO
!_REAL SPANEL(1:HSdim_r, 1:BLOCKSIZE)=ZERO
!_COMPLEX HPANEL(1:HSdim_r, 1:BLOCKSIZE)=CZERO
!_COMPLEX SPANEL(1:HSdim_r, 1:BLOCKSIZE)=CZERO
I_g = NV + 1
IMIN = I_g
IMAX = MIN(NV + NLO,((IMIN-1)/(BLOCKSIZE*NPCOL)+1)*BLOCKSIZE*NPCOL)
jneq_lo_loop: DO JNEQ = 1, NAT
!
! precompute al(kn) and bl(kn)
!
call make_albl(jneq,r(jneq),lomax,NV,'HAM',1,0)
DO JEQ = 1, MULT(JNEQ)
IATM = IATM + 1
!
! precalculate spherical-harmonics for Alm, Blm
!
DO J_g = 1, NV
#ifdef Parallel
ipr=mod((j_g-1)/blocksize,nprow)
if (ipr.ne.myrowhs) cycle
#endif
VEC(1) = XK(J_g)
VEC(2) = YK(J_g)
VEC(3) = ZK(J_g)
CALL ROTATE(VEC,ROTIJ(1,1,IATM),ROTV1)
CALL ROTATE(ROTV1,ROTLOC(1,1,JNEQ),ROTV2)
CALL YLM(ROTV2,LOMAX,YL(0,J_g,JEQ))
enddo
enddo
PI4R2V = FOUR*PI*SQRT(VOLUME)*R(JNEQ)*R(JNEQ)
CLR = ONE
CLI = ZERO
l_loop: DO L = 0, LOMAX
jlo_loop: do jlo=1,ilo(l,jneq)
! precalculate factors of a,b,c for LO's
! C1,C2,C3 are needed for the Overlap-matrix update
! C11,C12,C13 are needed for the Hamilton-matrix update
!
C1R = ALO(L,jlo,JNEQ) + PI12LO(L,JNEQ)*CLO(L,jlo,JNEQ)
C1I = ZERO
C2R = BLO(L,jlo,JNEQ)*CHN(L,JNEQ) + &
PE12LO(L,JNEQ)*CLO(L,jlo,JNEQ)
C2I = ZERO
C11R = ALO(L,jlo,JNEQ)*E(L,JNEQ) + HALF*(BLO(L,jlo,JNEQ) + &
CLO(L,jlo,JNEQ)*PI12LO(L,JNEQ)*(ELO(L,jlo,JNEQ) + &
E(L,JNEQ)))
C11I = ZERO
C12R = BLO(L,jlo,JNEQ)*CHN(L,JNEQ)*E(L,JNEQ) + &
HALF*(ALO(L,jlo,JNEQ) + CLO(L,jlo,JNEQ)* &
PI12LO(L,JNEQ) + CLO(L,jlo,JNEQ)*PE12LO(L,JNEQ)* &
(ELO(L,jlo,JNEQ) + E(L,JNEQ)))
C12I = ZERO
C3R = CLO(L,jlo,JNEQ) + &
PE12LO(L,JNEQ)*BLO(L,jlo,JNEQ) + &
ALO(L,jlo,JNEQ)*PI12LO(L,JNEQ)
C3I = ZERO
C13R = CLO(L,jlo,JNEQ)*ELO(L,jlo,JNEQ)+ &
HALF*(BLO(L,jlo,JNEQ)*PI12LO(L,JNEQ) + &
(ELO(L,jlo,JNEQ) + E(L,JNEQ))*(ALO(L,jlo,JNEQ)* &
PI12LO(L,JNEQ)+PE12LO(L,JNEQ)*BLO(L,jlo,JNEQ)))
C13I = ZERO
!
! Precalculate surface kinetic energy factor
!
akinloR=HALF*r(jneq)**2*(alo(l,jlo,jneq)*DU(L,JNEQ)+ &
blo(l,jlo,jneq)*DUP(L,JNEQ)+ &
CLO(l,jlo,jneq)*DPLO(l,jneq))
jeq0_loop: DO JEQO = 1, MULT(JNEQ)
mo1_loop: DO MO1 = -L, L
NNLO = NNLO + 1
!
! compute only local columns
!
IHELP=MOD(I_g,BLOCKSIZE)
IF (IHELP.EQ.0) IHELP=BLOCKSIZE
jeq_loop: DO JEQ = 1, MULT(JNEQ)
INDATM = INDATM + 1
!
! calculate spherical-harmonics for Alm, Blm
!
VEC(1) = XK(NV+NNLO)
VEC(2) = YK(NV+NNLO)
VEC(3) = ZK(NV+NNLO)
CALL ROTATE(VEC,ROTIJ(1,1,INDATM),ROTV1)
CALL ROTATE(ROTV1,ROTLOC(1,1,JNEQ),ROTV2)
jnnlo=NV+NNLO
CALL YLM(ROTV2,L,YL(0,jnnlo,JEQ))
#ifdef Parallel
ipc=mod((i_g-1)/blocksize,npcol)
if (ipc.eq.mycolhs) then
#endif
!
! determine phase factor
!
DO J_g = 1, NV + NNLO
#ifdef Parallel
ipr=mod((j_g-1)/blocksize,nprow)
if (ipr.ne.myrowhs) cycle
#endif
ARGX = POS(1,INDATM)*(KZZ(1,NV+NNLO)-KZZ(1,J_g))
ARGY = POS(2,INDATM)*(KZZ(2,NV+NNLO)-KZZ(2,J_g))
ARGZ = POS(3,INDATM)*(KZZ(3,NV+NNLO)-KZZ(3,J_g))
ARGX=(ARGX+ARGY+ARGZ)*TWO*PI
PHSC(J_g) = DCMPLX(DCOS(ARGX),-DSIN(ARGX) )
! PHSC(J_g) = EXP(CMPLX(ZERO,TWO)*PI*(ARGX+ARGY+ARGZ))
! PHSC(J_g) = DCONJG(PHSC(J_g))
enddo
mo2_loop: DO MO2 = -L, L
L0M0 = L*(L+1) + MO2
!
! CLYLN = (CIMAG**L)*DCONJG(YL(L0M0,NV+NNLO,JEQ))
!
CLYLNR = CLR*DBLE(YL(L0M0,jNNLO,JEQ)) + &
CLI*DIMAG(YL(L0M0,jNNLO,JEQ))
CLYLNI = CLI*DBLE(YL(L0M0,jNNLO,JEQ)) - &
CLR*DIMAG(YL(L0M0,jNNLO,JEQ))
!
! update C1,C2,C3 for the Overlap-matrix
! C11,C12,C13 for the Hamilton-matrix
!
C1RUP = C1R*CLYLNR*PI4R2V
C1IUP = C1R*CLYLNI*PI4R2V
C2RUP = C2R*CLYLNR*PI4R2V
C2IUP = C2R*CLYLNI*PI4R2V
C3RUP = C3R*CLYLNR*PI4R2V
C3IUP = C3R*CLYLNI*PI4R2V
C11RUP = C11R*CLYLNR*PI4R2V
C11IUP = C11R*CLYLNI*PI4R2V
C12RUP = C12R*CLYLNR*PI4R2V
C12IUP = C12R*CLYLNI*PI4R2V
C13RUP = C13R*CLYLNR*PI4R2V
C13IUP = C13R*CLYLNI*PI4R2V
akinlorup = akinloR*CLYLNR*PI4R2V
akinloiup = akinloR*CLYLNI*PI4R2V
j=0
DO J_g = 1, NV
#ifdef Parallel
ipr=mod((j_g-1)/blocksize,nprow)
if (ipr.ne.myrowhs) cycle
#endif
j=j+1
!
! CLYL = (CIMAG**L)*DCONJG(YL(L0M0,J,JEQ))
!
CLYLR = CLR*DBLE(YL(L0M0,J_g,JEQ)) + &
CLI*DIMAG(YL(L0M0,J_g,JEQ))
CLYLI = CLI*DBLE(YL(L0M0,J_g,JEQ)) - &
CLR*DIMAG(YL(L0M0,J_g,JEQ))
!
! C6 ... Alm, C7 ... Blm
!
C6R = CLYLR*AL_r(J,L)*PI4R2V
C6I = CLYLI*AL_r(J,L)*PI4R2V
C7R = CLYLR*BL_r(J,L)*PI4R2V
C7I = CLYLI*BL_r(J,L)*PI4R2V
!
! CAB = DCONJG(C6)*C1 + DCONJG(C7)*C2
!
CABR = C6R*C1RUP + C7R*C2RUP + &
C6I*C1IUP + C7I*C2IUP
CABI = -C6I*C1RUP - C7I*C2RUP + &
C6R*C1IUP + C7R*C2IUP
!
! update Overlap-matrix
!
SPANEL(J,IHELP) = SPANEL(J,IHELP) + &
(CABR*DBLE(PHSC(J_g)) - CABI*DIMAG(PHSC(J_g)))
!_COMPLEX SPANEL(J,IHELP) = SPANEL(J,IHELP) + &
!_COMPLEX CIMAG * (CABR*DIMAG(PHSC(J_g)) + &
!_COMPLEX CABI*DBLE(PHSC(J_g)))
!
! CAB = DCONJG(C6)*C11 + DCONJG(C7)*C12
!
CABR = C6R*C11RUP + C7R*C12RUP + &
C6I*C11IUP + C7I*C12IUP
CABI = -C6I*C11RUP - C7I*C12RUP + &
C6R*C11IUP + C7R*C12IUP
akinbr = (c6r*u(l,jneq)+c7r*up(l,jneq)) * akinlorup &
+(c6i*u(l,jneq)+c7i*up(l,jneq)) * akinloiup
akinbi = (c6r*u(l,jneq)+c7r*up(l,jneq)) * akinloiup &
-(c6i*u(l,jneq)+c7i*up(l,jneq)) * akinlorup
!
! update Hamilton-matrix
!
HPANEL(J,IHELP) = HPANEL(J,IHELP) + &
(CABR*DBLE(PHSC(J_g)) - CABI*DIMAG(PHSC(J_g)))
!_COMPLEX HPANEL(J,IHELP) = HPANEL(J,IHELP) + &
!_COMPLEX CIMAG *(CABR*DIMAG(PHSC(J_g)) + &
!_COMPLEX CABI*DBLE(PHSC(J_g)))
HPANEL(J,IHELP) = HPANEL(J,IHELP) + &
(akinbr*DBLE(PHSC(J_g)) - &
akinbi*DIMAG(PHSC(J_g)))
!_COMPLEX HPANEL(J,IHELP) = HPANEL(J,IHELP) + &
!_COMPLEX CIMAG * (akinbr*DIMAG(PHSC(J_g)) + &
!_COMPLEX akinbi*DBLE(PHSC(J_g)))
enddo
jlop_loop: do jlop=1,jlo
DO J_g = NV + NNLO - (MO1+L) - &
(2*L+1)*(JEQO-1) - (jlo-jlop)*(2*l+1)*mult(jneq), NV + NNLO &
-(jlo-jlop)*(MO1+L+1)-(jlo-jlop)*(JEQO-1)*(2*l+1)
#ifdef Parallel
ipr=mod((j_g-1)/blocksize,nprow)
if (ipr.ne.myrowhs) cycle
j=global2local(j_g,nprow,blocksize)
#else
j=j_g
#endif
!
! CLYL = (CIMAG**L)*DCONJG(YL(L0M0,J,JEQ))
!
CLYLR = CLR*DBLE(YL(L0M0,J_g,JEQ)) + &
CLI*DIMAG(YL(L0M0,J_g,JEQ))
CLYLI = CLI*DBLE(YL(L0M0,J_g,JEQ)) - &
CLR*DIMAG(YL(L0M0,J_g,JEQ))
!
! C6 ... CONJG(ALOlm), C7 ... CONJG(BLOlm), C8 ... CONJG(CLOlm)
!
C6R = CLYLR*ALO(L,jlop,JNEQ)*PI4R2V
C6I = CLYLI*ALO(L,jlop,JNEQ)*PI4R2V
C7R = CLYLR*BLO(L,jlop,JNEQ)*PI4R2V
C7I = CLYLI*BLO(L,jlop,JNEQ)*PI4R2V
C8R = CLYLR*CLO(L,jlop,JNEQ)*PI4R2V
C8I = CLYLI*CLO(L,jlop,JNEQ)*PI4R2V
!
! CABC = DCONJG(C6)*C1 + DCONJG(C7)*C2 + DCONJG(C8)*C3
!
CABCR = C6R*C1RUP + C6I*C1IUP + &
C7R*C2RUP + C7I*C2IUP + &
C8R*C3RUP + C8I*C3IUP
CABCI = -C6I*C1RUP + C6R*C1IUP - &
C7I*C2RUP + C7R*C2IUP - &
C8I*C3RUP + C8R*C3IUP
!
! update Overlap-matrix
!
SPANEL(J,IHELP) = SPANEL(J,IHELP) + &
(CABCR*DBLE(PHSC(J_g)) - &
CABCI*DIMAG(PHSC(J_g)))
!_COMPLEX SPANEL(J,IHELP) = SPANEL(J,IHELP) + &
!_COMPLEX CIMAG * (CABCR*DIMAG(PHSC(J_g)) + &
!_COMPLEX CABCI*DBLE(PHSC(J_g)))
!
! CABC = DCONJG(C6)*C11 + DCONJG(C7)*C12 + DCONJG(C8)*C13
!
CABCR = C6R*C11RUP + C6I*C11IUP + &
C7R*C12RUP + C7I*C12IUP + &
C8R*C13RUP + C8I*C13IUP
CABCI = -C6I*C11RUP + C6R*C11IUP - &
C7I*C12RUP + C7R*C12IUP - &
C8I*C13RUP + C8R*C13IUP
!
! update Hamilton-matrix
!
HPANEL(J,IHELP) = HPANEL(J,IHELP) + &
(CABCR*DBLE(PHSC(J_g)) - &
CABCI*DIMAG(PHSC(J_g)))
!_COMPLEX HPANEL(J,IHELP) = HPANEL(J,IHELP) + &
!_COMPLEX CIMAG * (CABCR*DIMAG(PHSC(J_g)) + &
!_COMPLEX CABCI*DBLE(PHSC(J_g)))
enddo
enddo jlop_loop
enddo mo2_loop
#ifdef Parallel
endif
#endif
enddo jeq_loop
INDATM = INDATM - MULT(JNEQ)
IF(I_g.EQ.IMAX) THEN
ii=0
DO II_g = IMIN, IMAX
#ifdef Parallel
ipc=mod((ii_g-1)/blocksize,npcol)
if (ipc.ne.mycolhs) cycle
ii=global2local(ii_g,npcol,blocksize)
IHELP = MOD(II_g,BLOCKSIZE)
IF(IHELP.EQ.0) IHELP=BLOCKSIZE
!_REAL CALL DCOPY(HSdim_r, HPANEL(1,IHELP), 1, H(1,II), 1 )
!_REAL CALL DCOPY(HSdim_r, SPANEL(1,IHELP), 1, S(1,II), 1 )
!_COMPLEX CALL ZCOPY(HSdim_r, HPANEL(1,IHELP), 1, H(1,II), 1 )
!_COMPLEX CALL ZCOPY(HSdim_r, SPANEL(1,IHELP), 1, S(1,II), 1 )
#else
ii=ii+1
ii=ii_g
IHELP = MOD(II,BLOCKSIZE)
IF(IHELP.EQ.0) IHELP=BLOCKSIZE
!_REAL CALL DCOPY(II-1, HPANEL(1,IHELP), 1, HS(II, 1), HSROWS )
!_REAL CALL DCOPY(II, SPANEL(1,IHELP), 1, HS(1,II), 1 )
!_COMPLEX CALL ZCOPY(II-1, HPANEL(1,IHELP), 1, HS(II, 1), HSROWS )
!_COMPLEX CALL ZCOPY(II, SPANEL(1,IHELP), 1, HS(1,II), 1 )
HSDIAG(II)=HPANEL(II,IHELP)
#endif
END DO
!_REAL HPANEL(:,:)=ZERO
!_REAL SPANEL(:,:)=ZERO
!_COMPLEX HPANEL(:,:)=CZERO
!_COMPLEX SPANEL(:,:)=CZERO
IMIN = IMAX+1
IMAX = MIN(NV + NLO,((IMIN-1)/(BLOCKSIZE*NPCOL)+1)*BLOCKSIZE*NPCOL)
END IF
I_g = I_g + 1
enddo mo1_loop
enddo jeq0_loop
enddo jlo_loop
enddo l_loop
TMP = CLR
CLR = -CLI
CLI = TMP
INDATM = INDATM + MULT(JNEQ)
enddo jneq_lo_loop
nevertrue: if(imax.gt.imin) THEN
ii=0
DO II_g = IMIN, IMAX
#ifdef Parallel
ipc=mod((ii_g-1)/blocksize,npcol)
if (ipc.ne.mycolhs) cycle
ii=global2local(ii_g,npcol,blocksize)
IHELP = MOD(II_g,BLOCKSIZE)
IF(IHELP.EQ.0) IHELP=BLOCKSIZE
!_REAL CALL DCOPY(HSdim_r, HPANEL(1,IHELP), 1, H(1,II), 1 )
!_REAL CALL DCOPY(HSdim_r, SPANEL(1,IHELP), 1, S(1,II), 1 )
!_COMPLEX CALL ZCOPY(HSdim_r, HPANEL(1,IHELP), 1, H(1,II), 1 )
!_COMPLEX CALL ZCOPY(HSdim_r, SPANEL(1,IHELP), 1, S(1,II), 1 )
#else
ii=ii+1
ii=ii_g
IHELP = MOD(II,BLOCKSIZE)
IF(IHELP.EQ.0) IHELP=BLOCKSIZE
!_REAL CALL DCOPY(II-1, HPANEL(1,IHELP), 1, HS(II, 1), HSROWS )
!_REAL CALL DCOPY(II, SPANEL(1,IHELP), 1, HS(1,II), 1 )
!_COMPLEX CALL ZCOPY(II-1, HPANEL(1,IHELP), 1, HS(II, 1), HSROWS )
!_COMPLEX CALL ZCOPY(II, SPANEL(1,IHELP), 1, HS(1,II), 1 )
HSDIAG(II)=HPANEL(II,IHELP)
#endif
END DO
END IF nevertrue
CALL STOP_TIMER(time_lo)
! WRITE (6,'(a,2f12.1)') 'Time for los (hamilt, cpu/wall) : ', &
! READ_CPU_TIME(time_lo),READ_wall_TIME(time_lo)
deallocate( PHSC )
deallocate( YL )
call end_albl('HAM')
ENDIF
deallocate( hpanel,spanel)
deallocate(ROTV2,ROTV1,VEC)
CALL BARRIER
RETURN
!
! End of 'HAMILT'
!
END
More information about the Wien
mailing list