[Wien] Wien post from pascal.boulet at univ-amu.fr (Errors with DGEMM and hybrid calculations)
pboulet
pascal.boulet at univ-amu.fr
Sat Aug 15 14:22:57 CEST 2020
Hi Fabien,
Mg2Si is a small structure with 2 inequivalent atomic positions. Here is the struct file:
Mg2Si cubic 225 Fm-3m
F LATTICE,NONEQUIV.ATOMS 2 225 Fm-3m
MODE OF CALC=RELA unit=bohr
11.999761 11.999761 11.999761 90.000000 90.000000 90.000000
ATOM 1: X=0.00000000 Y=0.00000000 Z=0.00000000
MULT= 1 ISPLIT= 2
Si NPT= 781 R0=.000050000 RMT= 2.39 Z: 14.00000
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
ATOM 2: X=0.25000000 Y=0.25000000 Z=0.25000000
MULT= 2 ISPLIT= 2
2: X=0.25000000 Y=0.25000000 Z=0.75000000
Mg NPT= 781 R0=.000050000 RMT= 2.50000 Z: 12.00000
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
48 NUMBER OF SYMMETRY OPERATIONS
The number of k-points is 72 (from a 12x12x12 mesh), RmtKmax is 7, and GMAX=12 (for test purposes; usually I set it to 24).
Best,
Pascal
Pascal Boulet
—
Professor in computational materials - DEPARTMENT OF CHEMISTRY
University of Aix-Marseille - Avenue Escadrille Normandie Niemen - F-13013 Marseille - FRANCE
Tel: +33 (0)4 13 55 18 10 - Fax: +33 (0)4 13 55 18 50
Email : pascal.boulet at univ-amu.fr
> On 15 August 2020 at 11:40, Tran, Fabien <fabien.tran at tuwien.ac.at> wrote:
>
> Hi,
>
> For calculations on small cells with many k-points it is preferable (for speed) to use k-point parallelization instead of MPI parallelization. And, as mentioned by PB, MPI applied to small matrices may not work.
>
> Of course, if the number of cores at your disposal is larger (twice as large, for instance) than the number of k-points in the IBZ, then you can combine the k-point and MPI parallelizations (two cores for each k-point); a sketch of such a file is given below, after the pure k-point example.
>
> An example of a .machines file for k-point parallelization (supposing that you have 6 k-points in the IBZ and want to use one machine with 6 cores) is:
>
> lapw0: n1071 n1071 n1071 n1071 n1071 n1071
> dstart: n1071 n1071 n1071 n1071 n1071 n1071
> nlvdw: n1071 n1071 n1071 n1071 n1071 n1071
> 1:1071
> 1:1071
> 1:1071
> 1:1071
> 1:1071
> 1:1071
> granularity:1
> extrafine:1
>
> The six lines "1:1071" mean that lapw1, lapw2 and hf are k-point (not MPI) parallelized (one line for each core). lapw0, dstart and nlvdw are MPI parallelized. In this example, the OpenMP parallelization is ignored by supposing that OMP_NUM_THREADS is set to 1.
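>
> For the combined case mentioned above, a minimal sketch (assuming, purely for illustration, that the same host n1071 now provides 12 cores, i.e. 2 MPI cores per k-point job, and keeping the lapw0/dstart/nlvdw lines as above) could look like:
>
> lapw0: n1071 n1071 n1071 n1071 n1071 n1071
> dstart: n1071 n1071 n1071 n1071 n1071 n1071
> nlvdw: n1071 n1071 n1071 n1071 n1071 n1071
> 1:n1071:2
> 1:n1071:2
> 1:n1071:2
> 1:n1071:2
> 1:n1071:2
> 1:n1071:2
> granularity:1
> extrafine:1
>
> Each "1:n1071:2" line then launches one k-point job running with 2 MPI cores, so the six lines again cover the 6 k-points of the IBZ.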
>
> Besides, I find your computational time with HF to be very large. What is the number of atoms in the cell, the number of k-points (please specify the n1xn2xn3 k-mesh), RKmax, etc.?
>
> Best,
> FT
>
> From: Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of pboulet <pascal.boulet at univ-amu.fr>
> Sent: Saturday, August 15, 2020 10:35 AM
> To: A Mailing list for WIEN2k users
> Subject: Re: [Wien] Wien post from pascal.boulet at univ-amu.fr (Errors with DGEMM and hybrid calculations)
>
> Dear Peter,
>
> Thank you for your response. It clarifies some points for me.
>
> I have run another calculation for Mg2Si, which is a small system, on 12 cores (no MKL errors!). The job ran for 12 hours (the CPU time limit I set) and completed only 3 SCF cycles without converging.
>
> The .machines file I use looks like this:
>
> 1:1071:12
> lapw0: n1071 n1071
> dstart: n1071 n1071
> nlvdw: n1071 n1071
> granularity:1
> extrafine:1
>
> I guess I am not optimising the number of cores with respect to the size of the problem (72 k-points, 14 HF bands: 12 occupied + 2 unoccupied).
>
> I changed the number of processors to 72, hoping for a 1 k-point/core parallelisation, and commented out all the lines of .machines except granularity and extrafine. I got less than 1 cycle in 12 hours.
>
> What should I do to run the HF part with k-point parallelisation only (no MPI)? This point is not clear to me from the manual.
>
> Thank you
> Best regards
> Pascal
>
> Pascal Boulet
> —
> Professor in computational materials - DEPARTMENT OF CHEMISTRY
>
> University of Aix-Marseille - Avenue Escadrille Normandie Niemen - F-13013 Marseille - FRANCE
> Tel: +33 (0)4 13 55 18 10 - Fax: +33 (0)4 13 55 18 50
> Email : pascal.boulet at univ-amu.fr
>
> On 12 August 2020 at 14:12, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>
> Your message is too big to be accepted.
>
> Anyway, the DGEMM messages seem to be a relic of the MKL you are using, and are most likely related to the use of too many MPI cores for such a small matrix. At least when I continue your Mg2Si calculations (in k-parallel mode) the :DIS and :ENE are continuous, which means that the previous results are ok.
>
> Concerning hf, I don't know. Again, running this in sequential (k-point parallel) mode causes no problems and converges quickly.
>
> I suggest that you change your setup to a k-parallel run for such small systems.
>
> Best regards
> Peter Blaha
>
> ---------------------------------------------------------------------
> Subject: Errors with DGEMM and hybrid calculations
> From: pboulet <pascal.boulet at univ-amu.fr>
> Date: 8/11/20, 6:31 PM
> To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>
> Dear all,
>
> I have a strange problem with LAPACK. I get an error message about wrong parameters sent to DGEMM, but WIEN2k (19.2) still seems to converge the SCF. Is that possible? What could be the "problem"?
>
> I have attached an archive containing the summary of the SCF + compilation options + SLURM output file. The error message is in the dayfile.
> The same error shows up with Wien2k 18.1.
>
> Actually, this is a test case for hybrid calculations, as I have problems with my real system, which is found to be metallic with PBE. At least Mg2Si is a small-band-gap semiconductor.
>
> When I go ahead with the HSE06 functional and Mg2Si I get a different error: a segmentation fault during the hf run. As Mg2Si is a small system, I guess this is not a memory problem: the node has 128 GB.
> Note that the same segmentation fault occurs for my real case, but there the LAPACK problem does not occur.
>
> The second archive contains some files related to the hybrid calculation.
>
> Some hints would be welcome, as I am completely lost with these (unrelated) errors!
>
> Thank you.
> Best regards,
> Pascal
>
>
> Pascal Boulet
> --
>
> P.Blaha
> --------------------------------------------------------------------------
> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
> Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
> WWW: http://www.imc.tuwien.ac.at/TC_Blaha
> --------------------------------------------------------------------------
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html