[Wien] Errors with DGEMM and hybrid calculations (reply to pascal.boulet at univ-amu.fr)
Peter Blaha
pblaha at theochem.tuwien.ac.at
Sat Aug 15 21:02:48 CEST 2020
Just to give you a clue:
For your Mg2Si case I used only 8 k-points in the IBZ, and then a
k-parallel job on 8 nodes with OMP_NUM_THREADS=2 (16 cores in total):
lapw1 and lapw2: a few seconds per k-point altogether.
hf: 20 seconds per k-point.
If I instead use only 4 k-parallel jobs, but each with MPI on 8 cores
(32 cores in total), I get:
lapw1 and lapw2 times are a bit worse (e.g. 5 instead of 3 seconds,
because MPI is not as efficient as k-parallelization).
hf: 13 seconds for 2 k-points.
In essence, if you manage to distribute the k-points well, an SCF
cycle with hf should run within 30 seconds or so.
I don't quite understand your "hours" of CPU time, even when using a
better k-mesh.
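For reference, a .machines file for the k-parallel-only setup described above (8 k-parallel jobs, no MPI, 2 OpenMP threads each) might look roughly like the sketch below. The hostnames are placeholders, and the omp_global switch is only available in recent WIEN2k versions; with older versions, OMP_NUM_THREADS can be exported in the job script instead.

```
# One line per k-parallel job: speed:hostname
# (no trailing :N, so no MPI is used for that job)
1:node1
1:node2
1:node3
1:node4
1:node5
1:node6
1:node7
1:node8
# Serial/sequential programs run on the first node
lapw0: node1
granularity:1
extrafine:1
# OpenMP threads per job (recent WIEN2k versions)
omp_global:2
```

With such a file, lapw1, lapw2 and the hf step are distributed over the k-points only, which is what Peter recommends for small systems.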
Am 15.08.2020 um 10:35 schrieb pboulet:
> Dear Peter,
>
> Thank you for your response. It clarifies some points for me.
>
> I have run another calculation for Mg2Si, which is a small system, on 12
> cores (no MKL errors!). The job ran for 12 hours (CPU time limit I set)
> and made only 3 SCF cycles without converging.
>
> The .machines file I use looks like this:
> 1:1071:12
> lapw0: n1071 n1071
> dstart: n1071 n1071
> nlvdw: n1071 n1071
> granularity:1
> extrafine:1
>
> I guess I am not optimising the number of cores w.r.t. the size of the
> problem (72 k-points, 14 HF bands, 12 occupied + 2 unoccupied).
>
> I changed the number of processors to 72, hoping for a 1 k-point/core
> parallelisation, and commented out all the lines of .machines except
> granularity and extrafine. I got less than 1 cycle in 12 hours.
>
> What should I do to run the HF part with k-point parallelisation only (no
> MPI)? This point is not clear to me from the manual.
>
> Thank you
> Best regards
> Pascal
>
>
> Pascal Boulet
> —
> /Professor in computational materials - DEPARTMENT OF CHEMISTRY/
> University of Aix-Marseille - Avenue Escadrille Normandie Niemen -
> F-13013 Marseille - FRANCE
> Tel: +33(0)4 13 55 18 10 - Fax: +33(0)4 13 55 18 50
> Email : pascal.boulet at univ-amu.fr <mailto:pascal.boulet at univ-amu.fr>
>
>> On 12 August 2020 at 14:12, Peter Blaha <pblaha at theochem.tuwien.ac.at
>> <mailto:pblaha at theochem.tuwien.ac.at>> wrote:
>>
>> Your message is too big to be accepted.
>>
>> Anyway, the DGEMM messages seem to be a relic of the MKL you are
>> using, and are most likely related to the use of too many MPI cores for
>> such a small matrix. At least when I continue your Mg2Si calculation
>> (in k-parallel mode), the :DIS and :ENE are continuous, which means
>> that the previous results are ok.
>>
>> Concerning hf, I don't know. Again, running this in sequential
>> (k-point parallel) mode poses no problems and converges quickly.
>>
>> I suggest that you change your setup to a k-parallel run for such
>> small systems.
>>
>> Best regards
>> Peter Blaha
>>
>> ---------------------------------------------------------------------
>> Subject:
>> Errors with DGEMM and hybrid calculations
>> From:
>> pboulet <pascal.boulet at univ-amu.fr <mailto:pascal.boulet at univ-amu.fr>>
>> Date:
>> 8/11/20, 6:31 PM
>> To:
>> A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at
>> <mailto:wien at zeus.theochem.tuwien.ac.at>>
>>
>> Dear all,
>>
>> I have a strange problem with LAPACK. I get an error message about
>> wrong parameters passed to DGEMM, but WIEN2k (19.2) still seems to
>> converge the SCF. Is that possible? What could the "problem" be?
>>
>> I have attached an archive containing the summary of the SCF +
>> compilation options + SLURM output file. The error message is in the
>> dayfile.
>> The same error shows up with Wien2k 18.1.
>>
>>
>> Actually this case is a test case for hybrid calculations, as I
>> have problems with my real case, which PBE finds to be metallic.
>> At least Mg2Si is a small-band-gap semiconductor.
>>
>> When I go ahead with the HSE06 functional and Mg2Si, I get a different
>> error: a segmentation fault during the hf run. As Mg2Si is a small
>> system, I guess this is not a memory problem: the node has 128 GB.
>> Note that the same problem occurs for my real case, but the
>> LAPACK problem does not.
>>
>> The second archive contains some files related to the hybrid calculation.
>>
>> Some hints would be welcome as I am completely lost in these
>> (unrelated) errors!
>>
>> Thank you.
>> Best regards,
>> Pascal
>>
>>
>> Pascal Boulet
>> --
>>
>> P.Blaha
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
>> Email: blaha at theochem.tuwien.ac.at
>> <mailto:blaha at theochem.tuwien.ac.at> WIEN2k: http://www.wien2k.at
>> WWW: http://www.imc.tuwien.ac.at/TC_Blaha
>> --------------------------------------------------------------------------
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at <mailto:Wien at zeus.theochem.tuwien.ac.at>
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> SEARCH the MAILING-LIST at:
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>
>
--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/tc_blaha
--------------------------------------------------------------------------
More information about the Wien mailing list