[Wien] Errors with DGEMM and hybrid calculations

Peter Blaha pblaha at theochem.tuwien.ac.at
Sat Aug 15 21:02:48 CEST 2020


Just to give you a clue:

For your Mg2Si I used only 8 k-points in the IBZ, but then ran a purely 
k-parallel job on 8 nodes with OMP_NUM_THREADS=2 (16 cores in total):

lapw1 and lapw2: a few seconds per k-point, both steps together.
hf:          20 seconds per k-point

If I instead use only 4 nodes k-parallel, but each with mpi on 8 cores 
(32 cores in total), I get:
lapw1 and lapw2 times are a bit worse (e.g. 5 instead of 3 seconds, 
because mpi is not as efficient as k-parallel)
hf:  13 seconds for 2 k-points
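
In .machines terms, the two setups look roughly like this (a sketch only; 
the host names n1 ... n8 are placeholders for your actual nodes, and 
OMP_NUM_THREADS=2 is set in the calling environment):

# 8 k-parallel jobs, 2 OpenMP threads each (16 cores):
1:n1
1:n2
1:n3
1:n4
1:n5
1:n6
1:n7
1:n8
granularity:1
extrafine:1

# 4 k-parallel jobs, each with mpi on 8 cores (32 cores):
1:n1:8
1:n2:8
1:n3:8
1:n4:8
granularity:1
extrafine:1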

In essence, if you manage to distribute the k-points well, an SCF cycle 
with hf should run within 30 seconds or so.

I don't quite understand your "hours" of CPU time, even when using a 
better k-mesh.
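
To answer your question below about running the hf part with k-point 
parallelisation only (no mpi): the .machines file should contain only 
simple weight:host lines, with no core count after the host name and no 
lapw0:/dstart: lines. A sketch for your 12-core node n1071 (one line per 
k-parallel job):

1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
1:n1071
granularity:1
extrafine:1

With such a file lapw1, lapw2 and hf are all distributed over the 
k-points (72 k-points on 12 jobs, i.e. 6 k-points per job) and the mpi 
binaries are never called; start the cycle with something like 
run_lapw -p -hf.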


On 15.08.2020 at 10:35, pboulet wrote:
> Dear Peter,
> 
> Thank you for your response. It clarifies some points for me.
> 
> I have run another calculation for Mg2Si, which is a small system, on 12 
> cores (no MKL errors!). The job ran for 12 hours (the CPU time limit I 
> set) and completed only 3 SCF cycles without converging.
> 
> The .machines file I use looks like this:
> 1:1071:12
> lapw0: n1071 n1071
> dstart: n1071 n1071
> nlvdw: n1071 n1071
> granularity:1
> extrafine:1
> 
> I guess I am not optimising the number of cores w.r.t. the size of the 
> problem (72 k-points, 14 HF bands: 12 occupied + 2 unoccupied).
> 
> I changed the number of processors to 72, hoping for a one-k-point-per-core 
> parallelisation, and commented out all the lines of .machines except 
> granularity and extrafine. I got less than 1 cycle in 12 hours.
> 
> What should I do to run the HF part with k-point parallelisation only 
> (no mpi)? This point is not clear to me from the manual.
> 
> Thank you
> Best regards
> Pascal
> 
> 
> Pascal Boulet
> Professor in computational materials - DEPARTMENT OF CHEMISTRY
> University of Aix-Marseille - Avenue Escadrille Normandie Niemen - 
> F-13013 Marseille - FRANCE
> Tel: +33(0)4 13 55 18 10 - Fax: +33(0)4 13 55 18 50
> Email: pascal.boulet at univ-amu.fr
> 
>> On 12 August 2020 at 14:12, Peter Blaha 
>> <pblaha at theochem.tuwien.ac.at> wrote:
>>
>> Your message is too big to be accepted.
>>
>> Anyway, the DGEMM messages seem to be a relic of the MKL you are 
>> using, and are most likely related to the use of too many mpi cores for 
>> such a small matrix. At least when I continue your Mg2Si calculations 
>> (in k-parallel mode), the :DIS and :ENE are continuous, which means 
>> that the previous results are ok.
>>
>> Concerning hf, I don't know. Again, running this in sequential 
>> (k-point parallel) mode gives no problems and converges quickly.
>>
>> I suggest that you change your setup to a k-parallel run for such 
>> small systems.
>>
>> Best regards
>> Peter Blaha
>>
>> ---------------------------------------------------------------------
>> Subject:
>> Errors with DGEMM and hybrid calculations
>> From:
>> pboulet <pascal.boulet at univ-amu.fr>
>> Date:
>> 8/11/20, 6:31 PM
>> To:
>> A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>>
>> Dear all,
>>
>> I have a strange problem with LAPACK. I get an error message about 
>> wrong parameters passed to DGEMM, but WIEN2k (19.2) still seems to 
>> converge the SCF. Is that possible? What could be the "problem"?
>>
>> I have attached an archive containing the summary of the SCF, the 
>> compilation options, and the SLURM output file. The error message is 
>> in the dayfile. The same error shows up with WIEN2k 18.1.
>>
>>
>> Actually this is a test case for hybrid calculations, as I have 
>> problems with my real case, which is found to be metallic with PBE. 
>> At least Mg2Si is a small-band-gap semiconductor.
>>
>> When I go ahead with the HSE06 functional and Mg2Si, I get a different 
>> error: a segmentation fault during the hf run. As Mg2Si is a small 
>> system, I guess this is not a memory problem: the node has 128 GB.
>> Note that the same segmentation fault occurs for my real case, but 
>> there the LAPACK problem does not appear.
>>
>> The second archive contains some files related to the hybrid calculation.
>>
>> Some hints would be welcome as I am completely lost in these 
>> (unrelated) errors!
>>
>> Thank you.
>> Best regards,
>> Pascal
>>
>>
>> Pascal Boulet
>> -- 
>>
>>                                      P.Blaha
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>> Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
>> WWW: http://www.imc.tuwien.ac.at/TC_Blaha
>> --------------------------------------------------------------------------

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/tc_blaha
--------------------------------------------------------------------------


