[Wien] lapw2_mpi crashes during TB-mBJ calculations for WIEN2k_23.2

Peter Blaha peter.blaha at tuwien.ac.at
Sun Jun 18 19:26:12 CEST 2023


Thank you very much for the report.

I can confirm the problem, not only for MPI calculations and mBJ, but 
also for meta-GGAs. The -tau switch causes the problem in lapw2_mpi.

An endif statement was placed in the wrong position, causing the problem.

Attached is a new l2main.F.gz subroutine, which should be put into 
$WIENROOT/SRC_lapw2. Change into this directory and type:

gunzip l2main.F.gz

make all

cp lapw2 lapw2c lapw2_mpi lapw2c_mpi ..
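
For readers wondering how a misplaced endif ends up as the "Message truncated" 
aborts quoted below: if the receive that should match the extra tau message is 
skipped on the receiving rank, the next small MPI_Recv posted with MPI_ANY_TAG 
matches that large message instead, and MPI aborts. The following is a minimal 
standalone sketch of this mechanism (my own illustration under that assumption, 
not the actual l2main.F code; truncate_sketch, tau_data and recv_tau are 
made-up names). Compile with mpif90 and run on two processes to reproduce the 
abort.

! truncate_sketch.f90 -- hypothetical illustration, NOT WIEN2k source code.
! Rank 1 sends a large "tau" array followed by a 1-integer flag; on rank 0
! the matching large receive is (wrongly) skipped, so the 1-integer MPI_Recv
! with MPI_ANY_TAG matches the 100-integer message and MPI aborts with
! "Fatal error in PMPI_Recv: Message truncated".
program truncate_sketch
  use mpi
  implicit none
  integer :: ierr, rank, nproc
  integer :: tau_data(100), flag(1)
  integer :: status(MPI_STATUS_SIZE)
  logical :: recv_tau

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  if (nproc >= 2) then
     if (rank == 1) then
        tau_data = 0
        flag(1)  = 1
        ! worker: with -tau an extra large message is sent first
        call MPI_Send(tau_data, 100, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
        call MPI_Send(flag, 1, MPI_INTEGER, 0, 2, MPI_COMM_WORLD, ierr)
     else if (rank == 0) then
        recv_tau = .false.   ! wrong: the misplaced endif leaves this branch dead
        if (recv_tau) then
           call MPI_Recv(tau_data, 100, MPI_INTEGER, 1, 1, MPI_COMM_WORLD, status, ierr)
        end if
        ! this 1-integer receive now matches the 100-integer tau message
        call MPI_Recv(flag, 1, MPI_INTEGER, 1, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
     end if
  end if

  call MPI_Finalize(ierr)
end program truncate_sketch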


Regards

Peter Blaha


On 18.06.2023 at 13:06, 髙村仁 wrote:
> Dear WIEN2k developers and users,
>
> I would like to report the following situation I encountered with WIEN2k_23.2.
> WIEN2k_23.2 works fine for me, except that lapw2_mpi crashes during TB-mBJ calculations run with MPI parallelization. First, I performed TB-mBJ calculations for some oxides, such as MgO and TiO2, using WIEN2k_21.1 and MPI parallelization without any problems. The results, e.g., the corrected band gaps, are also excellent. Standard SCF calculations using WIEN2k_23.2, including MPI-parallel ones, are also fine.
>
> Meanwhile, after init_mbj (the -tau switch is now on for lapw2), MPI-parallel calculations using WIEN2k_23.2 always crash during the first lapw2 step. The crash is reproducible for every case.struct I tested, including the TiO2 example from the WIEN2k website. It should also be noted that serial or k-point-parallel (without MPI) TB-mBJ calculations are fine in the same WIEN2k_23.2 environment. The error messages from the lapw2_mpi crashes are as follows:
>
> lapw2.error:
> **  testerror: Error in Parallel LAPW2
> lapw2_i.error:
> Error in LAPW2
>
> So this crash appears to be a sudden death of the MPI processes; STDOUT shows the following MPI error messages for 4 MPI processes:
>
> ...
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW1 END
> LAPW2 - FERMI; weights written
> Abort(805421582) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Recv: Message truncated, error stack:
> PMPI_Recv(171): MPI_Recv(buf=0x7ffdcd24c678, count=1, MPI_INTEGER, src=1, tag=MPI_ANY_TAG, comm=0x84000005, status=0x2ae8b59a3fe0) failed
> (unknown)(): Message truncated
> Abort(67224078) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Recv: Message truncated, error stack:
> PMPI_Recv(171): MPI_Recv(buf=0x7ffc824e7f78, count=1, MPI_INTEGER, src=1, tag=MPI_ANY_TAG, comm=0x84000005, status=0x2af2d64d3fe0) failed
> (unknown)(): Message truncated
> Abort(939639310) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Recv: Message truncated, error stack:
> PMPI_Recv(171): MPI_Recv(buf=0x7ffea8ea88f8, count=1, MPI_INTEGER, src=1, tag=MPI_ANY_TAG, comm=0x84000005, status=0x2b513d417fe0) failed
> (unknown)(): Message truncated
> Abort(402768398) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Recv: Message truncated, error stack:
> PMPI_Recv(171): MPI_Recv(buf=0x7ffde26becf8, count=1, MPI_INTEGER, src=1, tag=MPI_ANY_TAG, comm=0x84000005, status=0x2b92d0adffe0) failed
> (unknown)(): Message truncated
> ...
>
> I thought this might only be the case on my cluster:
>   12 nodes x Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz, Linux 3.10.0-1160.el7.x86_64
>   Intel compilers (2021.7.1 20221019)
>   Intel MPI libraries (Intel(R) MPI Library for Linux* OS, Version 2021.7 Build 20221022);
> So, I compiled WIEN2k_23.2 on a different cluster with different versions of the Intel compilers and MPI libraries (ifort (IFORT) 19.1.3.304 20200925 and Intel(R) MPI Library for Linux* OS, Version 2019 Update 9 Build 20200923). The results are exactly the same: TB-mBJ calculations with MPI parallelization work fine with WIEN2k_21.1, but with WIEN2k_23.2 lapw2 always crashes, and only when TB-mBJ calculations are run with MPI (serial and k-point-parallel runs without MPI are fine). Again, the crash is reproducible for every oxide case.struct I tested.
>
> I would greatly appreciate any comments and suggestions to solve this problem.
>
> Best regards,
>
>
> Dr. Hitoshi Takamura
> Tohoku Univ., Japan
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

-- 
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: l2main.F.gz
Type: application/x-gzip
Size: 23264 bytes
Desc: not available
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20230618/c86d97f4/attachment.gz>