[Wien] Problem with parallel jobs of complex structures (supercells) on HPC

Sergeev Gregory sgregory at live.ru
Tue Jan 28 11:13:45 CET 2025


Dear Prof. Blaha, Prof. Marks, Gavin,
I have carefully studied all the advice you gave me and have finally solved my problem.

Professor Blaha's advice to check the parallel_options file was especially valuable.
In this file, the variable WIEN_MPIRUN was set to the default value: "mpirun -np _NP_ _EXEC_"
After I replaced it with "srun --mpi=pmi2 -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_", everything works fine.
I do not know why the simple case works on 2 nodes with "mpirun -np _NP_ _EXEC_" while the supercell case does not, but the problem is solved now.
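
For reference, the corresponding line in $WIENROOT/WIEN2k_parallel_options now reads as below (these srun options are what worked on our SLURM + Intel MPI setup; other clusters may need different flags, e.g. another --mpi plugin):

setenv WIEN_MPIRUN "srun --mpi=pmi2 -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"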

Thanks again for your help

 - Gregory Sergeev


________________________________
From: Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Peter Blaha <peter.blaha at tuwien.ac.at>
Sent: 24 January 2025 20:52
To: wien at zeus.theochem.tuwien.ac.at <wien at zeus.theochem.tuwien.ac.at>
Subject: Re: [Wien] Problem with parallel jobs of complex structures (supercells) on HPC

Check
$WIENROOT/WIEN2k_parallel_options

setenv TASKSET "no"
if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv DELAY 0.1
setenv SLEEPY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

Is your MPI_REMOTE set to zero or one?
And USE_REMOTE?

Can you do k-parallel only (no mpi) on 2 nodes ?

You did not show the .machines file. Is it ok?
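
For illustration: for the 2-node case (24 parallel jobs with 4 mpi processes each), a .machines file would look roughly like this, with 12 such lines per node (the hostnames n053/n054 are only placeholders; in a SLURM job they are usually generated from the node list of the allocation):

granularity:1
1:n053:4
1:n053:4
(... 12 lines like this for the first node ...)
1:n054:4
1:n054:4
(... and 12 for the second node ...)
extrafine:1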

And maybe show the beginning of your job script; maybe some slurm parameters
are not set properly for the 2-node job?
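
As a rough sketch (the values below are assumptions, not taken from the actual script), a SLURM header for the 2-node case with 48-core nodes could look like:

#!/bin/bash
# two 48-core nodes; 24 k-parallel jobs x 4 mpi processes each
#SBATCH --nodes=2
#SBATCH --ntasks=96
#SBATCH --cpus-per-task=1
#SBATCH --exclusive

followed by the generation of the .machines file from the allocated node list and the run_lapw -p call.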

On 24.01.2025 at 14:36, Sergeev Gregory wrote:
> Dear developers,
> I run my calculations on an HPC cluster with the SLURM system, and I see strange
> behaviour of parallel WIEN2k jobs:
>
> I have two structures:
> 1. A structure with 8 atoms in the unit cell (simple structure)
> 2. A supercell structure with 64 atoms (2*2*2 supercell) based
> on the cell of the simple structure
>
> I try to run WIEN2k calculations in parallel mode with two configurations:
> 1. Calculations on 1 node (1 node has 48 processors) with 12 parallel
> jobs with 4 processors per job (one node job)
> 2. Calculations on 2 nodes (2 nodes have 48*2=96 processors) with 24
> parallel jobs with 4 processors per job (two node job)
>
> For "simple structure" "one node job" and "two node job" work without
> problems.
>
> For "supercell structure" "one node job" works well, but "two node job"
> crashs with errors in .time1_* files (I use Intel MPI):
>
> -----------------
> n053 n053 n053 n053(21)
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 21859 RUNNING AT n053
> =   EXIT CODE: 9
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 21859 RUNNING AT n053
> =   EXIT CODE: 9
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>     Intel(R) MPI Library troubleshooting guide:
>        https://software.intel.com/node/561764
> ===================================================================================
> 0.042u 0.144s 2:45.42 0.1%    0+0k 4064+8io 60pf+0w
> -----------------
>
> First I thought that there might be insufficient memory in the "2
> node job" (but why, if the "1 node job" works with the same number of processors per
> parallel job?). I tried to double the memory per task (#SBATCH
> --cpus-per-task 2), but this did not solve the problem. Same error.
>
> Any ideas why such strange behavior?
> Does Wien2k have problems scaling to multiple nodes?
>
> I would appreciate your help. I want to speed up calculations for
> complex structures and I have the resources, but I cannot make it work.
>
>
>

--
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------

_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html