[Wien] Problem with parallel jobs of complex structures (supercells) on HPC

Laurence Marks laurence.marks at gmail.com
Fri Jan 24 19:50:10 CET 2025


grep "Matrix size" *output1* -A18

Somehow the "A" was lost in a cut & paste

You should also look at the end of case.scf1* and case.output1* for
messages, and check the error files.
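A minimal, self-contained sketch of those checks (the file names and contents below are fabricated stand-ins so the commands can be tried anywhere; in a real run you would work in your actual case directory):

```shell
# Fabricated case directory standing in for a real WIEN2k case directory.
dir=$(mktemp -d)
printf 'Matrix size 41234\nnmat 41234\n' > "$dir"/case.output1_1
printf ':ENE  : ********** TOTAL ENERGY IN Ry\n' > "$dir"/case.scf1_1
: > "$dir"/lapw1.error                    # empty .error file = step was OK

tail -n 5 "$dir"/case.scf1_* "$dir"/case.output1_*  # last lines often name the failure
grep "Matrix size" "$dir"/case.output1_* -A18       # note the -A ("After") flag
find "$dir" -name '*.error' ! -empty                # non-empty .error files flag failed steps
```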

---
Emeritus Professor Laurence Marks (Laurie)
www.numis.northwestern.edu
https://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en
"Research is to see what everybody else has seen, and to think what nobody
else has thought" Albert Szent-Györgyi

On Fri, Jan 24, 2025, 09:40 Laurence Marks <laurence.marks at gmail.com> wrote:

> Sorry, but you have not provided enough information for more than a guess.
>
> Exit code 9 is when the OS kills the task, often from out of memory (oom),
> but it does not have to be. The larger calculation will require about 8*8
> more memory (perhaps more) than your simple calculation: do "grep "Matrix
> size" *output1* -18". You probably ran out of memory, and will need to use
> more mpi/kpt for the larger calculation.
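The 8*8 factor can be illustrated with back-of-the-envelope arithmetic: lapw1 holds dense complex*16 (16-byte) N x N Hamiltonian and overlap matrices, so memory grows as N^2, and N itself grows roughly linearly with the number of atoms. Both matrix sizes below are hypothetical placeholders, not numbers from this thread:

```shell
# Two dense complex*16 N x N matrices take roughly 2 * 16 * N^2 bytes.
mem_mib() { echo $(( 2 * 16 * $1 * $1 / 1024 / 1024 )); }

N_small=5000                 # hypothetical matrix size for the 8-atom cell
N_super=$(( 8 * N_small ))   # 2*2*2 supercell: basis roughly 8x larger

echo "8-atom cell : ~$(mem_mib $N_small) MiB per k-point job"
echo "64-atom cell: ~$(mem_mib $N_super) MiB per k-point job"
```

The ratio between the two estimates is 8*8 = 64, which is why a supercell that runs comfortably in one configuration can be killed by the OOM killer in another.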
>
> N.B., using 2 OMP threads per task is also useful in reducing the total
> memory usage. Combine this with MPI.
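For example, a .machines layout combining MPI with OpenMP might look like the following sketch (hostnames and counts are placeholders, and the omp_global switch assumes a recent WIEN2k release that supports it):

```
granularity:1
1:n053:4        # one k-point job = 4 MPI ranks on node n053
1:n053:4
1:n054:4        # further k-point jobs on the second node
1:n054:4
omp_global:2    # 2 OpenMP threads per MPI rank, reducing per-rank memory pressure
```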
>
>
> ---
> Emeritus Professor Laurence Marks (Laurie)
> www.numis.northwestern.edu
> https://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought" Albert Szent-Györgyi
>
> On Fri, Jan 24, 2025, 07:46 Sergeev Gregory <sgregory at live.ru> wrote:
>
>> Dear developers,
>> I run my calculations on an HPC cluster with the SLURM scheduler, and I
>> see strange behaviour of parallel WIEN2k jobs:
>>
>> I have two structures:
>> 1. Structure with 8 atoms in unitcell (simple structure)
>> 2. Supercell structure with 64 atoms (2*2*2 supercell structure) based on
>> cell from simple structure
>>
>> I try to do Wien2k calculations on parallel mode with two configs:
>> 1. Calculations on 1 node (48 processors per node) with 12 parallel
>> jobs, 4 processors per job ("one node job")
>> 2. Calculations on 2 nodes (2*48=96 processors) with 24 parallel
>> jobs, 4 processors per job ("two node job")
>>
>> For "simple structure" "one node job" and "two node job" work without
>> problems.
>>
>> For the "supercell structure", the "one node job" works well, but the
>> "two node job" crashes with errors in the .time1_* files (I use Intel MPI):
>>
>> -----------------
>> n053 n053 n053 n053(21)
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 21859 RUNNING AT n053
>> =   EXIT CODE: 9
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>
>> ===================================================================================
>>
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 21859 RUNNING AT n053
>> =   EXIT CODE: 9
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>
>> ===================================================================================
>>    Intel(R) MPI Library troubleshooting guide:
>>       https://software.intel.com/node/561764
>>
>> ===================================================================================
>> 0.042u 0.144s 2:45.42 0.1% 0+0k 4064+8io 60pf+0w
>> -----------------
>>
>> At first I thought there was a problem with insufficient memory in the
>> "two node job" (but why, if the "one node job" works with the same number
>> of processors per parallel job?). I tried to double the memory per task
>> (#SBATCH --cpus-per-task 2), but this did not solve the problem. Same
>> error.
>>
>> Any ideas what causes this strange behavior?
>> Does Wien2k have problems scaling to multiple nodes?
>>
>> I would appreciate your help. I want to speed up calculations for complex
>> structures and I have the resources, but I cannot make it work.
>>
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> SEARCH the MAILING-LIST at:
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>
>