[Wien] 24k points on 36processors ??. (a Fractional k-point per core)

Pavel Ondračka pavel.ondracka at email.cz
Thu Dec 12 13:33:37 CET 2019


I'm neither a PBS nor a csh expert, but to me it looks like you are
spawning just a single k-point job on each node, and for lapw0 also
just a single process per node.

If you run just a single k-point job per node and there is still not
enough memory, then you probably need MPI. Or maybe the default memory
PBS gives you is not enough (maybe you need to specifically ask for a
larger amount, no idea).
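
As a rough illustration of the memory argument, here is a minimal Python
sketch. It assumes the 48 GB per node quoted further down in this thread and
that the node memory is split evenly between the k-point jobs running on it;
the function name memory_per_job_gb is made up for this example. The result is
only an upper bound, since the operating system and other processes also need
memory. OpenMP threads share the memory of their process, so adding threads
does not divide it further.

```python
def memory_per_job_gb(node_memory_gb, jobs_per_node):
    """Upper bound on the memory one k-point job can use on a node."""
    return node_memory_gb / jobs_per_node


if __name__ == "__main__":
    # 48 GB per node is the figure quoted in this thread (assumption).
    for jobs in (1, 4, 8, 12):
        share = memory_per_job_gb(48, jobs)
        print(f"{jobs:2d} k-point jobs per node: at most {share:.1f} GB each")
```

If even the single-job case is not enough for lapw1, the matrix has to be
distributed over several nodes, which is what the MPI version does.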

For the example from my last email, to spawn 4 k-point jobs per node on
three nodes with 3 OpenMP threads each, the final .machines file should
look like this (with node1, node2 and node3 replaced with the actual
node names from $PBS_NODEFILE):

1:node1
1:node1
1:node1
1:node1
1:node2
1:node2
1:node2
1:node2
1:node3
1:node3
1:node3
1:node3
granularity:1
extrafine:1
omp_lapw2:3
omp_lapw1:3
omp_lapw0:4
omp_global:12
lapw0: node1:3 node2:3 node3:3
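
One possible way to generate such a file from the node list is sketched below
in Python. This is only an illustration, not the script used on your cluster:
the function name build_machines and its parameters are made up for this
example, and it assumes that $PBS_NODEFILE lists each host once per allocated
core, which is the usual PBS behaviour but should be checked on your system.

```python
import os


def build_machines(nodefile_lines, kjobs_per_node=4, omp_lapw1=3, omp_lapw2=3,
                   lapw0_mpi_per_node=3, omp_lapw0=4, omp_global=12):
    """Return the text of a .machines file for k-point parallel runs."""
    # Keep each host once, in order of first appearance (assumes the node
    # file repeats a host once per allocated core).
    nodes = list(dict.fromkeys(
        line.strip() for line in nodefile_lines if line.strip()))

    lines = []
    for node in nodes:
        # One "1:host" line per k-point job on that host.
        lines += [f"1:{node}"] * kjobs_per_node

    lines += [
        "granularity:1",
        "extrafine:1",
        f"omp_lapw2:{omp_lapw2}",
        f"omp_lapw1:{omp_lapw1}",
        f"omp_lapw0:{omp_lapw0}",
        f"omp_global:{omp_global}",
    ]
    # lapw0 runs MPI-parallel with a fixed number of processes per host.
    lines.append(
        "lapw0: " + " ".join(f"{n}:{lapw0_mpi_per_node}" for n in nodes))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as handle:
            hosts = handle.readlines()
    else:
        # Fallback for trying the script outside a PBS job.
        hosts = ["node1", "node2", "node3"]
    print(build_machines(hosts), end="")
```

Inside the job script the output would be redirected into .machines before
calling run_lapw -p. With the default arguments and three hosts it reproduces
the file shown above.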

I would advise you to read the .machines file section of the manual once
more, try to understand what your .machines file should look like, and
then consult whoever wrote your PBS script in the first place to
modify it so that it generates the .machines file you need.

Best regards
Pavel

BTW, you are actually not asking for 12 CPUs but just 8 CPUs...


On Thu, 2019-12-12 at 17:04 +0530, Ashwani Kumar wrote:
> Dear Sir,
> Hyper-threading is disabled (just checked with the facility expert),
> so there are 12 physical cores per node (Intel Xeon, Nehalem-based
> architecture). Available memory is 4 GB/core (48 GB/node).
> lapw1 stops with the error "insufficient virtual memory", so I
> thought it better to use 36 cores for the 24 k-points, as extra
> (48 GB) memory will be available. I am using the PBS queuing system
> (WIEN2k V19.1 compiled with OpenMPI parallelization), and the job
> script generates the .machines file when it is executed. How do I set
> the OpenMP threads in the .machines file? (Job script attached for
> your reference.)
> 
> thanks,
> A. kumar
> 
> On Thu, Dec 12, 2019 at 2:55 PM Pavel Ondračka <
> pavel.ondracka at email.cz> wrote:
> > Hi,
> > 
> > do you have hyperthreading or not (in other words, does this number
> > of 12 already mean there are 6 physical CPUs and 12 virtual ones, or
> > 12 physical)? This might influence the advice a bit...
> > 
> > Otherwise you need to experiment; the optimal setting depends
> > heavily on your specific CPU, memory speed and what you are
> > calculating (system size).
> > 
> > When talking about the 24 k-points and 36 processors, then running
> > 4 k-points on each node (12 k-points in parallel) with 3 OpenMP
> > threads each might be a reasonable setting.
> > 
> > It is also possible that just leaving some cores idle might be the
> > best thing to do (when running a lot of k-points in parallel you can
> > get limited by the memory speed, so leaving some cores idle means
> > more memory bandwidth for the others).
> > This would correspond to running 8 k-points on each node, or 4
> > k-points on each node with 2 OpenMP threads each.
> > 
> > The Linux kernel and modern processors are also usually good at
> > handling some small overload and load balancing, so you can also try
> > to overload the system a bit, i.e., 8 k-points per node with 2 OpenMP
> > threads each.
> > 
> > Just try the different settings (a single lapw1 run for each should
> > be enough to get some idea) and compare the timings.
> > 
> > Best regards
> > Pavel
> > 
> > BTW, for lapw0 I would go with something like 3 MPI processes per
> > node with 4 OpenMP threads each in this case.
> > 
> > On Thu, 2019-12-12 at 12:28 +0530, Ashwani Kumar wrote:
> > > Hi,
> > >    This is related to the number of k-points which we provide
> > > during the initialization. Number of k-points given to kgen: 120,
> > > with a shifted mesh. Irreducible k-points: 24. The job is running
> > > on 3 nodes (12 x 3 processors, 48 GB x 3 RAM), but on 24 processors
> > > only (with granularity:1, extrafine:1 in the .machines file),
> > > which means 1 k-point per core. How can 24 k-points be made to run
> > > on 36 cores? Or how can 24 k-points be distributed equally between
> > > 36 cores (or, let's say, 12 k-points on 24 processors, to make the
> > > calculation converge faster)?
> > > 
> > > thanks,
> > > A. Kumar
> > > _______________________________________________
> > > Wien mailing list
> > > Wien at zeus.theochem.tuwien.ac.at
> > > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > > SEARCH the MAILING-LIST at:
> > > http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> > 


