[Wien] Is it possible to use a local scratch directory when (# of k-points)/(# of nodes) != integer?

Peter Blaha pblaha at theochem.tuwien.ac.at
Thu Oct 25 08:51:49 CEST 2007


To use $SCRATCH you must have a "fixed" distribution of the k-points
to the different nodes; otherwise it could happen that lapw2 for
chunk X is done on a different processor (because of load balancing)
than the one whose local scratch holds the vector file written by lapw1.

The only way out is to specify a .machines file such that all k-points are
distributed at once. If the number of nodes does not divide the number of
k-points, you can play around with the "weights" of the machines until the
distribution fits. For example,
3:node1
2:node2
granularity:1
will distribute 5 k-points on 2 processors without a remainder (check the
result with testpara_lapw).
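
As a rough illustration of "playing around with the weights", the sketch
below picks one integer weight per process so that the weights add up to
the number of k-points. The helper names (even_weights, machines_text) are
hypothetical and not part of WIEN2k, and it is an assumption that the
parallel scripts will assign exactly weight-many k-points to each process
in this case, so the result should always be checked with testpara_lapw.

```python
def even_weights(n_kpoints, hosts):
    """Return (weight, host) pairs whose weights sum to n_kpoints.

    `hosts` holds one entry per process; repeat a host name to run
    several processes on the same node. The first `extra` processes
    get one k-point more than the rest.
    """
    if not hosts or n_kpoints < len(hosts):
        raise ValueError("need at least one k-point per process")
    base, extra = divmod(n_kpoints, len(hosts))
    return [(base + 1 if i < extra else base, host)
            for i, host in enumerate(hosts)]


def machines_text(n_kpoints, hosts):
    """Format the weights as lines of a .machines file with granularity:1."""
    lines = [f"{weight}:{host}" for weight, host in even_weights(n_kpoints, hosts)]
    lines.append("granularity:1")
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the 5 k-points / 2 processors case from the text above.
    print(machines_text(5, ["node1", "node2"]))
```

For the 246 k-points mentioned in the question, calling even_weights with
13 host entries gives twelve processes a weight of 19 and one a weight of
18, which again adds up to 246.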

Steven Hahn wrote:
> Dear WIEN2k users and developers,
> 
> I am having trouble running WIEN2k calculations on our cluster when  
> the number of k-points produced by "x kgen" does not have a  
> reasonable factor to use for the number of processes. For example,  
> with 246 k-points I'd like to be able to use more than 6 cores, but  
> the calculation isn't large enough to efficiently use 41 cores. Even  
> if the load balancing is no longer perfect, the calculation could  
> still be completed much faster with 13 or 19 cores. Also, I prefer  
> running with four processes per node so that my lapw1 and lapw2  
> processes are not scattered amongst the nodes in the cluster. The  
> home directory is too slow to consider using for scratch. Everything  
> I've tried so far has produced intermittent errors like "forrtl:  
> severe (24): end-of-file during read, unit 10, file  
> /var/scratch/shahn/case_1/case_1.vector_16" in lapw2. Once I realized the  
> problem is that (# of k-points)/(# of nodes) != integer, I have tried:
> 
> 1) removing extrafine:1 from my PBS script. The calculation still  
> crashes occasionally because the residual k-points are not always  
> running on the same node. I got the misconception that extrafine:1 is  
> compatible with granularity:1 and a local scratch disk from the  
> example PBS script (http://www.wien2k.at/reg_user/faq/pbs.job).  
> Should the extrafine:1 line be removed from the example?
> 
> 2) replacing one line in my .machines file with residue:(name of  
> node). I'm not sure why this is failing. Testpara_lapw shows only one  
> list of k-points for each core.
> 
> Before I start modifying scripts, I was wondering if it is possible  
> to still use the existing scripts, a local scratch directory, and  
> overcome the (# of k-points)/(# of nodes) = integer requirement? Is  
> this limitation just in the lapwpara_lapw1 and lapwpara_lapw2  
> scripts, or is there a more fundamental concern? Would others be  
> interested in being able to run with any number of processes?
> 
> Sincerely,
> Steven Hahn

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671             FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------

