[Wien] Is it possible to use a local scratch directory when (# of k-points)/(# of nodes) != integer?
Peter Blaha
pblaha at theochem.tuwien.ac.at
Thu Oct 25 08:51:49 CEST 2007
To use $SCRATCH you must have a "fixed" distribution of the k-points
over the different nodes; otherwise it can happen that lapw2 for
chunk X runs on a different processor than the one that wrote its
vector file in lapw1 (because of load balancing).
The only way out is to specify a .machines file such that all k-points are
distributed at once. If the number of k-points is not a multiple of the
number of nodes, you can play around with the "weights" of the machines
until the distribution fits. For example,
3:node1
2:node2
granularity:1
will distribute 5 k-points over 2 processors without a remainder (check with testpara_lapw).
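
The arithmetic behind choosing such weights can be sketched in a few lines of Python. This is only an illustration, not part of WIEN2k: the function names suggest_weights and machines_lines are hypothetical, and it assumes (as in the 3:2 example above) that weights which add up to the number of k-points give exactly one fixed chunk per machine. Always confirm the result with testpara_lapw.

```python
def suggest_weights(n_kpoints, hosts):
    """Split n_kpoints into one chunk per host, with sizes differing by at most one.

    Hypothetical helper for illustration; not a WIEN2k routine.
    """
    base, extra = divmod(n_kpoints, len(hosts))
    # The first `extra` hosts each take one additional k-point.
    return [(base + 1 if i < extra else base, host)
            for i, host in enumerate(hosts)]


def machines_lines(n_kpoints, hosts):
    """Return 'weight:host' lines followed by 'granularity:1'."""
    lines = [f"{weight}:{host}"
             for weight, host in suggest_weights(n_kpoints, hosts)
             if weight > 0]  # skip hosts that would receive no k-points
    lines.append("granularity:1")
    return lines


if __name__ == "__main__":
    # Reproduces the 5 k-points / 2 nodes example from this message.
    print("\n".join(machines_lines(5, ["node1", "node2"])))
```

For the 246 k-points and 13 processes mentioned in the question, the same split gives twelve chunks of 19 k-points and one of 18. A host name may appear several times in the list if several processes should run on one node.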
Steven Hahn wrote:
> Dear WIEN2k users and developers,
>
> I am having trouble running WIEN2k calculations on our cluster when
> the number of k-points produced by "x kgen" does not have a
> reasonable factor to use for the number of processes. For example,
> with 246 k-points I'd like to be able to use more than 6 cores, but
> the calculation isn't large enough to efficiently use 41 cores. Even
> if the load balancing is no longer perfect, the calculation could
> still be completed much faster with 13 or 19 cores. Also, I prefer
> running with four processes per node so that my lapw1 and lapw2
> processes are not scattered amongst the nodes in the cluster. The
> home directory is too slow to consider using for scratch. Everything
> I've tried so far has produced intermittent errors like "forrtl:
> severe (24): end-of-file during read, unit 10, file /var/scratch/
> shahn/case_1/case_1.vector_16" in lapw2. Once I realized the problem
> is because the (# of k-points)/(# of nodes) != integer, I have tried:
>
> 1) removing extrafine:1 from my PBS script. The calculation still
> crashes occasionally because the residual k-points are not always
> running on the same node. I got the mistaken impression that extrafine:1 is
> compatible with granularity:1 and a local scratch disk from the
> example PBS script (http://www.wien2k.at/reg_user/faq/pbs.job).
> Should the extrafine:1 line be removed from the example?
>
> 2) replacing one line in my .machines file with residue:(name of
> node). I'm not sure why this is failing. Testpara_lapw shows only one
> list of k-points for each core.
>
> Before I start modifying scripts, I was wondering if it is possible
> to still use the existing scripts, a local scratch directory, and
> overcome the (# of k-points)/(# of nodes) = integer requirement? Is
> this limitation just in the lapwpara_lapw1 and lapwpara_lapw2
> scripts, or is there a more fundamental concern? Would others be
> interested in being able to run with any number of processes?
>
> Sincerely,
> Steven Hahn
>
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671 FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------