[Wien] k point parallel in a supercomputer that returns a single name for .machines file

Thu Dec 23 11:08:41 CET 2010

Definitely you should not use such a .machines file.

If I understand you correctly, using   #PBS -l mppwidth=14
you request 14 cores. Why 14? I hope you have 14 k-points
(or a multiple of 14)!

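Now you have to find out in your job script which nodes you got.
I'd assume that    cat $PBS_NODEFILE    gives you the required info
(the variable holds the name of a file listing the allocated nodes, so
echo $PBS_NODEFILE alone only prints that file name). On the PBS I've used,
it would give me a list of 14 names.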

You can also put a command like    env   into your job file. It
lists all defined environment variables; once you know their names,
"echo" those which might come from PBS and contain useful info in your next test job.

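For such a test job, a small helper can do both steps at once. The following is a minimal Python sketch, assuming python3 is available on the node that runs the job script; the function names are hypothetical and not part of WIEN2k or PBS. It prints every environment variable whose name starts with PBS and, if $PBS_NODEFILE is set, counts how often each host appears in that file.

```python
import os
from collections import Counter


def pbs_variables(environ):
    """Return the variables whose names start with 'PBS', sorted by name."""
    return {name: value for name, value in sorted(environ.items())
            if name.startswith("PBS")}


def count_hosts(lines):
    """Count how often each host name occurs, ignoring blank lines."""
    return Counter(line.strip() for line in lines if line.strip())


if __name__ == "__main__":
    # Print every PBS-related variable so you can see what the batch system sets.
    for name, value in pbs_variables(os.environ).items():
        print(f"{name}={value}")
    # $PBS_NODEFILE holds a file name; the host list is inside that file.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile and os.path.isfile(nodefile):
        with open(nodefile) as f:
            for host, count in count_hosts(f).items():
                print(f"{host}: {count} line(s)")
```

If this shows only one host with a single line (for instance a login node), the file does not describe the cores you were given and you have to look at the other PBS variables.
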
Once you know the names of the nodes (and it definitely should NOT be 12
times the name "login6"), you can create a .machines file with 14 lines like:

1:loginX       # max 4 times if you have 4 cores/node, or 8 times on the other nodes
1:loginX
1:loginX
1:loginX
1:loginY       # next node, repeated until you get 14 such lines.
....
granularity:1
extrafine:1
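
Writing these lines by hand is error-prone, so they can also be generated in the job script. This is a minimal Python sketch, assuming you already have a list of the correct host names (for example the lines of a $PBS_NODEFILE that really lists your nodes); machines_lines, the command-line arguments and the script name make_machines.py are hypothetical choices for illustration, not a WIEN2k tool. Each distinct host gets at most jobs_per_node entries of the form 1:host, and the granularity/extrafine lines are appended.

```python
import os
import sys


def machines_lines(hosts, n_jobs, jobs_per_node):
    """Build k-point-parallel .machines lines: n_jobs entries '1:host',
    at most jobs_per_node per distinct host, in the order hosts appear."""
    # dict.fromkeys drops duplicates (PBS may list a host once per core)
    # while keeping the original order.
    unique_hosts = list(dict.fromkeys(h.strip() for h in hosts if h.strip()))
    lines = []
    for host in unique_hosts:
        take = min(jobs_per_node, n_jobs - len(lines))
        lines.extend(f"1:{host}" for _ in range(take))
    if len(lines) < n_jobs:
        raise ValueError(f"only {len(lines)} slots available, {n_jobs} requested")
    return lines + ["granularity:1", "extrafine:1"]


if __name__ == "__main__" and len(sys.argv) == 3 and "PBS_NODEFILE" in os.environ:
    # Hypothetical usage inside the job script: python3 make_machines.py 14 4
    n_jobs, per_node = int(sys.argv[1]), int(sys.argv[2])
    with open(os.environ["PBS_NODEFILE"]) as f:
        content = "\n".join(machines_lines(f, n_jobs, per_node)) + "\n"
    with open(".machines", "w") as out:
        out.write(content)
```

For 14 jobs on 4-core nodes this writes four 1:host lines per node until 14 entries exist, the same layout as the hand-written example.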

Of course, the requested number of cores must be "commensurate" with your
k-points. It does NOT make much sense to request 14 cores when you have 15
k-points: one job then has to do 2 k-points while the other 13 cores sit
idle for half of the time.
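
The load-balance argument can be put into numbers. This Python sketch assumes that every k-point costs the same time and that the k-points are spread as evenly as possible over the jobs (as with granularity:1); the function names are hypothetical.

```python
import math


def kpoint_rounds(n_kpoints, n_jobs):
    """k-points the busiest job must compute one after another."""
    return math.ceil(n_kpoints / n_jobs)


def core_efficiency(n_kpoints, n_jobs):
    """Fraction of the reserved core-time spent on useful lapw1 work."""
    return n_kpoints / (kpoint_rounds(n_kpoints, n_jobs) * n_jobs)


if __name__ == "__main__":
    for nk, nj in [(14, 14), (15, 14), (28, 14), (15, 15)]:
        print(f"{nk} k-points on {nj} cores: {kpoint_rounds(nk, nj)} round(s), "
              f"efficiency {core_efficiency(nk, nj):.2f}")
```

With 15 k-points on 14 cores the run takes as long as 28 k-points would, so only about 54% of the reserved core-time does useful work, while 14, 28 or 15-on-15 give 100%.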

To get a speedup, each lapw1 chunk must take a reasonable amount of time. Otherwise the
overhead (jobs are started with a one-second delay, more disk I/O, sumpara, ...)
will kill you. In other words: don't expect any speedup for the TiC example
when running on 12 or 14 nodes. Each single lapw1 part must take at least about 10 sec.
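
A crude model shows why roughly 10 sec per lapw1 part is needed. This Python sketch assumes that the only overhead is the staggered start (the last job begins n_jobs - 1 seconds after the first) plus an optional fixed extra time standing in for disk I/O and sumpara; it is an illustration under those assumptions, not a measurement of WIEN2k, and the function name is hypothetical.

```python
import math


def estimated_speedup(n_kpoints, n_jobs, t_kpoint, start_delay=1.0, t_extra=0.0):
    """Serial lapw1 time divided by the modelled parallel wall time."""
    serial = n_kpoints * t_kpoint
    # Pessimistically assume the last-started job, which begins
    # (n_jobs - 1) * start_delay late, also has the largest share of k-points;
    # t_extra lumps together I/O, sumpara, etc.
    parallel = ((n_jobs - 1) * start_delay
                + math.ceil(n_kpoints / n_jobs) * t_kpoint
                + t_extra)
    return serial / parallel


if __name__ == "__main__":
    for t in (1.0, 10.0, 100.0):
        print(f"{t:5.0f} s per k-point: speedup {estimated_speedup(14, 14, t):.1f}")
```

For 14 k-points on 14 cores the model gives no speedup at all at 1 sec per k-point, about 6 at 10 sec and about 12 at 100 sec; real overheads only make this worse.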

Further considerations:
Some CPUs are limited by the speed of their memory access. This means that
on an 8-core node, e.g., one single lapw1 might take 100 sec, but when running
4 lapw1 in parallel, each run would take 130 sec, and with 8 lapw1 jobs each
job may even run 200 sec. Obviously, you have to find out the best throughput/
performance for your hardware.
Sometimes one gets the best performance when using only 4 lapw1 on one 8-core
node, but setting OMP_NUM_THREADS to 2 (enabling the MKL parallelization for ifort/mkl).
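
Comparing such setups is a throughput question: what counts is how many k-points a node finishes per second, not how long one lapw1 runs. This Python sketch uses the illustrative timings from the paragraph above, not real measurements; replace them with your own, including a line for, say, 4 jobs with OMP_NUM_THREADS=2. The function names are hypothetical.

```python
def throughput(jobs_per_node, seconds_per_job):
    """k-points finished per second on one node (one k-point per job at a time)."""
    return jobs_per_node / seconds_per_job


def best_setup(measurements):
    """measurements: (label, jobs_per_node, seconds_per_job) tuples.
    Returns the label with the highest throughput."""
    return max(measurements, key=lambda m: throughput(m[1], m[2]))[0]


if __name__ == "__main__":
    # Illustrative numbers for one 8-core node, taken from the text.
    timings = [("1 lapw1", 1, 100.0), ("4 lapw1", 4, 130.0), ("8 lapw1", 8, 200.0)]
    for label, jobs, secs in timings:
        print(f"{label}: {throughput(jobs, secs):.4f} k-points/s")
    print("best:", best_setup(timings))
```

With these particular numbers 8 jobs still win (0.040 vs about 0.031 k-points/s for 4 jobs), even though each job runs twice as long as a single one; on other hardware the ranking can change, which is why it has to be measured.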

On 21.12.2010 15:40, Markus Kaukonen wrote:
> Dear WIEN2k,
>
> I have a system where there are 4 or 8 cores in one node.
> (http://www.csc.fi/english/pages/louhi_guide/hardware/computenodes)
>
> All cores see the same disk space.
> System has Portable Batch System, Professional Edition (PBS Pro)
> Typically $PBS_NODEFILE
> contains only the name of a single node (or some name for a collection
> of nodes?)
>
> System setup is such that one cannot use
> #PBS -l nodes=8:oneproc
> (which is probably rude toward other users) but must use
> #PBS -l mppwidth=14
>
> I would like to run k-point parallel wien2k in this system (and not
> mpi parallel).
> Is this possible?
> To me it seems that if I generate a .machines file
>   #
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> 1:login6:1
> granularity:1
> extrafine:1
>
> one does not gain in speed when compared to a single core.
>
> Regards, Markus
>
>

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671             FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------