[Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
L-marks at northwestern.edu
Thu Oct 17 16:11:34 CEST 2013
There are many possibilities; here are a few:
a) If you request only 1 core/node, most queuing systems (qsub/msub,
etc.) will allocate the other cores to other jobs. You are then very
dependent upon what those other jobs are doing. The normal practice is
to use all the cores on a given node.
b) When you run on cluster B, in addition to a), it is inefficient to
run MPI communications across nodes; it is much better to run across
the cores of a single node. Are you using a .machines file with eight
"1: nodeA" lines (for instance), or one with a single "1: nodeA
nodeB ..." line? The first does not use MPI, the second does. To use
MPI within a node you would use lines such as "1:node:8". Knowledge of
your .machines file will help people assist you.
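For illustration, the two variants look roughly like this (hypothetical
host names; your scheduler supplies the real ones):

```
# k-point parallel, no MPI: one line per k-point job
1:nodeA
1:nodeA
# ... eight such lines in total

# MPI within one node, 8 processes for a single job:
1:nodeA:8
```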
c) The memory on those clusters is very small; whoever bought them was
not thinking about large-scale jobs. I look for at least 4 GB/core, and
2 GB/core is barely acceptable. You are going to have to use MPI.
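Using the numbers from the question below, the per-core memory works out
like this (a quick sketch):

```python
# Memory per core for the two clusters described in the question below.
clusters = {
    "A": {"mem_gb": 8, "cores": 8},    # 8 GB, 8 cores per node
    "B": {"mem_gb": 24, "cores": 14},  # 24 GB, 14 cores per node
}
for name, c in clusters.items():
    per_core = c["mem_gb"] / c["cores"]
    print(f"cluster {name}: {per_core:.2f} GB/core")
# Both fall below the 2 GB/core that is barely acceptable.
```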
d) All MPI is equal, but some MPI is more equal than others. Depending
upon whether you have InfiniBand or Ethernet, OpenMPI or Intel MPI, and
how everything was compiled, you can see enormous differences. One thing
to look at is the difference between the CPU time and the wall time
(both are in case.dayfile and at the bottom of case.output1_*). With a
good MPI setup the wall time should be 5-10% more than the CPU time;
with a bad setup it can be several times it.
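A quick way to quantify this (a sketch; read the actual times off
case.dayfile or the end of case.output1_*, the numbers here are made up):

```python
def mpi_overhead_pct(cpu_s, wall_s):
    """Percent by which wall time exceeds CPU time."""
    return 100.0 * (wall_s - cpu_s) / cpu_s

# Hypothetical timings: a good setup vs a bad one.
good = mpi_overhead_pct(cpu_s=3600.0, wall_s=3850.0)   # ~7% overhead
bad = mpi_overhead_pct(cpu_s=3600.0, wall_s=28800.0)   # wall is 8x CPU
print(f"good setup: {good:.0f}% overhead; bad setup: {bad:.0f}% overhead")
```

Anything in the range of the "bad" case points at the interconnect or
the MPI build rather than the CPUs themselves.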
On Thu, Oct 17, 2013 at 8:44 AM, Yundi Quan <quanyundi at gmail.com> wrote:
> I have access to two clusters as a low-level user. One cluster (cluster A)
> consists of nodes with 8 cores and 8 GB of memory per node. The other cluster
> (cluster B) has 24 GB of memory per node, and each node has 14 cores or more.
> The cores on cluster A are Xeon CPU E5620 at 2.40GHz, while the cores on
> cluster B are Xeon CPU X5550 at 2.67GHz. From the specifications (2.40GHz +
> 12288 KB cache vs 2.67GHz + 8192 KB cache), the two machines should be very
> close in performance. But that does not seem to be so.
> I have a job with 72 atoms per unit cell. I initialized the job on cluster A
> and ran it for a few iterations. Each iteration took 2 hours. Then I moved
> the job to cluster B (14 cores per node at 2.67GHz). Now it takes more
> than 8 hours to finish one iteration. On both clusters, I request one core
> per node and 8 nodes per job (8 is the number of k-points). I compiled
> WIEN2k_13 on cluster A without MPI. On cluster B, WIEN2k_12 was compiled by
> the administrator with MPI.
> What could have caused the poor performance of cluster B? Is it because of MPI?
> On an unrelated note: sometimes memory runs out on cluster B, which
> has 24 GB of memory per node, yet the same job runs smoothly on cluster
> A, which has only 8 GB per node.
Professor Laurence Marks
Department of Materials Science and Engineering
"Research is to see what everybody else has seen, and to think what
nobody else has thought"