<div dir="ltr">Hi,<div>I have access to two clusters as a low-level user. One cluster (cluster A) consists of nodes with 8 cores and 8 GB of memory per node. The other cluster (cluster B) has 24 GB of memory per node, and each node has 14 or more cores. The cores on cluster A are Xeon E5620 CPUs @ 2.40GHz, while the cores on cluster B are Xeon X5550 CPUs @ 2.67GHz. From the specifications (2.40GHz + 12288 KB cache vs. 2.67GHz + 8192 KB cache), the two machines should be very close in performance, but that does not seem to be the case.</div>
<div><br></div><div>I have a job with 72 atoms per unit cell. I initialized the job on cluster A and ran it for a few iterations; each iteration took 2 hours. Then I moved the job to cluster B (14 cores per node at 2.67GHz), where one iteration now takes more than 8 hours. On both clusters, I request one core per node and 8 nodes per job (8 is the number of k-points). I compiled WIEN2k_13 on cluster A without MPI. On cluster B, WIEN2k_12 was compiled by the administrator with MPI.</div>
<div><br></div><div>What could have caused the poor performance on cluster B? Is it because of MPI?</div><div><br></div><div>On an unrelated note: the job sometimes runs out of memory on cluster B, which has 24 GB of memory per node, yet the same job runs smoothly on cluster A, which has only 8 GB per node.</div>
<div><br></div><div>Thanks.</div></div>