[Wien] What type of CPU's for big calculations?
fecher at uni-mainz.de
Sat Nov 11 09:10:00 CET 2006
Hi Peter, just some other questions.
Do you suggest that a cluster built from Core 2 Duo (E6700) machines is better than a cluster of dual-core Xeons (5160, 5080, 7xxx)?
In the first case you combine 2-CPU systems; in the second you can combine, at roughly the same price, 4 CPUs per machine (or more if you want to go
for an expensive solution).
The Itanium CPUs are real 64-bit processors. Is there any Fortran compiler that breaks
the 2 GB barrier on the pseudo-64-bit CPUs (Intel or AMD)?
The last point is right: 2 calculations in parallel with OMP=1 are faster than 2 consecutive runs with OMP=2.
I played with it, with up to 4 calculations both in parallel and in serial.
From: wien-bounces at zeus.theochem.tuwien.ac.at on behalf of Peter Blaha
Sent: Sat 11.11.2006 08:55
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] What type of CPU's for big calculations?
> 1) Has anyone tested big cases (e.g. matrix sizes more than 14000,
> perhaps as high as 25000) on the newer CPU's with more RAM?
There is not much size dependency, and definitely TODAY Intel's Core 2 Duo
E6X00 processors are best.
However, for systems with NMAT=25000 you will hardly need any
k-point-parallelism AND you will RUN OUT OF MEMORY (in the complex
version more than 13GB). So you would have to use the mpi version.
For such a case I'd consider to get cpu time on one of the big
"computercenters", which have Infiniband-based clusters for such type of
calculations. (Infiniband is quite expensive and you don't need it for
PS: Of course, soon quadcore CPUs come out ...
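As a rough check on the memory figure quoted above for NMAT=25000, the following minimal sketch computes the size of a single dense double-precision matrix of that order. The function name `dense_matrix_bytes` is hypothetical, and how WIEN2k actually lays out its matrices is not modeled here; this only shows that one complex NMAT x NMAT array is already about 10 GB, so a solver that holds more than one array of this order would plausibly exceed the 13 GB mentioned.

```python
def dense_matrix_bytes(nmat, complex_valued=True):
    """Bytes needed for one dense nmat x nmat double-precision matrix.

    Assumption: 8 bytes per real element, 16 bytes per complex element.
    """
    bytes_per_element = 16 if complex_valued else 8
    return nmat * nmat * bytes_per_element


if __name__ == "__main__":
    for nmat in (14000, 25000):
        real_gb = dense_matrix_bytes(nmat, complex_valued=False) / 1e9
        cplx_gb = dense_matrix_bytes(nmat, complex_valued=True) / 1e9
        print(f"NMAT={nmat}: real {real_gb:.2f} GB, complex {cplx_gb:.2f} GB")
```

The numbers are per matrix only; workspace and any further arrays come on top.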
> 2) Does one have to have a 64bit system for big cases.
Can you still buy a 32 bit system? And of course: yes, for big cases you
definitely need a 64 bit system (with 32 bit you are limited to 2 GB
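To put the 2 GB limit into matrix sizes, here is a small sketch that finds the largest dense square matrix fitting into a given number of bytes. It assumes the limit is exactly 2**31 bytes and that the whole limit is available to one array, which is optimistic; `max_dense_order` is a hypothetical helper name.

```python
import math


def max_dense_order(limit_bytes, bytes_per_element):
    """Largest n such that an n x n dense matrix fits into limit_bytes."""
    return math.isqrt(limit_bytes // bytes_per_element)


if __name__ == "__main__":
    limit = 2**31  # assumed 2 GB limit of a 32-bit address space
    print("real (8 bytes):    ", max_dense_order(limit, 8))
    print("complex (16 bytes):", max_dense_order(limit, 16))
```

Under these assumptions a single real matrix tops out near order 16000 and a complex one near 11500, well below the matrix sizes asked about.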
> 3) Does threading the cores really matter with big matrices
> (OMP_NUM_THREADS=2) ?
Switch it on if you run only one job on a dual core node (you gain, but
of course not a factor of two)
Switch it off, when running 2 k-point-parallel jobs on the same node.
You lose a bit because of memory constraints.
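The rule above (threads on for one job per dual-core node, off for two k-point-parallel jobs) can be sketched as a small helper that picks OMP_NUM_THREADS from the number of cores and the number of jobs sharing a node. The function names are hypothetical and this is not part of WIEN2k; it only builds an environment dictionary that could be passed to a launched job.

```python
import os


def omp_threads_per_job(cores_per_node, jobs_per_node):
    """Threads per job so that jobs * threads does not exceed the cores."""
    if cores_per_node < 1 or jobs_per_node < 1:
        raise ValueError("cores_per_node and jobs_per_node must be >= 1")
    return max(1, cores_per_node // jobs_per_node)


def job_environment(cores_per_node, jobs_per_node):
    """Copy of the current environment with OMP_NUM_THREADS set."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(omp_threads_per_job(cores_per_node, jobs_per_node))
    return env


if __name__ == "__main__":
    # One job on a dual-core node: use both cores via threading.
    print(job_environment(2, 1)["OMP_NUM_THREADS"])
    # Two k-point-parallel jobs on the same node: one thread each.
    print(job_environment(2, 2)["OMP_NUM_THREADS"])
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)` when starting each job.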
> 4) What about the specifics of the memory architecture (shared versus
> independent) and the L2 cache (size and whether it is shared)?
Not particularly important. All big jobs run out of cache anyway. Thus it
is only a matter of how often you need to load data into the cache, but our
blocked algorithm leads to almost cache-size independence. (It
matters of course for "small" programs, which may or may not fit entirely in the
cache.)
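To illustrate what "blocked" means here, the following is a generic sketch of a blocked (tiled) matrix multiplication in plain Python. It is not the WIEN2k code, and the block size of 2 is an arbitrary assumption for the toy example; the point is only that each small tile of the operands is reused many times while it is resident, so the work done per load from main memory depends on the tile size rather than on the total matrix size.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x m) and b (m x p) tile by tile."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Loop over tiles; each (i0, k0, j0) triple touches only small sub-blocks.
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))
```

In production codes the same idea is normally delegated to an optimized BLAS rather than written by hand.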
> 5) Has anyone tried running two jobs each with two threads to a two
> CPU dual-core machine (four effective cores) -- or does the OS get
> confused about who does what
"Hyperthreading" is a big "flop" (at least for high performance computing).
Try to avoid that and run either 1 k-point + OMP=2 or 2 k-points and OMP=1.