[Wien] Low cpu usage on open-mosix Linux cluster

Jorissen Kevin Kevin.Jorissen at ua.ac.be
Thu Oct 21 12:49:55 CEST 2004


Hello Enrico,
 
here are the scripts.  Some words of warning :
 
* They have only been tested on my machine, so there is a good chance they will not work straight away on yours.  If you have problems, please run the script with the option -xf in its first line (instead of -f), capture the output in a file, and send me that file (together with .machines and the relevant part of the dayfile).
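For illustration, assuming the script begins with the usual '#!/bin/csh -f' line (the output file name below is just a placeholder), this amounts to changing the first line of, e.g., lapw1para from

#!/bin/csh -f

to

#!/bin/csh -xf

so that csh echoes every command it executes, and then capturing the trace of the parallel run, e.g.

run -p >& debug_trace.txt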
 
* I have adapted lapw1para, lapw2para and lapwsopara, so you may run 'run -p', 'run -p -so', 'runsp -p', or 'runsp -p -so'.  I have never worried about, e.g., lapwdmpara.
 
* I never use residue machines.  They may work, or they may not.
 
* The scripts are designed to use local scratch.  They will also work for global (NFS-mounted) scratch; however, in that case the default scripts (from the WIEN installation) are more flexible and are therefore recommended.
 
* Before you use the scripts, edit each of them.  You will find lines like:
sed -e 's%/data/scratch/jorissen%/scratch/jorissen%g' dnlapw1_$loop.def > tempdef
What happens here?  The sed command changes the lapw1(,2,so)_i.def file.  Normally some files (vector, help) would be in the global scratch on the server (/data/scratch/jorissen in my case; this is the value of $SCRATCH in the .cshrc or .bashrc file).  This path is transformed to /scratch/jorissen, which is the path of the local scratch on a node.
Obviously, you have to edit these statements and replace the 'jorissen' paths with whatever is relevant for you.
 
If the scratch paths are the same on server and nodes, this is no problem (you might comment out the sed commands, but they don't hurt).
All nodes, however, must use the same scratch path.
If you use $SCRATCH = . on the server, then you will have to change the sed commands (i.e., something like mapping /pwd/case.vector to /localscratch/case.vector), since you cannot expect the input files to be found on the nodes.
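For illustration only (all paths below are placeholders; substitute your own), the edited command for the usual global-to-local mapping might then read

sed -e 's%/data/scratch/yourname%/scratch/yourname%g' dnlapw1_$loop.def > tempdef

and, for the $SCRATCH = . situation, something along the lines of

sed -e 's%/pwd/case.vector%/localscratch/case.vector%g' dnlapw1_$loop.def > tempdef

(and similarly for the help files), so that only the scratch files are redirected to the node-local disk while the other inputs stay in the working directory.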
 
* The qtl section of lapw2para has been extended, so that when you run 'x lapw2 -p -qtl' it knows exactly where to take the vector files from.
 
* The treatment of the .machines file is somewhat different, because these scripts allow extra lines that specify that certain jobs should also be done on a node; e.g., you could have a line
qtl:node-1
in your .machines file, which will cause lapw2 -qtl to be executed on a node; equivalently, you could have
lapw0 : node-1
in your .machines file.  While the standard lapw0para runs either with MPI or just 'serially' on the server, you can now use this to redirect a lapw0 job without MPI to another machine.  This can be useful when many calculations run at the same time on a system with many nodes and the lapw0 runs on the server become a bottleneck (e.g., you have big cases with few k-points, so you can run more jobs, but the lapw0 runs also get heavier).  [I have a similar extension to x_lapw, though this is less useful since, as Peter Blaha commented some time ago, it is very easy to launch single programs remotely from the command line.]
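As an illustration (the node names and the k-point distribution below are invented; the qtl and lapw0 lines are the extensions described above, the rest is the usual .machines syntax), such a .machines file could look like

granularity:1
1:node-1
1:node-1
1:node-2
1:node-2
lapw0:node-1
qtl:node-2

where the four '1:' lines distribute the k-points over two dual-processor nodes, the serial lapw0 is redirected to node-1, and lapw2 -qtl is executed on node-2.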
 
 
I hope there are no bugs and things will work just fine ...
 
Good luck,
 
 
 
 
 
Kevin Jorissen
 
EMAT - Electron Microscopy for Materials Science   (http://webhost.ua.ac.be/emat/)
Dept. of Physics
 
UA - Universiteit Antwerpen
Groenenborgerlaan 171
B-2020 Antwerpen
Belgium
 
tel  +32 3 2653249
fax +32 3 2653257
e-mail kevin.jorissen at ua.ac.be
 

________________________________

From: wien-admin at zeus.theochem.tuwien.ac.at on behalf of EB Lombardi
Sent: Wed 20-10-2004 12:02
To: wien at zeus.theochem.tuwien.ac.at
Subject: Re: [Wien] Low cpu usage on open-mosix Linux cluster



Dear Torsten and Kevin

Thank you for your suggestions.

So it seems that as long as the number of jobs <= number of processors
(per node), Wien should be able to run unmodified with local scratch
partitions. (PS: I mostly run spin-polarized cases.)

To Kevin: I'd appreciate it if I could try your scripts.

Thank you

Enrico



Jorissen Kevin wrote:

>There are two ways around it:
>
>- Like Stefaan does: fix everything by specifying the weights in .machines such that all the k-points are distributed and lapw1/2/so/para has no choice anymore
>- Like I do: reprogram lapw1/2/so/para so that they read from .processes exactly which nodes were used in the previous job, and then force the current job to do the same
>
>Both situations are less flexible than the NFS solution, but the network communication is reduced drastically, and the hard disks that are present in the nodes anyway get some exercise.
>
>If you want to try out my scripts, just let me know.
>
>
>
>Kevin Jorissen
>
>EMAT - Electron Microscopy for Materials Science   (http://webhost.ua.ac.be/emat/)
>Dept. of Physics
>
>UA - Universiteit Antwerpen
>Groenenborgerlaan 171
>B-2020 Antwerpen
>Belgium
>
>tel  +32 3 2653249
>fax + 32 3 2653257
>e-mail kevin.jorissen at ua.ac.be
>
>
>________________________________
>
>From: wien-admin at zeus.theochem.tuwien.ac.at on behalf of EB Lombardi
>Sent: Tue 19-10-2004 12:31
>To: wien at zeus.theochem.tuwien.ac.at
>Subject: Re: [Wien] Low cpu usage on open-mosix Linux cluster
>
>
>
>Dear Dr Andersen
>
>Thank you for your e-mail
>
>Up to now I have been using the NFS-mounted "case" directory as the
>working directory, so what you wrote about NFS-mounted scratch
>directories also applies here.
>To check, I ran a test with Wien running only on the home node (i.e. no
>slow networks involved), which resulted in both lapw1 and lapw2 running
>at 99%.
>
>I have a question regarding local scratch partitions: suppose lapw1 has
>run on dual-processor nodes 1 and 2 (i.e., k-point parallel over 4
>processors), leaving case.vector_1 and case.vector_2 on the scratch
>partition of node 1, while case.vector_3 and case.vector_4 are left on
>node 2. When lapw2 runs, it cannot be guaranteed that the lapw2
>processes will be distributed among the nodes in the same order as the
>lapw1 processes were.  Hence the "first" lapw2 job may well run on
>node 2, but will not find case.vector_1 there. I assume this would lead
>lapw2 to crash? If so, is there any way to work around this?
>
>About the configuration of the machine: it is a group of dual-processor
>PIII, PIV and Xeon machines, grouped together as a mosix cluster using
>Linux version 2.4.22-openmosix2smp. Each node has 4 GB of RAM, with an
>NFS-mounted file system.
>
>Thank you.
>
>Best regards
>
>Enrico
>
>Torsten Andersen wrote:
>
> 
>
>>Dear Mr. Lombardi,
>>
>>well, at least for lapw2, a fast file system (15k-RPM local disks with
>>huge caches and hardware-based RAID-0) is essential to utilizing more
>>than 1% of the CPU time... and if more than one process wants to
>>access the same file system at the same time (e.g., parallel lapw2),
>>this requirement becomes even more essential.
>>
>>If you have problems getting lapw1 to run at 100% CPU time, the system
>>seems to be seriously misconfigured. I can think of two (there might
>>be more) problems in the setup:
>>
>>1. The scratch partition is NFS-mounted instead of local (and despite
>>many manufacturers' claims to the contrary, networked file systems are
>>still VERY SLOW compared to local disks).
>>
>>2. The system memory bandwidth is too slow, e.g., using DDR-266 with
>>Xeons, or the memory is only connected to one CPU on Opterons.
>>
>>In order to "diagnose" a little better we need to know the
>>configuration in detail:-)
>>
>>Best regards,
>>Torsten Andersen.
>>
>>EB Lombardi wrote:
>>
>>   
>>
>>>Dear Wien users
>>>
>>>When I run Wien2k on a Linux-openMosix cluster, lapw1 and lapw2
>>>(k-point parallel) processes mostly use a low percentage of the
>>>available CPU time. Typically only 10-50% of each processor is used,
>>>with values below 10% and above 90% also occurring. On the other hand,
>>>single processes, such as lapw0, etc., typically use 99.9% of the
>>>power of one processor. On each node, (number of jobs) = (number of
>>>processors).
>>>
>>>This low CPU utilization does not occur on a dual-processor Linux
>>>machine, where CPU utilization is mostly 99.9%.
>>>
>>>Any suggestions on improving the CPU utilization of lapw1c and lapw2
>>>on mosix clusters would be appreciated.
>>>
>>>Regards
>>>
>>>Enrico Lombardi
>>>     
>>>
>>   
>>
>

_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 64537 bytes
Desc: not available
Url : http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20041021/388c2d7e/attachment.bin

