[Wien] Low cpu usage on open-mosix Linux cluster
    Torsten Andersen 
    thor at physik.uni-kl.de
       
    Tue Oct 19 13:18:12 CEST 2004

Dear Mr. Lombardi,
EB Lombardi wrote:
> Dear Dr Andersen
> 
> Thank you for your e-mail
> 
> Up to now I have been using the NFS mounted "case" directory as working 
> directory - so what you wrote about NFS mounted scratch directories also 
> applies here.
> To check, I ran a test with Wien running only on the home node (i.e., no 
> slow networks involved), which resulted in both lapw1 and lapw2 running 
> at 99%.
So the NFS is the problem...
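One rough way to confirm this is to time a synced write in both candidate directories. The following is only a minimal sketch: the function name write_throughput_mb_s and the default sizes are made up for illustration, and a single sequential write is only a crude stand-in for what lapw2 does with the vector files.

```python
import os
import sys
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64, block_kb=1024):
    """Write about size_mb of zeros into directory, sync, and return MB/s.

    Hypothetical helper, not part of Wien2k. The temporary file is
    removed again afterwards.
    """
    block = b"\0" * (block_kb * 1024)
    n_blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out of the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mb / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Pass the directories to compare on the command line,
    # e.g. the NFS-mounted case directory and a local scratch directory.
    for directory in sys.argv[1:]:
        rate = write_throughput_mb_s(directory)
        print(f"{directory}: {rate:.1f} MB/s")
```

If the NFS-mounted directory comes out much slower than the local one on the same node, that is consistent with the low CPU usage being caused by I/O waits.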
> 
> I have a question regarding local scratch partitions. Suppose lapw1 has 
> run on dual-processor nodes 1 and 2 (i.e., k-point parallel over 4 
> processors), leaving case.vector_1 and case.vector_2 on the scratch 
> partition of node 1, and case.vector_3 and case.vector_4 on node 2. 
> When lapw2 runs, it cannot be guaranteed that the lapw2 processes will 
> be distributed among the nodes in the same order as the lapw1 processes 
> were. Hence the "first" lapw2 job may well run on node 2, but will not 
> find case.vector_1 there. I assume this would lead lapw2 to crash? If 
> so, is there any way to work around this?
In newer versions of Wien2k (at least in Wien2k_02 and up), lapw1, 
lapwso, and lapw2 are "synchronized" with respect to the machines. See 
the .machine* files in the case directory. The only constraint is that 
the scratch directory must be in the same location on all your machines 
(since your .cshrc, .login, and .profile are the same on all nodes; of 
course this can be tuned individually, but...).
> 
> About the configuration of the machine: it is a group of dual processor 
> PIII, PIV and Xeon machines, grouped together as a mosix cluster using 
> Linux version 2.4.22-openmosix2smp. Each node has 4GB RAM, with an NFS 
> mounted file system.
If you change the scratch partitions to local, everything should be ok.
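To see whether a given scratch directory really is local, you can look up its mount in /proc/mounts. A minimal sketch, assuming a Linux system; the function names and the list of network file system types are my own choices and not exhaustive, and symbolic links in the path are not resolved.

```python
import os

# Illustrative, incomplete set of network file system types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains path.

    mounts_text is text in the /proc/mounts format, where each line
    starts with: device, mount point, file system type.
    """
    path = os.path.abspath(path) + "/"
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest matching mount point; on a tie the later
        # entry wins, since it was mounted on top of the earlier one.
        if path.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_network_fs(path, mounts_text):
    """True if path lives on one of the listed network file systems."""
    return fs_type_for(path, mounts_text) in NETWORK_FS_TYPES


def scratch_is_network(path):
    """Convenience wrapper that reads the real /proc/mounts (Linux only)."""
    with open("/proc/mounts") as f:
        return is_network_fs(path, f.read())
```

For example, scratch_is_network("/scratch") should return False on every node once the scratch partitions are local (the path /scratch is just a placeholder here).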
Best regards,
Torsten Andersen.
> 
> Thank you.
> 
> Best regards
> 
> Enrico
> 
> Torsten Andersen wrote:
> 
>> Dear Mr. Lombardi,
>>
>> well, at least for lapw2, a fast file system (15k-RPM local disks with 
>> huge caches and hardware-based RAID-0) is essential to utilizing more 
>> than 1% of the CPU time... and if more than one process wants to 
>> access the same file system at the same time (e.g., parallel lapw2), 
>> this requirement becomes even more essential.
>>
>> If you have problems getting lapw1 to run at 100% CPU time, the system 
>> seems to be seriously misconfigured. I can think of two possible 
>> problems in the setup (there might be more):
>>
>> 1. The scratch partition is NFS-mounted instead of local (and despite 
>> many manufacturers' claims to the contrary, networked file systems are 
>> still VERY SLOW compared to local disks).
>>
>> 2. The system memory bandwidth is too low, e.g., using DDR-266 with 
>> Xeons, or the memory is only connected to one CPU on Opterons.
>>
>> In order to "diagnose" a little better, we need to know the 
>> configuration in detail :-)
>>
>> Best regards,
>> Torsten Andersen.
>>
>> EB Lombardi wrote:
>>
>>> Dear Wien users
>>>
>>> When I run Wien2k on a Linux-openMosix cluster, lapw1 and lapw2 
>>> (k-point parallel) processes mostly use a low percentage of the 
>>> available CPU time. Typically only 10-50% of each processor is used, 
>>> with values below 10% and above 90% also occurring. On the other hand, 
>>> single processes, such as lapw0, typically use 99.9% of one 
>>> processor. On each node, (number of jobs) = (number of 
>>> processors).
>>>
>>> This low CPU utilization does not occur on a dual-processor Linux 
>>> machine, where CPU utilization is mostly 99.9%.
>>>
>>> Any suggestions on improving the CPU utilisation of lapw1c and lapw2 
>>> on mosix clusters would be appreciated.
>>>
>>> Regards
>>>
>>> Enrico Lombardi
-- 
Dr. Torsten Andersen        TA-web: http://deep.at/myspace/
AG Hübner, Department of Physics, Kaiserslautern University
http://cmt.physik.uni-kl.de    http://www.physik.uni-kl.de/
    
    