[Wien] inefficiency of lapw2 on clusters

Peter Blaha pblaha at theochem.tuwien.ac.at
Mon Dec 28 09:38:48 CET 2020


This points to a fileserver/network problem on your cluster.

On a big cluster one usually has a home directory on a fileserver drive, 
which is mounted over the network (e.g. NFS, HPFS, ...) on all nodes. While 
this is very convenient, on many systems it is also a huge bottleneck: 
either the fileserver breaks down under too many simultaneous 
requests, or the network does not have sufficient bandwidth.
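
A quick way to check whether a directory sits on such a network mount is 
to ask for its filesystem type. The following is only a sketch, assuming a 
Linux node with GNU coreutils stat; the helper names fs_type and 
classify_fs are made up for illustration, and the list of network 
filesystem names is illustrative, not exhaustive.

```shell
#!/bin/sh
# Print the filesystem type of a directory (GNU stat assumed);
# prints "unknown" if stat does not support these options.
fs_type() {
    stat -f -c %T "$1" 2>/dev/null || echo unknown
}

# Classify a filesystem type name as "network", "local" or "unknown".
# The pattern list is an assumption; extend it for your site.
classify_fs() {
    case "$1" in
        nfs*|gpfs|lustre|cifs|smb*|afs|ceph) echo network ;;
        unknown) echo unknown ;;
        *) echo local ;;
    esac
}

t=$(fs_type .)
echo "working directory: type=$t ($(classify_fs "$t"))"
```

Run it once in your case directory and once in the candidate scratch 
directory; if the first reports "network" and the second "local", the 
explanation above is at least plausible for your setup.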

You probably have set the SCRATCH variable to your working directory 
"./", which means that the large case.vector* files are stored in your 
working directory. lapw1 writes them and lapw2 has to read them back 
over the network, which explains the poor performance of lapw2.

On most clusters each node has a local scratch or tmp directory (its 
files exist only on that node). If you set your SCRATCH variable to it, 
the slowdown of lapw2 should disappear.
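
In a job script this could look roughly like the sketch below. The 
candidate paths (/scratch, /local, /tmp) and the wien2k_<uid> 
subdirectory name are assumptions; your cluster documentation will tell 
you where the node-local disk really is.

```shell
#!/bin/sh
# Pick the first writable node-local directory and point SCRATCH at it.
# Candidate paths are assumptions, not WIEN2k defaults.
SCRATCH=""
for d in "${TMPDIR:-}" /scratch /local /tmp; do
    [ -n "$d" ] && [ -d "$d" ] && [ -w "$d" ] || continue
    cand="$d/wien2k_$(id -u)"      # per-user subdirectory (hypothetical name)
    if mkdir -p "$cand" 2>/dev/null; then
        SCRATCH="$cand"
        break
    fi
done

if [ -n "$SCRATCH" ]; then
    export SCRATCH
    echo "SCRATCH=$SCRATCH"
else
    echo "no writable local scratch directory found" >&2
fi
```

In csh or tcsh the equivalent of the export line is "setenv SCRATCH 
<path>". Note that files in a node-local directory are visible only on 
that node, so lapw1 and lapw2 must run with the same distribution of 
k-points over the nodes.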

In addition, I see that lapw1 also gets only about 90% of a core. I 
recommend setting OMP_NUM_THREADS=2 and spawning only 4 k-parallel jobs 
(I assume you have a node with 8 cores).
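
For a single 8-core node this could be set up roughly as follows. It is 
a sketch only: it writes to a temporary file instead of overwriting your 
.machines, uses the name of the current host, and assumes the usual 
k-point parallel layout of .machines (one "1:host" line per job, followed 
by the granularity and extrafine lines). Whether the exported 
OMP_NUM_THREADS actually reaches the lapw1 jobs depends on how your 
installation launches them.

```shell
#!/bin/sh
# 4 k-parallel jobs x 2 OpenMP threads = 8 cores (assumed node size).
host=$(uname -n)
njobs=4
machines=$(mktemp)          # temporary file; copy to ./.machines after checking

{
    i=0
    while [ "$i" -lt "$njobs" ]; do
        echo "1:$host"      # one k-point parallel job on this host
        i=$((i + 1))
    done
    echo "granularity:1"
    echo "extrafine:1"
} > "$machines"

export OMP_NUM_THREADS=2    # two threads per k-parallel job
cat "$machines"
```

After checking the output, copy the file to .machines in the case 
directory and rerun the parallel cycle; the per-job CPU percentage of 
lapw1 should then be compared with the roughly 92% seen before.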


On 12/28/20 9:14 AM, Fan wrote:
> Dear wien2k users,
> 
> I am trying to run wien2k on clusters, but I encountered a very strange 
> issue: lapw2 runs very inefficiently. For example,
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>  >   lapw1  -dn -p       (15:27:45) starting parallel lapw1 at Mon Dec 28 15:27:46 CST 2020
> ->  starting parallel LAPW1 jobs at Mon Dec 28 15:27:46 CST 2020
> running LAPW1 in parallel mode (using .machines.help)
> 8 number_of_parallel_jobs
>       f02n10(58) 178.212u 3.134s 3:15.91 92.57%      0+0k 0+0io 0pf+0w
>       f02n10(58) 180.210u 3.011s 3:19.46 91.86%      0+0k 0+0io 0pf+0w
>       f02n10(58) 183.239u 3.019s 3:22.73 91.87%      0+0k 0+0io 0pf+0w
>       f02n10(57) 181.113u 2.884s 3:20.20 91.90%      0+0k 0+0io 0pf+0w
>       f02n10(57) 178.433u 2.965s 3:18.85 91.22%      0+0k 0+0io 0pf+0w
>       f02n10(57) 151.420u 2.756s 2:48.26 91.63%      0+0k 0+0io 0pf+0w
>       f02n10(57) 183.799u 3.065s 3:22.58 92.24%      0+0k 0+0io 0pf+0w
>       f02n10(57) 185.867u 3.109s 3:27.50 91.07%      0+0k 0+0io 0pf+0w
>     Summary of lapw1para:
>     f02n10        k=459   user=1422.29    wallclock=1575.49
> 1.034u 1.698s 3:30.15 1.2%      0+0k 472+128io 2pf+0w
>  >   lapw2 -up -p        (15:31:16) running LAPW2 in parallel mode
>        f02n10 37.680u 1.597s 6:32.94 10.00% 0+0k 0+0io 0pf+0w
>        f02n10 38.841u 1.682s 7:07.57 9.48% 0+0k 0+0io 0pf+0w
>        f02n10 38.611u 1.727s 6:51.53 9.80% 0+0k 0+0io 0pf+0w
>        f02n10 38.715u 1.728s 6:48.76 9.89% 0+0k 0+0io 0pf+0w
>        f02n10 37.847u 1.639s 7:01.08 9.38% 0+0k 0+0io 0pf+0w
>        f02n10 38.170u 1.709s 6:45.01 9.85% 0+0k 0+0io 0pf+0w
>        f02n10 39.261u 1.727s 7:01.11 9.73% 0+0k 0+0io 0pf+0w
>        f02n10 39.772u 1.765s 7:04.40 9.79% 0+0k 0+0io 0pf+0w
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
> As you can see, the CPU utilization of lapw2 is much lower than that of 
> lapw1, which makes lapw2 more time-consuming than lapw1. MPI 
> parallelization performed even worse.
> 
> More strangely, it seems to be case-dependent. For TiC and some other 
> systems, it works fine. I also tried other nodes, but the problem 
> persists; on my local workstation, however, everything works well.
> 
> The version of wien2k is 19.2, compiled with Intel icc and ifort without 
> any errors. MPI, FFTW, and ELPA are all available.
> 
>   Any suggestion will be appreciated.
> 
> Fan

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------

