[Wien] inefficiency of lapw2 on clusters

Fan fxinwei123 at gmail.com
Mon Dec 28 09:14:11 CET 2020


Dear WIEN2k users,

I am trying to run WIEN2k on a cluster, but I have run into a strange
issue: lapw2 runs very inefficiently. For example,

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
>   lapw1  -dn -p       (15:27:45) starting parallel lapw1 at Mon Dec 28 15:27:46 CST 2020
->  starting parallel LAPW1 jobs at Mon Dec 28 15:27:46 CST 2020
running LAPW1 in parallel mode (using .machines.help)
8 number_of_parallel_jobs
     f02n10(58) 178.212u 3.134s 3:15.91 92.57%      0+0k 0+0io 0pf+0w
     f02n10(58) 180.210u 3.011s 3:19.46 91.86%      0+0k 0+0io 0pf+0w
     f02n10(58) 183.239u 3.019s 3:22.73 91.87%      0+0k 0+0io 0pf+0w
     f02n10(57) 181.113u 2.884s 3:20.20 91.90%      0+0k 0+0io 0pf+0w
     f02n10(57) 178.433u 2.965s 3:18.85 91.22%      0+0k 0+0io 0pf+0w
     f02n10(57) 151.420u 2.756s 2:48.26 91.63%      0+0k 0+0io 0pf+0w
     f02n10(57) 183.799u 3.065s 3:22.58 92.24%      0+0k 0+0io 0pf+0w
     f02n10(57) 185.867u 3.109s 3:27.50 91.07%      0+0k 0+0io 0pf+0w
   Summary of lapw1para:
   f02n10        k=459   user=1422.29    wallclock=1575.49
1.034u 1.698s 3:30.15 1.2%      0+0k 472+128io 2pf+0w
>   lapw2 -up -p        (15:31:16) running LAPW2 in parallel mode
      f02n10 37.680u 1.597s 6:32.94 10.00% 0+0k 0+0io 0pf+0w
      f02n10 38.841u 1.682s 7:07.57 9.48% 0+0k 0+0io 0pf+0w
      f02n10 38.611u 1.727s 6:51.53 9.80% 0+0k 0+0io 0pf+0w
      f02n10 38.715u 1.728s 6:48.76 9.89% 0+0k 0+0io 0pf+0w
      f02n10 37.847u 1.639s 7:01.08 9.38% 0+0k 0+0io 0pf+0w
      f02n10 38.170u 1.709s 6:45.01 9.85% 0+0k 0+0io 0pf+0w
      f02n10 39.261u 1.727s 7:01.11 9.73% 0+0k 0+0io 0pf+0w
      f02n10 39.772u 1.765s 7:04.40 9.79% 0+0k 0+0io 0pf+0w

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
As you can see, the CPU utilization of lapw2 (about 10%) is much lower than
that of lapw1 (about 92%), which makes lapw2 take longer than lapw1 (about
7 minutes versus about 3.5 minutes of wallclock time per job). MPI
parallelization performed even worse.
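
To make the comparison concrete: the percentage in each timing line is
(user + system) / wallclock. Below is a minimal Python sketch that recomputes
it, assuming the csh-style `time` format shown in the log above. The function
names are my own and are not part of WIEN2k.

```python
import re

# csh-style `time` fields: user CPU ("u"), system CPU ("s"), elapsed wallclock
# given as m:ss.xx or h:mm:ss (the hours part is optional).
TIME_RE = re.compile(
    r"(\d+(?:\.\d+)?)u\s+(\d+(?:\.\d+)?)s\s+(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)"
)


def parse_time_line(line):
    """Return (cpu_seconds, wall_seconds) for one timing line, or None."""
    m = TIME_RE.search(line)
    if m is None:
        return None
    user, system = float(m.group(1)), float(m.group(2))
    hours = int(m.group(3) or 0)
    wall = hours * 3600 + int(m.group(4)) * 60 + float(m.group(5))
    return user + system, wall


def cpu_utilization(line):
    """CPU utilization in percent: (user + system) / wallclock * 100."""
    parsed = parse_time_line(line)
    if parsed is None:
        return None
    cpu, wall = parsed
    if wall == 0:
        return None
    return 100.0 * cpu / wall


if __name__ == "__main__":
    # One line each from the lapw1 and lapw2 output quoted above.
    lapw1 = "f02n10(58) 178.212u 3.134s 3:15.91 92.57%      0+0k 0+0io 0pf+0w"
    lapw2 = "f02n10 37.680u 1.597s 6:32.94 10.00% 0+0k 0+0io 0pf+0w"
    for name, line in (("lapw1", lapw1), ("lapw2", lapw2)):
        cpu, wall = parse_time_line(line)
        print(f"{name}: cpu={cpu:.1f}s wall={wall:.1f}s "
              f"util={cpu_utilization(line):.1f}% not-on-cpu={wall - cpu:.1f}s")
```

For the first lapw2 line this gives about 10%, meaning that roughly 354 of the
393 wallclock seconds are counted neither as user nor as system CPU time. The
sketch only quantifies that gap; it does not say what the processes are
waiting on.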

More strangely, the problem seems to be case-dependent: for TiC and some
other systems, lapw2 works fine. I also tried other nodes, and the problem
persists; on my local workstation, however, everything runs well.

The WIEN2k version is 19.2, compiled with Intel icc and ifort without any
errors. MPI, FFTW, and ELPA are all available.

Any suggestions would be appreciated.

Fan