[Wien] inefficiency of lapw2 on clusters

Laurence Marks laurence.marks at gmail.com
Mon Dec 28 12:12:56 CET 2020


While the server is probably the issue, two other things to check:
a) lapw2 can require more memory, so swapping issues are not impossible.
b) How have you set omp_lapw2?
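To check point (a), a quick look at swap activity on the compute node while lapw2 runs can confirm or rule out swapping. A minimal sketch using standard Linux tools (run on the compute node itself; availability of vmstat is an assumption):

```shell
# Check whether the node is swapping while lapw2 runs.
# SwapTotal vs SwapFree from /proc/meminfo shows how much swap is in use;
# vmstat's si/so columns (swap-in/out per second) show *active* swapping --
# sustained non-zero values there mean lapw2 is hitting swap.
grep -E 'SwapTotal|SwapFree' /proc/meminfo
vmstat 1 5
```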

_____
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what nobody
else has thought", Albert Szent-Györgyi
www.numis.northwestern.edu

On Mon, Dec 28, 2020, 03:23 Fan <fxinwei123 at gmail.com> wrote:

> Thank you for your quick response. I believe that is the key to the
> problem. So with larger vector files, lapw2 runs slower in my
> situation. That makes sense.
>
> Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote on Mon, Dec 28, 2020 at 4:38 PM:
>
>> This points to a fileserver/network problem on your cluster.
>>
>> On a big cluster one usually has a home directory on a fileserver drive,
>> which is mounted over a network (e.g. NFS, HPFS, ...) on all nodes. While
>> this is very convenient, on many systems it is also a huge bottleneck:
>> either the fileserver breaks down due to too many simultaneous
>> requests, or the network does not have sufficient bandwidth.
>>
>> You have probably set the SCRATCH variable to your working directory
>> ("./"), which means that the large case.vector* files are stored on the
>> networked filesystem, leading to the bad performance of lapw2.
>>
>> On most clusters there is a local scratch or tmp directory (whose files
>> exist only on that node); setting your SCRATCH variable to it should
>> eliminate the slow lapw2.
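In a job script this could look like the following sketch; the node-local path ($TMPDIR, falling back to /tmp) is cluster-specific and an assumption here:

```shell
# Point WIEN2k's SCRATCH at a node-local directory so that the large
# case.vector* files are written to local disk instead of the NFS share.
# Using $TMPDIR (or /tmp) as the local scratch location is an assumption --
# substitute whatever local filesystem your cluster provides.
export SCRATCH="${TMPDIR:-/tmp}/wien2k_scratch_$$"
mkdir -p "$SCRATCH"
echo "SCRATCH set to $SCRATCH"
```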
>>
>> In addition, I see that lapw1 also gets only 90% of a core. I recommend
>> setting OMP_NUM_THREADS=2 and spawning only 4 k-parallel jobs
>> (I assume you have a node with 8 cores).
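Such a setup can be expressed in the .machines file. This is a hypothetical sketch for one 8-core node named f02n10 (taken from the log above); the omp_lapw1/omp_lapw2 directives require a reasonably recent WIEN2k version:

```
# .machines sketch: 4 k-parallel jobs x 2 OpenMP threads = 8 cores
omp_lapw1:2
omp_lapw2:2
1:f02n10
1:f02n10
1:f02n10
1:f02n10
granularity:1
```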
>>
>>
>> On 12/28/20 9:14 AM, Fan wrote:
>> > Dear wien2k users,
>> >
>> > I am trying to run WIEN2k on a cluster, but I have encountered a very
>> > strange issue: the performance of lapw2 is very poor. For example,
>> >
>> >
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >  >   lapw1  -dn -p       (15:27:45) starting parallel lapw1 at Mon Dec
>> > 28 15:27:46 CST 2020
>> > ->  starting parallel LAPW1 jobs at Mon Dec 28 15:27:46 CST 2020
>> > running LAPW1 in parallel mode (using .machines.help)
>> > 8 number_of_parallel_jobs
>> >       f02n10(58) 178.212u 3.134s 3:15.91 92.57%      0+0k 0+0io 0pf+0w
>> >       f02n10(58) 180.210u 3.011s 3:19.46 91.86%      0+0k 0+0io 0pf+0w
>> >       f02n10(58) 183.239u 3.019s 3:22.73 91.87%      0+0k 0+0io 0pf+0w
>> >       f02n10(57) 181.113u 2.884s 3:20.20 91.90%      0+0k 0+0io 0pf+0w
>> >       f02n10(57) 178.433u 2.965s 3:18.85 91.22%      0+0k 0+0io 0pf+0w
>> >       f02n10(57) 151.420u 2.756s 2:48.26 91.63%      0+0k 0+0io 0pf+0w
>> >       f02n10(57) 183.799u 3.065s 3:22.58 92.24%      0+0k 0+0io 0pf+0w
>> >       f02n10(57) 185.867u 3.109s 3:27.50 91.07%      0+0k 0+0io 0pf+0w
>> >     Summary of lapw1para:
>> >     f02n10        k=459   user=1422.29    wallclock=1575.49
>> > 1.034u 1.698s 3:30.15 1.2%      0+0k 472+128io 2pf+0w
>> >  >   lapw2 -up -p        (15:31:16) running LAPW2 in parallel mode
>> >        f02n10 37.680u 1.597s 6:32.94 10.00% 0+0k 0+0io 0pf+0w
>> >        f02n10 38.841u 1.682s 7:07.57 9.48% 0+0k 0+0io 0pf+0w
>> >        f02n10 38.611u 1.727s 6:51.53 9.80% 0+0k 0+0io 0pf+0w
>> >        f02n10 38.715u 1.728s 6:48.76 9.89% 0+0k 0+0io 0pf+0w
>> >        f02n10 37.847u 1.639s 7:01.08 9.38% 0+0k 0+0io 0pf+0w
>> >        f02n10 38.170u 1.709s 6:45.01 9.85% 0+0k 0+0io 0pf+0w
>> >        f02n10 39.261u 1.727s 7:01.11 9.73% 0+0k 0+0io 0pf+0w
>> >        f02n10 39.772u 1.765s 7:04.40 9.79% 0+0k 0+0io 0pf+0w
>> >
>> >
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>> > As you can see, the CPU utilization is much lower than that of lapw1,
>> > which makes lapw2 more time-consuming than lapw1. MPI parallelization
>> > performed even worse.
>> >
>> > More strangely, it seems to be case-dependent: for TiC and some other
>> > systems it works fine. I also tried other nodes and the problem still
>> > persists; on my local workstation, however, everything works well.
>> >
>> > The WIEN2k version is 19.2, compiled with Intel icc and ifort without
>> > any errors. MPI, FFTW, and ELPA are all available.
>> >
>> >   Any suggestions will be appreciated.
>> >
>> > Fan
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Wien mailing list
>> > Wien at zeus.theochem.tuwien.ac.at
>> > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> > SEARCH the MAILING-LIST at:
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>> >
>>
>> --
>>
>>                                        P.Blaha
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>> Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
>> WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
>> --------------------------------------------------------------------------
>>
>

