[Wien] Query about warning appearing in dayfile and editing of .machines file
Gavin Abo
gsabo at crimson.ua.edu
Sun Aug 25 15:18:37 CEST 2019
For hf calculations, I recommend using WIEN2k 19.1. Refer to what Prof.
Tran wrote in the post at:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18956.html
For the WARNING: VX .gt. +1.0, refer to what Prof. Blaha wrote in the
post at:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16782.html
Referring to the post at
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18967.html
and the .machines file in your post below, the .machines file you have is
set up only for k-point parallel lapw1/lapw2.
As a previous post [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg10367.html
] and the WIEN2k 19.1 usersguide [
http://www.wien2k.at/reg_user/textbooks/usersguide.pdf (section "5.5
Running programs in parallel mode" starting on page 84) ] show, the
.machines file can also be set up to run lapw0 in mpi-parallel mode, but
mpi has to be installed.
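For illustration, such a .machines file might look like the following.
This is only a sketch based on the usersguide section cited above: the
core count (4) and the use of localhost are assumptions matching the
laptop described in the post below, and an mpi-compiled lapw0_mpi must be
available for the lapw0 line to take effect:

```
# k-point parallel lapw1/lapw2 on 4 cores of the local machine
1:localhost
1:localhost
1:localhost
1:localhost
# run lapw0 mpi-parallel on 4 cores (requires an mpi-compiled lapw0_mpi)
lapw0:localhost:4
granularity:1
extrafine:1
```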
The user and wallclock values are timings of the scripts [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg04397.html
]. For example, in your post below, the following is for the
lapw1para_lapw script:
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
localhost(8) 8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w
localhost(7) 8.3u 0.2s 0:10.09 85.7% 0+0k 0+42288io 0pf+0w
localhost(7) 8.3u 0.2s 0:09.65 89.2% 0+0k 0+42896io 0pf+0w
localhost(7) 8.4u 0.2s 0:10.73 81.2% 0+0k 0+42472io 0pf+0w
Summary of lapw1para:
localhost k=29 user=33.5 wallclock=40.71
The u value should be the User, and I believe the wallclock is the Real
from the Linux time command [
https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1
].
The user and wallclock are the summed time from each of the parallel
processes:
8.5 + 8.3 + 8.3 + 8.4 = 33.5
0:10.24 + 0:10.09 + 0:09.65 + 0:10.73 = 40.71
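The sums above can be checked with a short Python sketch that parses the
per-process lines quoted from the dayfile (the line format is assumed
from the excerpt above):

```python
import re

# Per-process lapw1 lines copied from the dayfile excerpt above
dayfile_lines = [
    "localhost(8) 8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w",
    "localhost(7) 8.3u 0.2s 0:10.09 85.7% 0+0k 0+42288io 0pf+0w",
    "localhost(7) 8.3u 0.2s 0:09.65 89.2% 0+0k 0+42896io 0pf+0w",
    "localhost(7) 8.4u 0.2s 0:10.73 81.2% 0+0k 0+42472io 0pf+0w",
]

user_total = 0.0
wall_total = 0.0
for line in dayfile_lines:
    # "8.5u" -> user CPU seconds; "0:10.24" -> elapsed (real) time as m:ss.cc
    m = re.search(r"(\d+\.\d+)u\s+\d+\.\d+s\s+(\d+):(\d+\.\d+)", line)
    user_total += float(m.group(1))
    wall_total += 60 * int(m.group(2)) + float(m.group(3))

print(f"user={user_total:.1f} wallclock={wall_total:.2f}")
```

This reproduces the summary line user=33.5 wallclock=40.71.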
On 8/25/2019 2:08 AM, Peeyush kumar kamlesh wrote:
> Dear Sir,
> Greetings!
> I am using *wien2k_18 with an i3 processor (4 cores) laptop* and
> calculating electronic properties using the *hf potential in parallel mode
> with a non-reduced k mesh* of 550 k points. I got the following dayfile
> for cycle 1 of this calculation:
> ----------------------------------------------------------------------------------------------------------
> cycle 1 (Sat Aug 24 23:03:32 IST 2019) (40/99 to go)
>
> > lapw0 -grr -p (23:03:32) starting parallel lapw0 at Sat Aug 24 23:03:32 IST 2019
> -------- .machine0 : processors
> *running lapw0 in single mode*
> 9.8u 0.0s 0:10.11 98.4% 0+0k 3560+2824io 7pf+0w
> > lapw0 -p (23:03:42) starting parallel lapw0 at Sat Aug 24 23:03:42 IST 2019
> -------- .machine0 : processors
> *running lapw0 in single mode*
> *:WARNING: VX .gt. +1.0 6464.92409732206 13.9828285722624 *
> 6.0u 0.0s 0:06.08 99.8% 0+0k 0+824io 0pf+0w
> > lapw1 -p -c (23:03:48) starting parallel lapw1 at Sat Aug 24 23:03:48 IST 2019
> -> starting parallel LAPW1 jobs at Sat Aug 24 23:03:48 IST 2019
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
> localhost(8) 8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w
> localhost(7) 8.3u 0.2s 0:10.09 85.7% 0+0k 0+42288io 0pf+0w
> localhost(7) 8.3u 0.2s 0:09.65 89.2% 0+0k 0+42896io 0pf+0w
> localhost(7) 8.4u 0.2s 0:10.73 81.2% 0+0k 0+42472io 0pf+0w
> *Summary of lapw1para: localhost k=29 user=33.5 wallclock=40.71*
> 34.1u 1.2s 0:12.48 283.3% 0+0k 216+177112io 2pf+0w
> > lapw2 -fermi -c (23:04:02) 0.1u 0.0s 0:00.10 100.0% 0+0k 0+2440io 0pf+0w
> > lapw2 -p -c (23:04:02) running LAPW2 in parallel mode
> localhost 0.7u 0.0s 0:00.87 96.5% 0+0k 0+824io 0pf+0w
> localhost 0.7u 0.0s 0:00.86 90.6% 0+0k 0+720io 0pf+0w
> localhost 0.7u 0.0s 0:00.85 90.5% 0+0k 0+720io 0pf+0w
> localhost 0.6u 0.0s 0:00.74 93.2% 0+0k 0+720io 0pf+0w
> *Summary of lapw2para:*
> *localhost user=2.7 wallclock=3.32*
> 3.1u 0.4s 0:02.56 141.4% 0+0k 232+6488io 1pf+0w
> > lcore (23:04:04) 0.0u 0.0s 0:00.06 66.6% 0+0k 216+1808io 1pf+0w
> > hf -mode1 -p -c (23:04:05) running HF in parallel mode
> localhost 11519.0u 48.3s 3:21:07.82 95.8% 0+0k 1104+3392io 7pf+0w
> localhost 10668.4u 45.8s 3:06:41.43 95.6% 0+0k 200+3016io 1pf+0w
> localhost 10693.2u 48.1s 3:06:53.42 95.7% 0+0k 8+3040io 0pf+0w
> localhost 10782.2u 55.1s 3:08:35.84 95.7% 0+0k 8+3032io 0pf+0w
> *Summary of hfpara:*
> *localhost user=43662.8 wallclock=761*
> 43663.3u 197.6s 3:21:09.62 363.4% 0+0k 3224+24968io 16pf+0w
> > lapw2 -hf -p -c (02:25:14) running LAPW2 in parallel mode
> localhost 0.6u 0.0s 0:00.74 97.2% 0+0k 0+824io 0pf+0w
> localhost 0.6u 0.0s 0:00.68 97.0% 0+0k 0+720io 0pf+0w
> localhost 0.6u 0.0s 0:00.65 95.3% 0+0k 0+720io 0pf+0w
> localhost 0.5u 0.0s 0:00.63 88.8% 0+0k 0+720io 0pf+0w
> *Summary of lapw2para: localhost user=2.3 wallclock=2.7*
> 2.7u 0.3s 0:02.38 128.1% 0+0k 0+4320io 0pf+0w
> > lcore (02:25:17) 0.0u 0.0s 0:00.04 75.0% 0+0k 0+1808io 0pf+0w
> > mixer (02:25:17) 0.0u 0.0s 0:00.15 40.0% 0+0k 3640+1672io 13pf+0w
> :ENERGY convergence: 0 0.0001 .1745377450000000
> :CHARGE convergence: 0 0.0000 .1056782
> ec cc and fc_conv 0 1 1
> ---------------------------------------------------------------------------------------------------------
>
> *I have following queries:*
>
> _*1.* As we can see, a warning *(WARNING: VX .gt. +1.0
> 6464.92409732206 13.9828285722624)* appears here, and it increases in
> every subsequent cycle. I want to know why this appears here, and what
> is its effect on our results?_
> _*2.* We can also see that *lapw0 starts in single mode*, although I
> used the following .machines file for parallel execution:_
> ---------------------------------------------------------------------------------------------------------
> # .machines is the control file for parallel execution. Add lines like
> #
> # speed:machine_name
> #
> # for each machine, specifying their relative speed. For mpi
> parallelization use
> #
> # speed:machine_name:1 machine_name:1
> # lapw0:machine_name:1 machine_name:1
> #
> # further options are:
> #
> # granularity:number (for loadbalancing on irregularly used machines)
> # residue:machine_name (on shared memory machines)
> # extrafine (to distribute the remaining k-points one after
> the other)
> #
> # granularity sets the number of files that will be approximately
> # be generated by each processor; this is used for load-balancing.
> # On very homogeneous systems set number to 1
> # if after distributing the k-points to the various machines residual
> # k-points are left, they will be distributed to the
> residual-machine_name.
> #
> 100:localhost
> 100:localhost
> 100:localhost
> 100:localhost
> granularity:1
> extrafine:1
> ---------------------------------------------------------------------------------------------------------
> *3.* Is there any problem with editing the .machines file for parallel
> calculation so that lapw0 can start in parallel mode? Or is there any
> other, more suitable method for editing the .machines file for
> parallel calculations?
> *4.* I got following summaries in dayfile:
> *Summary of lapw1para:*
> *localhost k=29 user=33.5 wallclock=40.71*
> *Summary of lapw2para:*
> *localhost user=2.7 wallclock=3.32*
> *Summary of hfpara:*
> *localhost user=43662.8 wallclock=761*
> *Summary of lapw2para: localhost user=2.3 wallclock=2.7*
> What is the meaning of "user" and "wallclock" here, as it also changes in each summary?
>
> Best Regards
> Peeyush Kumar Kamlesh