[Wien] Query about warning appearing in dayfile and editing of .machines file

Gavin Abo gsabo at crimson.ua.edu
Sun Aug 25 15:18:37 CEST 2019


For hf calculations, I recommend using WIEN2k 19.1.  Refer to what Prof. 
Tran wrote in the post at:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18956.html

For the WARNING: VX .gt. +1.0, refer to what Prof. Blaha wrote in the 
post at:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16782.html

Referring to the post at

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18967.html

and the .machines file in your post below, your .machines file is set up 
to run lapw1/lapw2 k-point parallel.

As a previous post [ 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg10367.html 
] and the WIEN2k 19.1 usersguide [ 
http://www.wien2k.at/reg_user/textbooks/usersguide.pdf  (section "5.5 
Running programs in parallel mode" starting on page 84) ] show, the 
.machines file can also be set to run lapw0 mpi parallel, but mpi has 
to be installed.
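As a sketch only (assuming a single 4-core machine and that the mpi-parallel 
WIEN2k binaries are built and installed), adding a lapw0 line to the 
.machines file from your post could look like:

```
# hypothetical .machines: mpi-parallel lapw0 on 4 local cores
lapw0:localhost:4
# k-point parallel lapw1/lapw2, as in your current file
100:localhost
100:localhost
100:localhost
100:localhost
granularity:1
extrafine:1
```

Check section 5.5 of the usersguide for the exact lapw0 line syntax 
appropriate to your setup.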

The user and wallclock are timings of the scripts [ 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg04397.html 
].  For example, in your post below, the following is for the 
lapw1para_lapw script:

running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
      localhost(8) 8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w
      localhost(7) 8.3u 0.2s 0:10.09 85.7% 0+0k 0+42288io 0pf+0w
      localhost(7) 8.3u 0.2s 0:09.65 89.2% 0+0k 0+42896io 0pf+0w
      localhost(7) 8.4u 0.2s 0:10.73 81.2% 0+0k 0+42472io 0pf+0w
    Summary of lapw1para:
    localhost     k=29 user=33.5 wallclock=40.71

The u should be the User time, and I believe the wallclock is the Real 
time from the Linux time command [ 
https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1 
].
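As a sketch (the helper name and regex below are my own, not part of 
WIEN2k), each timing line can be split into its user, sys, and wallclock 
fields like this:

```python
import re

# Hypothetical helper: parse a csh-style timing line such as
# "8.5u 0.2s 0:10.24 85.9% ..." into (user, sys, real) seconds.
def parse_time_line(line):
    m = re.search(r"([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)", line)
    user, sys_t = float(m.group(1)), float(m.group(2))
    # the third field may be m:ss.ss or h:mm:ss.ss; convert to seconds
    parts = m.group(3).split(":")
    real = sum(float(p) * 60 ** i for i, p in enumerate(reversed(parts)))
    return user, sys_t, real

print(parse_time_line("8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w"))
# → (8.5, 0.2, 10.24)
```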

The user and wallclock are the summed times from each of the parallel 
processes:

8.5 + 8.3 + 8.3 + 8.4 = 33.5

0:10.24 + 0:10.09 + 0:09.65 + 0:10.73 = 40.71
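A minimal check of that arithmetic (values copied from the four lapw1 
lines in the dayfile above):

```python
# per-process user (u) and wallclock times from the four lapw1 lines
user_times = [8.5, 8.3, 8.3, 8.4]
wall_times = [10.24, 10.09, 9.65, 10.73]  # 0:10.24 etc., in seconds

print(f"user={sum(user_times):.1f} wallclock={sum(wall_times):.2f}")
# → user=33.5 wallclock=40.71, matching the lapw1para summary line
```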


On 8/25/2019 2:08 AM, Peeyush kumar kamlesh wrote:
> Dear Sir,
> Greetings!
> I am using *wien2k_18 on an i3 processor (4 cores) laptop* and 
> calculating electronic properties using the *hf potential in parallel mode 
> with a non-reduced k-mesh* of 550 k-points. I got the following dayfile for 
> cycle 1 of this calculation:
> ----------------------------------------------------------------------------------------------------------
> cycle 1 	(Sat Aug 24 23:03:32 IST 2019) 	(40/99 to go)
>
> >   lapw0 -grr -p	(23:03:32) starting parallel lapw0 at Sat Aug 24 23:03:32 IST 2019
> -------- .machine0 : processors
> *running lapw0 in single mode*
> 9.8u 0.0s 0:10.11 98.4% 0+0k 3560+2824io 7pf+0w
> >   lapw0  -p	(23:03:42) starting parallel lapw0 at Sat Aug 24 23:03:42 IST 2019
> -------- .machine0 : processors
> *running lapw0 in single mode*
>   *:WARNING: VX .gt. +1.0 6464.92409732206 13.9828285722624 *     
> 6.0u 0.0s 0:06.08 99.8% 0+0k 0+824io 0pf+0w
> >   lapw1  -p   -c 	(23:03:48) starting parallel lapw1 at Sat Aug 24 23:03:48 IST 2019
> ->  starting parallel LAPW1 jobs at Sat Aug 24 23:03:48 IST 2019
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
>       localhost(8) 8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w
>       localhost(7) 8.3u 0.2s 0:10.09 85.7% 0+0k 0+42288io 0pf+0w
>       localhost(7) 8.3u 0.2s 0:09.65 89.2% 0+0k 0+42896io 0pf+0w
>       localhost(7) 8.4u 0.2s 0:10.73 81.2% 0+0k 0+42472io 0pf+0w
>     *Summary of lapw1para: localhost k=29 user=33.5 wallclock=40.71*
> 34.1u 1.2s 0:12.48 283.3% 0+0k 216+177112io 2pf+0w
> >   lapw2 -fermi   -c  	(23:04:02) 0.1u 0.0s 0:00.10 100.0% 0+0k 0+2440io 0pf+0w
> >   lapw2 -p    -c 	(23:04:02) running LAPW2 in parallel mode
>        localhost 0.7u 0.0s 0:00.87 96.5% 0+0k 0+824io 0pf+0w
>        localhost 0.7u 0.0s 0:00.86 90.6% 0+0k 0+720io 0pf+0w
>        localhost 0.7u 0.0s 0:00.85 90.5% 0+0k 0+720io 0pf+0w
>        localhost 0.6u 0.0s 0:00.74 93.2% 0+0k 0+720io 0pf+0w
>     *Summary of lapw2para:*
>     *localhost user=2.7 wallclock=3.32*
> 3.1u 0.4s 0:02.56 141.4% 0+0k 232+6488io 1pf+0w
> >   lcore	(23:04:04) 0.0u 0.0s 0:00.06 66.6% 0+0k 216+1808io 1pf+0w
> >   hf   -mode1     -p -c  	(23:04:05) running HF in parallel mode
>        localhost 11519.0u 48.3s 3:21:07.82 95.8% 0+0k 1104+3392io 7pf+0w
>        localhost 10668.4u 45.8s 3:06:41.43 95.6% 0+0k 200+3016io 1pf+0w
>        localhost 10693.2u 48.1s 3:06:53.42 95.7% 0+0k 8+3040io 0pf+0w
>        localhost 10782.2u 55.1s 3:08:35.84 95.7% 0+0k 8+3032io 0pf+0w
>     *Summary of hfpara:*
>     *localhost user=43662.8 wallclock=761*
> 43663.3u 197.6s 3:21:09.62 363.4% 0+0k 3224+24968io 16pf+0w
> >   lapw2 -hf -p   -c  	(02:25:14) running LAPW2 in parallel mode
>        localhost 0.6u 0.0s 0:00.74 97.2% 0+0k 0+824io 0pf+0w
>        localhost 0.6u 0.0s 0:00.68 97.0% 0+0k 0+720io 0pf+0w
>        localhost 0.6u 0.0s 0:00.65 95.3% 0+0k 0+720io 0pf+0w
>        localhost 0.5u 0.0s 0:00.63 88.8% 0+0k 0+720io 0pf+0w
>     *Summary of lapw2para: localhost user=2.3 wallclock=2.7*
> 2.7u 0.3s 0:02.38 128.1% 0+0k 0+4320io 0pf+0w
> >   lcore 	(02:25:17) 0.0u 0.0s 0:00.04 75.0% 0+0k 0+1808io 0pf+0w
> >   mixer	(02:25:17) 0.0u 0.0s 0:00.15 40.0% 0+0k 3640+1672io 13pf+0w
> :ENERGY convergence:  0 0.0001 .1745377450000000
> :CHARGE convergence:  0 0.0000 .1056782
> ec cc and fc_conv 0 1 1
> ---------------------------------------------------------------------------------------------------------
>
> *I have following queries:*
>
> _*1.* As we can see, a warning *(WARNING: VX .gt. +1.0 
> 6464.92409732206 13.9828285722624)* appears here, and it increases in 
> every subsequent cycle. I want to know why this appears here, and what 
> its effect on our results is._
> _*2.* Also, we can see that *lapw0 starts in single mode*, while I used 
> the following .machines file for parallel execution:_
> ---------------------------------------------------------------------------------------------------------
> # .machines is the control file for parallel execution. Add lines like
> #
> #   speed:machine_name
> #
> # for each machine, specifying their relative speed. For mpi 
> parallelization use
> #
> #   speed:machine_name:1 machine_name:1
> #   lapw0:machine_name:1 machine_name:1
> #
> # further options are:
> #
> #   granularity:number (for loadbalancing on irregularly used machines)
> #   residue:machine_name  (on shared memory machines)
> #   extrafine         (to distribute the remaining k-points one after 
> the other)
> #
> # granularity sets the approximate number of files that will
> # be generated by each processor; this is used for load-balancing.
> # On very homogeneous systems set number to 1
> # if after distributing the k-points to the various machines residual
> # k-points are left, they will be distributed to the 
> residual-machine_name.
> #
> 100:localhost
> 100:localhost
> 100:localhost
> 100:localhost
> granularity:1
> extrafine:1
> ---------------------------------------------------------------------------------------------------------
> *3.* Is there any problem with editing the .machines file for parallel 
> calculation so that lapw0 could start in parallel mode? Or is there 
> another, more suitable method for editing the .machines file for 
> parallel calculations?
> *4.* I got following summaries in dayfile:
> *Summary of lapw1para:*
> *localhost k=29 user=33.5 wallclock=40.71*
> *Summary of lapw2para:*
>     *localhost user=2.7 wallclock=3.32*
>   *Summary of hfpara:*
>     *localhost user=43662.8 wallclock=761*
> *Summary of lapw2para: localhost user=2.3 wallclock=2.7*
> What is the meaning of "user" and "wallclock" here, as they change in each summary?
>
> Best Regards
> Peeyush Kumar Kamlesh