[Wien] Query about warning appearing in dayfile and editing of .machines file

Peeyush kumar kamlesh peeyush.physik.rku at gmail.com
Sun Aug 25 10:08:08 CEST 2019


Dear Sir,
Greetings!
I am using *WIEN2k_18 on a laptop with an i3 processor (4 cores)* and calculating
electronic properties using the *hf potential in parallel mode with a non-reduced
k-mesh* of 550 k-points. I got the following dayfile for cycle 1 of this
calculation:
----------------------------------------------------------------------------------------------------------

cycle 1 	(Sat Aug 24 23:03:32 IST 2019) 	(40/99 to go)

>   lapw0 -grr -p	(23:03:32) starting parallel lapw0 at Sat Aug 24 23:03:32 IST 2019
-------- .machine0 : processors*running lapw0 in single mode*
9.8u 0.0s 0:10.11 98.4% 0+0k 3560+2824io 7pf+0w
>   lapw0  -p	(23:03:42) starting parallel lapw0 at Sat Aug 24 23:03:42 IST 2019
-------- .machine0 : processors*running lapw0 in single mode*
 *:WARNING: VX .gt. +1.0   6464.92409732206        13.9828285722624 *
6.0u 0.0s 0:06.08 99.8% 0+0k 0+824io 0pf+0w
>   lapw1  -p   -c 	(23:03:48) starting parallel lapw1 at Sat Aug 24 23:03:48 IST 2019
->  starting parallel LAPW1 jobs at Sat Aug 24 23:03:48 IST 2019
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
     localhost(8) 8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w
     localhost(7) 8.3u 0.2s 0:10.09 85.7% 0+0k 0+42288io 0pf+0w
     localhost(7) 8.3u 0.2s 0:09.65 89.2% 0+0k 0+42896io 0pf+0w
     localhost(7) 8.4u 0.2s 0:10.73 81.2% 0+0k 0+42472io 0pf+0w
   *Summary of lapw1para:
   localhost	 k=29	 user=33.5	 wallclock=40.71*
34.1u 1.2s 0:12.48 283.3% 0+0k 216+177112io 2pf+0w
>   lapw2 -fermi   -c  	(23:04:02) 0.1u 0.0s 0:00.10 100.0% 0+0k 0+2440io 0pf+0w
>   lapw2 -p    -c 	(23:04:02) running LAPW2 in parallel mode
      localhost 0.7u 0.0s 0:00.87 96.5% 0+0k 0+824io 0pf+0w
      localhost 0.7u 0.0s 0:00.86 90.6% 0+0k 0+720io 0pf+0w
      localhost 0.7u 0.0s 0:00.85 90.5% 0+0k 0+720io 0pf+0w
      localhost 0.6u 0.0s 0:00.74 93.2% 0+0k 0+720io 0pf+0w
   *Summary of lapw2para:*
   *localhost	 user=2.7	 wallclock=3.32*
3.1u 0.4s 0:02.56 141.4% 0+0k 232+6488io 1pf+0w
>   lcore	(23:04:04) 0.0u 0.0s 0:00.06 66.6% 0+0k 216+1808io 1pf+0w
>   hf   -mode1     -p -c  	(23:04:05) running HF in parallel mode
      localhost 11519.0u 48.3s 3:21:07.82 95.8% 0+0k 1104+3392io 7pf+0w
      localhost 10668.4u 45.8s 3:06:41.43 95.6% 0+0k 200+3016io 1pf+0w
      localhost 10693.2u 48.1s 3:06:53.42 95.7% 0+0k 8+3040io 0pf+0w
      localhost 10782.2u 55.1s 3:08:35.84 95.7% 0+0k 8+3032io 0pf+0w
   *Summary of hfpara:*
   *localhost	 user=43662.8	 wallclock=761*
43663.3u 197.6s 3:21:09.62 363.4% 0+0k 3224+24968io 16pf+0w
>   lapw2 -hf -p   -c  	(02:25:14) running LAPW2 in parallel mode
      localhost 0.6u 0.0s 0:00.74 97.2% 0+0k 0+824io 0pf+0w
      localhost 0.6u 0.0s 0:00.68 97.0% 0+0k 0+720io 0pf+0w
      localhost 0.6u 0.0s 0:00.65 95.3% 0+0k 0+720io 0pf+0w
      localhost 0.5u 0.0s 0:00.63 88.8% 0+0k 0+720io 0pf+0w
   *Summary of lapw2para:
   localhost	 user=2.3	 wallclock=2.7*
2.7u 0.3s 0:02.38 128.1% 0+0k 0+4320io 0pf+0w
>   lcore 	(02:25:17) 0.0u 0.0s 0:00.04 75.0% 0+0k 0+1808io 0pf+0w
>   mixer	(02:25:17) 0.0u 0.0s 0:00.15 40.0% 0+0k 3640+1672io 13pf+0w
:ENERGY convergence:  0 0.0001 .1745377450000000
:CHARGE convergence:  0 0.0000 .1056782

ec cc and fc_conv 0 1 1
---------------------------------------------------------------------------------------------------------

*I have the following queries:*

*1. As we can see, a warning (WARNING: VX .gt. +1.0 6464.92409732206
13.9828285722624) appears here, and it increases in every subsequent cycle.
Why does it appear, and what is its effect on our results?*
*2. We can also see that lapw0 starts in single mode, although I used the
following .machines file for parallel execution:*
---------------------------------------------------------------------------------------------------------
# .machines is the control file for parallel execution. Add lines like
#
#   speed:machine_name
#
# for each machine specifying there relative speed. For mpi parallelization use
#
#   speed:machine_name:1 machine_name:1
#   lapw0:machine_name:1 machine_name:1
#
# further options are:
#
#   granularity:number (for loadbalancing on irregularly used machines)
#   residue:machine_name  (on shared memory machines)
#   extrafine         (to distribute the remaining k-points one after the other)
#
# granularity sets the number of files that will be approximately
# be generated by each processor; this is used for load-balancing.
# On very homogeneous systems set number to 1
# if after distributing the k-points to the various machines residual
# k-points are left, they will be distributed to the residual-machine_name.
#
100:localhost
100:localhost
100:localhost
100:localhost
granularity:1
extrafine:1
---------------------------------------------------------------------------------------------------------
*3.* Is there any problem with the way I have edited the .machines file for
parallel calculation, and how should it be changed so that lapw0 starts in
parallel mode? Or is there a better way to set up the .machines file for
parallel calculations?
*4.* I got the following summaries in the dayfile:
*Summary of lapw1para:*

*   localhost	 k=29	 user=33.5	 wallclock=40.71*

*Summary of lapw2para:*
   *localhost	 user=2.7	 wallclock=3.32*

 *Summary of hfpara:*
   *localhost	 user=43662.8	 wallclock=76*

*Summary of lapw2para:
   localhost	 user=2.3	 wallclock=2.7*

What do "user" and "wallclock" mean here, and why do they differ in
each summary?
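
As a cross-check on the summaries above, here is a minimal Python sketch that
adds up the per-job timing lines copied from the dayfile. It assumes each job
line has the form "<user>u <sys>s <elapsed> <cpu>%"; the function names are
made up for this illustration and are not part of WIEN2k. For the lapw1 and
lapw2 steps the sums reproduce the "user" and "wallclock" values of the
summaries, which suggests that both are totals over the parallel jobs. The
same holds for the hfpara "user" value, but the hfpara "wallclock" value does
not follow from this sum, so it is left out here.

```python
import re

# Timing fields of a dayfile job line: "<user>u <sys>s <elapsed> <cpu>%"
TIMING = re.compile(
    r"(\d+(?:\.\d+)?)u\s+(\d+(?:\.\d+)?)s\s+([\d:.]+)\s+(\d+(?:\.\d+)?)%"
)


def elapsed_seconds(stamp):
    """Convert 'm:ss.ss' or 'h:mm:ss.ss' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def sum_job_times(lines):
    """Return (total user seconds, total elapsed seconds) over all job lines."""
    user = wall = 0.0
    for line in lines:
        match = TIMING.search(line)
        if match:
            user += float(match.group(1))
            wall += elapsed_seconds(match.group(3))
    return round(user, 2), round(wall, 2)


# Per-job lines copied from the lapw1 and first lapw2 steps of the dayfile
LAPW1_JOBS = [
    "localhost(8) 8.5u 0.2s 0:10.24 85.9% 0+0k 200+48832io 1pf+0w",
    "localhost(7) 8.3u 0.2s 0:10.09 85.7% 0+0k 0+42288io 0pf+0w",
    "localhost(7) 8.3u 0.2s 0:09.65 89.2% 0+0k 0+42896io 0pf+0w",
    "localhost(7) 8.4u 0.2s 0:10.73 81.2% 0+0k 0+42472io 0pf+0w",
]
LAPW2_JOBS = [
    "localhost 0.7u 0.0s 0:00.87 96.5% 0+0k 0+824io 0pf+0w",
    "localhost 0.7u 0.0s 0:00.86 90.6% 0+0k 0+720io 0pf+0w",
    "localhost 0.7u 0.0s 0:00.85 90.5% 0+0k 0+720io 0pf+0w",
    "localhost 0.6u 0.0s 0:00.74 93.2% 0+0k 0+720io 0pf+0w",
]

if __name__ == "__main__":
    print("lapw1:", sum_job_times(LAPW1_JOBS))  # (33.5, 40.71)
    print("lapw2:", sum_job_times(LAPW2_JOBS))  # (2.7, 3.32)
```

Note that the summed "wallclock" of lapw1 (40.71) is larger than the elapsed
time of the whole step (0:12.48), because the four jobs ran side by side.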


Best Regards
Peeyush Kumar Kamlesh