[Wien] System configuration

Pavel Ondračka pavel.ondracka at email.cz
Thu May 23 20:35:59 CEST 2019


I'm putting this back on the list as well, after I received several private
emails.




Your timing and the ldd output show that you are linking against the reference
LAPACK and BLAS. You need to replace -llapack -lblas in R_LIBS with -lopenblas
(this was discussed before in this thread:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18194.html )
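
For illustration only (the exact path and library name depend on how your
OpenBLAS was built and installed), the R_LIBS change in siteconfig amounts to
something like:

  old: R_LIBS = -llapack -lblas -lpthread
  new: R_LIBS = -L/path/to/your/openblas/lib -lopenblas -lpthread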




Also, your config is a weird mix of ifort and gfortran options, which results
in a ton of errors for the parallel programs (as was shown in another off-
the-list email). At the moment this doesn't matter, as we need to get the
serial programs working first.
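
(Just for later reference, once the serial setup works: with gfortran the
parallel options FPOPT would typically mirror FOPT, i.e. something along the
lines of

current:FPOPT:-ffree-form -O2 -ffree-line-length-none

rather than the ifort-specific flags such as -mp1, -prec_div or
-assume buffered_io shown in the config below. This is only a sketch; the
parallel options will be revisited later.)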




Best regards


Pavel

"

grep "TIME HAMILT" test_case.output1
       TIME HAMILT (CPU)  =    22.8, HNS =    12.3, HORB =     0.0, DIAG =  
  78.9
       TIME HAMILT (WALL) =    22.9, HNS =    12.4, HORB =     0.0, DIAG =  
  78.9

"
 

"
current:FOPT:-ffree-form -O2 -ffree-line-length-none
current:FPOPT:-O1 -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
current:LDFLAGS:$(FOPT)
current:DPARALLEL:'-DParallel'
current:R_LIBS:-llapack -lblas -lpthread
current:FFTWROOT:
current:FFTW_VERSION:
current:FFTW_LIB:
current:FFTW_LIBNAME:
current:LIBXCROOT:/opt/etsf/
current:LIBXC_FORTRAN:xcf03
current:LIBXC_LIBNAME:xc
current:LIBXC_LIBDNAME:lib/
current:SCALAPACKROOT:
current:SCALAPACK_LIBNAME:
current:BLACSROOT:
current:BLACS_LIBNAME:
current:ELPAROOT:
current:ELPA_VERSION:
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
current:CORES_PER_NODE:1
current:MKL_TARGET_ARCH:intel64
current:RP_LIBS:

linux-vdso.so.1 (0x00007ffd78bac000)
liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x000015344ad82000)
libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x000015344ab15000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000015344a8f6000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x000015344a517000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000015344a179000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000153449d88000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000153449b70000)
/lib64/ld-linux-x86-64.so.2 (0x000015344ba2e000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x0000153449930000)
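
(For comparison: once lapw1 is relinked against OpenBLAS, the ldd output
should contain a line roughly like

libopenblas.so.0 => /path/to/your/openblas/lib/libopenblas.so.0 (0x...)

instead of the liblapack.so.3 / libblas.so.3 lines above; the exact path and
soname depend on your OpenBLAS-0.2.20 installation.)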




On Thu, May 23, 2019 at 8:52 PM Indranil mal <indranil.mal at gmail.com> wrote:

"

Thanks a lot.


Sir, my calculations are running while I do the x lapw1; maybe that is why
this time is too long.


I have installed ifort and Intel MPI/MKL but could not configure them; that
is why I am using gfortran and gcc, the basic GNU compilers, and OpenBLAS.
If you don't mind, you can access my PC through TeamViewer.











On Thu, May 23, 2019 at 7:50 PM Pavel Ondračka <pavel.ondracka at email.cz> wrote:

"Well,

first we need to figure out why is your serial lapw so slow...
You definitely don't have the libmvec patches, however almost two min
runtime suggest that even your BLAS might be bad?

In the test_case folder run:
$ grep "TIME HAMILT" test_case.output1
and post the output. Also please go to the Wien2k folder and send the
output of 
$ cat WIEN2k_OPTION
and
$ ldd lapw1

The next Wien2k version will have this simplified, however for now some
patching needs to be done. The other option would be to get MKL
and ifort from Intel and use them instead...

Anyway, if you don't want MKL, you need to download the attached patch
into the SRC_lapw1 folder in the Wien2k base folder.
Go to that folder and apply the patch (you might need the patch
package installed for this) with
$ patch -p1 < lapw1.patch
then set the FOPT compile flags via siteconfig to:
-ffree-form -O2 -ffree-line-length-none -march=native -ftree-vectorize
-DHAVE_LIBMVEC -fopenmp
and recompile lapw1.
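
As a rough summary of the above in one place (a sketch only; $WIENROOT is
assumed to point to your Wien2k base folder, and the recompilation of lapw1
is done from the siteconfig menu):

$ cd $WIENROOT/SRC_lapw1
$ patch -p1 < lapw1.patch
$ cd $WIENROOT
$ ./siteconfig_lapw      # change FOPT as above, then recompile lapw1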
Now when you do again
$ ldd lapw1
it should show a line with "libmvec.so.1 => /lib64/libmvec.so.1"

Compare timings again with the test_case.
Also try:
$ OMP_NUM_THREADS=2 x lapw1
$ OMP_NUM_THREADS=4 x lapw1

And after each run show total timings as well as
$ grep "TIME HAMILT" test_case.output1
Hopefully you are already linking the multithreaded OpenBLAS (but
I don't know what the Ubuntu default is)...
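(To check which BLAS the system libblas.so.3 points to on Ubuntu, something like
$ update-alternatives --display libblas.so.3-x86_64-linux-gnu
should work; the exact alternative name may differ between Ubuntu releases.)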

I'll help you with the parallel execution in the next step.

Best regards
Pavel

On Thu, 2019-05-23 at 18:58 +0530, Indranil mal wrote:
> Dear sir 
> 
> After running x lapw1  I got the following 
> 
> ~/test_case$ x lapw1
> STOP  LAPW1 END
> 114.577u 0.247s 1:54.82 99.9% 0+0k 0+51864io 0pf+0w
> 
> I am using parallel k-point execution only; 8 GB of memory is in use, and
> for the 100-atom (100 k-points) calculation it is taking around 12 hours
> to complete one cycle.
> Please help me.
> 
> Thanking you
> 
> Indranil 
> 
> On Thu, May 23, 2019 at 11:22 AM Pavel Ondračka <pavel.ondracka at email.cz> wrote:
> > Hi Indranil,
> > 
> > While the k-point parallelization is usually the most efficient
> > (provided you have a sufficient number of k-points) and does not need
> > any extra libraries, for a 100-atom case it might be problematic to
> > fit 12 processes into 32GB of memory. I assume you are already using
> > it, since you claim to run on two cores?
> > 
> > Instead, check what the maximum memory requirement of lapw1 is when
> > run in serial, and based on that find how many processes you can run
> > in parallel; then for each of them place one line "1:localhost" into
> > the .machines file (there is no need to copy .machines from templates
> > or use random scripts; instead read the userguide to understand what
> > you are doing, it will save you time in the long run; see the sketch
> > below). If you can run at least a few k-points in parallel, it might
> > be enough to speed things up significantly.
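> > 
> > For example, just as a sketch (assuming you find that 4 lapw1
> > processes fit into your memory), a .machines file for 4 parallel
> > k-point jobs could look like:
> > 
> > 1:localhost
> > 1:localhost
> > 1:localhost
> > 1:localhost
> > granularity:1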
> > 
> > For MPI you would need the openmpi-devel, scalapack-devel and
> > fftw3-devel packages (I'm not sure how exactly they are named on
> > Ubuntu). Especially the scalapack configuration could be tricky; it
> > is probably easiest to start with lapw0, as this needs only MPI and
> > fftw.
> > 
> > Also, based on my experience with default gfortran settings, it is
> > likely that you don't even have the single-core performance
> > optimized. Try to download the serial benchmark
> > http://susi.theochem.tuwien.ac.at/reg_user/benchmark/test_case.tar.gz
> > untar it, run x lapw1 and report the timings (on an average i7 CPU it
> > should take below 30 seconds; if it takes significantly more, you
> > will need some more tweaks; see the example commands below).
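> > 
> > Roughly, just as a sketch (the archive is assumed to unpack into a
> > test_case folder):
> > 
> > $ wget http://susi.theochem.tuwien.ac.at/reg_user/benchmark/test_case.tar.gz
> > $ tar xzf test_case.tar.gz
> > $ cd test_case
> > $ x lapw1
> > $ grep "TIME HAMILT" test_case.output1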
> > 
> > Best regards
> > Pavel
> > 
> > On Thu, 2019-05-23 at 10:42 +0530, Dr. K. C. Bhamu wrote:
> > > Hii,
> > > 
> > > If you are doing a k-point parallel calculation (having more than
> > > 12 k-points in the IBZ), then use the script below in the terminal
> > > where you want to run the calculation, or use it in your job script
> > > together with the -p option of run(sp)_lapw (-so).
> > > 
> > > If anyone knows how to repeat the n-th line of a file m times, this
> > > script could be simplified.
> > > 
> > > The script below simply copies the machine file from the templates
> > > directory and updates it to your needs. So you do not need the
> > > script at all: you can copy the file, open it in your favorite
> > > editor and do it manually.
> > > 
> > > cp $WIENROOT/SRC_templates/.machines . ; grep localhost .machines |
> > > perl -ne 'print $_ x 6' > LOCALHOST.dat ; tail -n 2 .machines >
> > > grang.dat ; sed '22,25d' .machines > MACHINE.dat ; cat MACHINE.dat
> > > LOCALHOST.dat grang.dat > .machines ; rm LOCALHOST.dat MACHINE.dat
> > > grang.dat
> > > 
> > > regards
> > > Bhamu
> > > 
> > > 
> > > On Wed, May 22, 2019 at 10:52 PM Indranil mal <indranil.mal at gmail.com> wrote:
> > > > Respected sir/ Users,
> > > >                     I am using a PC with an Intel i7 8th gen (with
> > > > 12 cores), 32GB RAM and a 2TB HDD, running UBUNTU 18.04 LTS. I
> > > > have installed OpenBLAS-0.2.20 and am using the GNU Fortran and C
> > > > compilers. I am trying to run a system with 100 atoms; only two
> > > > cores are being used, the rest of them are idle, and the
> > > > calculation is taking too long. I have not installed MPI,
> > > > ScaLAPACK or ELPA. Please help me with what I should do to
> > > > utilize all of the cores of my CPU.
> > > > 
> > > > 
> > > > 
> > > > Thanking you 
> > > > 
> > > > Indranil

