[Wien] A question about the Rkm

Gavin Abo gsabo at crimson.ua.edu
Mon Jan 11 01:27:22 CET 2016


From the backtrace, it does look like the crash occurred in libmpi.so.1, 
which I believe is an Open MPI library.  I don't know whether it will 
solve the problem, but I would try a different Open MPI version or 
recompile Open MPI (while adjusting the configuration options [ 
https://software.intel.com/en-us/articles/performance-tools-for-software-developers-building-open-mpi-with-the-intel-compilers 
]).
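
For example, a rebuild with the Intel compilers might look roughly like 
this (only a sketch: the install prefix below is a placeholder, and the 
full set of configure options should follow the Intel article above):

./configure CC=icc CXX=icpc FC=ifort --prefix=/opt/openmpi-1.10.1-intel
make -j 8 all
make install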

composer_xe_2015.3.187 => ifort version 15.0.3 [ 
https://software.intel.com/en-us/articles/intel-compiler-and-composer-update-version-numbers-to-compiler-version-number-mapping 
]

In the post at the following link on the Intel forum it looks like 
openmpi-1.10.1rc2 (or newer) was recommended for ifort 15.0 (or newer) 
to resolve a Fortran run-time library (RTL) issue:

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/564266
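
To double-check which compiler and Open MPI release are actually picked 
up at runtime, a quick check along these lines may help (assuming the 
Open MPI wrapper compilers are on the PATH):

ifort --version
mpif90 --version               # should report the same ifort version
ompi_info | grep "Open MPI:"   # reports the Open MPI release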

On 1/10/2016 3:42 PM, Hu, Wenhao wrote:
>
> (I accidentally replied with the wrong subject. To keep the thread 
> consistent, I am sending this post again. Perhaps the mailing list 
> manager can delete the incorrect post for me.)
>
> Hi, Peter:
>
> Thank you very much for your reply. Following your suggestion, I made 
> all the libraries (MKL, fftw, openmpi, etc.) either compiled with or 
> consistent with Intel Composer XE 2015 and recompiled wien2k. My 
> openmpi version is 1.6.5. However, I still get the same problem. 
> Besides the message I posted earlier, I also have the following 
> backtrace from the crashed processes:
>
> lapw1c_mpi:14596 terminated with signal 11 at PC=2ab4dac4df79 
> SP=7fff78b8e310.  Backtrace:
>
> lapw1c_mpi:14597 terminated with signal 11 at PC=2b847d2a1f79 
> SP=7fff8ef89690.  Backtrace:
> /opt/openmpi-intel-composer_xe_2015.3.187/1.6.5/lib/libmpi.so.1(MPI_Comm_size+0x59)[0x2ab4dac4df79]
> /opt/openmpi-intel-composer_xe_2015.3.187/1.6.5/lib/libmpi.so.1(MPI_Comm_size+0x59)[0x2b847d2a1f79]
> /Users/wenhhu/wien2k14/lapw1c_mpi(blacs_pinfo_+0x92)[0x49cf02]
> /Users/wenhhu/wien2k14/lapw1c_mpi(blacs_pinfo_+0x92)[0x49cf02]
> /opt/intel/composer_xe_2015.3.187/mkl/lib/intel64/libmkl_scalapack_lp64.so(sl_init_+0x21)[0x2b8478d2e171]
> /opt/intel/composer_xe_2015.3.187/mkl/lib/intel64/libmkl_scalapack_lp64.so(sl_init_+0x21)[0x2ab4d66da171]
> /Users/wenhhu/wien2k14/lapw1c_mpi(parallel_mp_init_parallel_+0x63)[0x463cd3]
> /Users/wenhhu/wien2k14/lapw1c_mpi(parallel_mp_init_parallel_+0x63)[0x463cd3]
> /Users/wenhhu/wien2k14/lapw1c_mpi(gtfnam_+0x22)[0x426372]
> /Users/wenhhu/wien2k14/lapw1c_mpi(MAIN__+0x6c)[0x4493dc]
> /Users/wenhhu/wien2k14/lapw1c_mpi(main+0x2e)[0x40d19e]
> /Users/wenhhu/wien2k14/lapw1c_mpi(gtfnam_+0x22)[0x426372]
> /Users/wenhhu/wien2k14/lapw1c_mpi(MAIN__+0x6c)[0x4493dc]
> /Users/wenhhu/wien2k14/lapw1c_mpi(main+0x2e)[0x40d19e]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x339101ed5d]
> /Users/wenhhu/wien2k14/lapw1c_mpi[0x40d0a9]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x339101ed5d]
> /Users/wenhhu/wien2k14/lapw1c_mpi[0x40d0a9]
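>
> In case it is useful for the diagnosis, I assume the libraries that 
> lapw1c_mpi is actually linked against could be listed with ldd, e.g. 
> (just a sketch, using the binary path from the backtrace above):
>
> ldd /Users/wenhhu/wien2k14/lapw1c_mpi | grep -i -E 'mpi|mkl'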
>
> Do you think this is still a problem with my MKL, or are there other 
> issues that I am missing?
>
> Best,
> Wenhao
>
>
>
>> a) Clearly, for a nanowire simulation the mpi-parallelization is 
>> best. Unfortunately, on some clusters mpi is not set up properly, or 
>> users do not use the proper mkl-libraries for the particular mpi. 
>> Please use the Intel link-library advisor, as was mentioned in 
>> previous posts. The mkl-scalapack will NOT work unless you use the 
>> proper version of the blacs_lp64 library.
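>>
>> With Open MPI and 64-bit (lp64) integers, the advisor typically 
>> produces a link line along these lines (only a sketch; take the exact 
>> line for your MKL version from the advisor itself):
>>
>> -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lmkl_intel_lp64 \
>>     -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm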
>> b) As a short term solution you should:
>>
>> i) Use a parallelization with OMP_NUM_THREADS=2 (see the sketch after 
>> this list). This speeds up the calculation by nearly a factor of 2 and 
>> uses 2 cores in a single lapw1 without a memory increase.
>> ii) Reduce the number of k-points. I'm pretty sure you can reduce it 
>> to 2-4 for scf and structure optimization. This will save memory due 
>> to fewer k-parallel jobs.
>> iii) During structure optimization you will end up with very small 
>> Si-H and C-H distances. So I'd reduce the H sphere right now to about 
>> 0.6, but keep Si and C large (for C use around 1.2). With such a 
>> setup, a preliminary structure optimization can be done with 
>> RKMAX=2.0, which should later be checked with 2.5 and 3.0.
>> iv) Use iterative diagonalization! After the first cycle, this will 
>> speed up the scf by a factor of 5!!
>> v) And of course, reconsider the size of your "vacuum", i.e. the 
>> separation of your wires. "Vacuum" is VERY expensive in terms of 
>> memory and one should not set it too large without testing. Optimize 
>> your wire with small a,b; then increase the vacuum later on (x 
>> supercell) and check whether forces appear again and distances, band 
>> structure, ... change.
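>>
>> As a rough sketch of points i) and iv) in a bourne-shell job script 
>> (the remaining switches are simply taken over from the runsp_lapw 
>> call in your own script):
>>
>> export OMP_NUM_THREADS=2
>> runsp_lapw -p -it -min -ec 0.0001 -cc 0.001 -fc 0.5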
>>
>>> On 09.01.2016 at 22:07, Hu, Wenhao wrote:
>>>
>>> Hi, Marks and Peter:
>>>
>>> Thank you for your suggestions. About your reply, I have several
>>> follow-up questions. I am using a mid-sized cluster at my university,
>>> which has 16 cores and 64 GB of memory per standard node. The
>>> calculation I am running is k-point parallelized, not MPI
>>> parallelized. From the :RKM flag I posted in my first email, I
>>> estimate that the matrix size I need for RKmax >= 5 will be at least
>>> 40000. In my current calculation, lapw1 occupies up to 3 GB per slot
>>> (1 k-point per slot), so I estimate that each slot will then need at
>>> least 12 GB. With 8 k-points, at least 96 GB of memory would be
>>> required in total (if my estimate is correct). Given the computational
>>> resources I have, this is far too memory-demanding: on our cluster
>>> there is a 4 GB memory limit per slot on a standard node. Although I
>>> can request a high-memory node, those nodes are in high demand among
>>> cluster users. Do you have any suggestions for fitting this
>>> calculation within the limits of my cluster?
>>>
>>> About the details of my calculation, the material I am looking at is
>>> a hydrogen-terminated silicon carbide nanowire with 56 atoms. A 1x1x14
>>> k-mesh is used for the k-point sampling. The radius of 1.2 was
>>> actually obtained from setrmt_lapw. Indeed, the hydrogen radius is too
>>> large, and I keep adjusting it as the optimization proceeds. The huge
>>> matrix is mainly due to the size of my unit cell: I am using a large
>>> unit cell to isolate the coupling between neighboring nanowires.
>>>
>>> Besides the above questions, I also ran into some problems with the
>>> MPI calculation. Following Marks' suggestion on parallel calculation,
>>> I want to test the efficiency of an MPI calculation, since I have
>>> only used k-point parallelization before. The MPI installed on my
>>> cluster is openmpi. In the output file, I get the following error:
>>>
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>  LAPW0 END
>>>
>>> lapw1c_mpi:19058 terminated with signal 11 at PC=2b56d9118f79
>>> SP=7fffc23d6890.  Backtrace:
>>> ...
>>> mpirun has exited due to process rank 14 with PID 19061 on
>>> node neon-compute-2-25.local exiting improperly. There are two reasons
>>> this could occur:
>>>
>>> 1. this process did not call "init" before exiting, but others in
>>> the job did. This can cause a job to hang indefinitely while it waits
>>> for all processes to call "init". By rule, if one process calls "init",
>>> then ALL processes must call "init" prior to termination.
>>>
>>> 2. this process called "init", but exited without calling "finalize".
>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>> exiting or it will be considered an "abnormal termination"
>>>
>>> This may have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> --------------------------------------------------------------------------
>>> Uni_+6%.scf1up_1: No such file or directory.
>>> grep: *scf1up*: No such file or directory
>>>
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>> The job script I’m using is:
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> #!/bin/csh -f
>>> # -S /bin/sh
>>> #
>>> #$ -N uni_6
>>> #$ -q MF
>>> #$ -m be
>>> #$ -M wenhao... at uiowa.edu
>>> #$ -pe smp 16
>>> #$ -cwd
>>> #$ -j y
>>>
>>> cp $PE_HOSTFILE hostfile
>>> echo "PE_HOSTFILE:"
>>> echo $PE_HOSTFILE
>>> rm .machines
>>> echo granularity:1 >>.machines
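>>> # one "1:host:2" line per 2 slots: k-point parallel jobs with 2 cores each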
>>> while read hostname slot useless; do
>>>     i=0
>>>     l0=$hostname
>>>     while [ $i -lt $slot ]; do
>>>         echo 1:$hostname:2 >>.machines
>>>         let i=i+2
>>>     done
>>> done<hostfile
>>>
>>> echo lapw0:$l0:16 >>.machines
>>>
>>> runsp_lapw -p -min -ec 0.0001 -cc 0.001 -fc 0.5
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>> Is there a mistake in my script, or is something missing?
>>>
>>> Thank you very much for your help.
>>>
>>> Wenhao
>>>
>>>
>>>> I do not know many compounds for which an RMT=1.2 bohr for H makes
>>>> any sense (maybe LiH). Use setrmt and follow its suggestion. Usually,
>>>> H spheres for CH or OH bonds should be less than 0.6 bohr.
>>>> Experimental H positions are often very unreliable.
>>>> How many k-points? Often 1 k-point is enough for 50+ atoms (at least
>>>> at the beginning), in particular when you have an insulator.
>>>> Otherwise, follow the suggestions of L. Marks about parallelization.
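>>>>
>>>> For example (a sketch, assuming your case directory is named "case"):
>>>>
>>>> setrmt_lapw case        # write suggested sphere radii to case.struct_setrmt
>>>> setrmt_lapw case -r 3   # the same, with radii reduced by 3 %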
>>>>
>>>>
>>>> On 08.01.2016 at 07:28, Hu, Wenhao wrote:
>>>>
>>>> Hi, all:
>>>>
>>>> I have some confusion about the RKmax in calculations with 50+
>>>> atoms. In my WIEN2k, NMATMAX and NUME are set to 15000 and 1700.
>>>> With these upper limits, the RKmax can only be as large as 2.05,
>>>> which is much lower than the value suggested on the FAQ page of
>>>> WIEN2k (the smallest atom in my case is an H atom with a radius of
>>>> 1.2). Checking the :RKM flag in case.scf, I find the following:
>>>>
>>>> :RKM  : MATRIX SIZE 11292LOs: 979  RKM= 2.05  WEIGHT= 1.00  PGR:
>>>>
>>>> With such a matrix size, a single scf cycle can take as long as two
>>>> and a half hours. Although I could increase NMATMAX and NUME to
>>>> raise RKmax, the calculation would become much slower, which would
>>>> make the structure optimization almost impossible. Before running a
>>>> convergence test on RKmax, can anyone tell me whether such an RKmax
>>>> is a reasonable value?
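>>>>
>>>> For orientation, if the basis size grows roughly as RKmax^3 for a
>>>> fixed cell and fixed spheres (my assumption), then going from
>>>> RKM=2.05 (matrix size 11292) to RKmax=2.5 or 3.0 would need about
>>>> 11292 x (2.5/2.05)^3 ~ 20,500 or 11292 x (3.0/2.05)^3 ~ 35,400
>>>> basis functions, i.e. far beyond the current NMATMAX of 15000.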
>>>>
>>>> If any further information is needed, please let me know. Thanks in
>>>> advance.
>>>>
>>>> Best,
>>>> Wenhao
>>> _______________________________________________
>>> Wien mailing list
>>> Wien at zeus.theochem.tuwien.ac.at
>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>>
>>> SEARCH the MAILING-LIST at:
>>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>
>>
>> --
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>> Email: bl... at theochem.tuwien.ac.at
>> WIEN2k: http://www.wien2k.at
>> WWW: http://www.imc.tuwien.ac.at/staff/tc_group_e.php