[Wien] Error in paralel Lapw1

Gavin Abo gsabo at crimson.ua.edu
Mon Feb 8 14:36:46 CET 2021


You might also check what OMP_NUM_THREADS is set to on your system in 
.bashrc or .cshrc.

For example, on my Ubuntu system, I do:

username at computername:~/Desktop$ grep OMP_NUM_THREADS ~/.bashrc
export OMP_NUM_THREADS=1

As you can see I'm using a different value than the default that would 
have been set by userconfig_lapw during installation of WIEN2k.  I 
believe the default value is OMP_NUM_THREADS=4.
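
The check above only looks at .bashrc.  A minimal sketch that reports the 
value in the current shell and in whichever rc files exist could look like 
the following (show_omp_setting is just a helper name I made up, it is not 
part of WIEN2k):

```shell
# Sketch: report OMP_NUM_THREADS from the current shell and from rc files.
# show_omp_setting is a hypothetical helper name, not a WIEN2k command.
show_omp_setting() {
    echo "current shell: ${OMP_NUM_THREADS:-unset}"
    for rc in "$@"; do
        if [ -f "$rc" ]; then
            # -H prints the file name even when only one file is searched
            grep -H 'OMP_NUM_THREADS' "$rc" || echo "$rc: no OMP_NUM_THREADS line"
        fi
    done
}

show_omp_setting ~/.bashrc ~/.cshrc
```

The value in the current shell is the one a job started from that shell 
inherits, so it is worth comparing it with what the rc files say.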

Is your Xeon processor an E5-2698 v3?  If it is, the following link lists 
"# of Threads" as 32:

https://ark.intel.com/content/www/us/en/ark/products/81060/intel-xeon-processor-e5-2698-v3-40m-cache-2-30-ghz.html

With your .machines file requesting 16 parallel jobs, if your 
OMP_NUM_THREADS is 4, you would be requesting 16 jobs * 4 threads/job = 
64 threads.  That would be 32 threads (= 64 requested threads - 32 
hardware threads) more than your processor could handle at one time.
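
The same arithmetic can be sketched in the shell.  The helper names below 
are made up for illustration, and the sketch assumes that every 
speed:machine_name line starts one job and that no omp_ line in .machines 
overrides OMP_NUM_THREADS:

```shell
# Sketch: compare requested threads with what the machine offers.
# count_kpoint_jobs and requested_threads are hypothetical helper names.

# Count k-point-parallel lines such as "1:localhost" (speed:machine_name).
count_kpoint_jobs() {
    grep -c '^[0-9][0-9]*:' "$1" || true
}

# Total threads = parallel jobs * OpenMP threads per job.
requested_threads() {
    echo $(( $1 * $2 ))
}

# Only report when a .machines file exists and OMP_NUM_THREADS is set.
if [ -f .machines ] && [ -n "${OMP_NUM_THREADS:-}" ]; then
    njobs=$(count_kpoint_jobs .machines)
    echo "requested: $(requested_threads "$njobs" "$OMP_NUM_THREADS"), available: $(nproc)"
fi
```

On Linux, nproc prints the number of processing units available, which on 
a processor with Hyper-Threading enabled is the number of hardware threads 
rather than physical cores.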

If you are using a different processor, you would have to look on Intel's 
website to find out the "# of Threads" your particular processor can handle.

The OMP_NUM_THREADS of course can be overridden by using omp_global in 
the .machines file.
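
As an illustration only, here is a small sketch that prints a .machines 
body with fewer k-point-parallel lines and an omp_global line.  The helper 
name make_machines and the values 4 and 4 are my own assumptions (4 jobs * 
4 threads = 16 threads); the right split depends on your number of 
k-points and on memory:

```shell
# Sketch: print a k-point-parallel .machines body for localhost.
# make_machines is a hypothetical helper, not a WIEN2k command.
make_machines() {
    njobs=$1
    threads=$2
    i=0
    while [ "$i" -lt "$njobs" ]; do
        echo "1:localhost"
        i=$((i + 1))
    done
    echo "granularity:1"
    echo "extrafine:1"
    # Overrides a global OMP_NUM_THREADS for this calculation
    echo "omp_global:$threads"
}

make_machines 4 4
```

The output could be redirected to a scratch file and compared with the 
existing .machines before replacing anything.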

If the problem is coming from a memory error, as previously discussed as 
a possibility in the post:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20807.html

then you might want to check the logs in /var/log.  The following post 
might help with that:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19703.html
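
A rough sketch of such a check is below.  I am assuming a Linux system 
where the kernel's out-of-memory killer writes messages such as "invoked 
oom-killer" or "Out of memory: Killed process" to the system log; the log 
file names differ between distributions, reading them may require root, 
and find_oom is just a helper name I made up:

```shell
# Sketch: look for Linux OOM-killer messages in a log file.
# find_oom is a hypothetical helper; log paths differ between distributions.
find_oom() {
    grep -i -E 'out of memory|oom-killer|oom-kill' "$1"
}

for log in /var/log/syslog /var/log/kern.log /var/log/messages; do
    if [ -r "$log" ]; then
        find_oom "$log" || echo "$log: no OOM messages found"
    fi
done
```

If a line names lapw1 as the killed process, that would point to the run 
needing more memory than was available.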

You might also check what parallel_options are set to with the command:

cat $WIENROOT/parallel_options

If the problem is related to passwordless login, one of the posts in 
the mailing list archive that might help is:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg02295.html


On 2/8/2021 5:33 AM, Peter Blaha wrote:
> We still don't know much about your case.
>
> Please modify your .machines file and use only 2 instead of 16 lines with
> 1:localhost
> If this solves the problem, increase it to 4 or 6 (when you have 12 
> k-points) or 8 (if you have more k-points).
> Also uncomment
> omp_global:2   or 4
> Then you are still using all your cores, but you will need less memory.
>
> Am 08.02.2021 um 11:24 schrieb Murat Aycibin:
>> Hi Dr. Blaha
>> Thank you for your reply.
>> I took a smaller k value, 12, in a 5x5x3 grid (I defined 100 k-points). I 
>> got the same error. I have a computer which has 64 GB RAM and I have 
>> 16 cores (Intel Xeon processors). My .machines file is
>>
>>   .machines is the control file for parallel execution. Add lines like
>> #
>> #   speed:machine_name
>> #
>> # for each machine specifying there relative speed. For mpi parallelization use
>> #
>> #   speed:machine_name:1 machine_name:1
>> #   lapw0:machine_name:1 machine_name:1
>> #
>> # further options are:
>> #
>> #   granularity:number (for loadbalancing on irregularly used machines)
>> #   residue:machine_name  (on shared memory machines)
>> #   extrafine         (to distribute the remaining k-points one after the other)
>> #
>> # granularity sets the number of files that will be approximately
>> # be generated by each processor; this is used for load-balancing.
>> # On very homogeneous systems set number to 1
>> # if after distributing the k-points to the various machines residual
>> # k-points are left, they will be distributed to the residual-machine_name.
>> #
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> granularity:1
>> extrafine:1
>> #
>> # Uncomment for specific OMP-parallelization (overwriting a global OMP_NUM_THREADS)
>> #
>> #omp_global:4
>> # or use program-specific parallelization:
>> #omp_lapw0:4
>> #omp_lapw1:4
>> #omp_lapw2:4
>> #omp_lapwso:4
>> #omp_dstart:4
>> #omp_sumpara:4
>> #omp_nlvdw:4
>>
>>   I had RTmax 7 percent. The error
>>
>>
>> . I do not have any idea what is wrong now.
>>
>> -- 
>> Yrd Doc Dr. Murat Aycibin
>> Van Yuzuncu Yil Universitesi
>> Fizik Bolumu