[Wien] Error in paralel Lapw1
Gavin Abo
gsabo at crimson.ua.edu
Mon Feb 8 14:36:46 CET 2021
You might also check what OMP_NUM_THREADS is set to on your system in
.bashrc or .cshrc.
For example, on my Ubuntu system, I do:
username at computername:~/Desktop$ grep OMP_NUM_THREADS ~/.bashrc
export OMP_NUM_THREADS=1
As you can see I'm using a different value than the default that would
have been set by userconfig_lapw during installation of WIEN2k. I
believe the default value is OMP_NUM_THREADS=4.
Is your Xeon processor an E5-2698 v3? If it is, the following link lists
"# of Threads" as 32:
https://ark.intel.com/content/www/us/en/ark/products/81060/intel-xeon-processor-e5-2698-v3-40m-cache-2-30-ghz.html
With your .machines file requesting 16 k-point parallel jobs, if your
OMP_NUM_THREADS is 4, you would be requesting 16 jobs * 4 threads/job =
64 threads. That would be 32 threads (= 64 requested threads - 32
processor threads) more than your processor could handle at one time.
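
The estimate above can be reproduced with a small shell sketch. It is only an illustration: the function names count_kpoint_jobs and requested_threads are made up here, it assumes k-point-parallel lines look like "1:localhost", it uses nproc from GNU coreutils to count logical CPUs, and it ignores any omp_global line in .machines.

```shell
#!/bin/sh
# Count k-point-parallel job lines such as "1:localhost" in a .machines file.
# Assumption: these lines start with a number (the speed) followed by a colon.
count_kpoint_jobs() {
    grep -c -E '^[0-9]+:' "$1"
}

# Threads requested in total = number of jobs * OpenMP threads per job.
requested_threads() {
    echo $(( $1 * $2 ))
}

machines_file=${1:-.machines}
if [ -f "$machines_file" ] && [ -n "${OMP_NUM_THREADS:-}" ]; then
    njobs=$(count_kpoint_jobs "$machines_file")
    total=$(requested_threads "$njobs" "$OMP_NUM_THREADS")
    avail=$(nproc)   # logical CPUs visible to this shell (GNU coreutils)
    echo "requested: $total threads, available: $avail logical CPUs"
    if [ "$total" -gt "$avail" ]; then
        echo "oversubscribed by $(( total - avail )) threads"
    fi
fi
```

Run it in the case directory that contains .machines. If OMP_NUM_THREADS is not set in the shell, the sketch prints nothing rather than guessing a value.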
If you are using a different processor, you would have to look on Intel's
website to find the "# of Threads" your particular processor can handle.
OMP_NUM_THREADS can of course be overridden by using omp_global in
the .machines file.
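
As a sketch of what such an override could look like, the following writes a reduced setup to a scratch file instead of touching the real .machines. The file name .machines.example and the variable MACHINES_EXAMPLE are made up for this illustration, and 4 jobs with omp_global:4 is just one combination that adds up to 16 threads.

```shell
#!/bin/sh
# Write a reduced k-point-parallel setup to a scratch file for comparison.
# Assumption: 4 jobs x 4 OpenMP threads is chosen only as an illustration.
out=${MACHINES_EXAMPLE:-.machines.example}
cat > "$out" <<'EOF'
1:localhost
1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1
omp_global:4
EOF
echo "wrote $out"
```

Fewer k-point jobs with more OpenMP threads each keeps the cores busy while running fewer copies of lapw1 at once, which should lower the memory demand.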
If the problem comes from a memory error, as previously discussed as
a possibility in the post
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20807.html
then you might want to check /var/log. The following post might help
with that:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19703.html
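
One possible way to look through /var/log for signs of the kernel killing a process for lack of memory is sketched below. The function name find_oom_lines is made up, the search patterns are typical Linux OOM-killer wording rather than anything WIEN2k-specific, and the log file names are assumptions that differ between distributions (reading them may also need root).

```shell
#!/bin/sh
# Print lines that look like Linux OOM-killer messages from one log file.
find_oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process' "$1"
}

# Assumption: log locations differ between distributions; files that are
# missing or unreadable are skipped.
for log in /var/log/syslog /var/log/kern.log /var/log/messages; do
    if [ -r "$log" ]; then
        echo "== $log"
        find_oom_lines "$log" | tail -n 20
    fi
done
```

If a lapw1 process shows up in such lines around the time of the crash, memory is a likely cause.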
You might also check what parallel_options are set to with the command:
cat $WIENROOT/parallel_options
If the problem is related to passwordless login, one of the posts in
the mailing list archive that might help is:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg02295.html
On 2/8/2021 5:33 AM, Peter Blaha wrote:
> We still don't know much about your case.
>
> Please modify your .machines file and use only 2 instead of 16 lines with
> 1:localhost
> If this solves the problem, increase it to 4 or 6 (when you have 12
> k-points) or 8 (if you have more k-points).
> Also uncomment
> omp_global:2 or 4
> Then you are still using all your cores, but you will need less memory.
>
> Am 08.02.2021 um 11:24 schrieb Murat Aycibin:
>> Hi Dr. Blaha,
>> Thank you for your reply.
>> I used a smaller k value, 12 in a 5x5x3 grid (I defined 100 k-points). I
>> got the same error. I have a computer with 64 GB RAM and
>> 16 cores (Intel Xeon processors). My .machines file is
>>
>> # .machines is the control file for parallel execution. Add lines like
>> #
>> # speed:machine_name
>> #
>> # for each machine specifying there relative speed. For mpi parallelization use
>> #
>> # speed:machine_name:1 machine_name:1
>> # lapw0:machine_name:1 machine_name:1
>> #
>> # further options are:
>> #
>> # granularity:number (for loadbalancing on irregularly used machines)
>> # residue:machine_name (on shared memory machines)
>> # extrafine (to distribute the remaining k-points one after the other)
>> #
>> # granularity sets the number of files that will be approximately
>> # be generated by each processor; this is used for load-balancing.
>> # On very homogeneous systems set number to 1
>> # if after distributing the k-points to the various machines residual
>> # k-points are left, they will be distributed to the residual-machine_name.
>> #
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> granularity:1
>> extrafine:1
>> #
>> # Uncomment for specific OMP-parallelization (overwriting a global OMP_NUM_THREADS)
>> #
>> #omp_global:4
>> # or use program-specific parallelization:
>> #omp_lapw0:4
>> #omp_lapw1:4
>> #omp_lapw2:4
>> #omp_lapwso:4
>> #omp_dstart:4
>> #omp_sumpara:4
>> #omp_nlvdw:4
>>
>> I had RTmax 7 percent. The error
>>
>> I do not have any idea what is wrong now.
>>
>> --
>> Yrd Doc Dr. Murat Aycibin
>> Van Yuzuncu Yil Universitesi
>> Fizik Bolumu