[Wien] MPI Error while running lapw0_mpi

venky ch chvenkateshphy at gmail.com
Wed Mar 23 11:48:57 CET 2022


Dear Prof. Marks and Prof. Blaha,

Thanks for your quick responses. The answers are as follows:

a) Is this a supercomputer, a lab cluster or your cluster?

Ans: It is a supercomputer

b) Did you set it up or did someone else?

Ans: I did not set these ulimits myself.

c) Do you have root/su rights?

Ans: I don't have root/su rights.

What case is it that you run on 32 cores? How many atoms?

Ans: The case.struct has 3 atoms, and for a total of 1000 k-points it gave
102 reduced k-points. Since I want to test parallel calculations, I chose a
small system with many k-points.

thanks and regards
venkatesh


On Wed, Mar 23, 2022 at 2:12 PM Laurence Marks <laurence.marks at gmail.com>
wrote:

> There are many things wrong, but let's start with the critical one --
> ulimit.
> a) Is this a supercomputer, a lab cluster or your cluster?
> b) Did you set it up or did someone else?
> c) Do you have root/su rights?
>
> Someone has set limits in a way that interferes with the calculations.
> This used to be more common, but it has been some years since I last saw
> it. You can look at, for instance,
> https://ss64.com/bash/ulimit.html.
>
> The best solution is to find out how these got set and remove them. For
> that you need to do some local research.
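>
> As a concrete starting point, a minimal check from a shell on the compute
> node (only a sketch; what you are allowed to change depends entirely on
> how your site has configured the limits):
>
>   # show all current (soft) limits for this shell
>   ulimit -a
>   # show the hard stack limit; a normal user cannot raise the soft limit above it
>   ulimit -Hs
>   # try to remove the soft stack limit, ignoring the error if the hard limit forbids it
>   ulimit -s unlimited 2>/dev/null || true
>
> If the hard limit itself is small, only the administrators or the batch
> system can change it, which is why the local research matters.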
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering, Northwestern University
> www.numis.northwestern.edu
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought" Albert Szent-Györgyi
>
> On Wed, Mar 23, 2022, 3:31 AM venky ch <chvenkateshphy at gmail.com> wrote:
>
>> Dear Wien2k users,
>>
>> I have successfully installed the wien2k.21 version on the HPC cluster.
>> However, while running a test calculation I get the following error and
>> lapw0_mpi crashes.
>>
>> =========
>>
>> /home/proj/21/phyvech/.bashrc: line 43: ulimit: stack size: cannot modify limit: Operation not permitted
>> /home/proj/21/phyvech/.bashrc: line 43: ulimit: stack size: cannot modify limit: Operation not permitted
>> setrlimit(): WARNING: Cannot raise stack limit, continuing: Invalid argument
>> [the previous warning is printed 16 times in total]
>> Abort(744562703) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Bcast: Other MPI error, error stack:
>> PMPI_Bcast(432).........................: MPI_Bcast(buf=0x7ffd8f8d359c, count=1, MPI_INTEGER, root=0, comm=MPI_COMM_WORLD) failed
>> PMPI_Bcast(418).........................:
>> MPIDI_Bcast_intra_composition_gamma(391):
>> MPIDI_NM_mpi_bcast(153).................:
>> MPIR_Bcast_intra_tree(219)..............: Failure during collective
>> MPIR_Bcast_intra_tree(211)..............:
>> MPIR_Bcast_intra_tree_generic(176)......: Failure during collective
>> [1]    Exit 15    mpirun -np 32 -machinefile .machine0 /home/proj/21/phyvech/soft/win2k2/lapw0_mpi lapw0.def >> .time00
>> cat: No match.
>> grep: *scf1*: No such file or directory
>> grep: lapw2*.error: No such file or directory
>>
>> =========
>>
>> the .machines file is
>>
>> ======= for 102 reduced k-points =========
>>
>> #
>> lapw0:node16:16 node22:16
>> 51:node16:16
>> 51:node22:16
>> granularity:1
>> extrafine:1
>>
>> ========
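>>
>> For reference, my reading of these lines (based on the usersguide; please
>> correct me if I have misunderstood the format):
>>
>>   lapw0:node16:16 node22:16   # lapw0_mpi with 16 MPI processes on each node, 32 in total
>>   51:node16:16                # k-parallel job of weight 51, run as 16 MPI processes on node16
>>   51:node22:16                # k-parallel job of weight 51, run as 16 MPI processes on node22
>>   granularity:1               # hand out the k-points in one chunk per job
>>   extrafine:1                 # distribute any remaining k-points one at a time
>>
>> so the 102 reduced k-points should be split 51/51 between the two nodes.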
>>
>> "export OMP_NUM_THREADS=1" has been used in the job submission script.
>>
>> "run_lapw -p -NI -i 400 -ec 0.00001 -cc 0.0001" has been used to start
>> the parallel calculations in available nodes.
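>>
>> In other words, the relevant part of the job script is essentially the
>> following sketch, as I understand the options (the scheduler header and
>> the generation of .machines are omitted, since they are site-specific):
>>
>>   export OMP_NUM_THREADS=1   # one OpenMP thread per MPI process
>>   # .machines as shown above must exist in the case directory
>>   run_lapw -p -NI -i 400 -ec 0.00001 -cc 0.0001
>>   # -p: parallel mode, -NI: keep existing case.broyd* files,
>>   # -i 400: at most 400 SCF iterations, -ec/-cc: energy (Ry) and charge (e) convergence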
>>
>> Can someone please explain where I am going wrong here? Thanks in
>> advance.
>>
>> Regards,
>> Venkatesh
>> Physics department
>> IISc Bangalore, INDIA
>>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>