[Wien] MPI Error while running lapw0_mpi

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Mar 23 11:54:47 CET 2022


Try to run lapw0_mpi on 4 cores only.
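
For example, the lapw0 line in your .machines file would then look
something like this (a sketch based on the file you posted; the two
k-point lines stay unchanged):

lapw0:node16:4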

On 3/23/22 at 11:48, venky ch wrote:
> 
> Dear Prof. Marks and Prof. Blaha,
> 
> Thanks for your quick responses. The answers are as follows,
> 
> a) Is this a supercomputer, a lab cluster or your cluster?
> 
> Ans: It is a supercomputer
> 
> b) Did you set it up or did someone else?
> 
> Ans: I haven't set up these ulimits.
> 
> c) Do you have root/su rights?
> 
> Ans:  I don't have root/su rights
> 
> What case is it that you run it on 32 cores? How many atoms?
> 
> Ans: The case.struct has 3 atoms, and for a total of 1000 k-points it 
> gave 102 reduced k-points. As I want to test running parallel 
> calculations, I just took a small system with many k-points.
> 
> thanks and regards
> venkatesh
> 
> 
> On Wed, Mar 23, 2022 at 2:12 PM Laurence Marks
> <laurence.marks at gmail.com> wrote:
> 
>     There are many things wrong, but let's start with the critical one
>     -- ulimit.
>     a) Is this a supercomputer, a lab cluster or your cluster?
>     b) Did you set it up or did someone else?
>     c) Do you have root/su rights?
> 
>     Someone has set limits in such a way that they interfere with the
>     calculations. It used to be more common to see this, but it has been
>     some years since I have seen it. You can look at, for instance,
>     https://ss64.com/bash/ulimit.html.
> 
>     The best solution is to find out how these got set and remove them.
>     For that you need to do some local research.
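> 
>     As a first check, a minimal sketch (assuming bash, and assuming the
>     offending line in ~/.bashrc is an unguarded ulimit call):
> 
>     # list the current soft limits; "ulimit -Ha" shows the hard limits
>     ulimit -a
>     # guard the stack-size call so it no longer aborts on machines where
>     # the hard limit cannot be raised
>     ulimit -s unlimited 2>/dev/null || true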
> 
>     --
>     Professor Laurence Marks
>     Department of Materials Science and Engineering, Northwestern University
>     www.numis.northwestern.edu
>     "Research is to see what everybody else has seen, and to think what
>     nobody else has thought" Albert Szent-Györgyi
> 
>     On Wed, Mar 23, 2022, 3:31 AM venky ch
>     <chvenkateshphy at gmail.com> wrote:
> 
>         Dear Wien2k users,
> 
>         I have successfully installed the wien2k.21 version on the HPC
>         cluster. However, while running a test calculation, I get the
>         following error and lapw0_mpi crashes.
> 
>         =========
> 
>         /home/proj/21/phyvech/.bashrc: line 43: ulimit: stack size:
>         cannot modify limit: Operation not permitted
>         /home/proj/21/phyvech/.bashrc: line 43: ulimit: stack size:
>         cannot modify limit: Operation not permitted
>         setrlimit(): WARNING: Cannot raise stack limit, continuing:
>         Invalid argument
>         [... the same setrlimit() warning repeated 16 times in total ...]
>         Abort(744562703) on node 1 (rank 1 in comm 0): Fatal error in
>         PMPI_Bcast: Other MPI error, error stack:
>         PMPI_Bcast(432).........................:
>         MPI_Bcast(buf=0x7ffd8f8d359c, count=1, MPI_INTEGER, root=0,
>         comm=MPI_COMM_WORLD) failed
>         PMPI_Bcast(418).........................:
>         MPIDI_Bcast_intra_composition_gamma(391):
>         MPIDI_NM_mpi_bcast(153).................:
>         MPIR_Bcast_intra_tree(219)..............: Failure during collective
>         MPIR_Bcast_intra_tree(211)..............:
>         MPIR_Bcast_intra_tree_generic(176)......: Failure during collective
>         [1]    Exit 15                       mpirun -np 32 -machinefile
>         .machine0 /home/proj/21/phyvech/soft/win2k2/lapw0_mpi lapw0.def
>          >> .time00
>         cat: No match.
>         grep: *scf1*: No such file or directory
>         grep: lapw2*.error: No such file or directory
> 
>         =========
> 
>         the .machines file is
> 
>         ======= for 102 reduced k-points =========
> 
>         #
>         lapw0:node16:16 node22:16
>         51:node16:16
>         51:node22:16
>         granularity:1
>         extrafine:1
> 
>         ========
> 
>         "export OMP_NUM_THREADS=1" has been used in the job submission
>         script.
> 
>         "run_lapw -p -NI -i 400 -ec 0.00001 -cc 0.0001" has been used to
>         start the parallel calculations in available nodes.
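> 
>         For reference, a minimal sketch of how these fit together in the
>         script body (assuming bash; any scheduler header is omitted):
> 
>         export OMP_NUM_THREADS=1
>         # hypothetical addition: guard the stack limit here as well
>         ulimit -s unlimited 2>/dev/null || true
>         run_lapw -p -NI -i 400 -ec 0.00001 -cc 0.0001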
> 
>         Can someone please explain where I am going wrong here?
>         Thanks in advance.
> 
>         Regards,
>         Venkatesh
>         Physics department
>         IISc Bangalore, INDIA
> 
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

-- 
Peter Blaha, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300          Email: peter.blaha at tuwien.ac.at
WWW: http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at

