[Wien] MPI Error while running lapw0_mpi
Peter Blaha
pblaha at theochem.tuwien.ac.at
Wed Mar 23 11:54:47 CET 2022
Try running lapw0_mpi on only 4 cores first.
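For example (a sketch only, reusing the node names from the .machines file quoted below; adapt it to the actual allocation), the lapw0 line could be reduced to

   lapw0:node16:4

so that lapw0_mpi runs with just 4 MPI processes, while the k-point lines stay as they are.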
On 3/23/22 at 11:48, venky ch wrote:
>
> Dear Prof. Marks and Prof. Blaha,
>
> Thanks for your quick responses. The answers are as follows:
>
> a) Is this a supercomputer, a lab cluster or your cluster?
>
> Ans: It is a supercomputer
>
> b) Did you set it up or did someone else?
>
> Ans: I haven't set up these ulimits.
>
> c) Do you have root/su rights?
>
> Ans: I don't have root/su rights
>
> What case is it that you are running on 32 cores? How many atoms?
>
> Ans: The case.struct file has 3 atoms, and the total of 1000 k-points
> gave 102 reduced k-points. As I wanted to test running parallel
> calculations, I just took a small system with many k-points.
>
> thanks and regards
> venkatesh
>
>
> On Wed, Mar 23, 2022 at 2:12 PM Laurence Marks
> <laurence.marks at gmail.com> wrote:
>
> There are many things wrong, but let's start with the critical one
> -- ulimit.
> a) Is this a supercomputer, a lab cluster or your cluster?
> b) Did you set it up or did someone else?
> c) Do you have root/su rights?
>
> Someone has set limits in a way that interferes with the
> calculations. This used to be more common, but it has been
> some years since I have seen it. You can look at, for instance,
> https://ss64.com/bash/ulimit.html
>
> The best solution is to find out how these got set and remove them.
> For that you need to do some local research.
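>
> If the limit cannot be removed, a guarded version of that .bashrc line
> avoids the "Operation not permitted" message. This is only a sketch and
> assumes line 43 is a plain "ulimit -s unlimited":
>
>     # raise the soft stack limit only as far as the hard limit allows
>     if [ "$(ulimit -Hs)" = "unlimited" ]; then
>         ulimit -s unlimited
>     else
>         ulimit -s "$(ulimit -Hs)"
>     fi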
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering, Northwestern University
> www.numis.northwestern.edu
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought" Albert Szent-Györgyi
>
> On Wed, Mar 23, 2022, 3:31 AM venky ch
> <chvenkateshphy at gmail.com> wrote:
>
> Dear Wien2k users,
>
> I have successfully installed WIEN2k (version 21) on the HPC
> cluster. However, while running a test calculation, I get the
> following error and lapw0_mpi crashes.
>
> =========
>
> /home/proj/21/phyvech/.bashrc: line 43: ulimit: stack size:
> cannot modify limit: Operation not permitted
> /home/proj/21/phyvech/.bashrc: line 43: ulimit: stack size:
> cannot modify limit: Operation not permitted
> setrlimit(): WARNING: Cannot raise stack limit, continuing:
> Invalid argument
> [the same setrlimit() warning is repeated 16 times in total]
> Abort(744562703) on node 1 (rank 1 in comm 0): Fatal error in
> PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(432).........................:
> MPI_Bcast(buf=0x7ffd8f8d359c, count=1, MPI_INTEGER, root=0,
> comm=MPI_COMM_WORLD) failed
> PMPI_Bcast(418).........................:
> MPIDI_Bcast_intra_composition_gamma(391):
> MPIDI_NM_mpi_bcast(153).................:
> MPIR_Bcast_intra_tree(219)..............: Failure during collective
> MPIR_Bcast_intra_tree(211)..............:
> MPIR_Bcast_intra_tree_generic(176)......: Failure during collective
> [1] Exit 15 mpirun -np 32 -machinefile
> .machine0 /home/proj/21/phyvech/soft/win2k2/lapw0_mpi lapw0.def
> >> .time00
> cat: No match.
> grep: *scf1*: No such file or directory
> grep: lapw2*.error: No such file or directory
>
> =========
>
> the .machines file is
>
> ======= for 102 reduced k-points =========
>
> #
> lapw0:node16:16 node22:16
> 51:node16:16
> 51:node22:16
> granularity:1
> extrafine:1
>
> ========
>
> "export OMP_NUM_THREADS=1" has been used in the job submission
> script.
>
> "run_lapw -p -NI -i 400 -ec 0.00001 -cc 0.0001" has been used to
> start the parallel calculations in available nodes.
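>
> In sketch form, the relevant lines of the job script are the following
> (the scheduler header and the generation of the .machines file are
> omitted; the case directory path is hypothetical):
>
>     export OMP_NUM_THREADS=1     # one OpenMP thread per MPI process
>     cd /path/to/case             # hypothetical case directory
>     run_lapw -p -NI -i 400 -ec 0.00001 -cc 0.0001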
>
> Can someone please explain where I am going wrong here?
> Thanks in advance.
>
> Regards,
> Venkatesh
> Physics department
> IISc Bangalore, INDIA
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
--
Peter Blaha, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 Email: peter.blaha at tuwien.ac.at
WWW: http://www.imc.tuwien.ac.at    WIEN2k: http://www.wien2k.at