[Wien] Parallel run problems with version 19.1
tran at theochem.tuwien.ac.at
Tue Jul 23 15:37:28 CEST 2019
Are you sure that libfftw3_mpi.so.3 is really there?
Its expected location is given in the Makefile of SRC_lapw0
(the path is FFTWROOT combined with FFTW_LIB).
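
A library file can exist on disk and still fail to load if the dynamic
loader does not search its directory. That is one possible explanation for
the error quoted below, not a confirmed diagnosis. The following is a minimal
sketch of such a check; the function name check_lib is hypothetical and not
part of WIEN2k, and the directory must be taken from your own Makefile.

```shell
#!/bin/sh
# check_lib DIR NAME  (hypothetical helper, not part of WIEN2k)
# Returns 0 if DIR/NAME exists and DIR is listed in LD_LIBRARY_PATH,
#         1 if DIR/NAME does not exist,
#         2 if DIR/NAME exists but DIR is not listed in LD_LIBRARY_PATH.
# Note: status 2 is only a hint. The loader can also find a library through
# an rpath stored in the executable or through the system ld.so cache.
check_lib() {
    lib_dir=$1
    lib_name=$2

    if [ ! -e "$lib_dir/$lib_name" ]; then
        echo "missing: $lib_dir/$lib_name"
        return 1
    fi

    # Wrap the list in colons so that every entry is delimited on both sides.
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$lib_dir:"*)
            echo "ok: $lib_dir is listed in LD_LIBRARY_PATH"
            return 0
            ;;
        *)
            echo "found, but $lib_dir is not listed in LD_LIBRARY_PATH"
            return 2
            ;;
    esac
}
```

It would be called as check_lib /path/to/fftw/lib libfftw3_mpi.so.3, where
/path/to/fftw/lib is a placeholder for the FFTWROOT and FFTW_LIB combination
from the Makefile. Running ldd on the lapw0_mpi executable and looking for a
"not found" entry next to libfftw3_mpi.so.3 shows what the loader itself
resolves, and is the more direct test.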
On Tuesday 2019-07-23 15:24, Ricardo Moreira wrote:
>Date: Tue, 23 Jul 2019 15:24:25
>From: Ricardo Moreira <ricardopachecomoreira at gmail.com>
>Reply-To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>Subject: Re: [Wien] Parallel run problems with version 19.1
>
>Yes, the calculation was initialized with spin-polarization, x lapw0 generates case.vspup and case.vspdn and runsp_lapw runs without issue until
>convergence is reached. Regarding the message that is shown, it is as follows:
>
>starting parallel lapw0 at Tue Jul 23 14:06:25 WEST 2019
>-------- .machine0 : 2 processors
>[1] 18397
>/homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi: error while loading shared libraries: libfftw3_mpi.so.3: cannot open shared object file: No such file or
>directory
>--------------------------------------------------------------------------
>Primary job terminated normally, but 1 process returned
>a non-zero exit code. Per user-direction, the job has been aborted.
>--------------------------------------------------------------------------
>/homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi: error while loading shared libraries: libfftw3_mpi.so.3: cannot open shared object file: No such file or
>directory
>[1] Exit 127 mpirun -np 2 -machinefile .machine0 /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi lapw0.def >> .time00
>0.059u 0.133s 0:03.36 5.3% 0+0k 1312+240io 6pf+0w
>
>I looked at the lib folder for fftw and the file is definitely there, so I'm not sure what the cause of this would be.
>
>As for Professor Blaha's questions, I shall attempt to answer in order:
>1) Yes it does work
>2)The .machines file is:
>#
>1:ava18:1
>1:ava18:1
>lapw0:ava18:2
>granularity:1
>extrafine:1
>3) ls -als *output00* returns that there is no such file or directory but there is a file called TiC.output0 so I'll assume this is the file of
>interest here. The output of ls -als TiC.output0 is "68 -rw-r--r-- 1 fc-up201202493 cfp 65791 Jul 22 19:59 TiC.output0".
>4) The end of TiC.output0 has the following:
>
> =====>>> CPU-TIME SUMMARY
> TOTAL CPU/WALL-TIME USED : 2.8 100. PERCENT 2.8 100. PERCENT
> TIME MULTIPOLMOMENTS: 0.0 1. PERCENT 0.0 1. PERCENT
> TIME COULOMB POT INT: 0.0 0. PERCENT 0.0 0. PERCENT
> TIME COULOMB POT RMT: 0.0 0. PERCENT 0.0 0. PERCENT
> TIME COULOMB POT SPH: 0.0 1. PERCENT 0.0 1. PERCENT
> TIME XCPOT SPHERES : 1.8 64. PERCENT 1.8 63. PERCENT
> TIME XCPOT INTERST : 0.8 29. PERCENT 0.8 29. PERCENT
> TIME TOTAL ENERGY : 0.1 2. PERCENT 0.1 2. PERCENT
> TIME REAN0, REAN3 : 0.1 0. PERCENT 0.1 0. PERCENT
> TIME REANALYSE : 0.0 2. PERCENT 0.1 2. PERCENT
>
>(the spacings are a bit off compared to what shows up on the actual file).
>
>Lastly regarding fftw-mpi. I had to update the GNU compilers I was previously using for version 18.2 as they were deemed to be too old a version by
>./siteconfig_lapw. As such I compiled a new version of OpenMPI with the newer version of the compilers. I wasn't sure if I had done the same for fftw
>so I went and recompiled fftw and then recompiled Wien2k version 19.1 afterwards, but the error persists, so this does not seem to be the cause.
>
> On Mon, 22 Jul 2019 at 19:54, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
> Please:
> 1) does x lapw0 work ???
> 2) list your .machines file. In particular: for TiC use only 2 cores
> (because of 2 atoms)
> 3) ls -als *output00*
> 4) what is at the end of *.output0000 ??? Please check for any errors.
>
> Is your fftw-mpi compiled with the same compiler as wien2k ??
>
>
> Am 22.07.2019 um 20:45 schrieb Ricardo Moreira:
> > I had it at 4 as per the default value suggested during configuration
> > but I changed it to 1 now. In spite of that, "x lapw0 -p" still did not
> > generate case.vspup or case.vspdn.
> >
> > On Mon, 22 Jul 2019 at 19:01, <tran at theochem.tuwien.ac.at
> > <mailto:tran at theochem.tuwien.ac.at>> wrote:
> >
> > Do you have the variable OMP_NUM_THREADS set in your .bashrc or .cshrc
> > file? If yes and the value is greater than 1, then set it to 1 and
> > execute again "x lapw0 -p".
> >
> > On Monday 2019-07-22 18:39, Ricardo Moreira wrote:
> >
> > >Date: Mon, 22 Jul 2019 18:39:45
> > >From: Ricardo Moreira <ricardopachecomoreira at gmail.com
> > <mailto:ricardopachecomoreira at gmail.com>>
> > >Reply-To: A Mailing list for WIEN2k users
> > <wien at zeus.theochem.tuwien.ac.at
> > <mailto:wien at zeus.theochem.tuwien.ac.at>>
> > >To: A Mailing list for WIEN2k users
> > <wien at zeus.theochem.tuwien.ac.at
> > <mailto:wien at zeus.theochem.tuwien.ac.at>>
> > >Subject: Re: [Wien] Parallel run problems with version 19.1
> > >
> > >That is indeed the case, neither case.vspup nor case.vspdn were
> > generated after running "x lapw0 -p".
> > >
> > >On Mon, 22 Jul 2019 at 17:09, <tran at theochem.tuwien.ac.at
> > <mailto:tran at theochem.tuwien.ac.at>> wrote:
> > > It seems that lapw0 does not generate case.vspup and
> > > case.vspdn (and case.vsp for non-spin-polarized calculation).
> > > Can you confirm that by executing "x lapw0 -p" on the command
> > > line?
> > >
> > > On Monday 2019-07-22 17:45, Ricardo Moreira wrote:
> > >
> > > >Date: Mon, 22 Jul 2019 17:45:51
> > > >From: Ricardo Moreira <ricardopachecomoreira at gmail.com
> > <mailto:ricardopachecomoreira at gmail.com>>
> > > >Reply-To: A Mailing list for WIEN2k users
> > <wien at zeus.theochem.tuwien.ac.at
> > <mailto:wien at zeus.theochem.tuwien.ac.at>>
> > > >To: A Mailing list for WIEN2k users
> > <wien at zeus.theochem.tuwien.ac.at
> > <mailto:wien at zeus.theochem.tuwien.ac.at>>
> > > >Subject: Re: [Wien] Parallel run problems with version 19.1
> > > >
> > > >The command "ls *vsp*" returns only the files
> > "TiC.vspdn_st" and
> > > >"TiC.vsp_st", so it would appear that the file is not
> > created at all when
> > > >using the -p switch to runsp_lapw.
> > > >
> > > >On Mon, 22 Jul 2019 at 16:29, <tran at theochem.tuwien.ac.at
> > <mailto:tran at theochem.tuwien.ac.at>> wrote:
> > > > Is the file TiC.vspup empty?
> > > >
> > > > On Monday 2019-07-22 17:24, Ricardo Moreira wrote:
> > > >
> > > > >Date: Mon, 22 Jul 2019 17:24:42
> > > > >From: Ricardo Moreira
> > <ricardopachecomoreira at gmail.com
> > <mailto:ricardopachecomoreira at gmail.com>>
> > > > >Reply-To: A Mailing list for WIEN2k users
> > > > <wien at zeus.theochem.tuwien.ac.at
> > <mailto:wien at zeus.theochem.tuwien.ac.at>>
> > > > >To: A Mailing list for WIEN2k users
> > > > <wien at zeus.theochem.tuwien.ac.at
> > <mailto:wien at zeus.theochem.tuwien.ac.at>>
> > > > >Subject: Re: [Wien] Parallel run problems with
> > version 19.1
> > > > >
> > > > >Hi and thanks for the reply,
> > > > >Regarding serial calculations, yes in both non
> > spin-polarized
> > > > and spin-polarized everything runs properly in the
> > cases you
> > > > described. As
> > > > >for parallel, it fails in both cases, with the error I
> > > > indicated in my previous email.
> > > > >
> > > > >Best Regards,
> > > > >Ricardo Moreira
>
>
>