[Wien] Parallel run problems with version 19.1
Laurence Marks
laurence.marks at gmail.com
Tue Jul 23 15:38:03 CEST 2019
Either:
1) You are running on a remote node and the file system holding the shared
library has not been exported (NFS) and mounted there; or
2) Your implementation of mpirun is not exporting the location of the FFTW
library, e.g. LD_LIBRARY_PATH is not being exported.
This is not a WIEN2k problem; it is an OS problem. I suggest that you work
through the documentation on how to run YOUR MPI, and use "x lapw0 -p" (at a
terminal) until it works.
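For cause 2, one common fix is to put the directory that holds libfftw3_mpi.so.3 on LD_LIBRARY_PATH, which is the variable the runtime loader searches (LIBRARY_PATH is only consulted by the compiler at link time), and to make sure every MPI rank inherits it. The sketch below is a hypothetical helper, not part of WIEN2k, and the FFTW directory used with it is a placeholder for wherever your FFTW was installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of WIEN2k): prepend a directory to
# LD_LIBRARY_PATH, the runtime loader's search path, unless it is already listed.
prepend_ld_path() {
  local dir="$1"
  case ":${LD_LIBRARY_PATH}:" in
    *":${dir}:"*) ;;  # already present: leave the variable unchanged
    *) export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
  esac
}
```

For example, call "prepend_ld_path $HOME/fftw/lib" (placeholder path) in the shell startup file, and with OpenMPI forward the variable explicitly with "mpirun -x LD_LIBRARY_PATH -np 2 -machinefile .machine0 lapw0_mpi lapw0.def". In WIEN2k the mpirun command line is usually taken from the WIEN_MPIRUN setting in $WIENROOT/parallel_options; verify that location in your own installation before editing it.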
On Tue, Jul 23, 2019 at 2:24 PM Ricardo Moreira <ricardopachecomoreira at gmail.com> wrote:
> Yes, the calculation was initialized with spin-polarization, x lapw0
> generates case.vspup and case.vspdn and runsp_lapw runs without issue until
> convergence is reached. Regarding the message that is shown, it is as
> follows:
>
> starting parallel lapw0 at Tue Jul 23 14:06:25 WEST 2019
> -------- .machine0 : 2 processors
> [1] 18397
> /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi: error while loading shared libraries: libfftw3_mpi.so.3: cannot open shared object file: No such file or directory
> --------------------------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi: error while loading shared libraries: libfftw3_mpi.so.3: cannot open shared object file: No such file or directory
> [1] Exit 127 mpirun -np 2 -machinefile .machine0 /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi lapw0.def >> .time00
> 0.059u 0.133s 0:03.36 5.3% 0+0k 1312+240io 6pf+0w
>
> I looked in the lib folder for fftw and the file is definitely there, so
> I'm not sure what is causing this.
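The file being present on disk is not sufficient: the dynamic loader on the node where lapw0_mpi starts must also find it, through its default paths, LD_LIBRARY_PATH, or the binary's RPATH. A quick check is ldd, which prints "=> not found" for every library it cannot resolve. The helper below (hypothetical name missing_libs) only filters that output; it is a sketch, and it is most informative when run on the compute node itself, since that is where the environment may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the shared libraries that the dynamic loader
# cannot resolve for a given executable, using ldd's "=> not found" marker.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}
```

For example, "missing_libs /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi" should list libfftw3_mpi.so.3 if the loader cannot find it, while "ssh ava18 ldd /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi" shows what a non-interactive login on that node sees.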
>
> As for Professor Blaha's questions, I shall attempt to answer in order:
> 1) Yes, it does work.
> 2) The .machines file is:
> #
> 1:ava18:1
> 1:ava18:1
> lapw0:ava18:2
> granularity:1
> extrafine:1
> 3) ls -als *output00* reports that there is no such file or directory, but
> there is a file called TiC.output0, so I'll assume this is the file of
> interest here. The output of ls -als TiC.output0 is "68 -rw-r--r-- 1
> fc-up201202493 cfp 65791 Jul 22 19:59 TiC.output0".
> 4) The end of TiC.output0 has the following:
>
> =====>>> CPU-TIME SUMMARY
> TOTAL CPU/WALL-TIME USED :   2.8   100. PERCENT     2.8   100. PERCENT
> TIME MULTIPOLMOMENTS:        0.0     1. PERCENT     0.0     1. PERCENT
> TIME COULOMB POT INT:        0.0     0. PERCENT     0.0     0. PERCENT
> TIME COULOMB POT RMT:        0.0     0. PERCENT     0.0     0. PERCENT
> TIME COULOMB POT SPH:        0.0     1. PERCENT     0.0     1. PERCENT
> TIME XCPOT SPHERES :         1.8    64. PERCENT     1.8    63. PERCENT
> TIME XCPOT INTERST :         0.8    29. PERCENT     0.8    29. PERCENT
> TIME TOTAL ENERGY :          0.1     2. PERCENT     0.1     2. PERCENT
> TIME REAN0, REAN3 :          0.1     0. PERCENT     0.1     0. PERCENT
> TIME REANALYSE :             0.0     2. PERCENT     0.1     2. PERCENT
>
> (The spacings are a bit off compared to what shows up in the actual file.)
>
> Lastly, regarding fftw-mpi: I had to update the GNU compilers I was
> previously using for version 18.2, as ./siteconfig_lapw deemed them too
> old. I therefore compiled a new version of OpenMPI with the newer
> compilers. I wasn't sure whether I had done the same for fftw, so I
> recompiled fftw and then recompiled WIEN2k 19.1, but the error persists,
> so this does not seem to be the cause.
>
>
>> On Mon, 22 Jul 2019 at 19:54, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>>
>>> Please:
>>> 1) Does x lapw0 work?
>>> 2) List your .machines file. In particular, for TiC use only 2 cores
>>> (because there are 2 atoms).
>>> 3) ls -als *output00*
>>> 4) What is at the end of *.output0000? Please check for any errors.
>>>
>>> Is your fftw-mpi compiled with the same compiler as WIEN2k?
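Point 2 can be checked mechanically. The sketch below (hypothetical helper lapw0_cores) sums the core counts on the "lapw0:" line of a .machines file. It assumes the "lapw0:host:N" layout used in this thread, and additionally assumes that several "host:N" entries on that line are separated by spaces; for the 2-atom TiC cell the result should not exceed 2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total number of MPI processes requested for lapw0
# in a WIEN2k .machines file, e.g. "lapw0:ava18:2" -> 2.
# Assumes entries of the form "lapw0:host:N [host:N ...]".
lapw0_cores() {
  awk '/^lapw0:/ {
         sub(/^lapw0:/, "")            # leaves "host:N [host:N ...]"
         for (i = 1; i <= NF; i++) {   # each field is one "host:N" pair
           split($i, p, ":")
           total += p[2]
         }
       }
       END { print total + 0 }' "${1:-.machines}"
}
```

Run in the case directory, "lapw0_cores" reads ./.machines; with the file listed in this thread it prints 2.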
>>>
>>>
>>> On 22.07.2019 at 20:45, Ricardo Moreira wrote:
>>> > I had it at 4, as per the default value suggested during configuration,
>>> > but I have changed it to 1 now. In spite of that, "x lapw0 -p" still did
>>> > not generate case.vspup or case.vspdn.
>>> >
>>> > On Mon, 22 Jul 2019 at 19:01, <tran at theochem.tuwien.ac.at> wrote:
>>> >
>>> > Do you have the variable OMP_NUM_THREADS set in your .bashrc or .cshrc
>>> > file? If so, and the value is greater than 1, set it to 1 and
>>> > execute "x lapw0 -p" again.
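This check can be scripted. The bash sketch below (hypothetical function check_omp_threads) reports the current value and returns non-zero when it is greater than 1. It is only an illustration; the actual fix is a line in the startup file: "export OMP_NUM_THREADS=1" in .bashrc, or "setenv OMP_NUM_THREADS 1" in .cshrc for csh/tcsh users.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report OMP_NUM_THREADS and return 1 if it is greater
# than 1, the case in which setting it to 1 was suggested before "x lapw0 -p".
check_omp_threads() {
  local n="${OMP_NUM_THREADS:-unset}"
  # Non-numeric values make the -gt test fail quietly and are treated as OK.
  if [ "$n" != "unset" ] && [ "$n" -gt 1 ] 2>/dev/null; then
    echo "OMP_NUM_THREADS=$n (greater than 1)"
    return 1
  fi
  echo "OMP_NUM_THREADS=$n"
}
```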
>>> >
>>> > On Monday 2019-07-22 18:39, Ricardo Moreira wrote:
>>> >
>>> > >Date: Mon, 22 Jul 2019 18:39:45
>>> > >From: Ricardo Moreira <ricardopachecomoreira at gmail.com>
>>> > >Reply-To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>>> > >To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>>> > >Subject: Re: [Wien] Parallel run problems with version 19.1
>>> > >
>>> > >That is indeed the case: neither case.vspup nor case.vspdn was
>>> > >generated after running "x lapw0 -p".
>>> > >
>>> > >On Mon, 22 Jul 2019 at 17:09, <tran at theochem.tuwien.ac.at> wrote:
>>> > > It seems that lapw0 does not generate case.vspup and
>>> > > case.vspdn (or case.vsp for a non-spin-polarized calculation).
>>> > > Can you confirm that by executing "x lapw0 -p" on the command
>>> > > line?
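That confirmation can be scripted as well. The bash sketch below (hypothetical function check_vsp) reports, after "x lapw0 -p", whether case.vspup and case.vspdn exist and are non-empty. It assumes, as is usual in WIEN2k, that the case name equals the name of the working directory unless one is passed explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: after "x lapw0 -p", check that the spin-polarized
# potentials case.vspup and case.vspdn exist and are non-empty.
# The case name defaults to the current directory name (assumed WIEN2k layout).
check_vsp() {
  local case_name="${1:-$(basename "$PWD")}"
  local f status=0
  for f in "${case_name}.vspup" "${case_name}.vspdn"; do
    if [ -s "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f"
      status=1
    fi
  done
  return "$status"
}
```

In the TiC directory, "x lapw0 -p && check_vsp" would show immediately whether the parallel lapw0 wrote both files.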
>>> > >
>>> > > On Monday 2019-07-22 17:45, Ricardo Moreira wrote:
>>> > >
>>> > > >Date: Mon, 22 Jul 2019 17:45:51
>>> > > >From: Ricardo Moreira <ricardopachecomoreira at gmail.com>
>>> > > >Reply-To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>>> > > >To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>>> > > >Subject: Re: [Wien] Parallel run problems with version 19.1
>>> > > >
>>> > > >The command "ls *vsp*" returns only the files "TiC.vspdn_st" and
>>> > > >"TiC.vsp_st", so it would appear that the file is not created at all
>>> > > >when using the -p switch to runsp_lapw.
>>> > > >
>>> > > >On Mon, 22 Jul 2019 at 16:29, <tran at theochem.tuwien.ac.at> wrote:
>>> > > > Is the file TiC.vspup empty?
>>> > > >
>>> > > > On Monday 2019-07-22 17:24, Ricardo Moreira wrote:
>>> > > >
>>> > > > >Date: Mon, 22 Jul 2019 17:24:42
>>> > > > >From: Ricardo Moreira <ricardopachecomoreira at gmail.com>
>>> > > > >Reply-To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>>> > > > >To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
>>> > > > >Subject: Re: [Wien] Parallel run problems with version 19.1
>>> > > > >
>>> > > > >Hi, and thanks for the reply.
>>> > > > >Regarding serial calculations: yes, in both the non-spin-polarized
>>> > > > >and spin-polarized cases everything runs properly in the cases you
>>> > > > >described. As for parallel runs, both cases fail with the error I
>>> > > > >indicated in my previous email.
>>> > > > >
>>> > > > >Best Regards,
>>> > > > >Ricardo Moreira
>>>
>>> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Györgyi