[Wien] Parallel run problems with version 19.1

Laurence Marks laurence.marks at gmail.com
Tue Jul 23 15:38:03 CEST 2019


Either:
1) You are running on a remote node and the directory containing the shared
library has not been exported and (NFS) mounted there, or
2) Your implementation of mpirun is not exporting the location of the fftw
library, e.g. LD_LIBRARY_PATH is not being exported.

This is not a WIEN2k problem, it is an OS problem. I suggest that you work
through the documentation on how to run YOUR mpi, and use (at a terminal)
"x lapw0 -p" until it works.
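
A minimal sketch of how both possibilities could be checked from a terminal.
It assumes Open MPI (whose mpirun accepts -x to forward an environment
variable) and a POSIX shell. The FFTW directory /opt/fftw/lib and the helper
names add_lib_dir and run_parallel_lapw0 are hypothetical placeholders, not
part of WIEN2k; the node name and the mpirun arguments are taken from the
quoted log below.

```shell
# Hypothetical locations: adjust both to your own installation.
FFTW_LIB="${FFTW_LIB:-/opt/fftw/lib}"      # assumed directory holding libfftw3_mpi.so.3
LAPW0_MPI="${WIENROOT:-.}/lapw0_mpi"       # WIENROOT is the WIEN2k install directory

# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;                       # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

add_lib_dir "$FFTW_LIB"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"

# Case 1: is the library directory visible on the node that runs the job?
# (run by hand; ava18 is the node named in .machines)
#   ssh ava18 ls "$FFTW_LIB"

# Case 2a: which shared libraries can the binary not resolve in this shell?
if [ -x "$LAPW0_MPI" ]; then
    ldd "$LAPW0_MPI" | grep "not found" || echo "all shared libraries resolved"
fi

# Case 2b: ask Open MPI to forward LD_LIBRARY_PATH to the launched processes.
# Defined but not called here; run it from the case directory.
run_parallel_lapw0() {
    mpirun -x LD_LIBRARY_PATH -np 2 -machinefile .machine0 "$LAPW0_MPI" lapw0.def
}
```

If ldd reports nothing missing locally but the parallel run still fails, the
environment seen by the MPI-launched processes is the more likely culprit.
Other MPI implementations use different options for forwarding variables, so
check the documentation of the one you built.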

On Tue, Jul 23, 2019 at 2:24 PM Ricardo Moreira <
ricardopachecomoreira at gmail.com> wrote:

> Yes, the calculation was initialized with spin-polarization, x lapw0
> generates case.vspup and case.vspdn and runsp_lapw runs without issue until
> convergence is reached. Regarding the message that is shown, it is as
> follows:
>
> starting parallel lapw0 at Tue Jul 23 14:06:25 WEST 2019
> -------- .machine0 : 2 processors
> [1] 18397
> /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi: error while loading shared
> libraries: libfftw3_mpi.so.3: cannot open shared object file: No such file
> or directory
> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi: error while loading shared
> libraries: libfftw3_mpi.so.3: cannot open shared object file: No such file
> or directory
> [1]    Exit 127                      mpirun -np 2 -machinefile .machine0
> /homes/fc-up201202493/WIEN2k_19.1/lapw0_mpi lapw0.def >> .time00
> 0.059u 0.133s 0:03.36 5.3%      0+0k 1312+240io 6pf+0w
>
> I looked at the lib folder for fftw and the file is definitely there so
> I'm not sure what the cause for this would be.
>
> As for Professor Blaha's questions, I shall attempt to answer in order:
> 1) Yes it does work
> 2) The .machines file is:
> #
> 1:ava18:1
> 1:ava18:1
> lapw0:ava18:2
> granularity:1
> extrafine:1
> 3) ls -als *output00* returns that there is no such file or directory but
> there is a file called TiC.output0 so I'll assume this is the file of
> interest here. The output of ls -als TiC.output0 is "68 -rw-r--r-- 1
> fc-up201202493 cfp 65791 Jul 22 19:59 TiC.output0."
> 4) The end of TiC.output0 has the following:
>
>    =====>>> CPU-TIME SUMMARY
>             TOTAL CPU/WALL-TIME USED :   2.8   100. PERCENT   2.8   100. PERCENT
>             TIME MULTIPOLMOMENTS     :   0.0     1. PERCENT   0.0     1. PERCENT
>             TIME COULOMB POT INT     :   0.0     0. PERCENT   0.0     0. PERCENT
>             TIME COULOMB POT RMT     :   0.0     0. PERCENT   0.0     0. PERCENT
>             TIME COULOMB POT SPH     :   0.0     1. PERCENT   0.0     1. PERCENT
>             TIME XCPOT SPHERES       :   1.8    64. PERCENT   1.8    63. PERCENT
>             TIME XCPOT INTERST       :   0.8    29. PERCENT   0.8    29. PERCENT
>             TIME TOTAL ENERGY        :   0.1     2. PERCENT   0.1     2. PERCENT
>             TIME REAN0, REAN3        :   0.1     0. PERCENT   0.1     0. PERCENT
>             TIME REANALYSE           :   0.0     2. PERCENT   0.1     2. PERCENT
>
> (the spacings are a bit off compared to what shows up on the actual file).
>
> Lastly regarding fftw-mpi. I had to update the GNU compilers I was
> previously using for version 18.2 as they were deemed to be too old a
> version by ./siteconfig_lapw. As such I compiled a new version of OpenMPI
> with the newer version of the compilers. I wasn't sure if I had done the
> same for fftw so I went and recompiled fftw and then recompiled Wien2k
> version 19.1 afterwards but the error persists so this does not seem to be
> the cause of it.
>
>
>> On Mon, 22 Jul 2019 at 19:54, Peter Blaha <pblaha at theochem.tuwien.ac.at>
>> wrote:
>>
>>> Please:
>>> 1) does   x lapw0   work ???
>>> 2) list your .machines file. In particular: for TiC use only 2 cores
>>> (because of 2 atoms)
>>> 3) ls -als *output00*
>>> 4) what is at the end of *.output0000  ??? Please check for any errors.
>>>
>>> Is your fftw-mpi compiled with the same compiler as wien2k ??
>>>
>>>
>>> Am 22.07.2019 um 20:45 schrieb Ricardo Moreira:
>>> > I had it at 4 as per the default value suggested during configuration
>>> > but I changed it to 1 now. In spite of that, "x lapw0 -p" still did not
>>> > generate case.vspup or case.vspdn.
>>> >
>>> > On Mon, 22 Jul 2019 at 19:01, <tran at theochem.tuwien.ac.at
>>> > <mailto:tran at theochem.tuwien.ac.at>> wrote:
>>> >
>>> >     Do you have the variable OMP_NUM_THREADS set in your .bashrc or
>>> >     .cshrc file? If yes and the value is greater than 1, then set it
>>> >     to 1 and execute again "x lapw0 -p".
>>> >
>>> >     On Monday 2019-07-22 18:39, Ricardo Moreira wrote:
>>> >
>>> >      >Date: Mon, 22 Jul 2019 18:39:45
>>> >      >From: Ricardo Moreira <ricardopachecomoreira at gmail.com
>>> >     <mailto:ricardopachecomoreira at gmail.com>>
>>> >      >Reply-To: A Mailing list for WIEN2k users
>>> >     <wien at zeus.theochem.tuwien.ac.at
>>> >     <mailto:wien at zeus.theochem.tuwien.ac.at>>
>>> >      >To: A Mailing list for WIEN2k users
>>> >     <wien at zeus.theochem.tuwien.ac.at
>>> >     <mailto:wien at zeus.theochem.tuwien.ac.at>>
>>> >      >Subject: Re: [Wien] Parallel run problems with version 19.1
>>> >      >
>>> >      >That is indeed the case, neither case.vspup nor case.vspdn
>>> >      >was generated after running "x lapw0 -p".
>>> >      >
>>> >      >On Mon, 22 Jul 2019 at 17:09, <tran at theochem.tuwien.ac.at
>>> >     <mailto:tran at theochem.tuwien.ac.at>> wrote:
>>> >      >      It seems that lapw0 does not generate case.vspup and
>>> >      >      case.vspdn (and case.vsp for non-spin-polarized
>>> >      >      calculation). Can you confirm that by executing
>>> >      >      "x lapw0 -p" on the command line?
>>> >      >
>>> >      >      On Monday 2019-07-22 17:45, Ricardo Moreira wrote:
>>> >      >
>>> >      >      >The command "ls *vsp*" returns only the files
>>> >      >      >"TiC.vspdn_st" and "TiC.vsp_st", so it would appear that
>>> >      >      >the file is not created at all when using the -p switch
>>> >      >      >to runsp_lapw.
>>> >      >      >
>>> >      >      >On Mon, 22 Jul 2019 at 16:29, <tran at theochem.tuwien.ac.at
>>> >     <mailto:tran at theochem.tuwien.ac.at>> wrote:
>>> >      >      >      Is the file TiC.vspup empty?
>>> >      >      >
>>> >      >      >      On Monday 2019-07-22 17:24, Ricardo Moreira wrote:
>>> >      >      >
>>> >      >      >      >Hi and thanks for the reply,
>>> >      >      >      >Regarding serial calculations, yes, in both the
>>> >      >      >      >non-spin-polarized and spin-polarized cases you
>>> >      >      >      >described everything runs properly. As for parallel,
>>> >      >      >      >it fails in both cases, with the error I indicated
>>> >      >      >      >in my previous email.
>>> >      >      >      >
>>> >      >      >      >Best Regards,
>>> >      >      >      >Ricardo Moreira


-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Györgyi