[Wien] Segmentation fault in Supercell Calculation

Laurence Marks L-marks at northwestern.edu
Tue Jul 28 23:00:18 CEST 2015


You have the wrong BLACS for Open MPI. Please use the Intel link line advisor I
sent to work out what you need. It looks like you may need static linking
with Open MPI.
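A rough way to catch this kind of mismatch before recompiling is to scan the RP_LIB link line for the MKL BLACS library and compare it with the MPI flavor in use. The sketch below is only an illustration and is not part of WIEN2k. The function name `check_blacs` is hypothetical, and the flavor-to-library mapping is an assumption based on MKL's usual naming (`mkl_blacs_openmpi_lp64` for Open MPI, `mkl_blacs_intelmpi_lp64` for Intel MPI, plain `mkl_blacs_lp64` for MPICH-style MPIs). Confirm the exact names for your MKL version with the Intel link line advisor.

```python
import re

# Assumed mapping from MPI flavor to the MKL BLACS library it needs
# (based on MKL's usual naming; verify with the Intel link line advisor).
EXPECTED_BLACS = {
    "openmpi": "mkl_blacs_openmpi_lp64",
    "intelmpi": "mkl_blacs_intelmpi_lp64",
    "mpich": "mkl_blacs_lp64",
}


def check_blacs(link_line: str, mpi_flavor: str) -> list[str]:
    """Return problems found in an RP_LIB-style link line (hypothetical helper).

    It only compares the -l flags as text; it does not inspect the libraries.
    """
    expected = EXPECTED_BLACS[mpi_flavor]
    # Collect every -lmkl_blacs* library named on the line.
    found = re.findall(r"-l(mkl_blacs\w*)", link_line)
    problems = []
    if not found:
        problems.append("no MKL BLACS library on the link line")
    for lib in found:
        if lib != expected:
            problems.append(f"-l{lib} does not match {mpi_flavor}; expected -l{expected}")
    return problems


if __name__ == "__main__":
    # The RP_LIB line reported in this thread, checked against Open MPI.
    rp_lib = "-lmkl_scalapack_lp64 -lmkl_blacs_lp64 $(R_LIBS)"
    for problem in check_blacs(rp_lib, "openmpi"):
        print(problem)
```

Applied to the RP_LIB line quoted further down in this thread, it reports one problem, because `-lmkl_blacs_lp64` is not the Open MPI variant.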

I am certain that you misread the email about "-C -g". Those flags only help
diagnose problems; in general they will create problems (and make the
programs run 25-50% slower).
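If you edit the option string with a script rather than by hand, the change recommended here (drop the debugging flags "-C -g" and, as suggested earlier in the thread, use just "-O1") can be sketched as below. The helper `clean_fopt` is hypothetical and only rewrites the text of the option string. The options themselves are still set through WIEN2k's own configuration menu (the O/FP entries quoted below), after which the programs have to be recompiled.

```python
# Debugging flags (runtime checks and debug symbols) that the thread
# recommends removing for production builds.
DEBUG_FLAGS = {"-C", "-g"}


def clean_fopt(options: str, opt_level: str = "-O1") -> str:
    """Drop -C/-g from a compiler option string and ensure an -O level is set."""
    tokens = [t for t in options.split() if t not in DEBUG_FLAGS]
    # Only add the optimization level if no -O flag is already present.
    if not any(t.startswith("-O") for t in tokens):
        tokens.append(opt_level)
    return " ".join(tokens)


if __name__ == "__main__":
    fopt = ("-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML "
            "-traceback -assume buffered_io -C -g")
    print(clean_fopt(fopt))
```

An existing -O flag is left alone, so the helper never stacks two optimization levels.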

On Tue, Jul 28, 2015 at 3:41 PM, Lan, Wangwei <wl13c at my.fsu.edu> wrote:

>  Dear professor:
>
>
>  I use Open MPI, version 1.4.5.
>
>
>  I added "-C -g" because some people on the mailing list said it would
> probably solve the problem.
>
> Thanks for your advice, I will recompile the package soon.
>
> Sincerely
> Wangwei
>  ------------------------------
> *From:* wien-bounces at zeus.theochem.tuwien.ac.at <
> wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Laurence Marks <
> L-marks at northwestern.edu>
> *Sent:* Tuesday, July 28, 2015 15:36
>
> *To:* A Mailing list for WIEN2k users
> *Subject:* Re: [Wien] Segmentation fault in Supercell Calculation
>
>  N.B., unless you are a code developer, "-C -g" are a terrible idea.
> Remove them; they can easily lead to the code crashing. Replace them with
> just "-O1".
>
> On Tue, Jul 28, 2015 at 3:28 PM, Lan, Wangwei <wl13c at my.fsu.edu> wrote:
>
>>  Dear Professor:
>>
>>
>>
>>  When I type "mpif90 --version", it gives me "ifort (IFORT) 12.1.3
>> 20120212", so I thought it should work.
>>
>>
>>  My library link settings are listed below:
>>
>> Parallel execution:
>>
>>      FFTW_LIB + FFTW_OPT    : -lfftw3_mpi -lfftw3 -L/opt/fftw3.3.3/lib  +  -DFFTW3 -I/opt/fftw3.3.3/include (already set)
>>      RP  RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_blacs_lp64 $(R_LIBS)
>>      FP  FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io
>>
>> Compiler settings:
>>
>>  O   Compiler options:        -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -C -g
>>  F   FFTW options:            -DFFTW3 -I/opt/fftw3.3.3/include
>>  L   Linker Flags:            $(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
>>  P   Preprocessor flags       '-DParallel'
>>  R   R_LIB (LAPACK+BLAS):     -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lmkl_solver_lp64
>>  FL  FFTW_LIBS:               -lfftw3_mpi -lfftw3 -L/opt/fftw3.3.3/lib
>>
>>
>>
>>  Sincerely
>> Wangwei
>>
>> ------------------------------
>> *From:* wien-bounces at zeus.theochem.tuwien.ac.at <
>> wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Laurence Marks <
>> L-marks at northwestern.edu>
>> *Sent:* Tuesday, July 28, 2015 14:59
>> *To:* A Mailing list for WIEN2k users
>> *Subject:* Re: [Wien] Segmentation fault in Supercell Calculation
>>
>>  Your options are probably wrong:
>>
>> a) mpif90 normally wraps gfortran; the Intel version is mpiifort.
>> b) It is easy to use the wrong link line with the Intel MKL libraries.
>> Please provide the information I requested.
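One way to see which compiler an MPI wrapper really calls is to read its version banner, as was done later in this thread with "mpif90 --version"; with Open MPI, "mpif90 --showme" also prints the full underlying command line. The hypothetical helper below only classifies the banner text. The matched substrings ("ifort", "GNU Fortran") are assumptions based on the usual banners of those compilers, and the example banner is the one reported in this thread.

```python
def wrapped_compiler(version_banner: str) -> str:
    """Guess which Fortran compiler an MPI wrapper calls from its --version text.

    Hypothetical helper: it only looks for typical substrings in the banner.
    """
    text = version_banner.lower()
    if "ifort" in text or "intel" in text:
        return "intel"
    if "gnu fortran" in text:
        return "gnu"
    return "unknown"


if __name__ == "__main__":
    # Banner reported in this thread for "mpif90 --version".
    print(wrapped_compiler("ifort (IFORT) 12.1.3 20120212"))
```

A wrapper that reports ifort shows that the compiler choice itself is probably fine, which points back at the link line (point b).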
>>
>>
>> On Tue, Jul 28, 2015 at 2:55 PM, Lan, Wangwei <wl13c at my.fsu.edu> wrote:
>>
>>>  Dear Professor:
>>>
>>>
>>>  Yes, "x lapw0" works without mpi.
>>>
>>>
>>>  My MPI compiler: mpif90
>>>
>>> I use Open MPI, version 1.4.5
>>>
>>> the parallel compilation options are
>>>
>>>  -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume
>>> buffered_io
>>>
>>> I use the Intel MKL libraries; that part should be fine.
>>>
>>>
>>>  Thanks very much for your help.
>>>
>>>  Sincerely
>>> Wangwei Lan
>>>  ------------------------------
>>> *From:* wien-bounces at zeus.theochem.tuwien.ac.at <
>>> wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Laurence Marks <
>>> L-marks at northwestern.edu>
>>> *Sent:* Tuesday, July 28, 2015 14:30
>>> *To:* A Mailing list for WIEN2k users
>>> *Subject:* Re: [Wien] Segmentation fault in Supercell Calculation
>>>
>>>  Does a simple "x lapw0" work, i.e. without mpi, for this specific
>>> case?
>>>
>>>  If it does, then there is probably an error in how you have
>>> linked/compiled the MPI versions. Please provide:
>>>
>>>  a) The MPI compiler you used.
>>> b) Which type of MPI you are using (Open MPI, MVAPICH, Intel MPI, etc.)
>>> c) The parallel compilation options.
>>>
>>>  N.B., a useful resource is
>>> https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
>>>
>>>  N.N.B., ulimit -s is not needed, this is (now) done in the software.
>>>
>>> On Tue, Jul 28, 2015 at 2:22 PM, Lan, Wangwei <wl13c at my.fsu.edu> wrote:
>>>
>>>>  Dear Professor Marks:
>>>>
>>>>
>>>>  I've checked everything you mentioned and it all looks fine;
>>>> nevertheless, it still doesn't work. I think the input files are OK since I
>>>> have no problem running in non-parallel mode.
>>>>
>>>> When I made the supercell smaller (2x1x1), it worked. However,
>>>> I don't know why that happens.
>>>>
>>>> By the way, I have "ulimit -s unlimited" in my .bashrc file. I've also
>>>> already adjusted RKMAX and RMT.
>>>>
>>>>
>>>>  Sincerely
>>>>
>>>> Wangwei Lan
>>>>
>>>>
>>>>
>>>>
>>>>  ------------------------------
>>>> *From:* wien-bounces at zeus.theochem.tuwien.ac.at <
>>>> wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Laurence Marks <
>>>> L-marks at northwestern.edu>
>>>> *Sent:* Tuesday, July 28, 2015 13:09
>>>> *To:* A Mailing list for WIEN2k users
>>>> *Subject:* Re: [Wien] Segmentation fault in Supercell Calculation
>>>>
>>>>  You have what is called a "Segmentation Violation", which was detected
>>>> by 4 of the nodes; they called an error handler, which stopped the MPI
>>>> job on all the CPUs.
>>>>
>>>>  This is normally because you have an error of some sort in your input
>>>> files: any of case.in0 or case.clmsum (and case.clmup/dn if the
>>>> calculation is spin-polarized).
>>>>
>>>> 1) Check that you do not have overlapping spheres and/or other
>>>> mistakes.
>>>> 2) Check your error files, e.g. "cat *.error". Are any others (e.g.
>>>> dstart.error) not empty? Did you ignore an error during setup?
>>>> 3) Check the lapw0 output in case.output0*; it may show what is wrong.
>>>>
>>>>  There are many possible sources; you have to find the specific one.
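Step 2 above can be scripted when there are many error files to look through. The helper below is a hypothetical sketch, not part of WIEN2k. It assumes the error files sit in the case directory and match `*.error`, as in the "cat *.error" example, and it reports which of them are non-empty and prints their contents.

```python
import sys
from pathlib import Path


def nonempty_error_files(case_dir: str) -> list[str]:
    """Return the names of *.error files in case_dir that are not empty."""
    return sorted(
        p.name
        for p in Path(case_dir).glob("*.error")
        if p.is_file() and p.stat().st_size > 0
    )


if __name__ == "__main__":
    # Usage: python check_errors.py [case_dir]   (defaults to the current directory)
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in nonempty_error_files(directory):
        print(f"--- {name}")
        print(Path(directory, name).read_text(errors="replace"))
```

Listing only the non-empty files makes it easier to spot an error left over from setup (for example in dstart.error) among the empty ones.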
>>>>
>>>>
>>>> On Tue, Jul 28, 2015 at 12:57 PM, Lan, Wangwei <wl13c at my.fsu.edu>
>>>> wrote:
>>>>
>>>>>  Dear WIEN2k users:
>>>>>
>>>>>  I am using wien2k_14.2 on CentOS release 5.8 with ifort version 12.1.3
>>>>> and MKL.
>>>>>
>>>>>  After generating a 2x2x1 supercell with 30 atoms, I tried to run the
>>>>> scf calculation, but I got errors, which I've attached at the end of
>>>>> this email. My WIEN2k was installed correctly and works well for other
>>>>> calculations. The supercell calculation also works if I run it
>>>>> non-parallel. I've searched the mailing list but couldn't find a
>>>>> solution. Could you give me a hint on how to solve the problem? Thank
>>>>> you very much.
>>>>>
>>>>>
>>>>>
>>>>>  Sincerely
>>>>>
>>>>> Wangwei Lan
>>>>>
>>>>>
>>>>>
>>>>>  lapw0.error shows:
>>>>>
>>>>> 'Unknown' - SIGSEGV
>>>>>
>>>>>  super.dayfile shows:
>>>>>
>>>>>  Child id           0 SIGSEGV
>>>>>  Child id           8 SIGSEGV
>>>>>  Child id          18 SIGSEGV
>>>>>  Child id          23 SIGSEGV
>>>>>  Child id          17 SIGSEGV
>>>>>
>>>>>  The screen shows:
>>>>>
>>>>> w2k_dispatch_signal(): received: Segmentation fault
>>>>> w2k_dispatch_signal(): received: Segmentation fault
>>>>> w2k_dispatch_signal(): received: Segmentation fault
>>>>> w2k_dispatch_signal(): received: Segmentation fault
>>>>> w2k_dispatch_signal(): received: Segmentation fault
>>>>> w2k_dispatch_signal(): received: Segmentation fault
>>>>> w2k_dispatch_signal(): received: Segmentation fault
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> MPI_ABORT was invoked on rank 18 in communicator MPI_COMM_WORLD
>>>>> with errorcode 451782144.
>>>>>
>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>> You may or may not see output from other processes, depending on
>>>>> exactly when Open MPI kills them.
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpirun has exited due to process rank 18 with PID 26388 on
>>>>> node corfu.magnet.fsu.edu exiting without calling "finalize". This may
>>>>> have caused other processes in the application to be
>>>>> terminated by signals sent by mpirun (as reported here).
>>>>> --------------------------------------------------------------------------
>>>>> [corfu.magnet.fsu.edu:26369] 23 more processes have sent help message help-mpi-api.txt / mpi-abort
>>>>> [corfu.magnet.fsu.edu:26369] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>
>>>>>  >   stop error
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>>  Professor Laurence Marks
>>>> Department of Materials Science and Engineering
>>>> Northwestern University
>>>> www.numis.northwestern.edu
>>>> Corrosion in 4D: MURI4D.numis.northwestern.edu
>>>> Co-Editor, Acta Cryst A
>>>> "Research is to see what everybody else has seen, and to think what
>>>> nobody else has thought"
>>>> Albert Szent-Gyorgi
>>>>
>>>
>>
>


