[Wien] Segmentation fault in Supercell Calculation

Laurence Marks L-marks at northwestern.edu
Tue Jul 28 21:30:24 CEST 2015


Does a simple "x lapw0" work, i.e. without mpi, for this specific case?
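
A minimal sketch of that serial check, assuming you are in the case directory and that the WIEN2k "x" script is on your PATH with WIENROOT set (the helper name error_file_status is made up for illustration):

```shell
#!/bin/sh
# Classify a WIEN2k error file: an empty file means no error was logged.
error_file_status() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -s "$1" ]; then
        echo "nonempty"
    else
        echo "empty"
    fi
}

# Only attempt the run when a WIEN2k environment appears to be present
# (assumption: WIENROOT is set and "x" resolves to the WIEN2k driver script).
if [ -n "${WIENROOT:-}" ] && command -v x >/dev/null 2>&1; then
    x lapw0                      # serial run, no MPI
    echo "serial lapw0.error: $(error_file_status lapw0.error)"
    # For comparison, the parallel run would be "x lapw0 -p".
fi
```

If the serial run leaves lapw0.error empty while the parallel run does not, the input files are unlikely to be the cause.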

If it does, then there is probably an error in how you have linked/compiled
the MPI versions. Please provide:

a) The MPI compiler you used.
b) Which type of MPI you are using (Open MPI, MVAPICH, Intel MPI, etc.).
c) The parallel compilation options.
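
One possible way to collect this information is sketched below. The wrapper names (mpif90, mpirun) and the file locations ($WIENROOT/OPTIONS, $WIENROOT/parallel_options) are assumptions about a typical installation and may differ on your system; the helper functions report_tool and report_file are hypothetical.

```shell
#!/bin/sh
# Print the first line of "<tool> --version", or a note if the tool is not on PATH.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$1" "$("$1" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found on PATH\n' "$1"
    fi
}

# Print a file's contents under a header, or a note if it is not readable.
report_file() {
    if [ -r "$1" ]; then
        printf '=== %s ===\n' "$1"
        cat "$1"
    else
        printf '%s: not readable\n' "$1"
    fi
}

# a) MPI compiler wrapper and b) MPI flavour. The wrapper names are assumptions;
#    an Intel MPI installation may use mpiifort instead of mpif90.
report_tool mpif90
report_tool mpirun

# c) Compilation and parallel options (assumed file locations, adjust as needed).
report_file "${WIENROOT:-.}/OPTIONS"
report_file "${WIENROOT:-.}/parallel_options"
```

Pasting the output of such a script into a reply covers a), b) and c) in one go.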

N.B., a useful resource is
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

N.N.B., ulimit -s is not needed; this is (now) done in the software.

On Tue, Jul 28, 2015 at 2:22 PM, Lan, Wangwei <wl13c at my.fsu.edu> wrote:

>  Dear Professor Marks:
>
>
>  I've checked everything you mentioned and it all looks fine;
> nevertheless, it still doesn't work. I think the input files are OK since I
> have no problem running in non-parallel mode.
>
> When I made the supercell smaller (2x1x1), it worked. However, I
> don't know why.
>
> By the way, I have "ulimit -s unlimited" in my .bashrc file. I've also adjusted
> RKMAX and RMT before.
>
>
>  Sincerely
>
> Wangwei Lan
>
>
>
>
>  ------------------------------
> *From:* wien-bounces at zeus.theochem.tuwien.ac.at <
> wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Laurence Marks <
> L-marks at northwestern.edu>
> *Sent:* Tuesday, July 28, 2015 13:09
> *To:* A Mailing list for WIEN2k users
> *Subject:* Re: [Wien] Segmentation fault in Supercell Calculation
>
>  You have what is called a "Segmentation Violation", which was detected by
> 4 of the nodes; they called an error handler which stopped the MPI job
> on all the CPUs.
>
>  This is normally because you have an error of some sort in one of your
> input files: case.in0, case.clmsum (and case.clmup/dn if the calculation is
> spin-polarized).
>
>  1) Check that you do not have overlapping spheres and/or other mistakes.
> 2) Check your error files, e.g. "cat *.error". Are any others (e.g.
> dstart.error) not empty? Did you ignore an error during setup?
> 3) Check the lapw0 output in case.output0*; it may show what is wrong.
>
>  There are many possible sources; you have to find the specific one.
>
>
> On Tue, Jul 28, 2015 at 12:57 PM, Lan, Wangwei <wl13c at my.fsu.edu> wrote:
>
>>  Dear WIEN2k user:
>>
>>
>>  I am using wien2k_14.2 on CentOS release 5.8, with ifort version 12.1.3
>> and MKL.
>>
>>
>>
>>  After generating a 2x2x1 supercell with 30 atoms, I tried to run the SCF
>> calculation. However, I got some errors, which I've attached at the end of
>> this email. My WIEN2k was installed correctly and works well for other
>> calculations. It also works if I run a non-parallel calculation for the
>> supercell. I've searched the mailing list, but can't find a solution. Could
>> you give me a hint on how to solve the problem? Thank you very much.
>>
>>
>>
>>  Sincerely
>>
>> Wangwei Lan
>>
>>
>>
>>  lapw0.error shows:
>>
>> 'Unknown' - SIGSEGV
>>
>>  super.dayfile shows:
>>
>>  Child id           0 SIGSEGV
>>  Child id           8 SIGSEGV
>>  Child id          18 SIGSEGV
>>  Child id          23 SIGSEGV
>>  Child id          17 SIGSEGV
>>
>>  The screen shows:
>>
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 18 in communicator MPI_COMM_WORLD
>> with errorcode 451782144.
>>
>>  NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 18 with PID 26388 on
>> node corfu.magnet.fsu.edu exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [corfu.magnet.fsu.edu:26369] 23 more processes have sent help message
>> help-mpi-api.txt / mpi-abort
>> [corfu.magnet.fsu.edu:26369] Set MCA parameter
>> "orte_base_help_aggregate" to 0 to see all help / error messages
>>
>>  >   stop error
>>
>>
>>
>>
>
>
>  --
>  Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu
> Corrosion in 4D: MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Gyorgi
>


