[Wien] OMP abort: stack overflow detected

Mahmoud Payami mpayami at aeoi.org.ir
Tue Aug 31 15:30:59 CEST 2004


Dear Laurence, Kevin, Steven, Torsten,

Thank you all.
I found the solution to this problem:
In RedHat 10 (Fedora Core 2) there is a new feature, Kernel Tuning. Opening it and checking the "Overcommit Memory" option fixes the problem.
The option is described as follows:
--------------------------
Startup -> System Tools -> Kernel Tuning -> Virtual Memory -> Swapping:

Overcommit Memory:
 The following algorithm is used to decide if there's enough memory: if this option is checked, then there's always enough memory. This is a useful feature, since programs often malloc() huge amounts of memory 'just in case', while they only use a small part of it. Leaving this option unchecked will lead to the failure of such a huge malloc(), when in fact the system has enough memory for the program to run.

 On the other hand, enabling this feature can cause you to run out of memory and thrash the system to death, so large and/or important servers will want to set this disabled.
-----------------------------------------
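For readers without the graphical Kernel Tuning tool, the same setting can be inspected from the command line. The sketch below is only an illustration and assumes a Linux system that exposes the setting as /proc/sys/vm/overcommit_memory. It is also an assumption that the "Overcommit Memory" checkbox corresponds to mode 1 of that setting, since the help text quoted above matches the "always overcommit" behaviour. The function names are hypothetical.

```python
from pathlib import Path

# Meanings of the Linux vm.overcommit_memory modes, as described in the
# kernel documentation. Mapping the GUI checkbox to mode 1 is an assumption.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit (allocations are never refused up front)",
    2: "strict accounting (no overcommit beyond the configured limit)",
}


def describe_overcommit(value: int) -> str:
    """Return a human-readable description of an overcommit_memory value."""
    return OVERCOMMIT_MODES.get(value, f"unknown mode {value}")


def read_overcommit(path: str = "/proc/sys/vm/overcommit_memory"):
    """Read the current mode, or return None if the file is unavailable."""
    p = Path(path)
    if not p.is_file():
        return None  # not Linux, or /proc is not mounted
    return int(p.read_text().strip())


if __name__ == "__main__":
    mode = read_overcommit()
    if mode is None:
        print("overcommit_memory is not available on this system")
    else:
        print(f"vm.overcommit_memory = {mode}: {describe_overcommit(mode)}")
```

The script only reads the value. Changing it requires root privileges and carries the risk described in the quoted help text.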


  Dear Wien Users,

  I used ifort8+gcc+mkl70 with RH_10 to compile the new version, Wien2k_04.8, on a P4 (3.2 GHz, 1 GB RAM). Compilation was OK.
  Using the default values for NMATMAX (5000) and NUME (1000), I encountered the "stack overflow" problem for the TiC example.
  However, when I decreased them (for all programs) to 2000 and 500, respectively, the TiC example ran successfully.
  So, in the new version the "mixer" problem I had before is fixed. But for parallel TiC, I get the following messages complaining about the stack and ifcore_msg.cat:
  ------------------------------
   LAPW0 END
   LAPW1 END

  real  0m3.009s
  user  0m2.155s
  sys   0m0.440s
   LAPW1 END

  real  0m2.826s
  user  0m2.020s
  sys   0m0.460s
  Warning: No xauth data; using fake authentication data for X11 forwarding.
  /usr/X11R6/bin/xauth:  error in locking authority file /home/mahmoud/.Xauthority
  Warning: No xauth data; using fake authentication data for X11 forwarding.
  /usr/X11R6/bin/xauth:  error in locking authority file /home/mahmoud/.Xauthority
   LAPW1 END

  real  0m0.332s
  user  0m0.257s
  sys   0m0.063s
   LAPW1 END

  real  0m0.326s
  user  0m0.268s
  sys   0m0.057s
   LAPW1 END

  real  0m0.348s
  user  0m0.265s
  sys   0m0.057s
   LAPW1 END

  real  0m0.346s
  user  0m0.260s
  sys   0m0.060s
   LAPW1 END

  real  0m26.922s
  user  0m1.504s
  sys   0m0.454s
   LAPW1 END

  real  0m24.894s
  user  0m1.520s
  sys   0m0.430s
   LAPW1 END

  real  0m24.765s
  user  0m1.665s
  sys   0m0.432s
   LAPW1 END

  real  0m22.891s
  user  0m1.582s
  sys   0m0.464s
   LAPW1 END

  real  0m23.227s
  user  0m1.692s
  sys   0m0.451s
   LAPW1 END

  real  0m20.655s
  user  0m1.558s
  sys   0m0.422s
   LAPW1 END

  real  0m20.854s
  user  0m1.631s
  sys   0m0.409s
   LAPW1 END

  real  0m18.644s
  user  0m1.648s
  sys   0m0.415s
   LAPW1 END

  real  0m4.926s
  user  0m0.253s
  sys   0m0.064s
  LAPW2 - FERMI; weighs written
   LAPW2 END

  real  0m1.105s
  user  0m0.897s
  sys   0m0.105s
   LAPW2 END

  real  0m1.042s
  user  0m0.877s
  sys   0m0.117s
  Warning: No xauth data; using fake authentication data for X11 forwarding.
  /usr/X11R6/bin/xauth:  error in locking authority file /home/mahmoud/.Xauthority
   LAPW2 END

  real  0m0.187s
  user  0m0.106s
  sys   0m0.052s
   LAPW2 END

  real  0m0.158s
  user  0m0.093s
  sys   0m0.050s
   LAPW2 END

  real  0m0.226s
  user  0m0.118s
  sys   0m0.059s
   LAPW2 END

  real  0m0.169s
  user  0m0.098s
  sys   0m0.047s
   LAPW2 END

  real  0m0.196s
  user  0m0.100s
  sys   0m0.047s
  OMP abort: stack overflow detected (address = 0xfeddfec8) for OpenMP thread #0!

  forrtl: info: Fortran error message number is 76.
  forrtl: warning: Could not open message catalog: ifcore_msg.cat.
  forrtl: info: Check environment variable NLSPATH and protection of /usr/lib/ifcore_msg.cat.

  real  0m22.454s
  user  0m0.022s
  sys   0m0.032s
   LAPW2 END

  real  1m14.266s
  user  0m0.814s
  sys   0m0.259s
   LAPW2 END

  real  1m11.441s
  user  0m0.800s
  sys   0m0.289s
   LAPW2 END

  real  1m10.934s
  user  0m0.845s
  sys   0m0.333s
   LAPW2 END

  real  1m9.277s
  user  0m0.870s
  sys   0m0.345s
   LAPW2 END

  real  1m9.222s
  user  0m1.093s
  sys   0m0.309s
   LAPW2 END

  real  1m14.892s
  user  0m0.806s
  sys   0m0.240s
   LAPW2 END

  real  1m7.358s
  user  0m1.085s
  sys   0m0.316s
  cp: cannot stat `.in.tmp': No such file or directory
  rm: cannot remove `.in.tmp': No such file or directory
  rm: cannot remove `.in.tmp1': No such file or directory
  -------------------------------------------------------------
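The "OMP abort: stack overflow detected" line in the log above refers to the stack of OpenMP thread #0. As a diagnostic aid, here is a minimal sketch for checking the stack size limit that processes inherit, assuming a Unix-like system where Python's standard `resource` module is available. The thread does not establish whether this limit caused the error, and the helper names are hypothetical.

```python
import resource


def format_limit(value: int) -> str:
    """Render an RLIMIT_STACK value (given in bytes) as a readable string."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def stack_limits():
    """Return the (soft, hard) stack size limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print(f"stack limit: soft={soft}, hard={hard}")
```

The values reported are those of the shell that launches the script. Jobs started on other machines during a parallel run may see different limits, so the check would need to be run there as well.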

  I do not know how to fix it.

  Any suggestions are highly appreciated.


  Best regards,

  Mahmoud Payami



   