Monday, December 5, 2011

[WebSphere] Crash on AIX produces no core or a truncated core


Problem (Abstract)

This document explains how to ensure that AIX produces a full core file if WebSphere Application Server crashes.

Resolving the problem

Follow these steps in order unless IBM Support directs otherwise.
NOTE: The settings below take effect only after you restart your application server. If you also use a nodeagent to start your server(s), restart the nodeagent as well. For ulimit changes, the restart must also happen in the same command-line (terminal) session in which you changed the limits.



1. Setting Ulimits

To set the core file size and file size ulimits to unlimited, run these two commands as the user who starts the nodeagent and/or application server:
ulimit -c unlimited
ulimit -f unlimited


You can run ulimit -a to verify current ulimit settings.

Ulimits can also be altered at a global level. See the FAQ for more information.
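Because the restart has to happen in the same session that raised the limits, it can help to raise them, confirm them, and launch the server from one script. A minimal sketch, assuming a POSIX shell; the function names are hypothetical, and the startServer.sh line is left as a comment with server1 as a placeholder server name.

```shell
#!/bin/sh
# Hypothetical wrapper: raise the limits in THIS shell so that a server
# started from it inherits them.

# Succeeds only if both the core-size and file-size limits are unlimited.
limits_are_unlimited() {
  [ "$(ulimit -c)" = "unlimited" ] && [ "$(ulimit -f)" = "unlimited" ]
}

# Fails for a non-root user whose hard limit is lower than "unlimited";
# in that case raise the limit globally (see the FAQ).
raise_limits() {
  ulimit -c unlimited && ulimit -f unlimited
}

if raise_limits && limits_are_unlimited; then
  echo "core and file size limits are unlimited"
  # Start the server from this same session, for example:
  # <WAS_HOME>/bin/startServer.sh server1
else
  echo "could not raise limits; current settings:" >&2
  ulimit -a >&2
fi
```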



2. Configuring the Operating System for Full Core Generation

If you do not have access to the SMIT administration tool, you can set the following attribute from the command line as the root user:

To set full core generation:
chdev -a fullcore=true -lsys0


To verify full core is set:
lsattr -Elsys0 | grep full
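If you want a script to check this rather than reading the grep result by eye, you can test the value column directly. A minimal sketch, assuming the usual lsattr -E layout of attribute name, value, description, and user-settable flag on each line; fullcore_enabled is a hypothetical helper name, and the AIX usage is shown as comments.

```shell
# Reads "lsattr -El sys0" output on stdin and succeeds only when the
# fullcore attribute's value (second column) is "true".
fullcore_enabled() {
  awk '$1 == "fullcore" { found = 1; value = $2 }
       END { exit (found && value == "true") ? 0 : 1 }'
}

# Usage on AIX (changing the setting requires root):
# if ! lsattr -El sys0 | fullcore_enabled; then
#   chdev -l sys0 -a fullcore=true
# fi
```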





Additional steps if still unable to capture a core file


3. Disable Signal Handlers

Sometimes a loaded library or an external process traps certain signals, especially signals 3 and 11, which prevents the JVM from generating a core file.

    a. Disable MQ Signal Traps
    WebSphere MQ is known to trap a subset of the signals that the JVM also uses. If you use WebSphere MQ, or are not sure whether you do, add this environment variable to your configuration:
    name:  MQS_NO_SYNC_SIGNAL_HANDLING
    value: true
    b. Disable All Signal Handlers
    To force the operating system to handle all signals sent to the JVM process, you can disable all JVM signal handlers. For IBM SDK 5.0 and later, set this JVM argument:
    -Xrs
    NOTE: On SDK 6.0, to prevent unintentional crashes due to SIGTRAP, clear the shared class cache by running <WAS_HOME>/bin/clearClassCache.sh
    For earlier versions of the IBM SDK, set this environment variable:
    name:  IBM_NOSIGHANDLER
    value: true
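The choice in step 3b depends only on the SDK level, so it can be captured in a small helper. This is a sketch under the assumption that you pass the SDK version as a dotted string (for example 1.4.2 or 6.0) and that the major number alone decides the setting, as described above; signal_handler_setting is a hypothetical name.

```shell
# Prints the setting that disables the JVM's own signal handlers:
# the -Xrs JVM argument for IBM SDK 5.0 and later, or the
# IBM_NOSIGHANDLER environment variable for earlier SDKs.
signal_handler_setting() {
  major=${1%%.*}            # "1.4.2" -> "1", "6.0" -> "6"
  if [ "$major" -ge 5 ]; then
    printf '%s\n' "-Xrs"
  else
    printf '%s\n' "IBM_NOSIGHANDLER=true"
  fi
}
```

For SDK 6.0, remember the shared class cache note above when switching to -Xrs.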




4. Disable Javacore Generation

In rare cases, disabling javacore generation helps produce a core file.
To disable it, add the following environment variable:

name:  DISABLE_JAVADUMP
value: true
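If your server is started from a shell script, one possible way to supply these variables is to export them in the same session that raises the ulimits. This is only a sketch: it assumes the server process inherits the environment of the shell that starts it, which may not hold for every startup path (for example, servers launched by a nodeagent). Defining the variables as environment entries in the server's configuration is the alternative.

```shell
# Hypothetical launcher fragment: export the diagnostic variables before
# starting the server from this same shell session.
export MQS_NO_SYNC_SIGNAL_HANDLING=true   # step 3a: keep WebSphere MQ from trapping JVM signals
export DISABLE_JAVADUMP=true              # step 4: skip javacore generation
# For SDKs older than 5.0 only (step 3b):
# export IBM_NOSIGHANDLER=true

# Show the values that any child process started from here will see.
env | grep -E '^(MQS_NO_SYNC_SIGNAL_HANDLING|DISABLE_JAVADUMP)='
```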



5. Execute pdump.sh

If core files are still not being produced, you can run the pdump.sh script to extract information from the running process. This is especially helpful if you suspect that the process is in a zombie state and does not respond to any signals.

You can download the latest version from this location:
ftp://ftp.software.ibm.com/aix/tools/debug/pdump.sh


    pdump.sh <Java_PID>



This creates a file named pdump.java.###.txt. Locate the line containing the string "sigcatch". If SEGV is listed on that line, the signal is being caught. Both SEGV and SIGSEGV represent signal 11.
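To check the pdump output without scrolling through it, you can search the sigcatch line for either spelling of signal 11. A minimal sketch, assuming the sigcatch entry and its list of signal names sit on the same line of the file; segv_is_caught is a hypothetical helper, and the file name is whatever pdump.sh produced for your PID.

```shell
# Succeeds if the "sigcatch" line of a pdump output file lists SEGV or
# SIGSEGV (both mean signal 11) as a separate word, meaning a handler
# is catching it.
segv_is_caught() {
  grep 'sigcatch' "$1" | grep -qE '(^|[^A-Z])(SIG)?SEGV([^A-Z]|$)'
}

# Usage (keep the number from your own file name in place of ###):
# if segv_is_caught pdump.java.###.txt; then
#   echo "signal 11 is being caught; see step 3"
# fi
```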





Frequently Asked Questions (FAQ)


What happens if I do not have write permission in the profile's root directory, or in the directory to which I am redirecting javacores, heapdumps, and system core files?

Writing these files will fail. The error may be recorded in native_stderr.log.

Also make sure that you have enough free space on your file system.
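Both conditions can be checked for the target directory before a crash happens. A sketch under two assumptions: it runs as the same user as the application server, and df supports the -P (POSIX) output format, so the fourth column of the second line is the available space in 1024-byte blocks. check_dump_dir is a hypothetical name.

```shell
# Hypothetical pre-flight check for the directory that will receive
# javacores, heapdumps, and core files. Run it as the server's user.
check_dump_dir() {
  dir=$1
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  # Available space in KB: 4th column of POSIX-format df output.
  free_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  echo "$dir is writable; ${free_kb} KB free"
}
```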



Why are core files truncated at 2 GB even with all ulimit settings set to unlimited?

This is a limitation of 32-bit processes. You can avoid it by enabling large file support in the operating system or by using a 64-bit version of WebSphere Application Server.
For the first workaround, select the Large File Enabled option when adding a journaled file system (JFS). Refer to the AIX operating system documentation for details.

Additionally, running out of free space can cause file truncation.
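Without large file support, a 32-bit file offset tops out at 2^31 - 1 = 2,147,483,647 bytes, so a core that ends at or just below that size is a likely truncation. A minimal sketch of that test; the 1 MB tolerance is an arbitrary assumption, and near_2gb_limit is a hypothetical name.

```shell
# Succeeds if a file of the given size (in bytes) ends within 1 MB of the
# 2 GB ceiling, which suggests it was cut off rather than complete.
near_2gb_limit() {
  size=$1
  limit=$(( (1 << 31) - 1 ))        # 2147483647, largest signed 32-bit offset
  tolerance=$(( 1024 * 1024 ))      # assumed tolerance: 1 MB
  [ "$size" -le "$limit" ] && [ "$size" -ge $(( limit - tolerance )) ]
}

# Usage (unquoted so any padding that wc adds is dropped):
# near_2gb_limit $(wc -c < core) && echo "core may be truncated"
```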



Can I test my configuration to see if a core can be generated?

Yes. You can simulate a crash by sending signal 11 to the JVM process. This terminates the process.

kill -11 PID

Alternatively, use the gencore command. It produces a core file and lets the process keep running.

gencore PID
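Before ending a real server, you can see what the signal 11 test looks like on a disposable process. This sketch uses a background sleep as a stand-in for the JVM; the function name is hypothetical. In sh-compatible shells such as bash and dash, a process terminated by signal 11 is reported with status 139 (128 + 11).

```shell
# Sends signal 11 (SIGSEGV) to a throwaway background process and prints
# the exit status the shell reports for it.
segv_test_status() {
  sleep 30 &
  pid=$!
  kill -11 "$pid"            # the same command you would use against the JVM PID
  status=0
  wait "$pid" || status=$?
  echo "$status"
}
```

Whether a core file for the stand-in appears depends on the same ulimit and fullcore settings described above.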


Are ulimit settings permanent?

No. Settings made with the ulimit command are temporary and last only as long as the session. Ulimits are set per user and applied per session, such as a command-line window. A new session that is not spawned from the current one starts with the default ulimits.
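The parent/child behavior can be observed directly: a limit changed in a child process applies to that process and anything it starts, but never to the session that spawned it. A minimal sketch using a subshell as the child; lowering a soft limit is allowed for any user.

```shell
# The core-size limit of this session.
before=$(ulimit -c)

# A subshell is a child process: it starts with a copy of the session's
# limits and may lower its own copy...
child=$( ulimit -c 0; ulimit -c )

# ...but the change never flows back to the session that spawned it.
after=$(ulimit -c)

echo "session before: $before, child: $child, session after: $after"
```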



Can I set ulimit settings globally?

Yes, by editing the /etc/security/limits file.

In the stanza for the user that runs the process, set fsize = -1 and core = -1. A value of -1 means unlimited.
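For illustration, the stanza for a hypothetical user named wasadmin would look like this (the user name is an assumption; use the account that starts the nodeagent and application server). Values in /etc/security/limits are applied at login, so start a new session before restarting the server.

```text
wasadmin:
        fsize = -1
        core = -1
```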




Where are core files generated?

They are normally written to the profile's root directory, but they can end up in a number of alternative locations. Try searching in these locations first:

If you cannot find a core file in any of these locations, search your entire machine for core* files.
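A search along these lines lists candidate files. This is a sketch: the starting directory is an assumption to adjust, and searching from / can take a long time and is best run as root. The pattern also matches unrelated files whose names start with "core", so check each hit.

```shell
# Lists regular files whose names start with "core" under the given
# directory (default: /), hiding errors from unreadable directories.
find_core_files() {
  find "${1:-/}" -type f -name 'core*' 2>/dev/null
}

# Usage: find_core_files /                  # whole machine
#        find_core_files /path/to/profile   # narrower search (placeholder path)
```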
