Monday, March 5, 2012

[Technote] Using TPROF data to aid in debugging high CPU and performance degradation issues for an AIX JVM



Problem (Abstract)

Processes or threads may consume unexpected CPU cycles. Output from the AIX tprof command can provide information about CPU utilization by process ID, thread, library, and function. This technote explores how to interpret tprof data to aid in debugging such issues in a WebSphere Application Server environment.

Resolving the problem

Data from the AIX tprof command is indispensable in debugging 100% CPU, high CPU usage, and performance degradation issues on WebSphere JVMs executing on the AIX operating system. In order to map threads in the tprof data to the Java™ stacks in javacores, analysts will also need dbx data taken from the JVM process ID (PID). If the issue is related to CPU usage, the 100% CPU Mustgather technote will usually provide all the documentation needed to debug the problem.
The 100% CPU Mustgather technote contains a script that executes the tprof command with the following options:

tprof -skex sleep 60


The -skex parameters configure the following profiling (a manual collection sketch follows this list):

-s          Enables shared library profiling
-k          Enables kernel profiling
-e          Turns on kernel extension profiling
-x Program  Specifies the program to be executed by tprof. Data collection stops when Program completes or trace is manually stopped with either trcoff or trcstop.
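If the Mustgather script is not available, the same collection can be approximated by hand. The following is a minimal sketch rather than the Mustgather itself; the JVM PID (20386, taken from the example output below) and the working directory are assumptions:

    # Profile the whole system for 60 seconds; on AIX 5.2 the report is written
    # to sleep.prof in the current directory (named after the profiled command).
    tprof -skex sleep 60 &

    # While tprof runs, request a couple of javacores from the suspect JVM
    # (kill -3 makes the IBM JVM write a javacore without stopping the process).
    kill -3 20386
    sleep 30
    kill -3 20386
    wait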

The tprof utility gathers CPU statistics while the sleep 60 command executes for 60 seconds. On an AIX 5.2 machine, the output from the script is stored in the sleep.prof file and will have the following format:

Process                           Freq  Total Kernel   User Shared  Other
=======                           ====  ===== ======   ==== ======  =====
wait                                 4  70.48  70.48   0.00   0.00   0.00
/usr/java131/jre/bin/java            7  25.04   0.01   0.00  25.02   0.01
/usr/bin/tprof                       2   0.15   0.02   0.00   0.13   0.00
/home/db2inst1/sqllib/bin/db2fm      7   0.11   0.08   0.00   0.03   0.00
/usr/bin/sh                         11   0.04   0.04   0.00   0.01   0.00
...

Process                   PID      TID  Total Kernel   User Shared  Other
=======                   ===      ===  ===== ======   ==== ======  =====
java131/jre/bin/java    20386    78913  25.00   0.00   0.00  25.00   0.00
wait                     1290     1291  23.79  23.79   0.00   0.00   0.00
wait                      516      517  23.60  23.60   0.00   0.00   0.00
wait                     1032     1033  21.64  21.64   0.00   0.00   0.00
wait                      774      775   1.47   1.47   0.00   0.00   0.00
/usr/bin/tprof          19884   112213   0.15   0.02   0.00   0.13   0.00
java131/jre/bin/java    12336    55499   0.01   0.00   0.00   0.00   0.00
java131/jre/bin/java    12336   117689   0.01   0.00   0.00   0.00   0.00



The first section presents a summation of CPU usage by process name and breaks the usage into the Kernel, User, Shared, and Other library sections. In this example, the /usr/java131/jre/bin/java process is consuming 25.04% of the CPU cycles during this period, most of it in the shared libraries. The "wait" process is using a total of 70.48% of the CPU, all of it in the kernel.

The second section breaks down CPU usage by PID and thread ID (TID). The "java131/jre/bin/java" process with PID 20386 is consuming 25% of the CPU in shared libraries. The Java process with PID 12336 is using very little CPU.

When debugging WebSphere Application Server issues, CPU usage from a Java process in the shared libraries generally indicates garbage collection (GC) activity. CPU usage in the Other category usually indicates a problem in Java code. If tprof shows a suspicious (non-wait) process using CPU in the Kernel libraries, you can usually assume the usage is due to an AIX operating system issue.
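A quick way to apply this rule of thumb is to filter the per-PID section for the suspect Java process and see which column carries the time. A minimal sketch, assuming the AIX 5.2 report name sleep.prof and the example PID 20386:

    # List every thread of the Java PID, highest Total first
    # (columns: Process PID TID Total Kernel User Shared Other).
    grep " 20386 " sleep.prof | sort -rn -k4

In the example above this surfaces TID 78913 with essentially all of its 25% in the Shared column, pointing at the shared-library (GC) case rather than the Other or Kernel cases.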

The next section of data compiled by tprof provides a breakdown of CPU usage in each major category by Shared Object and Subroutine:


Total % For All Processes (SH-LIBs) = 25.22

Shared Object                                                         %
=============                                                    ======
/usr/WebSphere/AppServer/java/jre/bin/classic/libjvm.a            25.00
/usr/lib/libc.a[shr.o]                                             0.10
/usr/lib/libpthreads.a[shr_xpg5.o]                                 0.08
/usr/lib/libtrace.a[shr.o]                                         0.03
/usr/WebSphere/AppServer/java/jre/bin/libjitc.a                    0.00
/usr/lib/libC.a[shr.o]                                             0.00
/usr/WebSphere/AppServer/java/bin/../jre/bin/libzip.a              0.00

Profile: /usr/WebSphere/AppServer/java/jre/bin/classic/libjvm.a

Total % For All Processes (/usr/WebSphere/AppServer/java/jre/bin/classic/libjvm.a) = 25.00

Subroutine                                      %   Source
==========                                   ====== ======
.verifyHeapAfterCompaction0                   25.00 ov/st/msc/gc_trace.c
.xeCreateNativeFrame                           0.00 ov/xe/common/stack.c

Profile: /usr/lib/libc.a[shr.o]

Total % For All Processes (/usr/lib/libc.a[shr.o]) = 0.10

Subroutine                                      %   Source
==========                                   ====== ======
._rec_mutex_lock                               0.02 lib/libc/rec_mutex.c
.strpbrk                                       0.01 s/lib/libc/strpbrk.c
.strncpy                                       0.01 strncpy.s
.__divi64                                      0.01 divi64.s



From this data, you can determine the WebSphere Application Server library and the code being executed. In the above example, from the shared libraries category (SH-LIBs), you can see that /usr/WebSphere/AppServer/java/jre/bin/classic/libjvm.a is using 25% of the CPU. The verifyHeapAfterCompaction0 subroutine from the ov/st/msc/gc_trace.c source is responsible for all of that 25%. In this case, it appears that GC is responsible for 25% of the CPU usage during the period in which the tprof data was taken. This data can be invaluable in pinpointing the source of CPU usage.
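The same lookup can be scripted when the report is large. A minimal sketch, again assuming the report name sleep.prof; the awk program simply prints the subroutine profile that follows the libjvm.a header and stops at the next Profile header:

    # Extract the subroutine breakdown for libjvm.a from the tprof report.
    awk '/^Profile: .*libjvm\.a/ { show = 1 }
         show && /^Profile:/ && $0 !~ /libjvm\.a/ { show = 0 }
         show' sleep.prof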

Also note that the process name in the summary section of the tprof output shows the Java process as originating from the directory path /usr/java131/jre/bin/java instead of the /usr/WebSphere/AppServer/java/jre/bin path shown in the Shared Object and Subroutine sections:

Process                          Freq  Total Kernel   User Shared  Other
=======                          ====  ===== ======   ==== ======  =====
wait                                4  70.48  70.48   0.00   0.00   0.00
/usr/java131/jre/bin/java           7  29.04   0.01   0.00  29.02   0.01


vs.


Total % For All Processes (SH-LIBs) = 25.22

Shared Object                                                          %
=============                                                     ======
/usr/WebSphere/AppServer/java/jre/bin/classic/libjvm.a             25.00


This is a defect in the tprof utility that was addressed in the 5.2.0.51 release of the bos.perf.tools fileset.

On AIX 5.1 and earlier, tprof data is captured in a _prof.all file, and CPU usage is reported not as a percentage of the total but as the number of ticks each process uses. The first section of the _prof.all file contains the CPU usage of each thread in the system processes:
        Process      PID      TID  Total Kernel   User Shared  Other
        =======      ===      ===  ===== ======   ==== ======  =====
           java    18932    49219   5940    117      0   4667   1156
           wait     1290     1291   4405   4405      0      0      0
           wait      774      775   3991   3991      0      0      0
           wait      516      517   3472   3472      0      0      0
  ...
  /usr/bin/grep    30300   100087      1      1      0      0      0
        =======      ===      ===  ===== ======   ==== ======  =====
          Total                    24005  15755     26   6083   2141


In this example, thread ID 49219 used 5940 of the total 24005 ticks during the sample period (about 25%).
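The conversion from ticks to a percentage is simple arithmetic; for example:

    # 5940 ticks out of a 24005-tick sample
    awk 'BEGIN { printf "%.1f%%\n", 5940 / 24005 * 100 }'    # prints 24.7%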


The next section of data in the _prof.all file contains the summation of CPU usage by process name. Once again, instead of a CPU percentage, we see the number of ticks. In this case, the Java process used 8551 of the 24005 ticks:
          Process   FREQ  Total Kernel   User Shared  Other
          =======    ===  ===== ======   ==== ======  =====
             wait      4  15237  15237      0      0      0
             java     42   8551    376      0   6034   2141
      PatrolAgent      6     48     17     16     15      0
    ...
    /usr/bin/grep      1      1      1      0      0      0
          =======    ===  ===== ======   ==== ======  =====
            Total    110  24005  15755     26   6083   2141


Finally, the _prof.all file also contains the breakdown of CPU usage grouped by category (Shared, Other, and so on), Shared Object, and Subroutine. The format is similar to the one discussed above.

When debugging CPU usage issues, first inspect the tprof output to determine which processes and threads are consuming the CPU, usually concentrating on the Java processes. Then determine the libraries from which each process is using the CPU. If a Java process is using CPU from the shared libraries, you can quickly check the Shared Object and Subroutine sections to determine whether GC is the culprit. This can then be verified by analyzing the verbose GC data if it is provided.
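Verbose GC is enabled with the standard -verbose:gc JVM argument (for WebSphere Application Server it is normally added to the server's Generic JVM arguments and takes effect after a restart); on the IBM JVM the output is written to the server's native_stderr.log. A minimal sketch of watching that output while a tprof sample runs; the log path is an assumption and will vary with the install root and server name:

    # Watch GC activity while tprof is sampling (path is an assumption).
    tail -f /usr/WebSphere/AppServer/logs/server1/native_stderr.log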

If the CPU is consumed from the Other libraries, you should attempt to map the problem thread ID back to the javacore file to determine the source of the problem. You will need dbxtrace data taken from the JVM PID in order to do so. Detailed instructions for mapping the tprof data back to the javacores can be found in Correlating TPROF Output to a Specific Thread in the JAVACORE using DBXTrace -a Output.
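tprof reports the kernel thread ID (TID) in decimal, while the thread identifiers in a javacore and in dbx output are typically shown in hexadecimal, so a base conversion is usually the first step of the mapping; the full correlation procedure is described in the technote referenced above. A minimal, hypothetical first step using the TID from the example (file names are assumptions):

    # Convert the decimal TID reported by tprof to hexadecimal.
    printf "0x%x\n" 78913          # -> 0x13441

    # Search the dbxtrace and javacore output for that value to find the matching thread.
    grep -i 13441 dbxtrace.out javacore*.txt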

Related information

Mustgather: 100% CPU Usage on AIX Platforms
Correlating TPROF output to a Javacore Thread
