Tuning guide for dynamic cache and data replication service


This guide to tuning IBM WebSphere Application Server dynamic caching and the data replication service (DRS) can help you improve the performance of your Web solutions.


Purpose of the document and introduction
This document will guide WebSphere Application Server customers that use dynamic caching and data replication service (DRS) through the various tuning options and guidelines that are available. This document refers to dynamic caching as the WebSphere Application Server component that is responsible for providing the dynamic caching service of the Web container for servlet and Java™ Server Pages (JSP) caching, and the object level caching available through object cache instances. This component also drives the Distributed MAP (DMAP) API available in the WebSphere Application Server Enterprise offering. This document assumes that you are familiar with defining the cache policies using the cachespec.xml file, configuring the cache service, and the basic usage contexts of dynamic caching and DRS. This document builds on this base and guides the administrator and the application architect through guidelines for defining effective cache policies, and provides a reference for tuning dynamic cache and DRS in a production environment.

Dynamic caching policies
The caching policy that is set up for the application is critical in contributing to the savings in response time and providing better end user experience. Take care to specify the policies so that the correct content is served out of the cache, and recognize that it is not beneficial to cache all content indiscriminately. Be aware that caching should be used as a mechanism for improving performance and scalability of the solution and not as a fix to mask problems with the application or infrastructure. The cost of regenerating a response, in terms of CPU cycles that are needed and the critical resources that are accessed (such as the number of database queries that are executed), should be weighed against the reusability of the response within the window of time when the response will be valid. The reusability of the object should also be considered in terms of whether the object is specific to a user, session, store, or if it is a site wide or publicly reusable object.

A worksheet like the one that follows can form the basis of categorizing the candidate content to be cached and help in determining the effectiveness of caching:

  • Object to be cached: Category
  • Degrees of connectivity: Dependencies / Variations / Relationships
  • Average size of object
  • Cost of generating the object: Response generation time; Critical resource access; I/O wait time
  • Validity of the object: Expiration / TTL; Invalidation rate
  • Popularity: Frequency of access
  • Reusability: User / Session / Store / Public
  • Business value: Relative importance

Dynamic cache tuning is much like tuning any other performance-enhancing component and is an iterative process. It should begin at application design, with guidelines from the application architect on what can and should not be cached. This is typically based on input from the requirement stage and knowledge of the application scenarios. This process is further refined through the development, validation, and production phase of the project. It is invaluable during validation and pre-production phases of development to monitor and collect data to understand and rectify the impact of cache policies and tuning on system behavior.

It is possible that you will need to run some projected workload without caching to determine values such as the cost of generating the object. Otherwise, you will have to rely on the intuition and experience of the application architect.

Guidelines for determining the effectiveness of caching should take into account the following:
  • The cost of generating a response should be greater than the maximum cache access time, where the maximum cache access time should factor in overhead for disk access, distribution policy, and so on.
  • The shorter the validity period of an object or response, the less likely it is to be reused. Caching such content can result in larger latencies, due to cache misses and cleanup overhead, than not caching it at all.
  • The objects with more popularity and business value should be assigned a higher relative priority.
  • The higher the degrees of connectivity of an object, the more costly it is to invalidate and evict the object from the cache. Take this into account when determining where to cache the object in terms of keeping it in the memory cache, disk cache, or distributing the object across the cluster.
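The first guideline can be made concrete with a deliberately simplified cost model. The sketch below is an illustration only, not part of WebSphere Application Server: the class name, method names, and the model itself (every request pays the cache access time, and a miss additionally pays the generation cost) are assumptions introduced here.

```java
/** Simplified cost model for illustration; all names here are hypothetical. */
final class CacheBenefit {
    private CacheBenefit() {}

    /**
     * Expected time per request with caching enabled: every request pays the
     * cache access time, and each miss additionally pays the generation cost.
     */
    static double expectedTimeWithCache(double hitRatio, double cacheAccessMs, double generationMs) {
        if (hitRatio < 0.0 || hitRatio > 1.0) {
            throw new IllegalArgumentException("hitRatio must be between 0 and 1");
        }
        return cacheAccessMs + (1.0 - hitRatio) * generationMs;
    }

    /** Caching pays off only if the expected time beats regenerating every response. */
    static boolean worthCaching(double hitRatio, double cacheAccessMs, double generationMs) {
        return expectedTimeWithCache(hitRatio, cacheAccessMs, generationMs) < generationMs;
    }
}
```

Under this model, an object with a low hit ratio, or one whose maximum cache access time approaches its generation cost, fails the check, which matches the first two guidelines above.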

The dynamic cache specification provides attributes that can be used to declare properties of the cached object such as timeout (in seconds), priority (LOWEST PRIORITY = 0, HIGHEST PRIORITY = 16), and inactivity, to affect the treatment of these cached objects.

Memory caching
Dynamic cache accesses and retrieves objects primarily from the memory cache. This cache keeps references to the cached objects and can be configured with limits on the number of entries that will be cached in memory. After the limit of entries that are specified for the memory cache is reached, adding additional entries in the cache will require that entries be evicted out of memory. Eviction of entries is based on how recently the evicted entry was last accessed, and the priority of the object that is inserted into the cache.
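To illustrate eviction by recency when an entry limit is reached, the following minimal sketch uses the standard java.util.LinkedHashMap in access order. It is not the dynamic cache implementation: it ignores entry priority entirely, and the class name LruEntryCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain LRU cache bounded by entry count; entry priorities are deliberately ignored. */
final class LruEntryCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruEntryCache(int maxEntries) {
        super(16, 0.75f, true); // true = order entries by access, least recent first
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a limit of two entries, inserting a third entry evicts whichever of the first two was accessed least recently.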

Choose the size of the memory cache, in terms of the number of entries, based on how much memory is available for caching. The average memory, in bytes, that the system uses to reference a cached object with its dependency IDs can be computed as: average size of the object + average size of the cache ID + k * (number of templates and dependency IDs that are associated with this object + 128), where k is 4 for 32-bit platforms and 8 for 64-bit platforms. The number of entries that is specified should be large enough to hold the cache entries that are associated with the popular or more frequently used categories. The memory cache, and therefore the memory dedicated to the cache, should be large enough not only to cache content belonging to categories that have higher business value, but also to hold enough additional entries to form a working set, in order to minimize thrashing due to Least Recently Used (LRU) eviction.
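The per-entry estimate above can be worked through in a few lines of code. The sketch below simply transcribes the formula; the class and method names are hypothetical, and the maxEntries helper (dividing a memory budget by the per-entry estimate) is an added convenience rather than something the product provides.

```java
/** Transcribes the per-entry memory estimate; names are hypothetical. */
final class CacheMemoryEstimate {
    private CacheMemoryEstimate() {}

    /** object size + cache ID size + k * (templates and dependency IDs + 128), k = 4 or 8. */
    static long bytesPerEntry(long avgObjectBytes, long avgCacheIdBytes,
                              long templatesAndDependencyIds, boolean is64Bit) {
        long k = is64Bit ? 8 : 4;
        return avgObjectBytes + avgCacheIdBytes + k * (templatesAndDependencyIds + 128);
    }

    /** Number of entries that fit in a given memory budget (an added convenience). */
    static long maxEntries(long memoryBudgetBytes, long bytesPerEntry) {
        if (bytesPerEntry <= 0) {
            throw new IllegalArgumentException("bytesPerEntry must be positive");
        }
        return memoryBudgetBytes / bytesPerEntry;
    }
}
```

For example, a 2048-byte object with a 64-byte cache ID and 4 templates and dependency IDs comes to 2048 + 64 + 4 * (4 + 128) = 2640 bytes on a 32-bit platform and 3168 bytes on a 64-bit platform.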

The Java Virtual Machine (JVM™) heap settings should also be tuned. The recommended setting is to leave 40% of the heap free after caching. Achieving this involves increasing the JVM heap size, reducing the size of the in-memory cache, or caching objects that require less memory. There are trade-offs: for example, a larger heap can lead to longer garbage collection (GC) pauses. The right balance can only be determined with proper testing.
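One way to check the 40% guideline during a test run is to compare used heap against maximum heap. The sketch below relies only on the standard java.lang.Runtime API; the class name, the method names, and the idea of sampling from inside the JVM are assumptions, and in practice verbose GC logs or monitoring tools give a more reliable picture than a single sample.

```java
/** Checks free-heap headroom against the 40% guideline; names are hypothetical. */
final class HeapHeadroom {
    static final double RECOMMENDED_FREE_FRACTION = 0.40;

    private HeapHeadroom() {}

    /** Fraction of the maximum heap that is not currently in use. */
    static double freeFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0 || usedBytes < 0 || usedBytes > maxBytes) {
            throw new IllegalArgumentException("expected 0 <= usedBytes <= maxBytes and maxBytes > 0");
        }
        return (maxBytes - usedBytes) / (double) maxBytes;
    }

    static boolean meetsRecommendation(long usedBytes, long maxBytes) {
        return freeFraction(usedBytes, maxBytes) >= RECOMMENDED_FREE_FRACTION;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied
        long max = rt.maxMemory();                      // upper bound the heap may grow to
        System.out.printf("free heap fraction: %.2f, meets guideline: %b%n",
                freeFraction(used, max), meetsRecommendation(used, max));
    }
}
```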

The cache attempts to clean up expired entries from the memory cache in the background. By default, the daemon responsible for this cleanup wakes up every five seconds, which is sufficient for most deployments. The interval can be set higher for deployments with infrequent invalidation, for example those that invalidate entries once a day. Conversely, if the deployment has a lot of automated or trigger-driven invalidation, set the interval lower.

Disk caching
Dynamic cache provides the option to cache content on disk when the content is evicted from the memory cache. It is highly recommended that the off-load directory be located on a separate disk or partition that is dedicated to caching. This enables better response times for the disk cache by reducing contention for disk space with application data and code on the file system where WebSphere Application Server is installed. The partition should be sized to be at least twice the expected volume of cached content.

The storage and access of objects from disk involves the serialization and deserialization of objects. This feature comes at a higher cost, and should be taken into consideration when deciding what content should be persisted to disk. It is possible to selectively cache content to the disk through cache policies that are defined in the cachespec.xml file, in particular the persist-to-disk property.

Disk cache cleanup and tuning
Objects that are in the disk cache are cleaned up when they are explicitly invalidated through either programmatic or policy-based invalidations, or when the objects expire. The process of cleaning up objects from the cache consists of updating the tables that host the dependency ID to cache ID mappings and template ID to cache ID mappings, in addition to returning disk space to the internal storage manager. The available space on the file system does not increase after the objects are deleted from the cache, because the storage manager retains the space so that it can be reused by other objects that are cached to the disk.

The disk cache cleanup is done in the background as a low priority thread to reduce contention for the disk from active request and response threads. The time to perform this cleanup, as reported in the logs, tracks the duration of the scan. With the low priority of the scan, it can take several minutes.

You can activate the disk cache cleanup once a day at a specified hour by using the com.ibm.ws.cache.CacheConfig.htodCleanupHour system property, which defaults to 0 (12:00 midnight). Alternatively, you can specify that the cleanup run at a specific frequency (in minutes) by using the com.ibm.ws.cache.CacheConfig.htodCleanupFrequency system property. To set any system property in the application server:
  1. In the console, click Application servers -> <your server> -> Process Definition -> Java Virtual Machine -> Custom Properties.
  2. Click New, then enter the system property name as the key and its value in the value field.

The disk cache cleanup occurs in two phases: scan and delete. In the scan phase, the algorithm identifies objects that have expired on disk. Because the cleanup algorithm looks only for expired entries, cached objects without an expiration value (an expiration value of 0) always remain on disk until they are explicitly invalidated. The policy of never expiring objects should be reconsidered if disk space is an issue in the deployment. The delete phase returns disk space to the internal storage manager and ensures that all references to the object are correctly purged. Most large deployments that have a large amount of content on disk choose a cleanup frequency that ranges from 30 minutes to a couple of hours, depending on the average expiration time of content in the cache.
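The scan rule described above (only expired entries are selected, and an expiration value of 0 means the entry never expires) can be sketched as follows. This is a conceptual illustration, not the product's cleanup code: the class name, the map of cache IDs to absolute expiration times, and the use of milliseconds are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Conceptual sketch of the scan phase; names and data layout are hypothetical. */
final class DiskCleanupScan {
    private DiskCleanupScan() {}

    /**
     * Returns the cache IDs whose expiration time has passed.
     * An expiration of 0 is treated as "never expires" and is always skipped.
     */
    static List<String> findExpired(Map<String, Long> expirationByCacheId, long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : expirationByCacheId.entrySet()) {
            long expiration = entry.getValue();
            if (expiration != 0L && expiration <= nowMillis) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

Entries that the scan never selects stay on disk until they are explicitly invalidated, which is why a never-expire policy can steadily consume disk space.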

You can optimize the disk cache cleanup for disk I/O by buffering the metadata that is associated with cached objects in memory. These auxiliary buffers hold the dependency and template information for the objects, which decreases the object deletion time. Turn on this optimization by setting the com.ibm.ws.cache.CacheConfig.htodDelayOffload system property to true. You can tune the memory that this optimization uses by setting the com.ibm.ws.cache.CacheConfig.htodDelayOffloadEntriesLimit system property to the maximum number of cache IDs that any dependency ID can map to in the auxiliary buffer. Any dependency ID that maps to more cache IDs than the htodDelayOffloadEntriesLimit value is not buffered and is written to disk. For optimal performance, large deployments tend to set this limit to approximately the total number of entries in the entire cache.

For more details, see the IBM Technote for Disk Cache Enhancements.

Dynamic Cache replication using DRS
There are three primary replication settings for dynamic cache. They control the amount and type of information (the object name, the object value, and invalidation messages) that flows between servers:
  • SHARED_PUSH
  • SHARED_PUSH_PULL
  • NOT_SHARED

With all share types, object invalidation messages are always sent to other servers to ensure that outdated information is never served to a user. With SHARED_PUSH, the cached object and its ID are sent to all servers in the replication domain at the time that the object is placed in the cache. This makes the object immediately available to the applications on other servers. It also speeds up application server performance at the expense of greater network traffic and, for objects that are cached on disk, additional I/O churn. With SHARED_PUSH_PULL, the cached object is kept local to the server that created it, but the cache ID is shared with other servers. If a remote server needs the object, it requests the object by name from the creating server. With the NOT_SHARED policy, no objects or IDs are shared with other servers; only invalidations are propagated.

The NOT_SHARED policy is adequate for most cache deployments. You can use the SHARED_PUSH_PULL policy to optimize the performance by fetching the object from another server in the cluster at the cost of additional latency in the response time for the first miss. The object is cached locally so subsequent accesses are serviced locally. Use the SHARED_PUSH policy with care and only for specific objects that meet the criteria of requiring no additional latency for the first access and have the property of being infrequently invalidated.
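The selection guidance above can be summarized as a small decision helper. This is a sketch of the guidance only: the enum below is a local illustration that reuses the policy names from this guide (it is not the product's API), and the three boolean inputs are hypothetical simplifications of the criteria described.

```java
/** Encodes the policy selection guidance; the type and its inputs are hypothetical. */
final class SharingPolicyAdvisor {
    enum Policy { NOT_SHARED, SHARED_PUSH_PULL, SHARED_PUSH }

    private SharingPolicyAdvisor() {}

    static Policy recommend(boolean firstAccessMustNotAddLatency,
                            boolean infrequentlyInvalidated,
                            boolean remoteFetchBeatsRegeneration) {
        // SHARED_PUSH only for objects that meet both criteria named in the guidance.
        if (firstAccessMustNotAddLatency && infrequentlyInvalidated) {
            return Policy.SHARED_PUSH;
        }
        // SHARED_PUSH_PULL when fetching from a peer is cheaper than regenerating locally.
        if (remoteFetchBeatsRegeneration) {
            return Policy.SHARED_PUSH_PULL;
        }
        // NOT_SHARED is adequate for most deployments.
        return Policy.NOT_SHARED;
    }
}
```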

An additional limitation of the SHARED_PUSH policy is that Dynamic Cache batches updates to cached content before pushing them out to the cluster, and DRS imposes a size limit on each batch update. The limit defaults to 5 MB and can be changed by setting the MAX_MESSAGE_SIZE system property to the required size. Set this system property with care, because it affects how fast DRS can replicate objects. If the maximum update size is increased, the replication domain time-out also needs to be increased to allow for the transfer of the larger objects. If an update exceeds the maximum size, the update is dropped and the objects referenced within the update go out of sync with the rest of the cluster members.

In the PUSH replication mode, WebSphere Application Server Dynamic Cache sends large DRS messages, which frequently cause the JVMs in a clustered environment to exhaust their heaps, resulting in out-of-memory (OOM) errors and heap dumps. This is fixed in APAR PK32201, which makes Dynamic Cache batch these messages, sending only a few cache entries at a time in each message. The result is smaller objects and fewer OOM issues. The DRS batch size can now be configured using the following custom properties:
  • com.ibm.ws.cache.CacheConfig.cachePercentageWindow: Specifies a limit on the number of cache entries sent by DRS, expressed as a percentage of the total cache in memory. Default value: 2% of the number of entries in the cache. Scope: configurable per cache instance.
  • com.ibm.ws.cache.CacheConfig.cacheEntryWindow: Specifies a limit on the total number of cache entries sent by DRS, expressed as a number of entries. Default value: 50 entries. Scope: configurable per cache instance.
Before the PK32201 fix, all pushed entries were sent in one DRS message. Now, they are sent in batches determined by the two properties above, which default to 2 and 50, respectively. The smaller of the two resulting values determines the batch update size in the PUSH replication mode. In most cases the default values for batchUpdateMilliseconds, cachePercentageWindow, and cacheEntryWindow suffice; however, in extreme cases cacheEntryWindow might need to be set as low as 1 or 2 entries.
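The interaction of the two window properties can be worked through numerically. The sketch below assumes that the percentage window is applied to the number of entries currently in memory, that the result is rounded down, and that a batch always carries at least one entry; those three details, along with the class and method names, are assumptions for illustration and are not confirmed by this guide.

```java
/** Works out the PUSH batch size from the two window settings; names are hypothetical. */
final class DrsBatchWindow {
    private DrsBatchWindow() {}

    static int batchSize(int entriesInMemory, int cachePercentageWindow, int cacheEntryWindow) {
        // Entries allowed by the percentage window (rounded down: an assumption).
        long byPercentage = (long) entriesInMemory * cachePercentageWindow / 100;
        // The smaller of the two limits wins.
        long smaller = Math.min(byPercentage, cacheEntryWindow);
        // Never return less than one entry per batch (an assumption).
        return (int) Math.max(1, smaller);
    }
}
```

With the defaults (2% and 50 entries), a cache holding 10,000 entries in memory is limited by the entry window to 50 entries per batch, while a cache holding 1,000 entries is limited by the percentage window to 20.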

Dynamic Cache has also provided a way to control the frequency of the updates Dynamic Cache sends to DRS using the com.ibm.ws.cache.CacheConfig.batchUpdateMilliseconds custom property. This property specifies the batch update interval in milliseconds. This property applies to all cache instances irrespective of the replication mode. Reducing batchUpdateMilliseconds results in Dynamic Cache sending updates, and processing invalidations and new entries more frequently, which will reduce the overall DRS payload size. However, reducing batchUpdateMilliseconds also results in adding extra CPU processing overhead. Default value: 1000ms

DRS and replicators
In WebSphere Application Server V5.0 and V5.1, DRS uses replicators, defined in the Internal Replication Domains panel, to replicate objects across the cluster. Not every application server needs a replicator defined on the same node. The recommended policy is one replicator for every four application servers: divide the total number of application servers by 4 and plan for that many replicators. For example, 16 application servers call for 4 replicators. Because each replicator manages the replication workload of four application servers, it is best to configure the replicators on systems that are not part of the cluster (four systems in this example). This way, the replicator systems are dedicated to replicating the cache and the application servers are dedicated to servicing application requests.
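The sizing rule is simple arithmetic. The sketch below rounds up when the number of application servers is not a multiple of 4; the rounding choice and all names are assumptions, since the guidance only says to divide by 4.

```java
/** Applies the one-replicator-per-four-servers rule; names and rounding are assumptions. */
final class ReplicatorPlan {
    static final int APP_SERVERS_PER_REPLICATOR = 4;

    private ReplicatorPlan() {}

    static int replicatorsFor(int appServers) {
        if (appServers < 0) {
            throw new IllegalArgumentException("appServers must not be negative");
        }
        // Integer ceiling division, so a partial group of servers still gets a replicator.
        return (appServers + APP_SERVERS_PER_REPLICATOR - 1) / APP_SERVERS_PER_REPLICATOR;
    }
}
```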

Use the following instructions to configure replicators if needed.
  1. Create a cluster using the cluster wizard.
    1. Check the box to create a replication domain.
    2. Leave the replicator box unchecked for all members added to the cluster.

  2. Click Internal Replication Domains > your_domain > Replicator Entries > New
    1. Fill in the following configuration:
      Replicator name: Any string identifying this replicator
      Available server: Choose one of the nodes that will not be in the cluster
      Hostname: The hostname of that node
      Replicator port: Choose an unused port (the default is 7974)
      Client port: Choose an unused port (the default is 7973)
    2. Click OK and Save.
      Note: The replicator and client ports must not be the same.

  3. Repeat step 2 for each replicator.

  4. Enable cache replication on each Application Server in the cluster:
    1. Click Application Servers > your_server > Dynamic Cache Service > Enable cache replication
    2. Ensure the following for "Internal messaging server" configuration:
      Domain: Domain created through cluster wizard
      Replicator: Choose one of the replicators
    3. Click OK and Save.

  5. Repeat step 4 for each member in the cluster.

Nuggets from the field
  • Make dynamic cache part of design. Talk with business users and architects.
  • Discuss invalidation requirement scenarios and frequency.
  • Test for exceptions and privacy scenarios. Make sure nothing unexpected is cached.
  • Create cache-specific test scenarios that are different from the function tests.
  • Monitor cache statistics on the live site. Tuning is a continuous task.
  • Stay connected with latest dynamic cache fixes. Improve stability, performance, and gain new features.
