WAS reference: [DevWorks] Modernized Java-based batch processing in WebSphere Application Server, Part 1: Introducing Modern Batch and the compute-intensive programming model

Introduction
Batch programs are a traditional and essential component of any enterprise IT landscape. The current development trend for dealing with batch processing is to leverage in-house Java skills for both online and batch programs to ensure:

Maximum re-use of implementation.
Easier development and maintenance, as the same sets of tools are used.
Consistency in enforcement of enterprise standards and quality of service.

IBM has developed solutions that provide a cohesive batch program management paradigm. The Modern Batch feature for IBM WebSphere Application Server (available in WebSphere Application Server V8, WebSphere Application Server V7.0 Feature Pack for Modern Batch, and IBM WebSphere Extended Deployment Compute Grid V8.0) provides a batch middleware framework that offers:

Container managed execution of batch jobs: Provides the structure and support function that Java batch applications require, and helps you avoid the “custom middleware trap.”
Job control interface: An XML file that describes the Java class files that are used in a batch step and the steps that are included in the batch job.
Job checkpoint and restart capability: Ability to create checkpoints on the basis of record count or time. This enables restarting a job from a known checkpoint.
Common batch data stream (BDS): Contains functions that abstract data into easily accessible record formats so that the batch programming can focus on the business functions rather than basic code that reads and writes the data.

Having such a framework in hand provides a welcome alternative to developing custom batch middleware, and permits developers to focus on achieving core business objectives. With Modern Batch, developing batch applications is reduced to simply writing the business logic for the job. This separation of concerns between the business logic and the “plumbing” code is an important benefit of the batch framework. It enables a more efficient modularization of batch functions, which permits better re-use, and the ability to expose batch as a modular service.
Modern Batch supports two batch programming paradigms:

Compute-intensive: For simple jobs that perform computationally intensive work and don’t require restart capability.
Transaction batch: For jobs that need a container-managed checkpoint and a restart mechanism. This enables batch jobs to be restarted from the last checkpoint if interrupted by a planned or unplanned outage.

This article looks at the compute-intensive model and presents a sample implementation using new functionality provided in IBM Rational Application Developer V8.
See Resources for more information on the importance of a batch platform, details on the Modern Batch middleware framework, and the role of WebSphere Extended Deployment Compute Grid.

Compute-intensive programming model
The compute-intensive programming model consists of these elements:

Controller bean: A stateless session bean that enables the run time environment to control jobs for the application. The implementation of this stateless session bean (CIControllerBean) is provided by the application server.
Job Step Implementation class: The job step represents the business logic to be performed by the job. It is represented by an instance of a class that implements the com.ibm.websphere.ci.CIWork interface. The CIWork interface contains the following methods:
- The run() method is executed when the compute-intensive (CI) job runs.
- The getProperties() and setProperties() methods get and set the input values that the client passes as properties.
- The release() method is invoked when the client needs to stop the job in the middle of execution.
- The isDaemon() method returns “true” if the work is long-lived rather than short-lived.
xJCL file: An XML-based configuration file that is submitted to the job scheduler to run. The job scheduler uses information in this file to determine where and when the job runs, and what its inputs and outputs are. The xJCL definition of a job is not part of the batch application.
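As a rough illustration of the job step contract described above, the following sketch implements the five methods in plain Java. It does not use the real com.ibm.websphere.ci.CIWork interface: CIWorkSketch is a hypothetical stand-in declared here so the example compiles without the WebSphere libraries, and the CountingWork class and its itemCount property are likewise made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for com.ibm.websphere.ci.CIWork; it declares only
// the methods described in the article so the sketch needs no IBM libraries.
interface CIWorkSketch extends Runnable {
    Map<String, String> getProperties();
    void setProperties(Map<String, String> props);
    void release();
    boolean isDaemon();
}

// Illustrative job step: counts up to "itemCount" items unless released first.
class CountingWork implements CIWorkSketch {
    private Map<String, String> props = new HashMap<>();
    private volatile boolean released = false; // set by release(), read by run()
    private int processed = 0;

    @Override
    public Map<String, String> getProperties() {
        return props;
    }

    @Override
    public void setProperties(Map<String, String> props) {
        this.props = new HashMap<>(props); // defensive copy of the input values
    }

    @Override
    public void run() {
        // "itemCount" is an assumed property name, not one defined by the product.
        int items = Integer.parseInt(props.getOrDefault("itemCount", "0"));
        for (int i = 0; i < items && !released; i++) {
            processed++; // real business logic would go here
        }
    }

    @Override
    public void release() {
        released = true; // ask run() to stop at the next loop iteration
    }

    @Override
    public boolean isDaemon() {
        return false; // this work is short-lived
    }

    int getProcessed() {
        return processed;
    }
}

class CIWorkDemo {
    public static void main(String[] args) {
        CountingWork work = new CountingWork();
        Map<String, String> props = new HashMap<>();
        props.put("itemCount", "5");
        work.setProperties(props);
        work.run();
        System.out.println("processed=" + work.getProcessed());
    }
}
```

The release() method here only sets a flag that run() checks on each iteration, which is one common way to make long-running work stoppable. A real job step would implement the product interface instead of the stand-in.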

Figure 1 shows the compute-intensive programming model. (This is a simplified version of the actual programming model, which will be discussed in detail in Part 2.)

Figure 1. Compute-intensive programming model

To develop a compute-intensive job, then, you need to:

Define the xJCL file.
Create Java classes that implement the CIWork interface with the business logic to be performed for each job step.
Package the CIWork appropriately with the stateless session bean pointing to com.ibm.ws.ci.CIControllerBean as the implementation class.

Before building a sample compute-intensive job, it’s important to first understand how a compute-intensive application behaves during run time. In summary (see Resources for more details), the application server uses the xJCL file to find and then invoke the controller bean. The bean reads the xJCL file and, for each job step in the xJCL, the bean:

Instantiates the application CIWork object, specified by the class name element in the xJCL for the job step, using the no-argument constructor of the CIWork class.
Invokes the setProperties() method of the CIWork object to pass any properties defined in the xJCL for the job step.
Looks up the work manager defined in the deployment descriptor of the enterprise bean module, and uses it to asynchronously call the run() method of the CIWork object.
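The three run-time steps above can be pictured with a small plain-Java simulation. This is only a sketch of the sequence, not the actual CIControllerBean: StepWork is a hypothetical stand-in for the CIWork contract, EchoStep is a made-up job step, and a java.util.concurrent ExecutorService plays the role of the work manager that the real controller looks up from the deployment descriptor.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the job step contract (only what this sketch needs).
interface StepWork extends Runnable {
    void setProperties(Map<String, String> props);
}

// Made-up job step that records the property value it was given.
class EchoStep implements StepWork {
    static volatile String lastSeen;
    private Map<String, String> props = Collections.emptyMap();

    public EchoStep() {
        // no-argument constructor, as the controller requires
    }

    @Override
    public void setProperties(Map<String, String> props) {
        this.props = props;
    }

    @Override
    public void run() {
        lastSeen = props.get("InputFileLocation");
    }
}

class ControllerSketch {
    // Mirrors the sequence: instantiate by class name, pass properties, run asynchronously.
    static Future<?> startStep(String className, Map<String, String> props,
                               ExecutorService workManager) {
        StepWork work;
        try {
            work = (StepWork) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create job step " + className, e);
        }
        work.setProperties(props);
        return workManager.submit(work); // returns immediately; run() executes on a pool thread
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> done = startStep(EchoStep.class.getName(),
                    Collections.singletonMap("InputFileLocation", "in.txt"), pool);
            done.get(5, TimeUnit.SECONDS); // wait for the asynchronous run to finish
            System.out.println("step saw: " + EchoStep.lastSeen);
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch shows why the job step class needs a no-argument constructor: the controller knows only the class name from the xJCL, so it must create the instance reflectively before any properties are available.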

With this understanding of the programming model, let’s look at the steps to develop a compute-intensive application.

Sample business scenario

Related products and versions

The example presented here was developed using Rational Application Developer V8.0 and deployed on WebSphere Application Server V8.0. Tooling support for developing batch programs is also available with Rational Software Architect for WebSphere V8 and later. Run time support for Modern Batch is available in WebSphere Application Server V7.0.0.11 and later with the Modern Batch feature pack, and is available in V8 as an integrated component.

The business in this sample scenario is a financial organization that has many branches in different states. The organization’s clients submit applications to the branches for processing. The compute-intensive application example generates a report that summarizes the number of applications from each state, plus other metrics for the organization.
To develop this application:

In Rational Application Developer, create a new batch project named dWSampleBatch by navigating to File > New > Batch Project. Click Finish when done (Figure 2). This also creates an EJB project that holds the stateless session bean and the EAR project.

Figure 2. Create a new batch project
Now that you have the projects set up, you need to create the job definition for this batch. Create the batch job definition by right-clicking the xJCL folder in the new batch project you just created, then selecting New > Batch Job (Figure 3).

Figure 3. Create a new batch job
Choose Compute Intensive for the Job Type and enter SummaryReportJob for the Job Name (Figure 4). Click Next.

Figure 4. Create the xJCL file
On the Batch Step Creation panel, enter SingleStep as the name of the step and choose the default pre-defined CI Work for the job step pattern (Figure 5). The CI Work pattern ensures that the Job Step class implements the CIWork interface, as required by the compute-intensive programming model.

Figure 5. Create the batch step
Now, you need to create the implementation class for the CIWork interface. Create the SummaryReport class, which will implement CIWork and have the logic for the business requirement, which, in this case, is to create the summary report. Click the Create Class button to create the implementation class (Figure 5). Enter the details as shown in Figure 6 and click on Finish.

Figure 6. Create the batch step implementation class
You will return to the Batch Step Creation panel. The next step is to create the parameters for the batch job program, SummaryReport.java. Create the two Required Properties listed below by selecting Add (for each) and then Finish (Figure 7).
- InputFileLocation: Holds the location of the input file of data to be processed.
- OutputFileLocation: Holds the location of the output summary report file.
The Batch Job wizard ends and the SummaryReportJob.xml file (named for the job name entered in Figure 4) is created in the xJCL folder.

Figure 7. Add required properties
The Required Properties created in the previous step can either be hardcoded with values or they can be passed through the xJCL file at run time. For this sample, they will be passed via the xJCL file. To achieve this, open the SummaryReportJob.xml file under the xJCL folder by double-clicking it. This will open the XML job definition file in the xJCL editor, as shown in Figure 8.

Figure 8. Editing xJCL file
You want to pass the values of the file locations through the xJCL at run time. You can do this using Substitution Properties, which enable you to create default name-value pairs that can be used in the xJCL. Create the Substitution Properties by clicking Add, choosing Substitution Properties in the Add Item dialog, and clicking OK (Figure 9).

Figure 9. Add substitution properties
Add the properties listed below in the Substitution Properties dialog and click on Finish (Figure 10).
- inputFile: assign a default value of C:\\InputFile.txt
- outputFile: assign a default value of C:\\OutputFile.txt
Figure 10. Add substitution properties
In the xJCL editor, update the required properties values with the corresponding substitution properties (Figure 11):
- InputFileLocation to ${inputFile}
- OutputFileLocation to ${outputFile}
By doing this, you have now linked the Required Properties that you defined in the job definition with the substitution properties. This lets you pass the actual file locations at run time.

Figure 11. Updating required properties
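To make the link between required properties and substitution properties more concrete, here is a small plain-Java sketch of how a ${name} placeholder could be resolved from default values and run-time overrides. It illustrates the idea only and is not how the job scheduler is implemented; the SubstitutionSketch class, the resolve method, and the file paths in main are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubstitutionSketch {
    // Matches ${name} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with the run-time override if one exists, otherwise
    // with the default; placeholders with no known value are left unchanged.
    static String resolve(String value, Map<String, String> defaults,
                          Map<String, String> overrides) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = overrides.containsKey(name)
                    ? overrides.get(name)
                    : defaults.getOrDefault(name, m.group());
            // quoteReplacement keeps backslashes and dollar signs literal,
            // which matters for Windows-style paths.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("inputFile", "default-input.txt");   // hypothetical default
        defaults.put("outputFile", "default-output.txt"); // hypothetical default

        Map<String, String> overrides = new HashMap<>();
        overrides.put("inputFile", "D:\\batch\\InputFile.txt"); // hypothetical run-time value

        System.out.println("InputFileLocation  = " + resolve("${inputFile}", defaults, overrides));
        System.out.println("OutputFileLocation = " + resolve("${outputFile}", defaults, overrides));
    }
}
```

In this sketch, InputFileLocation picks up the overridden value while OutputFileLocation falls back to its default, which matches the behavior the walkthrough relies on when the job is submitted.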
You might have noticed that the Rational tooling also generates the EJB and EAR projects. Review the EJB project to ensure that the resource reference is correctly set to CIWorkManager. Do this in the EJB Bindings editor by double-clicking the ibm-ejb-jar-bnd.xml file under the EJB project (Figure 12). The batch job is now configured.

Figure 12. Validate resource reference
SummaryReport.java implements the business logic of reading the data file (InputFile.txt), preparing the report, and then writing it to the output file (OutputFile.txt). Use the SummaryReport.java file included in the download for this article to implement this business logic. Also, place the InputFile.txt file in the directory chosen for the input file location so that the SummaryReport program can read it. You are now ready to deploy and test the batch application.
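The downloadable SummaryReport.java is not reproduced here, but the read, summarize, and write flow it is described as performing can be sketched in a few lines. Everything in this sketch is an assumption made for illustration: the SummarySketch class, the input layout of one "applicationId,state" pair per line, and the "state: count" report format are not taken from the article's sample.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SummarySketch {
    // Counts applications per state. Assumes each line is "applicationId,state";
    // lines without a second field are skipped.
    static Map<String, Integer> countByState(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by state name
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // blank or malformed line
            }
            counts.merge(fields[1].trim(), 1, Integer::sum);
        }
        return counts;
    }

    // Reads the input file, summarizes it, and writes one "state: count" line per state.
    static void writeReport(Path input, Path output) throws IOException {
        Map<String, Integer> counts =
                countByState(Files.readAllLines(input, StandardCharsets.UTF_8));
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            report.add(entry.getKey() + ": " + entry.getValue());
        }
        Files.write(output, report, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the input and output file locations.
        Path input = Files.createTempFile("InputFile", ".txt");
        Path output = Files.createTempFile("OutputFile", ".txt");
        Files.write(input, Arrays.asList("A1,NY", "A2,CA", "A3,NY"), StandardCharsets.UTF_8);
        writeReport(input, output);
        for (String line : Files.readAllLines(output, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

In a real job step, logic like writeReport would be called from the run() method, with the two paths taken from the InputFileLocation and OutputFileLocation properties.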

Running the sample

Deployment options

Here, the sample application will be deployed and run from Rational Application Developer. In Part 2, when transaction batch is discussed, deployment on an application server will be demonstrated, along with calling the job from various interfaces, like the job management console, EJB client, web services client, or the command line.

To run the sample from Rational Application Developer:

Right-click the dWSampleBatchEAR project and select Run As > Run on Server. Select the server that you want to use and click Finish (Figure 13).

Figure 13. Run batch application on server
To submit the xJCL to the server runtime, right-click on the SummaryReportJob.xml file and choose Run As > Modern Batch Job (Figure 14).

Figure 14. Submit xJCL job
If Security is enabled on this server, check the box and enter a valid User ID and Password. If you have placed the InputFile.txt file in a location other than C:\\InputFile.txt, update the location with the new value (Figure 15). Click Run. The job is submitted to the server runtime, and the job log file opens in the Modern Batch Job Management Console (Figure 16).

Figure 15. Modify substitution properties

Figure 16. Job log
To view the logs of the jobs run, you can access the Modern Batch Job Management Console by right-clicking the server runtime in the Server view and choosing Modern Batch Job Management Console, or by using the URL: http://<hostname>:<wc_defaulthost port>/jmc/console.jsp. The console is meant to manage batch jobs and has been purposely kept separate from the WebSphere Application Server admin console, as operating a batch environment and managing a middleware infrastructure are two very different things. Figure 17 shows the Modern Batch Job Management Console, which provides many capabilities for managing jobs.

Figure 17. Modern Batch Job Management Console
When the run is complete, the submitted job should produce a summary report as a file at C:\\OutputFile.txt, concluding the test.

Conclusion
The Modern Batch feature for WebSphere Application Server provides a robust batch framework that enables you to develop batch programs with minimum effort. As part of WebSphere Application Server, the reliability offered by WebSphere products is built into the solution. It provides a simple Java-based programming model enabling you to leverage your Java skills to build dependable batch programs without the need to reinvent the framework. It also gives IT managers an opportunity to move jobs to a managed WebSphere Application Server environment.
Part 2 will discuss the transaction batch programming model and illustrate it with another working example.

Acknowledgements
The authors thank Edward McCarthy for reviewing this article and providing invaluable input.

Download

Description	Name	Size	Download method
Code sample	1203_narain_attachment.zip	642 KB	HTTP

Tuesday, July 16, 2013
