Admittedly, batch jobs are not sexy or even “in fashion” compared to other technologies. The fact remains that many organizations still need them, especially to handle high volumes of data with excellent performance and robust error handling and recovery. After all, how many of us enjoy getting a phone call from work at 2 AM about a production batch job issue? Not me!
Over the course of my career, I have written batch jobs many times for scenarios such as processing large batches of events (the data was NOT available in real time for a variety of reasons), sending thousands of orders for new vehicles to auto manufacturers, and receiving requests for driving record results (via batch-based web services). The first thing any batch job needs is a scheduling tool.
In my experience, though, I never had to implement a batch scheduler myself because the client already had a batch scheduling tool in place. Most of my work went into the batch application itself, which meant handling things such as:
- Business logic
- Transaction management
- File processing
- Exception handling
- Job retry / restart
Batch Job Implementation
There are several ways of implementing batch jobs that address the concerns above:
- EJB client
- Custom batch framework implemented via servlet
- Spring Batch
The first two methods were far from satisfactory for a number of reasons, including:
- Tied to a specific container implementation, such as WebSphere
- Poor transaction management: jobs were forced to run as one big transaction, which is a problem when the same app server is handling online transactions at the same time
- Poor job control and status monitoring: if the database went down, job status was lost and had to be manually recovered
So after dealing with these and other frailties of custom batch frameworks, I went in search of something better, something that had all the features needed for high performance and high volume batch jobs. What I found was Spring Batch 2.0.
Spring Batch 2.0
I experimented with it by writing a “classic” batch job: reading from a file, manipulating the data, and finally saving the data to a database. I found that Spring Batch provided many built-in components to take care of the “nuts and bolts” of batch processing. Here are the ones that I used for my batch job:
- FlatFileItemReader: This component, used in conjunction with the LineMapper interface and related classes, allows you to specify the configuration and layout for a flat file. It then handles reading the flat file without you having to write a line of Java code. This component is also restartable.
- ItemProcessor: This interface allows you to implement your own component to perform data transformations such as converting state abbreviation codes to full state names, converting strings to integers, etc.
- JdbcBatchItemWriter: This component implements Spring Batch’s ItemWriter interface and greatly simplifies JDBC batch updates. You pretty much just have to put the SQL statement and its parameters into the XML configuration file and you’re done!
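To make the ItemProcessor idea concrete, here is a minimal sketch of the state-abbreviation transformation mentioned above. It is written in plain Java so it runs without Spring Batch on the classpath: the `Processor` interface is a hypothetical stand-in that only mirrors the shape of Spring Batch's ItemProcessor, and the class names and the three-state lookup table are made up for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Spring Batch's ItemProcessor<I, O>, declared here
// only so the sketch compiles without the framework.
interface Processor<I, O> {
    O process(I item) throws Exception;
}

// Expands a two-letter state code to its full name.
// The lookup table is an illustrative subset, not real reference data.
final class StateNameProcessor implements Processor<String, String> {
    private static final Map<String, String> STATES = Map.of(
            "NE", "Nebraska",
            "IA", "Iowa",
            "TX", "Texas");

    @Override
    public String process(String code) {
        // Normalize the raw field from the file before looking it up.
        String name = STATES.get(code.trim().toUpperCase(Locale.ROOT));
        if (name == null) {
            throw new IllegalArgumentException("Unknown state code: " + code);
        }
        return name;
    }
}

final class ProcessorDemo {
    public static void main(String[] args) throws Exception {
        Processor<String, String> processor = new StateNameProcessor();
        System.out.println(processor.process(" tx ")); // prints "Texas"
    }
}
```

In a real job the same logic would sit in a class implementing the framework's own interface, and the reader and writer on either side of it would come from configuration rather than code.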
In addition to these three components, I had to “wire up” the Spring Batch configuration to lay out what steps the job included and, most importantly, the “chunk size” (aka the commit interval). The chunk size indicates how often the batch job should commit to the database. For example, if the size is 50, the item reader and item processor will work through 50 rows before the item writer is called to run the SQL statements that update the database, and the transaction is then committed. If the job fails partway through a later chunk, only that uncommitted chunk is rolled back. When the job is restarted, the Spring Batch framework will know to skip the records in the file that have already been written and pick up with the next “new” row.
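The commit-and-restart behavior is easier to see in a small simulation. The sketch below is plain Java and does not use Spring Batch at all: the `ChunkRestartDemo` class, its in-memory `database` list, and the `committedCount` field are all hypothetical stand-ins for the real table and for the restart position the framework keeps track of. It also simplifies the framework's read, process, and write phases into a single loop.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkRestartDemo {
    // Stand-in for the target table.
    final List<String> database = new ArrayList<>();
    // Stand-in for the persisted restart position (rows already committed).
    int committedCount = 0;

    /**
     * Processes the input in chunks, committing after each full chunk.
     * failAtIndex simulates a crash on that row; pass -1 for a clean run.
     */
    void run(List<String> input, int chunkSize, int failAtIndex) {
        int position = committedCount; // on restart, skip rows already committed
        while (position < input.size()) {
            List<String> chunk = new ArrayList<>();
            int end = Math.min(position + chunkSize, input.size());
            for (int i = position; i < end; i++) {
                if (i == failAtIndex) {
                    // The partial chunk is dropped, like a rolled-back transaction.
                    throw new IllegalStateException("Simulated failure at row " + i);
                }
                chunk.add(input.get(i).toUpperCase()); // the "process" step
            }
            database.addAll(chunk); // the "write" step for the whole chunk
            committedCount = end;   // the "commit": progress is now durable
            position = end;
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 120; i++) {
            rows.add("row" + i);
        }
        ChunkRestartDemo job = new ChunkRestartDemo();
        try {
            job.run(rows, 50, 70); // first attempt dies inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println("Failed after committing " + job.committedCount + " rows");
        }
        job.run(rows, 50, -1); // restart picks up at row 50
        System.out.println("Total rows written: " + job.database.size());
    }
}
```

With 120 rows and a chunk size of 50, the first attempt commits rows 0 through 49 and then fails at row 70, so nothing from the second chunk is written. The restart begins at row 50 and finishes the file, leaving each row written exactly once.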
I have touched very briefly on a few of Spring Batch’s simplest features, but there is much more to it than what I have covered.
So what do you think about the need for batch jobs in general? What do you think about Spring Batch? Do you know of other batch frameworks that you prefer instead?