Admittedly, batch jobs are not sexy or even “in fashion” compared to other technologies, but the fact remains that in a lot of organizations there is still a need for batch jobs, and especially for batch jobs to handle high volumes of data with excellent performance and robust error handling and recovery. After all, how many of us enjoy getting a phone call from work at 2 AM for a batch job production issue? Not me!
Over the course of my career, I have needed to write batch jobs multiple times to handle scenarios such as processing large batches of events (the data was NOT available in real time for a variety of reasons), sending thousands of orders for new vehicles to auto manufacturers, and receiving requests for driving record results (via batch-based web services). To get started with batch jobs, I needed a scheduling tool.
In my experience writing batch jobs, there was no need for me to implement a batch scheduler because the client already had an existing batch scheduling tool. Most of the work of actually writing the batch application involved handling things such as:
There are several ways of implementing batch jobs to take the above matters into consideration:
The first two methods were far less than satisfactory for a number of reasons including:
So after dealing with these and other frailties of custom batch frameworks, I went in search of something better, something that had all the features needed for high performance and high volume batch jobs. What I found was Spring Batch 2.0.
I experimented with it by writing a “classic” batch job: reading from a file, manipulating the data, and finally saving the data to a database. I found that Spring Batch provided many built-in components to take care of the “nuts and bolts” of batch processing. Here are the ones that I used for my batch job:
In addition to these three components, I had to “wire up” the Spring Batch configuration to lay out which steps the job included and, most importantly, the “chunk size” (aka the commit count, which Spring Batch calls the commit interval). The chunk size indicates how often the batch job commits to the database. For example, if the size is 50, the item reader and item processor will go through 50 rows before the item writer is called to execute the SQL statements that update the database, and those 50 rows are then committed together. If the job fails after one or more chunks have been committed, then when it is restarted, the Spring Batch framework will know to skip the records in the file that have already been written and pick up with the next “new” row.
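To make the chunk-size and restart behavior a little more concrete, here is a minimal plain-Java sketch. It does not use Spring Batch at all: `ChunkDemo`, `runJob`, `committed`, `committedCount`, and the simulated failure row are hypothetical names made up for illustration. The in-memory `committed` list stands in for the database, and `committedCount` plays roughly the role of the restart state that Spring Batch saves in its job repository at each commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical plain-Java simulation of chunk-oriented processing.
// This is NOT the Spring Batch API; it only mimics the behavior described above.
public class ChunkDemo {

    // Stands in for the database: only rows from committed chunks end up here.
    static final List<String> committed = new ArrayList<>();

    // Stands in for the restart state: how many input rows have been committed.
    static int committedCount = 0;

    // Reads and processes rows one at a time, calls the "writer" once per chunk
    // of chunkSize rows, and commits each chunk as a unit. If failAtRow >= 0,
    // a failure is simulated when that row is reached.
    static void runJob(List<String> input, int chunkSize,
                       Function<String, String> processor, int failAtRow) {
        int row = committedCount; // on restart, skip rows that were already committed
        while (row < input.size()) {
            List<String> chunk = new ArrayList<>();
            // "Reader" and "processor": accumulate up to chunkSize items.
            while (chunk.size() < chunkSize && row < input.size()) {
                if (row == failAtRow) {
                    // The partially built chunk is discarded, like a rollback.
                    throw new IllegalStateException("Simulated failure at row " + row);
                }
                chunk.add(processor.apply(input.get(row)));
                row++;
            }
            // "Writer" plus commit: persist the whole chunk, then record progress.
            committed.addAll(chunk);
            committedCount = row;
        }
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 120; i++) {
            input.add("row-" + i);
        }
        Function<String, String> toUpper = String::toUpperCase;

        try {
            runJob(input, 50, toUpper, 75); // fails partway through the second chunk
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("Committed after failure: " + committed.size()); // 50

        runJob(input, 50, toUpper, -1); // restart resumes at row 50
        System.out.println("Committed after restart: " + committed.size()); // 120
    }
}
```

With a chunk size of 50 and a failure at row 75, the first chunk (rows 0 to 49) is committed and the half-built second chunk is thrown away, so the restart picks up at row 50 instead of re-reading the file from the top, and no row is written twice. In real Spring Batch the reader, processor, and writer are separate components and the framework handles the transaction and restart bookkeeping; this sketch only illustrates the idea.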
I have touched very briefly on a few of Spring Batch’s simplest features, but there is much more to it than what I have covered.
So what do you think about the need for batch jobs in general? What do you think about Spring Batch? Do you know of other batch frameworks that you prefer instead?