Batch processing systems are regularly used to integrate data from multiple applications, which are usually developed and supported by different suppliers and/or hosted on different software and hardware environments. JSR 352 (Batch Applications for the Java Platform), which is part of the Java EE 7 platform, defines a programming model for batch applications and an execution environment in which to run and manage batch jobs. This presentation shows how to create a batch processing system using the Java EE batch API. Topics covered include batch processing architecture, developing Java EE batch jobs, the job lifecycle, integration with message queues, and scalability and robustness.
2. About Me
• JUG Leader of GUJavaSC
• http://gujavasc.org
• Twitter
• @rcandidosilva
• Contact
• http://rodrigocandido.me
3. Agenda
• Concepts
• Batch Domain Language
• Chunk vs. Batchlet
• Partitioned Step
• Flow, Split and Decision
• Listeners and Exceptions
• Execution
• Integration
• Demo
4. Why Batch?
• Very common in applications
• Many "custom" solutions
• Products began to emerge
• Spring Batch
• WebSphere Compute Grid
• Ideal for ETL systems
5. Batch API
• Chunk / Batchlet
• Implementation of a Step
• Contexts
• Job and Step at runtime
• Metadata persistence
• Listeners
• Callback lifecycle events
• Partitioning
• Parallel processing
6. Batch Domain Language
• Batch job XML definition
• Describes the steps as a grouping of batch artifacts
8. Chunk vs. Batchlet
• Both implement a step within the job
• Chunk
• Encapsulates the ETL pattern
• Single Reader, Processor and Writer
• Executes over pieces of data (chunks)
• Chunk output is written as a unit
• Batchlet
• Runs a single, simple process
• Runs to completion and produces a return code
10. Batchlet
@Named("MyBatchlet")
public class MyBatchlet implements Batchlet {
    // Runs the task and returns the exit status
    @Override
    public String process() throws Exception {..}
    // Called when the job is asked to stop
    @Override
    public void stop() throws Exception {..}
}
<step id="step1">
    <batchlet ref="MyBatchlet"/>
</step>
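As a rough illustration of the process/stop contract, the sketch below uses a local stand-in interface, `SimpleBatchlet`, instead of the real `javax.batch.api.Batchlet`, so that it runs without a batch runtime. The class names, the work loop and the exit-status strings are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for the batchlet contract (hypothetical, not the javax.batch type).
interface SimpleBatchlet {
    String process() throws Exception; // returns an exit status
    void stop() throws Exception;      // asks a running process() to finish early
}

// A task-oriented step: works in small units and checks for a stop request between them.
class CleanupBatchlet implements SimpleBatchlet {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final int units;
    int unitsDone = 0;

    CleanupBatchlet(int units) {
        this.units = units;
    }

    @Override
    public String process() {
        for (int i = 0; i < units; i++) {
            if (stopRequested.get()) {
                return "STOPPED";
            }
            unitsDone++; // placeholder for one unit of real work
        }
        return "COMPLETED";
    }

    @Override
    public void stop() {
        stopRequested.set(true);
    }
}

class BatchletDemo {
    public static void main(String[] args) {
        CleanupBatchlet batchlet = new CleanupBatchlet(3);
        System.out.println(batchlet.process());
    }
}
```

In a real runtime, `stop()` would typically arrive on a different thread than the one running `process()`, which is why the flag here is an `AtomicBoolean`.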
11. Chunk
<step id="sendStatements">
    <chunk item-count="10">
        <reader ref="accountReader"/>
        <processor ref="accountProcessor"/>
        <writer ref="emailWriter"/>
    </chunk>
</step>
@Named("accountReader")
...implements ItemReader... {
    public Account readItem() {
        // read account using JPA
    }
}
@Named("accountProcessor")
...implements ItemProcessor... {
    public Statement processItem(Object account) {
        // cast to Account, return Statement
    }
}
@Named("emailWriter")
...implements ItemWriter... {
    public void writeItems(List<Object> statements) {
        // use JavaMail to send email
    }
}
12. Chunk
public interface ItemReader {
    void open(Serializable checkpoint) throws Exception;
    Object readItem() throws Exception;
    Serializable checkpointInfo() throws Exception;
    void close() throws Exception;
}
public interface ItemWriter {
    void open(Serializable checkpoint) throws Exception;
    void writeItems(List<Object> items) throws Exception;
    Serializable checkpointInfo() throws Exception;
    void close() throws Exception;
}
public interface ItemProcessor {
    Object processItem(Object item) throws Exception;
}
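To make the interplay of the three artifacts concrete, here is a minimal sketch of the read-process-write loop a runtime could run for a chunk step. It uses simplified local interfaces (`Reader`, `Processor`, `Writer`) rather than the `javax.batch` types, leaves out transactions and checkpoints, and all names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the chunk artifacts (hypothetical; no checkpoint methods).
interface Reader { Object readItem(); }           // returns null when the input is exhausted
interface Processor { Object processItem(Object item); }
interface Writer { void writeItems(List<Object> items); }

class ChunkLoopSketch {
    // Reads and processes items one at a time, then writes them once per chunk of itemCount.
    static int runStep(Reader reader, Processor processor, Writer writer, int itemCount) {
        int chunksWritten = 0;
        List<Object> buffer = new ArrayList<>();
        Object item;
        while ((item = reader.readItem()) != null) {
            buffer.add(processor.processItem(item));
            if (buffer.size() == itemCount) {
                writer.writeItems(new ArrayList<>(buffer)); // one write per full chunk
                buffer.clear();
                chunksWritten++;
            }
        }
        if (!buffer.isEmpty()) {
            writer.writeItems(new ArrayList<>(buffer));     // final, partial chunk
            chunksWritten++;
        }
        return chunksWritten;
    }

    // Five input items with item-count 2: two full chunks and one partial chunk.
    static List<List<Object>> demo() {
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5).iterator();
        List<List<Object>> written = new ArrayList<>();
        runStep(
            () -> source.hasNext() ? source.next() : null,
            item -> ((Integer) item) * 2,
            written::add,
            2);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[2, 4], [6, 8], [10]]
    }
}
```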
13. Checkpoint
• For intensive, long-running tasks
• Checkpoint/restart is widely used
• Basically…
• Stores the state of the ItemReader and ItemWriter
• Methods called
• reader.checkpointInfo()
• writer.checkpointInfo()
public interface ItemReader {
    void open(Serializable checkpoint) throws Exception;
    Serializable checkpointInfo() throws Exception;
}
public interface ItemWriter {
    void open(Serializable checkpoint) throws Exception;
    Serializable checkpointInfo() throws Exception;
}
<chunk checkpoint-policy="item" item-count="10">
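The checkpoint/restart idea can be sketched as follows: the reader reports its position through `checkpointInfo()`, and after a restart a new reader instance receives that value in `open()`. `ListReader` and `CheckpointSketch` are hypothetical classes that only mirror the shape of the reader contract; a real runtime would persist the checkpoint in its job repository.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader that mirrors the open/readItem/checkpointInfo shape.
class ListReader {
    private final List<String> source;
    private int position;

    ListReader(List<String> source) {
        this.source = source;
    }

    // A null checkpoint means a fresh start; otherwise resume from the saved position.
    void open(Serializable checkpoint) {
        position = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    String readItem() {
        return position < source.size() ? source.get(position++) : null;
    }

    // The value returned here is what would be stored at each checkpoint.
    Serializable checkpointInfo() {
        return position;
    }
}

class CheckpointSketch {
    static List<String> demo() {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e");
        List<String> seen = new ArrayList<>();

        // First execution: read one chunk of two items, take a checkpoint, then stop.
        ListReader first = new ListReader(data);
        first.open(null);
        seen.add(first.readItem());
        seen.add(first.readItem());
        Serializable saved = first.checkpointInfo();

        // Restart: a new reader instance resumes right after the last checkpoint.
        ListReader restarted = new ListReader(data);
        restarted.open(saved);
        String item;
        while ((item = restarted.readItem()) != null) {
            seen.add(item);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c, d, e]
    }
}
```

Each item is read exactly once across the two executions because the restarted reader skips everything before the saved position.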
14. Partitioned Step
• A step can run partitioned
• [N] instances of the same step on [N] threads
• One partition per thread
<step id="step1">
    <chunk/>
    <partition>
        <plan partitions="10" threads="2"/>
        <reducer/>
    </partition>
</step>
15. Partitioned Step
• Partition Mapper
• Dynamically decides the number of partitions
• Partition Plan
• Partition Reducer
• Demarcates the logical unit of work
• Partition Collector
• Sends processing results from the partitions
• Partition Analyzer
• Control point that analyzes the results sent
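The plan shown earlier (10 partitions on 2 threads) can be approximated in plain Java with a fixed thread pool. This is only a sketch of the idea, not the batch runtime's implementation: each task plays the role of one partition returning a partial result (the collector's job), and the final loop combines them (the analyzer's job). The `PartitionSketch` class and the summing workload are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionSketch {
    // Splits [0, total) into `partitions` slices, runs them on `threads` threads,
    // and combines the partial sums into one result.
    static long sumPartitioned(int total, int partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int sliceSize = (total + partitions - 1) / partitions; // ceiling division
            for (int p = 0; p < partitions; p++) {
                final int from = Math.min(total, p * sliceSize);
                final int to = Math.min(total, from + sliceSize);
                Callable<Long> partition = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += i;
                    }
                    return sum; // partial result of this partition
                };
                partials.add(pool.submit(partition));
            }
            long result = 0;
            for (Future<Long> partial : partials) {
                result += partial.get(); // waits for the partition to finish
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumPartitioned(100, 10, 2)); // prints 4950
    }
}
```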
16. Flow, Split and Decision
[Diagram: a job running from Start to End, containing a Flow with Step I (Task), Step II (Chunk), Step III (Chunk), a Decision and Step IV (Chunk); each Chunk step has an ItemReader, an ItemProcessor and an ItemWriter.]
17. Flow
• Defines a list of steps that execute together as a unit
<flow id="flow-1" next="{flow, step, decision}-id">
<step id="flow_1_step_1">
</step>
<step id="flow_1_step_2">
</step>
</flow>
18. Split
• Defines a list of flows to be executed in parallel
• Collectors and analyzers for monitoring
<split >
<flow /> <!-- each flow runs on a separate thread -->
<flow />
</split>
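The difference between a flow (steps in sequence) and a split (flows in parallel) can be sketched with plain threads. `SplitSketch` is a hypothetical model, not the batch runtime: each flow is a list of `Runnable` steps, and the split starts one thread per flow and waits for all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SplitSketch {
    // Runs each flow on its own thread; within a flow, the steps run in order.
    static void runSplit(List<List<Runnable>> flows) {
        List<Thread> threads = new ArrayList<>();
        for (List<Runnable> flow : flows) {
            Thread thread = new Thread(() -> flow.forEach(Runnable::run));
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // the split finishes only when every flow has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    static List<String> demo() {
        List<String> log = new CopyOnWriteArrayList<>(); // thread-safe execution log
        List<Runnable> flow1 = Arrays.asList(
            () -> log.add("flow1-step1"),
            () -> log.add("flow1-step2"));
        List<Runnable> flow2 = Arrays.asList(
            () -> log.add("flow2-step1"));
        runSplit(Arrays.asList(flow1, flow2));
        return log;
    }

    public static void main(String[] args) {
        // Step order inside flow1 is fixed; interleaving with flow2 may vary between runs.
        System.out.println(demo());
    }
}
```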
24. JobOperator and Repository
• JobOperator
• Runtime interface for job management
• start, stop, restart
• JobRepository interface commands
• JobRepository
• Holds information about jobs
• Completed and running
25. Execution
• JobInstance
• Logical representation of a job run
• JobExecution
• A single attempt to run a job
• StepExecution
• A single attempt to run a step of a job
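The relationship between these concepts can be sketched with a small in-memory model: one job instance accumulates one execution per attempt, so a restart after a failure adds a second execution to the same instance. All classes below are hypothetical stand-ins, not the `javax.batch.runtime` types, and the status strings are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// One attempt to run a job (hypothetical in-memory record).
class JobExecutionRecord {
    final long executionId;
    String status = "STARTED"; // later set to e.g. "FAILED" or "COMPLETED"

    JobExecutionRecord(long executionId) {
        this.executionId = executionId;
    }
}

// One logical run of a job; it owns every attempt made for that run.
class JobInstanceRecord {
    final String jobName;
    final List<JobExecutionRecord> executions = new ArrayList<>();

    JobInstanceRecord(String jobName) {
        this.jobName = jobName;
    }
}

class ExecutionModelSketch {
    private long nextExecutionId = 1;

    // Starting a job creates a new instance together with its first execution.
    JobInstanceRecord start(String jobName) {
        JobInstanceRecord instance = new JobInstanceRecord(jobName);
        instance.executions.add(new JobExecutionRecord(nextExecutionId++));
        return instance;
    }

    // Restarting adds another execution to the SAME instance.
    JobExecutionRecord restart(JobInstanceRecord instance) {
        JobExecutionRecord last = instance.executions.get(instance.executions.size() - 1);
        if (!"FAILED".equals(last.status) && !"STOPPED".equals(last.status)) {
            throw new IllegalStateException("Only failed or stopped executions can be restarted");
        }
        JobExecutionRecord next = new JobExecutionRecord(nextExecutionId++);
        instance.executions.add(next);
        return next;
    }

    public static void main(String[] args) {
        ExecutionModelSketch operator = new ExecutionModelSketch();
        JobInstanceRecord instance = operator.start("myJob");
        instance.executions.get(0).status = "FAILED"; // simulate a failed first attempt
        operator.restart(instance).status = "COMPLETED";
        System.out.println(instance.executions.size()); // prints 2
    }
}
```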
26. Integration
• Java SE support
• Application Server Runtime
• Supports clustering, security, resource management
• Dependency Injection with CDI
• XML descriptors
• META-INF/batch-jobs/myJob.xml
• Packaging
• JAR, WAR, EJB
27. Demo
• Java EE 7 Samples
• Various examples of using the Batch API
• https://github.com/javaee-samples/javaee7-samples/tree/master/batch