
International Journal on Science and Technology
E-ISSN: 2229-7677

Incremental Processing For Handling Late-Arriving Data in Batch Processing
Author(s) | Arjun Reddy Lingala |
---|---|
Country | United States |
Abstract | Batch processing systems often struggle to handle late-arriving data [5], which leads to inconsistencies in analytical results and unnecessary computational overhead. This paper introduces an incremental processing approach [10] that efficiently incorporates late data into batch datasets while reducing the maintenance burden on users, avoiding user mistakes, and saving computation. Data arriving in a data warehouse is often delayed by upstream issues such as network outages, upstream delays, and fixes from data reconciliation. Late-arriving data presents significant challenges in batch and real-time data processing environments, affecting data accuracy [3], system efficiency, and overall analytics reliability. Unaccounted-for late-arriving data can lead to incomplete and inaccurate analytical results, and the dashboards generated from those results show incorrect metrics. Handling it often requires the late-arriving data to be caught or raised with an alert, after which the user who owns the ETL must re-run the batch pipelines that fall within the time or date range of the arrived data. The proposed approach significantly reduces the computational overhead of batch reprocessing while ensuring data consistency and completeness. The framework introduces a real-time detection mechanism that continuously monitors incoming data and identifies late records by comparing their timestamps with pre-existing batch data. Once late data is detected, a targeted backfill strategy is applied. For a dataset that depends on order and historical information, only the affected time partitions, from the hour of late data arrival through the current processing period, are recomputed sequentially. For a dataset that does not depend on historical data, only the delta from the time of the late-arriving data is acted upon. This selective reprocessing minimizes redundant computation and optimizes system performance, while providing the fault tolerance and scalability needed to handle large volumes of late-arriving data in distributed environments. |
Keywords | Batch Processing, Late Data, Incremental Processing, Distributed Systems, ETL, Real-Time, Intra-Day Pipelines, Scalability, Fault Tolerance, Backfilling, Dashboards, Metrics |
Field | Engineering |
Published In | Volume 14, Issue 3, July-September 2023 |
Published On | 2023-07-05 |
Cite This | Incremental Processing For Handling Late-Arriving Data in Batch Processing - Arjun Reddy Lingala - IJSAT Volume 14, Issue 3, July-September 2023. DOI 10.71097/IJSAT.v14.i3.2266 |
DOI | https://doi.org/10.71097/IJSAT.v14.i3.2266 |
Short DOI | https://doi.org/g869wv |
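
The detection and targeted backfill steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Record`, `find_late_records`, and `plan_backfill` are hypothetical, partitions are assumed to be hourly, and "the hour of late data arrival" is interpreted here as the earliest hourly partition that the late records belong to.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical record shape; field names are assumptions."""
    event_time: datetime    # when the event actually occurred
    arrival_time: datetime  # when the record reached the warehouse


def hour_floor(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hourly partition."""
    return ts.replace(minute=0, second=0, microsecond=0)


def find_late_records(records: Iterable[Record],
                      processed_through: datetime) -> List[Record]:
    """Return records whose hourly partition has already been processed.

    `processed_through` is assumed to be the start of the first hourly
    partition that the batch pipeline has NOT yet processed.
    """
    return [r for r in records if hour_floor(r.event_time) < processed_through]


def plan_backfill(late: List[Record],
                  current_hour: datetime,
                  order_dependent: bool) -> List[datetime]:
    """Choose which hourly partitions to recompute, in ascending order.

    Order-dependent datasets: every partition from the earliest affected
    hour through `current_hour`, to be recomputed sequentially.
    Other datasets: only the partitions that received late records.
    """
    if not late:
        return []
    affected = sorted({hour_floor(r.event_time) for r in late})
    if not order_dependent:
        return affected
    hours = []
    h = affected[0]
    while h <= current_hour:
        hours.append(h)
        h += timedelta(hours=1)
    return hours


if __name__ == "__main__":
    now = datetime(2023, 7, 5, 12, 10)
    incoming = [
        Record(datetime(2023, 7, 5, 9, 30), now),   # belongs to 09:00, late
        Record(datetime(2023, 7, 5, 12, 5), now),   # belongs to 12:00, on time
    ]
    late = find_late_records(incoming, processed_through=datetime(2023, 7, 5, 12))
    for partition in plan_backfill(late, hour_floor(now), order_dependent=True):
        print("recompute", partition.isoformat())
```

In this sketch an order-dependent dataset replays partitions 09:00 through 12:00 in sequence, whereas a dataset without historical dependencies would touch only the 09:00 partition.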
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
