International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Data Quality using WAP Pattern for Data Pipelines

Author(s) Arjun Reddy Lingala
Country United States
Abstract One of the major problems in batch and real-time data pipelines is ensuring that data is accurate. Without data quality checks at every step of a batch pipeline, it is difficult to build reliable analytics and decision-making platforms. In this paper, we discuss an approach based on the WAP (Write, Audit and Publish) pattern to improve data quality at each step of data pipeline processing. Organizations use many different batch processing approaches, and the WAP pattern supports all of them with systematic data transformation, validation and storage to mitigate quality issues such as inconsistencies, missing values, and outliers. The WAP pattern prevents downstream pipelines from consuming incorrect data and avoids re-processing data to fix the errors. The WAP pattern structures the data flow into three distinct phases: Write, where data is ingested and processed; Audit, where quality checks and validations are conducted; and Publish, where verified data is made available to downstream pipelines or frameworks. Use of the WAP pattern in real-world scenarios has enhanced traceability, accountability, and consistency across batch pipelines, and it also saves compute by removing the need to re-run pipelines to correct bad data. This paper provides a modular approach to data quality that is adaptable to various pipeline structures, highlighting its practical relevance in data engineering workflows.
Keywords data quality, batch pipelines, data anomalies, auditing, fault tolerance, monitoring, observability, streaming pipelines
Published In Volume 13, Issue 4, October-December 2022
Published On 2022-11-02
Cite This Data Quality using WAP Pattern for Data Pipelines - Arjun Reddy Lingala - IJSAT Volume 13, Issue 4, October-December 2022. DOI 10.5281/zenodo.14288292
DOI https://doi.org/10.5281/zenodo.14288292
Short DOI https://doi.org/g8txsm
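
The three phases described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: the `WapTable` class, the `run_batch` helper, the two check functions, and the `amount` column with its 0 to 10,000 range are all hypothetical names and values chosen for illustration. A production pipeline would typically stage data in a separate table, partition, or branch rather than in a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
# A check returns an error message, or None when the batch passes.
Check = Callable[[List[Row]], Optional[str]]


@dataclass
class WapTable:
    """Toy table with a staging area (hypothetical, for illustration only)."""
    published: List[Row] = field(default_factory=list)
    staged: List[Row] = field(default_factory=list)

    def write(self, rows: List[Row]) -> None:
        # Write: land the batch in staging, invisible to downstream readers.
        self.staged = list(rows)

    def audit(self, checks: List[Check]) -> List[str]:
        # Audit: run every check on the staged batch and collect failures.
        failures = []
        for check in checks:
            message = check(self.staged)
            if message is not None:
                failures.append(message)
        return failures

    def publish(self) -> None:
        # Publish: swap the audited batch in for downstream consumers.
        self.published = self.staged
        self.staged = []


def no_missing_amount(rows: List[Row]) -> Optional[str]:
    # Assumed rule: every row must carry a non-null 'amount'.
    missing = sum(1 for r in rows if r.get("amount") is None)
    return f"{missing} row(s) missing 'amount'" if missing else None


def amount_in_range(rows: List[Row]) -> Optional[str]:
    # Assumed rule: 'amount' outside 0..10,000 is treated as an outlier.
    bad = [r for r in rows
           if r.get("amount") is not None and not (0 <= r["amount"] <= 10_000)]
    return f"{len(bad)} row(s) with out-of-range 'amount'" if bad else None


def run_batch(table: WapTable, rows: List[Row], checks: List[Check]) -> List[str]:
    """Write, audit, and publish only if the audit finds no failures."""
    table.write(rows)
    failures = table.audit(checks)
    if not failures:
        table.publish()
    return failures
```

In this sketch a failed audit leaves the previously published data untouched and keeps the rejected batch in staging for inspection, which is the property that keeps downstream pipelines from consuming incorrect data.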
