International Journal on Science and Technology
E-ISSN: 2229-7677
Impact Factor: 9.88
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Data Quality using WAP Pattern for Data Pipelines
Author(s) | Arjun Reddy Lingala |
---|---|
Country | United States |
Abstract | One of the major problems in batch and real-time data pipelines is making sure data is accurate. Without data quality checks at every step of a batch pipeline, it is difficult to build reliable analytics and decision-making platforms. In this paper, we discuss an approach based on the WAP (Write, Audit and Publish) pattern to improve data quality across each step of data pipeline processing. Organizations use different batch processing approaches, and the WAP pattern supports all of them by providing systematic data transformation, validation and storage to mitigate quality issues such as inconsistencies, missing values, and outliers. The WAP pattern prevents downstream pipelines from consuming incorrect data and avoids re-processing data to fix errors. It structures the data flow into three distinct phases: Write, where data is ingested and processed; Audit, where quality checks and validations are conducted; and Publish, where verified data is made available for downstream pipelines or frameworks. Use of the WAP pattern in real-world scenarios has enhanced traceability, accountability, and consistency across batch pipelines, and it saves compute by eliminating the need to re-run pipelines to correct bad data. This paper provides a modular approach to data quality that is adaptable to various pipeline structures, highlighting its practical relevance in data engineering workflows. |
Keywords | data quality, batch pipelines, data anomalies, auditing, fault tolerance, monitoring, observability, streaming pipelines |
Published In | Volume 13, Issue 4, October-December 2022 |
Published On | 2022-11-02 |
Cite This | Data Quality using WAP Pattern for Data Pipelines - Arjun Reddy Lingala - IJSAT Volume 13, Issue 4, October-December 2022. DOI 10.5281/zenodo.14288292 |
DOI | https://doi.org/10.5281/zenodo.14288292 |
Short DOI | https://doi.org/g8txsm |
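
The three phases named in the abstract (Write, Audit, Publish) can be illustrated with a minimal sketch. This is not the paper's implementation: the in-memory `Warehouse` class, the `write_audit_publish` function, and the two checks on a hypothetical `amount` column are all assumptions made for illustration. A real pipeline would typically stage data in a hidden table, partition, or branch of its storage layer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]


@dataclass
class Warehouse:
    """Hypothetical in-memory store: staging is hidden, published is visible."""
    staging: Dict[str, List[Row]] = field(default_factory=dict)
    published: Dict[str, List[Row]] = field(default_factory=dict)


def no_missing_amount(rows: List[Row]) -> bool:
    """Audit check (assumed): every row has a non-null 'amount'."""
    return all(r.get("amount") is not None for r in rows)


def amount_in_range(rows: List[Row]) -> bool:
    """Audit check (assumed): non-null amounts lie in [0, 10000], flagging outliers."""
    return all(0 <= r["amount"] <= 10_000 for r in rows if r.get("amount") is not None)


CHECKS: List[Check] = [no_missing_amount, amount_in_range]


def write_audit_publish(
    wh: Warehouse, table: str, rows: List[Row], checks: List[Check]
) -> List[str]:
    """Run one WAP cycle and return the names of failed checks (empty if published)."""
    # Write: land the batch in staging, where downstream readers cannot see it.
    wh.staging[table] = list(rows)

    # Audit: run every quality check against the staged batch.
    failed = [check.__name__ for check in checks if not check(wh.staging[table])]
    if failed:
        # Leave the batch in staging for inspection; published data is untouched.
        return failed

    # Publish: promote the audited batch so downstream pipelines can read it.
    wh.published[table] = wh.staging.pop(table)
    return []
```

With this sketch, a batch containing a missing or out-of-range `amount` stays in staging and the previously published batch remains what downstream consumers read, which is the behavior the abstract credits with avoiding re-processing.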
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.