International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal


Parquet’s Columnar Storage Advantage: A Case Study in Big Data Analytics

Author(s): Pradeep Bhosale
Country: United States
Abstract: As enterprises increasingly rely on large-scale analytics to extract insights from data lakes and data warehouses, the choice of storage format has a profound impact on query performance, cost, and resource utilization. Apache Parquet, a popular columnar storage format, has gained widespread adoption in the big data ecosystem due to its efficient compression, encoding, and predicate pushdown capabilities. By storing data column-wise, Parquet reduces I/O, network transfer, and CPU overhead when analyzing selective subsets of large datasets. This paper provides a comprehensive examination of Parquet’s columnar architecture, comparing it to row-based formats and highlighting its benefits in terms of query acceleration, storage optimization, and seamless integration with modern analytical engines. Through architectural explanations, benchmarking results, code snippets, and real-world case studies, we illustrate how Parquet’s design principles translate into tangible performance gains in analytical workloads. We also present emerging best practices, discuss integration with query engines like Spark, Trino, and Presto, and consider future directions in columnar format evolution. By understanding Parquet’s advantages and applying its features judiciously, data engineers and architects can unlock faster, cheaper, and more flexible big data analytics.
Keywords: Apache Parquet, Columnar Storage, Big Data Analytics, Data Lakes, Predicate Pushdown, Data Compression, Spark, Trino, Presto
Field: Engineering
Published In: Volume 15, Issue 2, April-June 2024
Published On: 2024-04-10
Cite This: Parquet’s Columnar Storage Advantage: A Case Study in Big Data Analytics - Pradeep Bhosale - IJSAT Volume 15, Issue 2, April-June 2024. DOI 10.5281/zenodo.14631461
DOI: https://doi.org/10.5281/zenodo.14631461
Short DOI: https://doi.org/g8zdqv
