International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Big Data File Formats: Evolution, Performance, and the Rise of Columnar Storage

Author(s) Sainath Muvva
Country United States
Abstract The unprecedented growth in data volume has necessitated more efficient storage mechanisms. This paper traces the evolution of big data file formats, from the familiar CSV and plain-text formats to sophisticated serialization standards such as Avro, Parquet, and ORC. We analyze the constraints of conventional row-based formats and explore the emergence of columnar storage structures designed for high-performance analytics. The paper investigates how contemporary formats address key challenges in big data ecosystems, including data compression, schema adaptability, and query optimization, particularly in distributed computing environments. We also examine the growing popularity of Avro, Parquet, and ORC within platforms like Hadoop and Spark, evaluating their influence on storage efficiency and data access speeds.
Keywords Big data, CSV, Avro, Parquet, ORC, data compression, schema flexibility, distributed computing, Hadoop, Spark
Published In Volume 11, Issue 3, July-September 2020
Published On 2020-09-02
Cite This Big Data File Formats: Evolution, Performance, and the Rise of Columnar Storage - Sainath Muvva - IJSAT Volume 11, Issue 3, July-September 2020. DOI 10.5281/zenodo.14474085
DOI https://doi.org/10.5281/zenodo.14474085
Short DOI https://doi.org/g8vmmt
