A Python library for easily converting between popular big data file formats. Enables efficient data interchange and transformation for analytics workflows.
BigDataFileConverter provides a simple, scalable way to convert files between CSV, JSON, Parquet, Avro, ORC, and other big data formats, so data can be ingested, stored, and queried in the format best suited to each use case. The library uses Spark DataFrames under the hood, taking advantage of Spark's distributed processing for large datasets. Functions are designed for both batch ETL jobs and interactive data exploration.
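Under the hood, each conversion boils down to a Spark DataFrame read followed by a write in the target format. As a rough sketch of the equivalent raw PySpark (illustrative only; the file names are placeholders, and this is not the library's actual internals):

```python
# Minimal PySpark sketch of a CSV -> Parquet conversion, the kind of work
# the library wraps. File names here are placeholder assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet-sketch").getOrCreate()

df = spark.read.csv("sales.csv", header=True, inferSchema=True)  # infer column types
df.write.mode("overwrite").parquet("sales_parquet/")             # columnar output

spark.stop()
```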
Install the package with `pip install bigdata-file-converter` and import functions as needed. The documentation and examples below demonstrate common usage patterns.
- Intuitive function interfaces for common conversion tasks
- Leverages Spark for scalability and performance on large files
- Support for major columnar formats like Parquet and ORC
- Schema inference and validation where applicable
- Options for compression, encoding, and other I/O settings (see the sketch after this list)
- Easy to use from Python or Scala, and as part of ETL workflows
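The schema inference and I/O settings above presumably map onto Spark's standard reader and writer options; a minimal sketch under that assumption (option names are stock PySpark, file paths are placeholders):

```python
# Illustration of schema inference and compression settings via the standard
# Spark reader/writer options the feature list presumably maps onto.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("io-options-sketch").getOrCreate()

# Schema inference on read: column types are sampled from the CSV data.
df = spark.read.option("header", True).option("inferSchema", True).csv("input.csv")

# Compression on write: snappy is the Parquet default; gzip and zstd also work.
df.write.option("compression", "snappy").parquet("output_parquet/")
```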
- File Format Support
  - CSV
  - JSON
  - Parquet
  - Avro
  - ORC
More formats will be added over time based on demand. Contributions welcome!
```python
# Import path assumed to mirror the pip package name.
from bigdata_file_converter import csv_to_parquet, json_to_avro, orc_to_parquet

# Convert CSV to Parquet for storage
csv_to_parquet(input_file, output_dir)

# Convert JSON to Avro with a specified schema
json_to_avro(data_file, schema, output_path)

# Convert ORC to Parquet for analysis in Spark
orc_to_parquet(table, output_path)
```
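For the Avro example, a plausible reading of what `json_to_avro` wraps is Spark's external spark-avro connector; the sketch below shows that path directly (the package coordinates, file names, and the use of an `.avsc` schema file are all assumptions, not the library's confirmed internals):

```python
# Hypothetical sketch of a JSON -> Avro conversion using the spark-avro
# connector, which must be on the classpath (coordinates are an assumption).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("json-to-avro-sketch")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.5.1")
    .getOrCreate()
)

df = spark.read.json("events.json")  # schema inferred from the JSON records

# Enforce a user-supplied Avro schema on write via the avroSchema option.
(df.write
   .format("avro")
   .option("avroSchema", open("events.avsc").read())
   .save("events_avro/"))
```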
See full documentation for all functions and parameters.