Tool for converting the raw Stack Overflow data dump into CSV/Parquet format

sakalouski/stackoverflow_xml_to_parquet

stackoverflow_xml_to_parquet

An efficient tool for converting the raw Stack Overflow data dump into .csv format. Processing speed is around 50k rows/second for the Python CSV conversion, and roughly an order of magnitude higher for the Scala Spark solution.

The data is available here: https://archive.org/details/stackexchange
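
For orientation, here is a minimal sketch of the kind of streaming XML-to-CSV conversion described above. It is not the code from this repository. It assumes the dump files store one record per `<row ... />` element with the fields as XML attributes. The function name `xml_rows_to_csv` and the column list are hypothetical and chosen only for illustration.

```python
import csv
import xml.etree.ElementTree as ET


def xml_rows_to_csv(xml_source, csv_file, columns):
    """Stream <row .../> elements from xml_source into csv_file.

    xml_source: a file name or binary file object (assumed dump layout).
    csv_file:   a text file object opened for writing.
    columns:    attribute names to export, in output order (hypothetical choice).
    Returns the number of rows written.
    """
    writer = csv.writer(csv_file)
    writer.writerow(columns)  # header line

    # Ask for "start" events too, so the first event hands us the root element.
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)

    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # Attributes missing on a row become empty CSV cells.
            writer.writerow([elem.attrib.get(c, "") for c in columns])
            count += 1
            # Drop already-processed rows so memory stays flat on large dumps.
            root.clear()
    return count
```

A call might look like `xml_rows_to_csv("Posts.xml", open("posts.csv", "w", newline="", encoding="utf-8"), ["Id", "PostTypeId", "Score"])`, where the file names and the column selection are again only examples. Parsing incrementally with `iterparse` and clearing the root after each row avoids loading a multi-gigabyte XML file into memory at once.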
