JSON Roller is a tool for flattening complex JSON data structures into tables. I know there are a lot of tools that perform this exact function, but json-roller is fast and native, it also features some other options for manipulating json on the fly! I often found myself in need of a tool to convert large chunks of json records into something i could share in an excel friendly format, this is the main purpose of json-roller.
In this document i refer to layers a lot, json-roller examines the depth of a document in order to decide what is a column and what is a row. Lets take a look at a small example.
{
"firstLayer1key": {
"firstLayer2key": {
"numbers": [ 1, 2, 3 ]
}
},
"anotherLayer1key": {
"anotherLayer2key": {
"numbers": [ 4, 5, 6 ]
}
}
}
Converting this to a table with the keys "WHAT","STUFF" would result in the following table:
$ json-roller -i keys.json -k WHAT,STUFF -m
WHAT | STUFF | numbers[0] | numbers[1] | numbers[2] |
---|---|---|---|---|
anotherLayer1key | anotherLayer2key | 4 | 5 | 6 |
firstLayer1key | firstLayer2key | 1 | 2 | 3 |
As you can see, "container" layers are treated as fields in the final output. When passing data into json-roller there are 3 formats to consider
- Singular root object as the entire file (a full scan and pivot will be performed like above)
- one object per line (each is treated as a row, columns are created as needed)
- a singular root array with a json object as each entry (each object is treated as a row, columns are created as needed)
Note: when inputing multiple files they are concatinated first and treated as above
usage: json-roller
JSON Roller v1.2: A tool for manipulating JSON and flattening complex
structures into a simple table, or converting csv data into JSON
Project Page - https://openstatic.org/projects/json-roller/
-?,--help Shows help
-c,--csv <filename.csv> Output Table CSV file (exclude
filename for STDOUT)
-d,--dissect Dissect JSON data into each
nested key value pair (STDOUT)
-e,--merge <spaces> Merge all input objects into a
single object (STDOUT)
optional numerical argument to
format the data using spaces
-f,--filter <column=value,column!=value> Comma seperated list of
filters (= != >= <= < >)
output data will be limited by
filters
-h,--html <filename.html> Output HTML file (exclude
filename for STDOUT)
-i,--input <filename> Input file .json or .csv only
use commas for multiple files
-j,--json <filename.json> Output Table as JSON Array
file (exclude filename for
STDOUT)
-k,--keys <key1,key2> Comma seperated list of keys
for nested structures. Used to
replace layer0key,layer1key or
provide keys for nesting
-m,--md <filename.md> Output Markdown Table file
(exclude filename for STDOUT)
-p,--properties <filename.ini> Dissect JSON data into
properties for each nested key
value pair (exclude filename
for STDOUT)
-t,--tsv <filename.tsv> Output Table TSV file (exclude
filename for STDOUT)
-u,--url <arg> URL to read json from
-v,--verbose Be Verbose
Input: people.json
[
{
"name": "Bob",
"id": 44432,
"age": 34
},
{
"name": "Mark",
"id": 14535,
"age": 31,
"extra": "stuff"
},
{
"name": "Erin",
"id": 43232,
"age": 27
}
]
So in order to turn this into a nice table we would run:
json-roller -i people.json -c people.csv
Output: people.csv
extra,name,id,age
,Bob,44432,34
stuff,Mark,14535,31
,Erin,43232,27
This tool really shines when dealing with more complex data structures, lets try another example that isnt as straight forward.
{
"IT Dept":
{
"4432": {
"name": "Bob",
"age": 34
},
"14535": {
"name": "Mark",
"age": 31,
"extra": "stuff"
},
"43232": {
"name": "Erin",
"age": 27
}
},
"Sales":
{
"387532": {
"name": "Karen",
"age": 22
},
"142348": {
"name": "Lenny",
"age": 25,
"extra": "stuff"
},
"531241": {
"name": "Joe",
"age": 30
}
}
}
Lets run the tool again with verbose logging enabled. This time we are going to output a Markdown file for ease of viewing.
json-roller -v -i people.json -m people.md
Root JSONObject detected
Columns Created: layer1key, extra, name, layer0key, age
Result (note if you are viewing this on my website, the markdown is probably being rendered to a table):
layer1key | extra | name | layer0key | age |
---|---|---|---|---|
387532 | Karen | Sales | 22 | |
142348 | stuff | Lenny | Sales | 25 |
531241 | Joe | Sales | 30 | |
43232 | Erin | IT Dept | 27 | |
4432 | Bob | IT Dept | 34 | |
14535 | stuff | Mark | IT Dept | 31 |
As you can see the data structure was still flattened, with new columns created for dept and user_id, (layer0key and layer1key). If the names of the keys are known in advance you can provide the -k option to add key names based on layer.
json-roller -v -i people.json -m people.md -k dept,user_id
Root JSONObject detected
Columns Created: user_id, extra, name, dept, age
Result:
user_id | extra | name | dept | age |
---|---|---|---|---|
387532 | Karen | Sales | 22 | |
142348 | stuff | Lenny | Sales | 25 |
531241 | Joe | Sales | 30 | |
43232 | Erin | IT Dept | 27 | |
4432 | Bob | IT Dept | 34 | |
14535 | stuff | Mark | IT Dept | 31 |
Using a test file movies.json (3.22 Megs, 3 layers of data)
$ time json-roller -i movies.json -c movies.csv
real 0m1.318s
user 0m1.000s
sys 0m0.313s
$ wc movies.csv
28796 173765 3130845 movies.csv
That's 28,796 records in 1.318 seconds! Told you it was fast.
Recently i've been working on some new features to really make this tool more powerful, I find myself thinking of ways to represent json in more piece-by-piece aspects. Which is why i added these features
Lets imagine you want to stream a large json object as individual json objects that can be later merged back together
$ json-roller -i movies.json -d | head -n 20
{"movies":[{"actors":"Alec Baldwin, Geena Davis, Annie McEnroe, Maurice Page"}]}
{"movies":[{"posterUrl":"https://images-na.ssl-images-amazon.com/images/M/MV5BMTUwODE3MDE0MV5BMl5BanBnXkFtZTgwNTk1MjI4MzE@._V1_SX300.jpg"}]}
{"movies":[{"year":"1988"}]}
{"movies":[{"plot":"A couple of recently deceased ghosts contract the services of a \"bio-exorcist\" in order to remove the obnoxious new owners of their house."}]}
{"movies":[{"genres":["Comedy"]}]}
{"movies":[{"genres":[null,"Fantasy"]}]}
{"movies":[{"director":"Tim Burton"}]}
{"movies":[{"runtime":"92"}]}
{"movies":[{"id":1}]}
{"movies":[{"title":"Beetlejuice"}]}
{"movies":[null,{"actors":"Richard Gere, Gregory Hines, Diane Lane, Lonette McKee"}]}
{"movies":[null,{"posterUrl":"https://images-na.ssl-images-amazon.com/images/M/MV5BMTU5ODAyNzA4OV5BMl5BanBnXkFtZTcwNzYwNTIzNA@@._V1_SX300.jpg"}]}
{"movies":[null,{"year":"1984"}]}
{"movies":[null,{"plot":"The Cotton Club was a famous night club in Harlem. The story follows the people that visited the club, those that ran it, and is peppered with the Jazz music that made it so famous."}]}
{"movies":[null,{"genres":["Crime"]}]}
{"movies":[null,{"genres":[null,"Drama"]}]}
{"movies":[null,{"genres":[null,null,"Music"]}]}
{"movies":[null,{"director":"Francis Ford Coppola"}]}
{"movies":[null,{"runtime":"127"}]}
{"movies":[null,{"id":2}]}
Each non-json object or array value is broken into its own object following the same structure as the original. Arrays are filled with nulls wherever index ordering needs to be preserved.
Another way to dissect json is as a bunch of properties
$ json-roller -i movies.json -p | sort | head -n 23
#Fri Jun 17 13:51:01 EDT 2022
#Generated by json-roller http://openstatic.org/projects/json-roller/
movies[0].actors=Alec Baldwin, Geena Davis, Annie McEnroe, Maurice Page
movies[0].director=Tim Burton
movies[0].genres[0]=Comedy
movies[0].genres[1]=Fantasy
movies[0].id=1
movies[0].plot=A couple of recently deceased ghosts contract the services of a "bio-exorcist" in order to remove the obnoxious new owners of their house.
movies[0].posterUrl=https\://images-na.ssl-images-amazon.com/images/M/MV5BMTUwODE3MDE0MV5BMl5BanBnXkFtZTgwNTk1MjI4MzE@._V1_SX300.jpg
movies[0].runtime=92
movies[0].title=Beetlejuice
movies[0].year=1988
movies[100].actors=Ralph Fiennes, Juliette Binoche, Willem Dafoe, Kristin Scott Thomas
movies[100].director=Anthony Minghella
movies[100].genres[0]=Drama
movies[100].genres[1]=Romance
movies[100].genres[2]=War
movies[100].id=101
movies[100].plot=At the close of WWII, a young nurse tends to a badly-burned plane crash victim. His past is shown in flashbacks, revealing an involvement in a fateful love affair.
movies[100].posterUrl=https\://images-na.ssl-images-amazon.com/images/M/MV5BNDg2OTcxNDE0OF5BMl5BanBnXkFtZTgwOTg2MDM0MDE@._V1_SX300.jpg
movies[100].runtime=162
movies[100].title=The English Patient
movies[100].year=1996
Lets say we took the output from the "-d" dissection above and saved it to a file. We can restore the original using the "-e" merge option this will take all input json objects and merge them into a signular json object. Latter objects will overwrite former objects fields if there is a conflict so ordering is important.
$ json-roller -i file.json -e 2
{"movies": [
{
"actors": "Alec Baldwin, Geena Davis, Annie McEnroe, Maurice Page",
"posterUrl": "https://images-na.ssl-images-amazon.com/images/M/MV5BMTUwODE3MDE0MV5BMl5BanBnXkFtZTgwNTk1MjI4MzE@._V1_SX300.jpg",
"year": "1988",
"plot": "A couple of recently deceased ghosts contract the services of a \"bio-exorcist\" in order to remove the obnoxious new owners of their house.",
"genres": [
"Comedy",
"Fantasy"
],
"director": "Tim Burton",
"runtime": "92",
"id": 1,
"title": "Beetlejuice"
},
{
"actors": "Richard Gere, Gregory Hines, Diane Lane, Lonette McKee",
"posterUrl": "https://images-na.ssl-images-amazon.com/images/M/MV5BMTU5ODAyNzA4OV5BMl5BanBnXkFtZTcwNzYwNTIzNA@@._V1_SX300.jpg",
"year": "1984",
"plot": "The Cotton Club was a famous night club in Harlem. The story follows the people that visited the club, those that ran it, and is peppered with the Jazz music that made it so famous.",
"genres": [
"Crime",
"Drama",
"Music"
],
"director": "Francis Ford Coppola",
"runtime": "127",
"id": 2
}
]}