a2-alescion-Nicholas-Alescio #29

Open
wants to merge 10 commits into base: main
Binary file added PowerBI/pbi_viz.pbix
139 changes: 24 additions & 115 deletions README.md
@@ -1,142 +1,51 @@
# 02-DataVis-5ways

Assignment 2 - Data Visualization, 5 Ways
Nicholas Alescio - Assignment 2 - Data Visualization, 5 Ways
===

Now that you have successfully made a "visualization" of shapes and lines using d3, your next assignment is to successfully make an *actual visualization*... 5 times.
In this project, five visualizations were created for one dataset. Three languages (JavaScript, Python, R) and two visualization tools (Flourish, PowerBI) were used in this project.

The goal of this project is to gain experience with as many data visualization libraries, languages, and tools as possible.
# R + ggplot2

I have provided a small dataset about cars, `cars-sample.csv`.
Each row contains a car and several variables about it, including miles-per-gallon, manufacturer, and more.

Your goal is to use 5 different tools to make the following chart:

![ggplot2](img/ggplot2.png)

These features should be preserved as much as possible in your replication:

- Data positioning: it should be a downward-trending scatterplot as shown. Weight should be on the x-axis and MPG on the y-axis.
- Scales: Note the scales do not start at 0.
- Axis ticks and labels: both axes are labeled and there are tick marks at 10, 20, 30, etcetera.
- Color mapping to Manufacturer.
- Size mapping to Weight.
- Opacity of circles set to 0.5 or 50%.

Other features are not required. This includes:

- The background grid.
- The legends.

Note that some software packages will make it **impossible** to perfectly preserve the above requirements.
Be sure to note where these deviate.

Improvements are also welcome as part of Technical and Design achievements.

Libraries, Tools, Languages
---

You are required to use 5 different tools or libraries.
Of the 5 tools, you must use at least 3 libraries (libraries require code of some kind).
This could be `Python, R, Javascript`, or `Java, Javascript, Matlab` or any other combination.
Dedicated tools (i.e. Excel) do not count towards the language requirement.

Otherwise, you should seek tools and libraries to fill out your 5.

Below are a few ideas. Do not limit yourself to this list!
Some may be difficult choices, like Matlab or SPSS, which require large installations, licenses, and occasionally difficult UIs.

I have marked a few that are strongly suggested.

- R + ggplot2 `<- definitely worth trying`
- Excel
- d3 `<- since the rest of the class uses this, we're requiring it`
- Matplotlib
- three.js `<- well, it's a 3d library. not really recommended, but could be "interesting"`
- p5js `<- good for playing around. not really a chart lib`
- Tableau
- Java 2d
- GNUplot
- Vega-lite `<- recently much better. look for the high level js implementations`
- Flourish `<- popular last year`
- PowerBI
- SPSS

You may write everything from scratch, or start with demo programs from books or the web.
If you do start with code that you found, please identify the source of the code in your README and, most importantly, make non-trivial changes to the code to make it your own so you really learn what you're doing.

Tips
---

- If you're using d3, key to this assignment is knowing how to load data.
You will likely use the [`d3.json` or `d3.csv` functions](https://github.com/mbostock/d3/wiki/Requests) to load the data you found.
Beware that these functions are *asynchronous*, meaning it's possible to "build" an empty visualization before the data actually loads.

- *For web languages like d3* Don't forget to run a local webserver when you're debugging.
See this [ebook](http://chimera.labs.oreilly.com/books/1230000000345/ch04.html#_setting_up_a_web_server) if you're stuck.


Readme Requirements
---

A good readme with screenshots and structured documentation is required for this project.
It should be possible to scroll through your readme to get an overview of all the tools and visualizations you produced.

- Each visualization should start with a top-level heading (e.g. `# d3`)
- Each visualization should include a screenshot. Put these in an `img` folder and link through the readme (markdown command: `![caption](img/<imgname>)`).
- Write a paragraph for each visualization tool you use. What was easy? Difficult? Where could you see the tool being useful in the future? Did you have to use any hacks or data manipulation to get the right chart?
R is a language primarily focused on statistical computing.
ggplot2 is a popular library for charting in R.

Other Requirements
---
To visualize the cars dataset, I made use of ggplot2's `geom_point()` layer, with `aes()` mappings for the color and size.
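The size aesthetic is where the tools differ most. Below is a minimal Python sketch contrasting two ways of turning Weight into a point size. It assumes ggplot2's default `scale_size()` behaviour (size scales marker *area*, output range 1 to 6) and reproduces the `Weight/500` radius rule from the d3 version; the function names and sample weights are hypothetical, and this is not ggplot2's actual source.

```python
import math


def area_size(value, vmin, vmax, out_min=1.0, out_max=6.0):
    """Map a value to a point size so that marker *area* grows linearly with it.

    Assumed to mirror ggplot2's default scale_size(): rescale to [0, 1],
    take the square root, then stretch into the output range.
    """
    t = (value - vmin) / (vmax - vmin)  # position of the value within [0, 1]
    return out_min + (out_max - out_min) * math.sqrt(t)


def d3_radius(weight):
    """Radius rule used in javascript-d3/cars.html: radius grows linearly with Weight."""
    return weight / 500


weights = [1500, 2500, 5050]  # hypothetical sample values
for w in weights:
    print(w, round(area_size(w, 1500, 5050), 2), d3_radius(w))
```

With the d3 rule the radius itself is proportional to weight, so marker area grows with the square of the weight and heavy cars look disproportionately large; the area-based mapping avoids that.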

0. Your code should be forked from the GitHub repo.
1. Place all code, Excel sheets, etcetera in a named folder. For example, `r-ggplot, matlab, mathematica, excel` and so on.
2. Your writeup (readme.md in the repo) should also contain the following:
Since I had never used R before, it took me a fair amount of time just to figure out how to run a script, but eventually I found good documentation and everything worked out!

- Description of the Technical achievements you attempted with this visualization.
- Some ideas include interaction, such as mousing over to see more detail about the point selected.
- Description of the Design achievements you attempted with this visualization.
- Some ideas include consistent color choice, font choice, element size (e.g. the size of the circles).
![ggplot2](img/ggplot2.png)

GitHub Details
---
# d3.js

- Fork the GitHub Repository. You now have a copy associated with your username.
- Make changes to fulfill the project requirements.
- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.
D3 is a JavaScript library for visualizing data with HTML, SVG, and CSS.

Grading
---
The challenge here was getting everything to appear where I wanted it (my circles kept rendering off the screen). Setting up margins around the plotting area solved that issue. I also included some code to hide n/a values. Because D3 is a JavaScript library, it brings JavaScript's power and flexibility to practically any visualization task.
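The margin fix follows the common d3 margin convention: the SVG is enlarged by the margins, the plotting group is translated by (left, top), and the scales map data into the remaining inner area. Here is a minimal Python sketch of that arithmetic using the numbers from `cars.html` (690 x 600 canvas, margins 10/30/60/60, Weight domain 1500 to 5050, MPG domain 8 to 50); `linear_scale` is a hypothetical helper that mimics an unclamped `d3.scaleLinear()`.

```python
def inner_size(total_w, total_h, margin):
    """Return the drawable width and height left after subtracting the margins."""
    width = total_w - margin["left"] - margin["right"]
    height = total_h - margin["top"] - margin["bottom"]
    return width, height


def linear_scale(domain, rng):
    """Return a function mapping `domain` linearly onto `rng` (no clamping)."""
    d0, d1 = domain
    r0, r1 = rng

    def scale(v):
        return r0 + (v - d0) / (d1 - d0) * (r1 - r0)

    return scale


margin = {"top": 10, "right": 30, "bottom": 60, "left": 60}
width, height = inner_size(690, 600, margin)
x = linear_scale((1500, 5050), (0, width))
y = linear_scale((8, 50), (height, 0))  # inverted so that larger MPG is drawn higher up

# Points whose scaled coordinates fall outside [0, width] x [0, height]
# are drawn in the margins or beyond the edge of the SVG.
print(width, height, x(3275), y(29))
```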

Grades on a 120 point scale.
24 points will be based on your Technical and Design achievements, as explained in your readme.
![JavaScript D3](img/javascript-d3.png)

Make sure you include the files necessary to reproduce your plots.
You should structure these in folders if helpful.
We will choose some at random to run and test.
# Python + Seaborn

**NOTE: THE BELOW IS A SAMPLE ENTRY TO GET YOU STARTED ON YOUR README. YOU MAY DELETE THE ABOVE.**
Python is an interpreted, general-purpose language that can be used for almost any high-level task, and it is popular largely because of its vast number of libraries. For this viz, the Seaborn library (built on matplotlib) was used. This viz was the most straightforward and caused me the fewest issues. The only complication I ran into was having to use the pandas library to import the CSV, since the file was stored locally rather than pulled over the internet. Because of Python's flexibility and general-purpose nature, I see it being useful for any simple visualization task.
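Because the CSV is read from disk, `main.py` builds its path from the script's own location rather than from the current working directory. A minimal sketch of that path logic using only `os.path`; `data_path` is a hypothetical helper name, and the example paths assume a POSIX filesystem.

```python
import os.path


def data_path(script_file, relative):
    """Resolve `relative` against the directory that contains `script_file`."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(script_dir, relative))


print(data_path("/repo/python-seaborn/main.py", "../cars-sample.csv"))
```

Calling `os.path.abspath` first matters: when the script is launched as `python main.py`, `os.path.dirname(__file__)` can be an empty string, and simply appending `'/../cars-sample.csv'` would then point at the filesystem root.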

# R + ggplot2 + R Markdown
![Seaborn](img/python-seaborn.png)

R is a language primarily focused on statistical computing.
ggplot2 is a popular library for charting in R.
R Markdown is a document format that compiles to HTML or PDF and allows you to include the output of R code directly in the document.
# PowerBI

To visualize the cars dataset, I made use of ggplot2's `geom_point()` layer, with aesthetics functions for the color and size.
PowerBI is a Microsoft tool for creating visualizations in the field of business intelligence. I expected it to be the easiest of the five, but PowerBI ended up causing me the most issues. The first issue was that I needed to clean the data in order to display MPG on the y-axis, including changing the data type of the MPG column, which defaulted to "Text" because of N/A values. The second issue was figuring out how to scale points based on Weight. That did not work out very well, because PowerBI's "Scatter" viz does not readily size individual data points by their own values.
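The MPG cleanup is a common CSV problem: a single non-numeric token such as `NA` makes a tool infer the whole column as text. A minimal, tool-independent sketch of the same fix using Python's standard library converts MPG to a number and drops rows where that fails. The inline CSV rows are hypothetical, not taken from `cars-sample.csv`.

```python
import csv
import io


def load_numeric_rows(text, column):
    """Parse CSV text, convert `column` to float, and drop rows where it is not numeric."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            row[column] = float(row[column])
        except ValueError:
            continue  # e.g. "NA" or "": skip the row instead of treating the column as text
        rows.append(row)
    return rows


sample = """Manufacturer,MPG,Weight
bmw,NA,3500
ford,25,2800
honda,33,2100
"""
cleaned = load_numeric_rows(sample, "MPG")
print([(r["Manufacturer"], r["MPG"]) for r in cleaned])
```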

While it takes time to find the correct documentation, these functions made the effort of creating this chart minimal.
![PowerBI](img/powerbi.png)

![ggplot2](img/ggplot2.png)
# Flourish

# d3...
Flourish is a free, online data viz platform that lets users quickly create interactive visualizations. This viz took me the least amount of time, and I found Flourish very user-friendly: despite having no prior experience, I created the viz I was looking for in about 5 minutes.

(And so on...)
![Flourish](img/flourish.png)


## Technical Achievements
- **Proved P=NP**: Using a combination of...
- **Solved AI Forever**: ...
- **Automatic hiding of n/a values in d3**
- **Filtering out n/a values in PowerBI and changing the data type of the MPG column**

### Design Achievements
- **Re-vamped Apple's Design Philosophy**: As demonstrated in my colorscheme...
- **Actually re-vamped Apple's Design Philosophy**: Haha just kidding, I'm not smart enough for that
10 changes: 10 additions & 0 deletions ggplot2/ggplot2.R
@@ -0,0 +1,10 @@
library(ggplot2)

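# Run in the directory of this script so the relative path below resolves
cars <- read.csv(file="../cars-sample.csv", header=TRUE, sep=",")

# Weight on x, MPG on y, color by Manufacturer, size by Weight, 50% opacity
p <- ggplot(data=cars, mapping=aes(x=Weight, y=MPG, color=Manufacturer)) +
  geom_point(aes(size=Weight), alpha=0.5)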

print(p)
Binary file added img/flourish.png
Binary file modified img/ggplot2.png
Binary file added img/javascript-d3.png
Binary file added img/powerbi.png
Binary file added img/python-seaborn.png
85 changes: 85 additions & 0 deletions javascript-d3/cars.html
@@ -0,0 +1,85 @@
<!-- example code from https://www.d3-graph-gallery.com/graph/scatter_basic.html was used as a guide for this viz -->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Cars-sample</title>
<!-- Load d3.js -->
<script src="https://d3js.org/d3.v6.min.js"></script>
</head>
<body>
<!-- Create a div where the graph will take place -->
<div id="car_viz">
<script>
// set the dimensions and margins of the graph
var margin = {top: 10, right: 30, bottom: 60, left: 60},
width = 690 - margin.left - margin.right,
height = 600 - margin.top - margin.bottom

// append the svg object to the body of the page
var svg = d3.select("#car_viz")
.append("svg")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append("g")
.attr("transform",
"translate(" + margin.left + "," + margin.top + ")")

// Add X axis label
svg.append("text")
.attr("y", height + margin.bottom/2)
.attr("x", width/2)
.style("text-anchor", "middle")
.text("Weight")

// Add X axis
var x = d3.scaleLinear()
.domain([1500, 5050])
.range([0, width])
svg.append("g")
.attr("transform", "translate(0," + height + ")")
.call(d3.axisBottom(x))

// Add Y axis label
svg.append("text")
.attr("transform", "rotate(-90)")
.attr("y", 0 - margin.left/2)
.attr("x", 0 - (height/2))
.style("text-anchor", "middle")
.text("MPG")

// Add Y axis
var y = d3.scaleLinear()
.domain([8, 50])
.range([height, 0])
svg.append("g")
.call(d3.axisLeft(y))

// Load the data. In d3 v6, d3.csv returns a Promise; a function passed as the
// second argument is treated as a per-row conversion, not a completion callback
d3.csv("/cars-sample.csv").then(function(data)
{
// Drop rows whose Weight or MPG is missing or non-numeric (hides n/a values)
var rows = data.filter(function (d) { return isFinite(parseFloat(d.Weight)) && isFinite(parseFloat(d.MPG)) })

// Add one dot per remaining row
svg.selectAll("circle")
.data(rows)
.join("circle")
.attr("cx", function (d) { return x(+d.Weight) } )
.attr("cy", function (d) { return y(+d.MPG) } )
.attr("r", function (d) { return d.Weight/500 } )
.style("fill", function(d)
{
switch(d.Manufacturer)
{
case "bmw": return "red"
case "ford": return "yellow"
case "honda": return "green"
case "mercedes": return "blue"
case "toyota": return "purple"
}
})
.style("opacity", 0.5)
.style("stroke", "black")
})

</script>
</div>
</body>
</html>
13 changes: 13 additions & 0 deletions python-seaborn/main.py
@@ -0,0 +1,13 @@
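import os.path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def main():
    # Locate the CSV relative to this script's folder, not the current working directory
    csv_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "cars-sample.csv")
    cars = pd.read_csv(csv_path)
    # Scatterplot: color by Manufacturer, size by Weight, 50% opacity
    sns.relplot(x="Weight", y="MPG", data=cars, hue="Manufacturer", size="Weight", alpha=0.5)
    plt.show()


if __name__ == "__main__":
    main()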