a2-alvaradoblancouribe-isabel-alvarado-blanco-uribe #36

Open: wants to merge 12 commits into base: main
Binary file added .DS_Store
Binary file not shown.
3 changes: 3 additions & 0 deletions .vscode/settings.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
    "python.pythonPath": "/usr/local/opt/python@3.9/bin/python3.9"
}
142 changes: 34 additions & 108 deletions README.md
@@ -3,140 +3,66 @@
Assignment 2 - Data Visualization, 5 Ways
===

Now that you have successfully made a "visualization" of shapes and lines using d3, your next assignment is to successfully make an *actual visualization*... 5 times.

The goal of this project is to gain experience with as many data visualization libraries, languages, and tools as possible.

I have provided a small dataset about cars, `cars-sample.csv`.
Each row contains a car and several variables about it, including miles-per-gallon, manufacturer, and more.

Your goal is to use 5 different tools to make the following chart:
The attempted graph:

![ggplot2](img/ggplot2.png)

These features should be preserved as much as possible in your replication:

- Data positioning: it should be a downward-trending scatterplot as shown. Weight should be on the x-axis and MPG on the y-axis.
- Scales: Note the scales do not start at 0.
- Axis ticks and labels: both axes are labeled and there are tick marks at 10, 20, 30, etcetera.
- Color mapping to Manufacturer.
- Size mapping to Weight.
- Opacity of circles set to 0.5 or 50%.
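
The scale requirement above (axes that do not start at 0) can be sketched in plain JavaScript. This is only an illustration of linear interpolation, not d3's own implementation: `linearScale` is a hypothetical helper, the domains match the ones used in `d3js/d3-viz.js` in this pull request, and the 370 by 360 pixel plot area is what that file's margins work out to.

```javascript
// Hypothetical stand-in for a linear scale (not d3's implementation):
// maps a value from [d0, d1] onto [r0, r1] by linear interpolation.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (value) {
    return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  };
}

// Weight runs from 1500 to 5000 across a 370 px wide plot area.
var xScale = linearScale([1500, 5000], [0, 370]);
// MPG runs from 8 to 48; the range is flipped because SVG y grows downward.
var yScale = linearScale([8, 48], [360, 0]);

console.log(xScale(1500), xScale(5000)); // 0 370
console.log(yScale(8), yScale(48));      // 360 0
```

Because the domain starts at the smallest expected value rather than 0, the lightest car lands at the left edge of the plot instead of partway across it.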

Other features are not required. This includes:

- The background grid.
- The legends.

Note that some software packages will make it **impossible** to perfectly preserve the above requirements.
Be sure to note where these deviate.

Improvements are also welcome as part of Technical and Design achievements.

Libraries, Tools, Languages
---

You are required to use 5 different tools or libraries.
Of the 5 tools, you must use at least 3 libraries (libraries require code of some kind).
This could be `Python, R, Javascript`, or `Java, Javascript, Matlab` or any other combination.
Dedicated tools (i.e. Excel) do not count towards the language requirement.

Otherwise, you should seek tools and libraries to fill out your 5.

Below are a few ideas. Do not limit yourself to this list!
Some may be difficult choices, like Matlab or SPSS, which require large installations, licenses, and occasionally difficult UIs.

I have marked a few that are strongly suggested.

- R + ggplot2 `<- definitely worth trying`
- Excel
- d3 `<- since the rest of the class uses this, we're requiring it`
- Matplotlib
- three.js `<- well, it's a 3d library. not really recommended, but could be "interesting"`
- p5js `<- good for playing around. not really a chart lib`
- Tableau
- Java 2d
- GNUplot
- Vega-lite `<- recently much better. look for the high level js implementations`
- Flourish `<- popular last year`
- PowerBI
- SPSS

You may write everything from scratch, or start with demo programs from books or the web.
If you do start with code that you found, please identify the source of the code in your README and, most importantly, make non-trivial changes to the code to make it your own so you really learn what you're doing.
# R + ggplot2
I ran into a lot of bugs when trying to set up R and ggplot2 (and I also learned that most of the help out there assumes RStudio). However, once I got it up and running, it was super simple to implement, which really surprised me. R seems to be the tool to use when you just want to visualize data and don't need to embed the result in a website. To visualize the data, I used ggplot2's `geom_point`.

Tips
---

- If you're using d3, key to this assignment is knowing how to load data.
You will likely use the [`d3.json` or `d3.csv` functions](https://github.com/mbostock/d3/wiki/Requests) to load the data you found.
Beware that these functions are *asynchronous*, meaning it's possible to "build" an empty visualization before the data actually loads.

- *For web languages like d3* Don't forget to run a local webserver when you're debugging.
See this [ebook](http://chimera.labs.oreilly.com/books/1230000000345/ch04.html#_setting_up_a_web_server) if you're stuck.
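
One practical detail when loading the cars data: without a row-conversion function, `d3.csv` delivers every field as a string, and `cars-sample.csv` marks missing values as `NA`. The sketch below shows one way to clean the rows before plotting. `cleanRows` is a hypothetical helper, not part of d3, and the two sample rows are copied from the dataset.

```javascript
// Hypothetical helper (not part of d3): turn raw CSV rows, where every
// field is a string, into numeric records and drop rows with missing values.
function cleanRows(rows) {
  return rows
    .filter(function (d) { return d.MPG !== "NA" && d.Weight !== "NA"; })
    .map(function (d) {
      return {
        manufacturer: d.Manufacturer,
        mpg: Number(d.MPG),
        weight: Number(d.Weight)
      };
    });
}

// Two rows from cars-sample.csv, shaped the way d3.csv would deliver them.
var sample = [
  { Manufacturer: "ford", MPG: "17", Weight: "3449" },
  { Manufacturer: "ford", MPG: "NA", Weight: "4034" }
];
var cleaned = cleanRows(sample);
console.log(cleaned.length); // 1
```

In a d3 page, a helper like this would be called inside the `.then(...)` callback of `d3.csv`, so that the chart is only built once the data has actually arrived.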


Readme Requirements
---

A good readme with screenshots and structured documentation is required for this project.
It should be possible to scroll through your readme to get an overview of all the tools and visualizations you produced.
I relied on the documentation and these websites (https://365datascience.com/tutorials/r-tutorials/ggplot2-scatter-plot/ and https://www.datanovia.com/en/blog/how-to-create-a-bubble-chart-in-r-using-ggplot2/) to implement the visualization.
![ggplot2](img/ggplot2.png)

- Each visualization should start with a top-level heading (e.g. `# d3`)
- Each visualization should include a screenshot. Put these in an `img` folder and link through the readme (markdown command: `![caption](img/<imgname>)`).
- Write a paragraph for each visualization tool you use. What was easy? Difficult? Where could you see the tool being useful in the future? Did you have to use any hacks or data manipulation to get the right chart?
# Python + Altair + pandas
When I saw that Altair/Vega-lite had a Python version, I decided to give it a try! It did take a lot of debugging to get everything installed and ready to go, but it's kind of cool how you can just export the graph to an HTML file using Python. It also has a lot of possibilities for interactivity.

Other Requirements
---
Because Altair is built on Vega-Lite, which was created for JavaScript, it felt a little bit weird to use JS-like syntax in Python.

0. Your code should be forked from the GitHub repo.
1. Place all code, Excel sheets, etcetera in a named folder. For example, `r-ggplot, matlab, mathematica, excel` and so on.
2. Your writeup (readme.md in the repo) should also contain the following:
To correctly load the CSV file, I used the pandas library and put the data I needed in a dictionary.

- Description of the Technical achievements you attempted with this visualization.
- Some ideas include interaction, such as mousing over to see more detail about the point selected.
- Description of the Design achievements you attempted with this visualization.
- Some ideas include consistent color choice, font choice, element size (e.g. the size of the circles).
I mostly used these pages (https://www.geeksforgeeks.org/python-altair-scatter-plot/ and https://altair-viz.github.io/getting_started/starting.html) to visualize the data. I also relied on the official documentation to figure out the scaling of the circles and to debug.

GitHub Details
---
![altair](img/altair-python.svg)

- Fork the GitHub Repository. You now have a copy associated with your username.
- Make changes to fulfill the project requirements.
- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.
# Python + Matplotlib + pandas
Like with Altair, I put the data I needed in a dictionary for easier use. I ran into some difficulties getting everything set up, but once I got it to work it was pretty simple. However, I did run into problems implementing the colors (I eventually figured it out, and that helped me implement them in JS). It seems like Altair is the fancy way of doing visualizations, but Matplotlib seems to have the most resources out there for Python.

Grading
---
I used these pages: https://pythonspot.com/matplotlib-scatterplot/, https://kanoki.org/2020/08/30/matplotlib-scatter-plot-color-by-category-in-python/, and https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.scatter.html

Grades on a 120 point scale.
24 points will be based on your Technical and Design achievements, as explained in your readme.
![matplotlib](img/python-matplotlib.png)

Make sure you include the files necessary to reproduce your plots.
You should structure these in folders if helpful.
We will choose some at random to run and test.
# d3.js
This is the graph that was made using d3.js. I used the D3.js documentation to create it. Because I worked on this after the Python versions, I am starting to appreciate some of the features that JavaScript has that Python does not. Getting the colors to work was a bit tricky, but I got there thanks to the work I had already done in Python.

**NOTE: THE BELOW IS A SAMPLE ENTRY TO GET YOU STARTED ON YOUR README. YOU MAY DELETE THE ABOVE.**
I used this website (https://www.d3-graph-gallery.com/graph/scatter_basic.html)

# R + ggplot2 + R Markdown
![d3](img/d3-js-pic.png)
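
The color mapping mentioned above can be sketched as a plain lookup table. The color names are the ones used in `d3js/d3-viz.js` in this pull request; `colorFor` and its gray fallback are hypothetical additions for illustration, so that a manufacturer missing from the table still gets a visible fill.

```javascript
// Manufacturer-to-color table; the color names come from d3-viz.js.
var colors = {
  bmw: "lightcoral",
  ford: "olive",
  honda: "green",
  mercedes: "cornflowerblue",
  toyota: "fuchsia"
};

// Hypothetical helper: look up a manufacturer's color, normalizing case and
// whitespace, and fall back to gray for names that are not in the table.
function colorFor(manufacturer) {
  var key = String(manufacturer).trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(colors, key) ? colors[key] : "gray";
}

console.log(colorFor("ford")); // olive
console.log(colorFor("audi")); // gray
```

In the d3 chart, a function like this could be passed to `.style("fill", ...)` as `function (d) { return colorFor(d.Manufacturer); }`.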

R is a language primarily focused on statistical computing.
ggplot2 is a popular library for charting in R.
R Markdown is a document format that compiles to HTML or PDF and allows you to include the output of R code directly in the document.
# Google Sheets + d3
I decided to explore how this would be done with Google Sheets instead of Excel. However, I quickly realized that you can't add transparency to colors in Google Sheets charts, and you also can't change the radius of the circles. The result looks like this:

To visualize the cars dataset, I made use of ggplot2's `geom_point()` layer, with aesthetics functions for the color and size.
![google](img/chart-google.png)

While it takes time to find the correct documentation, these functions made the effort creating this chart minimal.
However, I did find that Google has documentation for creating charts with JavaScript (https://developers.google.com/chart/interactive/docs/gallery/bubblechart#javascript)! It was pretty cool, and I am glad that I looked into it. It was fairly simple to add things to the graph, but I can see how very specific customizations could get difficult. Recreating the chart wasn't too bad, but I can see how it could get confusing quickly if you wanted to design your own.

![ggplot2](img/ggplot2.png)
![google-upgrade](img/google.gif)

# d3...
# Tableau
It was surprisingly easy to make this graph with Tableau. Here's a screenshot of the settings I used and the graph I created:
![tableau](img/tableau.png)

(And so on...)
![tableau](img/tableau-2.png)

# Flourish
Flourish was also really easy to use! I can see how customizing parts of it might be more challenging though.
![flourish](img/flourish.png)

## Technical Achievements
- **Proved P=NP**: Using a combination of...
- **Solved AI Forever**: ...
- **Google Charts**: Used the Google Charts documentation to build the chart in JavaScript, which allowed for more customization
- **Adding Interactivity**: When you hover over a point in the Google chart, you can see its specific attributes

### Design Achievements
- **Re-vamped Apple's Design Philosophy**: As demonstrated in my colorscheme...
- Tried my best to maintain the same color scheme across all of the graphs (except for the Google Charts version built with JS) :)
Binary file added __pycache__/altair.cpython-39.pyc
Binary file not shown.
File renamed without changes.
12 changes: 12 additions & 0 deletions d3js/chart2.html
@@ -0,0 +1,12 @@
<!DOCTYPE html>
<html>
<head>

<script src="https://d3js.org/d3.v6.min.js"></script>

</head>
<body>
<div id="dataviz"></div>
<script src="d3-viz.js"></script>
</body>
</html>
58 changes: 58 additions & 0 deletions d3js/d3-viz.js
@@ -0,0 +1,58 @@
var margin = {top: 10, right: 30, bottom: 30, left: 60},
    width = 460 - margin.left - margin.right,
    height = 400 - margin.top - margin.bottom;

var svg = d3.select("#dataviz")
  .append("svg")
    .attr("width", 5000)
    .attr("height", 5000)
  .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

d3.csv("cars-sample.csv").then(function(data){

  var colors = {"bmw":"lightcoral","ford":"olive","honda":"green","mercedes":"cornflowerblue","toyota":"fuchsia"};

  console.log(data);
  // Add X axis
  var x = d3.scaleLinear()
    .domain([1500, 5000])
    .range([ 0, width ]);
  svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.axisBottom(x));

  // Add Y axis
  var y = d3.scaleLinear()
    .domain([8, 48])
    .range([ height, 0]);
  svg.append("g")
    .call(d3.axisLeft(y));

  // Add dots, skipping rows whose MPG is missing ("NA" in the CSV)
  svg.append('g')
    .selectAll("circle")
    .data(data.filter(function (d) { return d.MPG !== "NA"; }))
    .enter()
    .append("circle")
      .attr("cx", function (d) { return x(+d.Weight); } )
      .attr("cy", function (d) { return y(+d.MPG); } )
      .attr("r", function (d) { return +d.Weight / 700; } )
      .attr("opacity", "0.5" )
      .style("fill", function (d) { return colors[d.Manufacturer]; });

});

svg.append("text")
  .attr("x", 200)
  .attr("y", 400)
  .style("text-anchor", "middle")
  .text("Weight");

svg.append("text")
  .attr("transform", "rotate(-90)")
  .attr("x", -180)
  .attr("y", -30)
  .style("text-anchor", "middle")
  .text("MPG");
98 changes: 98 additions & 0 deletions google-sheets/cars-sample.csv
@@ -0,0 +1,98 @@
"","Car","Manufacturer","MPG","Cylinders","Displacement","Horsepower","Weight","Acceleration","Model.Year","Origin"
"5","torino","ford",17,8,302,140,3449,10.5,70,"American"
"6","galaxie 500","ford",15,8,429,198,4341,10,70,"American"
"13","torino (sw)","ford",NA,8,351,153,4034,11,70,"American"
"18","mustang boss 302","ford",NA,8,302,140,3353,8,70,"American"
"21","corona mark ii","toyota",24,4,113,95,2372,15,70,"Japanese"
"24","maverick","ford",21,6,200,85,2587,16,70,"American"
"30","2002","bmw",26,4,121,113,2234,12.5,70,"European"
"32","f250","ford",10,8,360,215,4615,14,70,"American"
"38","corona","toyota",25,4,113,95,2228,14,71,"Japanese"
"39","pinto","ford",25,4,98,NA,2046,19,71,"American"
"44","torino 500","ford",19,6,250,88,3302,15.5,71,"American"
"48","galaxie 500","ford",14,8,351,153,4154,13.5,71,"American"
"51","country squire (sw)","ford",13,8,400,170,4746,12,71,"American"
"56","mustang","ford",18,6,250,88,3139,14.5,71,"American"
"61","corolla 1200","toyota",31,4,71,65,1773,19,71,"Japanese"
"65","corona hardtop","toyota",24,4,113,95,2278,15.5,72,"Japanese"
"69","pinto runabout","ford",21,4,122,86,2226,16.5,72,"American"
"73","galaxie 500","ford",14,8,351,153,4129,13,72,"American"
"82","gran torino (sw)","ford",13,8,302,140,4294,16,72,"American"
"88","pinto (sw)","ford",22,4,122,86,2395,16,72,"American"
"90","corona mark ii (sw)","toyota",23,4,120,97,2506,14.5,72,"Japanese"
"92","corolla 1600 (sw)","toyota",27,4,97,88,2100,16.5,72,"Japanese"
"96","gran torino","ford",14,8,302,137,4042,14.5,73,"American"
"100","ltd","ford",13,8,351,158,4363,13,73,"American"
"108","maverick","ford",18,6,250,88,3021,16.5,73,"American"
"112","country","ford",12,8,400,167,4906,12.5,73,"American"
"116","carina","toyota",20,4,97,88,2279,19,73,"Japanese"
"120","pinto","ford",19,4,122,85,2310,18.5,73,"American"
"131","mark ii","toyota",20,6,156,122,2807,13.5,73,"Japanese"
"134","maverick","ford",21,6,200,NA,2875,17,74,"American"
"138","pinto","ford",26,4,122,80,2451,16.5,74,"American"
"139","corolla 1200","toyota",32,4,71,65,1836,21,74,"Japanese"
"144","gran torino","ford",16,8,302,140,4141,14,74,"American"
"147","gran torino (sw)","ford",14,8,302,140,4638,16,74,"American"
"152","corona","toyota",31,4,76,52,1649,16.5,74,"Japanese"
"157","civic","honda",24,4,120,97,2489,15,74,"Japanese"
"163","maverick","ford",15,6,250,72,3158,19.5,75,"American"
"167","ltd","ford",14,8,351,148,4657,13.5,75,"American"
"174","mustang ii","ford",13,8,302,129,3169,12,75,"American"
"175","corolla","toyota",29,4,97,75,2171,16,75,"Japanese"
"176","pinto","ford",23,4,140,83,2639,17,75,"American"
"179","corona","toyota",24,4,134,96,2702,13.5,75,"Japanese"
"182","pinto","ford",18,6,171,97,2984,14.5,75,"American"
"189","civic cvcc","honda",33,4,91,53,1795,17.5,75,"Japanese"
"198","gran torino","ford",14.5,8,351,152,4215,12.8,76,"American"
"201","maverick","ford",24,6,200,81,3012,17.6,76,"American"
"206","civic","honda",33,4,91,53,1795,17.4,76,"Japanese"
"208","granada ghia","ford",18,6,250,78,3574,21,76,"American"
"213","corolla","toyota",28,4,97,75,2155,16.4,76,"Japanese"
"214","pinto","ford",26.5,4,140,72,2565,13.6,76,"American"
"218","mark ii","toyota",19,6,156,108,2930,15.5,76,"Japanese"
"219","280s","mercedes",16.5,6,168,120,3820,16.7,76,"European"
"222","f108","ford",13,8,302,130,3870,15,76,"American"
"224","accord cvcc","honda",31.5,4,98,68,2045,18.5,77,"Japanese"
"236","granada","ford",18.5,6,250,98,3525,19,77,"American"
"240","thunderbird","ford",16,8,351,149,4335,14.5,77,"American"
"243","corolla liftback","toyota",26,4,97,75,2265,18.2,77,"Japanese"
"244","mustang ii 2+2","ford",25.5,4,140,89,2755,15.8,77,"American"
"250","320i","bmw",21.5,4,121,110,2600,12.8,77,"European"
"253","fiesta","ford",36.1,4,98,66,1800,14.4,78,"American"
"256","civic cvcc","honda",36.1,4,91,60,1800,16.4,78,"Japanese"
"262","fairmont (auto)","ford",20.2,6,200,85,2965,15.8,78,"American"
"263","fairmont (man)","ford",25.1,4,140,88,2720,15.4,78,"American"
"272","futura","ford",18.1,8,302,139,3205,11.2,78,"American"
"275","corona","toyota",27.5,4,134,95,2560,14.2,78,"Japanese"
"278","celica gt liftback","toyota",21.1,4,134,95,2515,14.8,78,"Japanese"
"287","accord lx","honda",29.5,4,98,68,2135,16.6,78,"Japanese"
"290","fairmont 4","ford",22.3,4,140,88,2890,17.3,79,"American"
"294","ltd landau","ford",17.6,8,302,129,3725,13.4,79,"American"
"298","country squire (sw)","ford",15.5,8,351,142,4054,14.3,79,"American"
"305","300d","mercedes",25.4,5,183,77,3530,20.1,79,"European"
"318","corolla tercel","toyota",38.1,4,89,60,1968,18.8,80,"Japanese"
"322","fairmont","ford",26.4,4,140,88,2870,18.1,80,"American"
"326","corona liftback","toyota",29.8,4,134,90,2711,15.5,80,"Japanese"
"329","corolla","toyota",32.2,4,108,75,2265,15.2,80,"Japanese"
"336","240d","mercedes",30,4,146,67,3250,21.8,80,"European"
"337","civic 1500 gl","honda",44.6,4,91,67,1850,13.8,80,"Japanese"
"344","mustang cobra","ford",23.6,4,140,NA,2905,14.3,80,"American"
"345","accord","honda",32.4,4,107,72,2290,17,80,"Japanese"
"351","starlet","toyota",39.1,4,79,58,1755,16.9,81,"Japanese"
"353","civic 1300","honda",35.1,4,81,60,1760,16.1,81,"Japanese"
"356","tercel","toyota",37.7,4,89,62,2050,17.3,81,"Japanese"
"359","escort 4w","ford",34.4,4,98,65,2045,16.2,81,"American"
"360","escort 2h","ford",29.9,4,98,65,2380,20.7,81,"American"
"363","prelude","honda",33.7,4,107,75,2210,14.4,81,"Japanese"
"364","corolla","toyota",32.4,4,108,75,2350,16.8,81,"Japanese"
"370","cressida","toyota",25.4,6,168,116,2900,12.6,81,"Japanese"
"374","granada gl","ford",20.2,6,200,88,3060,17.1,81,"American"
"382","fairmont futura","ford",24,4,140,92,2865,16.4,82,"American"
"390","accord","honda",36,4,107,75,2205,14.5,82,"Japanese"
"391","corolla","toyota",34,4,108,70,2245,16.9,82,"Japanese"
"392","civic","honda",38,4,91,67,1965,15,82,"Japanese"
"393","civic (auto)","honda",32,4,91,67,1965,15.7,82,"Japanese"
"398","granada l","ford",22,6,232,112,2835,14.7,82,"American"
"399","celica gt","toyota",32,4,144,96,2665,13.9,82,"Japanese"
"402","mustang gl","ford",27,4,140,86,2790,15.6,82,"American"
"405","ranger","ford",28,4,120,79,2625,18.6,82,"American"