This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

Add short tutorial about extracting info from Excel files #210

Open
gvwilson opened this issue Dec 20, 2013 · 11 comments

Comments

@gvwilson
Contributor

Lots of people have data in Excel; http://www.python-excel.org/ can read spreadsheets, and we can also show people the csv module.

@ahmadia
Contributor

ahmadia commented Dec 20, 2013

This would be a fun one. There's a lot of rich interaction with spreadsheets and other data sources using everything from NumPy to Pandas to the Excel modules you mention to even core Python. Sometimes I think our directory structure limits how we think about lessons, because this seems like a "data" "beginner" "Python" thing, but I wouldn't know where to put it so that it was obvious to somebody looking for a lesson on this sort of thing.

@wrightaprilm

I do have a lesson on pandas, touching on Excel:
https://github.com/wrightaprilm/SSC_spr_2014/blob/master/materials/python_5.md
I haven't done a whole lot with the IPython notebook, though, so it might be nice to partner with someone to adapt the exercise. As written, the talking part takes about 35 minutes and the exercise about 15 minutes.

@juliangarcia
Contributor

I did my instructor training screencast on how to convert Excel files to CSV files using pandas. Maybe that helps?

The video is here: https://vimeo.com/74873085

@gvwilson
Contributor Author

We should do this for each language we support (Python and R right now, MATLAB to come...)

@genevievekathleensmith

Jonathan Frederic and I will try our hand at this in both Python and R!

@genevievekathleensmith

I'd like to add myself and Jonathan as assignees to this issue.

@jdfreder

I heard a ringing in my ear 😛

@jdfreder

jdfreder commented Jul 4, 2014

Following up on an email discussion with @genevievekathleensmith, I thought it would be a good idea to solidify an outline of the course before creating it. @genevievekathleensmith made a nice concept map:

[Image: concept map, 2014-06-27 16 39 49]

Taking from the concept map to organize a chronological outline:

Language specific

  • Motivation for using language X for data analysis instead of Excel.
  • Briefly explain why importing CSV files is preferred to reading XLS files directly.
  • Mention packages that can read XLS files directly.

Applies to both R and Python tracks

  • Demonstrate how to export a spreadsheet to a CSV file.
  • Explain how the spreadsheet maps to the CSV file, including empty cells, multiple worksheets, etc.
  • Mention the difference between reading the entire CSV file into memory and reading it line by line as needed.
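
To make the empty-cell point concrete, here is a minimal sketch using Python's standard `csv` module. The `EXPORTED` text and the `read_rows` helper are made up for illustration; they assume a single worksheet exported with a header row, where an empty cell shows up as nothing between two commas.

```python
import csv
import io

# Hypothetical text of a CSV file exported from one worksheet.
# The row for site B has an empty depth_m cell.
EXPORTED = "site,depth_m,temp_c\nA,10,4.5\nB,,5.1\n"


def read_rows(text):
    """Parse CSV text into a list of dicts, turning empty cells into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]


rows = read_rows(EXPORTED)
print(rows[1])  # the empty cell is reported as None
```

Every value still arrives as a string, so converting numbers (and deciding what an empty cell should mean) is left to the parsing step.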

Language specific

  • Method 1: Read the entire file contents into memory.
  • Parsing.
  • Method 2: Read contents as needed.
  • Parsing (possibly by referring back to the parsing already covered).
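
For the Python track, the two methods might look roughly like the sketch below. It uses only the standard `csv` module; the `SAMPLE` data and the function names `load_all` and `iter_temps` are assumptions invented for this example, not part of any existing lesson.

```python
import csv
import io

# Hypothetical CSV content standing in for an exported spreadsheet.
SAMPLE = "site,temp_c\nA,4.5\nB,5.1\nC,6.0\n"


def load_all(handle):
    """Method 1: read every row into a list that is held in memory."""
    return list(csv.DictReader(handle))


def iter_temps(handle):
    """Method 2: parse and yield one value at a time as rows are read."""
    for row in csv.DictReader(handle):
        yield float(row["temp_c"])


# Method 1: the whole table is available at once.
all_rows = load_all(io.StringIO(SAMPLE))

# Method 2: keep only a running total and count, never the full table.
total = 0.0
count = 0
for temp in iter_temps(io.StringIO(SAMPLE)):
    total += temp
    count += 1
mean_temp = total / count
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="")`. The parsing step (`float(...)`) is the same in both methods, which is why the outline can cover it once and refer back to it.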

Bonus material, language specific

  • Using parsed data with popular data science packages (e.g. pandas or NumPy).

What do you guys think of this outline? Am I far from the general idea of this course, or does something like this make sense? Am I overcomplicating things by describing both how to load the entire contents into memory and how to load only parts as needed?

@gvwilson
Contributor Author

gvwilson commented Jul 4, 2014

Yay for the concept map!

@chendaniely
Contributor

Is the Excel data already in a 'dataset' format, meaning there aren't random bits of 'side calculations' somewhere on a sheet?

I use the XLConnect package in R to read in Excel data. Not sure if this will address the issue (at least for the R material).

@genevievekathleensmith

@jdfreder I took a stab at an R-based lesson. Let me know what you think (it is short on details, but it is meant to be covered in a short amount of time)! Thanks @chendaniely for the XLConnect suggestion!
https://github.com/genevievekathleensmith/instructor_training/blob/master/excel_data_into_R.Rmd


7 participants