%-----------------------------------------------------------------------------------%
\subsection*{Abstract}
\textbf{An Introduction to Pandas}

This tutorial will get you started with pandas, a data analysis library for Python that is well suited to data preparation and joining, and ultimately to generating well-formed tabular data that is easy to use in a variety of visualization tools or (as we will see here) in machine learning applications.
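As a minimal sketch of the kind of preparation described above, assuming pandas is installed, the following script joins two small tables on a shared key, fills in missing values, and one-hot encodes a categorical column so that every column of the result is numeric. The \texttt{customers} and \texttt{orders} tables, their columns, and their values are hypothetical data made up for illustration.

\begin{verbatim}
import pandas as pd

# Hypothetical example data: a customer table and an order table
# that share the key column "customer_id".
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34, None, 52, 23],
    "region": ["north", "south", "south", "east"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.5, 12.0, 8.25, 40.0, 15.75],
})

# Summarize the orders per customer (count and total amount).
totals = (
    orders.groupby("customer_id")
          .agg(n_orders=("amount", "count"), total=("amount", "sum"))
          .reset_index()
)

# Left join keeps every customer, including those with no orders.
table = customers.merge(totals, on="customer_id", how="left")

# Clean: no orders means zero orders and zero spend;
# a missing age is replaced by the median of the known ages.
table[["n_orders", "total"]] = table[["n_orders", "total"]].fillna(0)
table["age"] = table["age"].fillna(table["age"].median())

# One-hot encode the categorical column so every column is numeric.
features = pd.get_dummies(table, columns=["region"], dtype=int)
print(features)
\end{verbatim}

The left join (\texttt{how="left"}) keeps customers who have no orders; their order counts and totals start out as missing values, which is why they are filled with zero before the table is handed to a visualization or machine learning tool.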
%------------%
\textit{Python for Data Analysis} is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. It is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems; it is not an exposition on analytical methods using Python as the implementation language.
Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.
\begin{itemize}
\item Use the IPython interactive shell as your primary development environment
\item Learn basic and advanced NumPy (Numerical Python) features
\item Get started with data analysis tools in the pandas library
\item Use high-performance tools to load, clean, transform, merge, and reshape data
\item Create scatter plots and static or interactive visualizations with matplotlib
\item Apply the pandas groupby facility to slice, dice, and summarize datasets
\item Measure data by points in time, whether it's specific instances, fixed periods, or intervals
\item Learn how to solve problems in web analytics, social sciences, finance, and economics through detailed examples
\end{itemize}
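Two of the features listed above, the groupby facility and time-series handling, can be illustrated with a short sketch. It assumes pandas is installed and uses a small, hypothetical table of daily sales for two stores: \texttt{groupby} splits the rows by store and summarizes each group, and \texttt{resample("W")} regroups the rows into calendar weeks ending on Sunday.

\begin{verbatim}
import pandas as pd

# Hypothetical daily sales records for two stores (illustrative data only).
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08",
                            "2024-01-01", "2024-01-09", "2024-01-10"]),
    "store": ["A", "A", "A", "B", "B", "B"],
    "units": [5, 3, 7, 2, 4, 6],
})

# groupby: split the rows by store, aggregate each group, combine the results.
per_store = sales.groupby("store")["units"].agg(["sum", "mean"])

# Time series: index by date, then resample into weekly totals
# ("W" means weeks ending on Sunday, labelled by that Sunday).
weekly = sales.set_index("date").sort_index()["units"].resample("W").sum()

print(per_store)
print(weekly)
\end{verbatim}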
%-----------------------------------------------------------------------------------%
\subsection*{Introduction}
Python has long been one of the premier general-purpose scripting languages, as well as a major web development language.
Its use for numerical computing, data analysis, and scientific programming grew out of the NumPy and SciPy packages,
which, together with the visualization package Matplotlib, formed the basis of an open-source alternative to MATLAB.
NumPy provides n-dimensional array objects, tools for integrating C, C++, and Fortran code, linear algebra, and other functionality.
SciPy builds on NumPy and adds optimization, linear algebra, statistics, and basic image analysis
capabilities.
Matplotlib provides sophisticated 2-D and basic 3-D graphics capabilities with MATLAB-like syntax.
More recent development has produced a fairly complete stack for data manipulation and analysis, which includes SymPy for symbolic mathematics, pandas for data structures and analysis, and IPython as an enhanced console and HTML notebook that also facilitates parallel computation.
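The division of labor between NumPy and SciPy can be seen in a short sketch, assuming both are installed: NumPy supplies the array type and basic linear algebra (here \texttt{numpy.linalg.solve}), while SciPy adds higher-level routines such as \texttt{scipy.optimize.minimize}. The matrix, right-hand side, and objective function are arbitrary values chosen for illustration; with these values the linear solve gives $x = (2, 3)$ and the minimizer is approximately $(1, -2)$.

\begin{verbatim}
import numpy as np
from scipy import optimize

# NumPy: a 2-D array and a dense linear solve of A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# SciPy: numerically minimize a smooth function of two variables.
def f(p):
    # A bowl-shaped function whose minimum is at (1, -2).
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

result = optimize.minimize(f, x0=np.zeros(2))  # start from the origin

print(x)         # solution of the linear system
print(result.x)  # location of the minimum found by the optimizer
\end{verbatim}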
%-----------------------------------------------------------------------------------%
\subsection*{Environment Setup}
First, we'll need a Python environment suitable for scientific and statistical computing. Assuming you already have Python installed (no? Well then get it! Use a current Python 3 release; Python 2.7 reached end of life in 2020 and is not supported by current versions of these libraries), we'll need three packages.
Install them in the order they appear here:
\begin{itemize}
\item numpy (pronounced num-pie): fast multidimensional numerical arrays; the foundation for the two packages below.
\item scipy (pronounced sigh-pie): a scientific, mathematical, and engineering package.
\item scikit-learn: an easy-to-use machine learning library.
\end{itemize}
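Since the goal is to feed well-formed tabular data into machine learning tools, here is a minimal sketch of scikit-learn consuming a pandas \texttt{DataFrame} directly. It assumes pandas is installed alongside the three packages above; the data are synthetic, generated from $y = 2x_1 - 3x_2 + 1$ without noise, so an ordinary least-squares fit should recover those coefficients.

\begin{verbatim}
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic tabular data: y is an exact linear function of x1 and x2.
df = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
})
df["y"] = 2.0 * df["x1"] - 3.0 * df["x2"] + 1.0

# scikit-learn accepts pandas objects: X is a table of features, y a column.
X = df[["x1", "x2"]]
y = df["y"]
model = LinearRegression().fit(X, y)

print(model.coef_, model.intercept_)
# Predict for a new row with the same column names as the training data.
print(model.predict(pd.DataFrame({"x1": [10.0], "x2": [1.0]})))
\end{verbatim}

scikit-learn estimators share the same \texttt{fit}/\texttt{predict} interface, so trying a different model typically means changing only the line that constructs it.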
Note: 64-bit builds of these libraries are available.
Visit each project's home page for installation instructions for your operating system or, if you're running Linux, install with pip (Python's package installer) or your distribution's package manager. On Windows, use prebuilt binaries for scipy and scikit-learn rather than building them from source; current versions of pip download these prebuilt packages automatically.
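Once the packages are installed, a short script can confirm that each one imports correctly. This is a sketch assuming Python 3; note that scikit-learn is imported under the name \texttt{sklearn}, and that a package without a \texttt{\_\_version\_\_} attribute is reported as ``unknown''. On most systems the packages themselves can be installed from the command line with \texttt{python -m pip install numpy scipy scikit-learn}.

\begin{verbatim}
import importlib

# Import names of the packages listed above (scikit-learn imports as "sklearn").
PACKAGES = ["numpy", "scipy", "sklearn"]

def installed_versions(names):
    """Map each import name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:10s} {version if version else 'NOT INSTALLED'}")
\end{verbatim}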
I'd also highly recommend setting up a decent Python development environment. You can certainly execute Python scripts from the command line, but it's a heck of a lot easier to work in a proper environment with debugging support. I use PyDev, but even something like IPython is better than nothing.
%-----------------------------------------------------------------------------------%
\subsection*{References}
\verb|http://people.duke.edu/~ccc14/pcfb/analysis.html|