UCSB IGERT Network Data Science Boot Camp (2016) materials
For now, this covers only my portion of the boot camp: setup and 2 hours of introductory instruction.
Optional items are in brackets:
- Python: Anaconda distribution
- IDE: Rodeo (similar to RStudio)
- [R, RStudio]
- Git, GitHub [GitKraken]
- git vs GitHub
- GitHub features
- Exercises:
- set up a GitHub account
- git exercise: clone, add (JSON for class directory), commit, push, pull request
- [GitHub Pages, Markdown, R Markdown, presentation, website]
I have notes, but we will work through these topics interactively:
- Python console, basic calculator
- Jupyter Notebook
- everything is an object: dir()
- modules
- whitespace matters: loops, conditionals
- data types:
- number (divide int vs float)
- strings: real vs unicode
- lists, list comprehension
- dictionaries
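
The basics listed above (indentation-delimited blocks, int vs float division, list comprehensions, dictionaries) can be sketched in a few lines. This is a minimal illustration that assumes Python 3; in Python 2, `/` between two integers floors the result instead. All variable names are made up for the example.

```python
# Indentation defines blocks: the loop body and each branch are indented.
evens = []
for n in range(6):
    if n % 2 == 0:
        evens.append(n)
    else:
        continue

# Division: / returns a float in Python 3; // floors the result.
true_quotient = 7 / 2    # 3.5
floor_quotient = 7 // 2  # 3

# List comprehension: the loop above written as a single expression.
evens_again = [n for n in range(6) if n % 2 == 0]

# Dictionary: maps keys to values; square brackets read and write entries.
degrees = {"a": 2, "b": 1}
degrees["c"] = 3
```
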
- Graph in NetworkX (networkx/graph.py at master · networkx/networkx):
The Graph class uses a dict-of-dict-of-dict data structure. The outer dict (node_dict) holds adjacency information keyed by node. The next dict (adjlist_dict) represents the adjacency information and holds edge data keyed by neighbor. The inner dict (edge_attr_dict) represents the edge data and holds edge attribute values keyed by attribute names.
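
The nesting described above can be mimicked with plain dictionaries. The sketch below is not NetworkX code: `adjacency` and `add_edge` are hypothetical names, and storing one shared attribute dict for both directions of an undirected edge is a choice made for this illustration.

```python
# Outer dict: node -> {neighbor -> {attribute name -> value}}
adjacency = {}


def add_edge(adj, u, v, **attrs):
    """Store an undirected edge; both directions share one attribute dict."""
    edge_data = dict(attrs)
    adj.setdefault(u, {})[v] = edge_data
    adj.setdefault(v, {})[u] = edge_data


add_edge(adjacency, "a", "b", weight=1.5)
add_edge(adjacency, "b", "c", weight=2.0, color="red")

# Three levels of lookup: node, then neighbor, then attribute name.
neighbors_of_b = sorted(adjacency["b"])
weight_ab = adjacency["a"]["b"]["weight"]
```

With this layout, listing a node's neighbors is a lookup in the outer dict, and reading an edge attribute is two further lookups, which mirrors the `G[u][v][attr]` style of access on a NetworkX graph.
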
- dates; importing modules
- Graphs and networks (GitHub: simoninireland/cncp: complex networks, complex processes)
- Tabular data (esp. CSV)
- csv: CSV File Reading and Writing (examples)
- TODO: represent each row with its own key, value pairs vs. each column as a key with a list of values

    import csv

    d = {}
    with open('filename.csv', newline='') as f:
        rdr = csv.reader(f)
        header = next(rdr)  # first row holds the column names
        for row in rdr:
            k, v = row  # assumes exactly two columns per row
            d[k] = v
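
The two layouts named in the TODO can be compared side by side. This is a minimal sketch using `csv.DictReader` on an in-memory sample; the `node` and `degree` columns and their values are invented for illustration and stand in for a real file.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file on disk.
sample = io.StringIO("node,degree\na,2\nb,1\nc,3\n")

# Row-wise: one dict per row, keyed by column name.
rows = list(csv.DictReader(sample))

# Column-wise: one key per column, each holding a list of that column's values.
columns = {name: [row[name] for row in rows] for name in rows[0]}
```

Note that the `csv` module returns every value as a string, so numeric columns need an explicit conversion such as `int(row["degree"])`.
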
- [pandas](http://pandas.pydata.org/pandas-docs/stable/) is well suited for "Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet"
- [Package overview — pandas 0.18.1 documentation](http://pandas.pydata.org/pandas-docs/stable/overview.html)
- [10 Minutes to pandas — pandas 0.18.1 documentation](http://pandas.pydata.org/pandas-docs/stable/10min.html)
- read CSV (vs. dict representation)
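```python
import pandas as pd

# Series.from_csv was removed in pandas 1.0; read_csv with squeeze("columns") yields the same Series.
# Assumes filename is a headerless two-column CSV and cols names those two columns.
dic = pd.read_csv(filename, names=cols, header=None, index_col=0).squeeze("columns").to_dict()
```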
Both projects rely on creating simpler networks from a dense raster for various applications:
- assessing spatial connectivity of habitats (Python)
- extract a TIN (triangulated irregular network) with cumulative distance from patches, to determine how far away each location is
- Keitt & Urban, Urban et al.
- ship routing applications to avoid whale strikes (R)
- increase density closer to shore
- how to make sparse networks from:
- dens