title: Text Mining Bootcamp
place: 12.0.26 at KU Sønder Campus, Denmark
time: August 14-18, 2017, 9 AM to 3 PM.
instructors: Peter Leonard (Yale University Library) & Kristoffer L. Nielbo (Interacting Minds Centre)
contact: [email protected]
- Install the Anaconda distribution of Python for your OS
- Read chapters 1-6 of Automate the Boring Stuff with Python
- Sweigart, A. (2015). Automate the Boring Stuff with Python: Practical Programming for Total Beginners. San Francisco: No Starch Press.
DAY 1: Programming with Python
Time | Content | Instructor |
---|---|---|
09:00-09:30 | Welcome & Setup | KLN |
09:30-10:30 | Text Analytics | KLN |
10:30-11:00 | Analyzing Tabular Data | KLN |
11:00-11:30 | Repeating Actions with Loops | KLN |
11:30-12:00 | Storing Multiple Values in Lists | KLN |
12:00-13:00 | Lunch | * |
13:00-13:30 | Analyzing Data from Multiple Files | KLN |
13:30-14:00 | Making Choices | KLN |
14:00-14:30 | Creating Functions | KLN |
14:30-15:00 | Finish | KLN |
DAY 2: From Print to Probability
Time | Content | Instructor |
---|---|---|
09:00-09:30 | Welcome | KLN |
09:30-10:00 | Reading Unstructured Data | KLN |
10:00-10:30 | Cleaning & Segmentation | KLN |
10:30-11:00 | Free Play | KLN |
11:00-11:30 | Language Normalization | KLN |
11:30-12:00 | Term Frequencies | KLN |
12:00-13:00 | Lunch | * |
13:00-13:30 | Dispersion and Distributions | KLN |
13:30-14:00 | Vector Space Representations | KLN |
14:00-14:30 | Project hour | KLN |
14:30-15:00 | Project hour | KLN |
DAY 3: Time, Density, and Information
Time | Content | Instructor |
---|---|---|
09:00-09:30 | Welcome | KLN |
09:30-10:00 | Beyond Words | KLN |
10:00-10:30 | Lexical Density | KLN |
10:30-11:00 | Free Play | KLN |
11:00-11:30 | Readability | KLN |
11:30-12:00 | Information | KLN |
12:00-13:00 | Lunch | * |
13:00-13:30 | Sentiment vectors | KLN |
13:30-14:00 | Sentiment vectors | KLN |
14:00-14:30 | Project hour | PL & KLN |
14:30-15:00 | Project hour | PL & KLN |
DAY 4: Latent Variables and (Multiple) Relations
Time | Content | Instructor |
---|---|---|
09:00-09:30 | Welcome | PL |
09:30-10:00 | Network Analysis: Introduction | PL |
10:00-10:30 | Network Analysis: Textual/Literary Examples | PL |
10:30-11:00 | Free Play: Brainstorming Network Projects | PL |
11:00-11:30 | Network Analysis: Building a Dataset | PL |
11:30-12:00 | Network Analysis: Tools - Gephi | PL |
12:00-13:00 | Lunch | * |
13:00-13:30 | Topic Modeling | PL |
13:30-14:00 | Topic Modeling: Hands-On | PL |
14:00-14:30 | Project hour | PL |
14:30-15:00 | Project hour | PL |
DAY 5: Classification and Associations
topics: classification, document similarity, and word embedding
Time | Content | Instructor |
---|---|---|
09:00-09:30 | Statistical learning | KLN |
09:30-10:00 | Classification: Introduction | KLN |
10:00-10:30 | Representation | ins |
10:30-11:00 | Validation | KLN |
11:00-11:30 | Optimization | KLN |
11:30-12:00 | Free Play | KLN |
12:00-13:00 | Lunch | * |
13:00-13:30 | Topic Modeling: Review | PL |
13:30-14:00 | Word Embedding: Demonstrations | PL |
14:00-14:30 | Word Embedding: Hands-On | PL |
14:30-15:00 | Finish | * |