<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Python Project Night Challenges</title><link>https://chicagopython.github.io/</link><description>The Chicago Python User Group's coding workshops for Python Project Night.</description><atom:link href="https://chicagopython.github.io/rss.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2020 <a href="mailto:[email protected]">Chicago Python User Group</a>
<a rel="license" href="https://www.gnu.org/licenses/gpl-3.0.en.html">
<img alt="Gnu Public License version 3.0"
style="border-width:0;"
src="https://www.gnu.org/graphics/gplv3-with-text-84x42.png"></a></copyright><lastBuildDate>Thu, 16 Jan 2020 02:57:03 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Fuzzy String Matching</title><link>https://chicagopython.github.io/posts/fuzzy-string-matching/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h3>Overview</h3>
<p>Data collection has expanded rapidly and continues to grow. However, data often isn’t submitted or collected with the required cleanliness or detail. At ChiPy we face the issue of trying to match Project Night attendees’ Meetup names with their legal names, which are needed for venue security. For a human it’s often easy to tell when names with slight variations match (Mike vs. Michael, missing initials, etc.), but matching hundreds of names one at a time is time-consuming. Your job is to match the Meetup and given names as accurately as possible using the fuzzy matching technique(s) of your choosing.</p>
<p>Background reading:<br>
- <a href="http://www.basistech.com/whitepapers/the-name-matching-you-need-EN.pdf">The Name Matching You Need: A Comparison of Name Matching Technologies</a><br>
- <a href="https://medium.com/bcggamma/an-ensemble-approach-to-large-scale-fuzzy-name-matching-b3e3fa124e3c">An Ensemble Approach to Large-Scale Fuzzy Name Matching</a><br>
- <a href="https://towardsdatascience.com/fuzzy-matching-at-scale-84f2bfd0c536">Fuzzy Matching at Scale</a> </p>
<p>Some Python libraries you might want to use:<br>
- <a href="https://pypi.org/project/fuzzywuzzy/">fuzzywuzzy</a><br>
- <a href="https://pypi.org/project/textdistance/">textdistance</a><br>
- <a href="https://github.com/Bergvca/string_grouper">string_grouper</a> </p>
<h3>Setup</h3>
<p>There is no existing repo for this project, and no requirements to install. All you need to start is the data, which can be downloaded <a href="https://drive.google.com/file/d/1WtW89K43Rwxq5ZM8Dyryv5EQgkkauOCF/view?usp=sharing">here</a>.</p>
<p>Feel free to work how you see fit. That said, we strongly recommend setting up a virtual environment.</p>
<p>If you are using Linux or OS X, run the following to create a new virtualenv:</p>
<pre class="code literal-block"><span></span><span class="n">python3</span> <span class="o">-</span><span class="n">m</span> <span class="n">venv</span> <span class="n">venv</span>
<span class="k">source</span> <span class="n">venv</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">activate</span>
<span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">r</span> <span class="n">requirements</span><span class="p">.</span><span class="n">txt</span>
</pre>
<p>On Windows, instead run the following:</p>
<pre class="code literal-block"><span></span><span class="n">python3</span> <span class="o">-</span><span class="n">m</span> <span class="n">venv</span> <span class="n">venv</span>
<span class="n">venv</span><span class="err">\</span><span class="n">Scripts</span><span class="err">\</span><span class="n">activate</span>
<span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">r</span> <span class="n">requirements</span><span class="p">.</span><span class="n">txt</span>
</pre>
<h3>So what should we do?</h3>
<p>The dataset has three columns:
- meetup_id: The unique Meetup identifier for each user.<br>
- meetup_name: The publicly available display name of the Meetup user.<br>
- given_names: The "actual" name of the attendee, as provided in a form response via Meetup. </p>
<p>Each row in the dataset pairs a Meetup name with its true matching given name. In some cases the Meetup and given names match exactly; in others they don't. You won't need the meetup_id while actually attempting to match Meetup and given names, but you can use it to validate your approach.</p>
<p>Some things you might want to consider along the way:</p>
<ol>
<li>Are all the names usable? Could a human uniquely identify matches?</li>
<li>What patterns can be identified in the data we're working with?</li>
<li>What is our true goal with matching? In other words, when evaluating our process's success, how do we balance confirming that someone has preregistered against turning too many people away at the door? To that end, what's the right evaluation metric to choose?</li>
<li>How might our approach differ if instead of a couple hundred names we have 10,000, a million, or even a billion names to match?</li>
</ol>
<h3>Hints (if you're stuck)</h3>
<p>One easy way to load the data is with pandas:</p>
<pre class="code literal-block"><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>
<span class="n">read_kwargs</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"header"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s2">"index_col"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s2">"skip_blank_lines"</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span>
<span class="s2">"names"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"meetup_names"</span><span class="p">,</span> <span class="s2">"given_names"</span><span class="p">]</span>
<span class="p">}</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">"fuzzy_names.csv"</span><span class="p">,</span> <span class="o">**</span><span class="n">read_kwargs</span><span class="p">)</span><span class="o">.</span><span class="n">dropna</span><span class="p">()</span>
<span class="n">given_names</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s2">"given_names"</span><span class="p">]</span>
<span class="n">meetup_names</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s2">"meetup_names"</span><span class="p">]</span>
</pre>
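<p>Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, is one of the more basic and popular measures for fuzzy string matching. It has a few useful Python implementations, but fuzzywuzzy is probably the most popular.</p>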
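<p>To make the idea concrete, here is a minimal pure-Python sketch of Levenshtein distance using the standard dynamic-programming approach. It is for illustration only and is not how fuzzywuzzy is implemented; the function names <code>levenshtein</code> and <code>similarity</code> are our own, and the 0-to-1 scaling is just one reasonable choice.</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to a 0..1 score, where 1.0 means identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # classic textbook pair, distance 3
print(similarity("Mike", "Michael"))
```

<p>Normalizing case and whitespace before comparing is an assumption about what counts as "the same name"; you may want stricter or looser cleaning for the real data.</p>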
<p>Sklearn has modules dedicated to evaluation metrics. One very simple metric to evaluate how your matching is going is accuracy. Try starting with <code>from sklearn.metrics import accuracy_score</code>.</p>
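<p>As a rough sketch of how matching and evaluation fit together, the example below picks the most similar given name for each Meetup name with the standard library's <code>difflib.SequenceMatcher</code> and then computes accuracy by hand. The names in the lists are made up, and <code>best_match</code> and <code>accuracy</code> are hypothetical helpers; for plain label comparison like this, sklearn's <code>accuracy_score</code> reports the same fraction of correct predictions.</p>

```python
from difflib import SequenceMatcher


def best_match(name, candidates):
    """Return the index of the candidate most similar to name."""
    scores = [
        SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        for candidate in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that equal the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up data: row i of meetup_names truly belongs to row i of given_names.
meetup_names = ["Mike S.", "jdoe", "Ana Lopez"]
given_names = ["Michael Smith", "Jane Doe", "Ana Lopez"]

predicted = [best_match(name, given_names) for name in meetup_names]
truth = list(range(len(meetup_names)))
print(accuracy(truth, predicted))
```

<p>Accuracy treats every mistake the same. If turning away a registered attendee is worse than admitting an unregistered one, a metric such as precision or recall may fit the goal better.</p>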
<p>Happy Developing!</p></div></description><guid>https://chicagopython.github.io/posts/fuzzy-string-matching/</guid><pubDate>Thu, 16 Jan 2020 11:00:00 GMT</pubDate></item><item><title>Build an API with Django REST Framework</title><link>https://chicagopython.github.io/posts/django-rest-framework/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h2>Build an API with Django REST Framework</h2>
<h3>Overview</h3>
<p>For this project, we will be creating a functioning REST API. REST APIs can distribute useful information via GET requests, and they let clients create and modify database records in a user-friendly fashion.</p>
<p>This project will revolve around using Django and Django REST Framework to build an API for the dataset of your choice. Django is a full web framework capable of handling both the back-end and front-end portions of a web app, and the Django team has created great resources to make setting up a Django app quick and easy.</p>
<p>While the project is structured around Django, feel free to use Flask instead if you're more comfortable with it.</p>
<h3>Environment Setup</h3>
<p>To avoid bloating your primary working environment, we strongly recommend creating a virtual environment. The requirements.txt file includes the required packages, and the included versions have been tested for our needs - use different versions at your own risk.</p>
<p>We also strongly recommend using <a href="https://atom.io/">Atom</a> or <a href="https://www.sublimetext.com/3">Sublime Text</a> as your text editor. This project has NOT been tested using Jupyter Notebook, PyCharm,
Spyder, or any other IDE/text editor/programming environment.</p>
<ol>
<li>
<p>For this challenge you will need Python 3.7, pipenv, and git installed. If you're not familiar with pipenv, it's a packaging tool for Python that effectively replaces the pip+virtualenv+requirements.txt workflow. If you already have pip installed, the easiest way to install pipenv is with <code>pip install --user pipenv</code>; however, a better way for Mac/Linux Homebrew users is to instead run <code>brew install pipenv</code>. More options can be found <a href="https://pipenv-fork.readthedocs.io/en/latest/install.html#installing-pipenv">here</a>.</p>
</li>
<li>
<p>The project is in the ChiPy project night repo. If you do not have the repository already, run </p>
<p><code>git clone https://github.com/chicagopython/CodingWorkshops.git</code></p>
</li>
<li>
<p>Navigate to the folder for this challenge:</p>
<p><code>cd CodingWorkshops/problems/webdev/django_rest_framework_api</code></p>
</li>
<li>
<p>Run <code>pipenv install</code>, which will install all of the libraries we have recommended for this exercise.</p>
</li>
<li>After you've installed all of the libraries, run <code>pipenv shell</code>, which will turn on a virtual environment running Python 3.7.</li>
<li>To exit the pipenv shell when you are done, simply type <code>exit</code>.</li>
</ol>
<h3>Instructions</h3>
<h4>Find a Dataset</h4>
<ul>
<li>Before advancing, find a dataset that you wish to use for a REST API. It helps if the data is something you are interested in, but don't spend too much time on this part. <a href="https://www.kaggle.com/tags/databases">Kaggle</a> has a great selection of publicly available datasets. If you are looking for something specific, Google has a stellar <a href="https://toolbox.google.com/datasetsearch">dataset search</a> feature.</li>
</ul>
<h4>Create Your First App</h4>
<ul>
<li>Create a Django project in a local directory of your choosing. Feel free to use the <a href="https://docs.djangoproject.com/en/2.2/intro/tutorial01/">Django tutorial</a> to accomplish this, but please don't call your app the standard Polls App. Create a unique application inside of your Django directory to handle your database and models. Make sure the application is configured in your settings.py file!</li>
</ul>
<h4>Create a Django Model</h4>
<ul>
<li>Create a Django model tailored to your data. Feel free to take liberties, such as splitting the data into several related models. The model field types should match the intended fields of your data. Make sure to create and apply migrations for your model when you are finished!</li>
</ul>
<h4>Configure the REST framework</h4>
<ul>
<li>Make sure you appropriately configure Django REST Framework in your settings.py file. If you forget this step, Django will fail to recognize the add-on.</li>
</ul>
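<p>As a sketch of what that configuration usually looks like, the excerpt below adds Django REST Framework to <code>INSTALLED_APPS</code>. The <code>django.contrib</code> entries are the ones a fresh Django project typically generates, and <code>catalog</code> is a hypothetical name standing in for your own app.</p>

```python
# settings.py (excerpt)
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "rest_framework",  # Django REST Framework registers under this name
    "catalog",  # hypothetical: replace with the name of your own app
]
```

<p>Note that the package is installed as <code>djangorestframework</code> but listed here as <code>rest_framework</code>.</p>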
<h4>Serialization</h4>
<ul>
<li>Before creating a URL or view, define a serializer for your data. This allows Django REST Framework to render your data as JSON. Make sure you designate the table (model) and fields (features) you wish to include in your REST API.</li>
</ul>
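<p>If serialization is a new idea, the following standard-library sketch shows the concept in isolation: pick the fields you want to expose and turn each record into JSON. It deliberately does not use Django REST Framework; <code>Book</code>, <code>EXPOSED_FIELDS</code>, and <code>serialize</code> are hypothetical names used only for illustration.</p>

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Book:  # hypothetical stand-in for a Django model
    id: int
    title: str
    author: str
    internal_notes: str  # something we do not want in the API


# Only these fields are exposed to API clients.
EXPOSED_FIELDS = ("id", "title", "author")


def serialize(book):
    """Convert a Book into a plain dict containing only the exposed fields."""
    record = asdict(book)
    return {field: record[field] for field in EXPOSED_FIELDS}


books = [
    Book(1, "Dune", "Frank Herbert", "donated copy"),
    Book(2, "Emma", "Jane Austen", "needs rebinding"),
]
payload = json.dumps([serialize(book) for book in books])
print(payload)
```

<p>In Django REST Framework, a <code>ModelSerializer</code> expresses the same choice by naming the model and the list of fields in its <code>Meta</code> class, and it handles the conversion for you.</p>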
<h4>Create a View</h4>
<ul>
<li>Use the standard Django REST framework to create your Django view. Django REST framework allows you to interact with your API in both JSON and a preset interactive template. If you feel like going the extra mile, make your database queryable to gather the information you need.</li>
</ul>
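<p>For the "extra mile" of making the data queryable, the idea is to read filter values from the URL's query string and keep only matching records. The sketch below shows that logic with the standard library alone; the records, the <code>/books/</code> URL, and the <code>filter_records</code> helper are all hypothetical.</p>

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical records standing in for rows returned by a Django model query.
RECORDS = [
    {"id": 1, "title": "Dune", "author": "Frank Herbert"},
    {"id": 2, "title": "Emma", "author": "Jane Austen"},
    {"id": 3, "title": "Persuasion", "author": "Jane Austen"},
]

ALLOWED_FILTERS = ("title", "author")


def filter_records(url, records):
    """Return the records that match every allowed query parameter in url."""
    params = parse_qs(urlsplit(url).query)
    # Ignore unknown parameters so callers cannot filter on unexposed fields.
    filters = {
        key: values[0] for key, values in params.items() if key in ALLOWED_FILTERS
    }
    return [
        record
        for record in records
        if all(str(record[field]) == value for field, value in filters.items())
    ]


matches = filter_records("/books/?author=Jane+Austen", RECORDS)
print([record["title"] for record in matches])
```

<p>In a Django REST Framework view, the same values are available through <code>self.request.query_params</code>, and the filtering would normally be applied to the queryset rather than to a list in memory.</p>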
<h4>Designating a URL</h4>
<ul>
<li>Finally, designate the URLs where your views can be found. Make sure to create a URL scheme that makes sense for how the intended user will interact with your API.</li>
</ul>
<h4>Running your Server</h4>
<ul>
<li>At this point it is time to test your API. This can be accomplished with the <code>python manage.py runserver</code> command. By default, Django serves the site at localhost:8000/. From there, follow the naming scheme you created in your URLs. Feel free to play with your API by using the filterable features you created!</li>
</ul>
<h3>Useful Weblinks</h3>
<ul>
<li>
<p>Django Startup and Features</p>
<p>https://docs.djangoproject.com/en/2.2/intro/tutorial01/</p>
<p>https://docs.djangoproject.com/en/2.2/ref/applications/</p>
</li>
<li>
<p>Django Models</p>
<p>https://docs.djangoproject.com/en/2.2/ref/models/fields/</p>
<p>https://docs.djangoproject.com/en/2.2/topics/db/models/#automatic-primary-key-fields</p>
</li>
<li>
<p>Django REST Framework</p>
<p>https://www.django-rest-framework.org/#installation</p>
</li>
<li>
<p>Serialization</p>
<p>https://www.django-rest-framework.org/api-guide/serializers/#modelserializer</p>
<p>https://www.django-rest-framework.org/api-guide/serializers/#specifying-read-only-fields</p>
</li>
<li>
<p>Views and URLS</p>
<p>https://www.django-rest-framework.org/tutorial/quickstart/#views</p>
<p>https://www.django-rest-framework.org/tutorial/quickstart/#urls</p>
</li>
</ul></div></description><guid>https://chicagopython.github.io/posts/django-rest-framework/</guid><pubDate>Thu, 21 Nov 2019 11:00:00 GMT</pubDate></item><item><title>Predict Home Credit Defaults</title><link>https://chicagopython.github.io/posts/home-credit-default-risk/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h3>Overview</h3>
<p>Many people struggle to get loans due to insufficient or non-existent credit histories. And, unfortunately, this population is often taken advantage of by untrustworthy lenders.</p>
<p>Tonight's project examines a dataset from a real bank that focuses on lending to people with little or no credit history. Their goal is to ensure that clients capable of repayment are not rejected. You will explore the dataset and predict whether someone will default, based on their application for a loan.</p>
<h3>Your Task</h3>
<p>Your goal is to train a binary classification model on the data in <code>default_risk_train_data.csv</code> that optimizes the area under the ROC curve between the predicted probability and the observed target. For each <code>SK_ID_CURR</code> in <code>default_risk_test_data.csv</code>, you must predict a probability for the TARGET variable. Your deliverable to the bank will be a CSV with predictions for each SK_ID_CURR in the test set.</p>
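<p>If the metric is unfamiliar: the area under the ROC curve equals the probability that a randomly chosen defaulter receives a higher predicted probability than a randomly chosen non-defaulter, with ties counting as half. The sketch below computes it directly from that definition. The helper name <code>roc_auc</code> is our own, and the pairwise loop is only practical for small samples; for the real data, sklearn's <code>roc_auc_score</code> is the better tool.</p>

```python
def roc_auc(targets, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(targets, scores) if t == 1]
    negatives = [s for t, s in zip(targets, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for positive_score in positives:
        for negative_score in negatives:
            if positive_score > negative_score:
                wins += 1.0  # the positive example is ranked correctly
            elif positive_score == negative_score:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

<p>Because only the ranking of the scores matters, a model can score well on this metric even if its probabilities are poorly calibrated.</p>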
<h3>Setup</h3>
<ol>
<li>
<p>For this challenge you will need Python 3.7, pipenv, and git installed. If you're not familiar with pipenv, it's a packaging tool for Python that effectively replaces the pip+virtualenv+requirements.txt workflow. If you already have pip installed, the easiest way to install pipenv is with <code>pip install --user pipenv</code>; however, a better way for Mac/Linux Homebrew users is to instead run <code>brew install pipenv</code>. More options can be found <a href="https://pipenv-fork.readthedocs.io/en/latest/install.html#installing-pipenv">here</a>.</p>
</li>
<li>
<p>The project is in the ChiPy project night repo. If you do not have the repository already, run </p>
<p><code>git clone https://github.com/chicagopython/CodingWorkshops.git</code></p>
</li>
<li>
<p>Navigate to the folder for this challenge:</p>
<p><code>cd CodingWorkshops/problems/data_science/home_credit_default_risk</code></p>
</li>
<li>
<p>Run <code>pipenv install</code>, which will install all of the libraries we have recommended for this exercise.</p>
</li>
<li>After you've installed all of the libraries, run <code>pipenv shell</code>, which will turn on a virtual environment running Python 3.7.</li>
<li>From within the shell, run <code>jupyter lab default_risk.ipynb</code> to launch the pre-started notebook.</li>
<li>To exit the pipenv shell when you are done, simply type <code>exit</code>.</li>
</ol>
<h3>What's in this repository?</h3>
<p>There are three data files, one metadata file, and a Jupyter notebook.</p>
<ol>
<li>default_risk_train_data.csv -- The data you will use to train your models. Includes all potential features and the target.</li>
<li>default_risk_test_data.csv -- The data you will use to test your models. Includes all potential features, but NOT the target (which theoretically reflects unknown future default status).</li>
<li>perfect_deliverable.csv -- The CSV with perfect predictions for each SK_ID_CURR in the test set. You should only use this at the very end to test the model and NEVER factor it into training your model. To prevent overfitting, you should test models sparingly. This is the same format the final deliverable should be submitted to the bank in.</li>
<li>default_risk_column_descriptions.csv -- Descriptive metadata for the columns found in the train and test datasets.</li>
<li>default_risk.ipynb -- The Jupyter notebook where all coding should be completed, unless you opt to work in a different environment.</li>
</ol>
<p>This project is based on a Kaggle competition, with a subset of the data provided for the sake of download size. Note that this data has not been cleaned for you, and you should expect to deal with real world data issues, such as missing values, bad values, class imbalances, etc.</p>
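<p>A quick first pass over the data might count missing values per column and check how unbalanced the target is. The sketch below does both with the standard library on a few made-up rows; only <code>SK_ID_CURR</code> and <code>TARGET</code> are column names taken from this write-up, while the other columns and every value are assumptions for illustration.</p>

```python
import csv
import io
from collections import Counter

# Made-up rows; the real files have many more columns and rows.
SAMPLE_CSV = """SK_ID_CURR,TARGET,AMT_INCOME_TOTAL,OCCUPATION_TYPE
100002,1,202500.0,Laborers
100003,0,270000.0,
100004,0,,Laborers
100006,0,135000.0,
"""


def missing_counts(rows):
    """Count empty cells per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts


def class_balance(rows, target="TARGET"):
    """Return the share of rows in each target class."""
    totals = Counter(row[target] for row in rows)
    n = sum(totals.values())
    return {label: count / n for label, count in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(missing_counts(rows))
print(class_balance(rows))
```

<p>With pandas, <code>DataFrame.isna().sum()</code> and <code>Series.value_counts(normalize=True)</code> give the same two summaries in one line each.</p>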
<h3>So what should we do?</h3>
<p>To successfully complete this challenge, you'll need to:
1. become an expert on the data,
2. clean the data,
3. engineer the features for your model(s),
4. test/validate your models,
5. generate the deliverable the bank expects.</p>
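<p>For the last step, a deliverable writer could look roughly like the sketch below. It assumes the expected file has a header row with <code>SK_ID_CURR</code> and <code>TARGET</code> columns; check <code>perfect_deliverable.csv</code> for the actual layout before relying on it. The IDs and probabilities shown are made up, and <code>write_deliverable</code> is a hypothetical helper.</p>

```python
import csv
import io


def write_deliverable(predictions, handle):
    """Write (SK_ID_CURR, probability) pairs as CSV rows under a header."""
    writer = csv.writer(handle)
    writer.writerow(["SK_ID_CURR", "TARGET"])  # assumed column layout
    for applicant_id, probability in predictions:
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {applicant_id}")
        writer.writerow([applicant_id, f"{probability:.4f}"])


# Write to an in-memory buffer here; pass an open file handle for real output.
buffer = io.StringIO()
write_deliverable([(100001, 0.0312), (100005, 0.4871)], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

<p>When writing to a real file with the <code>csv</code> module, open it with <code>newline=""</code> so that line endings are not doubled on Windows.</p>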
<p>Here are some tips/questions to consider along the way:
- Identify which columns are numerical and which are categorical
- Which columns are missing values, and what should be done about the missing values?
- Which features are relevant and why?
- Which features might you want to remove?
- What new features might you create?
- How will you deal with categorical data (e.g. label encoding, one-hot encoding, etc.)?
- Is there any class imbalance?
- What models will you try? sklearn has been installed in your environment; and <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html">linear regression</a>, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression">logistic regression</a>, and <a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier">random forest</a> models have been imported in the given notebook. Feel free, however, to use the library/models of your choice.</p></div></description><guid>https://chicagopython.github.io/posts/home-credit-default-risk/</guid><pubDate>Thu, 17 Oct 2019 11:00:00 GMT</pubDate></item><item><title>Make a Game</title><link>https://chicagopython.github.io/posts/make-a-game/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h3>Overview</h3>
<p>For a long time, computer games made use of few, if any, graphics. Many of them were text based adventures that you could run directly on your command line. Some examples included:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Zork">Zork</a></li>
<li>Adventureland</li>
<li><a href="https://en.wikipedia.org/wiki/Dwarf_Fortress">Dwarf Fortress</a></li>
</ul>
<p>and many others. Players would type their directions as words, and the computer would report back what happened.</p>
<h3>Your Task</h3>
<p>Your task for this evening is to work together to create something fun to play! Your group will take turns typing (in other words, one computer per group and only one person typing at a time) and helping to develop (offering ideas, thoughts on what to do next, etc.). It can be helpful to have another person with their computer open to research, but ultimately, this is a group effort! Everyone should have a chance to write code, offer suggestions, research libraries, etc.</p>
<h3>Setup</h3>
<ol>
<li>You'll need one computer that your group will share that can install and run <a href="https://pipenv-fork.readthedocs.io/en/latest/">Pipenv</a>. While an OS X or Linux machine will likely work best for this step, a Windows machine will be able to do it as well. If you run into any challenges installing Pipenv, please ask for help!</li>
<li>
<p>The project is in the ChiPy project night repo. If you do not have the repository already, run </p>
<p><code>git clone https://github.com/chicagopython/CodingWorkshops.git</code></p>
</li>
<li>
<p>Navigate to the folder for this challenge:</p>
<p><code>cd CodingWorkshops/problems/py101/make_a_game</code></p>
</li>
<li>
<p>Run <code>pipenv install</code>, which will install all of the libraries we have recommended for this exercise.</p>
</li>
<li>After you've installed all of the libraries, run <code>pipenv shell</code>, which will turn on a virtual environment running Python 3.7.</li>
<li>Run <code>python run.py</code> to see the program in its current state or <code>pytest -vv</code> to run all tests.</li>
<li>If you make changes, this project uses a library called <a href="https://github.com/psf/black">Black</a> to automatically format the code for you (a tool like this is known as a code formatter, a close relative of a <a href="https://en.wikipedia.org/wiki/Lint_(software)">linter</a>). To run it, run <code>black .</code> from the root of the project directory.</li>
</ol>
<h3>What's in this repository?</h3>
<p>In this repository is a basic shell of a game. This game sets up a <code>Player()</code> which parrots back what the player writes to it until they decide to leave. Some of the key features here that you might want to use or modify or extend are:</p>
<ol>
<li><em>Tests</em> -- in the <code>tests/</code> folder are a series of tests to make sure that the <code>Player()</code> object continues to work as expected. As you add new functionality, you might want to practice <a href="https://en.wikipedia.org/wiki/Test-driven_development">test-driven development</a> to ensure that your code continues to work as you want it to!</li>
<li><em>run.py</em> -- This is the main file that the player will run to play the game. One thing to note is the section that starts with <code>while player.in_game:</code> -- this section sets up a loop that will keep running until the <code>in_game</code> attribute is set to False. This way, your players can continue to do things and the game won't run once through the code and immediately finish. You'll likely add extra things into this section.</li>
<li><em>Player() class</em> -- This class holds information about the player -- what its name is, what message it wants to repeat, whether it still wants to play the game. Classes are useful for storing and modifying the state of a "thing", as well as defining the actions that thing may take. For example, our <code>Player()</code> can currently <code>say_hello()</code> and it has an <code>in_game</code> status that can be either <code>True</code> or <code>False</code>. A different object might have different behaviors or different attributes that can be set. Depending on your game, you may want to set up more of these classes -- for example, you could set up a <code>Map()</code> class to hold onto information about a map (what room the player is currently in, what rooms they can go to, etc.) or an <code>Enemy()</code> class (what the enemy can do, how it interacts with the player, whether it is defeated or not, etc.).</li>
</ol>
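<p>To illustrate the pattern described above, here is a small sketch of a player class driven by a <code>while player.in_game:</code> loop. It is not the repository's actual code: apart from <code>say_hello()</code> and <code>in_game</code>, every name here (<code>respond</code>, <code>play</code>, the <code>quit</code> command) is an assumption. Input and output are passed in as functions so the loop can be exercised with scripted input instead of a keyboard.</p>

```python
class Player:
    """Holds the player's name and whether they are still playing."""

    def __init__(self, name):
        self.name = name
        self.in_game = True

    def say_hello(self):
        return f"Hello, {self.name}!"

    def respond(self, message):
        """Parrot the message back, or leave the game on 'quit'."""
        if message.strip().lower() == "quit":
            self.in_game = False
            return f"Goodbye, {self.name}!"
        return f"{self.name} says: {message}"


def play(player, read_input=input, write_output=print):
    """Run the game loop until the player's in_game flag turns False."""
    write_output(player.say_hello())
    while player.in_game:
        write_output(player.respond(read_input("> ")))


# Scripted run: feed two commands and collect the output in a list.
scripted = iter(["look around", "quit"])
transcript = []
play(Player("Ada"), read_input=lambda prompt: next(scripted), write_output=transcript.append)
print(transcript)
```

<p>Calling <code>play(Player("Ada"))</code> with the defaults would read from the keyboard and print to the terminal instead. Keeping input and output swappable like this also makes the loop easy to cover with pytest.</p>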
<h3>So what should we do?</h3>
<p>A good way to begin might be the following:</p>
<ol>
<li>Decide what type of game you want to make: do you want to make a madlibs clone? Tic-tac-toe? A small dungeon? A word game? Put together a couple of ideas and identify what you'd like to build (and don't worry if you don't finish in time! This exercise is for you to be introduced to some Python concepts, not to emerge with a fully-developed game).</li>
<li>Identify what basic building blocks you would need to interact with in the game. For example, if you were making a madlibs clone, you would want to identify what the user could enter, some scripts for those words to be entered into, and something that reads the story out after all the words have been entered. This can help with figuring out the basic flow of the game (for example, you would not want the story to be revealed before all the words are entered!)</li>
<li>Start adding code and testing the game -- you could add automated tests (like the ones in <code>tests/</code>), play your game to see if it works, or both.</li>
</ol>
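<p>Taking the madlibs example from the steps above, the building blocks might be sketched like this: a story template, a way to find out which words the template needs, and a function that refuses to tell the story until every word has been supplied. The template text and the helper names are made up for illustration.</p>

```python
import string

# Hypothetical story script; each {placeholder} is a word the player supplies.
TEMPLATE = "The {adjective} {noun} decided to {verb} all night."


def required_words(template):
    """List the placeholder names that appear in the template."""
    return [name for _, name, _, _ in string.Formatter().parse(template) if name]


def tell_story(template, words):
    """Fill in the template, but only once every word has been entered."""
    missing = [name for name in required_words(template) if name not in words]
    if missing:
        raise ValueError(f"still waiting for: {', '.join(missing)}")
    return template.format(**words)


print(required_words(TEMPLATE))
print(tell_story(TEMPLATE, {"adjective": "sleepy", "noun": "python", "verb": "dance"}))
```

<p>Writing the "do not reveal the story early" rule as a check that raises an error gives you something concrete to test before the rest of the game exists.</p>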
<p>Happy Developing!</p></div></description><guid>https://chicagopython.github.io/posts/make-a-game/</guid><pubDate>Thu, 19 Sep 2019 11:00:00 GMT</pubDate></item><item><title>GitHub Jobs API</title><link>https://chicagopython.github.io/posts/github-jobs-api/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h3>Project Night Purpose</h3>
<p>All of us need to look for a job at some point, and almost every job board has its own API for users to post and pull data (though usually they charge money for access). The <a href="https://jobs.github.com">GitHub Jobs page</a> offers a simple API that is great for a first look at this kind of data acquisition and analysis.</p>
<p>Tonight's project uses the very popular Python HTTP library <a href="https://2.python-requests.org/en/master/">requests</a> along with <a href="https://docs.python.org/3/library/json.html">json</a> from the standard library. You will explore the data in the jobs API with the intent of learning something about the current job market for devs. There are a ton of caveats here: for example,</p>
<ul>
<li>Who chooses to post to GitHub Jobs? Are they representative of the overall population?</li>
<li>How old are the postings?</li>
<li>Is it safe to extrapolate statistics to the country? to Illinois? to Chicago?</li>
</ul>
<p>Although the topic is a little serious, we want to make sure you don't get discouraged by the data you pull (ask around--how many people in your group got their job from a GitHub posting?). The GitHub site shouldn't be taken as a good primary source for job availability or category...but that's part of data acquisition and analysis: assessing the strengths and shortcomings of your data source. Possible discussion points for the group are things like:</p>
<ul>
<li>Where might you get additional information?</li>
<li>How often should you pull the data? What would capture over time give you?</li>
<li>Does the dataset tell you anything about which companies use GitHub for hiring at all? In which cities? Should we move to a different part of the country?</li>
<li>Choose your own adventure and share with the group what you have learned!</li>
</ul>
<p>Most importantly - have fun! Accessing an API is a great core skill for data analysis and all experience is good experience. We hope that you are creative in your explanation and that each group discovers different things!</p>
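<p>Once you have a JSON response in hand, the analysis side can start very small. The sketch below parses a made-up payload with the standard library's <code>json</code> module and counts postings per location. The field names (<code>title</code>, <code>company</code>, <code>location</code>) and all values are assumptions for illustration, so check them against what the API actually returns.</p>

```python
import json
from collections import Counter

# Made-up payload shaped like a list of job postings.
SAMPLE_RESPONSE = """
[
  {"title": "Backend Developer", "company": "Example Co", "location": "Chicago, IL"},
  {"title": "Data Engineer", "company": "Sample LLC", "location": "Chicago, IL"},
  {"title": "Python Developer", "company": "Demo Inc", "location": "Remote"},
  {"title": "Site Reliability Engineer", "company": "Example Co", "location": null}
]
"""


def postings_per_location(raw_json):
    """Count postings by location, grouping blanks under 'unknown'."""
    postings = json.loads(raw_json)
    counts = Counter()
    for posting in postings:
        location = (posting.get("location") or "").strip() or "unknown"
        counts[location] += 1
    return counts


counts = postings_per_location(SAMPLE_RESPONSE)
print(counts.most_common())
```

<p>When fetching with requests, calling <code>.json()</code> on the response performs the <code>json.loads</code> step for you, so the counting logic can work directly on the resulting list.</p>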
<h3>Setting up your environment</h3>
<p>There is no pre-written code for this project, but we assume you have Python 3 installed on your machine. If this is your first project night, we recommend creating a folder for the project night repo: <code>mkdir chipy_projects &amp;&amp; cd chipy_projects</code>. If you already have the project night repository on your machine, go to that directory and pull from master.</p>
<p>If you are using Linux or OS X, run the following to create a new virtualenv:</p>
<pre class="code literal-block"><span></span><span class="n">python3</span> <span class="o">-</span><span class="n">m</span> <span class="n">venv</span> <span class="n">github_jobs_api</span>
<span class="k">source</span> <span class="n">github_jobs_api</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">activate</span>
</pre>
<p>On Windows, run the following:</p>
<pre class="code literal-block"><span></span><span class="n">python3</span> <span class="o">-</span><span class="n">m</span> <span class="n">venv</span> <span class="n">github_jobs_api</span>
<span class="n">github_jobs_api</span><span class="err">\</span><span class="n">Scripts</span><span class="err">\</span><span class="n">activate</span>
</pre>
<h3>Getting the project</h3>
<p>The project is in the ChiPy project night repo. If you do not have the repository already, run </p>
<pre class="code literal-block"><span></span><span class="n">git</span> <span class="n">clone</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">chicagopython</span><span class="o">/</span><span class="n">CodingWorkshops</span><span class="p">.</span><span class="n">git</span>
</pre>
<p>Now we will:</p>
<p>Go to the project:</p>
<pre class="code literal-block"><span></span><span class="n">cd</span> <span class="n">CodingWorkshops</span><span class="o">/</span><span class="n">problems</span><span class="o">/</span><span class="n">data_science</span><span class="o">/</span><span class="n">github_jobs_api</span>
</pre>
<p>Install the packages we need into our environment:</p>
<pre class="code literal-block"><span></span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">r</span> <span class="n">requirements</span><span class="p">.</span><span class="n">txt</span>
</pre>
<p>View the <em>README.md</em> file for additional information:</p>
<pre class="code literal-block"><span></span><span class="n">cat</span> <span class="n">README</span><span class="p">.</span><span class="n">md</span>
</pre>
<h3>Have fun!</h3></div></description><category>api</category><category>EDA</category><guid>https://chicagopython.github.io/posts/github-jobs-api/</guid><pubDate>Thu, 15 Aug 2019 23:00:00 GMT</pubDate></item><item><title>Battery Life</title><link>https://chicagopython.github.io/posts/battery-life/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h2>Predicting Battery Life - Challenge #1: Gathering Data</h2>
<p>Portable electronics such as mobile phones and laptops have become a near necessity in our daily lives, and those devices share one essential resource: battery life. Have you ever sat on the floor to be by a power outlet for your laptop or cell phone? How about delaying leaving for an event because you had to make sure your phone was charged? In a perfect world, we wouldn't have to worry about battery life, but in the absence of the miracle battery, users must rely on indicators of remaining battery life.</p>
<p>While there's still ongoing research into the capacity of batteries over time, the question of what percentage of charge remains has largely been solved in our everyday electronics. Most operating systems offer a way to display the percentage of battery life remaining. However, features that predict time remaining on the battery have been notoriously inaccurate, to the point where such features have been removed or hidden by default. Wouldn't it be nice if we could accurately predict when our phone was going to "die?"</p>
<p><strong>Your goal is to work toward that solution by gathering data to build a machine learning model predicting remaining battery life. You are tasked with determining what data we might want to collect for such a model, determining a strategy for ongoing collection of that data, actually collecting it, and organizing it into a form that will be usable by the machine-learning models of choice.</strong> There are no right or wrong answers here, just things that are feasible and ultimately help drive better predictions.</p>
<p>Before digging in, some background reading on how batteries work and what kinds of data/models can be useful is likely in order. Feel free to find your own resources, but here are a few:</p>
<ul>
<li>Overview of research and features: https://arxiv.org/pdf/1801.04069.pdf</li>
<li>Battery Terminology: http://web.mit.edu/evt/summary_battery_specifications.pdf</li>
<li>Battery Discharge Formulas: https://planetcalc.com/2283/</li>
</ul>
<p>Once you're ready to collect data, you'll likely want running-process and/or system utilization data as at least part of what you collect. Gathering such data can vary drastically by hardware and operating system. To get you started, here are a few options to make extracting the data easier:</p>
<ul>
<li>The <a href="https://psutil.readthedocs.io/en/latest/">psutil</a> library in Python has cross-OS support, but only collects some such data.</li>
<li>On Windows, there's the <a href="http://timgolden.me.uk/python/wmi/tutorial.html">wmi</a> library.</li>
<li>On most distributions of Linux and on macOS, the standard library's os, sys, and subprocess modules can actually get you rolling pretty quickly, once you track down where system logs are stored!</li>
</ul>
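<p>As a rough illustration of ongoing collection, the sketch below appends periodic samples to a CSV. The column names and the helpers <code>record_samples</code> and <code>fake_sampler</code> are assumptions made for this example; <code>read_battery_with_psutil</code> uses psutil's <code>sensors_battery()</code> and <code>cpu_percent()</code> and needs psutil installed, so the demo at the bottom uses the fake sampler instead.</p>

```python
import csv
import io
import time


def read_battery_with_psutil():
    """Read one sample via psutil (third-party; install it separately)."""
    import psutil  # imported lazily so the rest of the sketch runs without it

    battery = psutil.sensors_battery()  # None on machines without a battery
    return {
        "timestamp": time.time(),
        "percent": battery.percent if battery else None,
        "plugged_in": battery.power_plugged if battery else None,
        "cpu_percent": psutil.cpu_percent(interval=None),
    }


def record_samples(sampler, out_file, count, delay_seconds=0.0):
    """Call sampler() count times and write each sample as a CSV row."""
    fieldnames = ["timestamp", "percent", "plugged_in", "cpu_percent"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for _ in range(count):
        writer.writerow(sampler())
        time.sleep(delay_seconds)


def fake_sampler():
    """Hypothetical stand-in so the sketch runs on machines without psutil."""
    return {"timestamp": 0.0, "percent": 87.0, "plugged_in": False, "cpu_percent": 12.5}


# Demo: write three fake samples to an in-memory file and show the CSV text.
buffer = io.StringIO()
record_samples(fake_sampler, buffer, count=3)
print(buffer.getvalue())
```

<p>Recording whether the device is plugged in alongside every sample makes it possible to filter out charging periods later, which is one of the questions raised below.</p>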
<p>The rest is up to you, but some questions you might want to consider:</p>
<ul>
<li>When tracking battery/system data, how are you accounting for a device sometimes being plugged in?</li>
<li>How will you account for different battery types, device types, and operating systems?</li>
<li>Besides the obvious battery and system-related data, what features might help predict battery life?</li>
<li>How can you collect enough data from enough sources to successfully train a model?</li>
</ul></div></description><guid>https://chicagopython.github.io/posts/battery-life/</guid><pubDate>Thu, 18 Jul 2019 11:00:00 GMT</pubDate></item><item><title>trackcoder</title><link>https://chicagopython.github.io/posts/trackcoder/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h2>1. trackcoder: The Mentorship Journal</h2>
<p>ChiPy's mentorship program is an extraordinary journey toward becoming a better developer.
As a mentee, you are expected to do a lot: you read new articles and books, write code,
debug and troubleshoot, and pair program with other mentees at coding workshops or with your mentor.
This involves managing time efficiently and focusing on the things that are effective.
But as the old adage goes, "you can't manage what you can't measure".</p>
<p>This project is the first of a three-part series on building time-tracking tools for
mentees. The end goal is a tool that helps you record mini
journal entries every day. The tool would also give you insight into your learning patterns,
allowing you to make better decisions when allocating time for self-directed learning beyond the
mentorship program.</p>
<h3>1.1. The Done list</h3>
<p>Let's say you were to keep an account of everything you have accomplished during your mentorship.
The most minimal way of doing that would be noting things down in a text file. Think of it as
a journal that you update frequently.</p>
<pre class="code literal-block"><span></span> <span class="nv">Date</span>: <span class="mi">02</span><span class="o">/</span><span class="mi">20</span><span class="o">/</span><span class="mi">2019</span>
<span class="nv">first</span> <span class="nv">blog</span> <span class="nv">post</span> <span class="nv">completed</span> 🏅
<span class="nv">learned</span> <span class="nv">about</span> <span class="nv">using</span> <span class="nv">click</span> <span class="nv">at</span> <span class="nv">project</span> <span class="nv">night</span>
<span class="nv">updated</span> <span class="nv">cli</span> <span class="k">for</span> <span class="nv">the</span> <span class="nv">app</span>
<span class="nv">Date</span>: <span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="o">/</span><span class="mi">2019</span>
<span class="nv">read</span> <span class="nv">good</span> <span class="nv">article</span> <span class="nv">on</span> <span class="nv">decorators</span> <span class="nv">http</span>:<span class="o">//</span><span class="nv">realpython</span>.<span class="nv">org</span><span class="o">/</span>
<span class="nv">refactored</span> <span class="nv">to</span> <span class="nv">use</span> <span class="nv">decorators</span> <span class="k">for</span> <span class="nv">orthogonal</span> <span class="nv">logic</span>
<span class="nv">debugging</span> <span class="nv">decorators</span>
<span class="nv">met</span> <span class="nv">with</span> <span class="nv">mentor</span> <span class="nv">to</span> <span class="nv">fix</span> <span class="nv">decorator</span> <span class="nv">issues</span>
</pre>
<p>With a little bit of effort, you can capture two more data points with each of the accomplishments
you are recording in this mini journal entry.</p>
<p>(a) the time spent
(b) the type of activity</p>
<p>These will form the basis of gaining insights into your personal learning patterns.</p>
<p>For example:</p>
<pre class="code literal-block"><span></span><span class="nv">Date</span>: <span class="mi">02</span><span class="o">/</span><span class="mi">20</span><span class="o">/</span><span class="mi">2019</span>
<span class="nv">first</span> <span class="nv">blog</span> <span class="nv">post</span>, <span class="nv">blogging</span>, <span class="mi">120</span> <span class="nv">mins</span>
<span class="nv">learned</span> <span class="nv">about</span> <span class="nv">using</span> <span class="nv">click</span> <span class="nv">at</span> <span class="nv">project</span> <span class="nv">night</span>, <span class="nv">pair_programming</span>, <span class="mi">120</span> <span class="nv">mins</span>
<span class="nv">updated</span> <span class="nv">cli</span> <span class="k">for</span> <span class="nv">the</span> <span class="nv">app</span>, <span class="nv">coding</span>, <span class="mi">20</span> <span class="nv">mins</span>
<span class="nv">Date</span>: <span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="o">/</span><span class="mi">2019</span>
<span class="nv">read</span> <span class="nv">good</span> <span class="nv">article</span> <span class="nv">on</span> <span class="nv">decorators</span> <span class="nv">http</span>:<span class="o">//</span><span class="nv">realpython</span>.<span class="nv">org</span><span class="o">/</span>, <span class="nv">research</span>, <span class="mi">45</span> <span class="nv">mins</span>
<span class="nv">refactored</span> <span class="nv">to</span> <span class="nv">use</span> <span class="nv">decorators</span> <span class="k">for</span> <span class="nv">orthogonal</span> <span class="nv">logic</span>, <span class="nv">coding</span>, <span class="mi">30</span> <span class="nv">mins</span>
<span class="nv">debugging</span> <span class="nv">decorators</span>, <span class="nv">debugging</span>, <span class="mi">30</span> <span class="nv">mins</span>
<span class="nv">met</span> <span class="nv">with</span> <span class="nv">mentor</span> <span class="nv">to</span> <span class="nv">fix</span> <span class="nv">decorator</span> <span class="nv">issues</span>, <span class="nv">mentor</span>, <span class="mi">60</span> <span class="nv">mins</span>
</pre>
<p>While each of these activities has taken up time, you'll probably find some were
more useful than others. So add another field to your
entry, effectiveness, and refactor a little bit.</p>
<pre class="code literal-block"><span></span><span class="mi">02</span><span class="o">/</span><span class="mi">20</span><span class="o">/</span><span class="mi">2019</span>, <span class="nv">first</span> <span class="nv">blog</span> <span class="nv">post</span>, <span class="nv">blogging</span>, <span class="mi">120</span> <span class="nv">mins</span>, <span class="mi">4</span>
<span class="mi">02</span><span class="o">/</span><span class="mi">20</span><span class="o">/</span><span class="mi">2019</span>, <span class="nv">learned</span> <span class="nv">about</span> <span class="nv">using</span> <span class="nv">click</span> <span class="nv">at</span> <span class="nv">project</span> <span class="nv">night</span>, <span class="nv">pair_programming</span>, <span class="mi">120</span> <span class="nv">mins</span>, <span class="mi">4</span>
<span class="mi">02</span><span class="o">/</span><span class="mi">20</span><span class="o">/</span><span class="mi">2019</span>, <span class="nv">updated</span> <span class="nv">cli</span> <span class="k">for</span> <span class="nv">the</span> <span class="nv">app</span>, <span class="nv">coding</span>, <span class="mi">20</span> <span class="nv">mins</span>, <span class="mi">3</span>
<span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="o">/</span><span class="mi">2019</span>, <span class="nv">read</span> <span class="nv">good</span> <span class="nv">article</span> <span class="nv">on</span> <span class="nv">decorators</span> <span class="nv">http</span>:<span class="o">//</span><span class="nv">realpython</span>.<span class="nv">org</span><span class="o">/</span>, <span class="nv">research</span>, <span class="mi">45</span> <span class="nv">mins</span>, <span class="mi">4</span>
<span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="o">/</span><span class="mi">2019</span>, <span class="nv">refactored</span> <span class="nv">to</span> <span class="nv">use</span> <span class="nv">decorators</span> <span class="k">for</span> <span class="nv">orthogonal</span> <span class="nv">logic</span>, <span class="nv">coding</span>, <span class="mi">30</span> <span class="nv">mins</span>, <span class="mi">3</span>
<span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="o">/</span><span class="mi">2019</span>, <span class="nv">debugging</span> <span class="nv">decorators</span>, <span class="nv">debugging</span>, <span class="mi">30</span> <span class="nv">mins</span>, <span class="mi">2</span>
<span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="o">/</span><span class="mi">2019</span>, <span class="nv">met</span> <span class="nv">with</span> <span class="nv">mentor</span> <span class="nv">to</span> <span class="nv">fix</span> <span class="nv">decorator</span> <span class="nv">issues</span>, <span class="nv">mentor</span>, <span class="mi">60</span> <span class="nv">mins</span>, <span class="mi">5</span>
</pre>
<p>If you save this file as a .csv now and open it in Excel, you'll be able to get an account of how your time has been spent by selecting all the rows
in the second-to-last column. Taking the next step, you can very easily build a bar graph of time spent per day by including the first column.</p>
<p><img alt="csv.png" src="https://chicagopython.github.io/images/csv.png"></p>
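<p>The same per-day total can be computed in a few lines of Python with the standard library. This is a minimal sketch that assumes the column order shown above and descriptions that contain no commas; the function name <code>minutes_per_day</code> is made up for the example.</p>

```python
import csv
import io
from collections import defaultdict

# A few rows of the journal above, in the same column order:
# date, description, activity type, time spent, effectiveness.
JOURNAL_CSV = """\
02/20/2019, first blog post, blogging, 120 mins, 4
02/20/2019, updated cli for the app, coding, 20 mins, 3
02/21/2019, debugging decorators, debugging, 30 mins, 2
02/21/2019, met with mentor to fix decorator issues, mentor, 60 mins, 5
"""


def minutes_per_day(csv_text):
    """Sum the time-spent column for each date."""
    totals = defaultdict(int)
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for date, _description, _activity, time_spent, _effectiveness in reader:
        # "120 mins" becomes 120
        totals[date] += int(time_spent.split()[0])
    return dict(totals)


daily_totals = minutes_per_day(JOURNAL_CSV)
print(daily_totals)
```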
<p>We could have stopped right here and let you continue with a spreadsheet as the system for your mentorship journal. But let's make it fun, reliable, and smooth.</p>
<h4>1.1.1. The Data structure</h4>
<p>Having looked at the data we intend to capture, let's try to reason about how each field in a row
can be represented by variables in a script and what their types would be.</p>
<pre class="code literal-block"><span></span> <span class="n">task</span><span class="p">:</span> <span class="n">str</span>
<span class="n">description</span><span class="p">:</span> <span class="n">str</span>
<span class="k">timestamp</span><span class="p">:</span> <span class="n">datetime</span><span class="p">.</span><span class="k">timestamp</span>
<span class="n">mins</span><span class="p">:</span> <span class="nb">int</span>
<span class="n">effective</span><span class="p">:</span> <span class="nb">int</span>
<span class="n">done</span><span class="p">:</span> <span class="n">bool</span>
</pre>
<p>Note: I have sneaked in a variable <code>done</code>, which can tell us if this task is completed. We will not use it right now, but with a little bit of effort we can use this field to enhance this app with a Todo list feature. With these as member variables, we can now define a <code>class</code> for our <code>app.py</code>. A class is nothing but an abstract representation of how each of our data records (or objects) should look and behave.</p>
<p>You'll find this in lines 16-22 of <code>app.py</code>.</p>
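<p>To make the idea concrete, here is a plain-Python sketch of such a record using a dataclass. It is not the class from <code>app.py</code> (which is defined through the ORM described below); the name <code>JournalEntry</code> and the default values are assumptions for illustration.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class JournalEntry:
    """Plain-Python stand-in for one journal record (hypothetical name)."""

    task: str          # activity type, e.g. "blogging"
    description: str   # free-text note about what was done
    mins: int          # time spent, in minutes
    effective: int     # self-rated effectiveness score
    # Each new entry gets the time of its creation unless one is supplied.
    timestamp: datetime = field(default_factory=datetime.now)
    done: bool = True  # assumed default: entries describe finished work


entry = JournalEntry(task="blogging", description="first blog post", mins=120, effective=4)
print(entry)
```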
<h4>1.1.2. Replacing csv with a database</h4>
<p>CSV files or spreadsheets are a good start for storing data, but are not without limitations. For a project like ours, where we are looking to do automation, analytics, and integration with more than one system, we would be better off using a database.
There are a lot of databases to choose from, but for something simple like ours we will use the <code>sqlite</code> that comes with the Python installation. Let's now look at the different ways we can capture the data in a Python script, persist it into the database, and retrieve it for further processing.</p>
<h5>1.1.2.1. Object Relational Mapper (ORM)</h5>
<p>Each of your mini journal entries will first be captured as
a Python object using the <code>app.py</code> script that we will build in part 1. <code>app.py</code> will also convert the Python object
into a database record using an Object Relational Mapper (ORM) and persist the data into the database. If you want to retrieve, update, or delete a record from the database, the ORM allows you to use similar Pythonic code to get the data back.</p>
<h5>1.1.2.2. SQL/Pandas</h5>
<p>While the ORM allows you to write Python directly, the primary mechanism for querying the data in relational databases is
Structured Query Language (SQL), an English-like query language that allows you to create,
retrieve, summarize, and analyze the data. We will be using SQL and pandas in part three for analysis of the data.</p>
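<p>As a small taste of what SQL looks like, the sketch below uses Python's built-in <code>sqlite3</code> module with an in-memory database. The table name <code>entries</code> and its columns are assumptions made for this example and do not describe the schema that <code>app.py</code> creates.</p>

```python
import sqlite3

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "day TEXT, description TEXT, activity TEXT, mins INTEGER, effective INTEGER)"
)

rows = [
    ("02/20/2019", "first blog post", "blogging", 120, 4),
    ("02/20/2019", "updated cli for the app", "coding", 20, 3),
    ("02/21/2019", "refactored to use decorators for orthogonal logic", "coding", 30, 3),
    ("02/21/2019", "debugging decorators", "debugging", 30, 2),
]
# Placeholders (?) let sqlite3 handle quoting of the values.
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", rows)

# Total minutes and average effectiveness for each activity type.
summary = conn.execute(
    "SELECT activity, SUM(mins), AVG(effective) "
    "FROM entries GROUP BY activity ORDER BY activity"
).fetchall()

for activity, total_mins, avg_effective in summary:
    print(activity, total_mins, avg_effective)

conn.close()
```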
<h4>1.1.3. Building the app in three parts</h4>
<p><img alt="project.png" src="https://chicagopython.github.io/images/project.png"></p>
<h4>1.1.4. Part 1: The terminal client</h4>
<p>Since most of our development time is spent on a terminal, in part 1 we will be building a terminal based client. The client will provide a simple interface to add a data record of your mini journal entry into the database.</p>
<h4>1.1.5. Part 2: Web interface</h4>
<p>After the database has been populated with entries we made in part 1, we will build a web interface that pulls
up data from the database and presents a dashboard to show where time was spent.</p>
<p><img alt="dashboard" src="https://chicagopython.github.io/images/dashboard.gif"></p>
<h4>1.1.6. Part 3: Data Analysis</h4>
<p>Once the data entry and output parts are complete, in part 3 we will use data science tools to
answer questions regarding your learning patterns. As a stretch goal, we will enhance the dashboard built in part 2 with metrics and insights coming out of part 3.</p>
<h3>1.2. Part 1</h3>
<p>In this project we will explore</p>
<ul>
<li>How to build command line applications using <code>prompt_toolkit</code>, <code>click</code></li>
<li>How to store data in sqlite database that comes with Python using <code>peewee</code> ORM</li>
</ul>
<h3>1.3. Setup Instructions</h3>
<p>You will need a text editor like Visual Studio Code, Atom, or Sublime Text. Since
you'll be working in a group, having an editor that does not get in the way of solving
the problem is essential. So stick to what everyone on your team is familiar with.</p>
<h4>1.3.1. Download .zip from github</h4>
<p>If you are not familiar with <code>git</code>, you can download the repository from <a href="https://github.com/chicagopython/CodingWorkshops/archive/master.zip">here</a>.
Clicking on the link will download a .zip file to your computer. Next you need to
navigate to the folder where it was downloaded and unzip the folder. Once you have
the CodingWorkshops directory, you can go to step 1.5.</p>
<h4>1.3.2. Git and Github [Optional]</h4>
<p>After completing the steps below, you should have a GitHub account and be able to push
your local changes to this repository to GitHub.</p>
<ul>
<li>Follow the setup steps described <a href="https://help.github.com/articles/set-up-git/">here</a></li>
<li>Read the steps described in <a href="https://help.github.com/articles/fork-a-repo">fork a repo</a></li>
<li>Use the steps described above to fork this repository <a href="https://github.com/chicagopython/CodingWorkshops">CodingWorkshops</a></li>
</ul>
<p>The changes that you make as a part of this exercise will be pushed to the fork you created for this
repository.</p>
<p>If you have already created a fork of this repository in your GitHub account, you will
want to bring it up to date with the recent changes. In that case,
you will need to do the following:</p>
<ul>
<li><a href="https://help.github.com/articles/configuring-a-remote-for-a-fork/">configuring a remote fork</a></li>
<li><a href="https://help.github.com/articles/syncing-a-fork/">syncing a fork</a></li>
</ul>
<h3>1.4. Python</h3>
<p>This project has made no attempt to be compatible with Python 2.7. 😎</p>
<p>Recommended version: Python 3.6 or higher.</p>
<h3>1.5. Quick Git command refresher [Optional]</h3>
<p>Below are a few of the most commonly used git commands:</p>
<pre class="code literal-block"><span></span><span class="nv">git</span> <span class="nv">checkout</span> <span class="nv">master</span> # <span class="nv">checkout</span> <span class="nv">to</span> <span class="nv">master</span> <span class="nv">branch</span>
<span class="nv">git</span> <span class="nv">checkout</span> <span class="o">-</span><span class="nv">b</span> <span class="nv">feature</span><span class="o">/</span><span class="nv">cool</span> # <span class="nv">create</span> <span class="nv">a</span> <span class="nv">new</span> <span class="nv">branch</span> <span class="nv">feature</span><span class="o">/</span><span class="nv">cool</span>
<span class="nv">git</span> <span class="nv">add</span> <span class="o">-</span><span class="nv">u</span> # <span class="nv">stage</span> <span class="nv">all</span> <span class="nv">the</span> <span class="nv">updates</span> <span class="k">for</span> <span class="nv">commit</span>
<span class="nv">git</span> <span class="nv">commit</span> <span class="o">-</span><span class="nv">am</span> <span class="s2">"</span><span class="s">Adding changes and committing with a comment</span><span class="s2">"</span>
<span class="nv">git</span> <span class="nv">push</span> <span class="nv">origin</span> <span class="nv">master</span> # <span class="nv">push</span> <span class="nv">commits</span> <span class="nv">to</span> <span class="nv">the</span> <span class="nv">master</span> <span class="nv">branch</span>
</pre>
<p>Note: for this exercise, we will be working on the master branch directly. However,
that is NOT a best practice. Branches are cheap in git, so a new feature or fix
should first go to a branch, get tested and code reviewed, and finally be merged to master.</p>
<h3>1.6. Documentation references</h3>
<p>Below are the libraries used by this program.</p>
<ul>
<li><a href="https://python-prompt-toolkit.readthedocs.io/en/master/">prompt_toolkit</a></li>
<li><a href="http://click.pocoo.org/5/">click</a></li>
<li><a href="http://docs.peewee-orm.com/en/latest/">peewee</a></li>
</ul>
<h3>1.7. Exercise 0: Project Setup</h3>
<p>After completing the steps in setup, you should have a cloned version of your fork of the <code>CodingWorkshops</code>
repository on your local machine. Let's take the time to look at the structure of this
project. All code is located under the <code>/problems/py101/trackcoder</code> directory. So from your
terminal go to the directory where you have cloned the repository.</p>
<pre class="code literal-block"><span></span><span class="n">cd</span> <span class="n">path</span><span class="o">/</span><span class="k">to</span><span class="o">/</span><span class="n">CodingWorkshops</span><span class="o">/</span><span class="n">problems</span><span class="o">/</span><span class="n">py101</span><span class="o">/</span><span class="n">trackcoder</span>
</pre>
<p>Make sure you are in this directory for the remainder of this project.</p>
<p>Run <code>pwd</code> (<code>cd</code> with no arguments in the Windows Command Prompt) to find out which directory you
are in.</p>
<p>Your output should end in <code>problems/py101/trackcoder</code> and contain the files described
below.</p>
<h4>1.7.1. <code>app.py</code></h4>
<p>This file contains the code required to get you started with building the project.
You will be building on top of what has been provided in this file.</p>
<h4>1.7.2. <code>Makefile</code></h4>
<p>This file contains the commands that are required to build the project.
You can run <code>make help</code> to see what the options are.</p>
<p>Note: the Makefile will not work on Windows out of the box.</p>
<h4>1.7.3. <code>Pipfile</code> and <code>Pipfile.lock</code></h4>
<p>These two files are used by <code>pipenv</code> to create a virtual environment that
isolates all the dependencies of this project from other Python projects on your computer.
Learn more about <a href="https://docs.pipenv.org/">pipenv</a>.</p>
<h3>1.8. Exercise 1: Build</h3>
<p>From the <code>/problems/py101/trackcoder</code> directory, run</p>
<pre class="code literal-block"><span></span><span class="n">make</span>
</pre>
<ul>
<li>Which packages got installed?</li>
<li>Which version of python is getting used?</li>
</ul>
<p>On Windows, skip this exercise and install the dependencies using:</p>
<pre class="code literal-block"><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">prompt_toolkit</span> <span class="n">Click</span> <span class="n">peewee</span>
</pre>
<h3>1.9. Exercise 2: Run the program</h3>
<p>First shell into your virtual environment</p>
<pre class="code literal-block"><span></span><span class="n">make</span> <span class="n">shell</span>
</pre>
<p>This should activate your virtual environment, i.e. give you access to a Python
environment where all the dependencies for this project have been installed.</p>
<p>Note: If the above command errors out, or you are on Windows, run the following to get into
a shell with the virtualenv activated.</p>
<pre class="code literal-block"><span></span><span class="n">pipenv</span> <span class="n">shell</span>
</pre>
<p>If everything else fails, install the dependencies directly:</p>
<pre class="code literal-block"><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">prompt_toolkit</span> <span class="n">Click</span> <span class="n">peewee</span>
</pre>
<p>Start by running</p>
<pre class="code literal-block"><span></span><span class="n">python</span> <span class="n">app</span><span class="p">.</span><span class="n">py</span> <span class="c1">--help</span>
</pre>
<p>What are the possible options that the command has?
Run each option with <code>--help</code> to see what help message is provided.</p>
<h3>1.10. Exercise 3: Fix the help message</h3>
<h4>1.10.1. Interactive mode</h4>
<p>Running <code>app.py</code> with <code>-i</code> should start the app in interactive mode.
Once in interactive mode, there are two commands: <code>add</code> and <code>show</code>.</p>
<p>The <code>add</code> command allows adding a new <code>Task</code>. The format is</p>
<pre class="code literal-block"><span></span><span class="c">% add b 10 first paragraph of first blog post</span>
</pre>
<p>Here <code>b</code> is the abbreviation for blogging and <code>10</code> is the time taken for the task. The rest of the line is the comment.
There are only 6 possible Task types:</p>
<ul>
<li>blogging (b)</li>
<li>coding (c)</li>
<li>debugging (d)</li>
<li>pair programming at project night (p)</li>
<li>research (r)</li>
<li>meeting with mentor (m)</li>
</ul>
<p>For example, an interactive session might look like</p>
<blockquote>
<pre class="code literal-block"><span></span><span class="c">% add b 10 first blog post</span>
<span class="c">% add c 10 finished cli</span>
<span class="c">% add d 120 debugging decorators</span>
<span class="c">% add m 120 always keep the final presentation in mind</span>
<span class="c">% add r 60 read articles on pandas</span>
<span class="c">% add p 120 learned about decorators</span>
</pre>
</blockquote>
<p>The <code>show</code> command lists all the <code>Task</code>s added so far.</p>
<blockquote>
<pre class="code literal-block"><span></span><span class="o">%</span> <span class="k">show</span>
<span class="nv">b</span> <span class="mi">10</span> <span class="nv">first</span> <span class="nv">blog</span> <span class="nv">post</span>
<span class="nv">c</span> <span class="mi">10</span> <span class="nv">finished</span> <span class="nv">cli</span>
<span class="nv">d</span> <span class="mi">120</span> <span class="nv">debugging</span> <span class="nv">decorators</span>
<span class="nv">m</span> <span class="mi">120</span> <span class="nv">always</span> <span class="nv">keep</span> <span class="nv">the</span> <span class="nv">final</span> <span class="nv">presentation</span> <span class="nv">in</span> <span class="nv">mind</span>
<span class="nv">r</span> <span class="mi">60</span> <span class="nv">read</span> <span class="nv">articles</span> <span class="nv">on</span> <span class="nv">pandas</span>
<span class="nv">p</span> <span class="mi">120</span> <span class="nv">learned</span> <span class="nv">about</span> <span class="nv">decorators</span>
</pre>
</blockquote>
<p>For this exercise you need to add helpful messages that summarize what each
of the options for <code>app.py</code> stands for.</p>
<h3>1.11. Exercise 4: Run in interactive mode</h3>
<pre class="code literal-block"><span></span><span class="n">python</span> <span class="n">app</span><span class="p">.</span><span class="n">py</span> <span class="o">-</span><span class="n">i</span>
</pre>
<p>Add some tasks and list them out by using the commands shown above. Play around with the up/down
arrow keys to access history of the commands.</p>
<p>Exit the session using <code>Ctrl+D</code>. From your command prompt, run <code>ls -l</code> on Linux or Mac, or <code>dir</code>
on Windows. What is the name of the file that gets created?</p>
<p>Using sqlite3</p>
<pre class="code literal-block"><span></span><span class="n">sqlite3</span> <span class="n">to_do_list</span><span class="p">.</span><span class="n">db</span> <span class="s1">'select * from ToDo;'</span>
</pre>
<p>Compare the output that you get from running <code>show</code> and using the command above.</p>
<h3>1.12. Exercise 4: Run in non-interactive mode</h3>
<p>For ease of entering data the program can also be run in non-interactive mode</p>
<pre class="code literal-block"><span></span><span class="n">python</span> <span class="n">app</span><span class="p">.</span><span class="n">py</span> <span class="o">-</span><span class="n">a</span> <span class="n">b</span> <span class="mi">30</span> <span class="ss">"first blog post completed"</span>
<span class="n">python</span> <span class="n">app</span><span class="p">.</span><span class="n">py</span> <span class="o">-</span><span class="n">s</span>
</pre>
<p>Add a few tasks that have been completed and list them non-interactively.
Note you'll need to put the description in quotes in this mode.</p>
<h4>1.12.1. Optional: For non-windows users only</h4>
<p>You can further simplify tracking your time by adding a bash shell alias.</p>
<pre class="code literal-block"><span></span><span class="n">alias</span> <span class="n">add</span><span class="o">=</span><span class="err">'</span><span class="n">function</span> <span class="n">_add</span><span class="p">(){</span> <span class="n">python</span> <span class="n">app</span><span class="p">.</span><span class="n">py</span> <span class="o">-</span><span class="n">a</span> <span class="s">"$@"</span><span class="p">;</span> <span class="p">};</span><span class="n">_add</span><span class="err">'</span>
</pre>
<p>Then from your shell you can run:</p>
<pre class="code literal-block"><span></span>$ add c <span class="m">30</span> <span class="s2">"finished oauth"</span>
</pre>
<p>Add a similar shell alias for the <code>show</code> command.</p>
<h3>1.13. Exercise 5: Error handling</h3>
<p>Currently we have two commands, <code>add</code> and <code>show</code>. Let's say the user made a typo,
or was creative while trying to input a command.</p>
<pre class="code literal-block"><span></span><span class="c">% add c api 30 complete</span>
</pre>
<p>instead of</p>
<pre class="code literal-block"><span></span><span class="c">% add c 30 api complete</span>
</pre>
<p>This results in the program crashing with a long stack trace.
Add error handling for cases where the program is unable to <code>parse</code> the input
passed by the user.</p>
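<p>One way to approach this is to raise a dedicated exception from the parsing step and catch it at the prompt. The sketch below is only an illustration: <code>parse_add</code>, <code>safe_parse</code> and <code>ParseError</code> are hypothetical names, not the functions in <code>app.py</code>, and it assumes the input has the shape "category minutes description".</p>

```python
class ParseError(ValueError):
    """Raised when an add command cannot be understood."""


def parse_add(args):
    """Split 'category minutes description...' into its three parts.

    Hypothetical stand-in for the app's own parse function.
    """
    parts = args.split(maxsplit=2)
    if len(parts) != 3:
        raise ParseError("usage: add CATEGORY MINUTES DESCRIPTION")
    category, mins_text, description = parts
    try:
        mins = int(mins_text)
    except ValueError:
        # Hide the original traceback; the message below is enough for the user.
        raise ParseError(f"minutes must be a whole number, got {mins_text!r}") from None
    return category, mins, description


def safe_parse(args):
    """Return the parsed tuple, or None after printing a friendly message."""
    try:
        return parse_add(args)
    except ParseError as err:
        print(f"Could not parse input: {err}")
        return None


good = safe_parse("c 30 api complete")   # well-formed input
bad = safe_parse("c api 30 complete")    # minutes and description swapped
```

<p>With this shape, the swapped input prints a one-line message instead of a stack trace, and the caller can simply skip the command when it receives <code>None</code>.</p>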
<h3>1.14. Exercise 6: Enhance the show command</h3>
<p>Enhance the show command to summarize the output by task category.
Your summary should include how much time was spent on each task category.</p>
<p>As seen above, we are using sqlite3. You may choose to do your summary calculation
using SQL or write the logic in Python.</p>
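<p>If you go the SQL route, a <code>GROUP BY</code> query does the aggregation for you. This is a minimal sketch using the standard-library <code>sqlite3</code> module against an in-memory stand-in database. The table layout (columns <code>task</code>, <code>mins</code>, <code>description</code>) and the idea that <code>task</code> holds the category letter are assumptions; check the real schema before reusing the query.</p>

```python
import sqlite3

# In-memory stand-in so the sketch runs anywhere. The column names and the
# assumption that `task` stores the category letter are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ToDo (task TEXT, mins INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO ToDo VALUES (?, ?, ?)",
    [
        ("b", 30, "first blog post completed"),
        ("c", 30, "finished oauth"),
        ("c", 45, "api complete"),
    ],
)


def minutes_by_category(conn):
    """Return {category: total minutes} using SQL aggregation."""
    query = "SELECT task, SUM(mins) FROM ToDo GROUP BY task ORDER BY task"
    return dict(conn.execute(query).fetchall())


summary = minutes_by_category(conn)
for category, total in summary.items():
    print(f"{category}: {total} min")
```

<p>Doing the same in Python would mean fetching every row and summing into a dictionary; the SQL version keeps that work inside the database.</p>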
<h3>1.15. Exercise 7: Add a field for task complete or not</h3>
<p>Next take a look at</p>
<pre class="code literal-block"><span></span><span class="k">class</span> <span class="n">ToDo</span>(<span class="n">Model</span>):
</pre>
<p>This class has a list of fields: task, description, timestamp, mins, done.
Until now we have not been using the <code>done</code> field. It has a default value of <code>True</code>
to indicate that a completed task is being added.</p>
<p>However, that might not always be the case. You might want to log your work
while a task is still incomplete. In fact, logging often and logging early is
encouraged! To facilitate that, we need to optionally
take a fourth parameter in the input for adding a new task.</p>
<p>Take a look at the decorator right above the <code>main</code> function:</p>
<pre class="code literal-block"><span></span><span class="nv">@click</span><span class="p">.</span><span class="k">option</span><span class="p">(</span><span class="s1">'--add'</span><span class="p">,</span><span class="w"> </span><span class="s1">'-a'</span><span class="p">,</span><span class="w"> </span><span class="n">nargs</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="p">(</span><span class="n">click</span><span class="p">.</span><span class="n">STRING</span><span class="p">,</span><span class="w"> </span><span class="nc">int</span><span class="p">,</span><span class="w"> </span><span class="n">click</span><span class="p">.</span><span class="n">STRING</span><span class="p">),</span><span class="w"> </span><span class="err">\</span><span class="w"></span>
<span class="w"> </span><span class="k">default</span><span class="o">=</span><span class="p">(</span><span class="k">None</span><span class="p">,</span><span class="w"> </span><span class="k">None</span><span class="p">,</span><span class="w"> </span><span class="k">None</span><span class="p">))</span><span class="w"></span>
</pre>
<p>This is the starting point for accepting an extra input.
You will find the relevant documentation <a href="http://click.pocoo.org/5/options/">here</a>.</p>
<p>Hint: Note the type of the field is boolean.
You will need to modify the <code>parse</code>, <code>add</code> and <code>main</code> functions in order to complete
this exercise.</p>
<h3>1.16. Exercise 8: Enhance the summary</h3>
<p>Enhance your summary function to show how many tasks are in progress and how many are complete.
How you format the information is completely up to you.</p>
<h3>1.17. Exercise 9: Hashtags</h3>
<p>Now that you have enabled the flag to indicate if a task is complete or not, you
can log much finer-grained progress on your tasks. You can tag your task with
arbitrary hashtags in order to provide better semantic information. For example:</p>
<pre class="code literal-block"><span></span><span class="c">% add p 120 #data_science learned about precision/recall</span>
<span class="c">% add b 120 finished the blogpost</span>
<span class="c">% add p 30 #data_science learned about roc curves</span>
<span class="c">% add p 30 #webdev added a flask interface</span>
<span class="c">% add d 30 #issues/7 found a bug, new github issue</span>
<span class="c">% add p 30 #issues/7 closed github issue 7</span>
</pre>
<p>Enhance the show command to optionally take a hashtag as a parameter and show only the
tasks that have that hashtag. Accordingly, your summary should reflect only
data relevant to that hashtag.</p>
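<p>Picking hashtags out of a description is a natural fit for a regular expression. The following is a rough sketch, assuming a hashtag is <code>#</code> followed by letters, digits, underscores or slashes (so that <code>#issues/7</code> counts as one tag). The helper names <code>hashtags</code> and <code>filter_by_hashtag</code> are made up for illustration.</p>

```python
import re

# Assumed definition: '#' followed by letters, digits, '_' or '/'.
HASHTAG = re.compile(r"#[\w/]+")


def hashtags(description):
    """Return the hashtags found in a task description, in order."""
    return HASHTAG.findall(description)


def filter_by_hashtag(descriptions, tag):
    """Keep only the descriptions that carry the given hashtag."""
    return [d for d in descriptions if tag in hashtags(d)]


tasks = [
    "#data_science learned about precision/recall",
    "finished the blogpost",
    "#issues/7 found a bug, new github issue",
    "#issues/7 closed github issue 7",
]
matches = filter_by_hashtag(tasks, "#issues/7")
for description in matches:
    print(description)
```

<p>Comparing against the extracted tags, rather than using a plain substring test, avoids <code>#issues/7</code> accidentally matching a task tagged <code>#issues/70</code>.</p>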
<h3>1.18. Exercise 10: Add a field for effectiveness</h3>
<p>Next it's time to add a score to your efforts. Add a field to the <code>ToDo</code> class called effective,
where you can record how effective a task was. An effective score is a number between 1 and 5,
1 being the lowest and 5 being the highest.</p>
<p>Armed with this data, you should be able to answer
(i) what is taking up most of your time?
(ii) which activities are the most effective for your growth</p></div></description><guid>https://chicagopython.github.io/posts/trackcoder/</guid><pubDate>Thu, 16 May 2019 04:48:53 GMT</pubDate></item><item><title>ChiPy Chipmunk Project Night </title><link>https://chicagopython.github.io/posts/chipy-chipmunks/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><h3>Project Night Purpose</h3>
<p>Many people assume data scientists spend all day visualizing data and making impressive predictive models. While this isn’t untrue, the luckiest and most productive data scientists spend a lot of their time communicating. They communicate their model results - as well as their assumptions and limitations when making their models and doing analysis - in a way that is digestible to their stakeholders and colleagues. </p>
<p>Tonight’s project is aimed towards that aspect of communication. You will be asked to make assumptions as a team - particularly as they pertain to this problem and what the stakeholders need. There are no exactly correct assumptions or answers for this project night. There may be assumptions and answers that clearly don’t have evidence to support them, but do not feel bogged down by getting the “right” answer.</p>
<p>Most importantly - have fun. While this project night covers serious concepts, it is ridiculously silly and meant to be taken with a bit of lighthearted exploration and plenty of opportunities to make mistakes.</p>
<h3>Setting up your environment</h3>
<p>This project is contained in a Jupyter notebook and assumes you have Python 3 installed on your machine. If this is your first project night, we recommend creating a folder for the project night repo: <code>mkdir chipy_projects &amp;&amp; cd chipy_projects</code>. If you already have the project night repository on your machine, go to that directory and pull from master.</p>
<p>If you are using Linux or OS X, run the following to create a new virtualenv:</p>
<pre class="code literal-block"><span></span><span class="n">python3</span> <span class="o">-</span><span class="n">m</span> <span class="n">venv</span> <span class="n">chipmunk</span>
<span class="k">source</span> <span class="n">chipmunk</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">activate</span>
</pre>
<p>On Windows, run the following:</p>
<pre class="code literal-block"><span></span><span class="n">python3</span> <span class="o">-</span><span class="n">m</span> <span class="n">venv</span> <span class="n">chipmunk</span>
<span class="n">chipmunk</span><span class="err">\</span><span class="n">Scripts</span><span class="err">\</span><span class="n">activate</span>
</pre>
<h3>Getting the project</h3>
<p>The project is in the ChiPy project night repo. If you do not have the repository already, run </p>
<pre class="code literal-block"><span></span><span class="n">git</span> <span class="n">clone</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">chicagopython</span><span class="o">/</span><span class="n">CodingWorkshops</span><span class="p">.</span><span class="n">git</span>
</pre>
<p>Now we will:</p>
<p>Go to the project:</p>
<pre class="code literal-block"><span></span><span class="n">cd</span> <span class="n">CodingWorkshops</span><span class="o">/</span><span class="n">problems</span><span class="o">/</span><span class="n">data_science</span><span class="o">/</span><span class="n">chipmunks</span>
</pre>
<p>Install the packages we need into our environment:</p>
<pre class="code literal-block"><span></span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">r</span> <span class="n">requirements</span><span class="p">.</span><span class="n">txt</span>
</pre>
<p>Run our jupyter notebook server for the project:</p>
<pre class="code literal-block"><span></span><span class="n">jupyter</span> <span class="n">notebook</span>
</pre>
<h3>Have fun!</h3></div></description><category>analytics</category><category>EDA</category><category>pandas</category><category>statistics</category><guid>https://chicagopython.github.io/posts/chipy-chipmunks/</guid><pubDate>Thu, 18 Apr 2019 23:00:00 GMT</pubDate></item><item><title>Chipmunks Data Science</title><link>https://chicagopython.github.io/posts/chipmunks-data-science/</link><dc:creator>Chicago Python User Group</dc:creator><description><div><div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Project-Night-Purpose">Project Night Purpose<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Project-Night-Purpose">¶</a></h2><p>Many people assume data scientists spend all day visualizing data and making impressive predictive models. While this isn’t untrue, the luckiest and most productive data scientists spend a lot of their time communicating. They communicate their model results - as well as their assumptions and limitations when making their models and doing analysis - in a way that is digestible to their stakeholders and colleagues.</p>
<p>Tonight’s project is aimed towards that aspect of communication. You will be asked to make assumptions as a team - particularly as they pertain to this problem and what the stakeholders need. There are no exactly correct assumptions or answers for this project night. There may be assumptions and answers that clearly don’t have evidence to support them, but do not feel bogged down by getting the “right” answer.</p>
<p>Most importantly - have fun. While this project night covers serious concepts, it is ridiculously silly and meant to be taken with a bit of lighthearted exploration and plenty of opportunities to make mistakes.</p>
<h2 id="Oh,-no!-We've-had-a-data-crash.">Oh, no! We've had a data crash.<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Oh,-no!-We've-had-a-data-crash.">¶</a></h2><p>As ChiPy leadership was preparing for <a href="https://us.pycon.org/2019/">PyCon</a> at the end of this month, they found that the dataset on our infamous <em>ChiPy chipmunks</em> had disappeared. While they transition from Oracle to Postgres, the leadership team has enlisted your help as data scientists to analyze some salvaged chipmunk data. The PyCon organizers had a few questions about coding in Chicago, ChiPy, and chipmunks that need answers. We will get to those questions shortly, but first let's get to the data.</p>
<h3 id="Reading-in-the-Data">Reading in the Data<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Reading-in-the-Data">¶</a></h3><p>The salvaged chipmunk dataset is <code>chipmunks.csv</code>. The wonderful <a href="https://pandas.pydata.org/">pandas</a> library, built on <a href="http://www.numpy.org/">numpy</a>, will let the team read in the data.</p>
<h4 style="color: #f92828;text-decoration: underline;">ChiPy Check-in</h4><p>Now is a good time to check in with the team. Is anyone familiar with <code>pandas</code> and <code>numpy</code>? Discuss with your team what these libraries are, what they allow data scientists to do, and then decide on what <code>pandas</code> function will read in our data.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Setting-up-your-environment">Setting up your environment<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Setting-up-your-environment">¶</a></h3><p>This project is contained in a Jupyter notebook and assumes you have Python 3 installed on your machine. If this is your first project night, we recommend creating a folder for the project night repo: <code>mkdir chipy_projects &amp;&amp; cd chipy_projects</code>. If you already have the project night repository on your machine, go to that directory and pull from master.</p>
<p>If you are using Linux or OS X, run the following to create a new virtualenv:</p>
<pre><code>python3 -m venv chipmunk
source chipmunk/bin/activate</code></pre>
<p>On Windows, run the following:</p>
<pre><code>python3 -m venv chipmunk
chipmunk\Scripts\activate</code></pre>
<h3 id="Getting-the-project">Getting the project<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Getting-the-project">¶</a></h3><p>The project is in the ChiPy project night repo. If you do not have the repository already, run</p>
<pre><code>git clone https://github.com/chicagopython/CodingWorkshops.git</code></pre>
<p>Now we will:</p>
<p>Go to the project:</p>
<pre><code>cd CodingWorkshops/problems/data_science/chipmunks</code></pre>
<p>Install the packages we need into our environment:</p>
<pre><code>pip install -r requirements.txt</code></pre>
<p>Run our jupyter notebook server for the project:</p>
<pre><code>jupyter notebook</code></pre>
<p>The dataset is in the <code>csv</code> file <code>chipmunks.csv</code>.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="nn">sns</span>
<span class="o">%</span><span class="k">matplotlib</span> inline
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Read in the data</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Exploring-the-data">Exploring the data<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Exploring-the-data">¶</a></h3><p>We need to be familiar with our data before we can answer questions about ChiPy and our chipmunks. Let's start with some questions we would ask of <em>any</em> dataset:</p>
<ul>
<li>How many rows are in this dataset? What does each row represent?</li>
<li>What does the data look like? Check the first 5 rows</li>
<li>Is there missing data? If so, how much is missing?</li>
<li>What columns are categorical?</li>
<li>How many unique values are in each column?</li>
</ul>
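<p>If the team gets stuck, each of these checks maps onto a short <code>pandas</code> call. Here is a minimal sketch on a toy DataFrame; the <code>species</code> and <code>weight</code> columns and all of the values are invented for illustration, so substitute the DataFrame you read from the CSV.</p>

```python
import numpy as np
import pandas as pd

# Toy stand-in for the chipmunk data; column names and values are invented.
df = pd.DataFrame(
    {
        "species": ["eastern", "least", "eastern", None],
        "weight": [90.0, np.nan, 110.0, 95.0],
        "ChiPy": [1, 0, 1, 1],
    }
)

n_rows = len(df)             # number of rows
first_rows = df.head()       # first 5 rows (only 4 exist here)
missing = df.isna().sum()    # count of missing values per column
dtypes = df.dtypes           # text-typed columns are often categorical
n_unique = df.nunique()      # distinct non-missing values per column

print(n_rows)
print(missing)
print(n_unique)
```

<p>A numeric column with only a few distinct values, such as a 0/1 flag, is usually categorical as well, which is why the unique counts are worth reading alongside the dtypes.</p>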
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Check the number of rows</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## See first 5 rows of data</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Check for missing data</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Check for categorical data and unique number of values</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Was-there-missing-data?">Was there missing data?<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Was-there-missing-data?">¶</a></h3><p>We will keep exploring the data and start answering questions soon, but first let's address missing data (if there is any). What columns have missing data? What kind of data is missing?</p>
<h4 style="color: #f92828;text-decoration: underline;">ChiPy Check-in</h4><p>This is a great point for discussion. If there is missing data, why might it be missing? Discuss some possible reasons with your team and decide on a reason that makes sense.</p>
<p><a href="https://en.wikipedia.org/wiki/Imputation_(statistics)">Imputation</a> is the process of replacing missing data with some estimated value. The process can be as complicated (or simple) as you would like it to be! Given the possible reason for our missing data, what is an acceptable imputation?</p>
<p>Impute any missing data in your dataset and note what assumptions you made as a team. If you are not sure how to replace data in <code>pandas</code>, feel free to use Google like a proper data scientist.</p>
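<p>As one simple possibility, missing numeric values can be filled with the column median and missing categories with the most frequent value. The sketch below uses a toy DataFrame with invented column names and values. Median and mode are only one choice of imputation, and whether they are appropriate depends on why your team believes the data is missing.</p>

```python
import numpy as np
import pandas as pd

# Toy data; the column names and the median/mode strategy are assumptions.
df = pd.DataFrame(
    {
        "weight": [90.0, np.nan, 110.0, 100.0],
        "species": ["eastern", "least", None, "eastern"],
    }
)

# Numeric column: fill gaps with the median, which is robust to outliers.
df["weight"] = df["weight"].fillna(df["weight"].median())

# Categorical column: fill gaps with the most frequent value (the mode).
df["species"] = df["species"].fillna(df["species"].mode()[0])

# Total number of missing cells left after imputation.
remaining = int(df.isna().sum().sum())
print(df)
print(remaining)
```

<p>Re-running the missing-value count afterwards, as in the last two lines, is a quick way to confirm the replacement worked.</p>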
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Replace any missing data here</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Check your data for missing values to see if it worked!</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Stakeholder-Questions">Stakeholder Questions<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Stakeholder-Questions">¶</a></h2><h3 id="Question-#1">Question #1<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Question-#1">¶</a></h3><p>The great folks at PyCon want to know all about ChiPy and our chipmunks. They have heard that <strong>ChiPy is an inclusive and open community</strong>. Can we support that claim with our data? Given that the <code>ChiPy</code> column takes a value of <code>1</code> for a ChiPy chipmunk and a value of <code>0</code> for chipmunks not in ChiPy, start to explore this question.</p>
<p>Some ideas to get you started:</p>
<ul>
<li>Are chipmunks of different species represented in ChiPy?</li>
<li>Are chipmunks of different sizes represented in ChiPy?</li>
<li>Are chipmunks of different careers represented in ChiPy?</li>
<li>Are spotted and not spotted chipmunks represented in ChiPy?</li>
</ul>
<h4 style="color: #f92828;text-decoration: underline;">ChiPy Check-in</h4><p>There are no right or wrong answers here, only well supported or poorly supported ones! Discuss as a group the aspects of the data you have looked at and if it constitutes enough evidence to justify an answer.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Exploration of species</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Exploration of size</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Exploration of careers</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Exploration of spotted vs non-spotted</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Question-#2">Question #2<a class="anchor-link" href="https://chicagopython.github.io/posts/chipmunks-data-science/#Question-#2">¶</a></h3><p>The word on the street at PyCon is that chipmunks that live in Chicago enjoy coding more than those that don't. Is this not true? Given that the <code>chicago</code> column takes a value of <code>1</code> for chipmunks that live in Chicago and a value of <code>0</code> for chipmunks that do not, explore this question.</p>
<ul>
<li>Visualize the distributions of <code>coding_enjoyment</code> for chipmunks that do and do not live in Chicago.</li>
<li>Come up with a way to test our question.</li>
</ul>
<h4 style="color: #f92828;text-decoration: underline;">ChiPy Check-in</h4><p>Coming up with a proper way to test stakeholder questions can be an art form as well as a science. We have imported a few statistical tests below that may (or may not) be appropriate for our question. First consider a way to frame our question as something to <em>disprove</em> (for those familiar with the jargon, let's construct a <a href="https://en.wikipedia.org/wiki/Null_hypothesis">null hypothesis</a>), then conduct a test that may disprove it. Reading the documentation for the imported tests below may prove to be very helpful!</p>
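<p>To make the idea concrete, here is a sketch of one possible test, not necessarily the right one for this dataset. It states the null hypothesis "both groups have the same mean coding enjoyment" and applies <code>scipy.stats.ttest_ind</code> to two synthetic samples. The numbers are generated for illustration only; with the real data the two arrays would come from <code>coding_enjoyment</code> split on the <code>chicago</code> column. The 0.05 threshold is a common convention, assumed here.</p>

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic scores, invented for illustration only.
rng = np.random.default_rng(0)
chicago = rng.normal(loc=7.0, scale=1.0, size=200)
elsewhere = rng.normal(loc=5.0, scale=1.0, size=200)

# Null hypothesis: both groups have the same mean coding enjoyment.
# equal_var=False selects Welch's t-test, which does not assume that the
# two groups share the same variance.
statistic, p_value = ttest_ind(chicago, elsewhere, equal_var=False)

alpha = 0.05  # assumed significance threshold
reject_null = bool(alpha > p_value)
print(statistic, p_value, reject_null)
```

<p>If the p-value falls below the chosen threshold, the observed difference would be unlikely under the null hypothesis, so the team can reject it; otherwise the test is inconclusive rather than proof that the groups are the same.</p>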
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="k">import</span> <span class="n">ttest_ind</span><span class="p">,</span> <span class="n">levene</span><span class="p">,</span> <span class="n">chisquare</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1">## Beautiful plot</span>
</pre></div>