<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Becky Jaimes</title>
<link href="http://blog.datasommelier.com/atom.xml" rel="self"/>
<link href="http://blog.datasommelier.com"/>
<updated>2019-06-21T17:35:36.933Z</updated>
<id>http://blog.datasommelier.com</id>
<author>
<name>Becky Jaimes</name>
</author>
<entry>
<title>Farewell Segment</title>
<link href="http://blog.datasommelier.com/Farewell-Segment"/>
<id>http://blog.datasommelier.com/Farewell-Segment</id>
<updated>2018-11-07T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>Written three years ago after submitting my resignation to Segment</p>
</summary>
<content type="html"><p>Written three years ago after submitting my resignation to Segment</p>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*jnZ-kgZbQs7NoreHiERAcw.jpeg" alt=""></p>
<p>After the most amazing year and a half of working at Segment, I have submitted my resignation.</p>
<p>I joined Segment when the company was still operating out of a live-work loft in SOMA, and we were only 14 employees. During my interview, we all gathered in the office kitchen and made dinner together. There was an incredible family vibe that was unlike any work environment I’ve experienced before. The company started to grow. Everyone was very proud of breaking 1 million dollars in annual revenue — a growth mark that many SaaS startups dream of achieving.</p>
<p>My role when joining the company was to start the Enterprise Success team. Until that point, Jake (Segment’s first employee) and often Peter (the CEO) had been handling all support requests. They were closing more than 100 support tickets per day! Half of the company’s revenue depended on a handful of enterprise customers and Segment had no one on the payroll to proactively offer dedicated service to those VIP customers. My role was to make sure Segment delivered on promises, and that enterprise customers were happy and used the product to its full potential.</p>
<p>During the past year and a half, we worked long, fun hours. Segment grew to a company of more than 50 employees and surpassed the 10 million dollars in annual revenue mark. I am very proud of the work accomplished. Segment holds a very special place in my heart — the caliber of its employees and customers, and the elegance of its product is hard to surpass. With my resignation letter, I also compiled a list of my favorite moments:</p>
<h3 id="that-time-i-told-our-biggest-client-to-not-use-our-product">That time I Told Our Biggest Client To Not Use Our Product</h3>
<p>My favorite moment from my time working at Segment was the time I asked Peter for guidance and advice as we discovered that our largest paying client — who represented a significant portion of our annual revenue — could have a better and cheaper experience processing their data outside of Segment (for their particular use case). Without hesitation, Peter was 100% on board about exposing the client to the alternative solution outside of Segment that would make their data process cheaper and faster. Once I told the client their alternatives, they explained that even though there might be better fitting services for their use case, they would still choose Segment because “by having Segment in charge of that part of the business, we have nothing to worry about.”</p>
<h3 id="that-time-we-fired-a-client">That Time We Fired A Client</h3>
<p>I remember the one time we fired one of our few enterprise customers: for being ungrateful and disrespectful, but mostly for throwing Jake and me under the bus.</p>
<h3 id="that-time-my-colleagues-were-being-amazing-every-day-really-">That Time My Colleagues Were Being Amazing (every day, really)</h3>
<p>I remember all those times that the engineering team would add instructions to a chat robot that completed some of my mundane tasks to make my workflow more efficient and exciting. And that the robot replied to me using Air Traffic Control lingo — as a reminder of one of my previous lives. I also remember the time off the coast of Mexico when (with other Segment ladies) we jumped off a boat to surf out in the open sea for the first time. And when we showed up that Sunday morning to the start line of the SF marathon and ran 13.1 miles for the first time. And when some of my colleagues showed up to the airport without knowing I was taking them to countries they’d never been to. And when we celebrated the 2014 holidays sitting on the empty floor of the new office, eating pizza and cheering with cans of soda. And when Peter explained the change in equity terms allowing employees ten years (instead of 90 days) to exercise our options.</p>
<p>For all these moments and people I’ve met, I am grateful. After spending the past three months building projects and javascripting under the direction of the fine folks at HackReactor, I am ready to take some time off from work. Until I welcome my first child into the world (in less than two months!), you will find me hacking on some project remotely from my living room.</p>
</content>
</entry>
<entry>
<title>What database should I use?</title>
<link href="http://blog.datasommelier.com/Document-Databases"/>
<id>http://blog.datasommelier.com/Document-Databases</id>
<updated>2017-10-07T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>Use a relational database like PostgreSQL unless you have a specific reason not to</p>
</summary>
<content type="html"><p>Use a relational database like PostgreSQL unless you have a specific reason not to</p>
<h3 id="when-to-use-mongodb-">When to use MongoDB?</h3>
<p>Never.</p>
<p>MongoDB is a document-oriented database. These types of databases do not have the concept of tables or predefined columns; instead, document databases store data as a collection of records without schemas, structure, or relations to other records. In MongoDB’s case, it stores the data under collections (instead of tables) where each row of the collection is one JSON object (without columns).</p>
<p>You’ll find many articles talking you out of using MongoDB, so let’s skip that part. Something important to note is that many of those articles are written by users that dislike MongoDB for the wrong reasons: they’ve tried to use a document database when dealing with relational data. And even though it is somewhat forgivable because schema design is hard, using document storage when dealing with relational data is a poor architectural choice (at least most of the time).</p>
<p>There are other articles telling you that you shouldn’t use MongoDB because it is a poorly engineered storage option run on marketing and not on technical merit. Security and support issues have been part of MongoDB since its inception, making it a poor choice even if you do need a document store. Those are the better reasons to stay away from MongoDB.</p>
<p>Because most data is relational, use cases of databases that are designed to store, represent, or work with documents tend to be very specific. A search engine that is constantly executing queries that require more than matching an exact string is one example:</p>
<p><img src="https://i.imgur.com/QOcPX3r.png" alt=""></p>
<p>Notice how the results of querying “longboarding in bolinas” yields results that don’t exactly match the input string. If we had to join data from different tables (or even from the same table) in order to get those results, it could take a long time; instead, we can do a faster search by querying a document database that stores a ‘denormalized’ version of the data which could then be used to create normalized search indexes.</p>
<p>An example of the indexing process is when variants of a term are mapped to a single, standardized form. For instance, mapping <em>“surfing”</em> and <em>“surfer”</em> to <em>“surf”</em>, ignoring the distinction between them. This is why, when using normalized search indexes, a query for <em>“longboarding in bolinas”</em> ([“longboard”, “bolinas”]) could yield its exact match <em>“longboarding in bolinas”</em> and also other results including similar phrases such as <em>“Continue on to Bolinas, which has stellar longboard waves…”</em>.</p>
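<p>A toy sketch of that normalization step (hand-rolled for illustration only; a real search engine would use a proper stemmer and a much larger stopword list, and the suffix rules below are made up):</p>

```python
# Crude query normalization: lowercase, drop stopwords, and strip a
# few common suffixes so variants map to one standardized form.
STOPWORDS = {"in", "on", "the", "to", "a"}
SUFFIXES = ("ing", "er")

def normalize(query):
    tokens = []
    for tok in query.lower().split():
        if tok in STOPWORDS:
            continue
        for suffix in SUFFIXES:
            # only strip when a reasonable stem remains
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        tokens.append(tok)
    return tokens

print(normalize("longboarding in bolinas"))  # ['longboard', 'bolinas']
print(normalize("surfing") == normalize("surfer"))  # True
```

<p>Both <em>surfing</em> and <em>surfer</em> normalize to <em>surf</em>, which is exactly why the index can match phrases that do not contain the literal query string.</p>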
<p>Full-text search engine queries could get very complex. The main record and all of its related searchable data should be in a single object — even if some of that related data would <em>normally</em> be part of a separate record. This means faster results because with one lookup you are able to get the applicable record instead of following relations.</p>
<h3 id="which-are-the-best-document-stores-">Which are the best document stores?</h3>
<p>It depends on your requirements. Cases where document storage is the right solution are almost always really specific, like the previous search example. It also just so happens that PostgreSQL can be used as a document store, because you can store and work with arbitrarily nested data (through a JSONB column and the associated json_* functions). It’s not something commonly used but it can be useful in some cases like for API scraping (so that you can store the original API response even if its exact format may vary over time, outside of your control or knowledge).</p>
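<p>A minimal sketch of that store-the-raw-response idea, using SQLite and Python&#39;s json module as stand-ins (PostgreSQL&#39;s JSONB column and its json_* functions do the equivalent server-side, with indexing on top):</p>

```python
import json
import sqlite3

# Keep each scraped API response as an opaque JSON blob and parse it
# on read. The table schema stays stable even if the response format
# drifts over time, outside of our control.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, body TEXT)")

doc = {"user": "becky", "tags": ["surf", "data"]}
conn.execute("INSERT INTO responses (body) VALUES (?)", (json.dumps(doc),))

row = conn.execute("SELECT body FROM responses").fetchone()
print(json.loads(row[0])["tags"])  # ['surf', 'data']
```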
<p>A better question would be something like “If I want to build a search engine, what are the best solutions?” in which case you might consider ElasticSearch (or the underlying tech, Lucene). The usual answer to “What database should I use?” is <strong>“Use an RDBMS like PostgreSQL unless you have a specific reason not to”</strong>.</p>
</content>
</entry>
<entry>
<title>Promises</title>
<link href="http://blog.datasommelier.com/promises"/>
<id>http://blog.datasommelier.com/promises</id>
<updated>2017-01-10T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>What are promises</p>
</summary>
<content type="html"><p>What are promises</p>
<p>A promise is an object that represents an async operation (process) that’s still in progress. The purpose of Promises is to make it easier to write sequential code.</p>
<pre><code class="lang-javascript">Promise.try(() =&gt; {
return taskOne();
}).then((result) =&gt; {
return taskTwo();
}).then((result) =&gt; {
return taskThree();
});
</code></pre>
<p>In the above snippet, even though there are technically 6 promises involved, only three implement real functionality:</p>
<ul>
<li>Promise.try(…) -&gt; Tracks the result of taskOne below</li>
<li>taskOne -&gt; <strong>Implements real functionality</strong></li>
<li>.then(…) -&gt; Propagate the <code>result</code></li>
<li>taskTwo -&gt; <strong>Implements real functionality</strong></li>
<li>.then(…) -&gt; Propagate the <code>result</code></li>
<li>taskThree -&gt; <strong>Implements real functionality</strong></li>
</ul>
<p>The order of execution:</p>
<pre><code>--- tick 0 ---
Promise.try called
taskOne called
a Promise is returned from taskOne
return that Promise from the Promise.try callback, Promise.try will now track it
first .then called (note, .then itself, *not* the callback) with a callback as argument
second .then called, with a callback as argument
--- tick 10 ---
taskOne Promise resolves
Promise.try Promise resolves
first .then callback called
taskTwo called
a Promise is returned from taskTwo
return that Promise from the first .then callback, the first .then will now track it
--- tick 20 ---
taskTwo Promise resolves
first .then Promise resolves
second .then callback called
taskThree called
a Promise is returned from taskThree
return that Promise from the second .then callback, the second .then will now track it
--- tick 30 ---
taskThree Promise resolves
second .then Promise resolves
--- the end ---
</code></pre><h3 id="what-s-the-difference-with-callbacks-">What’s the difference with callbacks?</h3>
<p>The main difference between Promises and callbacks is that a Promise is a placeholder object, available immediately, that represents the in-progress operation; with a callback, you have to specify the next behaviour up front.</p>
<p>Nested callbacks also allow for writing sequential async code, but are more difficult to use right.
Similar functionality of the above snippet as a series of nested callbacks:</p>
<pre><code class="lang-javascript">taskOne((err, result) =&gt; {
taskTwo((err, result) =&gt; {
taskThree((err, result) =&gt; {
// ...
});
});
});
</code></pre>
<p>With error-first callbacks you have to <strong>immediately</strong> specify the next code to run when a task completes. This is not true for Promises. With Promises you are able to attach behaviour at any time including after the Promise has already resolved. This allows for a lot of improvements:</p>
<ul>
<li>Compose things easier - <code>Promise.all([promise1, promise2])</code> will produce a combined Promise that resolves when <code>promise1</code> and <code>promise2</code> both resolve. <code>Promise.all</code> wires up things internally, which it can do because it can specify behaviour at any time on <code>promise1</code> and <code>promise2</code>, including its own behaviour.</li>
<li>Propagate errors automatically - Because behaviour can be attached at any time, you can return a Promise from another Promise’s <code>.then</code> callback, and then that other Promise can wire up error propagation automatically.</li>
<li>The returned value isn’t the actual result, but it can be returned and passed around just as easily.</li>
</ul>
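<p>For consistency with the Python snippets elsewhere on this blog, here is a rough sketch of the composition point using asyncio, where awaitables play the promise role and <code>asyncio.gather</code> is the loose analogue of <code>Promise.all</code>:</p>

```python
import asyncio

async def task_one():
    return "one"

async def task_two():
    return "two"

async def main():
    # Like Promise.all: run both tasks and get a combined result
    # once both of them have resolved.
    return await asyncio.gather(task_one(), task_two())

print(asyncio.run(main()))  # ['one', 'two']
```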
</content>
</entry>
<entry>
<title>Never forget to take your vitamins again</title>
<link href="http://blog.datasommelier.com/zapier-slack-smartthings"/>
<id>http://blog.datasommelier.com/zapier-slack-smartthings</id>
<updated>2016-09-21T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>SmartThings, IFTTT, Zapier, and Slack to the rescue</p>
</summary>
<content type="html"><p>SmartThings, IFTTT, Zapier, and Slack to the rescue</p>
<p>When I worked at Segment, we built bots to complete annoying and mundane tasks. Wishing for something similar at home, I built a series of triggers so that Slackbot records the last time I took my vitamins and then reminds me when I need to take them again.</p>
<p>To begin, I used a <a href="https://shop.smartthings.com/#!/products/samsung-smartthings-multipurpose-sensor">multipurpose sensor</a> from SmartThings (used to alert when a door or window opens or closes) and, using <a href="https://smile.amazon.com/Duck-282116-Printed-Inches-Single/dp/B00CJGF6EI/ref=sr_1_1?ie=UTF8&amp;qid=1474492591&amp;sr=8-1&amp;keywords=duck+tape+owl">this</a> awesome duck tape, I taped it to the vitamin bottle:</p>
<p><img src="https://cldup.com/BSgh11Oowo.png" alt=""></p>
<p>Next, each time the sensor closes, an IFTTT recipe triggers a Slack message telling me that I just took a vitamin:</p>
<p><img src="https://cldup.com/MYzYmI2Fg2.png" alt=""></p>
<p>I also had to set another trigger to a Google sheet that records the time the bottle closed (since I wasn’t able to get IFTTT to properly send a <code>/remind</code> message to Slack):</p>
<p><img src="https://cldup.com/1BVZAatq_m.png" alt=""></p>
<p>Lastly, I connected Zapier to that Google sheet and created a Zap to Slack with a <code>/remind me to take my vitamins in 12 hours</code> message:</p>
<p><img src="https://cldup.com/UW2pYYnmzn.png" alt=""></p>
<p>If I get the Slack message when I’m not near my vitamins, I can just hit the <code>remind me in 15 minutes</code> option and get a new notification later.</p>
<p><img src="https://cldup.com/siwQI7CEIT.png" alt=""></p>
<p>This reminder has been so helpful that I am going to replicate a similar project using a Raspberry Pi and a sensor connected to the dog food container. That way both my husband and I know when the dawgs have been fed and unfortunately for them, prevent double dinners.</p>
</content>
</entry>
<entry>
<title>Newborn percentiles</title>
<link href="http://blog.datasommelier.com/Newborn-measurements"/>
<id>http://blog.datasommelier.com/Newborn-measurements</id>
<updated>2016-05-28T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>How did the nurse know that my son was born in the xth percentile?</p>
</summary>
<content type="html"><p>How did the nurse know that my son was born in the xth percentile?</p>
<p>When my son was born earlier this month, the nurse measured his length, head circumference, and weight. She then informed us that x percentage of male babies were born below his height. How did she know that?</p>
<p>The concept of a normal distribution is helpful for understanding this percentage. In a normal distribution, most observations are grouped around the middle; more precisely, 34% fall between the mean and +1 standard deviation, and another 34% between the mean and -1 standard deviation. Also, in a normal distribution, 50% of the observations fall below the mean, and the other half above it.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Empirical_Rule.PNG/675px-Empirical_Rule.PNG" alt=""></p>
<p>This is useful when calculating other interesting statistics, like how many individuals we expect to fall below or above a particular point. If a baby is one standard deviation above the mean, then we know that the baby is at the 84th percentile (50% for all the individuals below the mean + 34% for all the individuals between the mean and one standard deviation).</p>
<p>In statistics, a z-score is equivalent to how many standard deviations your observation is from the mean. A positive score means that your observation lays above the mean, and a negative score means below the mean. If we have the following input:</p>
<ul>
<li>A child measures 53.34 cm of length at birth</li>
<li>The mean of length in newborn boys is <a href="http://www.who.int/childgrowth/standards/LFA_boys_0_13_zscores.pdf?ua=1">49.9 cm</a></li>
<li>The standard deviation of length in newborn boys is <a href="http://www.who.int/childgrowth/standards/LFA_boys_0_13_zscores.pdf?ua=1">1.89 cm</a></li>
</ul>
<p>(The above mean and standard deviation are taken from data at the World Health Organization - the leading organization promoting these percentiles)</p>
<p>Then we can calculate the z-score, like so:</p>
<p>[My Measure - Mean] / Standard Deviation = z-score</p>
<p>So for us, we get a z-score of 1.82 ([53.34 - 49.9] / 1.89 = 1.82). Looking at a <a href="http://math.arizona.edu/~rsims/ma464/standardnormaltable.pdf">z-score chart</a>, we gather that 1.82 corresponds to 0.9656, which means that 96.56% of the population fall at or below this measure.</p>
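<p>The table lookup can also be done in code. This sketch computes the percentile directly from the error function in Python&#39;s math module (the normal CDF is 0.5 * (1 + erf(z / sqrt(2)))), using the WHO numbers quoted above; expect small rounding differences versus a printed z-table:</p>

```python
import math

measure = 53.34  # newborn length in cm
mean = 49.9      # WHO mean length for newborn boys
sd = 1.89        # WHO standard deviation

# z-score: how many standard deviations above the mean
z = (measure - mean) / sd
# normal CDF via the error function
percentile = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(z, 2))  # 1.82
print(round(percentile * 100, 1))
```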
<p>According to the World Health Organization, the data for some measurements, such as weight, is <a href="http://www.who.int/childgrowth/training/module_c_interpreting_indicators.pdf">not normally distributed</a>: it is right-skewed (the right side, big babies, is longer than the left). Cross-checking this reference against a different data set from a different source (babies born only in the US) reveals a contradiction: that data set is symmetrical (a little kurtotic, but symmetrical). Which makes me wonder about the WHO’s percentile obsession.</p>
<p>So they’ll tell you that your baby is in the 34% in weight, 96% of height, and 90% of brain circumference. So what? Does it all mean that my baby is tall, dark, handsome and wicked smart and because of that he is going to be accepted into daycare and then into Harvard, and then become POTUS? Sure, mom. Especially if you weigh him after a big meal. </p>
</content>
</entry>
<entry>
<title>Python and SQLAlchemy</title>
<link href="http://blog.datasommelier.com/Python-and-SQLAlchemy.md"/>
<id>http://blog.datasommelier.com/Python-and-SQLAlchemy.md</id>
<updated>2016-04-20T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>Interacting with a SQL database using SQLAlchemy</p>
</summary>
<content type="html"><p>Interacting with a SQL database using SQLAlchemy</p>
<p>There are various packages and libraries that interact with SQL (SQLAlchemy, the Django ORM, peewee, SQLObject, Storm, Pony), but the most popular, and probably the best and most beautiful Python library ever written, is SQLAlchemy.
SQLAlchemy lets you write raw SQL directly and also operate on SQL tables essentially as Python classes: read and write data using SQL, then treat that data as a Python container.</p>
<h2 id="how-to-use-sqlalchemy-">How to use SQLAlchemy:</h2>
<ul>
<li>Create an engine</li>
<li>Define tables</li>
<li>Add instances</li>
<li>Query</li>
</ul>
<h3 id="create-an-engine-">Create an engine:</h3>
<p>Create an engine that establishes a connection with the database and sets up the framework for making SQL requests. Like so:</p>
<pre><code>from sqlalchemy import create_engine
engine = create_engine(&#39;sqlite:///phonebook.db&#39;, echo=True)
</code></pre><h3 id="define-tables-">Define tables:</h3>
<p>Create a base class, <code>Base</code> (using <code>declarative_base</code>), from which our table classes inherit; SQLAlchemy refers to it to create the table schemas. Use the <code>__tablename__</code> member to name the table, and the <code>__repr__</code> method to control how a table row prints. Like so:</p>
<pre><code>from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Phonebook(Base):
    __tablename__ = &quot;friends&quot;
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)
    phone = Column(String)

    def __repr__(self):
        return &quot;&lt;Phonebook(name=&#39;%s&#39;, email=&#39;%s&#39;, phone=&#39;%s&#39;)&gt;&quot; \
            % (self.name, self.email, self.phone)
</code></pre><p>Now that we have the engine, use the metadata <code>create_all</code> method to actually create the tables in the database; run this line interactively (for example, in IPython) or append it to the .py file we are building here:</p>
<pre><code>Base.metadata.create_all(engine)
</code></pre><h3 id="add-instances-">Add instances:</h3>
<p>Establish a session in order to interact with the database, and create instances and rows so that we can add them to the session:</p>
<pre><code># create tables
Base.metadata.create_all(engine)

# establish a session
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()

# create a list with our friends&#39; numbers
friends_numbers = [
    {&#39;name&#39;: &#39;James Rodriguez&#39;,
     &#39;email&#39;: &#39;[email protected]&#39;,
     &#39;phone&#39;: &#39;123-456-7890&#39;},
    {&#39;name&#39;: &#39;Pibe Valderrama&#39;,
     &#39;email&#39;: &#39;[email protected]&#39;,
     &#39;phone&#39;: &#39;111-222-3333&#39;},
    {&#39;name&#39;: &#39;Farid Mondragon&#39;,
     &#39;email&#39;: &#39;[email protected]&#39;,
     &#39;phone&#39;: &#39;222-333-4444&#39;}
]

# create an instance; use ** to unpack the key-value pairs
james = Phonebook(**friends_numbers[0])

# example: how to add a single row
# add james&#39;s record to the session
session.add(james)
print(session.new)

# example: how to remove a pending row
# drop james&#39;s record from the session
session.expunge(james)
print(session.new)

# add all records from the friends_numbers list into the db
phonebook_rows = [Phonebook(**p) for p in friends_numbers]
session.add_all(phonebook_rows)
session.commit()
</code></pre><h3 id="query-the-database">Query the database</h3>
<p>Now you can go ahead and run any query you’d like using the <code>session</code> object’s query methods. Like so:</p>
<pre><code># query the database
# count how many records we have
print(session.query(Phonebook).count())

# find James Rodriguez&#39;s record using filter_by
friend = session.query(Phonebook).filter_by(name=&#39;James Rodriguez&#39;)
result = list(friend)
print(result)
</code></pre><p>For a list of all available methods, check the SQLAlchemy documentation <a href="http://docs.sqlalchemy.org/en/latest/orm/query.html">here</a>.
The entire code for this article is available <a href="https://github.com/TheBecky/python_awesomeness/blob/master/python_sql.py">here</a>. Send me a note at [email protected] if you have any issues with the above code and would like help debugging.</p>
</content>
</entry>
<entry>
<title>When to ship without analytics</title>
<link href="http://blog.datasommelier.com/when-to-ship-without-analytics"/>
<id>http://blog.datasommelier.com/when-to-ship-without-analytics</id>
<updated>2016-01-21T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>Never.</p>
</summary>
<content type="html"><p>Never.</p>
</content>
</entry>
<entry>
<title>Python and the Twitter API</title>
<link href="http://blog.datasommelier.com/Python-and-Twitter"/>
<id>http://blog.datasommelier.com/Python-and-Twitter</id>
<updated>2015-12-30T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>Run basic text analytics on a collection of Tweets</p>
</summary>
<content type="html"><p>Run basic text analytics on a collection of Tweets</p>
<p>The Twitter API allows us to collect data based on a particular keyword, user handle, or hashtag. We can filter those results by language and geographic location, and request either the latest tweets, the most popular ones, or tweets within a specified timeframe. Also, the API provides some tools to run basic text analytics, such as finding entities in a particular collection of tweets. To collect and run basic text analytics on a collection of tweets:</p>
<ul>
<li>Connect to the API</li>
<li>Search Tweets</li>
<li>Extract Tweet entities</li>
</ul>
<h2 id="connect-to-the-twitter-api-">Connect to the Twitter API:</h2>
<p>To establish a successful connection with the API, we first need consumer and OAuth tokens from a newly created <a href="https://apps.twitter.com/">Twitter App</a>, like so:</p>
<pre><code>import twitter

def oauth_login():
    CONSUMER_KEY = &#39;0pJAid2aqrRtgwe6dKvPAerp8b&#39;
    CONSUMER_SECRET = &#39;rfrb0fbGgCvpf1sgtRd7OsrBCT7p8DPWuB8WpeLJ9LfelJW8sp&#39;
    OAUTH_TOKEN = &#39;15648766-lxT6QBxMgp69gFDsef6FI4KqporqqvOyd4U5t4qD7&#39;
    OAUTH_TOKEN_SECRET = &#39;KYdm5roVu2xMlo5asSDfs1LGHwYBRL0Gxi5IkXMRZLsuJR2&#39;
    auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                               CONSUMER_KEY, CONSUMER_SECRET)
    twitter_api = twitter.Twitter(auth=auth)
    return twitter_api
</code></pre><p>Make sure you substitute the tokens with your own credentials, as the above keys are just placeholders and will not work.</p>
<h2 id="search-tweets">Search Tweets</h2>
<p>Let’s define two functions: one to find Tweets based on our criteria, and another to select the most retweeted Tweets from that collection:</p>
<pre><code># Language = English
# Result type = recent or popular
# count = how many tweets to return
# geocode = &quot;latitude,longitude,radius&quot;; for example,
# &quot;37.781157,-122.398720,10mi&quot; limits the results to
# tweets within 10 miles of San Francisco
# Other parameters not used but available: until, since_id, max_id
def twitter_search(twitter_api, q, max_results=200, **kw):
    search_results = twitter_api.search.tweets(
        q=q, count=100, lang=&#39;en&#39;, result_type=&#39;recent&#39;,
        geocode=&quot;37.781157,-122.398720,10mi&quot;, **kw)
    statuses = search_results[&#39;statuses&#39;]
    max_results = min(100, max_results)
    for _ in range(10):
        try:
            next_results = search_results[&#39;search_metadata&#39;][&#39;next_results&#39;]
        except KeyError:
            break
        kwargs = dict([kv.split(&#39;=&#39;)
                       for kv in next_results[1:].split(&quot;&amp;&quot;)])
        search_results = twitter_api.search.tweets(**kwargs)
        statuses += search_results[&#39;statuses&#39;]
        if len(statuses) &gt; max_results:
            break
    return statuses

def find_popular_tweets(twitter_api, statuses, retweet_threshold=30):
    return [status
            for status in statuses
            if status[&#39;retweet_count&#39;] &gt; retweet_threshold]

twitter_api = oauth_login()
q = &quot;surf&quot;
search_results = twitter_search(twitter_api, q, max_results=20)
popular_tweets = find_popular_tweets(twitter_api, search_results)
print(&quot;************POPULAR TWEETS*********************&quot;)
print(&quot;************(TWEET, RETWEET COUNT)*************&quot;)
print(&quot;&quot;)
for tweet in popular_tweets:
    print(tweet[&#39;text&#39;].encode(&#39;utf8&#39;), tweet[&#39;retweet_count&#39;])
</code></pre><h2 id="extract-tweet-entities">Extract Tweet entities</h2>
<p>Create a function that extracts Tweet entities such as user handles, hashtags, and URLs from our collection.</p>
<pre><code>from collections import Counter

def extract_tweet_entities(statuses):
    if len(statuses) == 0:
        return [], [], []
    screen_names = [user_mention[&#39;screen_name&#39;]
                    for status in statuses
                    for user_mention in status[&#39;entities&#39;][&#39;user_mentions&#39;]]
    hashtags = [hashtag[&#39;text&#39;]
                for status in statuses
                for hashtag in status[&#39;entities&#39;][&#39;hashtags&#39;]]
    urls = [url[&#39;expanded_url&#39;]
            for status in statuses
            for url in status[&#39;entities&#39;][&#39;urls&#39;]]
    return screen_names, hashtags, urls

# provides the entity and the frequency for each collection
def get_common_tweet_entities(statuses, entity_threshold=3):
    tweet_entities = [e
                      for status in statuses
                      for entity_type in extract_tweet_entities([status])
                      for e in entity_type]
    c = Counter(tweet_entities).most_common()
    return [(k, v)
            for (k, v) in c
            if v &gt;= entity_threshold]
twitter_api = oauth_login()
q = &quot;surf&quot;
search_results = twitter_search(twitter_api, q, max_results=20)
popular_tweets = find_popular_tweets(twitter_api, search_results)
statuses = twitter_search(twitter_api, q)
screen_names, hashtags, urls = extract_tweet_entities(statuses)
print (&quot;************Tweet Entities*********************&quot;)
print (&quot;***********************************************&quot;)
print (&quot;&quot;)
print (&quot;*************FIRST 50 HANDLES******************&quot;)
print (json.dumps(screen_names[0:50], indent=1))
print (&quot;&quot;)
print (&quot;*************FIRST 50 HASHTAGS****************&quot;)
print (json.dumps(hashtags[0:50], indent=1))
print (&quot;&quot;)
print (&quot;**************FIRST 50 URLs*******************&quot;)
print (json.dumps(urls[0:50], indent=1))
print (&quot;&quot;)
common_entities = get_common_tweet_entities(search_results)
print (&quot;*****************************************************&quot;)
print (&quot;************Most Common Entities*********************&quot;)
print (common_entities)
print (&quot;*****************************************************&quot;)
print (&quot;*****************************************************&quot;)
# calculate average number of words per tweet:
def analyze_tweet_content(statuses):
    if len(statuses) == 0:
        print (&quot;No statuses to analyze&quot;)
        return

    def average_words(statuses):
        total_words = sum([ len(s.split()) for s in statuses ])
        return 1.0*total_words/len(statuses)

    status_texts = [ status[&#39;text&#39;] for status in statuses ]
    screen_names, hashtags, urls = extract_tweet_entities(statuses)
    words = [ w
              for t in status_texts
              for w in t.split() ]
    print (&quot;Average words per tweet:&quot;, average_words(status_texts))
</code></pre><p>To find the account that owns the most favorited tweets of a person:</p>
<pre><code>analyze_tweet_content(search_results)
print (&quot;*****************************************************&quot;)
print (&quot;*****************************************************&quot;)

# Analyze a user&#39;s favorite tweets (insert the user&#39;s handle):
analyze_favorites(twitter_api, &quot;theebecky&quot;)
</code></pre><p>The full code can be found <a href="">here</a></p>
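The counting-and-filtering step inside <code>get_common_tweet_entities</code> can be exercised offline. The sketch below uses hand-built statuses (mock data, not real API output) that carry only the fields the extraction code actually reads:

```python
from collections import Counter

# Minimal stand-ins for Twitter API statuses: only the 'entities'
# fields that the extraction code reads.
def make_status(names, tags):
    return {'entities': {
        'user_mentions': [{'screen_name': n} for n in names],
        'hashtags': [{'text': t} for t in tags],
        'urls': []}}

statuses = [make_status(['surfline'], ['surf']),
            make_status(['surfline'], ['surf', 'waves']),
            make_status(['magicseaweed'], ['surf'])]

# Flatten every entity from every status into one list...
entities = [e for s in statuses
              for bucket in ([m['screen_name'] for m in s['entities']['user_mentions']],
                             [h['text'] for h in s['entities']['hashtags']])
              for e in bucket]

# ...then keep only the entities that appear at least twice.
common = [(k, v) for k, v in Counter(entities).most_common() if v >= 2]
print(common)  # [('surf', 3), ('surfline', 2)]
```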
</content>
</entry>
<entry>
<title>When to sign in with your Google account</title>
<link href="http://blog.datasommelier.com/Sign-in-with-Google-Github-or-Twitter"/>
<id>http://blog.datasommelier.com/Sign-in-with-Google-Github-or-Twitter</id>
<updated>2015-10-10T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>Or Twitter, Facebook or any other account that provides the service</p>
</summary>
<content type="html"><p>Or Twitter, Facebook or any other account that provides the service</p>
<p>Some sites require passwords to have at least 6 characters, others require at least one number and one special character (but make sure that character is not a \ ), and others require both a capital and a lower case letter… and make sure that the capital letter is not at the beginning or at the end of the password. Oh, also make sure your password is unique to the site, and that you change it often. </p>
<p><img src="http://i.imgur.com/fUMJpI1.png" alt=""></p>
<p>Inevitably, we all end up creating just one password. One really good password, you know, something like <code>pAssw0rd!</code> and then use that REALLY good password for every site. And never change it. </p>
<p>One problem with this technique is that if one site’s database is compromised, access to every other site is compromised. We’ve heard that a thousand times already; to most of us, however, the annoyance outweighs the risk. Note: if you click on “Forgot password” on any site and get an email with your actual password, run away from that service - your data is being handled by amateurs. If the site is able to send an email with the password you created, it means that the site is either saving your password in plain text, or storing a simple encryption of your password. </p>
<p>Storing plaintext passwords, or passwords encrypted with a simple key, is the easiest way for developers to handle passwords, and also the easiest for an attacker to exploit. A safer way for sites to deal with passwords is to store a salted version of the <a href="https://crackstation.net/hashing-security.htm">hashed password</a> or to use a dedicated password-hashing algorithm such as <a href="http://bcrypt.sourceforge.net/">bcrypt</a>, both of which are more involved processes from the developer’s standpoint. <strong>The safest way for sites to deal with passwords is to not deal with passwords at all</strong>. </p>
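To make the distinction concrete, here is a minimal sketch of salted password hashing using only Python’s standard library (PBKDF2 stands in for bcrypt here, and the function names are illustrative, not any particular site’s code):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # A fresh random salt per user means two users with the same
    # password end up with different stored hashes.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    # Recompute with the stored salt and compare in constant time;
    # the plaintext password is never stored anywhere.
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password('pAssw0rd!')
print(verify_password('pAssw0rd!', salt, digest))  # True
print(verify_password('letmein', salt, digest))    # False
```

A site that stores only the salt and digest cannot email you your password back, which is exactly the point.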
<p>If you use your one really good password on every site, you will be better off logging in with Google, Github, Twitter or any other site that offers a login service, in particular if your really good password is also what you use to log in to any of those sites. The service is called OAuth. OAuth exchanges security keys between the new site and Google, Github or Twitter, and they handle the login without any new passwords being created. Sometimes the new site will ask your permission to pull additional data from Google into their site, usually your email address and avatar. Creepier services ask for permission to access your contacts and demographic and geographic data, and others ask your permission to act on your behalf within the service (i.e. Tweet for you). </p>
<p>You may wonder why Google, Github, or Twitter would offer a free service to handle logins through OAuth. Altruism? One motivation is to tie as many services as possible to your account in order to reduce the chances of you churning and closing that account. For Google, the motivation might be more geared towards tracking which apps you use and how often. A newer and more interesting group of OAuth providers is the banking industry. Now you can use your American Express login credentials to sign up for AirBnb, granting AirBnb access to some data from your Amex account. When using OAuth it is always important to check what data access or permissions you are granting to the new site.</p>
<p><img src="http://i.imgur.com/LxgZ0Ry.png" alt=""></p>
<p>Unless the new site is asking for too much data, for me, the benefits of OAuth almost always outweigh the risk of compromising my REALLY good password. Even if that means granting access to additional basic data points so that Google can have a clearer picture of “Who you are to Google.”</p>
<p>To review the list of authorized applications on each of your accounts go to: <a href="https://security.google.com/settings/security/permissions">Google</a>, <a href="https://github.com/settings/applications">Github</a>, and <a href="https://twitter.com/settings/applications">Twitter</a>.</p>
<p>Coming next: Who you are to Google </p>
</content>
</entry>
<entry>
<title>Build a simple website and implement Segment</title>
<link href="http://blog.datasommelier.com/Implement-Segment"/>
<id>http://blog.datasommelier.com/Implement-Segment</id>
<updated>2015-07-07T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>5 steps. Completion time 45 minutes</p>
</summary>
<content type="html"><p>5 steps. Completion time 45 minutes</p>
<p>Note: The tutorial that follows is intended to encourage active learning for a non technical audience.</p>
<h2 id="1-create-an-account">1. Create an Account</h2>
<p>Create an account in GitHub (<a href="https://github.com/new">https://github.com/new</a>) or sign in. If you are creating a new account, make sure to confirm your email address (go to your email and click on the confirm button on the message from Github). Some files will not update later if you haven’t confirmed your email address.</p>
<h2 id="2-create-a-repo">2. Create a Repo</h2>
<p>Create a new repository (click on the black + sign on the top right) and call it <code>yourusername.github.io</code>. Since we are using Github pages to host this site, it is very important to follow the exact naming convention <strong>[YOUR USERNAME].github.io</strong>. In my case, my new repository is called <code>beckyjaimes.github.io</code>. Check the box that initializes the repo with a README file, and click on the “Create repository” button.
<img src="https://cloudup.com/ch3rFGk5D3P+" alt=""></p>
<h2 id="3-create-an-index-html-file">3. Create an index.html file</h2>
<p>Create a new file by clicking on the blue + symbol next to your new repository’s name and call it <code>index.html</code> (this file will contain the code for your website’s default page).</p>
<p><img src="https://cloudup.com/c0cFaR-2xwe+" alt=""></p>
<p>On that <code>index.html</code> file, insert the following text, and commit that file by clicking on the green “Commit” button:</p>
<pre><code class="lang-html">&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Home Questionaire&lt;/title&gt;
&lt;!--Placeholder for Google Analytics Snippet --&gt;
&lt;!--Placeholder for MixPanel Snippet --&gt;
&lt;!--Placeholder for KissMetrics Snippet --&gt;
&lt;!--Placeholder for Segment Snippet --&gt;
&lt;!--Placeholder for index.css link --&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;br&gt;
&lt;h1&gt;What is your favorite place to travel?&lt;/h1&gt;
&lt;p&gt;I am building a directory of the sweetest travel destinations.&lt;/p&gt;
&lt;br&gt;&lt;br&gt;
&lt;form name=&quot;travel&quot; onsubmit=&quot;identify(event)&quot;&gt;
What is your favorite travel destination?
&lt;br&gt;&lt;br&gt;
&lt;input name=&quot;destination&quot; required=&quot;&quot; size=&quot;81&quot; type=&quot;text&quot;/&gt;
&lt;br&gt;&lt;br&gt;&lt;br&gt;
Any reccomendations (cool things to do, places to visit or restaurants to eat)?
&lt;br&gt;&lt;br&gt;
&lt;textarea cols=&quot;81&quot; name=&quot;details&quot; required=&quot;&quot; rows=&quot;10&quot;&gt;&lt;/textarea&gt;
&lt;br&gt;&lt;br&gt;
Name:
&lt;input name=&quot;fullname&quot; required=&quot;&quot; size=&quot;75&quot; type=&quot;text&quot;/&gt;
&lt;br&gt;&lt;br&gt;
Email:
&lt;input name=&quot;email&quot; required=&quot;&quot; size=&quot;75&quot; type=&quot;email&quot;/&gt;
&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;input name=&quot;submit&quot; type=&quot;submit&quot; value=&quot;submit&quot;/&gt;
&lt;br&gt;&lt;br&gt;
&lt;/form&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>To make sure everything is working so far, navigate to yourusername.github.io in a separate window. You should be able to see your new website (it might take a few minutes). It should look similar to this:</p>
<p><img src="https://cloudup.com/c7XrmvISWfJ+" alt=""></p>
<h2 id="4-implement-segment">4. Implement Segment</h2>
<p>Create an account on <a href="https://segment.com">Segment</a> and a new project by selecting “new project” from the top right dropdown menu:
<img src="https://cloudup.com/czuUe3gBBTj+" alt="">
Click on the “install a library in your site or mobile app” option (or select it by clicking on the 6th icon called “setup project”)</p>
<p><img src="https://cloudup.com/co7DtWt5aOO+" alt=""></p>
<p>Copy the full snippet from the box on your segment dashboard, go back to your <code>index.html</code> file, select the pencil to edit, and replace the line that reads <code>&lt;!--Placeholder for Segment Snippet --&gt;</code> (mine is line 8) with the Segment snippet.</p>
<p>Your new file should look like this:
<img src="https://cloudup.com/ckTqBYoDp7L+" alt=""></p>
<h2 id="5-identify-users-and-track-an-event">5. Identify Users and Track an Event</h2>
<p>Identify those users that submit a destination. To do so, we’ll create a little function that captures the input from the form and sends some of that data as traits in an identify call. While we are at it, let’s also send an event called “destination submitted” using the .track method. We are going to do that in the same index.html file, so if you haven’t committed your changes yet (if you have, just open the index.html file to edit it again), scroll down to the line after the <code>&lt;/form&gt;</code> tag (mine is line 38) and insert the following text.</p>
<pre><code>&lt;script type=&quot;text/javascript&quot;&gt;
function identify(e){
    e.preventDefault();
    var form = e.target;
    var email = form[&quot;email&quot;].value;
    var fullname = form[&quot;fullname&quot;].value;
    var destination = form[&quot;destination&quot;].value;
    var details = form[&quot;details&quot;].value;
    var user = {email: email, name: fullname, destination: destination, details: details};
    analytics.identify(email, {email: email, name: fullname});
    analytics.track(&#39;destination submitted&#39;, user, function() {
        window.location.href = &quot;&quot;;
    });
}
&lt;/script&gt;
</code></pre><p>Your index.html file should contain code similar to the one found <a href="https://gist.github.com/TheBecky/76eaa40b43a82a900c82">here</a> (with your project’s Segment write key in line 10). Commit the changes. </p>
<p>Go back to your website (refresh to make sure all changes have been loaded) and submit a travel recommendation form.
<img src="https://cloudup.com/cdWxA9BwOdr+" alt=""></p>
<p>Go to the debugger on Segment’s dash. You should be able to see the following 3 calls:</p>
<p><img src="https://cloudup.com/c245KeijI5E+" alt=""></p>
<p>Enable Google Analytics, MixPanel and KissMetrics. You might have to create an account on each of those services to get the site IDs, tokens or keys you need to enter in Segment’s dash.</p>
<p>If something is not right and you triple checked that your code is similar to the one <a href="https://gist.github.com/TheBecky/76eaa40b43a82a900c82">here</a> (and you just created a Github account only to complete this project), make sure that you have confirmed your email address on that Github email.</p>
<h2 id="6-bonus-step">6. Bonus Step</h2>
<p>Right now the debugger is showing “/” on our page call. This is because we are on the home page - if we were on another page, the debugger would show /blog or /countries or whatever path you are on. Wouldn’t it be nice if the page call had the actual title of the page instead of the path? To do that, you just have to replace <code>analytics.page()</code> in your index.html file (mine is in line 11) with <code>analytics.page(document.title)</code>.</p>
<p>Your calls should now look like this:</p>
<p><img src="https://cloudup.com/cbaLOR5Jjjb+" alt=""></p>
<p>Congratulations! You just finished Module 1 of this training series.</p>
<p>One thing worth pointing out is that, for simplicity’s sake, we didn’t follow Segment’s best practice of assigning a random userID to each user. Instead, we sent our user’s email address as the userID, and that will not make Sperandio proud. The reason sending the email is less than ideal is that users can have many email addresses, and we don’t want to count one user multiple times.</p>
<p>In module 2 we will integrate with Optimizely and Keen. Optimizely is an interesting one, as it’s the only integration that requires its snippet to also be loaded into our page.</p>
</content>
</entry>
<entry>
<title>ASCII, Unicode and alphanumeric lists</title>
<link href="http://blog.datasommelier.com/ASCII-and-Unicode"/>
<id>http://blog.datasommelier.com/ASCII-and-Unicode</id>
<updated>2015-02-23T00:00:00.000Z</updated>
<author>
<name>Becky Jaimes</name>
</author>
<summary type="html"><p>Intricacies of native sorting</p>
</summary>
<content type="html"><p>Intricacies of native sorting</p>
<p>Using native sorting functions to order lists that contain alphanumeric strings will <em>most of the time</em> yield unexpected results. For example, when using a program to order a list of flights in ascending order, the results are most likely to resemble the following:</p>
<table>
<thead>
<tr>
<th style="text-align:left">Flight</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">UA1006</td>
</tr>
<tr>
<td style="text-align:left">UA1008</td>
</tr>
<tr>
<td style="text-align:left">UA1009</td>
</tr>
<tr>
<td style="text-align:left">UA289</td>
</tr>
<tr>
<td style="text-align:left">UA327</td>
</tr>
<tr>
<td style="text-align:left">UA3444</td>
</tr>
<tr>
<td style="text-align:left">UA5445</td>
</tr>
<tr>
<td style="text-align:left">UA570</td>
</tr>
</tbody>
</table>
<p>Most text editors serve as binary-to-symbol translators. In this case, the above table was typed in a text editor and stored somewhere as a set of ASCII, Unicode or other binary values. Sorting that representation yields a “lexical” sort, in which strings are compared character by character, so “1006” comes before “289”.</p>
<h6 id="what-are-ascii-values-">What are ASCII Values?</h6>
<p>ASCII stands for American Standard Code for Information Interchange. ASCII is a character encoding that assigns a 7-bit binary value (a string of seven 0s and 1s) to alphabetic characters, digits and some special characters: most keys on the standard keyboard.</p>
<p>ASCII code was first introduced in the early 1960s and was based on a system used in telegraphy. In ASCII, numbers <strong>0</strong> to <strong>9</strong> are assigned smaller values than upper case letters <strong>A</strong> to <strong>Z</strong>, and upper case letters are assigned smaller values than lower case letters <strong>a</strong> to <strong>z</strong>.</p>
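These orderings are easy to check in Python, where <code>ord</code> returns a character’s code (identical to its ASCII value for the first 128 characters). A quick sketch:

```python
# Digits carry smaller codes than upper case letters, which carry
# smaller codes than lower case letters, so a plain sort follows suit.
print(ord('0'), ord('A'), ord('a'))        # 48 65 97
print(sorted(['banana', 'Zebra', '42']))   # ['42', 'Zebra', 'banana']

# The 7-bit pattern behind a single character:
print(format(ord('A'), '07b'))             # 1000001
```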
<p>This type of sorting is efficient for machines and produces reasonable results for the English alphabet. However, since ASCII is American-centric, it presents limitations for Asian languages and other languages that contain characters outside of the English alphabet.</p>
<h6 id="unicode">Unicode</h6>
<p>Unicode is a newer (1987) standard for representing characters. It also stores characters in binary form, but rather than being limited to 7 bits per character, Unicode defines a code space with room for more than 1,000,000 possible characters (as opposed to ASCII’s 128-character limit). Currently, Unicode includes characters from most popular languages and preserves the order of the first 128 ASCII characters. Although many languages were originally based on ASCII (Python, R, C and C++), Unicode has been adopted in most recent technologies and is supported by most languages.</p>
<p>When sorting lists of strings that contain alphanumeric characters, in most programs it is necessary to define a custom sorting algorithm that does not rely on the order of individual characters’ codes - sometimes called “natural sorting.” A comprehensive list of natural sorting algorithms in most languages can be found <a href="http://rosettacode.org/wiki/Natural_sorting">here</a>.</p>
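As a sketch of the idea, the flight list from the table above can be natural-sorted in Python by splitting each string into alphabetic and numeric runs and comparing the numeric runs as integers:

```python
import re

def natural_key(s):
    # Split into runs of digits and non-digits, converting digit runs
    # to int so "289" compares as the number 289, not as a string.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', s)]

flights = ['UA289', 'UA327', 'UA570', 'UA1006',
           'UA1008', 'UA1009', 'UA3444', 'UA5445']

print(sorted(flights))                   # lexical: 'UA1006' before 'UA289'
print(sorted(flights, key=natural_key))  # natural: 'UA289' first
```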
</content>
</entry>
</feed>