📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

zakiindra/applied-ml

 
 


applied-ml

Curated papers, articles, and blogs on data science & machine learning in production. ⚙️


Figuring out how to implement your ML project? Learn how other organizations did it:

  • How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
  • What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
  • Why it works: the science behind it, with research, literature, and references 📂
  • What real-world results were achieved (so you can better assess ROI ⏰💰📈)

P.S., Want a summary of ML advancements? 👉ml-surveys

P.P.S., Looking for guides and interviews on applying ML? 👉applyingML

Table of Contents

  1. Data Quality
  2. Data Engineering
  3. Data Discovery
  4. Feature Stores
  5. Classification
  6. Regression
  7. Forecasting
  8. Recommendation
  9. Search & Ranking
  10. Embeddings
  11. Natural Language Processing
  12. Sequence Modelling
  13. Computer Vision
  14. Reinforcement Learning
  15. Anomaly Detection
  16. Graph
  17. Optimization
  18. Information Extraction
  19. Weak Supervision
  20. Generation
  21. Audio
  22. Validation and A/B Testing
  23. Model Management
  24. Efficiency
  25. Ethics
  26. Infra
  27. MLOps Platforms
  28. Practices
  29. Team Structure
  30. Fails

Data Quality

  1. Monitoring Data Quality at Scale with Statistical Modeling Uber
  2. An Approach to Data Quality for Netflix Personalization Systems Netflix
  3. Automating Large-Scale Data Quality Verification (Paper) Amazon
  4. Meet Hodor — Gojek’s Upstream Data Quality Tool Gojek
  5. Reliable and Scalable Data Ingestion at Airbnb Airbnb
  6. Data Management Challenges in Production Machine Learning (Paper) Google
  7. Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) Facebook

Data Engineering

  1. Zipline: Airbnb’s Machine Learning Data Management Platform Airbnb
  2. Sputnik: Airbnb’s Apache Spark Framework for Data Engineering Airbnb
  3. Unbundling Data Science Workflows with Metaflow and AWS Step Functions Netflix
  4. How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand DoorDash
  5. Revolutionizing Money Movements at Scale with Strong Data Consistency Uber
  6. Zipline - A Declarative Feature Engineering Framework Airbnb
  7. Automating Data Protection at Scale, Part 1 (Part 2) Airbnb
  8. Real-time Data Infrastructure at Uber Uber

Data Discovery

  1. Amundsen — Lyft’s Data Discovery & Metadata Engine Lyft
  2. Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) Lyft
  3. Amundsen: One Year Later Lyft
  4. Using Amundsen to Support User Privacy via Metadata Collection at Square Square
  5. Discovery and Consumption of Analytics Data at Twitter Twitter
  6. Democratizing Data at Airbnb Airbnb
  7. Databook: Turning Big Data into Knowledge with Metadata at Uber Uber
  8. Turning Metadata Into Insights with Databook Uber
  9. Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) Netflix
  10. Exploring Data @ Netflix (Code) Netflix
  11. DataHub: A Generalized Metadata Search & Discovery Tool (Code) LinkedIn
  12. DataHub: Popular Metadata Architectures Explained LinkedIn
  13. How We Improved Data Discovery for Data Scientists at Spotify Spotify
  14. How We’re Solving Data Discovery Challenges at Shopify Shopify
  15. Nemo: Data discovery at Facebook Facebook
  16. Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) Apache
  17. Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code) WeWork

Feature Stores

  1. Introducing Feast: An Open Source Feature Store for Machine Learning (Code) Gojek
  2. Feast: Bridging ML Models and Data Gojek
  3. Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression DoorDash
  4. Building Riviera: A Declarative Real-Time Feature Engineering Framework DoorDash
  5. Michelangelo Palette: A Feature Engineering Platform at Uber Uber
  6. Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory Uber
  7. Distributed Time Travel for Feature Generation Netflix
  8. Fact Store at Scale for Netflix Recommendations Netflix
  9. The Architecture That Powers Twitter's Feature Store Twitter
  10. Building the Activity Graph, Part 2 (Feature Storage Section) LinkedIn
  11. Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed LinkedIn
  12. Accelerating Machine Learning with the Feature Store Service Condé Nast
  13. Building a Feature Store Monzo Bank
  14. Zipline: Airbnb’s Machine Learning Data Management Platform Airbnb
  15. ML Feature Serving Infrastructure at Lyft Lyft
  16. Butterfree: A Spark-based Framework for Feature Store Building (Code) QuintoAndar

Classification

  1. High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) LinkedIn
  2. Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) Walmart
  3. Deep Learning: Product Categorization and Shelving Walmart
  4. Large-scale Item Categorization for e-Commerce (Paper) DianPing, eBay
  5. Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) NAVER
  6. Categorizing Products at Scale Shopify
  7. Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) Google
  8. Discovering and Classifying In-app Message Intent at Airbnb Airbnb
  9. How We Built the Good First Issues Feature GitHub
  10. Teaching Machines to Triage Firefox Bugs Mozilla
  11. Testing Firefox More Efficiently with Machine Learning Mozilla
  12. Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) Microsoft
  13. Prediction of Advertiser Churn for Google AdWords (Paper) Google
  14. Scalable Data Classification for Security and Privacy (Paper) Facebook
  15. Uncovering Online Delivery Menu Best Practices with Machine Learning DoorDash
  16. Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging DoorDash

Regression

  1. Using Machine Learning to Predict Value of Homes On Airbnb Airbnb
  2. Using Machine Learning to Predict the Value of Ad Requests Twitter
  3. Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) Netflix
  4. Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment DoorDash

Forecasting

  1. Forecasting at Uber: An Introduction Uber
  2. Engineering Extreme Event Forecasting at Uber with RNN Uber
  3. Transforming Financial Forecasting with Data Science and Machine Learning at Uber Uber
  4. Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code) Uber
  5. Under the Hood of Gojek’s Automated Forecasting Tool Gojek
  6. BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) Google
  7. Retraining Machine Learning Models in the Wake of COVID-19 DoorDash
  8. Managing Supply and Demand Balance Through Machine Learning DoorDash
  9. Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) Atlassian
  10. Greykite: A flexible, intuitive, and fast forecasting library LinkedIn

Recommendation

  1. Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) Amazon
  2. Temporal-Contextual Recommendation in Real-Time (Paper) Amazon
  3. P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) Amazon
  4. Recommending Complementary Products in E-Commerce Push Notifications (Paper) Alibaba
  5. Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) Alibaba
  6. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) Alibaba
  7. TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) Alibaba
  8. PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) Alibaba
  9. SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) Alibaba
  10. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) Alibaba
  11. Controllable Multi-Interest Framework for Recommendation (Paper) Alibaba
  12. MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) Alibaba
  13. ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) Alibaba
  14. Session-based Recommendations with Recurrent Neural Networks (Paper) Telefonica
  15. How 20th Century Fox uses ML to predict a movie audience (Paper) 20th Century Fox
  16. Deep Neural Networks for YouTube Recommendations YouTube
  17. Personalized Recommendations for Experiences Using Deep Learning TripAdvisor
  18. E-commerce in Your Inbox: Product Recommendations at Scale (Paper) Yahoo
  19. Powered by AI: Instagram’s Explore recommender system Facebook
  20. Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2) Netflix
  21. Learning a Personalized Homepage Netflix
  22. Artwork Personalization at Netflix Netflix
  23. To Be Continued: Helping you find shows to continue watching on Netflix Netflix
  24. Calibrated Recommendations (Paper) Netflix
  25. Marginal Posterior Sampling for Slate Bandits (Paper) Netflix
  26. Food Discovery with Uber Eats: Recommending for the Marketplace Uber
  27. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations Uber
  28. How Music Recommendation Works — And Doesn’t Work Spotify
  29. Music recommendation at Spotify Spotify
  30. Recommending Music on Spotify with Deep Learning Spotify
  31. For Your Ears Only: Personalizing Spotify Home with Machine Learning Spotify
  32. Reach for the Top: How Spotify Built Shortcuts in Just Six Months Spotify
  33. Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) Spotify
  34. Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) Spotify
  35. The Evolution of Kit: Automating Marketing Using Machine Learning Shopify
  36. Using Machine Learning to Predict what File you Need Next (Part 1) Dropbox
  37. Using Machine Learning to Predict what File you Need Next (Part 2) Dropbox
  38. Personalized Recommendations in LinkedIn Learning LinkedIn
  39. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) LinkedIn
  40. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) LinkedIn
  41. Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED) LinkedIn
  42. Building a Heterogeneous Social Network Recommendation System LinkedIn
  43. How TikTok recommends videos #ForYou ByteDance
  44. A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) Twitter
  45. Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) Twitter
  46. Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) Google
  47. Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) Google
  48. Self-supervised Learning for Large-scale Item Recommendations (Paper) Google
  49. Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) Google
  50. Personalized Channel Recommendations in Slack Slack
  51. Learning to Rank Recommendations with the k-Order Statistic Loss (Paper) Google
  52. Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) ByteDance
  53. Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) Tencent
  54. Using AI to Help Health Experts Address the COVID-19 Pandemic Facebook
  55. A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) Home Depot
  56. Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) Ikea
  57. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) Pinterest
  58. How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads Pinterest
  59. Multi-task Learning for Related Products Recommendations at Pinterest Pinterest
  60. Improving the Quality of Recommended Pins with Lightweight Ranking Pinterest
  61. Advertiser Recommendation Systems at Pinterest Pinterest
  62. Personalized Cuisine Filter Based on Customer Preference and Local Popularity DoorDash
  63. How We Built a Matchmaking Algorithm to Cross-Sell Products Gojek
  64. On YouTube's Recommendation System YouTube

Search & Ranking

  1. Amazon Search: The Joy of Ranking Products (Paper, Video, Code) Amazon
  2. Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) Amazon
  3. Semantic Product Search (Paper) Amazon
  4. QUEEN: Neural query rewriting in e-commerce (Paper) Amazon
  5. Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper) Amazon
  6. Seasonal relevance in e-commerce search (Paper) Amazon
  7. How Lazada Ranks Products to Improve Customer Experience and Conversion Lazada
  8. Using Deep Learning at Scale in Twitter’s Timelines Twitter
  9. Machine Learning-Powered Search Ranking of Airbnb Experiences Airbnb
  10. Applying Deep Learning To Airbnb Search (Paper) Airbnb
  11. Managing Diversity in Airbnb Search (Paper) Airbnb
  12. Improving Deep Learning for Airbnb Search (Paper) Airbnb
  13. Ranking Relevance in Yahoo Search (Paper) Yahoo
  14. An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) Etsy
  15. Learning to Rank Personalized Search Results in Professional Networks (Paper) LinkedIn
  16. Entity Personalized Talent Search Models with Tree Interaction Features (Paper) LinkedIn
  17. In-session Personalization for Talent Search (Paper) LinkedIn
  18. The AI Behind LinkedIn Recruiter Search and recommendation systems LinkedIn
  19. Learning Hiring Preferences: The AI Behind LinkedIn Jobs LinkedIn
  20. Quality Matches Via Personalized AI for Hirer and Seeker Preferences LinkedIn
  21. Understanding Dwell Time to Improve LinkedIn Feed Ranking LinkedIn
  22. Ads Allocation in Feed via Constrained Optimization (Paper, Video) LinkedIn
  23. Talent Search and Recommendation Systems at LinkedIn (Paper) LinkedIn
  24. Understanding Dwell Time to Improve LinkedIn Feed Ranking LinkedIn
  25. AI at Scale in Bing Microsoft
  26. Query Understanding Engine in Traveloka Universal Search Traveloka
  27. The Secret Sauce Behind Search Personalisation Gojek
  28. Food Discovery with Uber Eats: Building a Query Understanding Engine Uber
  29. Neural Code Search: ML-based Code Search Using Natural Language Queries Facebook
  30. Bayesian Product Ranking at Wayfair Wayfair
  31. COLD: Towards the Next Generation of Pre-Ranking System (Paper) Alibaba
  32. Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) Alibaba
  33. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) Alibaba
  34. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) Alibaba
  35. Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) Alibaba
  36. Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search Alibaba
  37. Understanding Searches Better Than Ever Before (Paper) Google
  38. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) Pinterest
  39. Driving Shopping Upsells from Pinterest Search Pinterest
  40. GDMix: A Deep Ranking Personalization Framework (Code) LinkedIn
  41. Bringing Personalized Search to Etsy Etsy
  42. Building a Better Search Engine for Semantic Scholar Allen Institute for AI
  43. Query Understanding for Natural Language Enterprise Search (Paper) Salesforce
  44. How We Used Semantic Search to Make Our Search 10x Smarter Tokopedia
  45. Powering Search & Recommendations at DoorDash DoorDash
  46. Things Not Strings: Understanding Search Intent with Better Recall DoorDash
  47. Query Understanding for Surfacing Under-served Music Content (Paper) Spotify
  48. How We Built A Context-Specific Bidding System for Etsy Ads Etsy
  49. Query2vec: Search query expansion with query embeddings GrubHub
  50. Embedding-based Retrieval in Facebook Search (Paper) Facebook
  51. Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) JD
  52. MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search Baidu
  53. Pre-trained Language Model based Ranking in Baidu Search (Paper) Baidu

Embeddings

  1. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) Alibaba
  2. Embeddings@Twitter Twitter
  3. Listing Embeddings in Search Ranking (Paper) Airbnb
  4. Understanding Latent Style Stitch Fix
  5. Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) LinkedIn
  6. Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) Moshbit
  7. Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) Sears
  8. Machine Learning for a Better Developer Experience Netflix
  9. Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) Google
  10. Personalized Store Feed with Vector Embeddings DoorDash
  11. Embedding-based Retrieval at Scribd Scribd

Natural Language Processing

  1. Abusive Language Detection in Online User Content (Paper) Yahoo
  2. How Natural Language Processing Helps LinkedIn Members Get Support Easily LinkedIn
  3. Building Smart Replies for Member Messages LinkedIn
  4. DeText: A deep NLP Framework for Intelligent Text Understanding (Code) LinkedIn
  5. Smart Reply: Automated Response Suggestion for Email (Paper) Google
  6. Gmail Smart Compose: Real-Time Assisted Writing (Paper) Google
  7. SmartReply for YouTube Creators Google
  8. Using Neural Networks to Find Answers in Tables (Paper) Google
  9. A Scalable Approach to Reducing Gender Bias in Google Translate Google
  10. Assistive AI Makes Replying Easier Microsoft
  11. AI Advances to Better Detect Hate Speech Facebook
  12. A State-of-the-Art Open Source Chatbot (Paper) Facebook
  13. A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs Facebook
  14. Deep Learning to Translate Between Programming Languages (Paper, Code) Facebook
  15. Deploying Lifelong Open-Domain Dialogue Learning (Paper) Facebook
  16. Introducing Dynabench: Rethinking the way we benchmark AI Facebook
  17. Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) Facebook
  18. Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) Amazon
  19. How Gojek Uses NLP to Name Pickup Locations at Scale Gojek
  20. Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want Stitch Fix
  21. The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) Baidu
  22. PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) Google
  23. Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) Salesforce
  24. GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) Salesforce
  25. Applying Topic Modeling to Improve Call Center Operations RICOH
  26. WIDeText: A Multimodal Deep Learning Framework Airbnb
  27. How we reduced our text similarity runtime by 99.96% Microsoft
  28. Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models) Facebook

Sequence Modelling

  1. Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) Alibaba
  2. Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) Alibaba
  3. Deep Learning for Electronic Health Records (Paper) Google
  4. Deep Learning for Understanding Consumer Histories (Paper) Zalando
  5. Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) Telefonica
  6. Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) Sutter Health
  7. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) Sutter Health
  8. How Duolingo uses AI in every part of its app Duolingo
  9. Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) Facebook

Computer Vision

  1. Categorizing Listing Photos at Airbnb Airbnb
  2. Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb Airbnb
  3. Powered by AI: Advancing product understanding and building new shopping experiences Facebook
  4. New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) Facebook
  5. Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning Dropbox
  6. How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors Deepomatic
  7. A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) Google
  8. Machine Learning-based Damage Assessment for Disaster Relief (Paper) Google
  9. RepNet: Counting Repetitions in Videos (Paper) Google
  10. Converting Text to Images for Product Discovery (Paper) Amazon
  11. How Disney Uses PyTorch for Animated Character Recognition Disney
  12. Image Captioning as an Assistive Technology (Video) IBM
  13. AI for AG: Production machine learning for agriculture Blue River
  14. AI for Full-Self Driving at Tesla Tesla
  15. On-device Supermarket Product Recognition Google
  16. Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) Google
  17. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) Pinterest
  18. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) Google
  19. Vision-based Price Suggestion for Online Second-hand Items (Paper) Alibaba
  20. Making machines recognize and transcribe conversations in meetings using audio and video Microsoft
  21. An Efficient Training Approach for Very Large Scale Face Recognition (Paper) Alibaba
  22. Identifying Document Types at Scribd Scribd
  23. Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper) Walmart

Reinforcement Learning

  1. Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) Alibaba
  2. Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) Alibaba
  3. Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) Alibaba
  4. Productionizing Deep Reinforcement Learning with Spark and MLflow Zynga
  5. Deep Reinforcement Learning in Production (Part 1) (Part 2) Zynga
  6. Building AI Trading Systems Denny Britz
  7. Reinforcement Learning for On-Demand Logistics DoorDash
  8. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) Alibaba

Anomaly Detection

  1. Detecting Performance Anomalies in External Firmware Deployments Netflix
  2. Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) LinkedIn
  3. Preventing Abuse Using Unsupervised Learning LinkedIn
  4. The Technology Behind Fighting Harassment on LinkedIn LinkedIn
  5. Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) Ant Financial
  6. How Does Spam Protection Work on Stack Exchange? Stack Exchange
  7. Auto Content Moderation in C2C e-Commerce Mercari
  8. Blocking Slack Invite Spam With Machine Learning Slack
  9. Cloudflare Bot Management: Machine Learning and More Cloudflare
  10. Anomalies in Oil Temperature Variations in a Tunnel Boring Machine SENER
  11. Using Anomaly Detection to Monitor Low-Risk Bank Customers Rabobank
  12. Fighting fraud with Triplet Loss OLX Group
  13. Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) Facebook
  14. How AI is getting better at detecting hate speech Part 1, Part 2, Part 3, Part 4 Facebook
  15. Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) Swedbank, Hopsworks

Graph

  1. Building The LinkedIn Knowledge Graph LinkedIn
  2. Retail Graph — Walmart’s Product Knowledge Graph Walmart
  3. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations Uber
  4. AliGraph: A Comprehensive Graph Neural Network Platform (Paper) Alibaba
  5. Scaling Knowledge Access and Retrieval at Airbnb Airbnb
  6. Contextualizing Airbnb by Building Knowledge Graph Airbnb
  7. Traffic Prediction with Advanced Graph Neural Networks DeepMind
  8. SimClusters: Community-Based Representations for Recommendations (Paper, Video) Twitter
  9. Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper) Alibaba
  10. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) Alibaba
  11. JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper) JPMorgan Chase
  12. Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper) Pinterest

Optimization

  1. How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats Uber
  2. Next-Generation Optimization for Dasher Dispatch at DoorDash DoorDash
  3. Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) Lyft
  4. The Data and Science behind GrabShare Carpooling (Part 1) (PAPER NEEDED) Grab
  5. Optimization of Passengers Waiting Time in Elevators Using Machine Learning Thyssen Krupp AG
  6. Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper) Amazon
  7. Optimizing DoorDash’s Marketing Spend with Machine Learning DoorDash

Information Extraction

  1. Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) Rakuten
  2. Information Extraction from Receipts with Graph Convolutional Networks Nanonets
  3. Using Machine Learning to Index Text from Billions of Images Dropbox
  4. Extracting Structured Data from Templatic Documents (Paper) Google
  5. AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video) Amazon
  6. One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) Alibaba

Weak Supervision

  1. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper) Google
  2. Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper) Intel
  3. Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) Apple
  4. Bootstrapping Conversational Agents with Weak Supervision (Paper) IBM

Generation

  1. Better Language Models and Their Implications (Paper) OpenAI
  2. Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post) OpenAI
  3. Image GPT (Paper, Code) OpenAI
  4. Deep Learned Super Resolution for Feature Film Production (Paper) Pixar
  5. Unit Test Case Generation with Transformers Microsoft

Audio

  1. Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper) Google
  2. The Machine Learning Behind Hum to Search Google

Validation and A/B Testing

  1. The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper) Google
  2. Twitter Experimentation: Technical Overview Twitter
  3. Experimenting to Solve Cramming Twitter
  4. Building an Intelligent Experimentation Platform with Uber Engineering Uber
  5. Analyzing Experiment Outcomes: Beyond Average Treatment Effects Uber
  6. Under the Hood of Uber’s Experimentation Platform Uber
  7. Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper) Uber
  8. Enabling 10x More Experiments with Traveloka Experiment Platform Traveloka
  9. Large Scale Experimentation at Stitch Fix (Paper) Stitch Fix
  10. Multi-Armed Bandits and the Stitch Fix Experimentation Platform Stitch Fix
  11. Experimentation with Resource Constraints Stitch Fix
  12. Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code) Better
  13. It’s All A/Bout Testing: The Netflix Experimentation Platform Netflix
  14. Computational Causal Inference at Netflix (Paper) Netflix
  15. Key Challenges with Quasi Experiments at Netflix Netflix
  16. Interpreting A/B Test Results: False Positives and Statistical Significance Netflix
  17. Interpreting A/B Test Results: False Negatives and Power Netflix
  18. Constrained Bayesian Optimization with Noisy Experiments (Paper) Facebook
  19. Detecting Interference: An A/B Test of A/B Tests LinkedIn
  20. Making the LinkedIn experimentation engine 20x faster LinkedIn
  21. Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn LinkedIn
  22. How to Use Quasi-experiments and Counterfactuals to Build Great Products Shopify
  23. Improving Experimental Power through Control Using Predictions as Covariate DoorDash
  24. Supporting Rapid Product Iteration with an Experimentation Analysis Platform DoorDash
  25. Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity DoorDash
  26. Leveraging Causal Modeling to Get More Value from Flat Experiment Results DoorDash
  27. Iterating Real-time Assignment Algorithms Through Experimentation DoorDash
  28. Running Experiments with Google Adwords for Campaign Optimization DoorDash
  29. The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000% DoorDash
  30. Spotify’s New Experimentation Platform (Part 1) (Part 2) Spotify
  31. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper) Google
  32. Experimentation Platform at Zalando: Part 1 - Evolution Zalando
  33. Scaling Airbnb’s Experimentation Platform Airbnb
  34. Designing Experimentation Guardrails Airbnb
  35. Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab Grab
  36. Meet Wasabi, an Open Source A/B Testing Platform (Code) Intuit
  37. Building Pinterest’s A/B Testing Platform Pinterest
  38. Network Experimentation at Scale (Paper) Facebook
  39. Universal Holdout Groups at Disney Streaming Disney

Model Management

  1. Runway - Model Lifecycle Management at Netflix Netflix
  2. Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) Apple
  3. Managing ML Models @ Scale - Intuit’s ML Platform Intuit
  4. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions Comcast
  5. ML Model Monitoring - 9 Tips From the Trenches Nubank

Efficiency

  1. GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper) Facebook
  2. Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper) Uber
  3. How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs Roblox

Ethics

  1. Building Inclusive Products Through A/B Testing (Paper) LinkedIn
  2. LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper) LinkedIn

Infra

  1. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability Facebook
  2. Elastic Distributed Training with XGBoost on Ray Uber

MLOps Platforms

  1. Managing ML Models @ Scale - Intuit’s ML Platform Intuit
  2. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions Comcast
  3. Big Data Machine Learning Platform at Pinterest Pinterest
  4. Real-time Machine Learning Inference Platform at Zomato Zomato
  5. Meet Michelangelo: Uber’s Machine Learning Platform Uber
  6. Building Flexible Ensemble ML Models with a Computational Graph DoorDash
  7. LyftLearn: ML Model Training Infrastructure built on Kubernetes Lyft
  8. "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper) Coveo
  9. Core Modeling at Instagram Instagram
  10. Open-Sourcing Metaflow - a Human-Centric Framework for Data Science Netflix
  11. MLOps at GreenSteam: Shipping Machine Learning GreenSteam
  12. Evolving Reddit’s ML Model Deployment and Serving Architecture Reddit

Practices

  1. Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper) Yoshua Bengio
  2. Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper) Google
  3. Rules of Machine Learning: Best Practices for ML Engineering Google
  4. On Challenges in Machine Learning Model Management Amazon
  5. Machine Learning in Production: The Booking.com Approach Booking
  6. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper) Booking
  7. Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank Rabobank
  8. Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper) Cambridge
  9. Continuous Integration and Deployment for Machine Learning Online Serving and Models Uber
  10. Tuning Model Performance Uber
  11. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability Facebook
  12. The problem with AI developer tools for enterprises Databricks
  13. Maintaining Machine Learning Model Accuracy Through Monitoring DoorDash
  14. Building Scalable and Performant Marketing ML Systems at Wayfair Wayfair
  15. Our approach to building transparent and explainable AI systems LinkedIn
  16. 5 Steps for Building Machine Learning Models for Business Shopify

Team Structure

  1. Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department Stitch Fix
  2. Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist Stitch Fix
  3. Cultivating Algorithms: How We Grow Data Science at Stitch Fix Stitch Fix
  4. Analytics at Netflix: Who We Are and What We Do Netflix
  5. Building a Data Team at a Mid-stage Startup: A Short Story Erikbern
  6. Building The Analytics Team At Wish Wish

Fails

  1. 160k+ High School Students Will Graduate Only If a Model Allows Them to International Baccalaureate
  2. When It Comes to Gorillas, Google Photos Remains Blind Google
  3. An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor Harrisburg University
  4. It's Hard to Generate Neural Text From GPT-3 About Muslims OpenAI
  5. A British AI Tool to Predict Violent Crime Is Too Flawed to Use United Kingdom
  6. More in awful-ai

P.S., Want a summary of ML advancements? Get up to speed with survey papers 👉ml-surveys
