Data Science

Learn on Towards Data Science (links to articles grouped by topic)
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines
R에서 파이썬까지…데이터과학 학습 사이트 8곳
리다, 기업을 위한 데이터과학 강의 공개
Data Analysis - YouTube
데이터과학에 입문하고 싶다면, 이곳부터
데이터과학을 시작할 때 도움되는 것들
데이터 사이언스의 학습 로드맵 (번역) – 이바닥늬우스
헬로 데이터 과학 - 헬로 데이터 과학: 당신의 삶과 업무를 바꾸는 데이터 과학 (데이터 사이언스)
- 데이터 과학자의 데이터로 책 쓰기: 데이터는 기획력과 감수성이다
- 온라인 서비스 개선을 위한 데이터 활용법 (How We Use Data 발표)
- 데이터 지능(Data Intelligence) 팟캐스트
인정받는 데이터 분석가 되기 – 외부 세미나 요약 –
데이터 분석가는 어떤 SKILLSET을 가져야 하는가?
당근마켓 팀과 데이터 분석. 프로덕트 데이터 분석가는 어떤 일을 하는가 | by matthew l | 당근마켓 팀블로그 | Aug, 2021 | Medium
데이터 분석가가 되기 위해서는?
데이터 분석, 의심에서 전달까지 | Pega Devlog
데이터 분석이란 무엇일까 기술적인 이야기는 아님
Overfitting을 피해보자!
손에 잡히는 데이터 과학 이야기
How to Become a Data Scientist for Free
- 데이터 과학을 무료로 공부해보자
데이터 과학을 지탱하는 기본기
Hiring data scientists
- (part 1): what to look for in a candidate
- (part 2): the perfect candidate doesn’t exist
- (part 3): interview questions
- (part 4): the case study
Top Python Data Science Interview Questions | .cult by Honeypot
GitHub Special: Data Scientists to Follow & Best Tutorials on GitHub
How to Become a Data Scientist
So You Want To Be a Data Scientist: A Guide for College Grads
Aspiring data scientist? Master these fundamentals
How I Became a Data Scientist Despite Having Been a Math Major
Data Scientist: The Sexiest Job of the 21st Century
Lessons in Becoming an Effective Data Scientist
PyData Paris 2016 - Round table: "How to become a data scientist"
Renee Teate | Becoming a Data Scientist Advice From My Podcast Guests
How to land a Data Scientist job at your dream company — My journey to Airbnb
어서와~ 데이터사이언티스트는 처음이지?
장바구니를 든 데이터 사이언티스트
B급 프로그래머 데이터 과학자로 취직하려면 남들처럼 하지 마라
세상에서 가장 이해받지 못하는 영웅, 데이터 과학자 (1/3)
세상에서 가장 이해받지 못하는 영웅, 데이터 과학자 (2/3)
세상에서 가장 이해받지 못하는 영웅, 데이터 과학자 (3/3)
데이터분석가의 분석포트폴리오만들기 · Present
카일데이 : 카일의 데이터 이야기 - YouTube
Full Stack Data Science: The Next Gen of Data Scientists Cohort | by Jay Kachhadia | Towards Data Science
Engineers as Data Scientists?. How the Trends of IoT and Big Data can… | by Christianlauer | Jun, 2022 | Medium
Is Data Science a Dying Profession? | R-bloggers
Data Science Career Ladder - YouTube
Data Engineering Technology Tree | Jesse Anderson DBA/Data Warehouse/SQL-Focused, Software Engineer, Data Scientists tech stack tree
The Rise of the Data Engineer
데이터 직군 안내서: DA, TA, DE, DS, ML엔지니어, BI 분석가
A Beginner’s Guide to Data Engineering
- Part I
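  - Main topics
    - What data engineering is and why it is hard
    - The data science hierarchy
    - ETL frameworks (an introduction to Airflow)
    - Two paradigms: SQL-centric vs. JVM-centric ETL
  - Unfortunately, many companies do not realize that most existing data science training programs, whether academic or professional, tend to focus on the top of the knowledge pyramid (e.g. AI). Most of them do not teach students how to design table schemas properly or how to build data pipelines.
  - An ETL job might take in experiment configuration files, compute the relevant metrics for that experiment, and finally output p-values and confidence intervals in a UI to show whether a product change is preventing user churn. Another example is a batch ETL job that computes features for a machine learning model every day to predict whether a user will churn within the next few days. The possibilities are endless!
  - SQL-centric ETL is typically written in languages such as SQL, Presto, or Hive. ETL jobs are often defined declaratively, and almost everything revolves around SQL and tables. Writing UDFs can be cumbersome because they sometimes have to be written in another language (e.g. Java or Python), and for the same reason testing can be much harder. This paradigm is popular among data scientists.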
- Part II
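The SQL-centric paradigm summarized in the Part I notes can be illustrated with a small sketch. It is not taken from the article: the `raw_events` table, the `daily_user_features` table, and the `is_active` UDF are hypothetical names, and Python's built-in `sqlite3` stands in for an engine such as Presto or Hive. The transform itself is declarative SQL, while the UDF has to be written in the host language, which is the friction the notes mention.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a tiny, made-up raw_events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_events (user_id TEXT, event_date TEXT, event TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?, ?)",
        [
            ("u1", "2024-01-01", "view"),
            ("u1", "2024-01-01", "click"),
            ("u2", "2024-01-01", "view"),
        ],
    )
    return conn


def run_daily_etl(conn):
    """Aggregate raw events into a per-user, per-day feature table."""
    # The UDF lives in Python, outside the SQL text (hypothetical rule:
    # a user counts as active with two or more events in a day).
    conn.create_function("is_active", 1, lambda n: 1 if n >= 2 else 0)
    # Extract + transform + load, expressed declaratively in SQL.
    conn.executescript(
        """
        DROP TABLE IF EXISTS daily_user_features;
        CREATE TABLE daily_user_features AS
        SELECT user_id,
               event_date,
               COUNT(*)            AS n_events,
               is_active(COUNT(*)) AS active
        FROM raw_events
        GROUP BY user_id, event_date;
        """
    )
    return conn.execute(
        "SELECT user_id, event_date, n_events, active "
        "FROM daily_user_features ORDER BY user_id, event_date"
    ).fetchall()


if __name__ == "__main__":
    for row in run_daily_etl(build_sample_db()):
        print(row)
```

Because the job drops and rebuilds its output table, rerunning it for the same input gives the same result, which makes a scheduled daily run easier to reason about.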
A Beginner’s Guide to Data Engineering
- Part I Data Engineering: The Close Cousin of Data Science
- Part II Data Modeling, Data Partitioning, Airflow, and ETL Best Practices
data-engineering-zoomcamp: Free Data Engineering course!
시리즈 | IBM Data Science - DEV_SK
데이터 아키텍처의 변화 ETL -> ELT
인터뷰ㅣ"기술 회사도 IT 현대화해야 한다" 키사이트 테크놀로지스 CIO - CIO Korea (reasons for moving from ETL to ELT)
칼럼ㅣ'ETL'은 빅데이터와의 경쟁에서 패배했다 - CIO Korea ETL -> ELT or pipelined data streaming
ETL, ELT의 4가지 주요 차이점 - 밥먹는 개발자
ETL vs ELT, 당신의 선택은?. ELT의 장단점과 딜라이트룸에서의 도입 후기 | by Chris Lee | DelightRoom | Jan, 2024 | Medium
GumGum Builds and Maintains High-Performance ETL Pipelines for Ad Exchange Reporting - YouTube
Tables as Code: The Journey from Ad-hoc Scripts to Maintainable ETL Workflows at Booking.com - YouTube
I want to study Data Science Wiki 한글
A Beginner’s Guide to the Data Science Pipeline
Big Data: Its Benefits, Challenges, and Future | by Benedict Neo | Oct, 2020 | Towards Data Science
Big Data Pipeline Recipe. Introduction | by Javier Ramos | Aug, 2020 | ITNEXT
Designing Functional Data Pipelines for Reproducibility and Maintainability | PyData Global 2021 - YouTube
Data Engineering Principles - Build frameworks not pipelines - Gatis Seja - YouTube
Live Data Demo – Practical Pipelines - YouTube
29CM 데이터 파이프라인 소개. 안녕하세요 데이터그로스팀 이진환입니다. 29CM에선 21년 9월… | by brownbears | 29CM TEAM | Jan, 2023 | Medium
FMS(차량 관제 시스템) 데이터 파이프라인 구축기 1편. 스트리밍/배치 파이프라인 개발기 - SOCAR Tech Blog
FMS(차량 관제 시스템) 데이터 파이프라인 구축기 2편. 신뢰성 높은 데이터를 위한 테스트 환경 구축기 - SOCAR Tech Blog
Data Pipelines Overview
How to jump into Data Science
Functional Data Engineering — a modern paradigm for batch data processing
Data Engineers are in Greater Demand than Data Scientists
Data Infrastructure at In Loco
- Describes the data infrastructure at In Loco, which feeds roughly 15 TB of data per day into its analytics and BI platforms
- Uses Kafka, Presto, Airflow, and Spark
Data engineers vs. data scientists
데이터 사이언티스트 vs 데이터 엔지니어: 주요 차이점과 이해
쏘카 데이터 그룹 - 데이터 엔지니어링 팀이 하는 일 - SOCAR Tech Blog
실무 AI 프로젝트 - 분석보다 엔지니어링이 중요한 이유
Analytics Engineer 란? (Feat. Modern Data Stack)
Coalesce 2022 New Orleans 후기 - Analytics Engineering 그리고 Modern DataStack
How The Modern Data Stack Is Reshaping Data Engineering | Preset - Blog | Preset
온프레미스 데이터 플랫폼 팀의 데이터 엔지니어가 하는 일(feat. 11번가 데이터 플랫폼 2020년 회고) :: Kaden Sungbin Cho
쏘카 신입 데이터 엔지니어 디니의 4개월 회고 - SOCAR Tech Blog
Data Product (1) 쏘카 고객은 무슨 목적으로 쏘카를 이용할까? - SOCAR Tech Blog
Data Product (2) AI(데이터)로 실제 운영 효율화가 가능할까? - SOCAR Tech Blog
데이터 엔지니어란 무엇일까? - Nephtyw’S Programming Stash
- 데이터 엔지니어란 무엇인가? | GeekNews
데이터분석가 vs 데이터엔지니어 vs 데이터과학자 차이가 뭘까? (1) 역할과 정의
데이터분석가 vs 데이터엔지니어 vs 데이터과학자 차이가 뭘까? (2) 필요 역량, 기술
데이터분석가 vs 데이터엔지니어 vs 데이터과학자 차이가 뭘까? (3) 연봉과 보상
Roadmap to Data Engineering in 2022. | by Chetan Dekate | Mar, 2022 | Medium
There’s No Such Thing as a Data Scientist
데이터 사이언티스트가 되기 위해 필요한 기술,이 문장만 보면 다 알 수 있다
새로운 데이터 분석가와의 랑데부를 위하여(2) (emphasizes the importance of SQL)
따라 하는 데이터 과학 – 강의 PPT
datasciencetech.institute
mindscale.kr
How to actually learn data science
Skills You Need for that Data Science Job
데이터과학 자료모음
A curated list of data science blogs
Data Science Courses
Faster Data Science Education Kaggle
Pascal Poupart's Homepage
dataquest.io
Linear Algebra for Data Scientists
Reading Between the Lines: How We Make Sense of Users’ Searches
Research papers that changed the world of Big Data
Paper Search using ScopusAPI | Pega Devlog
Data Analysis (1): Neuroimaging Data loading using SPM8 toolbox
당신이 알고 있는 좋은 데이터 분석 슬라이드가 있나요?
The last-mile problem: How data science and behavioral science can work together
The democratization of predictive analytics
Tracking Economic Development with Open Data and Predictive Algorithms
Predictive maintenance
Data Science for Startups: Predictive Modeling
공공데이터를 연결하라…‘LOD’
GE산업인터넷 플랫폼, 프레딕스™(Predix™)에 대해 알아야 할 모든 것
articles
트위터로 들여다보는 빅데이터 분석
버즈피드의 교훈: 분산 미디어와 데이터 분석
실리콘 밸리 데이터 사이언티스트의 하루
“데이터의 잡음 속 숨겨진 진실을 찾아라”
Data Science From Scratch: First Principles with Python
Three Things About Data Science You Won't Find In the Books
Weekly Digest, January 8
Weekly Digest, June 15
Grepping logs is terrible
Grepping logs is still terrible
Why Topological Data Analysis Works
Topological Data Analysis (TDA) is a cool thing that data scientists should know
HyperLogSandwich
Pipelining - A Successful Data Processing Model
NASA'S DATA PORTAL
신선한 데이터를 냉장고에서 꺼내기
Algorithm reduces size of data sets while preserving their mathematical properties
A BEGINNER'S GUIDE TO DATA ANALYSIS WITH UNIX UTILITIES
Enterprise Data Analysis and Visualization: An Interview Study
Why Interactive Data Visualization Matters for Data Science in Python | PyData Global 2021 - YouTube
Prologue to Data Science
Data Science in Clojure at Yieldbot
Mining the Web to Predict Future Events
Using Data Science to Measure a Musical Revolution
Data Science Career Alert - June 12
Comparing Python and R for Data Science
Data Science for Startups: R -> Python
Introducing ShArc: Shot Arc Analysis
Inside Data@Scale 2015
- Dato
DataLake
- A Data Lake Architecture With Hadoop and Open Source Search Engines
- 데이터 관리 패러다임 바꾼 ‘데이터 레이크’ (1) - 데이터넷
- 데이터 관리 패러다임 바꾼 ‘데이터 레이크’ (2) - 데이터넷
- 데이터 관리 패러다임 바꾼 ‘데이터 레이크’ (3) - 데이터넷
- 빅데이터 분석 위한 대규모 확장형 스토리지··· ‘데이터 레이크’ A to Z - CIO Korea
Data Lake with Serverless | 월요일 오후 9시
Data Warehouse vs. Data Mart vs. Data Lake | by Christianlauer | Jul, 2022 | Medium
기획특집 ‘창고’와 ‘호수’를 넘어서는 데이터 레이크하우스 lakehouse
‘일관성·유연성’ 덕에 각광… 데이터 레이크하우스 활용 사례 - CIO Korea
Data Lake vs. Data Lakehouse | 01
What is a Data Fabric?. How to realize modern Data Management | by Christianlauer | Aug, 2022 | Medium
Data Warehouse vs. Data Lake vs. Data Fabric | by Christianlauer | Nov, 2022 | Medium
Data Maven
Data Catalog, 데이터경험의 심리학 법칙. https://us.semantix.ai/ | by reckoner | Nov, 2022 | Medium
ryd.io - A data science exploration of the NYC Taxi data set via clustering and time-series analysis
프레임드, 예측 분석 기술 클라우드 서비스로 출시
11 Facts about Data Science that you must know
The Data Science Workflow
Eric Ma - Principled Data Science Workflows | PyData Boston July Virtual Meetup - YouTube
퇴물개발자가 생각하는 빅데이터 기술
Predicting winners of the Rugby World Cup
Building Analytics at 500px
2015 Data Science Salary Survey / 2015 데이터과학 소득 조사
데이터과학자들의 실험실, 넘버웍스
50 years of Data Science
기획자·마케터가 알아둘 데이터과학 원칙 6가지
우리 식당 김사장이 데이터 과학자가 된 사연은?
데이터 과학자에서 AI 연구자로 들어서며…
e커머스 데이터 파헤치기-6편
데이터와 관련하여 기업들이 공개한 기술은 어떤게 있을까?
The Automatic Statistician - An artificial intelligence for data science
좋다는 건 알겠는데 좀 써보고 싶소. 데이터! - 넘버웍스 하용호 대표
‘데이터’를 똑똑하게 만드는 오픈소스 기술 12종
Google Data Studio (beta) provides everything you need to turn your data into beautiful, informative reports that are easy to read, easy to share, and fully customizable
- How to create a Ethereum DeFi realtime dashboard | Towards Data Science
쉽게 이해하는 모바일 데이타 분석
데이터 사이언티스트로 성장하기
Data School
github.com/collections/open-journalism
data.fivethirtyeight.com
- github.com/fivethirtyeight/data
- A User’s Guide To FiveThirtyEight’s 2016 General Election Forecast
어떻게 하면 싱싱한 데이터를 모형에 바로 적용할 수 있을까? – Bayesian Online Learning
데이터 과학 여름 학교 2016
데이터에 현혹되지 않고, 데이터를 잘 활용할수 있는 14가지 룰
Demystifying Different Roles in Data Team
Why Data Science Teams Need Generalists, Not Specialists (generalists, not only specialists, are needed too)
Causal Data Science
Announcing the general availability of the Microsoft Excel API to expand the power of Office 365
16 analytic disciplines compared to data science
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
데이터 전처리에 대한 모든 것
데이터 사이언스 스쿨 - Python 데이터 핸들링과 시각화 라이브러리 실무
데이터 과학을 공부하는 이유
데이터는 차트가 아니라 돈이 되어야 한다
Practical Data Science at Honestbee - DataScienceSG
빅데이터의 대중화
이론의 종말: 데이터 홍수가 과학적 연구방법을 구닥다리로 만든다
이메일로 분석해 보는 나의 3년
E-Mail 데이터 곱씹어보기
스터디뽀개기.zip
- GNMT로 알아보는 신경망 기반 기계번역 / 구글 신경망 기계번역 시스템 리뷰
- Spark + R / spark + R 기본 사용법, 특징과 장단점 소개
- Spark를 이용한 분산 컴퓨팅 / 분산환경에서 머신러닝을 운용하기 위한 기반으로 Spark와 클라우드를 활용하는 법
- 강화학습을 활용한 대화형 시스템 / 대화형 시스템을 구성하기 위해 강화학습을 이용하는 방법 리뷰
How to Make Your Database 200x Faster Without Having to Pay More?
- When an analysis needs trends or ratios rather than exact figures, it can sample the data instead of scanning the full dataset
- Also briefly introduces data processing solutions that support this sampling approach, such as Presto, BlinkDB / G-OLA, and SnappyData
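The sampling idea in the notes above can be sketched in a few lines. This is an illustration under stated assumptions, not how Presto, BlinkDB, or SnappyData implement it: the data is synthetic, and `make_rows`, `exact_ratio`, and `sampled_ratio` are hypothetical helper names. It compares a ratio computed over every row with an estimate from a uniform random sample.

```python
import random


def make_rows(n):
    """Build n synthetic 0/1 flags; exactly 30% are 1 when n is a multiple of 10."""
    return [1 if i % 10 < 3 else 0 for i in range(n)]


def exact_ratio(rows):
    """Full scan: touches every row."""
    return sum(rows) / len(rows)


def sampled_ratio(rows, sample_size, seed=0):
    """Approximate answer: touches only sample_size rows chosen uniformly."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(rows, sample_size)
    return sum(sample) / sample_size


if __name__ == "__main__":
    rows = make_rows(100_000)
    print("exact:  ", exact_ratio(rows))
    print("sampled:", sampled_ratio(rows, 1_000))
```

With a sample of 1,000 rows the standard error of the estimated ratio is roughly 0.015, so the estimate is usually close enough for trend or share questions while reading 1% of the data.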
3 methods to deal with outliers
Visual Information Theory
가장 위대한 데이터 분석가
Tutorial 1: Protein - DNA interaction
A survey on predicting the popularity of web content
Data analysis in excel
Common Probability Distributions: The Data Scientist’s Crib Sheet
dataplatforms.com
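빅데이터 파라독스 (big data paradox): a larger sample seems like it should be more accurate, but when there is selection bias the actual accuracy is equivalent to surveying a probability sample of 400 people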
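The effect of selection bias on a large sample can be seen in a small simulation. Everything here is an assumption made for illustration: the population is synthetic with a true rate of exactly 0.5, and the response probabilities 0.55 and 0.45 are invented. The sketch does not reproduce the figure of 400 from the linked piece; it only shows that a very large biased sample stays off target while a small probability sample is centered on the truth.

```python
import random


def simulate(n_population=200_000, small_n=400, seed=0):
    """Return (size of biased sample, biased estimate, small-sample estimate)."""
    rng = random.Random(seed)
    # Synthetic population: y alternates 0, 1, so the true rate is exactly 0.5.
    population = [i % 2 for i in range(n_population)]
    # Biased "big data": people with y = 1 are slightly more likely to be recorded.
    big = [y for y in population if rng.random() < (0.55 if y else 0.45)]
    # Small probability sample: every person is equally likely to be chosen.
    small = rng.sample(population, small_n)
    return len(big), sum(big) / len(big), sum(small) / small_n


if __name__ == "__main__":
    n_big, big_estimate, small_estimate = simulate()
    print(f"biased sample of {n_big}: estimate {big_estimate:.3f}")
    print(f"random sample of 400:   estimate {small_estimate:.3f}")
```

Collecting more rows through the same biased mechanism does not shrink the error of the first estimate, because the bias does not average out.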
How to Start a Data Science Project in Python
- How to set up a basic Python environment for data analysis
- Setting up an isolated environment with Anaconda's Conda
- How to organize the directory of a single Python data analysis project
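A project skeleton of the kind these notes describe could be scaffolded as follows. The folder names in `LAYOUT`, the environment name `my-analysis`, and the package list are assumptions chosen for illustration, not the layout recommended by the linked article.

```python
from pathlib import Path

# Hypothetical directory layout for one analysis project.
LAYOUT = ["data/raw", "data/processed", "notebooks", "src", "reports"]

# Hypothetical Conda environment file; adjust the name and packages as needed.
ENVIRONMENT_YML = """\
name: my-analysis
dependencies:
  - python=3.11
  - pandas
  - jupyter
"""


def scaffold(root):
    """Create the project folders and an environment.yml under root."""
    root = Path(root)
    for sub in LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)
    env_file = root / "environment.yml"
    if not env_file.exists():  # never overwrite an existing environment file
        env_file.write_text(ENVIRONMENT_YML, encoding="utf-8")
    return root


if __name__ == "__main__":
    print("created", scaffold("my-analysis"))
```

From inside the new directory, `conda env create -f environment.yml` followed by `conda activate my-analysis` would then build and enter the isolated environment.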
50 Best Data Science Project Ideas You Must Know in 2022
Ian Ozsvald - Data Science Project Patterns that Work | PyData Global 2022 - YouTube
Why Most Data Projects Fail & How to Avoid It • Jesse Anderson • GOTO 2023 - YouTube
이야기 12. 당신은 데이터 문맹(Data Illiterate) 인가?
Q&A with leading Data Scientists
수많은 데이터 사이언티스트들이 직장을 떠나는 이유는 무엇인가?
Forrester vs Gartner on Data Science Platforms and Machine Learning Solutions
sooyongshin.wordpress.com
- Healthcare Data? Data! Data!! (0) – 왜 데이터 이야기를 하나
Data Science Ontology
Automated Machine Learning — A Paradigm Shift That Accelerates Data Scientist Productivity @ Airbnb
A list of artificial intelligence tools you can use today — for personal use (1/3)
Data Science Bowl 2017, Predicting Lung Cancer: Solution Write-up, Team Deep Breath
Strata Data Conference
Data Science Resources : Cheat Sheets
Data Science Cheatsheets
Top 28 Cheat Sheets for Machine Learning, Data Science, Probability, SQL & Big Data
ds-cheatsheets: List of Data Science Cheatsheets to rule the world
Getting started: the 3 stages of data infrastructure
EVERYTHING A DATA SCIENTIST SHOULD KNOW ABOUT DATA MANAGEMENT
Back To The Future: Data Engineering Trends 2020 & Beyond - Data Engineering Weekly (a collection of links to good articles on data infrastructure, data architecture, and data management)
Silent data corruption: Mitigating effects at scale - Facebook Engineering
Roadmap: Data Infrastructure · Bessemer Venture Partners
The Guide to Modern Data Architecture | Future
데이터를 얻으려는 노오오력
#2.5. Intra/Inter-Class Variability 데이터의 '질'이란?
Analyzing GitHub, how developers change programming languages over time
Regression 모델 평가 방법
- opendataminer object mapper
7 Techniques to Handle Imbalanced Data
초급자를 위한 데이터 과학 비디오 1: 데이터 과학으로 답변할 수 있는 5가지 질문
Brunch Magazine List about Data Science
오픈 글로벌 데이터세트를 탐구하고 시각화하는 과정에 대해 확인해 보세요
빅데이터 : 샘플 양의 힘 (quantity over quality)
데이터야놀자2017 강남 출근길에 판교/정자역에 내릴 사람 예측하기
How to Set Up Data Science?
FIAN Research
- blog.naver.com/tortellini
A Reference Stack for Modern Data Science
Comprehensive Repository of Data Science and ML Resources
Top 10 Popular GitHub Repositories to learn about Data Science
If you’re a developer transitioning into data science, here are your best resources
How to Handle Missing Data
Missing Data Handling |How to Deal with Missing Data using Python
5 Amazing Improvement Big Data Can Bring to Retail
Notes On Using Data Science & Artificial Intelligence To Fight For Something That Matters
Five Misconceptions about Data Science - Knowing What You Don't Know
Data Preprocessing For Non-Techies: Basic Terms and Definitions
What Getting A Job In Data Science Might Look Like
Data Science. Intro
Weekly Selection — Mar 2, 2018
Big Data Engineering VS Data Warehousing
데이터 웨어하우스(Data Warehouse)와 데이터 레이크(Data Lake)의 차이
Free Data Engineering Course for Beginners - #1 EXTRACT - YouTube
How ‘Big’ should be your Data?
Self Driven Data Science — Issue #40
How I automated my job search by building a web crawler from scratch
다양한 사람들의 데이터 사이언스 이야기 후기
10 Modern Data Trends
Test-Driven Data Analysis - Nick Radcliffe
Ways I Use Testing as a Data Scientist | Peter Baumgartner
Data Science for Startups: R -> Python
데이터 사이언스(Data Science) 프로세스 정리
데이터 저널리즘, 오픈 데이터를 넘어 코드 공개로
The Data Science of K-Pop: Understanding BTS through data and A.I.
Should data scientists learn JavaScript?
Data Science with Watson Analytics
데이콘 주최 1회 펀다 상점매출 예측 대회 우승자 코드
Dacon_KBO스카우팅챌린지 조용건 영상1 코드설명
Mission 13. 2019 Jeju BigData Competition - 퇴근시간 버스승차인원 예측
Mission 11. 에너지 빅데이터 활용 데이터 사이언스·아이디어 콘테스트
데이터 사이언스 Meetup
The penalty of missing values in Data Science
Machine Learning and Data Science Applications in Industry (application examples across various fields)
4 Pillars of Analytics Data acquisition, processing, surfacing and actioning are key to an effective analytics initiative
Data Science for Startups: Tracking Data
TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정
- I want to study Data Science Wiki
Organizing and scaling an effective data team
빅데이터 조직과 시스템
Generating and visualizing alpha with Vectorspace AI datasets and Canvas
쿠팡 데이터 플랫폼의 진화
데이터 플랫폼 구현 사례
When your data doesn’t fit in memory: the basic techniques
Top 5 must-have Data Science skills for 2020
순차 패턴 마이닝을 활용한 EHR 분석 – 1편
순차 패턴 마이닝을 활용한 EHR 분석 – 2편
데이터사이언스 취업 전에 꼭 고민하면 좋을 내용 공개!
다시 찾아간 지표의 세계 vanity metrics, actionable metrics
3 Design Principles for Engineering Data
데이터분석가로서 업무 과정과 경험, 배움을 공유합니다 - 우아한형제들 기술 블로그
제로베이스 데이터 사이언스 스쿨 | 직무 인터뷰 ③ 현직 네이버 계열사 데이터 분석가가 말하는 ‘데이터 분석가의 일’ | zero-base
컬리에서 데이터 분석가로 일한다는 것 - 컬리 기술 블로그 (the discussion of requirements is impressive; I like it)
Becoming A Data Analyst: Step by Step Guide - YouTube
interview 아이디어스팀이 데이터로 일하는 방법. 아이디어스팀의 데이터 기반으로 일하는 문화를 함께 만들어가고 있는… | by Saeyeon Park | idus-Tech | Mar, 2022 | Medium
IT 회사에서 데이터 직군은 데이터를 어떻게 관리할까?
Data Science Life Cycle 101 for Dummies like Me | by Sangeet Moy Das | Towards Data Science
입력 데이터를 정규화 하는 이유 : 네이버 블로그
Quantified Self Part 6 - 생산적인 하루에 대한 정량적인 표현과 4년간의 데이터 이야기 - HumanBrain
Almost Everything You Need To Know on Data Discovery Platforms
What Is Data Engineering and Is It Right for You? – Real Python (a Real Python post, but general rather than Python-specific)
Data Experience Report 모음
The Top 5 Data Trends for CDOs to Watch Out for in 2021 | by Prukalpa | Jan, 2021 | Towards Data Science
- 2021년 5가지 데이터 트렌드 | GeekNews
9 Distance Measures in Data Science | Towards Data Science
The Future of Data Engineering
🗃개발에 필요한 데이터 구하기 #fetch - YouTube
좋은 분석환경은 공짜가 아니다
Causal design patterns for data analysts | Emily Riederer
Design patterns every data engineer should know | by Raj Samuel | Jan, 2022 | Medium
1부: 스타벅스 DT 소셜 데이터를 이용한 감성분석 – SPH
머신러닝 비지도학습으로 찾은 최적의 스타벅스 DT, TOP 4 ! – SPH
머신러닝 지도학습을 통해서 꼽아본 최적의 스타벅스 DT 장소!? – SPH
‘데이터 랭글링’ 및 ‘탐구 데이터 분석’ 따라잡기 - CIO Korea
글로벌 칼럼 | 데이터 랭글링을 비하해선 안 되는 이유 - ITWorld Korea
칼럼ㅣ결코 하찮지 않다!··· '데이터 랭글링' 작업이 가치 있는 이유 - CIO Korea
데이터 처리 플랫폼 : 네이버 블로그
김진철의 How-to-Big Data | How-to-Big Data 핵심 정리(Key Takeaways) (1) - CIO Korea
세미나 후기 Wanted Con. Data 요즘 데이터 팀은 어떻게 일할까?
How to structure a data team to climb the pyramid of Data Science | Airbyte
게임 속 시장을 들여다보기 위한 단 하나의 지표
Why and how should you learn “Productive Data Science”? - KDnuggets
The Quick and Dirty Guide to Building Your Data Platform | by Barr Moses | Jul, 2021 | Towards Data Science
데이터 플랫폼 2022: 페타바이트 규모의 글로벌 확장. 쿠팡 데이터 플랫폼의 데이터 인제스천(Ingestion), 머신 러닝… | by 쿠팡 엔지니어링 | Coupang Engineering Blog | Medium
데이터 플랫폼 2022: 데이터를 비즈니스 인사이트로 전환하기 | 쿠팡 엔지니어링 | Coupang Engineering Blog
Big Data World, Part 1: Definitions | JetBrains News
빅데이터의 세계, 2부: 직무 | JetBrains News
빅데이터의 세계, 3부: 데이터 파이프라인 구축 | JetBrains News
빅데이터의 세계, 4부: 아키텍처 | JetBrains News
Building a Scalable Data Science Pipeline at REA • Justin Hamman & Jack Low • YOW! 2019 - YouTube
Big Data World, Part 5: CAP Theorem | JetBrains News
현대 신경과학은 과연 동키콩을 이해할 수 있는가 (2016) | GeekNews (not technically related, but has useful implications)
Why MapReduce is making a comeback — Estuary
오늘의집 데이터 마케팅 활용법 : 유입 기여 분석 시스템 - 오늘의집 블로그
양질의 데이터를 판별하는 5가지 방법 : 데이터 양은 충분한가? | 요즘IT
Log-based Change Data Capture — lessons learnt | by Andreas Buckenhofer | Daimler TSS Tech | Medium Debezium, DynamoDB Streams, VoltDB
데이터 분석에 필수적인 5 가지 마인드
업무 지식도 모르면 데이터 분석을 할 수 없다!
업무 지식도 모르면 데이터 분석을 할 수 없다! 2
업무 지식도 모르면 데이터 분석을 할 수 없다! 3
업무 지식도 모르면 데이터 분석을 할 수 없다! 4
글로벌 칼럼 | ‘머신러닝은 만능이 아니다’ ML 대신 SQL 쿼리를 써야하는 이유 - ITWorld Korea
칼럼ㅣ머신러닝의 첫 번째 규칙은 ML 없이 시작하는 것이다 - CIO Korea
모델만 잘 만들면 끝?··· 데이터 과학을 위한 ‘CI/CD’가 필요하다 - CIO Korea
Five Predictions for the Future of the Modern Data Stack | by Jordan Volz | Medium
Modern Data Stack for Startups. “Use the right tool for the job!” | by cyber-venom003 | Nybles | Medium
Data Engineering: Major Technologies To Learn In 2022 | by Chandan Kumar | Jan, 2022 | Medium
The Future of Data Engineering
Roadmap to a Successful Data Engineer - Rock the JVM Blog
카우레터 B컷 중대재해 데이터를 공개합니다 - alookso
Foundational Infrastructure to Create a Successful Data Science Team | PyData Global 2021 - YouTube
Bridging Data and Business - Sylvia Lee | PyData Global 2021 - YouTube
2021년 가트너 Data Science hype graph에 등장한 용어들 – Cojette (꼬젯) – 잡덕 잉여 데이터 분석가의 이것저것 (not technical, but worth a read)
The Importance of Ratios & KPIs in Data Science | by Christianlauer | CodeX | Feb, 2022 | Medium
데이터 실험에서의 실험자 편향 – Cojette (꼬젯) – 잡덕 잉여 데이터 분석가의 이것저것
SEF2021 빅데이터가 도대체 무엇? 빅데이터 분석가는 또 무엇? - YouTube
브런치북 온라인서비스를 위한 데이터사이언스
7 Must-Know Data Buzzwords in 2022 | by Coco Li | Kyligence | Jan, 2022 | Medium
Data Management Trends You Need to Know - Gradient Flow
What is Data as a Service?. How the new Paradigm will make your… | by Christianlauer | Apr, 2022 | Medium
데이터 분석가 대디가 유소년 축구 플렉스하기
Week 1 - What is advanced data science anyway?
데이터 분석에 필요한 자질은 뭘까? | Popit
데이터 스토리텔링 연습! Day3
Data Is An Art, Not Just A Science—And Storytelling Is The Key — Data Science & Engineering (2022)
There's no such thing as data — Benedict Evans
카카오페이 유저 프로파일링, 페이프로파일 | Kakao Pay Tech
Data Versioning for Modern Data Teams and Platforms | by Christianlauer | CodeX | Jul, 2022 | Medium
데이터에 신뢰성과 재사용성까지, Analytics Engineering with dbt - SOCAR Tech Blog data build tool
Typical Problems and Challenges in Data Science | by Christianlauer | CodeX | Aug, 2022 | Medium
DataFest Seoul 발표자료
2022 카카오 채용연계형 겨울 인턴십 for Tech Developers을 진행합니다! – tech.kakao.com
빅데이터로 살펴본 '택시대란' : (1) 수요편
빅데이터로 살펴본 '택시대란' : (2) 공급편
빅데이터로 살펴본 '택시대란' : (3) 종합편
컬리는 물류 최적화 문제를 어떻게 풀고 있을까? - 1부 - 컬리 기술 블로그
컬리는 물류 최적화 문제를 어떻게 풀고 있을까? - 2부 - 컬리 기술 블로그
60. 데이터사이언스 원-포인트레슨
Elena Dyachkova on Twitter: "Data folks, thoughts on this title overlap illustration? https://t.co/xe41a4JZJz" / Twitter (a diagram showing how the roles overlap)
Python vs. SQL in Data Science | 01
공학적 관점으로 데이터 분석 프로세스 만들기. 당근마켓 데이터 분석 프로세스 개선기 | by Theo | 당근마켓 테크 블로그 | Apr, 2023 | Medium
하나금융경영연구소
DMOps(Data Management Operation and Recipes), 현업에서 데이터 구축하기 — Upstage
데이터로 콘텐츠 제대로 다루기. 오늘은 콘텐츠 스쿼드의 일원으로서 풀고 있는 29CM 콘텐츠 데이터… | by 김동욱 | 29CM TEAM | May, 2023 | Medium
Uplift Modeling. Maximizing the incremental return of… | by Barış Karaman | Towards Data Science
100+ 팀원의 의사결정에 영향을 주는 Data Scientist, Decision | by matthew l | 당근마켓 테크 블로그 | Jul, 2023 | Medium (a good article on testing)
Practical advice for analysis of large, complex data sets
- 복잡한 대규모 데이터 세트의 분석에 대한 실무 조언. Patrick Riley의 Practical advice for… | by Jonas Kim | Medium
Vin Vashishta on LinkedIn: #data #analytics #datascience #consulting | 698 comments (a funny picture about data)

Book

시스템 트레이딩을 위한 데이터 사이언스 (파이썬 활용편)
밑바닥부터 시작하는 데이터 과학
- 밑바닥부터 시작하는 데이터 사이언스
- 밑바닥부터 시작하는 데이터 과학 ch.03 데이터 시각화
더북(TheBook): 모두의 데이터 과학 with 파이썬 (chapters 3-5 only)
더북(TheBook): 모두의 데이터 분석 with 파이썬
추천 시스템 | 에이콘출판사
파이썬을 활용한 데이터/AI 분석 사례 (produced by the Health Insurance Review & Assessment Service, HIRA)
- HIRA OAK Repository: 파이썬을 활용한 데이터·AI 분석 사례
12 Data Analytics Books for Beginners: A 2022 Reading List | Coursera
- Coursera가 추천하는 초보자를 위한 데이터 분석 책들 : 2022 | GeekNews
27 free data mining books
Foundations of Data Science
The Data Science Handbook
16 Free Data Science Books
Free Data Science Books
50+ Free Data Science Books
60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more
Welcome to the School of Data Handbook
The Data Science Handbook
The Data Analytics Handbook
Reading for Growing Data Engineers — 2017
Data Science at the Command Line
List of Must – Read Free Data Science Books
Learning Data Science: Our Favorite Data Science Books
The Elements of Data Analytic Style
Executive Data Science
Data Analysis for the Life Sciences
Data-Engineering-with-Python: Data Engineering with Python, published by Packt
Statistical inference for data science
Essays on Data Analysis
Advanced Linear Models for Data Science
Introduction to Data Science
The Best Free Books for Learning Data Science
The Data Engineering Cookbook
Great Books for Data Science
브런치북 데이터 과학 미니북
Efficient Python Tricks and Tools for Data Scientists — Effective Python for Data Scientists
Free Book: Foundations of Data Science (from Microsoft Research Lab) - DataScienceCentral.com
PDA_Book: Code Examples Data Science using Python
Python for Data Analysis, 3E
- 판다스 개발자가 직접 쓰고 공개한 파이썬 데이터 분석 3판

Conference

David Aronchick - Revolutionizing the Big Data Age With Compute over Data | PyData Global 2022 - YouTube
데이터야놀자(2022) - 데이터로 토이 서비스만들기 · Present google sheets, telegram chatbot, pandas dataframe, airflow
- chatbot-reviewrate-compare: 네이버/카카오/구글 맛집 평점을 비교해주는 챗봇입니다
데이터야놀자2021 데이터와 함께하는 똑똑한 중고 거래 - 삼데오백님 - YouTube
데이터야놀자2021 공공데이터를 활용한 서울시 공/사교육 분석 - 고동우(데이터드림)님 - YouTube
Agile Data Science - John Sandall | PyData Global 2021 - YouTube
데이터로 트렌드 읽는 방법 | NHN FORWARD
Taming the Data Mess, How Not to Be Overwhelmed by the Data Landscape - YouTube
SOCAR DATA MeetUp 2022 - YouTube
Phillip Cloud & Gil Forsyth - Ibis: A fast, flexible, and portable tool for data analytics - YouTube

Course MOOC Lecture

수강료 500만원 데이터사이언스 스쿨 커리큘럼을 대체하는 온라인 무료강의 15개 커리큘럼
- Statistics, linear algebra, NumPy, hypothesis testing and estimation (probability theory), machine learning, databases, data visualization, data analysis, deep learning
모두를 위한 데이터 사이언스 강좌소개 : 부스트코스
Review: Udacity Data Analyst Nanodegree Program
I Dropped Out of School to Create My Own Data Science Master’s — Here’s My Curriculum
Learn Data Science in 3 Months
- Learn_Data_Science_in_3_Months
Our 25 Favorite Data Science Courses From Harvard To Udemy
pubdata.tistory.com/category/Lecture_DataMining
Nonnegative Matrix Factorization via Rank-One Downdate
- This chapter introduces a new technique, NMF (Non-negative Matrix Factorization)
5 Bite-Sized Data Science Summaries
5 Online Data Science Courses You Can Finish in 1 Day | by Sara A. Metwalli | Aug, 2021 | Towards Data Science
The online courses you must take to be a better Data Scientist | DataTau
Data-Science-For-Beginners: 10 Weeks, 20 Lessons, Data Science for All!
Dev Intro to Data Science - YouTube
Practical Data Ethics | Data ethics
Free Data Science for Beginners curriculum on GitHub - DEV Community
Data Analytics Full Course 2022 | Data Analytics For Beginners | Data Analytics Course | Simplilearn - YouTube
5 Best Python Courses For Data Science Beginners in 2022 - Best of Lot
Data Science Grandmaster Series - YouTube
12 Best+FREE Data Engineering Courses Online & Certifications- 2022
데이터 사이언스 스쿨 — 데이터 사이언스 스쿨

Data Cleaning

The Simple Yet Practical Data Cleaning Codes To solve the common scenarios of messy data
sampleclean - Data Cleaning With Algorithms, Machines, and People
The Ultimate Guide to Data Cleaning

Data Cleaning Python

Quick Guide: Steps To Perform Text Data Cleaning in Python
Steps for effective text data cleaning (with case study using Python)
The Art of Cleaning Your Data
Cleaning and Tidying Data in Pandas || Daniel Chen

Data Mining

Top 10 data mining algorithms in plain English
Statistical Data Mining Tutorials
Data Mining and Statistics: What's the Connection?
Introduction to Data Mining
Difference between classification and clustering in data mining?
OPENDATAMINER - THE DATA MINING COMPANY THAT TURNS YOUR DATA INTO VALUES
데이터 전처리 - RomanticQ의 머신러닝
텍스트 마이닝 기법 - RomanticQ의 머신러닝
텍스트 마이닝 기법2 - RomanticQ의 머신러닝
텍스트 마이닝 기법3 - RomanticQ의 머신러닝
knime.com

Library

"가자, 데이터의 세계로" 무료 애널리틱스 툴 7선
Comparison of top data science libraries for Python, R and Scala Infographic
10 Data Science Tools I Explored in 2018 - New Languages, Libraries, and Services
2018’s Top 7 Libraries and Packages for Data Science and AI: Python & R - This is a list of the best libraries and packages that changed our lives this year, compiled from my weekly digests
The Five Best Frameworks for Data Scientists
Learn Data Engineering: My Favorite Free Resources For Data Engineers
Interactive Tools for ML, DL and Math
Top 38 Python Libraries for Data Science, Data Visualization & Machine Learning - KDnuggets
10 Essential Tools Data Scientists Should Learn in 2022 | by javinpaul | Javarevisited | Jan, 2022 | Medium
academictorrents.com
Airbyte | Open-Source Data Integration Pipelines To Your Warehouses
- Airbyte - 오픈소스 ELT | GeekNews
Announcing FsLab: Data science package
Beaker
chatbot-reviewrate-compare: 네이버/카카오/구글 맛집 평점을 비교해주는 챗봇입니다
- 데이터야놀자(2022) - 데이터로 토이 서비스만들기 · Present
danfojs: Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data
- Pandas를 자바스크립트에서! Danfo.js - YouTube
- Danfo.js | 재미있는 기억만 남기자
- Danfo.js에 관한 공부한 내용 정리
- JavaScript에서도 pandas 같은 라이브러리를? Danfo.js를 소개합니다.
- Introducing Danfo.js, a Pandas-like Library in JavaScript — The TensorFlow Blog
- Danfo.js: A Pandas-like Library for JavaScript
Dataflow Data pipeline asset management with Dataflow | by Netflix Technology Blog | Netflix TechBlog
Datasette: An open source multi-tool for exploring and publishing data
- Datasette - 개인용 데이터 웨어하우스 오픈소스 | GeekNews
dataverse: The Universe of Data. All about data, data science, and data engineering
datools a collection of Python-based tools for working with data in relational databases
- Data diffs: Algorithms for explaining what changed in a dataset | N=1 (marcua’s blog)
dbt - Transform data in your warehouse
- dbt로 ELT 파이프라인 효율적으로 관리하기
Decodable
- Announcing General Availability of the Decodable Real-Time Data Platform - Decodable
Digdag - a simple tool that helps you to build, run, schedule, and monitor complex pipelines of tasks Data Workflow Management Opensource Engine
faker.js: generate massive amounts of realistic fake data in Node.js and the browser
- How to generate mock data with faker.js | by Lucas Jellema | JavaScript In Plain English | Sep, 2020 | Medium
GRID - Global Research Identifier Database Cataloging the world's research organisations
HEARTCOUNT 모든 현업을 위한 데이터 분석 솔루션 :: 하트카운트 HEARTCOUNT
- 홈쇼핑의 취소율 데이터 시계열 분석
- 데이터에서 Signal(유의미한 차이)과 Noise(우연에 의한 차이) 분리하기
- 매체별 광고비가 매출에 미치는 영향, 회귀분석
- 데이터 분석 방법의 종류
- Facet Plot, 다이아몬드 같은 시각화
Metaflow - A framework for real-life data science
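- A tool for managing the infrastructure side of a data science project, beyond model development
- Supports job scheduling, versioning and inspection of results for each flow element, library dependency injection per flow and per flow element, built-in support for Amazon S3, and easy scaling of compute resources in and out
- A flow is fundamentally a graph: the pipeline's steps are chained together, so a flow of any shape can be modeled
- The list of all flows is managed by a singleton-like object
- Once a flow has been created, it can be accessed from any environment (Jupyter Notebook, an IDE, and so on)
- Files (local or on S3) and the various parameters used in an experiment can be declared up front, with their values injected from the CLI at run time
- Parameters can be kept in files like data, so they can be version-controlled
- Although it is built as a library, it follows the current trend of defining behavior through annotations (decorators)
- For example, @step marks each step of a flow and @conda_base injects library dependencies for the whole flow; per flow element, @conda injects library dependencies, @resources sets resource sizes, @batch sets resource sizes for AWS Batch, and @retry sets whether a failed step is retried
- In addition, because each step (element) of a flow is versioned, the results of individual steps can be combined and grouped by namespace
- Results from the desired experiment steps can be combined for analysis (tagging is also supported)
- Metaflow is essentially a library with "first class support for various services on AWS". The tutorial on deploying to Amazon S3 is also well written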
- How Metaflow Became Netflix's Beloved Data Science Framework • Julie Amundson • YOW! 2022 - YouTube
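The decorator-driven style described in the Metaflow notes can be sketched in plain Python. This is deliberately not Metaflow's API or implementation: `step`, `retry`, `ToyFlow`, and `run` are hypothetical stand-ins written with the standard library only, to show how a flow can be a chain of decorated methods where one step is retried after a transient failure.

```python
import functools


def retry(times):
    """Re-run the wrapped step up to `times` extra times if it raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times:  # out of retries: surface the error
                        raise
        return wrapper
    return decorator


def step(func):
    """Mark a method as a flow step (toy marker, no other behavior)."""
    func.is_step = True
    return func


class ToyFlow:
    """Two-step flow: each step returns the next step, or None when done."""

    def __init__(self):
        self.attempts = 0
        self.total = None

    @step
    @retry(times=2)
    def start(self):
        self.attempts += 1
        if self.attempts < 2:  # simulate one transient failure
            raise RuntimeError("transient failure")
        self.numbers = [1, 2, 3]
        return self.end

    @step
    def end(self):
        self.total = sum(self.numbers)
        return None


def run(flow):
    """Walk the chain of steps starting from flow.start."""
    current = flow.start
    while current is not None:
        current = current()
    return flow


if __name__ == "__main__":
    finished = run(ToyFlow())
    print(finished.attempts, finished.total)
```

State is passed between steps as attributes on the flow object, which is also what makes it possible for a framework to snapshot and version the result of each step.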
Mirador is a tool for visual exploration of complex datasets
Mockaroo - Mockaroo lets you generate up to 1,000 rows of realistic test data in CSV, JSON, SQL, and Excel formats
Mode - Analyze raw or modeled data with SQL, Python, or R without moving between different tools
nf-data-explorer: The Data Explorer gives you fast, safe access to data stored in Cassandra, Dynomite, and Redis
- Exploring Data @ Netflix. By Gim Mahasintunan on behalf of Data… | by Netflix Technology Blog | Jun, 2021 | Netflix TechBlog
Piwik - Open Analytics Platform
Psyberg
- Psyberg: Automated end to end catch up | by Netflix Technology Blog | Nov, 2023 | Netflix TechBlog
- Streamlining Membership Data Engineering at Netflix with Psyberg | by Netflix Technology Blog | Nov, 2023 | Netflix TechBlog
- Diving Deeper into Psyberg: Stateless vs Stateful Data Processing | by Netflix Technology Blog | Nov, 2023 | Netflix TechBlog
Velox Hello from Velox | Velox
- Introducing Velox: An open source unified execution engine
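  - Meta has open-sourced Velox, a unified execution engine that accelerates and simplifies data management systems
  - Despite many advances such as Presto, Spark, and PyTorch, making multiple systems interoperate remained difficult; Meta developed Velox to solve this internally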
Weld: A common runtime for high performance data analytics
- Like Numba, it uses a Rust-based compiler to optimize and speed up data analysis scripts
- According to the article, it speeds up certain data analysis workloads
- Can be combined with Pandas, TensorFlow, Spark SQL, and others

Library data discovery

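Adopting a data discovery platform - Part 1. What is data discovery? (feat. Datahub vs. Amundsen comparison) - SOCAR Tech Blog
- SOCAR's adoption of a data discovery platform | GeekNews
Adopting a data discovery platform - Part 2. Deploying Datahub on GKE - SOCAR Tech Blog
Things to consider when talking about DDP (1) – Cojette (꼬젯) – This and that from a dabbling, idle data analyst
Things to consider when talking about DDP (2) – Cojette (꼬젯) – This and that from a dabbling, idle data analyst
Things to consider when talking about DDP (3) – Cojette (꼬젯) – This and that from a dabbling, idle data analyst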
amundsen Data discovery & metadata management (amundsen installation)
datahub A Metadata Platform for the Modern Data Stack | DataHub
- The beginning of Banksalad's Data Discovery Platform | Banksalad
dbt - Transform data in your warehouse
- Automating Propagations with DataHub and DataHub-Tools | by Ada Draginda | Mar, 2023 | DataHub
- datahub-tools: simplify working with DataHub API endpoints

News

The Best of Big Data: New Articles Published This Month (June 2017)

Public Data

19 Free Public Data Sets For Your First Data Science Project
Fueling the Gold Rush: The Greatest Public Datasets for AI
Awesome Public Datasets
city of Chicago
datalab.naver.com
Open Data for Deep Learning
Research data management simplified
Welcome to Kaggle Datasets
Creating a dataset using an API with Python
Best Public Datasets for Machine Learning and Data Science

Python

awesome-data-and-analytics-governance: a collection of quality references for improving data & analytics governance, and a place to share ideas
Awesome Data Engineering Learning Path - Best resources, books, courses
Awesome Data Science with Python
awesome-ds-setting: Data science setting for a new machine
github.com/PyDataKR/pydata.kr
Hands-on Introduction to Spatial Data Analysis in Python
Data Science for Losers
Data Science for Losers, Part 2 – Addendum
The Guide to Learning Python for Data Science
dprl - Decision making (DP) + reinforcement learning (RL) + online advertising (OA) + Python web (Pyweb)
Infographic – Quick Guide to learn Python for Data Science
PyDataSentry - Memory for Data Science
Unisex names – Data Analysis Use Case
A modern guide to getting started with Data Science and Python
Summary of a Python course for big data
Python for Data Science - Python Brasil 11 (2015)
Machine Learning in Python has never been easier
python-data-analysis
- Analyzing Big Data with Python
Data Analysis with Python and Pandas
Marco Bonzanini - Building Data Pipelines in Python
Robson Junior - Mastering a data pipeline with Python: 6 years of learned lessons from mistakes - YouTube
PyData Boston September 2023 session 1: Data sci done wrong: how & why data scientists make mistakes - YouTube
Data manipulation primitives in R and Python
How A Data Scientist Can Improve His Productivity
Analyzing Big Data with Python
Neuroimaging_Python: materials from the Python part of the Neurohacking study group
python4mri - Introduction to Python for neuroimaging (MRI) analysis
Dimensionality reduction (Principal Component Analysis)
A Complete Tutorial on Ridge and Lasso Regression in Python
A simple explanation of Ridge and Lasso Regression
Intro to Linear Model Selection and Regularization Understand how to select the best linear model, and understand what lasso and ridge regression do
How to Perform Lasso and Ridge Regression in Python
Fast group lasso in Python
Predicting Football Results With Statistical Modelling
12 Python Resources for Data Science
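Data analysis explained through Python code
- 1. Statistics
- 2. Bayesian probability
- 3. Clustering
- 4. Association (Apriori algorithm)
- 5. Handling data (basics, scaling, dimensionality reduction)
- 6. Gradient descent
- 7. Regression analysis (least squares, gradient descent)
- 8. HMM learning problem (Baum-Welch algorithm)
- 9. k-NN (nearest neighbors, classification)
- 10. DTW (Dynamic Time Warping)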
The Python ecosystem for Data Science: A guided tour - Christian Staudt
Analysing IPL Data to begin Data Analytics with Python
Python for Data Science: 8 Concepts You May Have Forgotten
Handling data and features with less sweat
Python Data Science Handbook
- Python Data Science Handbook
- Python Data Science Handbook.ipynb
- Book: Python Data Science Handbook - DataScienceCentral.com
Simple modeling code for a baseball competition
How to use Data Science to better understand your customers
Aaron Richter: Your data fits in RAM: How to avoid cluster computing | PyData Miami 2019
Data Science Toolkit (Concepts + Code) Jupyter, Numpy, Pandas, Plotly
10 Simple hacks to speed up your Data Analysis in Python
- 1. Profiling the pandas dataframe: presents dataframe data as a report
  - Pandas Profiling
- 2. Bringing Interactivity to pandas plots
- 3. A Dash of Magic
- 4. Finding and Eliminating Errors
- 5. Printing can be pretty too
- 6. Making the Notes stand out.
- 7. Printing all the outputs of a cell
- 8. Running python scripts with the ‘i’ option.
- 9. Commenting out code automatically
- 10. To delete is human, to restore divine
Box office ranking analysis (Korean Film Council)
Ondrej Kokes - High Performance Data Loss | PyData Fest Amsterdam 2020 - YouTube
Data Science With Python | Python For Data Science | Data Science For Beginners | Simplilearn - YouTube
Python data analysis practice: analyzing COVID-19 as of 2021, part 1
15 Python Snippets to Optimize your Data Science Pipeline - KDnuggets
5 ways for Data Scientists to Code Efficiently in Python
Data scientist’s guide to efficient coding in Python | by Dr. Varshita Sher | Jul, 2021 | Towards Data Science
Creating a Data Science Python Package Using Jupyter Notebook | by Abid Ali Awan | Jul, 2021 | Towards Data Science
Why Python is best choice for Data Science? - DEV Community
Python for Data Science - YouTube
Analyzing Data with Python - YouTube
90+ Data Science Projects You Can Try with Python | Python in Plain English
A Guide to Getting Datasets for Machine Learning in Python
Refactoring A Data Science Project Part 1 - Abstraction and Composition - YouTube
Refactoring A Data Science Project Part 2 - The Information Expert - YouTube
Refactoring A Data Science Project Part 3 - Configuration Cleanup - YouTube
Ian Ozsvald - Building Successful Data Science Projects | PyData London 2022 - YouTube
Nalssimaru (날씨마루): weather data analysis with Python - YouTube
Sebastiaan J. van Zelst: Process Mining in Python | PyData Eindhoven 2019
Python Fundamentals For Data Engineering: Create your first ETL Pipeline - YouTube
Python & Visual Studio Code - Revolutionizing the way you do data science - presented by Jeffrey Mew - YouTube
Irina Klein - IMF Data Discovery and Collection | PyData Global 2022 - YouTube
DEVOCEAN meta-analysis - 1. Data exploration

Python Library

Any aspiring data scientist should know these Python libraries
Top 15 Python Libraries for Data Science in 2017
Lesser Known Python Libraries for Data Science
10 Simple hacks to speed up your Data Analysis in Python: libraries for various areas (data, visualization, etc.)
Python ETL Tools: Best 8 Options
6 'even better' essential modern Python tools for data science - ITWorld Korea
Accelerator: an open-source framework developed by eBay, designed for fast processing of terabyte-scale data on a single system
- accelerator-project_skeleton
- Announcing the Accelerator
BlazingSQL(BSQL) GPU-accelerated SQL and Data Science - Rodrigo Aramburu - YouTube
CC-hurricane-analysis-project: A simple project with several functions that organize and manipulate data about Category 5 Hurricanes
dabl - the Data Analysis Baseline Library
- Doing Hard Things with Less Data; and Dabl: AutoML with a human in the loop - YouTube
dagster: A data orchestrator for machine learning, analytics, and ETL
- Introducing Dagster. A open-source Python library for… | by Nick Schrock | Dagster | Medium
- Dagster: The Data Orchestrator. As machine learning, analytics, and… | by Nick Schrock | Dagster | Aug, 2020 | Medium
- Sandy Ryza - Data pipelines != workflows: orchestrating data with Dagster | PyData Global 2022 - YouTube
datatable An Overview of Python’s Datatable package
deep-daze: Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
GOAI: Open GPU-Accelerated Data Analytics
hamilton: A scalable general purpose micro-framework for defining dataflows. You can use it to build dataframes, numpy matrices, python objects, ML models, etc. Embed Hamilton anywhere python runs, e.g. spark, airflow, jupyter, fastapi, python scripts, etc
- Elijah ben Izzy- Scalable Feature Engineering with Hamilton | PyData NYC 2022 - YouTube
- Elijah ben Izzy & Stefan Krawczyk - Scalable Feature Engineering with Hamilton | PyData Global 2022 - YouTube
Ibis: Scaling the Python Data Experience
Kedro Tam-Sanh Nguyen - Writing and Scaling Collaborative Data Pipelines with Kedro - YouTube
Lineapy
- Thomas Frauholz: From notebook to pipeline in no time with LineaPy - YouTube
Mandrova: Sensor Data Generator for Python3
Mode - SQL, Python, & visualizations in one platform. Mode helps analysts and data scientists improve their workflow and share impactful analysis easily
MKL Intel
- Installing the Intel® Distribution for Python and Intel® Performance Libraries with pip and PyPI
OpenRVDAS (Open Research Vessel Data Acquisition System) - a Python-based open source architecture intended to allow easy creation of customized data acquisition systems for research vessels and other scientific installations
Prefect - The New Standard in Dataflow Automation - Prefect
- PyData Triangle Meetup | Eleanor Hanna & Mary Clair Thompson - YouTube
pylift: Uplift modeling and evaluation library. Actively maintained pypi version
- Welcome to pylift’s documentation! — pylift 0.1.3 documentation
- About Wayfair | Pylift: A Fast Python Package for Uplift Modeling
Pytubes - a library that optimizes loading datasets into memory
- Analysing 1.4 billion rows with python Using pytubes, numpy and matplotlib
RAPIDS Open GPU Data Science | RAPIDS
- Accelerating Data Science with RAPIDS - Keith Kraus
- RAPIDS cuGraph
- Fundamentals Of Accelerated Data Science With RAPIDS
- Using GPUs for Data Science and Data Analytics
- High Performance Python - Gus Cavanaugh | PyData Global 2021 - YouTube
RoboSat - an end-to-end pipeline written in Python 3 for feature extraction from aerial and satellite imagery
- PyParis 2018 - Robosat: an Open Source and efficient Semantic Segmentation...
siuba: Python library for using dplyr like syntax with pandas and SQL
slr - Simple linear regression with confidence intervals on parameters and prediction
Snorkel: A System for Fast Training Data Creation
- Introducing Snorkel
- Hand in hand with weak supervision using snorkel - Szymon Wojciechowski
- Weak Supervision: A New Programming Paradigm for Machine Learning
- Introducing the New Snorkel
- Snorkel is a fundamentally new interface to ML without hand-labeled training data
- Detecting abusive users in games: correcting labels with Snorkel
- Extracting job keywords with Snorkel - DRAMA&COMPANY
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set
- Tuplex - a parallel big data processing framework | GeekNews

Quality

5 checkpoints for data quality: a good article on a very practical and important aspect of preprocessing
Data Observability In Practice: Data Monitoring At Scale With SQL And Machine Learning - Monte Carlo Data
Data Quality Automation at Twitter
GX: a proactive, collaborative data quality platform • Great Expectations
- Reducing data downtime caused by data quality issues

Recommendation

Episode 1: What is a recommender system.ppt
Data mining 02 - Building a recommender system
Recommending items to more than a billion people
Recommendation Engines for Email Marketing
Seldon
The Netflix Prize and Production Machine Learning Systems: An Insider Look
Netflix algorithm: Prize Tribute Recommendation Algorithm in Python
How Netflix works
- Artwork Personalization at Netflix
The Next Step in Personalization: Dynamic Sizzles | by Netflix Technology Blog | Nov, 2023 | Netflix TechBlog
Netflix and Amazon
Deep Dive into Netflix’s Recommender System | by David Chong | Towards Data Science
Building confidence in a decision | by Netflix Technology Blog | Netflix TechBlog
Experimentation is a major focus of Data Science across Netflix | by Netflix Technology Blog | Jan, 2022 | Netflix TechBlog
Speech-Based, Natural Language Conversational Recommender Systems
Using Graph Theory to Build a Simple Recommendation Engine in JavaScript
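Cramming a real-time recommendation engine into a single machine
- Implementing 'Cramming a real-time recommendation engine into a single machine' in 50 lines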
- Tiny Elephant(English) In memory based collaborative filtering by python
Recommender system basics (Python, RecSys)
Developing a large-scale recommendation algorithm on MapReduce
Product recommendation at 11번가 (11st) using big data and NLP
Building NLP Content-Based Recommender Systems A tutorial for a NLP recommendation engine using unsupervised learning
- job_analysis_content_recommendation.ipynb
Powerpoint-Slides for Recommender Systems - An Introduction
Content Based Anime Recommender! ipynb notebook
Building a recommender system with deep learning (TensorFlow)
Quick Guide to Build a Recommendation Engine in Python
CatBoost - an open-source gradient boosting library with categorical features support
- Supports categorical features; useful for ranking and recommendation
- Industry's fastest inference implementation: Presenting to you the New version of CatBoost gradient boosting library
- CatBoost vs. Light GBM vs. XGBoost
Recommendation System Algorithms
Developing the AI recommender system airs: modeling and system
Spotify’s Discover Weekly: How machine learning finds your new music (song recommendation)
Introduction to Recommender System. Part 1 (Collaborative Filtering, Singular Value Decomposition)
- Summary of Introduction to Recommender System Part 1
Introduction to Recommendation Systems
Listing Embeddings for Similar Listing Recommendations and Real-time Personalization in Search
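SK ICT Tech Summit 2017: the recommendation platform Colosseo
- SK ICT Tech Summit 2017: the recommendation platform Colosseo
A music recommender system you listen to with your eyes (CF, CBF)
ML: a failed attempt at applying the Matchbox Recommender model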
How Cambridge Analytica’s Facebook targeting model really worked – according to the person who built it
- How Facebook 'likes' were used for profiling
The Remarkable world of Recommender Systems: an article for understanding the basics
Recommendation Systems in the Real world
이상열, A case study on developing an Interpretable Recommender System, NDC 2019
Six descriptions of the power of Brunch recommendations
2 years of Developing Personalized Real-Time Recommendation Service Based on Machine Learning
System Design for Recommendations and Search
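- Analyzes the system design of recommendation and search along two axes (offline vs. online environment / candidate retrieval vs. ranking)
  - Placing the components on this 2 x 2 grid gives the picture shown in the linked post
  - The author analyzes system designs shared by Alibaba, Facebook, JD, and DoorDash within this framework
  - Beyond presenting the framework, it is valuable for covering engineering issues along the way, such as the train-test skew problem, why the embedding model and ANN are run in the same container, and whether real-time recommendation is really needed instead of batch
- Korean translation: System Design for Recommendations and Search | You May Also Like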
Offline to Online: Feature Storage for Real-time Recommendation Systems with NVIDIA Merlin | NVIDIA Technical Blog
How to evaluate ranking systems (MRR, DCG)
How Youtube is recommending your next video
Using machine learning to predict what file you need next
Using machine learning to predict what file you need next, Part 2
Powered by AI: Instagram’s Explore recommender system
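An analysis of the YouTube recommender system
Spotify knows you. How?
LINE Timeline's new challenge, part 1 – Discover for exploring recommended content and Follow, a new subscription model
LINE Timeline's new challenge, part 2 - Introducing the Discover delivery system - LINE ENGINEERING
LINE Timeline's new challenge, part 3 - The Discover recommendation model - LINE ENGINEERING
Recommender systems - a summary of algorithm trends
Diving into recommender systems
Two years of evolution of Coupang's recommender system (from product recommendation to real-time personalization)
Recommender systems - collaborative filtering explained (1)
Kakao AI Recommendation: criteria for choosing a collaborative filtering model – tech.kakao.com
Codeit: recommender systems with machine learning, recommendation methods (content-based, collaborative filtering)
Building your own neighborhood restaurant recommendation engine the easy way (crawling and cosine similarity)
Personalized recommendation with a SaaS recommendation solution :: GS Retail Engineering
The go-to spot for recommender systems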
Learn About Recommender Systems With These 8 Resources
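ifkakao Recommender systems: walking the tightrope between context and taste
Simon Kim's Data Science - YouTube
Two years of evolution of Coupang's recommender system
Recommendation services and architecture 1 - What is a recommendation service? This article helps you better understand the recommendation services you may have passed by without noticing… | by Jongmin Lee | How we build MyRealTrip | Nov, 2020 | Medium
Recommendation services and architecture 2 - An architecture for serving recommendations | by Jongmin Lee | How we build MyRealTrip | Nov, 2020 | Medium
We recommend restaurants to your taste: the secret of a personalized place recommender system
How should recommender systems be evaluated?
Korean translation: REVEAL'20 Workshop Introduction | You May Also Like
T academy | Training smart ICT experts
Challenges a first-year junior faced working on recommender systems. Of the many fields of machine learning, I found recommender systems the most appealing… | by Zimin | WATCHA | Apr, 2021 | Medium
Building the Hakuna Live recommender system with a cost-efficient Click-Through Rate Prediction model | Hyperconnect Tech Blog
Kakao AI Recommendation: personalized recommendation at Kakao using topic modeling and MAB (Multi-Armed Bandit) – tech.kakao.com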
Advertiser Recommendation Systems at Pinterest | by Pinterest Engineering | Pinterest Engineering Blog | Jul, 2021 | Medium
Recommender System KR
Keynote 7: Moving Beyond Recommender Models - Even Oldridge (NVIDIA), Karl Byleen-Higley (NVIDIA) - YouTube
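- Recommender systems are often described as 2-stage: Candidate Retrieval + Ranking
  - The speakers argue that four stages are actually needed, because there are hidden Filtering and Ordering stages
  - Filtering comes after Candidate Retrieval and further removes items that cannot be used
  - Ordering comes after Ranking and adjusts the order or drops items when the final list order is decided
  - They argue these stages should be considered separately from the preceding ones, because they are often applied explicitly and often encode business logic that is hard or cumbersome for a model to learn
  - Taking Instagram as an example, when a user blocks another user or mutes notifications, it is convenient to exclude the blocked user's posts in Filtering
  - Likewise, if one user's posts have similar ranking scores and appear back to back in the feed, the user experience suffers, so the order needs to be adjusted again in Ordering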
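
The four stages above (retrieval, filtering, ranking, ordering) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the speakers' implementation: the `Post` type, the scores, the blocked-author rule, and the "no same author back to back" rule are all hypothetical examples modeled on the Instagram scenario in the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str
    score: float  # hypothetical relevance score from a ranking model


def retrieve(posts, limit):
    """Stage 1: candidate retrieval (here simply the first `limit` posts)."""
    return posts[:limit]


def filter_candidates(candidates, blocked_authors):
    """Stage 2: filtering, an explicit business rule that drops unusable items."""
    return [p for p in candidates if p.author not in blocked_authors]


def rank(candidates):
    """Stage 3: ranking by model score, highest first."""
    return sorted(candidates, key=lambda p: p.score, reverse=True)


def order(ranked):
    """Stage 4: ordering, avoiding the same author back to back where possible."""
    remaining = list(ranked)
    feed = []
    while remaining:
        # Best-ranked post whose author differs from the previous one;
        # fall back to the best remaining post if no such post exists.
        pick = next(
            (p for p in remaining if not feed or p.author != feed[-1].author),
            remaining[0],
        )
        feed.append(pick)
        remaining.remove(pick)
    return feed


posts = [
    Post(1, "ann", 0.95),
    Post(2, "ann", 0.94),
    Post(3, "bob", 0.90),
    Post(4, "spam", 0.99),
    Post(5, "cat", 0.50),
    Post(6, "dan", 0.40),
]

candidates = retrieve(posts, limit=5)
usable = filter_candidates(candidates, blocked_authors={"spam"})
feed = order(rank(usable))
feed_ids = [p.post_id for p in feed]
```

Keeping filtering and ordering as plain functions outside the model reflects the argument in the talk: rules such as blocking are simpler to state explicitly than to have a ranking model learn them.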
How does YouTube recommend the next video I'll like? (Google's latest paper, explained simply) – techNeedle 테크니들
Collaborative filtering doesn't work for us
On YouTube’s recommendation system
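How Spotify understands your musical diversity – 인사이트캠퍼스
Kakao AI Recommendation: Content-based Filtering in Kakao – tech.kakao.com
- Explains how Kakao Webtoon uses content-based filtering to recommend other related content
- To measure content similarity, item vectors are built using one-hot encoding and embeddings
- Embeddings are used more when the range of categories to represent is wide or the data is complex
- Content-based filtering has the advantage of needing only item information, with no consumption history, but it is said to underperform collaborative filtering when enough consumption history is available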
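
The one-hot approach to content similarity mentioned above can be sketched in a few lines. This is a minimal illustration, not Kakao's implementation: the item names and genre tags are made up, and because each item carries several tags the vectors are strictly multi-hot rather than one-hot.

```python
import math


def encode(tags, vocabulary):
    """Build a 0/1 vector with one position per tag in the vocabulary."""
    return [1.0 if tag in tags else 0.0 for tag in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(item_id, vectors, top_k=2):
    """Rank all other items by cosine similarity to `item_id`."""
    target = vectors[item_id]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical catalog: only item metadata is needed, no consumption history.
items = {
    "webtoon_a": {"romance", "comedy"},
    "webtoon_b": {"romance", "drama"},
    "webtoon_c": {"action", "fantasy"},
    "webtoon_d": {"comedy", "romance", "drama"},
}
vocabulary = sorted(set().union(*items.values()))
vectors = {name: encode(tags, vocabulary) for name, tags in items.items()}

recommendations = most_similar("webtoon_a", vectors, top_k=2)
recommended_names = [name for name, _ in recommendations]
```

With a learned embedding in place of `encode`, the similarity and ranking steps stay the same; only the way item vectors are produced changes.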
Kakao AI Recommendation: want to try Kakao's music recommendation? Melon Playlist Dataset (feat. Kakao Arena) – tech.kakao.com
Vectorizing taste, which is hard to objectify : Naver Blog
Recsperts - Recommender Systems Experts
Building content-based filtering: MiniLM, ScaNN, and TFServing - The Highlights - LINER team blog
Insider Tips for Building Personalized Recommender Systems - YouTube
Autoencoders | Machine Learning for Recommender Systems - YouTube
A new research engineer's account of implementing a personalized content recommendation model. Hello, from the TVING Data Engineer team's Research… | by 주찬형 | tving.team | Mar, 2022 | Medium
Bag-of-Tricks for Recommendation: Recency, Clustering, and Item Shuffling - The Highlights - LINER team blog
Learning to Rank - DRAMA&COMPANY
Survey: a comparison of recommender system libraries
Real World Recommendation System - Part 1 - by Nikhil Garg
Real World Recommendation Systems - Part 2 (Training Data Generation)
Introducing the Kakao recommendation team, leading 'AI recommendation technology' – tech.kakao.com
Reinforcement Learning for Budget Constrained Recommendations | by Netflix Technology Blog | Aug, 2022 | Netflix TechBlog
Recommender systems: Bloom Filter for Filtering Layer
Why do we need two-stage Recommender System?
The journey to building a modern recommender system - 허훈 (LINER) | MODUCON 2022 - YouTube
Twitter's Recommendation Algorithm
- the-algorithm: Source code for Twitter's Recommendation Algorithm
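- Twitter open-sourced its recommendation algorithm
- This is the algorithm that selects tweets for the For You tab: from a pool of hundreds of millions, candidate sources extract 1,500 tweets, drawn in a 50:50 ratio from two kinds of source: In-Network (accounts the user follows) and Out-of-Network (accounts the user does not follow)
- For In-Network sources, tweets are ranked with Real Graph, a model that predicts the likelihood of engagement between two users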
  - sig-alternate.pdf
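- Out-of-Network sources must find relevance even though the user does not follow the authors, so two approaches are used
  - Social graph: candidates come via people with interests similar to those the user follows; for this Twitter developed GraphJet, a graph processing engine that maintains a real-time interaction graph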
    - p1281-sharma.pdf
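  - Embedding space, which accounts for a much larger share than the social graph: the relevance between a user's interests and a tweet is turned into a number, and 145,000 communities are updated every three weeks
  - Tweets ranked this way go through final filtering and refinement before being shown to the user; this pipeline runs about 5 billion times a day and completes in under 1.5 seconds on average
- When the source was released, code that handled Elon Musk as a special case was found and became controversial, so the offending code and Git history were cleaned up and re-uploaded
- Real-man trait) Publishing Twitter's source code on GitHub - YouTube
TikTok for Text! Building a Session-based Recommender for the LINER app – The Highlights – LINER team blog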
The TikTok recommender system
How can the recommender system Cold Start problem be solved?
Toss | SLASH 23 - Delivering securities news more intelligently with machine learning - YouTube
Vinija's Notes • Recommendation Systems • Research Papers
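How about this hotel instead? - Introducing a recommender system to a hotel service
Tech Radio: the hotel recommendation service (FOR YOU) episode
Personalized recommender systems #1. Multi-Stage Recommender System - 오늘의집 blog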
Awesome-Generative-RecSys: A curated list of Generative Recommender Systems (Paper & Code)

Recommendation Python

Advanced course on recommender systems in Python
Building a recommender system, hands-on part 01 (using Django)
- Building a recommender system, hands-on part 01
Collaborative-filtering-Tutorial
Machine Learning for Retail Price Recommendation with Python
Building and Testing Recommender Systems With Surprise, Step-By-Step - Learn how to build your own recommendation engine with the help of Python and Surprise Library, Collaborative Filtering
PyCon KR 2019: recommender systems must now make money
Analyzing Hacker News book suggestions in Python
Keeping Sensitive Data Safe Using Recommendation Systems | PyData Global 2021 - YouTube
Machine Learning Recommender System With Python - YouTube
The optimization journey of Item-CF, a product recommendation algorithm
implicit: Fast Python Collaborative Filtering for Implicit Feedback Datasets
Recommender Utilities — Microsoft Recommenders 1.1.0 documentation
- recommenders: Best Practices on Recommendation Systems
- recommenders/examples at main · microsoft/recommenders
Surprise - A Python scikit for recommender systems
TOROS: Python Framework for Recommender System
TOROS Buffalo: A fast and scalable production-ready open source project for recommender systems
