afontana1/Data-Engineering

Resources

Design Guides

Tutorials

  1. Design Patterns in the Real World
  2. Design Patterns
  3. Refactoring Guru
  4. Design Patterns: Elements of Reusable Software
  5. UI Design Patterns
  6. OO Design
  7. python-ddd
  8. Design Patterns Scala
  9. Enterprise Application Patterns
  10. Practical Cryptography for Developers
  11. Problem Solving with Algorithms and Data Structures using Python
  12. Computer Security
  13. Open DSA Data Structures and Algorithms
  14. Awesome ETL
  15. Scala Design Patterns
  16. Goodreads ETL Pipeline
  17. Around Data Engineering
  18. Awesome Design Patterns
  19. Python Patterns
  20. Start Data Engineering
  21. Formal Ontology
  22. 6.005 Software Construction
  23. Google Site Reliability Engineering
  24. Calm Code
  25. Cosmic Python
  26. Python for Data Analysis
  27. Python Data Science Handbook
  28. Operating Systems
  29. Distributed Computing
  30. Data Integration
  31. Data Tools
  32. Awesome Big Data
  33. data oriented design
  34. Patterns of Distributed Systems
  35. Data Mesh Principles and Architecture
  36. Patterns for API Design
  37. gaphor
  38. Data Centric Design
  39. open-data-fabric
  40. serverlessland
  41. Data Oriented Design
  42. Python Patterns
  43. Diagrams
  44. python-anti-patterns
  45. Python 3 Patterns, Recipes and Idioms
  46. awesome-ddd
  47. designing-data-intensive-applications
  48. solution-architecture-patterns
  49. Serverless Patterns
  50. awesome-system-design
  51. awesome-software-architecture
  52. Google Site Reliability Engineering
  53. Microservice API Patterns

Data Concepts

  1. Data
  2. Databases
  3. Database Management Systems
  4. Data Warehouse
  5. Data Modeling
  6. Data Ops
  7. Metadata
  8. Data Management

Software Engineering Resources

Reading

  1. Algorithms
  2. Data Structures
  3. Software architecture
  4. Software engineering
  5. Programming paradigms
  6. Software testing
  7. Systems engineering
  8. Systems science
  9. Systems theory
  10. Systems analysis
  11. Cloud Computing
  12. Software requirements
  13. Programming Paradigms
  14. Programming language concepts
  15. Programming principles

Design Patterns

  1. Abstract Factory Pattern
  2. Builder Pattern
  3. Singleton Pattern
  4. Prototype Pattern
  5. Object Pool Pattern
  6. Facade Pattern
  7. Chain of Responsibility Pattern
  8. Flyweight Pattern
  9. Design Factory Patterns
  10. Composite Pattern
  11. Bridge Pattern
  12. Mediator Pattern
  13. Visitor Pattern
  14. Adapter Pattern
  15. Model View Controller
  16. Decorator Pattern
  17. Chain of Responsibility Pattern
  18. Command Pattern

Concepts

  1. Futures and promises
  2. Asynchronous method invocation
  3. Asynchronous I/O
  4. Async/await
  5. Asynchrony
  6. Concurrent computing
  7. Thread
  8. Parallel computing
  9. Birthday problem
  10. SHA-1
  11. Avalanche effect
  12. Hash collision
  13. Lazy evaluation
  14. Relational algebra

Continuous Integration and Continuous Deployment (DevOps)

  1. jenkins-tutorial
  2. awesome-ciandcd
  3. awesome-ci
  4. eksctl
  5. Dev-ops Exercises
  6. DockerLabs
  7. former2
  8. awesome-kubernetes
  9. Introduction to GitLab CI & DevOps with AWS - Course Notes
  10. Scuba
  11. dagu
  12. tawazi
  13. gitlab-ci
  14. pulumi

ML Ops

  1. ML Ops Cookbook
  2. awesome-mlops
  3. ML Ops Specialization
  4. Coursera ML Ops
  5. Engineering ML Ops
  6. Practical ML Ops
  7. awesome ML Ops
  8. Evidently
  9. Machine Learning version control
  10. Feast
  11. CleanLab
  12. ML Version Control
  13. WandB
  14. Metaflow
  15. River
  16. Bytewax
  17. ml-design-patterns
  18. mlflow
  19. kestra
  20. kedro

Data and System Security

  1. ThreatDragon
  2. Kotlin Faker
  3. DataBunker
  4. awesome-IAM
  5. open-data-anonymizer
  6. Presidio
  7. Penetration Testing Tools
  8. Public Pentesting Reports
  9. Nettacker
  10. Kubesploit
  11. Hackerpro
  12. RapidScan
  13. Astra
  14. awesome-pentest-cheat-sheets
  15. Hacking Security Ebooks
  16. awesome-infosec
  17. awesome-web-hacking
  18. infosec-reference
  19. Web Security Testing Guide
  20. Infection Monkey
  21. awesome-web-security
  22. awesome-hacking-resources
  23. h4cker
  24. PayloadsAllTheThings
  25. threat-model-cookbook
  26. Awesome Threat Modeling
  27. awesome-hacking
  28. penetration testing
  29. awesome-pentest
  30. chaos-toolkit
  31. Python for Network Engineers
  32. Chaos Engineering
  33. awesome networking
  34. Python for Network Engineers
  35. awesome-network-automation

Serverless Frameworks

  1. Architect
  2. Webiny-JS
  3. Midway
  4. Amplify-JS
  5. Serverless Express
  6. Claudia
  7. Apex
  8. Zappa
  9. Serverless
  10. tomodachi

Helpful Tools

  1. Boltons
  2. Docker-py
  3. more-itertools
  4. Kafka Python
  5. ZODB
  6. Click
  7. DataConv
  8. DBT Core
  9. State Transition Machines
  10. Storm
  11. Toolz
  12. Paramiko
  13. Textblob
  14. Jupyter: Docker-stacks
  15. cookie-cutter
  16. Transcriber
  17. Pytools
  18. Misskey
  19. OpenMeta
  20. Chalice
  21. Microservice Architecture: Serverless Compute Implementation
  22. Python-Lambda
  23. Pywren
  24. Zappa
  25. Memray: Memory Profiling
  26. Pybossa
  27. Apache Samza
  28. filesystem_spec
  29. google-i18n-address
  30. docker-wsl
  31. aws-data-wrangler
  32. Optimus
  33. metricflow
  34. lightdash
  35. chaos genius
  36. pyrsistent
  37. pydash
  38. latexify_py
  39. rocketry
  40. pydatafaker
  41. pydbgen
  42. faker
  43. RateLimiting
  44. DateTimeRange
  45. tenacity
  46. mako
  47. jinjasql
  48. data engineering on gcp
  49. polars
  50. Vaex
  51. Fugue: Distributed Computation
  52. Funcy
  53. Singer
  54. Dateutil
  55. pyparsing
  56. psutil
  57. ray
  58. click
  59. flask-boilerplate
  60. python-packager
  61. python-project-skeleton
  62. wemake-python-package
  63. pyscaffold
  64. xmltodict
  65. duckdb

Data Viz and BI

  1. Dash and Sample Apps
  2. Seaborn
  3. Plotnine
  4. Bokeh
  5. Pygal
  6. Geoplotlib
  7. Gleam
  8. Missingno
  9. Leather
  10. Altair
  11. Folium
  12. Plotly
  13. Pillow
  14. Superset
  15. Glue Visualization
  16. BIRT
  17. SpagoBI
  18. Seal-Report
  19. metabase
  20. Databox
  21. KNIME
  22. Datapane
  23. Perspective
  24. redash
  25. reportserver
  26. awesome-business-intelligence
  27. Turnilo
  28. SandDance
  29. Abixen Platform
  30. d3
  31. Dash Examples
  32. sweetviz
  33. Awesome Web Viz Frameworks
  34. Echarts
  35. Grafana
  36. awesome-dataviz
  37. python-data-visualization
  38. The-Python-Graph-Gallery
  39. rustworkx
  40. solara
  41. pygwalker
  42. graphic-walker
  43. datapane
  44. gleam
  45. streamlit
  46. ipywidgets
  47. voila

Databases and Parsing

  1. SQL Alchemy
  2. Pyodbc
  3. PyMySQL
  4. Redash
  5. SQLmap
  6. Pyodbc
  7. ddlparse
  8. lacquer
  9. omymodels
  10. sql-metadata
  11. sqlglot
  12. sqlparse
  13. Sqlbucket
  14. DBFread
  15. sqlalchemy-hana
  16. pymssql
  17. sqleyes
  18. data-diff
  19. amazon-redshift-python-driver
  20. spyql
  21. awesome-sqlalchemy
  22. ipython-sql
  23. redshift-developer-guide
  24. cloud-sql-python-connector
  25. aiosql
  26. sqlfluff
  27. sqlmodel
  28. pypika
  29. Amazon Redshift Utils
  30. connector-x
  31. psycopg2
  32. pg_simple
  33. databases
  34. sqlmesh
  35. DBUtils
  36. pymysql-pool
  37. django-db-connection-pool
  38. Vanna
  39. malloy
  40. Apache ORC
  41. Apache Pinot
  42. pdfplumber
  43. camelot
  44. ingestr

API and Web Framework

  1. Flask
  2. Tornado
  3. Tenacity
  4. Eve
  5. Flask Restful
  6. Google API Client
  7. Zeep
  8. Connexion
  9. Hug
  10. Falcon
  11. Aiohttp
  12. FastAPI
  13. OpenAPI Python Client
  14. requests-toolbelt
  15. smart_open
  16. Wikipedia API and Wrapper
  17. Office365-Rest-Python-Client
  18. youtube-dl
  19. Twisted
  20. simple-salesforce
  21. Venmo API
  22. Flask
  23. Django
  24. Coursera Downloader
  25. Public APIs
  26. python-oauth2
  27. requests-oauth2
  28. redo
  29. backoff
  30. Directus Data Stack
  31. StreamLit
  32. Pybossa
  33. starlette
  34. awesome-fastapi
  35. awesome-fastapi-projects

Async & Multitasking & Distributed

  1. Requests-futures
  2. Requests-threads
  3. grequests
  4. async_generator
  5. httpx
  6. requests-async
  7. mpire
  8. offspring
  9. multiprocessing_on_dill
  10. continuous threading
  11. Needle
  12. atasker
  13. asgiref
  14. concurrency-in-python-with-asyncio
  15. Async & Multitasking
  16. fast_map
  17. aiobotocore
  18. aioboto3
  19. aiohttp-client-cache
  20. aiohttp
  21. multiprocess
  22. aiofiles
  23. aiobotocore
  24. aioboto3

Data Flow, Processing, Pipelines

  1. dataflow
  2. pyfi
  3. dataflowkit
  4. data flow graph
  5. python flow
  6. gerda dataflow
  7. dataflows
  8. d6tflow
  9. prefect
  10. Schedule
  11. Luigi
  12. Faust
  13. Redis Queue
  14. Airflow-Great-Expectations
  15. Smart Open
  16. Zipstream
  17. multi-part-upload
  18. Celery
  19. airflow
  20. sftp-lambda
  21. lambda-s3-ftp
  22. Apache Beam
  23. Processing (I/O and Pipelines)
  24. stream unzip
  25. pypyr
  26. Data Flow Ops
  27. Apache Spark Guide
  28. Orchest
  29. Mage AI
  30. Meltano
  31. DataJoint Python
  32. Hamilton
  33. Kombu
  34. airbyte
  35. ploomber
  36. data-diff
  37. Amazon Apache Airflow Managed Workflow
  38. Airbyte
  39. mage-ai
  40. Dagster
  41. Data All
  42. awesome-flink and Examples
  43. flink
  44. RedPanda
  45. Materialize
  46. Hazelcast
  47. Watermill
  48. Amazon Kinesis Client Python
  49. Faust
  50. Stream Processing
  51. Spark Streaming in Python
  52. Kestra
  53. Hamilton
  54. CloudQuery
  55. Nifi
  56. Pentaho-Kettle
  57. Camel
  58. Riko
  59. Bonobo
  60. Petl
  61. awesome-apache-airflow
  62. airflow provider sample
  63. metabase
  64. Flowman
  65. Apache Beam
  66. hamilton
  67. pachyderm
  68. elementary

Big Data and Cloud APIs

  1. AWS Encryption SDK
  2. AWS Xray SDK
  3. AWS SDK Pandas
  4. Sagemaker SDK
  5. GCP Data Validator
  6. AWS Redshift Driver
  7. Cloudwatch Logging
  8. Former2
  9. Sagemaker Spark
  10. Secrets Manager Caching
  11. Spark With Python
  12. Learning Pyspark
  13. Spark Redshift
  14. Sagemaker Graph ER
  15. aws-glue-developer-guide
  16. pyspark examples
  17. pyspark cheatsheet
  18. aws-scheduler
  19. emr-serverless-samples
  20. Polars
  21. duckDB
  22. Dask
  23. SparkSQL
  24. cloud-experiments
  25. document-understanding-solution
  26. aws-glue-docker
  27. amazon-comprehend-examples
  28. Festin
  29. MinIO-py
  30. Bucketstore
  31. amazon-redshift-udfs
  32. PyLazyS3
  33. Minio

Data Quality & Profiling & Business Rules

  1. Pandas Profiling
  2. WhyLogs
  3. PointBlank
  4. Hooqu
  5. pyDMNrules
  6. DQ-Meerkat
  7. dataqtor
  8. DataGristle
  9. versatile-data-kit
  10. Soda-Core
  11. ydata-quality
  12. Pydqc
  13. Business Rule Engine
  14. Python Business Logic
  15. Business Rules
  16. Hay_checker
  17. Great Expectations
  18. Feast
  19. Datatile
  20. business-rules venmo
  21. pydqc
  22. Data Gristle
  23. deep diff
  24. Great Expectations
  25. XML to Dict
  26. Pylint
  27. postal-address
  28. python-email-validator
  29. flatten-dict
  30. pytesseract
  31. python deequ
  32. ydata-profiling
  33. Memphis
  34. Benthos
  35. Awesome Streaming
  36. Storm

Data lineage & Discovery & Observability

  1. bigquery-data-lineage
  2. multi-data-lineage-capture-py
  3. DataTracer
  4. data-lineage
  5. elementary
  6. stairlight
  7. OpenLineage
  8. Marquez
  9. Odd-Platform
  10. waltz
  11. sqllineage
  12. Spline
  13. grafana
  14. awesome observability
  15. Signoz
  16. zipkin
  17. kibana
  18. vector
  19. netdata
  20. odd platform
  21. Data Observability in Practice
  22. Monosi
  23. Swiple
  24. Elementary
  25. awesome-opentelemetry
  26. dd-trace-py
  27. signoz
  28. vector
  29. netdata
  30. awesome-observability
  31. malloy

Data Schemas & Parsing & Scraping

  1. Json Classes
  2. Schema
  3. Jmespath
  4. XmlUtils
  5. ExtraTools
  6. Collections Extended
  7. More Itertools
  8. Lark Parser
  9. Json Flattener
  10. Scrapy
  11. Beautiful Soup
  12. marshmallow
  13. PyPDF2
  14. Pydantic
  15. Pyspider
  16. Pydantic SQLAlchemy
  17. simplejson
  18. json2parquet
  19. pyyaml
  20. Attrs
  21. Chardet
  22. simple-enum
  23. dataklasses
  24. dataclasses-json
  25. dataclassy
  26. python-choicesenum
  27. fastenum
  28. bnum
  29. data-enum
  30. superstring.py
  31. text2text
  32. pyspellchecker
  33. symspellpy
  34. python string similarity
  35. textdistance
  36. string-algorithms
  37. python-phonenumbers
  38. CommonRegex
  39. Addresser
  40. Unidecode
  41. whoosh
  42. usaddress
  43. jellyfish
  44. Postal Address
  45. dirtyJson
  46. awesome-json
  47. dataclass_array
  48. python-graphs
  49. python-email-validator
  50. dataprep
  51. fuzzywuzzy
  52. Cerberus
  53. PolyFuzz
  54. Fuzzy Search
  55. Pachyderm
  56. CleanLab
  57. awesome-jsonschema
  58. dateparser
  59. dateutil
  60. Cerberus
  61. validators
  62. Valideer
  63. Pandera
  64. Typical
  65. kdatapackage
  66. PandasSchema
  67. TypedFrame
  68. tableschema
  69. validator-collection
  70. deepchecks
  71. awesome-validation-python
  72. attrs-strict
  73. pydantic-core
  74. dataclass-type-validator
  75. Jinja
  76. liaison
  77. DataCleaner

Excel Tools

  1. xltable
  2. xlwings
  3. xlsxWriter
  4. openpyxl
  5. formulas
  6. pycel
  7. pyexcel
  8. xlwt
  9. pyxll
  10. xlrd
  11. xlsx2csv
  12. libre office core
  13. pywin32
  14. pyxlsb2
  15. pyxlsb

Cache and Hash

  1. cachecontrol
  2. Cached Property
  3. cacheout
  4. cachetools
  5. Johnny Cache
  6. Requests Cache
  7. expiringdict
  8. lru_cache_http_client
  9. Data Cache
  10. Python Cache
  11. LRU Cache
  12. perfect-hash
  13. hash framework
  14. SHA3 Implementation
  15. Random Hash
  16. python hashes
  17. SHA-256 Algorithm
  18. Bloom Filter
  19. EnroCrypt
  20. pymorton
  21. shortuuid
  22. human-readable-id

Logging and Testing and Monitoring

  1. structlog
  2. Local Stack
  3. Hypothesis
  4. Loguru
  5. PySnooper
  6. PyGoGo
  7. minilog
  8. fluent logger
  9. daiquiri
  10. Locust
  11. Soda Core
  12. WhyLogs
  13. Logstash
  14. Hypothesis
  15. awesome-pytest
  16. spylunking
  17. splunk-sdk-python
  18. opentelemetry-python
  19. AWS OpenTelemetry
  20. opentelemetry.io
  21. refurb
  22. Factory Boy

Webscraping

  1. Mechanical Soup
  2. tiktok-downloader
  3. youtube-dl
  4. BeautifulSoup4
  5. Selenium
  6. lxml
  7. portia
  8. scrapyd
  9. scrapy
  10. puppeteer
  11. awesome-web-scraping
