
Badges: Twitter · LinkedIn (Adebayo Adejare) · Medium · Kaggle

🚀 Data/ML Engineer 🤖 | 👨‍🔧 Pipeline Architect | 🐍 Python/Analytics Enthusiast 📊

About Me 👋

Hi, I'm Bayo.

I'm a passionate Data/ML Engineer with a knack for building robust, scalable data pipelines and turning raw data into actionable insights. With years of experience in the field, I've worked on projects ranging from real-time streaming analytics to large-scale batch processing systems. My expertise extends to machine learning, where I've implemented ML pipelines and deployed models at scale, bridging the gap between data engineering and data science.

Learning 🌱

I'm always excited to expand my knowledge and stay up-to-date with the latest trends in data engineering. Currently, I'm focusing on:

  • Generative AI: Exploring applications of generative models in data pipelines and analytics
  • MLOps: Implementing best practices for deploying and maintaining machine learning models in production
  • Graph Databases: Learning Neo4j for handling complex, interconnected data
  • Data Mesh Architecture: Studying decentralized data management approaches

Projects 🔭

Real-time Data Processing Pipeline with Spark Streaming

  • Developed a robust real-time data processing pipeline using Apache Spark Streaming and Kafka
  • Ingested high-volume streaming data from IoT devices and processed it in real time
  • Implemented windowed operations and stateful transformations to analyze time-series data
  • Utilized Spark SQL for complex aggregations and Delta Lake for reliable storage
  • Deployed the pipeline on AWS EMR for scalability and cost-effectiveness
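The windowed operations mentioned above can be sketched in plain Python (Spark-free, with hypothetical IoT sensor readings) to show the core idea behind tumbling-window aggregation over time-series data:

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds=60):
    """Group (timestamp, sensor_id, value) events into fixed-size
    windows and compute a per-sensor average for each window."""
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, sensor) -> [sum, count]
    for ts, sensor, value in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        acc = sums[(window_start, sensor)]
        acc[0] += value
        acc[1] += 1
    return {key: s / n for key, (s, n) in sums.items()}

# Hypothetical readings: (epoch seconds, device id, temperature)
events = [(0, "dev1", 20.0), (30, "dev1", 22.0), (65, "dev1", 30.0)]
print(tumbling_window_avg(events))
# {(0, 'dev1'): 21.0, (60, 'dev1'): 30.0}
```

In Spark Structured Streaming the same shape is expressed declaratively with `groupBy(window(...))`; the sketch only illustrates the bucketing logic.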

Data Warehouse Optimization

  • Designed and implemented a star schema data model for a large-scale data warehouse
  • Optimized query performance by creating appropriate indexes and partitioning strategies
  • Reduced query execution time by 60% through careful schema design and query tuning
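Why partitioning speeds up queries can be illustrated with a minimal pure-Python sketch (the table and date column are hypothetical, not the actual warehouse schema): a date-partitioned table lets a filtered query scan only the matching partition instead of every row.

```python
from collections import defaultdict

def partition_by(rows, key):
    """Bucket rows by a partition key, as a warehouse lays them out on disk."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return parts

def query_sales(parts, day):
    """Scan only the partition for `day`, not the full table (partition pruning)."""
    return sum(r["amount"] for r in parts.get(day, []))

rows = [
    {"day": "2024-01-01", "amount": 10},
    {"day": "2024-01-01", "amount": 5},
    {"day": "2024-01-02", "amount": 7},
]
parts = partition_by(rows, "day")
print(query_sales(parts, "2024-01-01"))  # 15, scanning 2 of 3 rows
```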

ETL Pipeline Automation

  • Built an automated ETL pipeline using Apache Airflow to process daily batches of data
  • Integrated multiple data sources and implemented data quality checks
  • Reduced manual intervention by 87% and improved data freshness
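The orchestration pattern behind such a pipeline can be sketched without Airflow itself: a tiny pure-Python DAG runner (task names and the quality rule are hypothetical) that executes extract, quality-check, and load steps in dependency order, mirroring how Airflow resolves upstream tasks.

```python
def run_dag(tasks, deps):
    """Run task callables in topological order given {task: [upstream, ...]}."""
    done, order = set(), []
    def visit(name):
        if name in done:
            return
        for up in deps.get(name, []):  # run all upstream tasks first
            visit(up)
        tasks[name]()
        done.add(name)
        order.append(name)
    for name in tasks:
        visit(name)
    return order

data = {}

def extract():
    data["rows"] = [{"id": 1, "value": 42}]  # stand-in for a source pull

def quality_check():
    # Fail fast if any value is null, before loading downstream
    assert all(r["value"] is not None for r in data["rows"]), "null values found"

def load():
    data["loaded"] = len(data["rows"])  # stand-in for a warehouse write

order = run_dag(
    {"extract": extract, "quality_check": quality_check, "load": load},
    {"quality_check": ["extract"], "load": ["quality_check"]},
)
print(order)  # ['extract', 'quality_check', 'load']
```

In Airflow the equivalent is a `DAG` with `extract >> quality_check >> load`; the sketch only shows the dependency-resolution idea.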

Stats 📈

Pinned Repositories

  1. lightning-containers (Jupyter Notebook): Docker-powered starter for geospatial analysis of lightning atmospheric data.

  2. lightning-streams (Python): Batch/stream ETL pipeline for the NOAA GLM dataset, using Python frameworks: Dagster, PySpark, and Parquet storage.

  3. airbyte_dbt_covid19 (Python): dbt transformations for a Snowflake data warehouse.

  4. dbt_hmda_data (Python): Trying out dbt Python models.

  5. snowflake-clusters (PLpgSQL): Snowflake cluster keys and "micro-partitioning" scheme.

  6. dagster_noaa_goes (Python): An example orchestration using Dagster.