  • Madison, WI, USA.

Stochastic1017/README.md

Welcome to my GitHub page!

MS in Statistics | BS in Mathematics and Statistics

Project Portfolio

Here are some of my notable projects:

  • Clustering Spotify Podcasts with NLP-Driven Insights

    • Scraped details for $\approx$ 284,481 episodes from 818 podcasts using a Selenium and Spotify API pipeline.
    • Preprocessed and tokenized podcast descriptions with NLTK, including lemmatization and stopword removal.
    • Developed metrics to quantify directional, overlap, and diversity similarity, and engineered a recommendation system based on them.
    • Deployed a Dash app for podcast clustering and personalized recommendations.
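
The similarity metrics above are not defined in this README. As a minimal sketch only, assuming that "directional" similarity means cosine similarity on token counts and "overlap" similarity means Jaccard similarity on token sets, the idea looks roughly like this. The `STOPWORDS` set and all function names are illustrative, and lemmatization is skipped to keep the example dependency-free (the project itself uses NLTK).

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); the project uses NLTK's stopwords.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "with", "is"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (no lemmatization here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(tokens_a, tokens_b):
    """Shared vocabulary divided by combined vocabulary."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Cosine similarity ignores description length and compares only the direction of the count vectors, while Jaccard similarity looks purely at shared vocabulary, so the two can disagree for descriptions of very different lengths.
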
  • Predicting Flight Delays and Cancellations: An Integrated Analysis of Airport Data and Weather Data

    • Automated scraping of 23 GB of airport data and 30 GB of weather data using Selenium.
    • Joined the datasets using reverse geocoding, Haversine-based spatial alignment, and UTC-normalized time alignment.
    • Trained random forest models, achieving $\approx$ 25 min test RMSE for delays and $\approx$ 98% test accuracy for cancellations.
    • Developed scalable workflows on Google Cloud and deployed an interactive web app using Dash.
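
A minimal sketch of the Haversine-based and UTC-normalized alignment mentioned above, assuming each airport is matched to its nearest weather station by great-circle distance. The function names, the station dictionary layout, and the fixed UTC offset are illustrative assumptions; real timestamps would need DST-aware time zones.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_station(airport_lat, airport_lon, stations):
    """Return the station dict (keys 'lat', 'lon') closest to the airport."""
    return min(
        stations,
        key=lambda s: haversine_km(airport_lat, airport_lon, s["lat"], s["lon"]),
    )


def to_utc(local_naive, utc_offset_hours):
    """Attach a fixed UTC offset to a naive local timestamp and convert to UTC."""
    local = local_naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)
```

Once both tables carry a station identifier and a UTC timestamp, they can be joined on those two keys.
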
  • Modeling and Forecasting Walmart Stock Prices: A Comparative Analysis of ARMA and GARCH Approaches

    • Leveraged ARIMA and GARCH models using tseries and fGarch in R to analyze Walmart stock price volatility.
    • Developed and validated ensemble models through residual diagnostics and forecast evaluation.
    • Achieved RMSE of $\approx$ 0.01 for log-returns and $\approx$ \$1.17 for closing prices when comparing 10-day forecasts against unseen actual prices.
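
The two RMSE figures above are on different scales: log-returns are unitless, while closing prices are in dollars. A small Python sketch of how such an evaluation can be computed is shown here (the project itself is in R, and the helper names are illustrative, not the project's code).

```python
import math


def log_returns(prices):
    """Log-return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def prices_from_log_returns(last_price, forecast_returns):
    """Rebuild a price path by compounding forecast log-returns from the last known price."""
    path, price = [], last_price
    for r in forecast_returns:
        price *= math.exp(r)
        path.append(price)
    return path


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Forecasting on log-returns and then compounding back to prices lets the same forecast be scored on both scales.
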
  • Statistical Modeling and Deployment of Body Fat Percentage Prediction System

    • Implemented an anomaly detection and imputation strategy using a prior body fat estimation model.
    • Constructed a stepwise regression model, using goodness-of-fit and Holm-Bonferroni-corrected F-tests to control Type I errors.
    • Developed a multiple linear regression model (R-squared 0.6592, RMSE 4.38) with residual diagnostics.
    • Deployed an interactive Dash app with comprehensive explanations, detailed visuals, and predictions.
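
For reference, the Holm-Bonferroni step-down procedure mentioned above can be sketched as follows. This is the generic textbook procedure applied to a list of p-values, not the project's own code, and the function name is illustrative.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    P-values are visited from smallest to largest; the k-th smallest
    (k = 0, 1, ...) is compared against alpha / (m - k). The procedure
    stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also not rejected
    return reject
```

With p-values `[0.01, 0.04, 0.03, 0.005]` and `alpha=0.05`, the thresholds are 0.0125, 0.0167, and 0.025 in turn, so only the tests with p-values 0.005 and 0.01 are rejected.
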
  • Outlier Detection and De-noising for Audio-Based Neural Network Language Classification

    • Developed an ensemble outlier detector using Isolation Forest and Local Outlier Factor in Scikit-Learn.
    • Designed a speech detection and spectral gating algorithm with Librosa and NoiseReduce for noise suppression.
    • Processed 1320 parallel jobs for anomaly detection and de-noising using Linux Bash and HTCondor.
    • Trained a preliminary CNN on a random sample in TensorFlow, improving test accuracy by 2% and AUC by 4%.
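
A minimal sketch of an Isolation Forest plus Local Outlier Factor ensemble in Scikit-Learn. The combination rule (flag a sample only when both detectors agree), the `contamination` value, and the synthetic features are assumptions for illustration; the README does not state how the project combines the two detectors or which audio features it uses.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def ensemble_outliers(features, contamination=0.02, random_state=0):
    """Boolean mask of samples flagged as outliers by BOTH detectors (assumed rule)."""
    iso = IsolationForest(contamination=contamination, random_state=random_state)
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    iso_flag = iso.fit_predict(features) == -1  # -1 marks an outlier
    lof_flag = lof.fit_predict(features) == -1
    return iso_flag & lof_flag


# Synthetic stand-in for per-file audio features: 200 inliers plus one far-away point.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(size=(200, 2)), [[10.0, 10.0]]])
flags = ensemble_outliers(features)
```

Requiring agreement trades recall for precision: Isolation Forest is isolation-based and LOF is density-based, so a sample flagged by both is less likely to be a false alarm.
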
  • Finding Lyman Break Galaxy cB58 Resemblances Using High-Throughput Computing

    • Identified noisy spectra matching Lyman Break Galaxy cB58 from Sloan Digital Sky Survey (SDSS) datasets.
    • Computed similarity between spectra using distance metrics implemented in R.
    • Processed 2459 parallel jobs over 281 GB of data using Linux Bash and HTCondor.
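
The distance computation above is implemented in R in the project. As a hedged Python sketch of the general idea, assuming spectra have already been resampled to the same wavelength grid and that Euclidean distance on standardized flux is an acceptable metric (both are assumptions, and the names below are illustrative):

```python
import math


def standardize(flux):
    """Shift to zero mean and scale to unit variance so overall brightness does not matter."""
    mean = sum(flux) / len(flux)
    sd = math.sqrt(sum((x - mean) ** 2 for x in flux) / len(flux))
    if sd == 0:
        raise ValueError("constant spectrum cannot be standardized")
    return [(x - mean) / sd for x in flux]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(template, candidates):
    """Return candidate names ordered from most to least similar to the template."""
    t = standardize(template)
    scored = [(euclidean(t, standardize(flux)), name) for name, flux in candidates.items()]
    return [name for _, name in sorted(scored)]
```

Because each spectrum is scored independently against the template, the work splits naturally into many small jobs, which is what makes it a good fit for high-throughput computing.
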
  • Supervised Machine Learning Model to Predict Gender Based on First Names

    • Performed feature pre-processing and one-hot alphabet encoding using NumPy and Pandas.
    • Implemented an ensemble gradient boosting model with fine-tuned hyper-parameters using Scikit-Learn.
    • Achieved approximately 80% accuracy, precision, recall, and AUC.
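
A minimal sketch of one-hot alphabet encoding for first names, written in plain Python for clarity (the project uses NumPy and Pandas). The fixed maximum length of 10 characters and the function name are assumptions for illustration.

```python
import string

ALPHABET = string.ascii_lowercase
MAX_LEN = 10  # assumed cap on name length; longer names are truncated


def one_hot_name(name, max_len=MAX_LEN):
    """Encode a name as a flat 0/1 vector with one 26-slot block per character position."""
    vec = [0] * (max_len * len(ALPHABET))
    letters = [c for c in name.lower() if c in ALPHABET][:max_len]
    for pos, ch in enumerate(letters):
        vec[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1
    return vec
```

Names shorter than `max_len` leave the trailing blocks all zero, so every name maps to a vector of the same length that a gradient boosting model can consume directly.
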

Pinned

  1. Spotify-Podcast-Clustering

    This project constructs informative metrics for Spotify’s podcast data and creates a novel recommendation system based on these metrics.

    Python 1

  2. Airport-Weather-Prediction

    The holiday season (November to January of each calendar year) is one of the busiest times for the airline industry. This project aims to recognize important patterns in flight delays and cancellat…

    Python

  3. Body-Fat-Study

    Accurate measurement of body fat is inconvenient/costly and it is desirable to have easy methods of estimating body fat that are not inconvenient/costly. In this project, we come up with a simple, …

    Python

  4. Speech-Enhancement_De-Noising

    Implement isolation-based and density-based unsupervised anomaly detection methods to attempt to find noisy audio files and clean them using spectral gating. The purpose of this cleaning is to deve…

    Jupyter Notebook 1

  5. Identifying-cB58-Lyman-Break-Twins

    The project demonstrates the use of UW-Madison's High-Throughput Computing Clusters to run many parallel jobs and find the galaxy spectra most closely resembling CB58 Lyman-Bre…

    Shell

  6. Predicting-Gender

    Supervised machine learning techniques to predict gender using first names. (1.) K-Nearest Neighbors, (2.) Logistic Regression, (3.) Decision Trees, (4.) Gradient Boosting, (5.) Multi-Layered Perce…

    Jupyter Notebook 1