Shrivats Sudhir Stochastic1017

Welcome to my GitHub page!

MS in Statistics | BS in Mathematics and Statistics

GitHub Stats

Project Portfolio

Here are some of my notable projects:

Clustering Spotify Podcasts with NLP-Driven Insights
- Scraped details for $\approx$ 284,481 episodes from 818 podcasts using a Selenium and Spotify API pipeline.
- Preprocessed and tokenized podcast descriptions with NLTK, including lemmatization and stopword removal.
- Developed metrics to quantify directional, overlap, and diversity similarities, and engineered a recommendation system.
- Deployed a Dash app for podcast clustering and personalized recommendations.
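
The similarity metrics above are not defined in this README. As a rough sketch only, assuming "directional" similarity means cosine similarity on token counts and "overlap" similarity means the Jaccard index on token sets (both are assumptions, and the function names are hypothetical), they could look like this:

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description has no direction
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard index: shared vocabulary divided by combined vocabulary."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)
```

Both functions expect descriptions that are already tokenized and lemmatized, so they would sit downstream of the NLTK preprocessing step.
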
Predicting Flight Delays and Cancellations: An Integrated Analysis of Airport Data and Weather Data
- Automated scraping of 23 GB of airport data and 30 GB of weather data using Selenium.
- Joined the datasets using reverse geocoding, Haversine-distance matching, and UTC-normalized timestamp alignment.
- Trained random forest models, achieving $\approx$ 25 min test RMSE for delays and $\approx$ 98% test accuracy for cancellations.
- Developed scalable workflows on Google Cloud and deployed an interactive web app using Dash.
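
For readers unfamiliar with Haversine-based matching, here is a minimal sketch of pairing an airport with its nearest weather station by great-circle distance. It is not the project's code: the function names, the station dictionary layout, and the coordinates in the usage note are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(airport, stations):
    """Return the name of the station closest to `airport`.

    airport:  (lat, lon) tuple in degrees
    stations: dict mapping station name -> (lat, lon) tuple in degrees
    """
    return min(stations, key=lambda name: haversine_km(*airport, *stations[name]))
```

With made-up coordinates, `nearest_station((43.14, -89.34), {"A": (43.0, -89.4), "B": (41.9, -87.6)})` picks station `"A"`. A linear scan like this is fine for a few hundred airports; a spatial index would be the usual choice at larger scale.
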
Modeling and Forecasting Walmart Stock Prices: A Comparative Analysis of ARMA and GARCH Approaches
- Fitted ARIMA and GARCH models with the tseries and fGarch packages in R to analyze Walmart stock price volatility.
- Developed and validated ensemble models through residual diagnostics and forecast evaluation.
- Achieved an RMSE of $\approx$ 0.01 for log-returns and $\approx$ 1.17 USD for closing prices, comparing 10-day forecasts against unseen actual prices.
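
The project itself was done in R. Purely as an illustration of the two quantities reported above, this Python sketch computes log-returns from a price series and the RMSE between forecast and actual values. The helper names are hypothetical, and no project data is reproduced here.

```python
import math


def log_returns(prices):
    """Log-returns r_t = log(P_t / P_{t-1}) for a sequence of positive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

RMSE on log-returns is unitless, while RMSE on closing prices is in the price's currency, which is why the two figures are not directly comparable.
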
Statistical Modeling and Deployment of Body Fat Percentage Prediction System
- Implemented an anomaly detection and imputation strategy using a prior body fat estimation model.
- Constructed a stepwise regression model with goodness-of-fit and Holm-Bonferroni-corrected F-tests to control Type I errors.
- Developed a multiple linear regression model ($R^2$ = 0.6592, RMSE = 4.38) with residual diagnostics.
- Deployed an interactive Dash app with comprehensive explanations, detailed visuals, and predictions.
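
As background on the Holm-Bonferroni correction mentioned above, here is a small sketch of the standard step-down procedure applied to a list of p-values. It is a generic illustration, not the project's implementation, and the p-values in the usage note are invented.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-down rule: sort p-values ascending, compare the k-th smallest
    (k = 0, 1, ...) to alpha / (m - k), and stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also not rejected
    return reject
```

For the invented p-values `[0.01, 0.04, 0.03, 0.005]` at `alpha=0.05`, the thresholds are 0.0125, 0.0167, 0.025, and 0.05, so only the first and last hypotheses are rejected.
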
Outlier Detection and De-noising for Audio-Based Neural Network Language Classification
- Developed an ensemble outlier detector using Isolation Forest and Local Outlier Factor in Scikit-Learn.
- Designed a speech detection and spectral gating algorithm with Librosa and NoiseReduce for noise suppression.
- Processed 1320 parallel jobs for anomaly detection and de-noising using Linux Bash and HTCondor.
- Trained a preliminary CNN on a random sample in TensorFlow, improving test accuracy by 2% and AUC by 4%.
Finding Lyman Break Galaxy cB58 Resemblances Using High-Throughput Computing
- Identified noisy spectra matching Lyman Break Galaxy cB58 from Sloan Digital Sky Survey (SDSS) datasets.
- Computed similarity between spectra using distance metrics implemented in R.
- Processed 2459 parallel jobs over 281 GB data using Linux Bash and HTCondor.
Supervised Machine Learning Model to Predict Gender Based on First Names
- Performed feature pre-processing and one-hot alphabet encoding using NumPy and Pandas.
- Implemented an ensemble gradient boosting model with fine-tuned hyper-parameters using Scikit-Learn.
- Achieved approximately 80% on accuracy, precision, recall, and AUC.
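
The exact encoding scheme is not described here. One common reading of "one-hot alphabet encoding" is a position-by-letter indicator matrix, sketched below with NumPy. The 10-character cap, the lowercase ASCII alphabet, and the function name are all assumptions made for illustration.

```python
import string

import numpy as np

ALPHABET = string.ascii_lowercase  # assumed alphabet: a-z only
MAX_LEN = 10                       # assumed cap on name length


def encode_name(name, max_len=MAX_LEN):
    """Encode a first name as a flat one-hot vector of length max_len * 26.

    Row i of the underlying matrix marks which letter appears at position i.
    Non-alphabetic characters are dropped and long names are truncated.
    """
    matrix = np.zeros((max_len, len(ALPHABET)), dtype=np.uint8)
    letters = [ch for ch in name.lower() if ch in ALPHABET][:max_len]
    for position, ch in enumerate(letters):
        matrix[position, ALPHABET.index(ch)] = 1
    return matrix.ravel()
```

Stacking these vectors with `np.vstack` gives a feature matrix that a Scikit-Learn gradient boosting classifier can consume directly.
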

Technical Skills

Programming Languages

Developer Tools

Computing

Miscellaneous
