I have more than seven years of experience in data analysis, marketing, sales and finance. In 2022, I started learning Python and dove into data science and machine learning. I earned professional certificates from Codecademy and DataCamp. This is a compilation of some of my projects in data analysis, data science and data visualization. Many originated from courses and competitions I took part in, others from hobbies and personal interests.
I'm happy to get in touch on LinkedIn.
The goal of the project is to build a model that correctly predicts popular recipes 80% of the time and minimizes the chance of showing unpopular recipes. A logistic regression model tuned with GridSearchCV achieves this goal and performs better than a comparison random forest model.
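A minimal sketch of tuning a logistic regression with GridSearchCV, not the project's actual code. It assumes that precision (the share of recipes predicted as popular that really are popular) is the metric behind the 80% target, and it uses synthetic data from `make_classification` in place of the recipe dataset. The parameter grid is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the recipe data (assumption: 1 = popular, 0 = unpopular).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Search over the regularization strength C, scoring each candidate by precision.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    scoring="precision",
    cv=5,
)
search.fit(X_train, y_train)

# Precision on held-out data: of the recipes predicted popular, how many are.
test_precision = precision_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_precision, 3))
```

Scoring by precision rather than accuracy puts the weight on avoiding false positives, which matches the aim of not showing unpopular recipes.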
I use statistics and data visualization to find the characteristics of highly rated chocolate bars. Visualizations with matplotlib, Plotly Express and WordCloud help analyze numeric data, geospatial data and reviews. A bootstrap analysis gives significant empirical evidence that the mean rating of chocolate bars without lecithin is higher than that of bars with lecithin.
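The bootstrap comparison can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the ratings are made up, and the helper names `bootstrap_mean_diff` and `percentile_interval` are hypothetical. If the percentile interval for the difference in means lies entirely above zero, that supports a higher mean rating for the first group.

```python
import random
from statistics import mean

def bootstrap_mean_diff(a, b, n_resamples=5000, seed=0):
    """Return bootstrap replicates of mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    return diffs

def percentile_interval(values, lower=0.025, upper=0.975):
    """Return a simple percentile interval from a list of replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Made-up ratings for illustration only.
without_lecithin = [3.5, 3.25, 3.75, 3.5, 4.0, 3.0, 3.5, 3.75]
with_lecithin = [3.0, 3.25, 2.75, 3.5, 3.0, 3.25, 2.5, 3.0]

diffs = bootstrap_mean_diff(without_lecithin, with_lecithin)
low, high = percentile_interval(diffs)
print(f"95% interval for the difference in means: ({low:.2f}, {high:.2f})")
```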
I build different machine learning models to predict cancellations at a hotel. The random forest and decision tree models perform similarly and have higher accuracy than the logistic regression model. The simple neural network has good accuracy, but I recommend focusing on the tree models since they perform equally well and are easier to interpret.
I analyze data from USDA's FoodData Central using data visualization, descriptive statistics, correlation and regression. I build a linear regression model to predict the calories in food items. I use a scatterplot and calculate the Pearson correlation coefficient to show that higher water content is correlated with fewer calories.
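For reference, the Pearson correlation coefficient can be computed by hand in a few lines. The sketch below uses illustrative numbers that are not taken from FoodData Central, and `pearson_r` is a hypothetical helper name. A value close to -1 indicates a strong negative linear relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squares (denominator).
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(ss_x * ss_y)

# Illustrative values only (grams of water and kcal per 100 g).
water_g = [90, 75, 60, 40, 10]
calories_kcal = [30, 110, 190, 300, 480]

r = pearson_r(water_g, calories_kcal)
print(f"Pearson r = {r:.3f}")
```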
The data is imbalanced, skewed and contains many missing values, so I use median imputation and stratified sampling. I train various classification models and apply GridSearchCV to fine-tune the parameters of the model that performs best. I use the final model to predict the risk of diabetes for a person with specific traits.
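The two preprocessing ideas, median imputation and stratified sampling, can be illustrated in plain Python. This is a sketch with hypothetical values and helper names (`impute_median`, `stratified_split`); the project itself may well use pandas or scikit-learn for these steps.

```python
import random
from collections import defaultdict
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical measurements with missing entries and imbalanced labels.
glucose = [85, None, 140, 99, None, 183, 110, 92]
labels = [0, 0, 1, 0, 0, 1, 0, 0]

glucose_filled = impute_median(glucose)
train_idx, test_idx = stratified_split(labels, test_fraction=0.5)
print(glucose_filled, train_idx, test_idx)
```

The median is less sensitive to skew than the mean, and sampling per class keeps the rare class represented in both splits. In a real pipeline the median would normally be computed on the training split only, to avoid leaking information from the test data.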
I experiment with turtle and matplotlib to draw the flowers and use colorsys to achieve colorful hues. Users can choose a variety of parameters for the flowers, such as color and number of petals. This project presents some of the most successful functions that resulted from this experiment.
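As an example of how colorsys can produce a range of hues, the sketch below spreads one color per petal evenly around the HSV color wheel. The function name `petal_colors` and its parameters are hypothetical and not taken from the project.

```python
import colorsys

def petal_colors(n_petals, saturation=1.0, value=1.0):
    """Return n_petals RGB tuples with hues spread evenly around the color wheel."""
    return [
        colorsys.hsv_to_rgb(i / n_petals, saturation, value)
        for i in range(n_petals)
    ]

colors = petal_colors(6)
for rgb in colors:
    print(tuple(round(channel, 2) for channel in rgb))
```

Each tuple holds red, green and blue values between 0 and 1, a format that matplotlib accepts as a color and that turtle accepts in its default color mode.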
- Data Scientist: Collect, analyze and interpret large amounts of data using machine learning and AI; apply sampling methods and carry out statistical tests
- Data Analyst: Use Python and SQL to answer business-critical questions, visualize data and communicate results from data analysis
- Data Scientist, Analytics: Use Python and SQL to query, analyze, and visualize data
- Machine Learning/AI Engineer: Use Python, Git and ML to build predictive models, end-to-end applications and neural networks
- Business Intelligence: Use Python, SQL and Tableau to analyze and visualize data
Note: Photographs are from Unsplash and logos belong to the respective companies. The rest is personal work.