A collection of my notes and coding exercises in the field of data science & computational methods

numersoz/analytics_journal

Analytics Journal

Collection of different topics in data science & computational methods and their implementation examples.

These serve as my notes on general analytics applications and are meant as a quick reference guide. In some cases, ChatGPT has been used to improve the quality of the code.

Virtual Environment

Dependencies are listed in the requirements.txt of each directory.

On Windows, they can be installed in a virtual environment with the commands below:

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Docker Containers

Below are useful multi-purpose Docker images.

The number of Spark workers can be adjusted with the --scale option:

cd docker_images\pyspark_hdfs_jupyter_server
docker-compose up --scale spark-worker=2

To List Containers Including Stopped Ones:

docker ps -a

To Start Container:

docker start {container_name}

Load Jupyter Lab:

http://localhost:8888/lab

Load Spark View:

http://localhost:8080/

Demo Notebook

Topics

Implementation of academic papers with relevant examples of applications.

Collection of regression/classification problems using tree based models such as:

  • Decision Tree
  • Random Forest
  • AdaBoost
  • Gradient Boost
  • Extreme Gradient Boost (XGBoost)
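As a rough illustration of the building block shared by the tree-based models listed above, here is a minimal sketch of a depth-1 regression tree (a "stump") on a single feature. It is not taken from the repository's notebooks; the function names `fit_stump` and `predict_stump` are hypothetical, and a real project would more likely use a library implementation.

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature.

    Tries every midpoint between consecutive distinct x values and keeps
    the threshold with the lowest total squared error.
    Returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between two equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Predict with a fitted stump: mean of the side that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean
```

Random forests and boosting methods combine many such splits, either by averaging trees fit on resampled data or by fitting each new tree to the errors of the previous ones.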

Solutions to Partial Differential Equations in Physics such as Heat Transfer and Fluid Dynamics using both numerical methods and Physics Informed Neural Networks.

These are modified versions of examples from the Udemy Class (https://www.udemy.com/course/physics-informed-neural-network-pinns/) by Dr. Mohammad Samara.

  • 1D Heat Equation Numerical Methods
  • 2D Burgers Equation Numerical Methods
  • 1D Burgers Equation PINN & PyTorch
  • 1D Heat Equation PINN & DeepXDE Library

Note: Code has been enhanced & documented with help of ChatGPT.

Contains several methods in functional data analysis such as splines, kernel smoothers and functional principal component analysis (FPCA).

  • B Spline Regression
  • Smoothing Spline Regression
  • Natural Cubic Spline Regression
  • Kernel Smoother Regression
  • Kernel Smoother Local Linear Regression
  • Kernel Smoother Local Polynomial Regression
  • ECG Heartbeat Categorization problem, implementing multiple classification algorithms with and without B-spline transformation to deal with high-dimensional data.

Examples are conversions of MATLAB/R-based lecture notes from Georgia Tech's ISYE 8803 High Dimensional Data Analytics class. In some parts, ChatGPT has been used for the conversion to Python as well as for generating documentation.
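As a small illustration of the kernel smoother idea from the list above, here is a minimal sketch of a Nadaraya-Watson estimator with a Gaussian kernel. It is not the lecture-note code; the function name `kernel_smooth` and the choice of a Gaussian kernel are assumptions made for this example.

```python
import math


def kernel_smooth(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate at x_query with a Gaussian kernel.

    Each training response is weighted by how close its x value is to
    x_query; bandwidth controls how quickly the weights fall off.
    """
    weights = [
        math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train
    ]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: the query is too far from the data.
        raise ValueError("query point too far from training data")
    return sum(w * y for w, y in zip(weights, y_train)) / total
```

A small bandwidth follows the data closely and gives a wiggly fit, while a large one averages over more points and gives a smoother curve. Local linear and local polynomial regression replace the weighted average with a weighted least-squares fit around each query point.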

Evaluation of different change point detection algorithms.

  • CUSUM (Cumulative Sum Control Chart)
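The CUSUM idea can be sketched in a few lines. The version below is a two-sided tabular CUSUM written for illustration only, not the repository's implementation; the function name `cusum_detect` and the parameter names `target_mean`, `slack`, and `threshold` are hypothetical.

```python
def cusum_detect(data, target_mean, slack, threshold):
    """Return the index of the first CUSUM alarm, or None if there is none.

    Two running sums accumulate deviations above and below target_mean.
    Deviations smaller than slack are ignored, and an alarm is raised
    when either sum exceeds threshold.
    """
    s_hi = 0.0  # accumulates upward shifts
    s_lo = 0.0  # accumulates downward shifts
    for t, x in enumerate(data):
        s_hi = max(0.0, s_hi + (x - target_mean - slack))
        s_lo = max(0.0, s_lo + (target_mean - x - slack))
        if s_hi > threshold or s_lo > threshold:
            return t
    return None


# The mean jumps from 0 to 2 at index 10; the alarm fires shortly after.
series = [0.0] * 10 + [2.0] * 10
print(cusum_detect(series, target_mean=0.0, slack=0.5, threshold=4.0))  # 12
```

A larger threshold reduces false alarms at the cost of a longer detection delay, and the slack value is commonly set to about half the size of the shift one wants to detect.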

Contains examples of different types of optimization problems and their solutions using different Python packages.

  • Linear Program
  • Mixed Integer Program
  • Mixed Integer Quadratic Program
  • Non Linear Program
  • Mixed Integer Non Linear Program

Install Optimization Solvers

The optimization section requires solvers to be installed. Popular solvers that are free to use are listed below:

GLPK (GNU Linear Programming Kit)

  • Download GLPK from SourceForge: https://sourceforge.net/projects/winglpk/
  • Unzip glpk-4.65 folder and place it in C:\glpk-4.65
  • Add C:\glpk-4.65\w64 to PATH if using x64
  • Test that it's working in CMD and get the executable path: where glpsol
  • Usage with Pyomo: solver = SolverFactory("glpk", executable=r"C:\glpk-4.65\w64\glpsol.exe")

IPOPT (Interior Point OPTimizer)

  • Download IPOPT from COIN-OR: https://www.coin-or.org/download/binary/Ipopt/Ipopt-3.11.1-win64-intel13.1.zip
  • Unzip Ipopt-3.11.1-win64-intel13.1 folder and place it in C:\Ipopt-3.11.1-win64-intel13.1
  • Add C:\Ipopt-3.11.1-win64-intel13.1\bin to PATH
  • Test that it's working in CMD and get the executable path: where ipopt
  • Usage with Pyomo: solver = SolverFactory("ipopt", executable=r"C:\Ipopt-3.11.1-win64-intel13.1\bin\ipopt.exe")

CPLEX (IBM ILOG CPLEX Optimizer)

  • Download and install IBM CPLEX Optimization Studio: https://www.ibm.com/account/reg/signup?formid=urx-20028
  • Add C:\Program Files\IBM\ILOG\CPLEX_Studio_Community2211\cplex\bin\x64_win64 to PATH
  • Test that it's working in CMD and get the executable path: where cplex
  • Usage with Pyomo: solver = SolverFactory("cplex", executable=r"C:\Program Files\IBM\ILOG\CPLEX_Studio_Community2211\cplex\bin\x64_win64\cplex.exe")
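The "where" checks for GLPK, IPOPT, and CPLEX can also be done from Python. The sketch below uses the standard library's shutil.which to look up each executable on PATH; the helper name `find_solvers` and the `SOLVER_EXECUTABLES` mapping are hypothetical names introduced for this example, with the executable names taken from the steps above.

```python
import shutil

# Pyomo solver name -> executable name, as used in the "where" checks above.
SOLVER_EXECUTABLES = {
    "glpk": "glpsol",
    "ipopt": "ipopt",
    "cplex": "cplex",
}


def find_solvers(executables=SOLVER_EXECUTABLES):
    """Map each solver name to its full executable path, or None if not on PATH."""
    return {name: shutil.which(exe) for name, exe in executables.items()}


if __name__ == "__main__":
    for name, path in find_solvers().items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

A path found this way can be passed as the executable argument of SolverFactory, in the same way as the hard-coded paths shown above.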

CBC (COIN-OR Branch and Cut)

SCIP
