ML-Toolbox

About

Machine learning is closely related to statistics and optimization. Each ML algorithm makes certain assumptions about the data, has a theory behind it, and comes with its own advantages and disadvantages. In this repository, I aim to understand the intuition behind each algorithm, implement it from scratch, and derive the proofs and theory behind it. Knowing the theory behind an algorithm and the characteristics of the data makes it easier to choose a suitable model and get better results.

Introduction to Machine Learning

Traditional Programming vs Machine Learning

Traditional programming is based on writing a program, giving it an input, and receiving an output. This works well for tasks whose rules can be clearly defined. For example, classifying a number as odd or even takes only a simple if-else program.
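
As a minimal sketch of such a hand-written rule (the function name `classify_parity` is only illustrative and does not come from this repository):

```python
def classify_parity(n: int) -> str:
    """Hand-written rule: a number is even when dividing by 2 leaves no remainder."""
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


# Program + input -> output
print(classify_parity(7))
print(classify_parity(10))
```

The rule is written entirely by the programmer; no data is needed to discover it.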

ML

For problems where the rules cannot be clearly defined, we use Machine Learning to generate the rules for us. Consider classifying an image as a cat or a dog: writing the rules by hand would be very difficult. In Machine Learning, we instead provide the computer with data and the corresponding outputs, and it produces the program. This phase is called training. We then run this program on new data, just as in traditional programming, to get an output. This phase is called inference.
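
The two phases can be sketched with a toy nearest-centroid classifier. This is only an illustration: the features (weight and ear length), the sample values, and the function names `train` and `infer` are made up for this example, and real image classification needs far richer inputs.

```python
from collections import defaultdict
from math import dist


def train(samples, labels):
    """Training: data + known outputs -> a 'program' (here, one centroid per label)."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    # Average each feature column to get the centroid of every class.
    return {
        label: tuple(sum(col) / len(points) for col in zip(*points))
        for label, points in grouped.items()
    }


def infer(model, x):
    """Inference: learned program + new data -> output (label of the closest centroid)."""
    return min(model, key=lambda label: dist(model[label], x))


# Made-up features: (weight in kg, ear length in cm)
samples = [(4.0, 6.5), (3.5, 6.0), (5.0, 7.0), (20.0, 10.0), (30.0, 12.0), (25.0, 11.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

model = train(samples, labels)      # training phase
print(infer(model, (4.2, 6.4)))     # inference on unseen data
print(infer(model, (22.0, 10.5)))
```

Here the "program" returned by `train` is just a dictionary of centroids; other algorithms produce other kinds of programs, such as trees or weight vectors.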

ML

Core Idea behind Machine Learning

Machine Learning is a subset of Artificial Intelligence (AI). While AI aims to imitate human thinking, Machine Learning focuses on using statistics to uncover patterns in data. For instance, in games like chess, a classical AI approach uses search strategies such as minimax, similar to how humans strategize. Machine Learning methods such as Linear Regression instead aim to draw the best-fitting line through data points, relying on statistics and pattern recognition rather than directly mimicking human thought processes.
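
To make "best-fitting line" concrete, here is a small sketch of simple linear regression using the closed-form least-squares solution for one input variable. The data points are invented for illustration, and this is not necessarily how the Linear Regression folder in this repository implements it.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that roughly follow y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.2f} * x + {intercept:.2f}")
```

The fitted slope and intercept are the pattern the method extracts from the data; nothing in the code encodes how a human would reason about the problem.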

ml-idea

The main goal of ML is to use observations (data) to find a function f(x) or a probability distribution P(x, y) that closely approximates the true relationship between inputs and outputs in the real world. Unlike traditional programming, where functions are defined by hand, machine learning algorithms learn from data to derive a suitable function or model for a given task.

Introduction to ML-Toolbox

The ML-Toolbox is like a toolkit full of different machine learning methods, each offering its own form of f(x). The trick is picking the right one for the job, much like choosing a setting on a tool: it depends on what we are trying to do. Neural networks are popular, but they are just one tool in the box, one whose learned f(x) takes the form of weights and biases.

The core concept behind the ML-Toolbox is to grasp the diverse range of algorithms capable of generating forms of f(x). Some widely used algorithms include Decision Trees, Neural Networks, Support Vector Machines, Random Forests, and K-Nearest Neighbors. The goal isn't to determine which method is the best; instead, it's about knowing when each method works well and when it might struggle. It's analogous to knowing when to use a screwdriver instead of a hammer.
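
As a rough illustration of "works well" versus "might struggle", the sketch below compares a perceptron with a 1-nearest-neighbor classifier on two toy problems, AND (linearly separable) and XOR (not linearly separable). The function names and training settings are assumptions made for this example, not code from the repository folders.

```python
def train_perceptron(points, labels, epochs=100):
    """Classic perceptron update rule with a learning rate of 1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred  # 0 when correct, +1 or -1 when wrong
            w[0] += err * x1
            w[1] += err * x2
            b += err
    return w, b


def perceptron_predict(model, p):
    w, b = model
    return 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else 0


def nearest_neighbor_predict(points, labels, p):
    """1-NN: return the label of the closest stored training point."""
    distances = [(x1 - p[0]) ** 2 + (x2 - p[1]) ** 2 for x1, x2 in points]
    return labels[distances.index(min(distances))]


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]  # linearly separable
xor_labels = [0, 1, 1, 0]  # not linearly separable

and_model = train_perceptron(points, and_labels)
xor_model = train_perceptron(points, xor_labels)

perceptron_and = accuracy([perceptron_predict(and_model, p) for p in points], and_labels)
perceptron_xor = accuracy([perceptron_predict(xor_model, p) for p in points], xor_labels)
knn_xor = accuracy([nearest_neighbor_predict(points, xor_labels, p) for p in points], xor_labels)

print("perceptron on AND:", perceptron_and)
print("perceptron on XOR:", perceptron_xor)
print("1-NN on XOR (training points):", knn_xor)
```

A single linear boundary can separate AND but can classify at most three of the four XOR points correctly, while 1-NN simply memorizes the training points. Neither result says one method is better in general; 1-NN has its own weaknesses, for example on noisy or high-dimensional data.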

File Structure

👨‍💻ML-Toolbox
 ┣ 📂assets                                   
 ┃ ┣ 📂data                                 // datasets 
 ┃ ┃ ┣ 📄articles.csv
 ┃ ┃ ┣ 📄gender.csv
 ┃ ┃ ┣ 📄modified_mumbai_house_price.csv
 ┃ ┃ ┣ 📄mumbai_house_price.csv
 ┃ ┃ ┣ 📄student_marksheet.csv
 ┃ ┃ ┣ 📄titanic.csv
 ┃ ┃ ┣ 📄un_voting.csv 
 ┃ ┣ 📂img 
 ┃ ┣ 📂scripts    
 ┃ ┣ 📂notes              
 ┣ 📂Concept Learning                  
 ┣ 📂K Nearest Neighbors                    
 ┣ 📂Perceptron                             
 ┣ 📂Naive Bayes
 ┣ 📂Logistic Regression
 ┣ 📂Linear Regression     
 ┣ 📂Support Vector Machine
 ┣ 📂Kernels
 ┃ ┣ 📂Perceptron
 ┃ ┣ 📂Linear Regression
 ┃ ┣ 📂Support Vector Machine
 ┣ 📂Decision Trees     
 ┣ 📂Neural Networks     
 ┣ 📂K Means Clustering      
 ┣ 📄README.md

models-vs-tasks

Results

Pending Sections

The following sections are still in progress:

  • Kernels
  • Neural Networks
  • Association Rule Mining
  • KD Trees
  • Gaussian Processes
  • Bagging
  • Boosting

References

License

MIT License