Define a multivariate median for datasets where the median is not explicitly known.
This repository contains the results of a project focused on defining a multivariate median for datasets where the median is not explicitly known. The methodology leverages optimal transport in discrete and semi-discrete settings: a known reference distribution (e.g., the spherical uniform distribution) is transported to the target distribution (e.g., the ANSUR II dataset). The results include detailed visualizations of the transport processes, quantile contours, and algorithmic convergence.
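As an illustration of the discrete setting, here is a minimal sketch that assumes the center-outward construction: a regular 2D grid approximating the spherical uniform distribution (the origin plus points on concentric circles) is matched to the same number of data points by solving the optimal assignment for the squared Euclidean cost, and the median is taken to be the data point matched to the origin. The helper names `reference_grid` and `ot_median` are hypothetical, and this is not necessarily the procedure used in the repository's notebooks; it only relies on SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def reference_grid(n_radii: int, n_directions: int) -> np.ndarray:
    """Hypothetical 2D discretisation of the spherical uniform distribution:
    the origin (row 0) plus n_radii rings of n_directions equally spaced points."""
    radii = np.arange(1, n_radii + 1) / (n_radii + 1)  # equally spaced radii in (0, 1)
    angles = 2 * np.pi * np.arange(n_directions) / n_directions
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    # Ring-major order: rows 1..n_directions form the innermost ring, and so on.
    rings = (radii[:, None, None] * directions[None, :, :]).reshape(-1, 2)
    return np.vstack([np.zeros((1, 2)), rings])


def ot_median(data: np.ndarray, n_radii: int, n_directions: int) -> np.ndarray:
    """Data point matched to the origin by the discrete optimal transport plan."""
    grid = reference_grid(n_radii, n_directions)
    if data.shape != grid.shape:
        raise ValueError("data must have shape (n_radii * n_directions + 1, 2)")
    # cost[i, j] = squared Euclidean distance between grid point i and data point j.
    cost = ((grid[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    # For a square cost matrix the returned row indices are 0..n-1 in order,
    # so cols[0] is the index of the data point assigned to the origin.
    _, cols = linear_sum_assignment(cost)
    return data[cols[0]]


# Usage on synthetic data: 5 rings of 8 directions require 5 * 8 + 1 = 41 points.
rng = np.random.default_rng(0)
sample = rng.normal(size=(5 * 8 + 1, 2))
print(ot_median(sample, n_radii=5, n_directions=8))
```

Between two uniform discrete measures of the same size, optimal transport reduces to an assignment problem, which is why a permutation solver is enough here. The grid must have exactly as many points as the sample, so in practice the sample size and the grid parameters have to be chosen together.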
The main goals of this project are:
- Define a robust multivariate median for datasets lacking a direct median representation.
- Use optimal transport theory to compute this median in both the discrete and semi-discrete settings.
- Analyze and visualize the resulting transports and their implications.
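For the semi-discrete case, where the reference law is continuous and the data are discrete, one common approach (assumed here, not necessarily the one used in the notebooks) is to estimate a dual potential ψ by averaged stochastic gradient ascent: draw x from the spherical uniform law, find its Laguerre cell, i.e. the index j minimising ‖x − y_j‖² − ψ_j, and nudge ψ toward the target weights. The estimated transport map sends x to the data point of its cell, and the sketch reads off the median as the image of the origin. The function names, the uniform weights 1/n, and the step-size schedule `step / sqrt(t)` are hypothetical choices.

```python
import numpy as np


def sample_spherical_uniform(rng: np.random.Generator, size: int, dim: int = 2) -> np.ndarray:
    """Spherical uniform law: a uniform direction times a radius uniform on [0, 1]."""
    directions = rng.normal(size=(size, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(size=(size, 1))
    return radii * directions


def laguerre_index(x: np.ndarray, data: np.ndarray, psi: np.ndarray) -> int:
    """Index j minimising |x - y_j|^2 - psi_j, i.e. the Laguerre cell containing x."""
    return int(np.argmin(((data - x) ** 2).sum(axis=1) - psi))


def semi_discrete_potential(data: np.ndarray, n_iter: int = 20_000,
                            step: float = 1.0, seed: int = 0) -> np.ndarray:
    """Averaged stochastic gradient ascent on the semi-discrete dual,
    assuming uniform weights 1/n on the data points."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    nu = np.full(n, 1.0 / n)  # target weights
    psi = np.zeros(n)         # current dual potential
    psi_avg = np.zeros(n)     # running average of the iterates
    for t, x in enumerate(sample_spherical_uniform(rng, n_iter, dim), start=1):
        grad = nu.copy()
        # Unbiased estimate of (target weights - current cell masses).
        grad[laguerre_index(x, data, psi)] -= 1.0
        psi += step / np.sqrt(t) * grad
        psi_avg += (psi - psi_avg) / t
    return psi_avg


def semi_discrete_median(data: np.ndarray, psi: np.ndarray) -> np.ndarray:
    """Image of the origin under the estimated transport map."""
    return data[laguerre_index(np.zeros(data.shape[1]), data, psi)]


# Usage on synthetic data.
rng = np.random.default_rng(1)
sample = rng.normal(size=(200, 2))
psi = semi_discrete_potential(sample)
print(semi_discrete_median(sample, psi))
```

Increasing ψ_j enlarges cell j, so the update grows cells that currently receive less mass than their weight and shrinks the others; averaging the iterates smooths out the noise of single-sample gradients.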
This work was carried out during a two-month internship (June–July 2024) in the Image Optimisation et Probabilités (IOP) team at the Institut de Mathématiques de Bordeaux, under the supervision of Professor Jérémie BIGOT.
PY-Optimal-Transport-Median
├── README.md                # Overview and usage instructions.
├── social_preview.png       # Social preview image of the repo.
├── docs/                    # Reference materials (papers, reports, etc.).
├── data/                    # Dataset and its detailed analysis.
│   ├── analysis/            # Descriptive analysis of the database variables.
│   └── raw/                 # Database used (ANSUR II Male and Female).
├── src/                     # Jupyter notebooks and Python scripts.
│   ├── utils.py             # Reusable utility functions.
│   └── notebooks/           # Four Jupyter notebooks illustrating key processes.
├── results/                 # Outputs (e.g., figures, graphs).
└── reports/                 # Final documents.
    ├── internship_report.pdf
    ├── summary_note.pdf
    └── presentation_slides.pptx
The ANSUR II dataset, located in the data/raw folder, serves as the primary resource for this project.
- Source: ANSUR II Dataset
- Description: Anthropometric measurements of U.S. Army personnel. A descriptive analysis of the dataset is provided in data/README.md.
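Before computing a bivariate median, two variables have to be extracted from the raw table. The sketch below assumes the CSV has already been loaded into a pandas DataFrame; the column names `stature` and `weightkg`, the file name in the usage comment, and the helper `select_sample` are assumptions for illustration, not details taken from this repository. Drawing a fixed number of complete rows is convenient when a discrete reference grid must contain exactly as many points as the sample.

```python
import numpy as np
import pandas as pd


def select_sample(df: pd.DataFrame, columns=("stature", "weightkg"),
                  n_points: int = 41, seed: int = 0) -> np.ndarray:
    """Return n_points randomly drawn complete rows of the chosen columns."""
    subset = df[list(columns)].dropna()  # keep rows with no missing value
    if len(subset) < n_points:
        raise ValueError(f"only {len(subset)} complete rows, {n_points} requested")
    return subset.sample(n=n_points, random_state=seed).to_numpy(dtype=float)


# Hypothetical usage (file name under data/raw is an assumption; latin-1 decodes
# any byte sequence, which avoids decoding errors if the file is not UTF-8):
# df = pd.read_csv("data/raw/ANSUR_II_MALE_Public.csv", encoding="latin-1")
# points = select_sample(df, n_points=41)
```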
- Clone the repo:
git clone https://github.com/moranenzo/PY-Optimal-Transport-Median.git
- Navigate to the project directory:
cd PY-Optimal-Transport-Median
Navigate to the src directory to explore:
- Detailed visualizations:
- Distributions of the data.
- Optimal transport processes between measures.
- Quantile contours of target distributions.
- Step-by-step guides for:
- Multivariate median computation.
- Transport map visualizations.
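Quantile contours can be read off a discrete assignment of the same kind. Under the center-outward view (an assumption here), the contour of order r is the image of the reference circle of radius r, so with a grid of concentric rings each ring is sent to a closed polygon in data space. The sketch repeats a hypothetical `reference_grid` helper so that it is self-contained; neither it nor `quantile_contours` is taken from the repository's notebooks.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def reference_grid(n_radii: int, n_directions: int) -> np.ndarray:
    """Origin (row 0) plus n_radii rings of n_directions points, ring-major order."""
    radii = np.arange(1, n_radii + 1) / (n_radii + 1)
    angles = 2 * np.pi * np.arange(n_directions) / n_directions
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    rings = (radii[:, None, None] * directions[None, :, :]).reshape(-1, 2)
    return np.vstack([np.zeros((1, 2)), rings])


def quantile_contours(data: np.ndarray, n_radii: int, n_directions: int) -> list:
    """Contour k: data points matched to ring k (radius (k + 1) / (n_radii + 1))."""
    grid = reference_grid(n_radii, n_directions)
    if data.shape != grid.shape:
        raise ValueError("data must have shape (n_radii * n_directions + 1, 2)")
    cost = ((grid[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    _, cols = linear_sum_assignment(cost)
    matched = data[cols]  # matched[i] is the image of grid point i
    rings = matched[1:].reshape(n_radii, n_directions, 2)  # drop the origin
    # Repeat the first point so each contour is a closed polygon for plotting.
    return [np.vstack([ring, ring[:1]]) for ring in rings]


# Hypothetical plotting usage with matplotlib:
# import matplotlib.pyplot as plt
# for contour in quantile_contours(points, 5, 8):
#     plt.plot(contour[:, 0], contour[:, 1])
```

Keeping the points of each ring in angular order is what lets the matched data points be joined into a polygon; the innermost polygons surround the point matched to the origin.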
To run a notebook:
- Start Jupyter Notebook:
jupyter notebook
- Open the desired notebook from src/notebooks.
Enzo MORAN - LinkedIn
Project Link: https://github.com/moranenzo/PY-Optimal-Transport-Median