WeightedCorr

Weighted correlation in Python: weighted Pearson and Spearman correlations with a pandas-based interface.

I thought it was strange that I couldn't easily find a single class or function in Python that computes both of these weighted correlations, so I made one myself.

v2.1 20-03-2021

Fixed Issue #1

V2 Update 21-07-2020

Switched from a pandas backend to a numpy/scipy backend. Usage remains the same, but performance for Spearman correlations is significantly improved. See table below.

| N samples | Pearson (v1) | Pearson (v2) | Spearman (v1) | Spearman (v2) |
|---:|---:|---:|---:|---:|
| 10 | 3.55 ms ± 64.1 µs | 1.59 ms ± 9.32 µs | 14 ms ± 131 µs | 1.78 ms ± 7.55 µs |
| 100 | 6.69 ms ± 89 µs | 4.94 ms ± 79.9 µs | 21.4 ms ± 979 µs | 5.08 ms ± 144 µs |
| 1000 | 39.1 ms ± 426 µs | 36.7 ms ± 529 µs | 93.7 ms ± 1.03 ms | 37.2 ms ± 433 µs |
| 10000 | 350 ms ± 4.56 ms | 343 ms ± 5.41 ms | 746 ms ± 5.29 ms | 350 ms ± 7.42 ms |
| 100000 | 3.48 s ± 11.9 ms | 3.48 s ± 6.44 ms | 7.44 s ± 20.1 ms | 3.52 s ± 9.27 ms |

Usage

This class can be used in a few different ways depending on your needs. Pass the data when initializing the class, then call the resulting instance to compute the correlation with the desired method (method='pearson' is the default). Note that the method is passed to the call, not to the initialization. The three examples below compute a Pearson, a Pearson, and a Spearman correlation, respectively.

You can supply a pandas DataFrame with x, y, and w columns (the columns should be in that order). The output will be a single floating-point value.

WeightedCorr(xyw=my_data[['x', 'y', 'w']])(method='pearson')

You can supply x, y, and w as separate pandas Series. The output will be a single floating-point value.

WeightedCorr(x=my_data['x'], y=my_data['y'], w=my_data['w'])()

You can supply a pandas DataFrame and the name of its weight column. In this case the output is an (M-1)x(M-1) pandas DataFrame (the correlation matrix), where M is the number of columns in the original DataFrame; no correlation is calculated for the weight column, hence M-1.

WeightedCorr(df=my_data, wcol='w')(method='spearman')
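As a rough cross-check of the matrix form, the (M-1)x(M-1) output can be mimicked with numpy and pandas alone. This is a minimal sketch for the Pearson method only, assuming every non-weight column is numeric; weighted_corr_matrix and the sample my_data values are hypothetical, and np.cov(..., aweights=...) is swapped in for the package's own computation.

```python
import numpy as np
import pandas as pd


def weighted_corr_matrix(df, wcol):
    """Hypothetical helper: (M-1)x(M-1) weighted Pearson matrix of df."""
    w = df[wcol].to_numpy(dtype=float)
    data = df.drop(columns=wcol)  # every column except the weights
    # np.cov with aweights gives a weighted covariance matrix; its
    # normalization constant cancels when converting to correlations
    cov = np.cov(data.to_numpy(dtype=float), rowvar=False, aweights=w)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    return pd.DataFrame(corr, index=data.columns, columns=data.columns)


# Hypothetical sample data: three variables plus a weight column
my_data = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 1.0, 4.0, 3.0, 5.0],
    "z": [5.0, 3.0, 4.0, 1.0, 2.0],
    "w": [1.0, 2.0, 1.0, 3.0, 1.0],
})
print(weighted_corr_matrix(my_data, wcol="w"))
```

With four columns in my_data and "w" excluded, the result is a 3x3 matrix labelled x, y, z.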

Weighted Pearson correlation

Given n pairs (x_i, y_i) with weights w_i, the weighted Pearson r is calculated as

$r_{pearson} = \frac{\sum_{i=1}^{n} (w_i(x_i - \overline{x})(y_i - \overline{y}))} {\sqrt{\sum_{i=1}^{n}(w_i(x_i-\overline{x})^2) \sum_{i=1}^{n}(w_i(y_i-\overline{y})^2) }}$

Where

$\overline{x} = \frac{\sum_{i=1}^{n} (w_i*x_i)} {\sum_{i=1}^{n} w_i}$

$\overline{y} = \frac{\sum_{i=1}^{n} (w_i*y_i)} {\sum_{i=1}^{n} w_i}$
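The formula above translates almost term by term into numpy. A minimal sketch, assuming x, y and w are equal-length 1-D arrays with non-negative weights; weighted_pearson is a hypothetical name, not the package's API.

```python
import numpy as np


def weighted_pearson(x, y, w):
    """Weighted Pearson r, transcribed from the formula above."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    # Weighted means: sum(w_i * x_i) / sum(w_i)
    x_bar = np.sum(w * x) / np.sum(w)
    y_bar = np.sum(w * y) / np.sum(w)
    dx, dy = x - x_bar, y - y_bar
    # Numerator: weighted co-deviation; denominator: square root of the
    # product of the weighted squared deviations of x and y
    num = np.sum(w * dx * dy)
    den = np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))
    return num / den


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
w = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
print(weighted_pearson(x, y, w))
```

With all weights equal to 1 this reduces to the ordinary Pearson r, as returned by np.corrcoef.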

Weighted Spearman rank-order correlation

First, initial ranks (z) are assigned to x and y; tied records are assigned the average rank of their group. Next, the weighted rank (rank) is calculated separately for x and y over the n pairs, such that the j-th rank of either x or y is:

$rank_j = \sum_{i=1}^n (w_i *{\bf A} (z_i, z_j)) + \frac{1+\sum_{i=1}^{n} {\bf B}(z_i, z_j)} {2} * \frac{\sum_{i=1}^{n} w_i*{\bf B}(z_i, z_j)}{\sum_{i=1}^{n} {\bf B}(z_i, z_j)}$

Where

${\bf A} (z_i, z_j) =\begin{cases}1 & \text{if } z_i < z_j\\0 &\text{if } z_i \geq z_j\end{cases}$

and

${\bf B} (z_i, z_j) =\begin{cases}1 & \text{if } z_i = z_j\\0 & \text{if } z_i \neq z_j\end{cases}$
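The rank_j formula can be transcribed directly with indicator matrices for A and B, treating B as the tie indicator on the initial ranks z. A minimal O(n^2) sketch, assuming scipy.stats.rankdata (whose default 'average' method gives tied values the mean rank of their group) supplies z; weighted_rank is a hypothetical helper.

```python
import numpy as np
from scipy.stats import rankdata


def weighted_rank(values, w):
    """Weighted ranks following the rank_j formula (O(n^2) sketch)."""
    w = np.asarray(w, dtype=float)
    # Initial ranks z; tied values share the average rank of their group
    z = rankdata(values)
    # A[i, j] = 1 if z_i < z_j; B[i, j] = 1 if z_i == z_j (tie indicator)
    A = (z[:, None] < z[None, :]).astype(float)
    B = (z[:, None] == z[None, :]).astype(float)
    ties = B.sum(axis=0)            # sum_i B(z_i, z_j): size of j's tie group
    below = w @ A                   # sum_i w_i * A(z_i, z_j)
    mean_tie_w = (w @ B) / ties     # mean weight within j's tie group
    return below + (1 + ties) / 2 * mean_tie_w


print(weighted_rank([10, 20, 20, 30], [2, 1, 3, 1]))  # [2. 5. 5. 7.]
```

With unit weights the result equals the plain average ranks from rankdata.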

These weighted ranks are then passed to the weighted Pearson correlation function.
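Putting the two steps together: weighted ranks for x and y, then the weighted Pearson r of those ranks. This sketch uses a vectorized form of the same rank_j formula, grouping tied initial ranks with np.unique and summing group weights with np.bincount; all function names are hypothetical, and it is not the package's own code.

```python
import numpy as np
from scipy.stats import rankdata


def weighted_pearson(x, y, w):
    # Weighted Pearson r (same formula as the Pearson section)
    x_bar = np.sum(w * x) / np.sum(w)
    y_bar = np.sum(w * y) / np.sum(w)
    dx, dy = x - x_bar, y - y_bar
    return np.sum(w * dx * dy) / np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))


def weighted_rank(values, w):
    # Weight strictly below each tie group, plus
    # (1 + group size) / 2 * mean weight within the group
    z = rankdata(values)
    _, inv, counts = np.unique(z, return_inverse=True, return_counts=True)
    group_w = np.bincount(inv, weights=w)   # total weight per tie group
    below = np.cumsum(group_w) - group_w    # weight of all smaller groups
    return (below + (counts + 1) / 2 * (group_w / counts))[inv]


def weighted_spearman(x, y, w):
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return weighted_pearson(weighted_rank(x, w), weighted_rank(y, w), w)


x = [1.0, 2.0, 2.0, 3.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
w = [1.0, 2.0, 1.0, 3.0, 1.0]
print(weighted_spearman(x, y, w))
```

With unit weights this matches scipy.stats.spearmanr, and any strictly increasing transform of x gives a correlation of 1.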

aalfonsi/weightedcorr

Folders and files

Latest commit

History

Repository files navigation

WeightedCorr

v2.1 20-03-2021

V2 Update 21-07-2020

Usage

Weighted Pearson correlation

Weighted Spearman rank-order correlation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages