How to interpolate and check correlation of time series with different cardinality? #173

Open
artitwa opened this issue Oct 14, 2016 · 2 comments


artitwa commented Oct 14, 2016

I want to measure how strongly two time series RDDs are correlated over time, but they do not have the same cardinality (that is, they contain different numbers of data points because the data were collected at different timestamps). According to the Statistics API, both series must have the same number of partitions and the same cardinality. Some example code is below. I looked for an interpolation library in Spark that would let me bring the two series to the same number of partitions and cardinality, but found none. Is this possible with this library? Thank you.

from pyspark.mllib.stat import Statistics

sc = ... # SparkContext

seriesX = ... # a series
seriesY = ... # must have the same number of partitions and cardinality as seriesX

# Compute the correlation using Pearson's method. Enter "spearman" for Spearman's method. If a
# method is not specified, Pearson's method will be used by default.
print(Statistics.corr(seriesX, seriesY, method="pearson"))

data = ... # an RDD of Vectors
# Calculate the correlation matrix using Pearson's method. Use "spearman" for Spearman's method.
# If a method is not specified, Pearson's method will be used by default.
print(Statistics.corr(data, method="pearson"))
Owner

sryza commented Oct 16, 2016

Hi @artitwa. It's preferable to ask questions on the mailing list. You might try using the sampling questions to achieve the same cardinality?
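
One possible reading of the suggestion above is to resample both series onto a shared set of timestamps before computing the correlation. The following is a minimal sketch in plain Python, not this library's API: the helper names `interpolate` and `pearson` and the sample data are hypothetical, and it assumes linear interpolation is acceptable and that the comparison is restricted to the time range both series cover.

```python
from bisect import bisect_left
from math import sqrt


def interpolate(times, values, query_times):
    """Linearly interpolate (times, values) at each query time.

    Assumes `times` is sorted ascending and every query time lies
    within [times[0], times[-1]].
    """
    result = []
    for t in query_times:
        i = bisect_left(times, t)
        if times[i] == t:
            # Exact hit on an observed timestamp.
            result.append(values[i])
        else:
            # Interpolate between the two neighbouring observations.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample data: two series observed at different timestamps.
times_x, values_x = [0.0, 2.0, 4.0, 6.0], [0.0, 4.0, 8.0, 12.0]
times_y, values_y = [1.0, 3.0, 5.0], [3.0, 7.0, 11.0]

# Build a common grid over the overlap of the two time ranges.
start = max(times_x[0], times_y[0])
end = min(times_x[-1], times_y[-1])
steps = 4
grid = [start + k * (end - start) / steps for k in range(steps + 1)]

# Both aligned series now have the same cardinality.
aligned_x = interpolate(times_x, values_x, grid)
aligned_y = interpolate(times_y, values_y, grid)
correlation = pearson(aligned_x, aligned_y)
print(correlation)
```

Once aligned, the two lists have the same length, so they could in principle be turned into RDDs with `sc.parallelize` and passed to `Statistics.corr`; whether the partition counts then match depends on how the RDDs are created.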

Author

artitwa commented Oct 17, 2016

Hi @sryza, what do you mean by "try using sampling questions"? I will ask on the mailing list. Thank you.
