I've recently begun using the hyppo library for multivariate hypothesis testing, and I appreciate how comprehensive and easy to use it is.
As datasets continue to grow in size and complexity, I believe the library would benefit from parallel processing support. This could reduce the time it takes to run tests on large, high-dimensional datasets.
Here are a few things that could be done:
Parallel computation of test statistics: This could involve using multiprocessing or joblib to compute test statistics in parallel, which could significantly speed up computations for large datasets.
Distributed computing support: For extremely large datasets, it could be beneficial to support distributed computing frameworks like Dask or Apache Spark. This would allow users to leverage the power of a cluster to compute test statistics, which could be particularly useful for Big Data applications.
Asynchronous computation: For certain applications, it might be useful to support asynchronous computation. This would allow users to start a test, do other work while the test is running, and then come back to get the results once the test is done.
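To make the asynchronous item concrete, here is a minimal sketch using only the standard library's `concurrent.futures`. Note that `run_test` is a hypothetical stand-in for a long-running test (a simple two-sample permutation test on the difference in means), not a hyppo function, and the data are synthetic.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_test(x, y, reps=500, seed=0):
    """Hypothetical stand-in for a slow test: returns (statistic, p-value)."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(reps):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        a, b = pooled[: len(x)], pooled[len(x):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (1 + hits) / (1 + reps)


# Synthetic data: two samples whose means differ by 1.
data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
y = [data_rng.gauss(1.0, 1.0) for _ in range(50)]

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(run_test, x, y)  # returns immediately

# ... the caller is free to do other work here ...

stat, pvalue = future.result()  # blocks only if the test is still running
executor.shutdown()
```

The caller gets a `Future` back right away and only waits when it asks for the result. A real implementation would more likely use a process pool or an existing job framework for CPU-bound work.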
I understand that this is a big ask, but I believe these features would greatly enhance the usefulness and performance of hyppo. I'm also willing to contribute to the development of these features if that's something you'd be interested in.
Thank you for considering this feature request.
> Parallel computation of test statistics: This could involve using multiprocessing or joblib to compute test statistics in parallel, which could significantly speed up computations for large datasets.
Currently, we parallelize the p-value computation, so it's difficult to also parallelize the test statistic computation. This is because we repeatedly call the test statistic computation when computing the p-value. I'm open to approaches that get around this limitation.
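To illustrate the structure described here, the following is a minimal sketch of a permutation test in which the unit of parallel work is one permutation, and each task calls the test statistic once. This is not hyppo's actual code: `abs_covariance`, `perm_test`, and their parameters are hypothetical names, and a thread pool is used only to keep the sketch free of dependencies. Threads will not speed up pure-Python, CPU-bound code because of the GIL, so in practice a process pool or joblib would take its place.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def abs_covariance(x, y):
    """Toy test statistic (hypothetical stand-in): absolute sample covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1))


def _permuted_stat(stat_fn, x, y, seed):
    # Each task shuffles y with its own seeded RNG, then calls the statistic once.
    rng = random.Random(seed)
    y_perm = list(y)
    rng.shuffle(y_perm)
    return stat_fn(x, y_perm)


def perm_test(stat_fn, x, y, reps=200, workers=4, seed=0):
    """Permutation p-value; the outer loop over permutations is what runs in parallel."""
    observed = stat_fn(x, y)
    seeds = [seed + i for i in range(reps)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        null = list(pool.map(lambda s: _permuted_stat(stat_fn, x, y, s), seeds))
    # Add-one correction keeps the p-value strictly positive.
    pvalue = (1 + sum(s >= observed for s in null)) / (1 + reps)
    return observed, pvalue


# Synthetic, strongly dependent data for a quick check.
x = [float(i) for i in range(30)]
y = [2.0 * v for v in x]
stat, pvalue = perm_test(abs_covariance, x, y)
print(stat, pvalue)
```

Because the workers are already occupied by the outer loop over permutations, parallelizing inside `stat_fn` as well would mean nested parallelism, which is the difficulty mentioned above.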
> Distributed computing support: For extremely large datasets, it could be beneficial to support distributed computing frameworks like Dask or Apache Spark. This would allow users to leverage the power of a cluster to compute test statistics, which could be particularly useful for Big Data applications.
Great idea. I think this should be a separate issue with more detail on the proposed approach.
> Asynchronous computation: For certain applications, it might be useful to support asynchronous computation. This would allow users to start a test, do other work while the test is running, and then come back to get the results once the test is done.
Also a great idea, and I would split this into a separate issue as well.