Description
Hello,
I am using DataSynthesizer to generate synthetic data for research purposes. I've been using this package for months and it works perfectly with small datasets. However, with bigger datasets, especially ones with a higher number of columns, runtime becomes a problem. A single dataset (71,236 instances and 52 columns) took more than 18 hours to synthesize on a 64-core machine (with degree_of_bayesian_network = 0 in this case).
I also tried changing degree_of_bayesian_network, setting it to 2 instead of the default 0. The quality of the synthesized data decreases and the runtime drops, but it is still taking too long.
What do you suggest? Is there a better way you recommend to approach bigger datasets?
What I Did
Please try k=1. k is the number of parents of each node in the constructed Bayesian network, and the running time / complexity of DataSynthesizer increases dramatically with k.
When k=0, the value is self-determined by the tool, which can end up very large.
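For reference, a minimal sketch of passing k in correlated attribute mode; the file paths, epsilon value, and category_threshold below are placeholders, not values from the original report:

```python
from DataSynthesizer.DataDescriber import DataDescriber
from DataSynthesizer.DataGenerator import DataGenerator

# Build the data description with a Bayesian network of degree k=1.
# k=1 keeps the greedy network construction cheap; k=0 lets the tool
# pick the degree itself, which can be very large.
describer = DataDescriber(category_threshold=20)  # placeholder threshold
describer.describe_dataset_in_correlated_attribute_mode(
    dataset_file='input.csv',  # placeholder path
    epsilon=1.0,               # placeholder privacy budget
    k=1,
)
describer.save_dataset_description_to_file('description.json')

# Generate the same number of rows as the reported input dataset.
generator = DataGenerator()
generator.generate_dataset_in_correlated_attribute_mode(71236, 'description.json')
generator.save_synthetic_data('synthetic.csv')
```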