Description
Hello,
I am using DataSynthesizer to generate synthetic data for research purposes. I've been using this package for months and it works perfectly with small datasets. However, with bigger datasets, especially ones with a higher number of columns, runtime becomes a problem. A single dataset (71,236 instances and 52 columns) took more than 18 hours to synthesize on a 64-core machine (with degree_of_bayesian_network = 0 in this case).
I also tried changing degree_of_bayesian_network, setting it to 2 instead of the default 0. The quality of the synthesized data decreases and the runtime drops, but it is still taking too long.
What do you suggest? Is there a better way you recommend to approach bigger datasets?
What I Did
Please try k=1. k is the number of parents of each node in the constructed Bayesian network, and the running time / complexity of DataSynthesizer increases dramatically with k.
When k=0, the value is self-determined by the tool, which can end up very large.
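For reference, a minimal sketch of passing k in correlated attribute mode; the file paths, epsilon value, and category_threshold below are placeholders, not values from the original report:

```python
from DataSynthesizer.DataDescriber import DataDescriber
from DataSynthesizer.DataGenerator import DataGenerator

# Build the data description with a Bayesian network of degree k=1.
# k=1 keeps the greedy network construction cheap; k=0 lets the tool
# pick the degree itself, which can be very large.
describer = DataDescriber(category_threshold=20)  # placeholder threshold
describer.describe_dataset_in_correlated_attribute_mode(
    dataset_file='input.csv',  # placeholder path
    epsilon=1.0,               # placeholder privacy budget
    k=1,
)
describer.save_dataset_description_to_file('description.json')

# Generate the same number of rows as the reported input dataset.
generator = DataGenerator()
generator.generate_dataset_in_correlated_attribute_mode(71236, 'description.json')
generator.save_synthetic_data('synthetic.csv')
```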