Parameter learning takes forever #104
Comments
I'm not intimately familiar with the inner workings of this package, but I am guessing from this that there are type instabilities and dynamic memory allocations slowing it down. The best way to find out would be to use https://github.com/timholy/ProfileView.jl on a small example. Fortunately, the language has improved vastly since most of the work on this package was done (before 2018), and I think it would be possible to realize massive speedups now! Happy to make suggestions if someone wants to work on improving it.
It looks like you have several variables with many parents. I think the current implementation loops over all possible parental instantiations, and the number of those is going to be enormous in your case (it grows exponentially with the number of parents). The loop of interest is here. It would be much faster to just loop over your data and fill in the non-zero entries in the tables.
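To make the "loop over your data" suggestion concrete, here is a minimal sketch of the idea. It is not code from BayesNets.jl: `fit_sparse_counts` is a hypothetical helper, categories are assumed to be 1-based integers, and plain probability vectors stand in for the package's distribution types.

```julia
# Hypothetical helper, not part of BayesNets.jl: estimate P(target | parents)
# by visiting each data row once instead of every parental instantiation.
function fit_sparse_counts(target::Vector{Int}, parents::Matrix{Int}, ntarget::Int)
    counts = Dict{Vector{Int}, Vector{Float64}}()
    for i in eachindex(target)
        key = parents[i, :]                       # parental instantiation seen in row i
        row = get!(() -> zeros(ntarget), counts, key)
        row[target[i]] += 1.0                     # count the observed target category
    end
    for row in values(counts)
        row ./= sum(row)                          # normalize counts to probabilities
    end
    return counts
end

# Toy data: 4 samples, 2 parents, binary target (values are 1-based categories).
target  = [1, 2, 1, 1]
parents = [1 1; 1 1; 2 3; 2 3]
cpd = fit_sparse_counts(target, parents, 2)
```

The work here scales with the number of samples rather than with the product of the parents' cardinalities, so with 200 samples at most 200 table entries are ever created per variable.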
I tried it but even with 20 samples it does not finish.
I would happily work on it, but I am very new to these concepts.
That should be the issue. Creating a table for a single variable with many parents (
I am not sure how to loop over the data and fill in the entries in tables.
I think the issue is with this definition:
struct CategoricalCPD{D} <: CPD{D}
    target::NodeName
    parents::NodeNames
    # list of instantiation counts for each parent, in same order as parents
    parental_ncategories::Vector{Int}
    # a vector of distributions in DMU order
    distributions::Vector{D}
end
The distributions field could instead be declared as distributions::Dict{Int,D}. Then, we would store distributions only for parental instantiations that actually exist in the data. If we try to access a parental instantiation that does not exist, then we can either throw an error or return a uniform distribution. We would also need to update
It may be better to provide this implementation as a new type, such as
Thanks for the explanation. I'll see if I can implement it.
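For anyone picking this up, the Dict-backed idea described above might look roughly like the following. This is only a sketch under stated assumptions: `SparseCategoricalCPD`, `linear_index`, and `lookup` are hypothetical names, plain probability vectors stand in for the distribution type `D`, and the index order (first parent varies fastest) is assumed and may not match the order the package uses.

```julia
# Hypothetical sparse CPD sketch; plain probability vectors stand in for
# the distribution type D so the example has no package dependencies.
struct SparseCategoricalCPD
    target::Symbol
    parents::Vector{Symbol}
    parental_ncategories::Vector{Int}
    ntarget::Int
    distributions::Dict{Int, Vector{Float64}}    # only instantiations seen in data
end

# Map a parental instantiation to a linear index.
# Assumption: first parent varies fastest; the package's actual order may differ.
function linear_index(ncategories::Vector{Int}, assignment::Vector{Int})
    idx, stride = 1, 1
    for (n, a) in zip(ncategories, assignment)
        idx += (a - 1) * stride
        stride *= n
    end
    return idx
end

# Return the stored distribution, or a uniform one for unseen instantiations.
function lookup(cpd::SparseCategoricalCPD, assignment::Vector{Int})
    key = linear_index(cpd.parental_ncategories, assignment)
    return get(() -> fill(1 / cpd.ntarget, cpd.ntarget), cpd.distributions, key)
end

cpd = SparseCategoricalCPD(:c, [:a, :b], [2, 3], 2, Dict(6 => [0.9, 0.1]))
seen   = lookup(cpd, [2, 3])    # stored entry
unseen = lookup(cpd, [1, 1])    # falls back to uniform
```

Returning a uniform distribution is one of the two options mentioned above; throwing an error instead would only require replacing the fallback closure. With very many parents the linear index itself can overflow `Int`, in which case keying the `Dict` on the assignment directly is a possible alternative.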
Ah, yeah, it was kind of silly of me to think this is a type stability/allocation issue, haha. Such a large slowdown is obviously algorithmic. If
@zsunberg It might be a memory issue. Unless we switch to a sparse representation, it will allocate enough memory to hold all the parental instantiations:
distributions = Array{Categorical{Float64,Vector{Float64}}}(undef, prod(parental_ncategories))
This might not be good if
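A quick back-of-the-envelope check shows how fast that dense allocation grows. The cardinalities below are made up for illustration (they are not taken from the attached network), and the 8 bytes per slot is an assumed lower bound that ignores the distributions themselves.

```julia
# Hypothetical cardinalities: one node with 20 parents of 3 categories each.
parental_ncategories = fill(3, 20)

# Number of table entries the dense array would need.
# BigInt avoids integer overflow when the product gets very large.
ntables = prod(big.(parental_ncategories))

# Assumed lower bound: at least 8 bytes per slot, ignoring the distributions.
gib = Float64(ntables * 8) / 2^30
println("entries: ", ntables, ", at least ", round(gib; digits=1), " GiB")
```

By contrast, a sparse container filled from 200 samples holds at most 200 entries regardless of how many parents a node has.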
Right - you would have to do something like
struct CategoricalCPD{D,T} <: CPD{D}
    target::NodeName
    parents::NodeNames
    # list of instantiation counts for each parent, in same order as parents
    parental_ncategories::Vector{Int}
    # a vector of distributions in DMU order
    distributions::T
end
and then
Ah, clever!
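As a rough illustration of the storage-parameter idea, the sketch below lets one struct hold either a dense `Vector` or a sparse `Dict` and picks the access method by dispatch. Everything here is hypothetical: `FlexibleCPD` and `get_distribution` are made-up names, the local `CPD` abstract type is a stand-in for the package's own, and probability vectors again stand in for `D`.

```julia
abstract type CPD{D} end    # stand-in for the package's abstract type

# Hypothetical sketch of a CPD parameterized on its storage type T.
struct FlexibleCPD{D, T} <: CPD{D}
    target::Symbol
    parental_ncategories::Vector{Int}
    distributions::T        # e.g. Vector{D} (dense) or Dict{Int,D} (sparse)
end

# Outer constructors that infer D from the container's element/value type.
FlexibleCPD(target, ncats, dists::AbstractVector{D}) where {D} =
    FlexibleCPD{D, typeof(dists)}(target, ncats, dists)
FlexibleCPD(target, ncats, dists::AbstractDict{Int, D}) where {D} =
    FlexibleCPD{D, typeof(dists)}(target, ncats, dists)

# Dense storage: every index exists, so the default is unused.
get_distribution(cpd::FlexibleCPD{D, <:AbstractVector}, i::Int, default::D) where {D} =
    cpd.distributions[i]
# Sparse storage: fall back to the default for unseen instantiations.
get_distribution(cpd::FlexibleCPD{D, <:AbstractDict}, i::Int, default::D) where {D} =
    get(cpd.distributions, i, default)

dense  = FlexibleCPD(:c, [2], [[0.2, 0.8], [0.5, 0.5]])
sparse = FlexibleCPD(:c, [2], Dict(2 => [0.5, 0.5]))
```

Because the container type is part of the struct's type, both variants stay concretely typed, and existing dense behavior would not need to change when the sparse path is added.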
I am trying to fit the parameters of a not-so-big discrete BN. I have tested it in Netica and the learning happens instantly. But with BayesNets.jl it took 30 hours before running out of memory. Is this a bug or a limitation?
Here is the network structure and 200 samples to replicate the issue.