-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow performance populating tree #48
Comments
@phemmer What are the other alternatives you found? |
They're listed in the benchmark code on the issue description. |
@phemmer Find func may not seem do what you want, it doesn't return most specific result. Try two entry in trie like 8.8.8.0/24 and 8.8.8.0/25 each with some value, and Find "8.8.8.8", it will show up /24 value instead of /25 value. I think the original fn just Contain, which return bool and doesn't matter if that's more specific or not, and your Find fn try to return most specific entry(smallest prefix), which require follow the children path maybe change the find fn to
|
Use netip from yl2chen#48
@phemmer I'm guessing your gist is under |
@ldkingvivi Seems like you integrated those changes already and added even more functionality. The |
Yes, you may do what you want with it. Also as a side note, I'd encourage discussion on the gist itself, as it's linked to from places other than here. I also updated it to address the issue regarding |
Since there seems to be moderate interest in my implementation, I've published it as a full package here: https://github.com/phemmer/go-iptrie/ |
@phemmer Awesome, from my local benchmarks it seems the changes made by @ldkingvivi make it even faster. See here: https://github.com/ldkingvivi/cidranger/blob/master/iptire/trie.go |
@phemmer my change was mainly adding Entry with generic, so containing and cover networking will all return something instead of pure prefix, and of course some tests and benchmark using existing way how cidrange tested/benchmakred. Do you want me create a PR to your repo? I will likely have time during weekend. |
I had considered using generics, but stuck with
The Having
I omitted tests from my gist because gists are meant to be concise, not because they didn't exist. They exist, and were published in the package. As far as benchmarks, the package also contains a benchmarking suite. If you don't feel the benchmarks are meaningful, I'd be willing to consider additional scenarios. I do know that the random networks aren't likely representative of real-world scenarios, as it creates too many networks which are too large. I've considered solving that using a curve to make smaller prefixes more rare. But I don't think it'd change the overall results significantly, so I haven't bothered.
If you like, though I'd recommend proposing specific changes for discussion first. The proposals should be in that repo as well, as this one should be for discussions about cidranger only, and I feel we've hijacked it enough. |
I didn't referring to Contains, but ContainingNetworks. Also for the comments about tests and benchmark, I just simply mentioned the fact what I added, nothing criticize your gist. At this point, I will just continue use my branch directly, since I do have use case to fetch data other than just prefix, as well as generic for my use case. One thing I agree, we probably hijacked this place enough, I will stop doing that now. |
I was looking at various modules that can provide a fast lookup for whether an IP is in a list, and while testing this module, it turned out to have significantly slower load time compared to the other alternatives I looked at. The lookup is indeed faster, but not enough to justify the increased load time, which for me is prohibitively slow (20x slower than the fastest alternative, and takes over a minute to load a list of about 5 million IPs). I just wanted to create this issue so you're aware, and in case there's something which can be done to increase performance.
Benchmark code
Edit: Update: In the end I ended up with a private fork that strips out a ton of code from cidranger to get the performance up. Mostly I removed support for all the different address types and used only netip types, as well as adding a loader which uses the last insert point as the starting point to search for a location for the new node. This got load times over 20x faster, and lookups 1.5x faster.
https://gist.github.com/phemmer/6231b12d5207ea93a1690ddc44a2c811
The text was updated successfully, but these errors were encountered: