Implement Accelerated DHT #45

Open
wants to merge 24 commits into
base: main

Conversation

dennis-tra
Contributor

Context: #7

@dennis-tra dennis-tra force-pushed the v2-issue-7-accelerated-dht branch from 6a24dd1 to d99b889 Compare October 13, 2023 07:38
@@ -0,0 +1,15 @@
package zikade

//func TestNewFullRT(t *testing.T) {
Contributor Author

write at least one test for FullRT

@dennis-tra dennis-tra force-pushed the v2-issue-7-accelerated-dht branch from 6d4043c to e6118a3 Compare October 18, 2023 09:02
"github.com/plprobelab/zikade/pb"
)

type FullRT struct {
Contributor

It doesn't make much sense to me to call this FullRT. It seems to just be following the same naming as the go-libp2p-kad-dht implementation.

It seems to me that this is really a specialised routing table population strategy for the DHT. Can we make it an option on the normal DHT type?

Contributor Author

Yeah, I also wasn't sure about that. Though it's more than just a specialized routing table population strategy. The routing.Routing implementation also behaves quite differently.

If we put an option on the DHT type, we'd need to branch into either the default routing.Routing implementation or the fullRT routing.Routing implementation, which I think is not super elegant. I don't have a better idea though :/

}

for j := 0; j < c.cfg.MaxCPL; j++ {
target, err := c.cplFn(node.Key(), j)
Contributor

Rename this to targetNode since job.target is a key

return &StateCrawlIdle{}
}

if len(c.info.waiting) >= c.cfg.MaxCPL*c.cfg.Concurrency {
Contributor

Suggested change
- if len(c.info.waiting) >= c.cfg.MaxCPL*c.cfg.Concurrency {
+ if len(c.info.waiting) >= c.cfg.Concurrency {

Concurrency is the maximum number of concurrent requests, but the original code is sending 16 times as many

Contributor Author

@dennis-tra dennis-tra Oct 20, 2023

Sending 16 requests is intended because each request contains (should contain) a different target key for which we want to know the 20 closest nodes that the other peer knows. This is the strategy for effectively fetching the entire routing table of a remote peer.
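
The per-CPL strategy described above could be sketched roughly as follows. This is an illustration only, not the code under review: the 256-bit `key` type and the helpers `keyWithCPL`, `commonPrefixLen`, and `demoTargets` are hypothetical names, and the fixed count of 16 simply mirrors the number mentioned in this thread. For each CPL `j`, the sketch builds a random key that shares exactly `j` leading bits with the remote node's key, so each request probes a different bucket of that node's routing table.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

const keyBytes = 32 // assumption: 256-bit keys

type key [keyBytes]byte

// commonPrefixLen returns the number of leading bits that a and b share.
func commonPrefixLen(a, b key) int {
	for i := range a {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return keyBytes * 8
}

// keyWithCPL returns a random key that shares exactly cpl leading bits
// with nodeKey: the first cpl bits are copied, bit number cpl is flipped,
// and all remaining bits stay random.
func keyWithCPL(nodeKey key, cpl int, rnd *rand.Rand) key {
	var out key
	for i := range out {
		out[i] = byte(rnd.Intn(256))
	}
	for bit := 0; bit <= cpl && bit < keyBytes*8; bit++ {
		idx, mask := bit/8, byte(0x80)>>uint(bit%8)
		want := nodeKey[idx] & mask
		if bit == cpl {
			want ^= mask // force the first differing bit
		}
		out[idx] = out[idx]&^mask | want
	}
	return out
}

// demoTargets builds a random node key and one target key per CPL in
// [0, maxCPL). Each target would go into its own request to that node.
func demoTargets(seed int64, maxCPL int) (key, []key) {
	rnd := rand.New(rand.NewSource(seed))
	var nodeKey key
	for i := range nodeKey {
		nodeKey[i] = byte(rnd.Intn(256))
	}
	targets := make([]key, 0, maxCPL)
	for j := 0; j < maxCPL; j++ {
		targets = append(targets, keyWithCPL(nodeKey, j, rnd))
	}
	return nodeKey, targets
}

func main() {
	nodeKey, targets := demoTargets(1, 16)
	for j, tgt := range targets {
		fmt.Printf("request %2d: cpl=%d\n", j, commonPrefixLen(nodeKey, tgt))
	}
}
```

Each of the 16 targets lands at a different distance from the remote node, which is why a single request per peer would not be enough to enumerate its routing table.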

Contributor

Concurrency should be the maximum number of in-flight requests. The 16 is irrelevant here, since we are checking how many are currently in-flight. As it stands, if the user specifies a concurrency of 200, they will actually end up with 3200 concurrent requests.
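
The arithmetic behind this objection can be checked with a small simulation. It is a sketch under stated assumptions: `crawlConfig` and `dispatch` are hypothetical stand-ins for the crawl state machine, and `MaxCPL` is set to the 16 quoted in this thread. `dispatch` starts pending jobs until the in-flight count reaches the given limit, which is the same shape of check as `len(c.info.waiting) >= limit`.

```go
package main

import "fmt"

// crawlConfig is a hypothetical stand-in for the crawl configuration.
type crawlConfig struct {
	MaxCPL      int
	Concurrency int
}

// dispatch starts pending jobs until the number of in-flight requests
// reaches limit, and returns how many requests end up in flight.
func dispatch(pending, limit int) int {
	inFlight := 0
	for pending > 0 {
		if inFlight >= limit {
			break
		}
		inFlight++
		pending--
	}
	return inFlight
}

func main() {
	cfg := crawlConfig{MaxCPL: 16, Concurrency: 200}
	pending := 10000

	// Limit used by the original check.
	fmt.Println(dispatch(pending, cfg.MaxCPL*cfg.Concurrency)) // 3200
	// Limit used by the suggested change.
	fmt.Println(dispatch(pending, cfg.Concurrency)) // 200
}
```

With the suggested limit, the 16 per-node requests are still sent, but they share the single in-flight budget instead of multiplying it.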

}

span.SetAttributes(
attribute.Int(prefix+"_todo", len(c.info.todo)),
Contributor

@iand iand Oct 18, 2023

Report these as metric gauges too

}

// clear info to indicate that we're idle
c.info = nil
Contributor

Add a metrics gauge that is 1 when the crawl is running and 0 otherwise
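
One possible shape for such a gauge is sketched below. The `gauge` type here is a hypothetical stand-in built on `sync/atomic`, not the metrics API the project actually uses, and `crawl`, `crawlInfo`, `start`, and `finish` are simplified placeholders for the real state machine. The point is only that the gauge is set in the same places where `c.info` is created and cleared.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gauge is a hypothetical stand-in for a real metrics gauge.
type gauge struct{ v atomic.Int64 }

func (g *gauge) Set(n int64)  { g.v.Store(n) }
func (g *gauge) Value() int64 { return g.v.Load() }

// crawlInfo and crawl are simplified placeholders for the crawl state.
type crawlInfo struct{ todo []string }

type crawl struct {
	info    *crawlInfo
	running gauge // 1 while a crawl is in progress, 0 otherwise
}

// start begins a crawl and marks it as running.
func (c *crawl) start(seed []string) {
	c.info = &crawlInfo{todo: seed}
	c.running.Set(1)
}

// finish clears info to indicate that we're idle and resets the gauge.
func (c *crawl) finish() {
	c.info = nil
	c.running.Set(0)
}

func main() {
	c := &crawl{}
	fmt.Println(c.running.Value()) // 0
	c.start([]string{"node-a"})
	fmt.Println(c.running.Value()) // 1
	c.finish()
	fmt.Println(c.running.Value()) // 0
}
```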

c.info = nil

return &StateCrawlFinished[K, N]{
Nodes: nodes,
Contributor

This could contain thousands of nodes. Do we need to return it? We could return stats instead: number found, number of errors, etc.

cpls map[string]int
waiting map[string]N
success map[string]N
failed map[string]N
Contributor

We don't use the actual list of failures or errors. It would take less memory to keep counts instead. If we don't return the (potentially very large) list of successful nodes, then we could keep a count for those too.
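
A minimal sketch of the counting approach suggested above, assuming only the set of in-flight nodes needs to be kept by identity. The names `crawlStats`, `tracker`, `sent`, and `done` are hypothetical and not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout is a placeholder error used by the example.
var errTimeout = errors.New("timeout")

// crawlStats holds counters instead of full node maps.
type crawlStats struct {
	Succeeded int
	Failed    int
}

// tracker keeps only the in-flight nodes by ID; finished requests are
// folded into counters so memory does not grow with the network size.
type tracker struct {
	waiting map[string]struct{}
	stats   crawlStats
}

func newTracker() *tracker {
	return &tracker{waiting: make(map[string]struct{})}
}

// sent records that a request to nodeID is in flight.
func (t *tracker) sent(nodeID string) {
	t.waiting[nodeID] = struct{}{}
}

// done records the outcome of a request. Responses from nodes that are
// not in the waiting set are ignored.
func (t *tracker) done(nodeID string, err error) {
	if _, ok := t.waiting[nodeID]; !ok {
		return
	}
	delete(t.waiting, nodeID)
	if err != nil {
		t.stats.Failed++
		return
	}
	t.stats.Succeeded++
}

func main() {
	tr := newTracker()
	tr.sent("a")
	tr.sent("b")
	tr.done("a", nil)
	tr.done("b", errTimeout)
	fmt.Printf("%+v\n", tr.stats) // {Succeeded:1 Failed:1}
}
```

A finished-crawl state could then carry a `crawlStats` value rather than the node list itself.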

}

newJob := crawlJob[K, N]{
node: node,
Contributor

What prevents us from crawling a node multiple times for the same target? Nodes A and B could both return node C in their list of nodes closer to target T.

Contributor

Ah, I see that's what cpls map is for. Can you add comments on the fields?

Contributor

Actually, thinking about it, couldn't we end up asking the same node for different targets with the same CPL? The CPL function returns a random key with the given CPL, so a node could be asked to crawl the same CPL more than once with different keys.

Contributor Author

I think you're right 🤔 Let me write a test for it and assert exactly that 👍
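
One way to rule out the duplicate described in this thread is to deduplicate jobs on the (node, CPL) pair instead of on the random target key. The sketch below illustrates that idea only; `nodeCPL`, `job`, `scheduler`, and `enqueue` are hypothetical names, and it makes no claim about how the PR's `cpls` map works.

```go
package main

import "fmt"

// nodeCPL identifies a crawl job independently of its random target key.
type nodeCPL struct {
	node string
	cpl  int
}

// job is a simplified crawl job.
type job struct {
	node   string
	cpl    int
	target string
}

// scheduler queues jobs and remembers which (node, CPL) pairs it has seen.
type scheduler struct {
	seen map[nodeCPL]struct{}
	jobs []job
}

func newScheduler() *scheduler {
	return &scheduler{seen: make(map[nodeCPL]struct{})}
}

// enqueue adds a job unless the same node was already scheduled for the
// same CPL. It reports whether the job was added.
func (s *scheduler) enqueue(node string, cpl int, target string) bool {
	k := nodeCPL{node: node, cpl: cpl}
	if _, dup := s.seen[k]; dup {
		return false
	}
	s.seen[k] = struct{}{}
	s.jobs = append(s.jobs, job{node: node, cpl: cpl, target: target})
	return true
}

func main() {
	s := newScheduler()
	fmt.Println(s.enqueue("C", 3, "key-1")) // true
	fmt.Println(s.enqueue("C", 3, "key-2")) // false: same node and CPL
	fmt.Println(s.enqueue("C", 4, "key-3")) // true: different CPL
}
```

A test for the crawl could assert the same property: after nodes A and B both return node C, C is queried at most once per CPL.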
