Measuring connection bytes read/written #356

Open
gmaz42 opened this issue Jun 30, 2021 · 8 comments

@gmaz42

gmaz42 commented Jun 30, 2021

I am noticing a problem with reading objects that might be large (spanning 1-2 megabytes) when setting a deadline on the time allowed to retrieve them.

Unfortunately they are an exception rather than the rule, so I was wondering: would it be possible to get information about the read/written connection bytes from a Get* operation?

I think it is not currently possible but if you are willing to accept a PR for it I could add a new set of methods like GetObjectWithStats?

@gmaz42
Author

gmaz42 commented Jul 28, 2021

@khaf please let me know what you think about this, I might start a PR.

@khaf
Collaborator

khaf commented Jul 28, 2021

Sorry, I thought I had responded, but apparently I never posted it. Historically, we have been hesitant to add instrumentation to the client since it can affect performance. We are open to ideas, but at this moment we cannot promise anything concrete until we are sure we can do it in a way that does not incur any performance penalties.

@gmaz42
Author

gmaz42 commented Aug 19, 2021

I understand; I would also not be happy with writing an implementation that brings performance penalties.

I think this could be done in 2 ways:

  1. a wrapper around the connection objects that simply records I/O bytes; this might already be possible but would not give context of which commands performed the I/O
  2. a "stats" object used by every internal command that tracks the number of I/O bytes

I do not think that (2) would incur (measurable) performance penalties, because the measurement is already happening (each write/read on a connection returns the number of bytes); it is simply not accounted for.

The broader question would be how to "neatly" provide an API for accessing this information.

@gmaz42
Author

gmaz42 commented Aug 19, 2021

Some ideas I have had so far:

  • attach the stats to the policy object which is currently passed as 1st argument
  • implement context (Missing context.Context in all API methods #255), then always check whether the context carries a stats object and use it if present (a somewhat dirtier approach)

Another important bit of information when doing performance analysis is: which node replied to my request? This can help spot degraded nodes.

By the way, I have not been able to analyze traffic in the cluster I am currently using to find out what could be causing slow replies, so I am falling back to capturing traffic with tcpdump for later analysis; @khaf do you know of a better approach? I would be interested in knowing what happens during one of these GetObject / PutObject calls from the server's perspective.

@khaf
Collaborator

khaf commented Sep 6, 2021

@gmaz42 Sorry, it seems this message flew under my radar. Both of your approaches have major downsides: the first one can lead to race conditions, and the second one needs a major API overhaul.
The only way I see to do something in that vein would be to include the stats with the record. The problem is that if the transaction hits an error midway, you will lose the stats (since the returned record will be nil).

I still do not know what the best approach could be. At the moment we are considering exporting the stats outside the client, but even that comes with its own set of issues. We are still looking into it though.

@gmaz42
Author

gmaz42 commented Sep 6, 2021

No problem, thanks for your reply @khaf!

> first one can lead to race conditions

You could use the object extracted from the context only if it satisfies a specific interface, and leave it to the caller to implement that interface in a goroutine-safe fashion. This way you would also support the use case where the caller provides a single-use object before each call (which raises no race condition concerns). The client library would not even need to provide a full implementation of this stats object; it could use a private one for its own tests.

> I still do not know what the best approach could be. At the moment we are considering exporting the stats outside the client, but even that comes with its own set of issues. We are still looking into it though.

I understand; thanks. Meanwhile my troubleshooting has gone a few levels lower in the stack, and I am currently checking under which conditions timeouts happen (from the Aerospike client's point of view) because the process is not pulling TCP packets out of the TCP stack fast enough (this is a Kubernetes scenario, so multiple processes and kernel scale internals are at play).

@khaf
Collaborator

khaf commented Sep 6, 2021

The callback approach for instrumentation sounds nice. Let me think about it for a bit and see where it takes me.

@gmaz42
Author

gmaz42 commented Sep 6, 2021

Sure, take your time; I can also provide a mock example in a draft PR if you like. Thanks @khaf

P.S. I also left some question/doubt here (shameless plug)
