Issue while scanning large datasets #65
Comments
@timoha this also occurs when using the scanner API directly and not using a channel. Any idea what is going on here? For context, for each cell in the scan I am doing a CockroachDB insert. Is it possible that this operation is taking too long and the client is timing out?
I'll try to investigate soon; I've been a bit occupied. The error that you are seeing should not affect anything in your code, as it happens in the background. It would be more useful if you provided the error returned by the call to Next().
@timoha Interesting... I thought that was the error being returned by the Next() call. Unless, maybe, an
@timoha Sorry for the (very) late reply here.
@timoha the docs for
More information: it looks like when I see this UnknownScannerException, only that one RPC fails. If there are more RPCs to make, they continue on, but I will have lost the data (aside from any partial data that is returned) for the failed RPC. Not really sure how to recover from this. So my statement a couple of comments up is not entirely correct. The immediately following error is not necessarily
Yeah, sounds like either you take a long time between calls to Next() or your regionserver died in the process of scanning. For the first case, we could implement periodic scanner lease renewal: https://github.com/tsuna/gohbase/blob/master/pb/Client.proto#L285 For the second case, we need to spend some time to better handle error cases for the scanner and retry gracefully when we get this exception. I might have some time soon to take a stab at it.
@timoha I was able to confirm that it is the former. In this event, will it miss data? If so, I am not sure what I can do here, as we have gigs of data being processed concurrently, distributed across nodes; to avoid destroying our memory, we throttle the number of goroutines with a semaphore. If we crank this semaphore up too high, we end up pegging the CPU and stalling the system, so we have to strike a balance. How difficult would it be to implement lease renewal?
@timoha We are encountering this again. This time we are not doing anything heavy, but we are doing a lot of scanning and inserting. I am worried we are overloading the cluster, and I was wondering: what happens in the event that the HBase cluster is overloaded and a request to the cluster takes a long time? Will the lease still expire because Next() hasn't been called in a bit? If so, what can we do here (aside from throwing more resources at the HBase cluster)?
Also, @timoha, do you think you could give me a rundown of how scanner lease renewal should work? I'd be happy to submit a PR for this, but I fear I don't have the necessary understanding. Is there any documentation on the HBase protocol that I could use to discern this?
Haven't checked the AsyncHBase code or the standard client code in a while, but previously neither actually handled the case of partial-row scanners. The problem is that if the scanner times out in the middle of a row, there's no way to safely (preserving row atomicity) restart scanning from the middle of that row. The best option has always been for clients to explicitly keep track of the last row they've scanned and restart scanning from the beginning of that row when the exception happens. That way each client can handle duplicates in its own way. That being said, maybe there's now an API that uses MVCC to properly address this, but scanners can blow up with OOM if you don't rely on the partial-row scanning feature.
Closing in favour of #91 |
I am using the new scanner API like so:
After some minutes and several hundred rows being scanned, I am seeing the following error from HBase:
It should also be noted that this error occurs well before all of the rows are scanned, so I never get to scan all rows.