Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.x: Make driver invalidate tablets #378

Closed
dkropachev opened this issue Nov 22, 2024 · 9 comments
Closed

3.x: Make driver invalidate tablets #378

dkropachev opened this issue Nov 22, 2024 · 9 comments
Assignees

Comments

@dkropachev
Copy link
Collaborator

dkropachev commented Nov 22, 2024

Invalidation on events

  • On SCHEMA_CHANGE, for target keyspace + table, for all change types: UPDATED or DROPPED drop related tablet records.
  • On TOPOLOGY_CHANGE, for REMOVED_NODE (and probably MOVED_NODE) change type, all tablet records that contains this node, to be removed.

Possible optimizations:

  • On SCHEMA_CHANGE change types: UPDATED driver can look at what have changed and if nothing that can trigger tablet migration (say, ks replication factor and tablet feature on the table) is touched, then invalidation to be voided

Invalidation on control connection reconnect

When cc is lost, driver can miss some of the events.
So we need some logic to catch up:

  • Read keyspace and table descriptions to find if any keyspace or table was removed or updated, doing exactly the same what should be done for SCHEMA_CHANGE event.
  • Read system.peers to find if any node was removed and do the same what should be done for TOPOLOGY_CHANGE event.
Bouncheck added a commit to Bouncheck/java-driver that referenced this issue Nov 26, 2024
Registers an event listener dedicated to updating the `TabletMap`
held by `Metadata`. It will trigger removal of relevant mappings whenever it's
notified of any change to the table schema or keyspace schema including
dropping them.

Introduces an integration test that verifies this listener's behaviour.

Addresses scylladb#378.
@Bouncheck
Copy link
Collaborator

Created #382 . Regarding invalidation on node removal, the tablet map already skips the replicas that are not currently recognized hosts as seen by Metadata class. Actively removing them would require acquiring a write lock (even if done lazily) and possibly traversing the map of sets in order to find relevant entries. In almost every case this stale data will be removed with the next tablet payload. It's an issue in a case like cluster doing a big downscale without moving the tablets at all.

Do you think I should proceed with removing stale hosts when encountered or is it fine as it is?

@Lorak-mmk
Copy link

Lorak-mmk commented Nov 26, 2024

Created #382 . Regarding invalidation on node removal, the tablet map already skips the replicas that are not currently recognized hosts as seen by Metadata class. Actively removing them would require acquiring a write lock (even if done lazily) and possibly traversing the map of sets in order to find relevant entries. In almost every case this stale data will be removed with the next tablet payload. It's an issue in a case like cluster doing a big downscale without moving the tablets at all.

Do you think I should proceed with removing stale hosts when encountered or is it fine as it is?

What do you mean by "skips the replicas that are not currently recognized hosts as seen by Metadata class"?
For example, if tablet has replicas [A, B, C], and C is removed (and replaced with D on replica list, so replicas are now [A, B, D]), then the driver is just going to make queries to A and B, right?

If this is the case, then note that driver won't receive new tablet custom payload - because A and B are still replicas so queries made to them won't result in tablet feedback. This is the reason for removing tablets with removed replicas.

This is not only important for downscaling, but also for replacing nodes, which is a more frequent operation afaik.

@Lorak-mmk
Copy link

Alternative to actively removing those tablets is ignoring them: during the query, after determining tablet for this query, check if all replicas are still in the cluster. If there is a stale replica, proceed as if the tablet was not present at all.

I'm not convinced this will be a better solution performance-wise, because now you have those checks in a hot-path (query execution).

If there is only one thread / worker (or whatever is the correct terminology here) that modifies tablets, then you can apply a similar strategy as we did in Rust Driver: copy on write: copy all the tablet data, modify the copy, switch the object to be used by queries. This eliminates any need for any locking.

@Bouncheck
Copy link
Collaborator

What do you mean by "skips the replicas that are not currently recognized hosts as seen by Metadata class"? For example, if tablet has replicas [A, B, C], and C is removed (and replaced with D on replica list, so replicas are now [A, B, D]), then the driver is just going to make queries to A and B, right?

The tablet map will just return [A,B] even though it has C somewhere internally too, because the Metadata instance does not see C as alive host.

If this is the case, then note that driver won't receive new tablet custom payload - because A and B are still replicas so queries made to them won't result in tablet feedback. This is the reason for removing tablets with removed replicas.

That's right. I see now that the intention was to drop the whole tablet, not just remove stale host from the replica list.
In that case I can see the benefit of doing that.

@Bouncheck
Copy link
Collaborator

Bouncheck commented Nov 26, 2024

Alternative to actively removing those tablets is ignoring them: during the query, after determining tablet for this query, check if all replicas are still in the cluster. If there is a stale replica, proceed as if the tablet was not present at all.

I'm not convinced this will be a better solution performance-wise, because now you have those checks in a hot-path (query execution).

I think I'll proceed with that. In java driver I need to ask Metadata instance about the hosts anyway. I do that in order to translate the UUIDs saved from tablets_routing_v1 payloads into already existing instances of Host class. So I just need to invalidate the whole tablet whenever that "check" returns a null instead of skipping it. (Actually I realized the conversion from uuid to host happens in a slightly different moment, but I'll try to follow this path first)

And invalidating a whole tablet does not require a write lock unlike modifying part of it.

@Bouncheck
Copy link
Collaborator

Bouncheck commented Nov 26, 2024

I got the granularity in my comment wrong again: removing a mapping for a whole table does not require lock. For a singular tablet from the set of many for a specific table it does.
But here it should be fine to invalidate whole table anyway, am I correct on that? On first encounter of a stale host in the replica list it is highly likely that all other tablets of this table also have this stale host. I don't see a scenario where that would not be the case right now.

@Lorak-mmk
Copy link

I got the granularity in my comment wrong again: removing a mapping for a whole table does not require lock. For a singular tablet from the set of many for a specific table it does. But here it should be fine to invalidate whole table anyway, am I correct on that? On first encounter of a stale host in the replica list it is highly likely that all other tablets of this table also have this stale host. I don't see a scenario where that would not be the case right now.

Why would that be the case? Let's say that you have 20 nodes and RF=3. In that case a node will be a replica for ~15% of the tablets. You can calculate it here: https://opensource.docs.scylladb.com/stable/cql/consistency-calculator.html . This number will only go down for bigger clusters. Invalidating <15% of tablets sounds much better than invalidating 100% right?

@Bouncheck
Copy link
Collaborator

Bouncheck commented Nov 26, 2024

Yeah 15% is way too small to justify that (by that I mean throwing 100% away). Thanks for the clarifications.
Even 66% would be not that great either.

Bouncheck added a commit to Bouncheck/java-driver that referenced this issue Nov 26, 2024
Makes TabletMap return empty collection for some calls to `getReplicas()`.
If the current replica list for a tablet turns out to contain a host
that is not present in current Metadata then empty collection is returned
in an effort to misroute the query. This query will cause the tablet
information to be updated when the result with the up to date information
is received.
It is possible that the query will be randomly routed correctly, but this
case is significantly less likely with each subsequent query, although
this can depend on underlying load balancing policy.

This covers the node removal/replacement case of scylladb#378.
Bouncheck added a commit to Bouncheck/java-driver that referenced this issue Nov 26, 2024
Makes TabletMap return empty collection for some calls to `getReplicas()`.
If the current replica list for a tablet turns out to contain a host
that is not present in current Metadata then empty collection is returned
in an effort to misroute the query. This query will cause the tablet
information to be updated when the result with the up to date information
is received.
It is possible that the query will be randomly routed correctly, but this
case is significantly less likely with each subsequent query, although
this can depend on underlying load balancing policy.

This covers the node removal/replacement case of scylladb#378.
Bouncheck added a commit to Bouncheck/java-driver that referenced this issue Nov 27, 2024
Registers an event listener dedicated to updating the `TabletMap`
held by `Metadata`. It will trigger removal of relevant mappings whenever it's
notified of any change to the table schema or keyspace schema including
dropping them.

Introduces an integration test that verifies this listener's behaviour.

Addresses scylladb#378.
Bouncheck added a commit that referenced this issue Nov 27, 2024
Registers an event listener dedicated to updating the `TabletMap`
held by `Metadata`. It will trigger removal of relevant mappings whenever it's
notified of any change to the table schema or keyspace schema including
dropping them.

Introduces an integration test that verifies this listener's behaviour.

Addresses #378.
Bouncheck added a commit to Bouncheck/java-driver that referenced this issue Nov 28, 2024
Makes TabletMap return empty collection for some calls to `getReplicas()`.
If the current replica list for a tablet turns out to contain a host
that is not present in current Metadata then empty collection is returned
in an effort to misroute the query. This query will cause the tablet
information to be updated when the result with the up to date information
is received.
It is possible that the query will be randomly routed correctly, but this
case is significantly less likely with each subsequent query, although
this can depend on underlying load balancing policy.

This covers the node removal/replacement case of scylladb#378.
Bouncheck added a commit that referenced this issue Nov 28, 2024
Makes TabletMap return empty collection for some calls to `getReplicas()`.
If the current replica list for a tablet turns out to contain a host
that is not present in current Metadata then empty collection is returned
in an effort to misroute the query. This query will cause the tablet
information to be updated when the result with the up to date information
is received.
It is possible that the query will be randomly routed correctly, but this
case is significantly less likely with each subsequent query, although
this can depend on underlying load balancing policy.

This covers the node removal/replacement case of #378.
@Bouncheck
Copy link
Collaborator

Closing as completed by #382 and #385

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants