Enhance autoTables API to support more flexible sharding rules #33364

Open
9 tasks
strongduanmu opened this issue Oct 23, 2024 · 5 comments

Comments

@strongduanmu
Member

strongduanmu commented Oct 23, 2024

Feature Request

Is your feature request related to a problem?

#33341

Describe the feature you would like.

In #33341, we temporarily removed the ShardingRouteAlgorithmException check so that different sharding tables can be stored in different databases. However, removing it may weaken the normal validation: for example, if the sharding algorithm itself is faulty, a statement could be routed to nodes that do not exist.

A better approach is to build on the existing autoTables usage and allow users to configure actualDataNodes. The only requirement is that each actual table name in actualDataNodes is globally unique, so that the sharding algorithm can find the corresponding database and table directly from the actual table name. The traditional databaseStrategy and tableStrategy would then get stricter configuration checks and would not allow irregular table sharding under database sharding, because table sharding is based on the database sharding logic.

The new configuration might look like this:

- !SHARDING
  autoTables:
    t_order:
      actualDataNodes: ds_${0}.t_order_${0..3},ds_${1}.t_order_${4..7}
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: t_order_snowflake
      logicTable: t_order
      shardingStrategy:
        standard:
          shardingAlgorithmName: t_order_mod
          shardingColumn: order_id
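To illustrate why globally unique actual table names are enough, here is a minimal sketch in Python (chosen only for brevity; it is not ShardingSphere code). It assumes a simplified expression syntax that supports only the `${n}` and `${a..b}` forms used in the example above, not the full inline expression syntax. The names `expand` and `build_table_index` are hypothetical.

```python
import re
from itertools import product

# Assumption: only the ${n} and ${a..b} forms from the example are supported.
_SEGMENT = re.compile(r"\$\{(\d+)(?:\.\.(\d+))?\}")


def expand(node_expression):
    """Expand one expression such as 'ds_${0}.t_order_${0..3}' into data nodes."""
    choices = []
    for match in _SEGMENT.finditer(node_expression):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) is not None else start
        choices.append([str(i) for i in range(start, end + 1)])
    # Replace every ${...} segment with a positional placeholder.
    template = _SEGMENT.sub("{}", node_expression)
    return [template.format(*combo) for combo in product(*choices)]


def build_table_index(actual_data_nodes):
    """Map each actual table name to its data source, rejecting duplicate names."""
    index = {}
    for expression in actual_data_nodes.split(","):
        for node in expand(expression.strip()):
            data_source, table = node.split(".", 1)
            if table in index:
                raise ValueError(
                    f"actual table {table!r} appears in both "
                    f"{index[table]!r} and {data_source!r}"
                )
            index[table] = data_source
    return index
```

With the example value of actualDataNodes, `build_table_index` yields eight entries, with `t_order_0` to `t_order_3` mapped to `ds_0` and `t_order_4` to `t_order_7` mapped to `ds_1`. Because the table name alone identifies the database, no separate database sharding strategy is needed for the lookup.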

Another benefit of this change: the actualDataNodes of the existing autoTables are currently generated automatically by ShardingRule. We could also maintain the automatically generated actualDataNodes in the newly added API, so that users can see how the data is distributed. On this basis, we could drop the split between standard sharding algorithms and automatic sharding algorithms, since every algorithm would only need to route among the actualDataNodes and could therefore be universal.
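As a rough sketch of what "only need to route the actualDataNodes" could mean, the Python snippet below (illustrative only, not ShardingSphere code) uses a modulo rule to pick an actual table name and then resolves the database from a table index. The `route` function, the `NODES` dictionary, and the `t_order_<n>` naming rule are assumptions made for this example. The lookup failure also shows where a check equivalent to the removed ShardingRouteAlgorithmException could be kept.

```python
def route(sharding_value, sharding_count, table_to_data_source, logic_table="t_order"):
    """Pick the actual table by modulo, then resolve its data source by table name."""
    actual_table = f"{logic_table}_{sharding_value % sharding_count}"
    data_source = table_to_data_source.get(actual_table)
    if data_source is None:
        # The algorithm produced a node that is not configured: fail fast
        # instead of silently routing to a non-existent table.
        raise LookupError(
            f"algorithm produced {actual_table!r}, which is not in actualDataNodes"
        )
    return data_source, actual_table


# Hypothetical table index matching the example configuration:
# t_order_0..3 live in ds_0, t_order_4..7 live in ds_1.
NODES = {f"t_order_{i}": ("ds_0" if i < 4 else "ds_1") for i in range(8)}
```

With eight shards, `route(13, 8, NODES)` resolves to `t_order_5` in `ds_1`. A sharding count of 10 against the same eight nodes raises `LookupError` for a value such as 9, which is the kind of algorithm or configuration mismatch the check is meant to catch.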

For the different sharding algorithm types, refer to this doc: https://shardingsphere.apache.org/document/current/en/user-manual/common-config/builtin-algorithm/sharding/

Tasks:

  • Add new actualDataNodes YAML configuration
  • Add new actualDataNodes for DistSQL
  • Init sharding and table rules according to the new API
  • Adapt the sharding SQL route logic for autoTables actualDataNodes
  • Enhance the configuration check logic when table sharding is based on database sharding (keep the same actual table names)
  • Adjust how autoTables actualDataSources are expanded in the DistSQL and YAML handling logic (only persist actualDataNodes to ZooKeeper instead of actualDataSources)
  • Remove the ShardingAutoTableAlgorithm interface
  • Update related docs
  • Add more unit tests and E2E tests
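
One possible reading of the "keep the same actual table names" check for the traditional databaseStrategy and tableStrategy is that every data source must hold the same set of actual tables. The Python sketch below (illustrative only, not ShardingSphere code) reports data sources that deviate from that rule. The function name `find_irregular_data_sources` and the interpretation of the rule are assumptions.

```python
from collections import defaultdict


def find_irregular_data_sources(data_nodes):
    """Return {data_source: missing_tables} for sources lacking some actual table."""
    tables_by_source = defaultdict(set)
    for node in data_nodes:
        data_source, table = node.split(".", 1)
        tables_by_source[data_source].add(table)
    if not tables_by_source:
        return {}
    # Under this reading, every data source should contain every actual table.
    expected = set.union(*tables_by_source.values())
    return {
        data_source: sorted(expected - tables)
        for data_source, tables in tables_by_source.items()
        if tables != expected
    }
```

An empty result means the layout is regular. A layout such as `ds_0.t_order_0, ds_0.t_order_1, ds_1.t_order_2` would be reported as irregular, and under this proposal it would have to be expressed through autoTables with actualDataNodes instead.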
@Yash-cor
Contributor

Yash-cor commented Nov 5, 2024

Hi @strongduanmu, I am willing to work on this issue.

@strongduanmu
Member Author

Hi @Yash-cor, this task is somewhat difficult. Are you familiar with the sharding feature?

@Yash-cor
Contributor

Yash-cor commented Nov 6, 2024

Yes, I am familiar with autoTables and have reviewed the ShardingSphere documentation.
I plan to deepen my understanding of how it works within ShardingSphere and will then start working on this issue.

@strongduanmu
Member Author

@Yash-cor This sounds great. You can organize the details of the code changes first, which will help ensure that you are working in the right direction.

@Yash-cor
Contributor

Yash-cor commented Nov 26, 2024

Hello, sorry for the late response. I want to confirm that we should keep actualDataSources and add the new actualDataNodes alongside it, because if we delete actualDataSources the configuration becomes similar to a normal sharding rule.
