Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix listIndexes bug with double quoted columnName in the index target from indexMetadata #1772

Merged
merged 11 commits into from
Jan 13, 2025

Conversation

Yuqi-Du
Copy link
Contributor

@Yuqi-Du Yuqi-Du commented Dec 2, 2024

What this PR does:
Summary, we got something like this, columnName is doubleQuoted in index target.

  1. The regex does not match it.
  2. We need to manually remove the double quote and then create cqlIndentifier.

Reproduction:

Table

{"createTable": {
  "name": "dreams",
  "definition": {
    "columns": {
      "matchId": "text",
      "round": "tinyint",
      "mVector": { "type": "vector", "dimension": 3 },
      "score": "int",
      "when": "timestamp",
      "winner": "text",
      "fighters": { "type": "set", "valueType": "uuid" }
    },
    "primaryKey": { "partitionBy": [ "matchId" ], "partitionSort": { "round": 1 } }
  },
  "options": { "ifNotExists": true }
}}

Index

{"createVectorIndex": {
  "name": "m_vector_index",
  "definition": {
    "column": "mVector",
    "options": { "sourceModel": "ada002", "metric": "dot_product" }
  }
}}

(unknown-returning) List indexes

{
    "status": {
        "indexes": [
            {
                "name": "abc_AH_123_a_1121",
                "definition": {
                    "column": "UNKNOWN",
                    "apiSupport": {
                        "createIndex": false,
                        "filter": false,
                        "cqlDefinition": "CREATE CUSTOM INDEX \"abc_AH_123_a_1121\" ON filter.testindex (\"V22eector2\")\nUSING 'StorageAttachedIndex'\nWITH OPTIONS = {\n    'similarity_function' : 'DOT_PRODUCT',\n    'source_model' : 'ADA002'}"
                    }
                }
            }
       }
}

Which issue(s) this PR fixes:
Fixes #1770

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@Yuqi-Du Yuqi-Du requested a review from a team as a code owner December 2, 2024 20:21
Copy link
Contributor

@tatu-at-datastax tatu-at-datastax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this looks quite complicated despite small size of diffs.

I am not super happy with the way things are handled but it may be necessary to fix the issue so that's ok.

But what really would be needed would be a test to show that fix works, if at all possible. And will guard against regression (avoid bug from reappearing due to changes)

Yuqi-Du and others added 5 commits December 9, 2024 14:20
…oted-problem

# Conflicts:
#	src/main/java/io/stargate/sgv2/jsonapi/service/schema/tables/ApiIndexType.java
#	src/test/java/io/stargate/sgv2/jsonapi/api/v1/tables/CreateTableIndexIntegrationTest.java
@Yuqi-Du Yuqi-Du merged commit f033b0e into main Jan 13, 2025
3 checks passed
@Yuqi-Du Yuqi-Du deleted the yuqi/fix-listIndex-quoted-problem branch January 13, 2025 19:59
@amorton
Copy link
Contributor

amorton commented Jan 13, 2025

I would not have merged this, it is a very good candidate to write a unit test , so I have re-opened the ticket #1770

Copy link
Contributor

@amorton amorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on this merging, it makes some assumptions about things in the CQL target value that are risks, removes some validation checks , and adds a new enum rather than use the existing ApiIndexFunction

*/
private static Pattern INDEX_TARGET_PATTERN = Pattern.compile("^(\\w+)?(?:\\((\\w+)\\))?$");
private static final Pattern TARGET_REGEX =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is copied from the CQL class, we should add the name of the CQL class

functionName = matcher.group(1);
} else {
columnName = target;
cqlIndexType = CqlIndexType.VALUES;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this, why is the default Values ?
What about what this is an index on a text field ?

columnName = TWO_QUOTES.matcher(columnName).replaceAll(QUOTE);
}

return new IndexTarget(CqlIdentifier.fromInternal(columnName), cqlIndexType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this stop using existing ApiIndexFunction ?

switch (this) {
case KEYS:
return "keys";
case KEYS_AND_VALUES:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this called KEY_AND_VALUES when it is the entries ?


public static CqlIndexType fromString(String s) {
if ("".equals(s)) return SIMPLE;
else if ("values".equals(s)) return VALUES;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a case sensitive match and duplicates the logic in toString

else if ("entries".equals(s)) return KEYS_AND_VALUES;
else if ("full".equals(s)) return FULL;

throw new AssertionError("Unrecognized index target type " + s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be IllegalArgument we have not used AssertionError anywhere in the code

@@ -268,13 +268,6 @@ protected ApiIndexDef create(
.formatted(ApiIndexType.VECTOR, apiIndexType));
}

// also, we must not have an index function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check removed ?

if (indexTarget.indexFunction() == null && apiColumnDef.type().isPrimitive()) {
// If cqlIndexType is values, and the column is a scalar
// then it is a regular index on primitive types
if (indexTarget.cqlIndexType() == CQLSAIIndex.CqlIndexType.VALUES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels risky - we are inferring the VALUES function when it is not defined in the CQL for the index and then using that inferred value. The previous code did not infer a function when there was none defined.

@@ -186,13 +186,6 @@ protected ApiIndexDef create(
.formatted(ApiIndexType.REGULAR, apiIndexType));
}

// also, we should not have an index function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this check removed, there is no function for a regular index

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API Fails to recognise vector index created via the API
3 participants