VECTOR_SIMILARITY second operand must be a float array #12359

Closed
hdulay opened this issue Feb 3, 2024 · 2 comments

@hdulay
Contributor

hdulay commented Feb 3, 2024

I'm testing the VECTOR_SIMILARITY function with the statement below and get the following error: For VECTOR_SIMILARITY predicate, the second operand must be a float array literal

select ProductId, UserId, l2_distance(ARRAY[0.1,0.1,0.3,0.4],ARRAY[0.1,0.1,0.3,0.4]) as l2_dist, n_tokens, combined
from fineFoodReviews
where VECTOR_SIMILARITY(ARRAY[0.1,0.1,0.3,0.4],ARRAY[0.1,0.1,0.3,0.4], 5)
-- order by l2_dist ASC
limit 5
ProcessingException(errorCode:150, message:SQLParsingError:
org.apache.pinot.sql.parsers.SqlCompilationException: For VECTOR_SIMILARITY predicate, the second operand must be a float array literal, got: Expression(type:FUNCTION, functionCall:Function(operator:VECTOR_SIMILARITY, operands:[Expression(type:LITERAL, literal:<Literal doubleArrayValue:[0.1, 0.1, 0.3, 0.4]>), Expression(type:LITERAL, literal:<Literal doubleArrayValue:[0.1, 0.1, 0.3, 0.4]>), Expression(type:LITERAL, literal:<Literal longValue:5>)]))
	at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.updateFunctionExpression(PredicateComparisonRewriter.java:139)
	at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.updatePredicate(PredicateComparisonRewriter.java:65)
	at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.rewrite(PredicateComparisonRewriter.java:40)
	at org.apache.pinot.sql.parsers.CalciteSqlParser.queryRewrite(CalciteSqlParser.java:569))

A workaround is to use a CTE with the multi-stage engine enabled (multistage = true):

with DIST as (
  SELECT 
    ProductId, 
    Summary, 
    Score,
    l2_distance(ARRAY[0.1,0.1,0.3,0.4],ARRAY[0.1,0.1,0.3,0.4]) AS l2_dist
  from fineFoodReviews
)
select * from DIST
where l2_dist < .6
order by l2_dist asc
@hdulay
Contributor Author

hdulay commented Feb 4, 2024

Here is the SQL, executed from Python, that hit this error:

SELECT 
  ProductId, 
  Summary, 
  Score,
  l2_distance(embedding, ARRAY{search_embedding}) AS l2_dist
from fineFoodReviews
where VECTOR_SIMILARITY(embedding, ARRAY{search_embedding}, 5)
order by l2_dist asc
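
For readers reproducing this, the `ARRAY{search_embedding}` placeholder above is presumably filled in by a Python f-string. The following is only a sketch under that assumption: the helper name `build_similarity_query` and its `top_k` parameter are hypothetical and not part of Pinot or of the original script. It shows how a Python list renders into the `ARRAY[...]` literal that the query passes as the second operand.

```python
def build_similarity_query(search_embedding, top_k=5):
    """Build the query text from the issue (hypothetical helper, for illustration only)."""
    # The repr of a Python list of floats, e.g. [0.1, 0.1, 0.3, 0.4],
    # placed right after ARRAY gives the SQL literal ARRAY[0.1, 0.1, 0.3, 0.4].
    array_literal = f"ARRAY{[float(x) for x in search_embedding]}"
    return (
        "SELECT ProductId, Summary, Score,\n"
        f"  l2_distance(embedding, {array_literal}) AS l2_dist\n"
        "FROM fineFoodReviews\n"
        f"WHERE VECTOR_SIMILARITY(embedding, {array_literal}, {int(top_k)})\n"
        "ORDER BY l2_dist ASC"
    )


# Print the rendered SQL so the array literal can be inspected before sending it.
query = build_similarity_query([0.1, 0.1, 0.3, 0.4])
print(query)
```

Printing the rendered text before sending it makes it easy to confirm that the second operand really is an array literal, which is what the error message complains about.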

@xiangfu0
Contributor

xiangfu0 commented Feb 5, 2024

Thanks for reporting!
This is a regression introduced by #12118.
The fix is in #12365.

@xiangfu0 xiangfu0 self-assigned this Feb 5, 2024