[Bug]: #17286

Open
oksmith opened this issue Dec 16, 2024 · 1 comment
Labels
bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

oksmith commented Dec 16, 2024

Bug Description

When using SchemaLLMPathExtractor with strict=True, validation sometimes fails because the triplets are expected to be dictionaries but are in fact strings. This raises a TypeError, while the validation block only catches KeyError and ValueError.

This happens only occasionally on my dataset. In this case, I think the validation function should catch TypeError and skip the triplet (not add it to passing_triplets) rather than raise an error.

i.e.

except (KeyError, ValueError, TypeError):
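
The failure mode can be illustrated without an LLM. The snippet below is a minimal sketch, not the library's code: `normalize_type` is a hypothetical helper that mimics the failing line from the traceback, and both input dictionaries are made-up examples of what the LLM might return.

```python
from typing import Any


def normalize_type(triplet: Any, key: str) -> str:
    """Mimic the failing line: read triplet[key]["type"] and normalize it."""
    return triplet[key]["type"].replace(" ", "_").upper()


# Hypothetical LLM outputs; the second carries a bare string where a dict is expected.
well_formed = {"subject": {"type": "function"}}
malformed = {"subject": "function"}

normalized = normalize_type(well_formed, "subject")

try:
    normalize_type(malformed, "subject")
    error_name = None
except (KeyError, ValueError):
    # This is the only branch the current validation block has.
    error_name = "caught by the existing handler"
except TypeError as exc:
    # Indexing a str with a str key lands here instead.
    error_name = type(exc).__name__
```

Because indexing a string with a string key raises TypeError rather than KeyError, the existing handler never sees it and the exception propagates.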

Version

0.12.5

Steps to Reproduce

I'm using

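from typing import Literal

import kuzu
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.graph_stores.kuzu import KuzuPropertyGraphStore

# gdb_name and documents are defined elsewhere (not shown in this report)
db = kuzu.Database(gdb_name)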

entities = Literal["VARIABLE", "FUNCTION"]
relations = Literal["ASSIGNED_IN", "CALLS", "USES", "RETURNS"]

# Define the relationship schema that we will pass to our graph store
# This must be a list of valid triples in the form (head_entity, relation, tail_entity)
validation_schema = [
    # Variable assignment
    ("VARIABLE", "ASSIGNED_IN", "VARIABLE"),
    # Function calls
    ("FUNCTION", "CALLS", "FUNCTION"),
    # Function or variable usage relationships
    ("FUNCTION", "USES", "VARIABLE"),
    ("VARIABLE", "USES", "FUNCTION"),
    # Return relationships
    ("FUNCTION", "RETURNS", "VARIABLE"),
]

# define the graph store
graph_store = KuzuPropertyGraphStore(
    db,
    has_structured_schema=True,
    relationship_schema=validation_schema,
)

# now create the graph index
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=llm,
            possible_entities=entities,
            possible_relations=relations,
            kg_validation_schema=validation_schema,
            strict=True,  # if false, will allow triples outside of the schema
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)

with

from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    tokenizer_name="meta-llama/Llama-3.2-1B",
    model_name="meta-llama/Llama-3.2-1B",
)

and

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding("sentence-transformers/all-MiniLM-L6-v2")

Relevant Logs/Tracebacks

File "{path}/venv/lib/python3.11/site-packages/llama_index/core/indices/property_graph/transformations/schema_llm.py", line 197, in validate
    triplet[key]["type"] = triplet[key]["type"].replace(
                           ^^^^^^^^^^^^^^^^^^^^
TypeError: string indices must be integers, not 'str'

oksmith added the bug and triage labels Dec 16, 2024

dosubot bot commented Dec 16, 2024

Hey there, @oksmith! I'm here to help you with any issues or questions you have. Let's tackle this bug together!

To address the issue you're encountering with the SchemaLLMPathExtractor when strict=True, you can modify the exception handling in the validate method to include TypeError. This will allow the method to skip triplets that are strings instead of dictionaries, preventing the TypeError from being raised. You can update the exception handling block as follows:

def validate(v: Any) -> Any:
    """Validate triplets."""
    passing_triplets = []
    for i, triplet in enumerate(v):
        try:
            for key in triplet:
                triplet[key]["type"] = triplet[key]["type"].replace(
                    " ", "_"
                )
                triplet[key]["type"] = triplet[key]["type"].upper()

            # validate, skip if invalid
            _ = triplet_cls(**triplet)
            passing_triplets.append(v[i])
        except (KeyError, ValueError, TypeError):
            continue

    return passing_triplets

By adding TypeError to the exception block, the method will skip any triplets that are not in the expected dictionary format, thus preventing the error and ensuring only valid triplets are added to passing_triplets [1].
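
To check the skipping behaviour in isolation, here is a self-contained sketch of the same loop. It rests on assumptions: `check_triplet` and `ALLOWED_TYPES` are hypothetical stand-ins for the schema-derived triplet class that the real validator closes over, and the sample inputs are invented.

```python
from typing import Any, List

# Hypothetical set of allowed type names, standing in for the real schema.
ALLOWED_TYPES = {"VARIABLE", "FUNCTION", "CALLS", "USES"}


def check_triplet(**fields: Any) -> None:
    """Stand-in for the triplet model: raise on missing fields or unknown types."""
    for name in ("subject", "relation", "object"):
        if fields[name]["type"] not in ALLOWED_TYPES:  # KeyError if a field is missing
            raise ValueError(f"unknown type in {name}")


def validate(v: List[Any]) -> List[Any]:
    """Keep only triplets that normalize and validate; skip everything else."""
    passing_triplets = []
    for triplet in v:
        try:
            for key in triplet:
                # Fails with TypeError when triplet or triplet[key] is a str.
                triplet[key]["type"] = triplet[key]["type"].replace(" ", "_").upper()
            check_triplet(**triplet)
            passing_triplets.append(triplet)
        except (KeyError, ValueError, TypeError):
            continue
    return passing_triplets


good = {
    "subject": {"type": "function"},
    "relation": {"type": "calls"},
    "object": {"type": "function"},
}
string_value = {"subject": "FUNCTION", "relation": {"type": "calls"}, "object": {"type": "function"}}
bare_string = "FUNCTION CALLS FUNCTION"
missing_field = {"subject": {"type": "function"}}

result = validate([good, string_value, bare_string, missing_field])
```

With these inputs only the first triplet survives. The two string-shaped entries are dropped through the TypeError branch, and the incomplete one through KeyError.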
