[Bug]: #17286

oksmith · 2024-12-16T21:47:54Z

Bug Description

When using SchemaLLMPathExtractor with strict=True, validation sometimes fails because the triplets are expected to be dictionaries but some of them are in fact strings. This raises a TypeError, but the except clause in this block only catches KeyError and ValueError.

This only happens occasionally on my dataset. I think that in this case the validation function should catch TypeError and skip the triplet (not add it to passing_triplets) rather than raising an error.

That is:

except (KeyError, ValueError, TypeError):

Version

0.12.5

Steps to Reproduce

I'm using

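import kuzu
from typing import Literal

from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.graph_stores.kuzu import KuzuPropertyGraphStore

# gdb_name and documents are defined elsewhere; llm and embed_model are defined below
db = kuzu.Database(gdb_name)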

entities = Literal["VARIABLE", "FUNCTION"]
relations = Literal["ASSIGNED_IN", "CALLS", "USES", "RETURNS"]

# Define the relationship schema that we will pass to our graph store
# This must be a list of valid triples in the form (head_entity, relation, tail_entity)
validation_schema = [
    # Variable assignment
    ("VARIABLE", "ASSIGNED_IN", "VARIABLE"),
    # Function calls
    ("FUNCTION", "CALLS", "FUNCTION"),
    # Function or variable usage relationships
    ("FUNCTION", "USES", "VARIABLE"),
    ("VARIABLE", "USES", "FUNCTION"),
    # Return relationships
    ("FUNCTION", "RETURNS", "VARIABLE"),
]

# define the graph store
graph_store = KuzuPropertyGraphStore(
    db,
    has_structured_schema=True,
    relationship_schema=validation_schema,
)

# now create the graph index
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=llm,
            possible_entities=entities,
            possible_relations=relations,
            kg_validation_schema=validation_schema,
            strict=True,  # if false, will allow triples outside of the schema
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)

with

from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    tokenizer_name="meta-llama/Llama-3.2-1B",
    model_name="meta-llama/Llama-3.2-1B",
)

and

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding("sentence-transformers/all-MiniLM-L6-v2")
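Conceptually, strict=True means a typed triple is kept only if it matches one of the entries in validation_schema. The sketch below illustrates that idea with plain set membership. ALLOWED and is_allowed are hypothetical names, and this is not necessarily how SchemaLLMPathExtractor enforces the schema internally.

```python
# Illustrative stand-in for strict schema filtering (not llama_index's code):
# a (head_type, relation, tail_type) triple passes only if it is listed.
validation_schema = [
    ("VARIABLE", "ASSIGNED_IN", "VARIABLE"),
    ("FUNCTION", "CALLS", "FUNCTION"),
    ("FUNCTION", "USES", "VARIABLE"),
    ("VARIABLE", "USES", "FUNCTION"),
    ("FUNCTION", "RETURNS", "VARIABLE"),
]

# Hypothetical: a set gives O(1) membership checks.
ALLOWED = set(validation_schema)


def is_allowed(head_type: str, relation: str, tail_type: str) -> bool:
    """Return True if this typed triple is permitted by the schema."""
    return (head_type, relation, tail_type) in ALLOWED


print(is_allowed("FUNCTION", "CALLS", "FUNCTION"))  # True
print(is_allowed("VARIABLE", "CALLS", "FUNCTION"))  # False
```

Note that the schema is directional: ("VARIABLE", "USES", "FUNCTION") and ("FUNCTION", "USES", "VARIABLE") are separate entries, so each direction must be listed explicitly.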

Relevant Logs/Tracebacks

File "{path}/venv/lib/python3.11/site-packages/llama_index/core/indices/property_graph/transformations/schema_llm.py", line 197, in validate
    triplet[key]["type"] = triplet[key]["type"].replace(
                           ^^^^^^^^^^^^^^^^^^^^
TypeError: string indices must be integers, not 'str'
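The error can be reproduced without llama_index. Iterating over a str yields its characters, so when a triplet arrives as a string, key is a single character and triplet[key] indexes a string with a string, which raises this TypeError. The helper normalize_types, the key names subject, relation and object, and the sample triplets are assumptions for illustration; the loop body only mirrors the normalization step shown in the traceback.

```python
def normalize_types(triplet):
    """Replace spaces with underscores and upper-case each "type" value, in place."""
    for key in triplet:
        triplet[key]["type"] = triplet[key]["type"].replace(" ", "_").upper()
    return triplet


# Hypothetical well-formed triplet (key names assumed for illustration).
good_triplet = {
    "subject": {"type": "variable", "name": "x"},
    "relation": {"type": "assigned in"},
    "object": {"type": "variable", "name": "y"},
}
print(normalize_types(good_triplet)["relation"]["type"])  # ASSIGNED_IN

# Hypothetical malformed triplet: a plain string instead of a dict.
bad_triplet = "x ASSIGNED_IN y"
try:
    normalize_types(bad_triplet)
except TypeError as exc:
    # The first key is the character "x", and a str cannot be indexed by a str.
    print("TypeError:", exc)
```

The exact message text varies by Python version (3.11 adds the "not 'str'" suffix seen in the traceback), but the exception type is the same.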

dosubot · 2024-12-16T21:49:05Z

Hey there, @oksmith! I'm here to help you with any issues or questions you have. Let's tackle this bug together!

To address the issue you're encountering with the SchemaLLMPathExtractor when strict=True, you can modify the exception handling in the validate method to include TypeError. This will allow the method to skip triplets that are strings instead of dictionaries, preventing the TypeError from being raised. You can update the exception handling block as follows:

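from typing import Any

# Defined inside SchemaLLMPathExtractor; triplet_cls comes from the enclosing
# scope and is the model that describes a valid triplet.
def validate(v: Any) -> Any:
    """Validate triplets."""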
    passing_triplets = []
    for i, triplet in enumerate(v):
        try:
            for key in triplet:
                triplet[key]["type"] = triplet[key]["type"].replace(
                    " ", "_"
                )
                triplet[key]["type"] = triplet[key]["type"].upper()

            # validate, skip if invalid
            _ = triplet_cls(**triplet)
            passing_triplets.append(v[i])
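        except (KeyError, ValueError, TypeError):
            # KeyError: missing key; ValueError: schema violation;
            # TypeError: the triplet is not a dict (e.g. a plain string)
            continue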

    return passing_triplets

By adding TypeError to the exception block, the method will skip any triplets that are not in the expected dictionary format, thus preventing the error and ensuring only valid triplets are added to passing_triplets ^[1].
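To check the patched validation loop in isolation, the sketch below wraps the same logic in a factory and supplies a stand-in for triplet_cls. make_validator, stub_triplet_cls, ALLOWED_TYPES and the sample data are all hypothetical; the real triplet_cls is a model generated inside SchemaLLMPathExtractor. The stub raises ValueError on an unknown type, mirroring the fact that pydantic's ValidationError is a subclass of ValueError.

```python
from typing import Any, Callable, List


def make_validator(triplet_cls: Callable[..., Any]) -> Callable[[Any], List[Any]]:
    """Build a validate() function with the widened except clause."""

    def validate(v: Any) -> List[Any]:
        passing_triplets = []
        for triplet in v:
            try:
                for key in triplet:
                    triplet[key]["type"] = triplet[key]["type"].replace(" ", "_").upper()
                triplet_cls(**triplet)  # stand-in raises ValueError on bad types
                passing_triplets.append(triplet)
            except (KeyError, ValueError, TypeError):
                continue  # skip malformed items instead of raising
        return passing_triplets

    return validate


# Hypothetical set of accepted entity and relation types.
ALLOWED_TYPES = {"VARIABLE", "FUNCTION", "ASSIGNED_IN", "CALLS", "USES", "RETURNS"}


def stub_triplet_cls(**fields: Any) -> dict:
    """Hypothetical stand-in for the generated triplet model."""
    for value in fields.values():
        if value["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unexpected type {value['type']!r}")
    return fields


validate = make_validator(stub_triplet_cls)
triplets = [
    {"subject": {"type": "function"}, "relation": {"type": "calls"}, "object": {"type": "function"}},
    "f CALLS g",  # plain string: TypeError, skipped
    {"subject": {"type": "class"}, "relation": {"type": "calls"}, "object": {"type": "function"}},  # ValueError
    {"subject": {"name": "f"}},  # no "type" key: KeyError, skipped
]
print(len(validate(triplets)))  # 1
```

Each malformed input is skipped by a different member of the except tuple, so only the first triplet survives. One trade-off of catching TypeError around the whole block is that it would also swallow an unrelated TypeError raised inside it; an explicit isinstance(triplet, dict) check before the loop is a narrower alternative.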

oksmith added the labels bug (Something isn't working) and triage (Issue needs to be triaged/prioritized) on Dec 16, 2024.
