Enhance Regression Testing and Implement Eval Class for Robust Agent Evaluation
#10
Labels: good first issue
I wanted to bring up the subject of our regression testing and the potential for a new, more comprehensive approach to evaluating our agents' performance. Currently, our regression testing is limited mainly to search functionality, which leaves a lot of our codebase under-tested. A robust and comprehensive testing framework is critical to ensuring code quality, catching bugs early, and facilitating smooth code integration. It's also a key contributor to long-term code maintainability.
To this end, I propose two main initiatives:
1. Expand our regression testing: By enhancing our suite of regression tests, we can ensure that changes in one part of the code don't break something elsewhere. This will help us maintain system integrity and minimize the risks associated with ongoing development. (A sketch of what an added test might look like follows this list.)
2. Introduce an evaluation suite with an Eval class: In addition to expanded regression testing, we should implement a comprehensive evaluation suite featuring an Eval class. This class will let us evaluate how well our agents are performing by comparing their actions with expected outcomes.
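As a starting point for (1), a new regression test might look something like the minimal pytest sketch below. The `agent.search` import path and the `search` helper are assumptions for illustration, not the repository's actual API.

```python
# Minimal sketch of an added regression test (pytest assumed).
# `agent.search.search` is a hypothetical import path used for illustration.
import pytest

from agent.search import search


@pytest.mark.parametrize(
    "query, expected_substring",
    [
        ("open issues", "issue"),
        ("archived repositories", "archived"),
    ],
)
def test_search_returns_relevant_results(query, expected_substring):
    # A regression here would surface as an empty or irrelevant result set.
    results = search(query)
    assert results, f"no results returned for {query!r}"
    assert any(expected_substring in r.lower() for r in results)
```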
Here's a rough skeleton of what the Eval class could look like:
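This is only a hypothetical sketch: the field and method names (`run`, `pass_rate`, and the agent's `execute` method) are assumptions for illustration, not an existing interface.

```python
# Hypothetical sketch of the proposed Eval class. All names here are
# illustrative assumptions, not an existing API in this repository.
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    instruction: str
    expected_actions: list[str]
    actual_actions: list[str]
    passed: bool


@dataclass
class Eval:
    """Runs an agent over a set of instructions and compares the actions
    it takes with the expected actions for each instruction."""

    instructions: list[str]
    expected_actions: list[list[str]]
    results: list[EvalResult] = field(default_factory=list)

    def run(self, agent) -> list[EvalResult]:
        # Assumes the agent exposes an execute(instruction) method that
        # returns the list of actions it took.
        self.results = []
        for instruction, expected in zip(self.instructions, self.expected_actions):
            actual = agent.execute(instruction)
            self.results.append(
                EvalResult(
                    instruction=instruction,
                    expected_actions=expected,
                    actual_actions=actual,
                    passed=actual == expected,
                )
            )
        return self.results

    def pass_rate(self) -> float:
        # Fraction of cases whose actions matched; this is the baseline metric.
        if not self.results:
            return 0.0
        return sum(r.passed for r in self.results) / len(self.results)
```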
Logic for the old Eval implementation can be seen here.
The Eval class's primary purpose is to take a set of instructions and their corresponding expected actions, and evaluate whether executing each instruction produces the anticipated actions. Note that a good evaluation suite won't necessarily pass at 100%; rather, it gives us a performance baseline and a clear target to strive for.
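To make that concrete, here is a hypothetical usage of the sketch above with a stub agent; the 50% pass rate in this toy example is exactly the kind of baseline we would then work to improve.

```python
# Hypothetical usage of the Eval sketch above with a stub agent. A real
# agent would actually execute the instruction and return the actions taken.
class StubAgent:
    def execute(self, instruction: str) -> list[str]:
        return ["click:settings"] if "settings" in instruction else []


suite = Eval(
    instructions=["open the settings page", "search for 'regression'"],
    expected_actions=[["click:settings"], ["type:regression", "press:enter"]],
)
suite.run(StubAgent())
print(f"baseline pass rate: {suite.pass_rate():.0%}")  # 50% here: a baseline, not a failure
```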
As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!