\n",
- " 0 | Tool Usage - Alpha | e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5 | Environment with fake data about users and their locations and favorite foods.\n",
+ " 0 | Tool Usage - Relational Data | e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5 | Environment with fake data about users and their locations and favorite foods.\n",
"\n",
"The environment provides a set of tools that can be used to query the data.\n",
"\n",
- "The object is to evaluate the ability of an agent to use the tools\n",
- "to answer questions about the data.\n",
+ "The objective of this task is to evaluate the ability to use the provided tools to answer questions about relational data.\n",
"\n",
- "The dataset contains 21 examples of varying difficulty. The difficulty is measured\n",
- "by the number of tools that need to be used to answer the question.\n",
+ "The dataset contains 21 examples of varying difficulty. The difficulty is measured by the number of tools that need to be used to answer the question.\n",
"\n",
- "Each example is composed of a question, a reference answer, and\n",
- "information about the sequence in which tools should be used to answer\n",
- "the question.\n",
+ "Each example is composed of a question, a reference answer, and information about the sequence in which tools should be used to answer the question.\n",
"\n",
"Success is measured by the ability to answer the question correctly, and efficiently. |
\n",
+ " 1 | Tool Usage - Typewriter (1 func) | placeholder | Environment with a single function that accepts a single letter as input, and "prints" it on a piece of paper.\n",
+ "\n",
+ "The objective of this task is to evaluate the ability to use the provided tools to repeat a given input string.\n",
+ "\n",
+ "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n",
+ "\n",
+ "The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string. |
\n",
+ " 2 | Tool Usage - Typewriter | placeholder | Environment with 26 functions each representing a letter of the alphabet.\n",
+ "\n",
+ "In this variation of the typewriter task, there are 26 parameterless functions, where each function represents a letter of the alphabet (instead of a single function that takes a letter as an argument).\n",
+ "\n",
+ "The objective is to evaluate the ability to use the functions to repeat the given string.\n",
+ "\n",
+ "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n",
+ "\n",
+ "The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string. |
\n",
+ " 3 | Multiverse Math | placeholder | An environment that contains a few basic math operations, but with altered results.\n",
+ "\n",
+ "For example, multiplication of 5*3 will be re-interpreted as 5*3*1.1. The basic operations retain some basic properties, such as commutativity, associativity, and distributivity; however, the results are different than expected.\n",
+ "\n",
+ "The objective of this task is to evaluate the ability to use the provided tools to solve simple math questions and ignore any innate knowledge about math. |
\n",
"\n",
"