Agents

This part collects work on LLMs as agents: brain, perception, and tools.

Table of Contents

Reading List
Datasets & Benchmarks
Projects
Applications

Reading List

Technical Report

(2024-12) Genie 2: A large-scale foundation world model, Genie2

Survey

(2023-09) The Rise and Potential of Large Language Model Based Agents: A Survey paper
(2023-08) Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies paper
(2023-06) LLM Powered Autonomous Agents blog
(2023-04) Tool Learning with Foundation Models paper, BMTools
(2023-02) Augmented Language Models: a Survey paper

Reading List

`Correct` marks work that focuses on correcting augmented language models (ALMs).

Paper	LLM	Code	Publication	Preprint	Affiliation
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning	LLaMA	MLLM-Tool		2401.10727	ShanghaiTech
LLM Augmented LLMs: Expanding Capabilities through Composition	PaLM2	CALM (non-official)		2401.02412	Google
GPT-4V(ision) is a Generalist Web Agent, if Grounded	GPT-4	SeeAct		2312.github	OSU
AppAgent: Multimodal Agents as Smartphone Users	GPT-4	AppAgent		2312.13771	Tencent
An LLM Compiler for Parallel Function Calling	GPT, LLaMA2	LLMCompiler		2312.04511	UCB
ProAgent: From Robotic Process Automation to Agentic Process Automation	GPT-4	ProAgent		2311.10751	THU
ControlLLM: Augment Language Models with Tools by Searching on Graphs	ChatGPT, LLaMA	ControlLLM		2310.17796	Shanghai AI Lab
AgentTuning: Enabling Generalized Agent Abilities For LLMs	LLaMA2	AgentTuning		2310.12823	THU
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation		AutoGen		2308.08155	Microsoft
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn	ChatGPT	AssistGPT		2306.08640	NUS
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases	Alpaca	ToolAlpaca		2306.05301	CAS
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction	Vicuna-13B	GPT4Tools		2305.18752	Tencent
AdaPlanner: Adaptive Planning from Feedback with Language Models	GPT3/3.5	AdaPlanner		2305.16653	Georgia Tech
`Correct` CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing	GPT3/3.5	CRITIC		2305.11738	Microsoft
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings	LLaMA	ToolkenGPT	NeurIPS 2023	2305.11554	UCSD
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency	GPT-4	LLM-PDDL		2304.11477	UT Austin
Can GPT-4 Perform Neural Architecture Search?	GPT-4	GENIUS		2304.10970	Cambridge
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models	GPT-4	Chameleon		2304.09842	Microsoft
OpenAGI: When LLM Meets Domain Experts	ChatGPT	OpenAGI		2304.04370	Rutgers Univ.
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models	Opensource LLM	LLM-Adapters		2304.01933	SMU
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace	ChatGPT	JARVIS		2303.17580	Microsoft
Language Models can Solve Computer Tasks	ChatGPT, GPT3, etc	RCI Agent		2303.17491	CMU
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs	ChatGPT	TaskMatrix		2303.16434	Microsoft
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action	ChatGPT	MM-REACT		2303.11381	Microsoft
ART: Automatic multi-step reasoning and tool-use for large language models	GPT3, Codex	Language-Programmes		2303.09014	Microsoft
Foundation Models for Decision Making: Problems, Methods, and Opportunities	-	-		2303.04129	Google
`Correct` Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback	ChatGPT	LLM-Augmenter		2302.12813	Microsoft
Toolformer: Language Models Can Teach Themselves to Use Tools	GPT-J, OPT, GPT3	Toolformer (Unofficial)		2302.04761	Meta
Visual Programming: Compositional visual reasoning without training	GPT3	VisProg	CVPR 2023	2211.11559	AI2
Decomposed Prompting: A Modular Approach for Solving Complex Tasks	GPT3	DecomP	ICLR 2023	2210.02406	AI2
TALM: Tool Augmented Language Models	T5			2205.12255	Google

Datasets & Benchmarks

Datasets

(2023-07) ToolBench, This project (ToolLLM) aims to construct open-source, large-scale, high-quality instruction-tuning (SFT) data to facilitate building powerful LLMs with general tool-use capability.

Benchmarks

(2024-01) R-Judge, Benchmarking Safety Risk Awareness for LLM Agents
(2024-01) RoTBench, RoTBench consists of five external environments, each featuring a different level of noise (Clean, Slight, Medium, Heavy, and Union), and provides an in-depth analysis of a model's resilience across three critical phases: tool selection, parameter identification, and content filling.
(2024-01) ToolEyes, ToolEyes examines seven real-world scenarios and analyzes five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization.
(2024-01) AgentBoard, An Analytical Evaluation Board of Multi-turn LLM Agents
(2023-10) PCA-EVAL, PCA-EVAL is a benchmark for evaluating multi-domain embodied decision-making, focusing on performance in perception, cognition, and action.
(2023-08) AgentBench, AgentBench is the first benchmark designed to evaluate LLM-as-Agent across a diverse spectrum of environments. It encompasses 8 distinct environments to provide a more comprehensive evaluation of LLMs' ability to operate as autonomous agents in various scenarios.

Projects

(2023-12) KwaiAgents, A generalized information-seeking agent system with Large Language Models (LLMs).
(2023-11) CrewAI, Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
(2023-11) WarAgent, Large Language Model-based Multi-Agent Simulation of World Wars
(2023-08) MetaGPT, The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo
(2023-08) AI-Town, AI Town is a virtual town where AI characters live, chat and socialize
(2023-08) XLang, An open-source framework for building and evaluating language model agents via executable language grounding
- homepage
(2023-06) Gentopia, Gentopia is a lightweight and extensible framework for LLM-driven Agents and ALM research
- paper
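(2023-05) Transformers Agent, Transformers Agent is an experimental API which is subject to change at any time
(2023-05) 闻达 (Wenda), An LLM invocation platform. It targets efficient content generation for specific environments while taking into account the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy concerns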
(2023-04) AgentGPT, AgentGPT allows you to configure and deploy Autonomous AI agents
- demo
(2023-04) Auto-GPT, An experimental open-source attempt to make GPT-4 fully autonomous
- demo
(2023-04) BabyAGI, The system uses OpenAI and vector databases such as Chroma or Weaviate to create, prioritize, and execute tasks
- blog
(2022-10) LangChain, LangChain is a framework for developing applications powered by language models. It enables applications that: (i) Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.) (ii) Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
- Doc

Applications

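(2023-11) Awesome-AI-GPTs, Welcome to the EmbraceAGI GPTs open-source directory. This project collects resources and interesting use cases for OpenAI GPTs, so that we can grow stronger together through AI.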
(2023-04) Awesome-ChatGPT, An awe-inspiring collection of resources, encompassing a wide range of tools, documents, resources, applications, and use cases related to ChatGPT.
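(2023-04) 众评AI, A global ranking of AI websites that showcases 1,800+ of the top sites in the field of artificial intelligence; the ranking is updated daily
(2023-03) ChatPaper, Use ChatGPT to summarize arXiv papers. Accelerates the whole research workflow by using ChatGPT for paper summarization, polishing, reviewing, and review responses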
- website
- Related: ChatReviewer
(2023-03) BibiGPT, 1-Click AI Summary for Audio/Video & Chat with Learning Content: Bilibili | YouTube | Tweet | Xiaohongshu | TikTok (Douyin) | Local files | Websites | Podcasts | Meetings | Lectures, etc. (formerly BiliGPT)
- website
