This part collects work on LLMs as agents: brain, perception, and tools.
Technical Report
- (2024-12) Genie 2: A large-scale foundation world model, Genie2
Survey
- (2023-09) The Rise and Potential of Large Language Model Based Agents: A Survey paper
- (2023-08) Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies paper
- (2023-06) LLM Powered Autonomous Agents blog
- (2023-04) Tool Learning with Foundation Models paper, BMTools
- (2023-02) Augmented Language Models: a Survey paper
Reading List
[Correct] marks work that focuses on the correction of augmented language models (ALMs).
Paper | LLM | Code | Publication | Preprint | Affiliation |
---|---|---|---|---|---|
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | LLaMA | MLLM-Tool | | 2401.10727 | ShanghaiTech |
LLM Augmented LLMs: Expanding Capabilities through Composition | PaLM2 | CALM (non-official) | | 2401.02412 | |
GPT-4V(ision) is a Generalist Web Agent, if Grounded | GPT-4 | SeeAct | | 2312.github | OSU |
AppAgent: Multimodal Agents as Smartphone Users | GPT4 | AppAgent | | 2312.13771 | Tencent |
An LLM Compiler for Parallel Function Calling | GPT, LLaMA2 | LLMCompiler | | 2312.04511 | UCB |
ProAgent: From Robotic Process Automation to Agentic Process Automation | GPT-4 | ProAgent | | 2311.10751 | THU |
ControlLLM: Augment Language Models with Tools by Searching on Graphs | ChatGPT, LLaMA | ControlLLM | | 2310.17796 | Shanghai AI Lab |
AgentTuning: Enabling Generalized Agent Abilities For LLMs | LLaMA2 | AgentTuning | | 2310.12823 | THU |
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | | AutoGen | | 2308.08155 | Microsoft |
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn | ChatGPT | AssistGPT | | 2306.08640 | NUS |
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases | Alpaca | ToolAlpaca | | 2306.05301 | CAS |
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | Vicuna-13B | GPT4Tools | | 2305.18752 | Tencent |
AdaPlanner: Adaptive Planning from Feedback with Language Models | GPT3/3.5 | AdaPlanner | | 2305.16653 | Gatech |
[Correct] CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing | GPT3/3.5 | CRITIC | | 2305.11738 | Microsoft |
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | LLaMA | ToolkenGPT | NeurIPS 2023 | 2305.11554 | UCSD |
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | GPT4 | LLM-PDDL | | 2304.11477 | UTEXAS |
Can GPT-4 Perform Neural Architecture Search? | GPT4 | GENIUS | | 2304.10970 | Cambridge |
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | GPT4 | Chameleon | | 2304.09842 | Microsoft |
OpenAGI: When LLM Meets Domain Experts | ChatGPT | OpenAGI | | 2304.04370 | Rutgers Univ. |
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models | Open-source LLM | LLM-Adapters | | 2304.01933 | SMU |
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace | ChatGPT | JARVIS | | 2303.17580 | Microsoft |
Language Models can Solve Computer Tasks | ChatGPT, GPT3, etc. | RCI Agent | | 2303.17491 | CMU |
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs | ChatGPT | TaskMatrix | | 2303.16434 | Microsoft |
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | ChatGPT | MM-REACT | | 2303.11381 | Microsoft |
ART: Automatic multi-step reasoning and tool-use for large language models | GPT3, Codex | Language-Programmes | | 2303.09014 | Microsoft |
Foundation Models for Decision Making: Problems, Methods, and Opportunities | - | - | | 2303.04129 | |
[Correct] Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback | ChatGPT | LLM-Augmenter | | 2302.12813 | Microsoft |
Toolformer: Language Models Can Teach Themselves to Use Tools | GPT-J, OPT, GPT3 | Toolformer (Unofficial) | | 2302.04761 | Meta |
Visual Programming: Compositional visual reasoning without training | GPT3 | VisProg | CVPR 2023 | 2211.11559 | AI2 |
Decomposed Prompting: A Modular Approach for Solving Complex Tasks | GPT3 | DecomP | ICLR 2023 | 2210.02406 | AI2 |
TALM: Tool Augmented Language Models | T5 | | | 2205.12255 | |
Datasets
- (2023-07) ToolBench, This project (ToolLLM) aims to construct open-source, large-scale, high-quality instruction tuning SFT data to facilitate the construction of powerful LLMs with general tool-use capability.
Benchmarks
- (2024-01) R-Judge, R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
- (2024-01) RoTBench, RoTBench consists of five external environments, each featuring a different level of noise (i.e., Clean, Slight, Medium, Heavy, and Union), providing an in-depth analysis of the model's resilience across three critical phases: tool selection, parameter identification, and content filling.
- (2024-01) ToolEyes, The system examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization.
- (2024-01) AgentBoard, An Analytical Evaluation Board of Multi-turn LLM Agents
- (2023-10) PCA-EVAL, PCA-EVAL is a benchmark for evaluating multi-domain embodied decision-making, specifically focusing on performance in perception, cognition, and action.
- (2023-08) AgentBench, AgentBench is the first benchmark designed to evaluate LLM-as-Agent across a diverse spectrum of environments. It encompasses 8 distinct environments to provide a more comprehensive evaluation of the LLMs' ability to operate as autonomous agents in various scenarios.
Projects
- (2023-12) KwaiAgents, A generalized information-seeking agent system with Large Language Models (LLMs).
- (2023-11) CrewAI, Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
- (2023-11) WarAgent, Large Language Model-based Multi-Agent Simulation of World Wars
- (2023-08) MetaGPT, The Multi-Agent Framework: given a one-line requirement, it returns a PRD, design, tasks, and repo
- (2023-08) AI-Town, AI Town is a virtual town where AI characters live, chat and socialize
- (2023-08) XLang, An open-source framework for building and evaluating language model agents via executable language grounding
- (2023-06) Gentopia, Gentopia is a lightweight and extensible framework for LLM-driven Agents and ALM research
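- (2023-05) Transformers Agent, Transformers Agent is an experimental API which is subject to change at any time
- (2023-05) 闻达 (Wenda), An LLM invocation platform. It targets efficient content generation for specific environments, while taking into account the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy concerns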
- (2023-04) AgentGPT, AgentGPT allows you to configure and deploy Autonomous AI agents
- (2023-04) Auto-GPT, An experimental open-source attempt to make GPT-4 fully autonomous
- (2023-04) BabyAGI, The system uses OpenAI and vector databases such as Chroma or Weaviate to create, prioritize, and execute tasks
- (2022-10) LangChain, LangChain is a framework for developing applications powered by language models. It enables applications that: (i) Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.) (ii) Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
- (2023-11) Awesome-AI-GPTs, Welcome to the EmbraceAGI GPTs open-source directory. This project collects resources and fun use cases for OpenAI GPTs; let's grow stronger together through AI
- (2023-04) Awesome-ChatGPT, An awe-inspiring collection of resources, encompassing a wide range of tools, documents, resources, applications, and use cases related to ChatGPT.
- (2023-04) 众评AI (Zhongping AI), A global AI website ranking that showcases the top 1800+ websites in the artificial intelligence field; the ranking is updated daily
- (2023-03) ChatPaper, Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow by using ChatGPT for paper summarization, polishing, reviewing, and review responses
  - website
  - Related: ChatReviewer
- (2023-03) BibiGPT, 1-click AI summary for audio/video and chat with learning content: Bilibili, YouTube, Twitter, Xiaohongshu, Douyin (TikTok), websites, podcasts, meetings, lectures, local files, etc. (formerly BiliGPT, "time-saving tool & class representative")