
Agents

This section collects work on LLMs as agents, covering the brain, perception, and tools.


Reading List

Technical Report

  • (2024-12) Genie 2: A large-scale foundation world model, Genie2

Survey

  • (2023-09) The Rise and Potential of Large Language Model Based Agents: A Survey, paper
  • (2023-08) Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies, paper
  • (2023-06) LLM Powered Autonomous Agents, blog
  • (2023-04) Tool Learning with Foundation Models, paper, BMTools
  • (2023-02) Augmented Language Models: a Survey, paper

Reading List

Entries marked "Correct" focus on the correction of augmented language models (ALMs).

| Paper | LLM | Code | Publication | Preprint | Affiliation |
|---|---|---|---|---|---|
| MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | LLaMA | MLLM-Tool | | 2401.10727 | ShanghaiTech |
| LLM Augmented LLMs: Expanding Capabilities through Composition | PaLM 2 | CALM (non-official) | | 2401.02412 | Google |
| GPT-4V(ision) is a Generalist Web Agent, if Grounded | GPT-4 | SeeAct | | 2312.github | OSU |
| AppAgent: Multimodal Agents as Smartphone Users | GPT-4 | AppAgent | | 2312.13771 | Tencent |
| An LLM Compiler for Parallel Function Calling | GPT, LLaMA2 | LLMCompiler | | 2312.04511 | UCB |
| ProAgent: From Robotic Process Automation to Agentic Process Automation | GPT-4 | ProAgent | | 2311.10751 | THU |
| ControlLLM: Augment Language Models with Tools by Searching on Graphs | ChatGPT, LLaMA | ControlLLM | | 2310.17796 | Shanghai AI Lab |
| AgentTuning: Enabling Generalized Agent Abilities For LLMs | LLaMA2 | AgentTuning | | 2310.12823 | THU |
| AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | | AutoGen | | 2308.08155 | Microsoft |
| AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn | ChatGPT | AssistGPT | | 2306.08640 | NUS |
| ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases | Alpaca | ToolAlpaca | | 2306.05301 | CAS |
| GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | Vicuna-13B | GPT4Tools | | 2305.18752 | Tencent |
| AdaPlanner: Adaptive Planning from Feedback with Language Models | GPT-3/3.5 | AdaPlanner | | 2305.16653 | Gatech |
| (Correct) CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing | GPT-3/3.5 | CRITIC | | 2305.11738 | Microsoft |
| ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | LLaMA | ToolkenGPT | NeurIPS 2023 | 2305.11554 | UCSD |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | GPT-4 | LLM-PDDL | | 2304.11477 | UTEXAS |
| Can GPT-4 Perform Neural Architecture Search? | GPT-4 | GENIUS | | 2304.10970 | Cambridge |
| Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | GPT-4 | Chameleon | | 2304.09842 | Microsoft |
| OpenAGI: When LLM Meets Domain Experts | ChatGPT | OpenAGI | | 2304.04370 | Rutgers Univ. |
| LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models | Open-source LLMs | LLM-Adapters | | 2304.01933 | SMU |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace | ChatGPT | JARVIS | | 2303.17580 | Microsoft |
| Language Models can Solve Computer Tasks | ChatGPT, GPT-3, etc. | RCI Agent | | 2303.17491 | CMU |
| TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs | ChatGPT | TaskMatrix | | 2303.16434 | Microsoft |
| MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | ChatGPT | MM-REACT | | 2303.11381 | Microsoft |
| ART: Automatic multi-step reasoning and tool-use for large language models | GPT-3, Codex | Language-Programmes | | 2303.09014 | Microsoft |
| Foundation Models for Decision Making: Problems, Methods, and Opportunities | - | - | | 2303.04129 | Google |
| (Correct) Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback | ChatGPT | LLM-Augmenter | | 2302.12813 | Microsoft |
| Toolformer: Language Models Can Teach Themselves to Use Tools | GPT-J, OPT, GPT-3 | Toolformer (Unofficial) | | 2302.04761 | Meta |
| Visual Programming: Compositional visual reasoning without training | GPT-3 | VisProg | CVPR 2023 | 2211.11559 | AI2 |
| Decomposed Prompting: A Modular Approach for Solving Complex Tasks | GPT-3 | DecomP | ICLR 2023 | 2210.02406 | AI2 |
| TALM: Tool Augmented Language Models | T5 | | | 2205.12255 | Google |

Datasets & Benchmarks

Datasets

  • (2023-07) ToolBench, the dataset of the ToolLLM project, which aims to construct open-source, large-scale, high-quality instruction-tuning (SFT) data for building LLMs with general tool-use capability.

Benchmarks

  • (2024-01) R-Judge, Benchmarking Safety Risk Awareness for LLM Agents
  • (2024-01) RoTBench, five external environments with increasing levels of noise (Clean, Slight, Medium, Heavy, and Union) that test a model's robustness across three phases of tool use: tool selection, parameter identification, and content filling
  • (2024-01) ToolEyes, an evaluation system that examines seven real-world scenarios and analyzes five dimensions crucial to LLM tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization
  • (2024-01) AgentBoard, An Analytical Evaluation Board of Multi-turn LLM Agents
  • (2023-10) PCA-EVAL, a benchmark for multi-domain embodied decision-making that focuses on performance in perception, cognition, and action
  • (2023-08) AgentBench, described by its authors as the first benchmark for evaluating LLM-as-Agent across diverse environments; its 8 distinct environments assess how well LLMs operate as autonomous agents in different scenarios

Projects

  • (2023-12) KwaiAgents, A generalized information-seeking agent system with Large Language Models (LLMs).
  • (2023-11) CrewAI, A framework for orchestrating role-playing, autonomous AI agents that collaborate to tackle complex tasks.
  • (2023-11) WarAgent, Large Language Model-based Multi-Agent Simulation of World Wars
  • (2023-08) MetaGPT, A multi-agent framework: given a one-line requirement, it returns a PRD (product requirements document), design, tasks, and a repo
  • (2023-08) AI-Town, AI Town is a virtual town where AI characters live, chat and socialize
  • (2023-08) XLang, An open-source framework for building and evaluating language model agents via executable language grounding
  • (2023-06) Gentopia, Gentopia is a lightweight and extensible framework for LLM-driven Agents and ALM research
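  • (2023-05) Transformers Agent, Transformers Agent is an experimental API which is subject to change at any time
  • (2023-05) 闻达 (Wenda), An LLM invocation platform. It targets efficient content generation for specific environments while accounting for the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy.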
  • (2023-04) AgentGPT, AgentGPT allows you to configure and deploy Autonomous AI agents
  • (2023-04) Auto-GPT, An experimental open-source attempt to make GPT-4 fully autonomous
  • (2023-04) BabyAGI, The system uses OpenAI and vector databases such as Chroma or Weaviate to create, prioritize, and execute tasks
  • (2022-10) LangChain, LangChain is a framework for developing applications powered by language models. It enables applications that (i) are context-aware, connecting a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.), and (ii) reason, relying on a language model to decide how to answer based on the provided context, what actions to take, etc.

Applications

  • (2023-11) Awesome-AI-GPTs, Welcome to the EmbraceAGI GPTs open-source directory. This project collects resources and interesting ways of using OpenAI GPTs; let's grow stronger together through AI.
  • (2023-04) Awesome-ChatGPT, A collection of resources related to ChatGPT, including tools, documents, applications, and use cases.
  • (2023-04) 众评AI, A global ranking of AI websites that lists the top 1800+ sites in the field of artificial intelligence, updated daily.
  • (2023-03) ChatPaper, Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow with ChatGPT-based paper summarization, polishing, reviewing, and review responses.
  • (2023-03) BibiGPT, 1-Click AI Summary for Audio/Video & Chat with Learning Content: Bilibili, YouTube, Twitter, TikTok/Douyin, Xiaohongshu, local files, websites, podcasts, meetings, lectures, etc. (formerly BiliGPT, a "TL;DR" summarizer)