OpenHands Server

This is a WebSocket server that executes tasks using an agent.

Recommended Prerequisites

  • Initialize the frontend code
  • Install Python 3.12 (brew install python if you use Homebrew)
  • Install pipx (brew install pipx, followed by pipx ensurepath)
  • Install Poetry (pipx install poetry)

Install

First, build a distribution of the frontend code (from the project root directory):

cd frontend
npm install
npm run build
cd ..

Next, run poetry shell so that you don't have to prefix each command with poetry run.

Start the Server

uvicorn openhands.server.listen:app --reload --port 3000

Test the Server

You can use websocat to test the server. Connect to the WebSocket endpoint, then send a JSON action on standard input:

websocat ws://127.0.0.1:3000/ws
{"action": "start", "args": {"task": "write a bash script that prints hello"}}

Supported Environment Variables

LLM_API_KEY=sk-... # Your OpenAI API Key
LLM_MODEL=gpt-4o   # Default model for the agent to use
WORKSPACE_BASE=/path/to/your/workspace # Default absolute path to workspace
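
As a rough illustration of how a process might consume these variables, the sketch below reads them from a mapping such as os.environ. The Settings dataclass and the load_settings helper are hypothetical names used only for this example and are not part of the OpenHands code base; the gpt-4o fallback simply mirrors the sample value above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""

    llm_api_key: Optional[str]
    llm_model: str
    workspace_base: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables from `env` (defaults to os.environ)."""
    workspace = env.get("WORKSPACE_BASE")
    if workspace is not None and not os.path.isabs(workspace):
        # The README describes WORKSPACE_BASE as an absolute path.
        raise ValueError("WORKSPACE_BASE must be an absolute path")
    return Settings(
        llm_api_key=env.get("LLM_API_KEY"),
        # Assumption: fall back to the sample model shown above.
        llm_model=env.get("LLM_MODEL", "gpt-4o"),
        workspace_base=workspace,
    )
```

Passing a plain dictionary instead of os.environ, for example load_settings({"LLM_MODEL": "gpt-4o"}), makes the helper easy to exercise in tests without touching the real environment.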

API Schema

There are two types of messages that can be sent to, or received from, the server:

  • Actions
  • Observations

Actions

An action has three parts:

  • action: The action to be taken
  • args: The arguments for the action
  • message: A friendly message that can be put in the chat log

There are several kinds of actions. Their arguments are listed below. This list may grow over time.

  • initialize - initializes the agent. Only sent by the client.
    • model - the name of the model to use
    • directory - the path to the workspace
    • agent_cls - the class of the agent to use
  • start - starts a new development task. Only sent by the client.
    • task - the task to start
  • read - reads the content of a file.
    • path - the path of the file to read
  • write - writes the content to a file.
    • path - the path of the file to write
    • content - the content to write to the file
  • run - runs a command.
    • command - the command to run
  • browse - opens a web page.
    • url - the URL to open
  • think - allows the agent to make a plan, set a goal, or record thoughts.
    • thought - the thought to record
  • finish - signals that the agent has completed the task.
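
To make the message shape concrete, the following sketch assembles an action and serializes it to JSON. The ACTION_ARGS table is transcribed from the list above; the make_action helper and its check for unexpected argument names are assumptions made for illustration, not the server's own validation logic.

```python
import json
from typing import Any, Dict, Optional

# Argument names per action, transcribed from the list above.
ACTION_ARGS = {
    "initialize": {"model", "directory", "agent_cls"},
    "start": {"task"},
    "read": {"path"},
    "write": {"path", "content"},
    "run": {"command"},
    "browse": {"url"},
    "think": {"thought"},
    "finish": set(),
}


def make_action(
    action: str,
    args: Optional[Dict[str, Any]] = None,
    message: Optional[str] = None,
) -> str:
    """Return a JSON-encoded action message (hypothetical helper)."""
    args = args or {}
    if action not in ACTION_ARGS:
        raise ValueError(f"unknown action: {action}")
    # Only reject names the list does not mention; whether every
    # listed argument is required is not stated in this README.
    unexpected = set(args) - ACTION_ARGS[action]
    if unexpected:
        raise ValueError(f"unexpected args for {action}: {sorted(unexpected)}")
    payload: Dict[str, Any] = {"action": action, "args": args}
    if message is not None:
        payload["message"] = message
    return json.dumps(payload)


print(make_action("start", {"task": "write a bash script that prints hello"}))
```

The printed line has the same shape as the start message used with websocat in the Test the Server section, so it can be pasted into that session directly.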

Observations

An observation has four parts:

  • observation: The observation type
  • content: A string representing the observed data
  • extras: additional structured data
  • message: A friendly message that can be put in the chat log

There are several kinds of observations. Their extras are listed below. This list may grow over time.

  • read - the content of a file
    • path - the path of the file read
  • browse - the HTML content of a url
    • url - the URL opened
  • run - the output of a command
    • command - the command run
    • exit_code - the exit code of the command
  • chat - a message from the user
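
A client receiving these messages might decode them along the following lines. This is a minimal sketch: the Observation dataclass and the parse_observation helper are hypothetical names, and the sample payload is invented for illustration rather than captured from a real server.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Observation:
    """The four documented parts of an observation message."""

    observation: str
    content: str
    extras: Dict[str, Any] = field(default_factory=dict)
    message: str = ""


def parse_observation(raw: str) -> Observation:
    """Decode a JSON string into an Observation (hypothetical helper)."""
    data = json.loads(raw)
    if "observation" not in data:
        raise ValueError("not an observation message")
    return Observation(
        observation=data["observation"],
        content=data.get("content", ""),
        extras=data.get("extras") or {},
        message=data.get("message", ""),
    )


# Invented sample payload for a `run` observation.
raw = json.dumps(
    {
        "observation": "run",
        "content": "hello\n",
        "extras": {"command": "bash hello.sh", "exit_code": 0},
        "message": "Command finished",
    }
)
obs = parse_observation(raw)
print(obs.observation, obs.extras["exit_code"])
```

Because the list of observation types may grow over time, the sketch keeps extras as an open dictionary rather than a fixed set of fields.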

Server Components

This section describes the server-side components of the OpenHands project.

1. session/session.py

The session.py file defines the Session class, which represents a WebSocket session with a client. Key features include:

  • Handling WebSocket connections and disconnections
  • Initializing and managing the agent session
  • Dispatching events between the client and the agent
  • Sending messages and errors to the client

2. session/agent.py

The agent.py file contains the AgentSession class, which manages the lifecycle of an agent within a session. Key features include:

  • Creating and managing the runtime environment
  • Initializing the agent controller
  • Handling security analysis
  • Managing the event stream

3. session/manager.py

The manager.py file defines the SessionManager class, which is responsible for managing multiple client sessions. Key features include:

  • Adding and restarting sessions
  • Sending messages to specific sessions
  • Cleaning up inactive sessions
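
The idea of cleaning up inactive sessions can be sketched with a small tracker that records when each session was last active and reports the ones that have been idle too long. Everything here is an assumption for illustration: the InactivityTracker class, its method names, and the timeout parameter are hypothetical and do not reflect how SessionManager is actually implemented.

```python
import time
from typing import Callable, Dict, List


class InactivityTracker:
    """Hypothetical helper that tracks last activity per session ID."""

    def __init__(
        self,
        timeout_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_seen: Dict[str, float] = {}

    def touch(self, sid: str) -> None:
        """Record activity for a session, adding it if it is new."""
        self._last_seen[sid] = self._clock()

    def cleanup(self) -> List[str]:
        """Forget sessions idle longer than the timeout and return their IDs."""
        now = self._clock()
        expired = [
            sid
            for sid, seen in self._last_seen.items()
            if now - seen > self._timeout
        ]
        for sid in expired:
            del self._last_seen[sid]
        return expired
```

A periodic task could call cleanup() and close the connections for the returned IDs. Using a monotonic clock avoids surprises if the system wall clock is adjusted while the server is running.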

4. listen.py

The listen.py file is the main server file that sets up the FastAPI application and defines various API endpoints. Key features include:

  • Setting up CORS middleware
  • Handling WebSocket connections
  • Managing file uploads
  • Providing API endpoints for agent interactions, file operations, and security analysis
  • Serving static files for the frontend

Workflow Description

  1. Server Initialization:

    • The FastAPI application is created and configured in listen.py.
    • CORS middleware and static file serving are set up.
    • The SessionManager is initialized.
  2. Client Connection:

    • When a client connects via WebSocket, a new Session is created or an existing one is restarted.
    • The Session initializes an AgentSession, which sets up the runtime environment and agent controller.
  3. Agent Initialization:

    • The client sends an initialization request.
    • The server creates and configures the agent based on the provided parameters.
    • The runtime environment is set up, and the agent controller is initialized.
  4. Event Handling:

    • The Session manages the event stream between the client and the agent.
    • Events from the client are dispatched to the agent.
    • Observations from the agent are sent back to the client.
  5. File Operations:

    • The server handles file uploads, ensuring they meet size and type restrictions.
    • File read and write operations are performed through the runtime environment.
  6. Security Analysis:

    • If configured, a security analyzer is initialized for each session.
    • Security-related API requests are forwarded to the security analyzer.
  7. Session Management:

    • The SessionManager periodically cleans up inactive sessions.
    • It also handles sending messages to specific sessions when needed.
  8. API Endpoints:

    • Various API endpoints are provided for agent interactions, file operations, and retrieving configuration defaults.
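
The size and type checks mentioned under File Operations can be pictured as a small validation function. This is a sketch under stated assumptions: validate_upload, its parameters, and the limits used in the example call are hypothetical, since this README does not state the actual restrictions or where they are configured.

```python
import os
from typing import Iterable, Optional


def validate_upload(
    filename: str,
    size_bytes: int,
    max_size_bytes: int,
    allowed_extensions: Optional[Iterable[str]] = None,
) -> None:
    """Raise ValueError if an upload breaks the given (hypothetical) limits."""
    if size_bytes > max_size_bytes:
        raise ValueError(f"{filename}: exceeds {max_size_bytes} bytes")
    if allowed_extensions is not None:
        # Compare extensions case-insensitively, including the leading dot.
        ext = os.path.splitext(filename)[1].lower()
        allowed = {e.lower() for e in allowed_extensions}
        if ext not in allowed:
            raise ValueError(f"{filename}: extension {ext!r} is not allowed")


# Illustrative limits only: 1 MiB and two extensions.
validate_upload("notes.txt", 2_048, 1_048_576, {".txt", ".py"})
```

Passing None for allowed_extensions skips the type check entirely, which is one way to model a deployment that restricts size but not file type.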

This architecture lets the server manage multiple client sessions, each with its own agent instance, runtime environment, and security analyzer. The event-driven design supports real-time communication between clients and agents, and the modular structure keeps individual components easy to extend and maintain.