readme
lalalune committed Aug 1, 2023
1 parent fb28a3e commit 95f525f
Showing 1 changed file with 98 additions and 111 deletions.
209 changes: 98 additions & 111 deletions README.md
@@ -7,7 +7,6 @@ A browser for your agent, built on Playwright.
[![Lint and Test](https://github.com/AutonomousResearchGroup/agentbrowser/actions/workflows/test.yml/badge.svg)](https://github.com/AutonomousResearchGroup/agentbrowser/actions/workflows/test.yml)
[![PyPI version](https://badge.fury.io/py/agentbrowser.svg)](https://badge.fury.io/py/agentbrowser)


# Installation

```bash
@@ -49,209 +48,197 @@ text = get_body_text(page)
print(text)
```

# Basic:
## API Documentation

### `ensure_event_loop()`

# Create a new page
Ensure that there is an event loop in the current thread. If no event loop exists, a new one is created and set for the current thread. This function returns the current event loop.

Equivalent of ctrl+t in Chrome. Makes a new blank page.
Example usage:

```python
page = create_page()
loop = ensure_event_loop()
```
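
The behavior described for `ensure_event_loop()` can be sketched with the standard library alone. This is an illustration of the pattern, not agentbrowser's actual implementation, and the name `ensure_event_loop_sketch` is hypothetical.

```python
import asyncio
import warnings


def ensure_event_loop_sketch():
    """Return this thread's event loop, creating and setting one if needed."""
    try:
        # Some Python versions warn instead of raising when no loop is set
        # in the main thread; silence that and take the loop they return.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            loop = asyncio.get_event_loop()
    except RuntimeError:
        # No loop is set for this thread (always the case in worker threads).
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop
```

Calling it twice in the same thread returns the same loop object, which is what lets synchronous wrappers reuse a single loop.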

# Close a page
### `get_browser()`

Equivalent of ctrl+w in Chrome. Closes the current page.
Get a Playwright browser. If the browser doesn't exist, initializes a new one.

Example usage:

```python
close_page(page)
browser = get_browser()
```

## Navigate to a URL
### `init_browser(headless=True, executable_path=None)`

Equivalent of typing a URL into the address bar and hitting enter.
If you haven't created a page yet, it will create one for you.
Initialize a new Playwright browser.

```python
page = navigate_to("https://google.com")
```
Parameters:

## Get the HTML of the page
- `headless`: Whether the browser should be run in headless mode, defaults to True.
- `executable_path`: Path to a Chromium or Chrome executable to run instead of the bundled Chromium.

Get the entire document HTML
Example usage:

```python
html = get_document_html(page)
init_browser(headless=False, executable_path="/usr/bin/google-chrome")
```

## Get the HTML of the body
### `create_page(site=None)`

Get just the HTML of the body and inner. Useful for parsing out the content of the page.
Create a new page in the browser. If a site is provided, navigate to that site.

```python
html = get_body_html(page)
```
Parameters:

## Get the text of the body
- `site`: URL to navigate to, defaults to None.

Get just the text of the body. Unlike the raw function, tries to remove some useless tags and divs and things. Not perfect, though.
Example usage:

```python
text = get_body_text(page)
page = create_page("https://www.example.com")
```

### `close_page(page)`

Close a page.

# Advanced Usage
Parameters:

## Get browser
- `page`: The page to close.

This will give you a reference to the browser object, which you can use for advanced stuff. The browser object comes from Playwright, so anything you can do with Playwright, you can do with this.
Example usage:

```python
browser = get_browser()
page = create_page("https://www.example.com")
close_page(page)
```

## Evaluate Javascript
### `navigate_to(url, page, wait_until="domcontentloaded")`

Call some Javascript on the page. Equivalent of opening the console and typing in some Javascript.
Navigate to a URL in a page.

```python
result = evaluate_javascript(page, "document.title")
```
Parameters:

## Initialize browser
- `url`: The URL to navigate to.
- `page`: The page to navigate in.
- `wait_until`: The load state to wait for before returning, defaults to "domcontentloaded".

This will initialize the browser object. You can pass `headless` and `executable_path`. Headless will control whether the actual window appears on screen. Executable path will control which browser is used. By default, it will try to find Chrome first, then fall back to Chromium if it can't find Chrome.

The browser will be auto-initialized by default so you don't need to call this. The only reason you would is because you want to use headful or swap the browser.
Example usage:

```python
init_browser(headless=True, executable_path="/path/to/chrome")
page = create_page()
navigate_to("https://www.example.com", page)
```

# Asynchronous Usage
### `get_document_html(page)`

The library also supports asyncio and offers asynchronous versions of the methods to facilitate non-blocking operations. Here's how to use them:
Get the HTML content of a page.

## Importing into your project
Parameters:

- `page`: The page to get the HTML from.

Example usage:

```python
from agentbrowser import (
    async_get_browser,
    async_init_browser,
    async_navigate_to,
    async_get_body_html,
    async_get_body_text,
    async_get_document_html,
    async_create_page,
    async_close_page,
    async_evaluate_javascript,
)
page = create_page("https://www.example.com")
html = get_document_html(page)
print(html)
```

## Quickstart
### `get_page_title(page)`

```python
import asyncio
from agentbrowser import (
    async_navigate_to,
    async_get_body_text,
)
Get the title of a page.

async def main():
    # Navigate to a URL
    page = await async_navigate_to("https://google.com")
Parameters:

    # Get the text from the page
    text = await async_get_body_text(page)
- `page`: The page to get the title from.

    print(text)
Example usage:

# Run the asyncio event loop
asyncio.run(main())
```python
page = create_page("https://www.example.com")
title = get_page_title(page)
print(title)
```

# Basic:

## Create a new page
### `get_body_text(page)`

Equivalent of ctrl+t in Chrome. Makes a new blank page.
Get the text content of a page's body.

```python
page = await async_create_page()
```
Parameters:

## Close a page
- `page`: The page to get the text from.

Equivalent of ctrl+w in Chrome. Closes the current page.
Example usage:

```python
await async_close_page(page)
page = create_page("https://www.example.com")
text = get_body_text(page)
print(text)
```

## Navigate to a URL
### `get_body_html(page)`

Equivalent of typing a URL into the address bar and hitting enter. If you haven't created a page yet, it will create one for you.
Get the HTML content of a page's body.

```python
page = await async_navigate_to("https://google.com")
```
Parameters:

## Get the HTML of the page
- `page`: The page to get the HTML from.

Get the entire document HTML
Example usage:

```python
html = await async_get_document_html(page)
page = create_page("https://www.example.com")
body_html = get_body_html(page)
print(body_html)
```

## Get the HTML of the body
### `screenshot_page(page)`

Get just the HTML of the body and inner. Useful for parsing out the content of the page.
Get a screenshot of a page.

```python
html = await async_get_body_html(page)
```
Parameters:

## Get the text of the body
- `page`: The page to screenshot.

Get just the text of the body. Unlike the raw function, tries to remove some useless tags and divs and things. Not perfect, though.
Example usage:

```python
text = await async_get_body_text(page)
page = create_page("https://www.example.com")
screenshot = screenshot_page(page)
with open("screenshot.png", "wb") as f:
    f.write(screenshot)
```
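
The example writes the result straight to `screenshot.png`, which suggests that `screenshot_page` returns raw PNG bytes, although the README does not say so explicitly. Under that assumption, a small standard-library check like the sketch below can catch an unexpected return value before it is saved. `looks_like_png` is a hypothetical helper, not part of agentbrowser.

```python
# Every PNG file begins with this fixed eight-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if data starts with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


# Stand-in bytes for illustration; in practice this would be the value
# returned by screenshot_page(page).
sample = PNG_SIGNATURE + b"...image data..."
if looks_like_png(sample):
    print("looks like a PNG")
```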

### `evaluate_javascript(code, page)`

# Advanced Usage
Evaluate JavaScript code in a page.

## Get browser
Parameters:

This will give you a reference to the browser object, which you can use for advanced stuff. The browser object comes from Playwright, so anything you can do with Playwright, you can do with this.
- `code`: The JavaScript code to evaluate.
- `page`: The page to evaluate the code in.

```python
browser = await async_get_browser()
```

## Evaluate Javascript

Call some Javascript on the page. Equivalent of opening the console and typing in some Javascript.
Example usage:

```python
result = await async_evaluate_javascript(page, "document.title")
page = create_page("https://www.example.com")
result = evaluate_javascript("document.title", page)
print(result)
```

## Initialize browser
### `find_chrome()`

This will initialize the browser object. You can pass `headless` and `executable_path`. Headless will control whether the actual window appears on screen. Executable path will control which browser is used. By default, it will try to find Chrome first, then fall back to Chromium if it can't find Chrome.
Find the Chrome executable. Returns the path to the Chrome executable, or None if it could not be found.

The browser will be auto-initialized by default so you don't need to call this. The only reason you would is because you want to use headful or swap the browser.
Example usage:

```python
await async_init_browser(headless=True, executable_path="/path/to/chrome")
chrome_path = find_chrome()
print(chrome_path)
```
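
For intuition, a lookup of this kind can be approximated with `shutil.which`. The sketch below is not how agentbrowser implements `find_chrome()`; the function name and the list of candidate executable names are assumptions made for illustration.

```python
import shutil
from typing import Optional, Sequence

# Hypothetical candidate names; real installations vary by platform.
DEFAULT_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chrome",
    "chromium",
    "chromium-browser",
)


def find_chrome_sketch(
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

A path found this way, or one returned by `find_chrome()`, can be passed as `executable_path` to `init_browser`, as in the `init_browser` example earlier in this README.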

Remember to use `asyncio.run(main())` to start the asynchronous event loop when using these functions.

# Contributions Welcome

If you like this library and want to contribute in any way, please feel free to submit a PR and I will review it. Please note that the goal here is simplicity and accessibility, using common language and few dependencies.
