Merge pull request #926 from PrefectHQ/images
jlowin authored May 14, 2024
2 parents 8c0c083 + 30a4680 commit aedfb95
Showing 40 changed files with 875 additions and 1,446 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -236,13 +236,13 @@ marvin.paint("a simple cup of coffee, still warm")

Learn more about image generation [here](https://askmarvin.ai/docs/images/generation).

-## 🔍 Classify images (beta)
+## 🔍 Converting images to data

-In addition to text, Marvin has beta support for captioning, classifying, transforming, and extracting entities from images using the GPT-4 vision model:
+In addition to text, Marvin has support for captioning, classifying, transforming, and extracting entities from images using the GPT-4 vision model:

```python
-marvin.beta.classify(
-    marvin.beta.Image("docs/images/coffee.png"),
+marvin.classify(
+    marvin.Image.from_path("docs/images/coffee.png"),
     labels=["drink", "food"],
 )

4 changes: 2 additions & 2 deletions cookbook/flows/insurance_claim.py
@@ -52,8 +52,8 @@ def build_damage_report_model(damages: list[DamagedPart]) -> type[M]:

@task(cache_key_fn=task_input_hash)
def marvin_extract_damages_from_url(image_url: str) -> list[DamagedPart]:
-    return marvin.beta.extract(
-        data=marvin.beta.Image.from_url(image_url),
+    return marvin.extract(
+        data=marvin.Image.from_url(image_url),
target=DamagedPart,
instructions=(
"Give extremely brief, high-level descriptions of the damage. Only include"
4 changes: 0 additions & 4 deletions docs/api_reference/beta/vision.md

This file was deleted.

Binary file added docs/assets/images/docs/vision/marvin.png
Binary file removed docs/assets/images/docs/vision/marvin.webp
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/docs/video/recording.md
@@ -26,7 +26,7 @@ counter = 0
for image in recorder.stream():
counter += 1
# process each image
-    marvin.beta.caption(image)
+    marvin.caption(image)

# stop recording
if counter == 3:
47 changes: 38 additions & 9 deletions docs/docs/vision/captioning.md
@@ -2,9 +2,6 @@

Marvin can use OpenAI's vision API to process images as inputs.

-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
<p>
@@ -18,19 +15,18 @@ Marvin can use OpenAI's vision API to process images as inputs.

Generate a description of the following image, hypothetically available at `/path/to/marvin.png`:

-![](/assets/images/docs/vision/marvin.webp)
+![](/assets/images/docs/vision/marvin.png)


```python
import marvin
-from pathlib import Path

-caption = marvin.beta.caption(image=Path('/path/to/marvin.png'))
+caption = marvin.caption(marvin.Image.from_path('/path/to/marvin.png'))
```

!!! success "Result"

-    "This is a digital illustration featuring a stylized, cute character resembling a Funko Pop vinyl figure with large, shiny eyes and a square-shaped head, sitting on abstract wavy shapes that simulate a landscape. The whimsical figure is set against a dark background with sparkling, colorful bokeh effects, giving it a magical, dreamy atmosphere."
+    "A cute, small robot with a square head and large, glowing eyes sits on a surface of wavy, colorful lines. The background is dark with scattered, glowing particles, creating a magical and futuristic atmosphere."


<div class="admonition info">
@@ -41,6 +37,23 @@ Marvin can use OpenAI's vision API to process images as inputs.
</div>


## Providing instructions

The `instructions` parameter offers an additional layer of control, enabling more nuanced caption generation, especially in ambiguous or complex scenarios.
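As a brief sketch of this idea (the file path is a placeholder, and the exact instruction wording is a hypothetical example), instructions can steer the register and focus of the caption:

```python
import marvin

# Hypothetical path; 'instructions' biases the caption toward a use case,
# here a short, alt-text-style description rather than a florid narration.
caption = marvin.caption(
    marvin.Image.from_path('/path/to/marvin.png'),
    instructions='Write one short, factual sentence suitable for alt text.',
)
```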

## Captions for multiple images

To generate a single caption for multiple images, pass a list of `Image` objects to `caption`:

```python
marvin.caption(
[
marvin.Image.from_path('/path/to/img1.png'),
marvin.Image.from_path('/path/to/img2.png')
],
instructions='...'
)
```


## Model parameters
Expand All @@ -53,5 +66,21 @@ You can pass parameters to the underlying API via the `model_kwargs` argument of
If you are using Marvin in an async environment, you can use `caption_async`:

```python
-caption = await marvin.beta.caption_async(image=Path('/path/to/marvin.png'))
+caption = await marvin.caption_async(image=Path('/path/to/marvin.png'))
```
## Mapping

To generate individual captions for a list of inputs at once, use `.map`. Note that this is different from generating a single caption for multiple images, which is done by passing a list of `Image` objects to `caption`.

```python
inputs = [
marvin.Image.from_path('/path/to/img1.png'),
marvin.Image.from_path('/path/to/img2.png')
]
result = marvin.caption.map(inputs)
assert len(result) == 2
```

(`marvin.caption_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
18 changes: 7 additions & 11 deletions docs/docs/vision/classification.md
@@ -2,10 +2,6 @@

Marvin can use OpenAI's vision API to process images and classify them into categories.

-The `marvin.beta.classify` function is an enhanced version of `marvin.classify` that accepts images as well as text.
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
@@ -36,14 +32,14 @@ The `marvin.beta.classify` function is an enhanced version of `marvin.classify`
```python
import marvin

-img = marvin.beta.Image('https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg')
+img = marvin.Image('https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg')

-animal = marvin.beta.classify(
+animal = marvin.classify(
img,
labels=['dog', 'cat', 'bird', 'fish', 'deer']
)

-dry_or_wet = marvin.beta.classify(
+dry_or_wet = marvin.classify(
img,
labels=['dry', 'wet'],
instructions='Is the animal wet?'
@@ -60,15 +56,15 @@


## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `classify`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `classify`. These parameters are passed directly to the API, so you can use any supported parameter.
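For instance, a minimal sketch of pinning down sampling (assuming the underlying OpenAI chat API, whose `temperature` parameter is a standard knob):

```python
import marvin

# 'temperature' is a standard OpenAI chat completion parameter;
# 0.0 makes the classification as deterministic as the API allows.
label = marvin.classify(
    'The app crashes when I try to upload a file.',
    labels=['bug', 'feature request', 'inquiry'],
    model_kwargs={'temperature': 0.0},
)
```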


## Async support

If you are using Marvin in an async environment, you can use `classify_async`:

```python
-result = await marvin.beta.classify_async(
+result = await marvin.classify_async(
"The app crashes when I try to upload a file.",
labels=["bug", "feature request", "inquiry"]
)
@@ -85,10 +81,10 @@ inputs = [
"The app crashes when I try to upload a file.",
    "How do I change my password?"
]
-result = marvin.beta.classify.map(inputs, ["bug", "feature request", "inquiry"])
+result = marvin.classify.map(inputs, ["bug", "feature request", "inquiry"])
assert result == ["bug", "inquiry"]
```

-(`marvin.beta.classify_async.map` is also available for async environments.)
+(`marvin.classify_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to classify multiple inputs at once. The result is a list of classifications in the same order as the inputs.
16 changes: 6 additions & 10 deletions docs/docs/vision/extraction.md
@@ -2,12 +2,8 @@

Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.

-The `marvin.beta.extract` function is an enhanced version of `marvin.extract` that accepts images as well as text.
-
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
<p>
@@ -37,11 +33,11 @@ The `marvin.beta.extract` function is an enhanced version of `marvin.extract` th
```python
import marvin

-img = marvin.beta.Image(
+img = marvin.Image(
"https://images.unsplash.com/photo-1548199973-03cce0bbc87b?",
)

-result = marvin.beta.extract(img, target=str, instructions="dog breeds")
+result = marvin.extract(img, target=str, instructions="dog breeds")
```

!!! success "Result"
@@ -50,14 +46,14 @@ The `marvin.beta.extract` function is an enhanced version of `marvin.extract` th
```

## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `extract`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `extract`. These parameters are passed directly to the API, so you can use any supported parameter.


## Async support
If you are using Marvin in an async environment, you can use `extract_async`:

```python
-result = await marvin.beta.extract_async(
+result = await marvin.extract_async(
"I drove from New York to California.",
target=str,
instructions="2-letter state codes",
@@ -75,10 +71,10 @@ inputs = [
"I drove from New York to California.",
"I took a flight from NYC to BOS."
]
-result = marvin.beta.extract.map(inputs, target=str, instructions="2-letter state codes")
+result = marvin.extract.map(inputs, target=str, instructions="2-letter state codes")
assert result == [["NY", "CA"], ["NY", "MA"]]
```

-(`marvin.beta.extract_async.map` is also available for async environments.)
+(`marvin.extract_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
25 changes: 10 additions & 15 deletions docs/docs/vision/transformation.md
@@ -2,10 +2,6 @@

Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.

-The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that accepts images as well as text.
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
@@ -41,10 +37,10 @@ The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that acc
state: str = Field(description="2-letter state abbreviation")


-img = marvin.beta.Image(
+img = marvin.Image(
"https://images.unsplash.com/photo-1568515387631-8b650bbcdb90",
)
-result = marvin.beta.cast(img, target=Location)
+result = marvin.cast(img, target=Location)
```

!!! success "Result"
@@ -70,10 +66,10 @@ The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that acc
authors: list[str]


-img = marvin.beta.Image(
+img = marvin.Image(
"https://hastie.su.domains/ElemStatLearn/CoverII_small.jpg",
)
-result = marvin.beta.cast(img, target=Book)
+result = marvin.cast(img, target=Book)
```

!!! success "Result"
@@ -101,8 +97,8 @@ If the target type isn't self-documenting, or you want to provide additional gui

shopping_list = ["bagels", "cabbage", "eggs", "apples", "oranges"]

-missing_items = marvin.beta.cast(
-    marvin.beta.Image("https://images.unsplash.com/photo-1588964895597-cfccd6e2dbf9"),
+missing_items = marvin.cast(
+    marvin.Image("https://images.unsplash.com/photo-1588964895597-cfccd6e2dbf9"),
target=list[str],
instructions=f"Did I forget anything on my list: {shopping_list}?",
)
@@ -113,15 +109,14 @@
```python
assert missing_items == ["eggs", "oranges"]
```

## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `cast`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `cast`. These parameters are passed directly to the API, so you can use any supported parameter.

## Async support
If you are using `marvin` in an async environment, you can use `cast_async`:

```python
-result = await marvin.beta.cast_async("one", int)
+result = await marvin.cast_async("one", int)

assert result == 1
```
@@ -135,10 +130,10 @@ inputs = [
"I bought two donuts.",
"I bought six hot dogs."
]
-result = marvin.beta.cast.map(inputs, int)
+result = marvin.cast.map(inputs, int)
assert result == [2, 6]
```

-(`marvin.beta.cast_async.map` is also available for async environments.)
+(`marvin.cast_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
2 changes: 1 addition & 1 deletion docs/examples/webcam_narration.md
@@ -31,7 +31,7 @@ By combining a few Marvin tools, you can quickly create a live narration of your
# if there are no more frames to process, generate a caption from the most recent 5
if len(recorder) == 0:
-        caption = marvin.beta.caption(
+        caption = marvin.caption(
frames[-5:],
instructions=f"""
You are a parody of a nature documentary narrator, creating an
4 changes: 2 additions & 2 deletions docs/examples/xkcd_bird.md
@@ -8,11 +8,11 @@
```python
import marvin

-photo = marvin.beta.Image(
+photo = marvin.Image(
"https://images.unsplash.com/photo-1613891188927-14c2774fb8d7",
)

-result = marvin.beta.classify(
+result = marvin.classify(
photo,
labels=["bird", "not bird"]
)
18 changes: 14 additions & 4 deletions docs/static/css/tailwind.css
@@ -1,5 +1,5 @@
/*
-! tailwindcss v3.4.1 | MIT License | https://tailwindcss.com
+! tailwindcss v3.4.3 | MIT License | https://tailwindcss.com
*/

/*
@@ -211,6 +211,8 @@ textarea {
/* 1 */
line-height: inherit;
/* 1 */
letter-spacing: inherit;
/* 1 */
color: inherit;
/* 1 */
margin: 0;
@@ -234,9 +236,9 @@ select {
*/

button,
-[type='button'],
-[type='reset'],
-[type='submit'] {
+input:where([type='button']),
+input:where([type='reset']),
+input:where([type='submit']) {
-webkit-appearance: button;
/* 1 */
background-color: transparent;
@@ -492,6 +494,10 @@ video {
--tw-backdrop-opacity: ;
--tw-backdrop-saturate: ;
--tw-backdrop-sepia: ;
--tw-contain-size: ;
--tw-contain-layout: ;
--tw-contain-paint: ;
--tw-contain-style: ;
}

::backdrop {
@@ -542,6 +548,10 @@ video {
--tw-backdrop-opacity: ;
--tw-backdrop-saturate: ;
--tw-backdrop-sepia: ;
--tw-contain-size: ;
--tw-contain-layout: ;
--tw-contain-paint: ;
--tw-contain-style: ;
}

.absolute {