Merge pull request #926 from PrefectHQ/images
jlowin authored May 14, 2024
2 parents 8c0c083 + 30a4680 commit aedfb95
Showing 40 changed files with 875 additions and 1,446 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -236,13 +236,13 @@ marvin.paint("a simple cup of coffee, still warm")

Learn more about image generation [here](https://askmarvin.ai/docs/images/generation).

-## 🔍 Classify images (beta)
+## 🔍 Converting images to data

-In addition to text, Marvin has beta support for captioning, classifying, transforming, and extracting entities from images using the GPT-4 vision model:
+In addition to text, Marvin has support for captioning, classifying, transforming, and extracting entities from images using the GPT-4 vision model:

```python
-marvin.beta.classify(
-    marvin.beta.Image("docs/images/coffee.png"),
+marvin.classify(
+    marvin.Image.from_path("docs/images/coffee.png"),
     labels=["drink", "food"],
 )

4 changes: 2 additions & 2 deletions cookbook/flows/insurance_claim.py
@@ -52,8 +52,8 @@ def build_damage_report_model(damages: list[DamagedPart]) -> type[M]:

@task(cache_key_fn=task_input_hash)
def marvin_extract_damages_from_url(image_url: str) -> list[DamagedPart]:
-    return marvin.beta.extract(
-        data=marvin.beta.Image.from_url(image_url),
+    return marvin.extract(
+        data=marvin.Image.from_url(image_url),
target=DamagedPart,
instructions=(
"Give extremely brief, high-level descriptions of the damage. Only include"
4 changes: 0 additions & 4 deletions docs/api_reference/beta/vision.md

This file was deleted.

Binary file added docs/assets/images/docs/vision/marvin.png
Binary file removed docs/assets/images/docs/vision/marvin.webp
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/docs/video/recording.md
@@ -26,7 +26,7 @@ counter = 0
for image in recorder.stream():
counter += 1
# process each image
-    marvin.beta.caption(image)
+    marvin.caption(image)

# stop recording
if counter == 3:
47 changes: 38 additions & 9 deletions docs/docs/vision/captioning.md
@@ -2,9 +2,6 @@

Marvin can use OpenAI's vision API to process images as inputs.

-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
<p>
@@ -18,19 +15,18 @@ Marvin can use OpenAI's vision API to process images as inputs.

Generate a description of the following image, hypothetically available at `/path/to/marvin.png`:

-![](/assets/images/docs/vision/marvin.webp)
+![](/assets/images/docs/vision/marvin.png)


```python
import marvin
-from pathlib import Path

-caption = marvin.beta.caption(image=Path('/path/to/marvin.png'))
+caption = marvin.caption(marvin.Image.from_path('/path/to/marvin.png'))
```

!!! success "Result"

-    "This is a digital illustration featuring a stylized, cute character resembling a Funko Pop vinyl figure with large, shiny eyes and a square-shaped head, sitting on abstract wavy shapes that simulate a landscape. The whimsical figure is set against a dark background with sparkling, colorful bokeh effects, giving it a magical, dreamy atmosphere."
+    "A cute, small robot with a square head and large, glowing eyes sits on a surface of wavy, colorful lines. The background is dark with scattered, glowing particles, creating a magical and futuristic atmosphere."


<div class="admonition info">
@@ -41,6 +37,23 @@ Marvin can use OpenAI's vision API to process images as inputs.
</div>


## Providing instructions

The `instructions` parameter offers an additional layer of control, enabling more nuanced caption generation, especially in ambiguous or complex scenarios.
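As a brief sketch of this idea (the file path is a placeholder, and the exact instruction wording is a hypothetical example), instructions can steer the register and focus of the caption:

```python
import marvin

# Hypothetical path; 'instructions' biases the caption toward a use case,
# here a short, alt-text-style description rather than a florid narration.
caption = marvin.caption(
    marvin.Image.from_path('/path/to/marvin.png'),
    instructions='Write one short, factual sentence suitable for alt text.',
)
```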

## Captions for multiple images

To generate a single caption for multiple images, pass a list of `Image` objects to `caption`:

```python
marvin.caption(
[
marvin.Image.from_path('/path/to/img1.png'),
marvin.Image.from_path('/path/to/img2.png')
],
instructions='...'
)
```


## Model parameters
Expand All @@ -53,5 +66,21 @@ You can pass parameters to the underlying API via the `model_kwargs` argument of
If you are using Marvin in an async environment, you can use `caption_async`:

```python
-caption = await marvin.beta.caption_async(image=Path('/path/to/marvin.png'))
+caption = await marvin.caption_async(image=Path('/path/to/marvin.png'))
```
## Mapping

To generate individual captions for a list of inputs at once, use `.map`. Note that this is different from generating a single caption for multiple images, which is done by passing a list of `Image` objects to `caption`.

```python
inputs = [
marvin.Image.from_path('/path/to/img1.png'),
marvin.Image.from_path('/path/to/img2.png')
]
result = marvin.caption.map(inputs)
assert len(result) == 2
```

(`marvin.caption_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
18 changes: 7 additions & 11 deletions docs/docs/vision/classification.md
@@ -2,10 +2,6 @@

Marvin can use OpenAI's vision API to process images and classify them into categories.

-The `marvin.beta.classify` function is an enhanced version of `marvin.classify` that accepts images as well as text.
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
@@ -36,14 +32,14 @@ The `marvin.beta.classify` function is an enhanced version of `marvin.classify`
```python
import marvin

-img = marvin.beta.Image('https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg')
+img = marvin.Image('https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg')

-animal = marvin.beta.classify(
+animal = marvin.classify(
img,
labels=['dog', 'cat', 'bird', 'fish', 'deer']
)

-dry_or_wet = marvin.beta.classify(
+dry_or_wet = marvin.classify(
img,
labels=['dry', 'wet'],
instructions='Is the animal wet?'
@@ -60,15 +56,15 @@


## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `classify`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `classify`. These parameters are passed directly to the API, so you can use any supported parameter.
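For instance, a minimal sketch of pinning down sampling (assuming the underlying OpenAI chat API, whose `temperature` parameter is a standard knob):

```python
import marvin

# 'temperature' is a standard OpenAI chat completion parameter;
# 0.0 makes the classification as deterministic as the API allows.
label = marvin.classify(
    'The app crashes when I try to upload a file.',
    labels=['bug', 'feature request', 'inquiry'],
    model_kwargs={'temperature': 0.0},
)
```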


## Async support

If you are using Marvin in an async environment, you can use `classify_async`:

```python
-result = await marvin.beta.classify_async(
+result = await marvin.classify_async(
"The app crashes when I try to upload a file.",
labels=["bug", "feature request", "inquiry"]
)
@@ -85,10 +81,10 @@ inputs = [
"The app crashes when I try to upload a file.",
    "How do I change my password?"
]
-result = marvin.beta.classify.map(inputs, ["bug", "feature request", "inquiry"])
+result = marvin.classify.map(inputs, ["bug", "feature request", "inquiry"])
assert result == ["bug", "inquiry"]
```

-(`marvin.beta.classify_async.map` is also available for async environments.)
+(`marvin.classify_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to classify multiple inputs at once. The result is a list of classifications in the same order as the inputs.
16 changes: 6 additions & 10 deletions docs/docs/vision/extraction.md
@@ -2,12 +2,8 @@

Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.

-The `marvin.beta.extract` function is an enhanced version of `marvin.extract` that accepts images as well as text.
-
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
<p>
@@ -37,11 +33,11 @@ The `marvin.beta.extract` function is an enhanced version of `marvin.extract` th
```python
import marvin

-img = marvin.beta.Image(
+img = marvin.Image(
"https://images.unsplash.com/photo-1548199973-03cce0bbc87b?",
)

-result = marvin.beta.extract(img, target=str, instructions="dog breeds")
+result = marvin.extract(img, target=str, instructions="dog breeds")
```

!!! success "Result"
@@ -50,14 +46,14 @@ The `marvin.beta.extract` function is an enhanced version of `marvin.extract` th
```

## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `extract`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `extract`. These parameters are passed directly to the API, so you can use any supported parameter.


## Async support
If you are using Marvin in an async environment, you can use `extract_async`:

```python
-result = await marvin.beta.extract_async(
+result = await marvin.extract_async(
"I drove from New York to California.",
target=str,
instructions="2-letter state codes",
@@ -75,10 +71,10 @@ inputs = [
"I drove from New York to California.",
"I took a flight from NYC to BOS."
]
-result = marvin.beta.extract.map(inputs, target=str, instructions="2-letter state codes")
+result = marvin.extract.map(inputs, target=str, instructions="2-letter state codes")
assert result == [["NY", "CA"], ["NY", "MA"]]
```

-(`marvin.beta.extract_async.map` is also available for async environments.)
+(`marvin.extract_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
25 changes: 10 additions & 15 deletions docs/docs/vision/transformation.md
@@ -2,10 +2,6 @@

Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.

-The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that accepts images as well as text.
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

<div class="admonition abstract">
<p class="admonition-title">What it does</p>
@@ -41,10 +37,10 @@ The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that acc
state: str = Field(description="2-letter state abbreviation")


-img = marvin.beta.Image(
+img = marvin.Image(
"https://images.unsplash.com/photo-1568515387631-8b650bbcdb90",
)
-result = marvin.beta.cast(img, target=Location)
+result = marvin.cast(img, target=Location)
```

!!! success "Result"
@@ -70,10 +66,10 @@ The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that acc
authors: list[str]


-img = marvin.beta.Image(
+img = marvin.Image(
"https://hastie.su.domains/ElemStatLearn/CoverII_small.jpg",
)
-result = marvin.beta.cast(img, target=Book)
+result = marvin.cast(img, target=Book)
```

!!! success "Result"
@@ -101,8 +97,8 @@ If the target type isn't self-documenting, or you want to provide additional gui

shopping_list = ["bagels", "cabbage", "eggs", "apples", "oranges"]

-missing_items = marvin.beta.cast(
-    marvin.beta.Image("https://images.unsplash.com/photo-1588964895597-cfccd6e2dbf9"),
+missing_items = marvin.cast(
+    marvin.Image("https://images.unsplash.com/photo-1588964895597-cfccd6e2dbf9"),
target=list[str],
instructions=f"Did I forget anything on my list: {shopping_list}?",
)
@@ -113,15 +109,14 @@
```python
assert missing_items == ["eggs", "oranges"]
```

## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `cast`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `cast`. These parameters are passed directly to the API, so you can use any supported parameter.

## Async support
If you are using `marvin` in an async environment, you can use `cast_async`:

```python
-result = await marvin.beta.cast_async("one", int)
+result = await marvin.cast_async("one", int)

assert result == 1
```
@@ -135,10 +130,10 @@ inputs = [
"I bought two donuts.",
"I bought six hot dogs."
]
-result = marvin.beta.cast.map(inputs, int)
+result = marvin.cast.map(inputs, int)
assert result == [2, 6]
```

-(`marvin.beta.cast_async.map` is also available for async environments.)
+(`marvin.cast_async.map` is also available for async environments.)

Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
2 changes: 1 addition & 1 deletion docs/examples/webcam_narration.md
@@ -31,7 +31,7 @@ By combining a few Marvin tools, you can quickly create a live narration of your
# if there are no more frames to process, generate a caption from the most recent 5
if len(recorder) == 0:
-        caption = marvin.beta.caption(
+        caption = marvin.caption(
frames[-5:],
instructions=f"""
You are a parody of a nature documentary narrator, creating an
4 changes: 2 additions & 2 deletions docs/examples/xkcd_bird.md
@@ -8,11 +8,11 @@
```python
import marvin

-photo = marvin.beta.Image(
+photo = marvin.Image(
"https://images.unsplash.com/photo-1613891188927-14c2774fb8d7",
)

-result = marvin.beta.classify(
+result = marvin.classify(
photo,
labels=["bird", "not bird"]
)
18 changes: 14 additions & 4 deletions docs/static/css/tailwind.css
@@ -1,5 +1,5 @@
/*
-! tailwindcss v3.4.1 | MIT License | https://tailwindcss.com
+! tailwindcss v3.4.3 | MIT License | https://tailwindcss.com
*/

/*
@@ -211,6 +211,8 @@ textarea {
/* 1 */
line-height: inherit;
/* 1 */
letter-spacing: inherit;
/* 1 */
color: inherit;
/* 1 */
margin: 0;
@@ -234,9 +236,9 @@ select {
*/

button,
-[type='button'],
-[type='reset'],
-[type='submit'] {
+input:where([type='button']),
+input:where([type='reset']),
+input:where([type='submit']) {
-webkit-appearance: button;
/* 1 */
background-color: transparent;
@@ -492,6 +494,10 @@ video {
--tw-backdrop-opacity: ;
--tw-backdrop-saturate: ;
--tw-backdrop-sepia: ;
--tw-contain-size: ;
--tw-contain-layout: ;
--tw-contain-paint: ;
--tw-contain-style: ;
}

::backdrop {
@@ -542,6 +548,10 @@ video {
--tw-backdrop-opacity: ;
--tw-backdrop-saturate: ;
--tw-backdrop-sepia: ;
--tw-contain-size: ;
--tw-contain-layout: ;
--tw-contain-paint: ;
--tw-contain-style: ;
}

.absolute {