title: Unstructured
sidebarTitle: Overview
description: **Unstructured** provides tools to ingest and preprocess unstructured documents for `Retrieval Augmented Generation (RAG)` and `Fine Tuning`.

We Offer 3 Products

  1. Unstructured API - The quickest way to get started for document transformation. We offer Serverless, Azure Marketplace, AWS Marketplace and Free versions of our API.
  2. Platform - An entirely no-code enterprise platform to get all your data RAG-ready.
  3. Open Source - Best for prototyping.

TLDR

If you're here to quickly transform your unstructured documents into structured JSON, here's the "too long; didn't read" version:

  1. Get an API Key and Server URL by signing up to the Unstructured Serverless API page on our website.
  2. Copy and run one of these commands to install the Unstructured API SDK for Python or JavaScript.

```bash
# Python
pip install unstructured-client

# JavaScript/TypeScript
npm install unstructured-client
```

  3. Copy and run this code, replacing the `api_key_auth`, `server_url`, and `filename` values with your own.
```python
import json

from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared

# Update here with your API key and server URL
client = UnstructuredClient(
    api_key_auth="YOUR_API_KEY",
    server_url="YOUR_API_URL",
)

# Update here with your filename
filename = "YOUR_FILE_NAME.pdf"

# Read the whole file into memory so it can be sent with the request
with open(filename, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )

# This example uses AUTO; you can also choose FAST, HI_RES or OCR_ONLY for strategy.
# Learn more in the docs linked at step 4.
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        strategy=shared.Strategy.AUTO,
    )
)

try:
    resp = client.general.partition(req)
    # Print the returned elements as formatted JSON
    print(json.dumps(resp.elements, indent=2))
except Exception as e:
    print(e)
```
```typescript
import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import { Strategy } from "unstructured-client/sdk/models/shared";
import * as fs from "fs";

// Update here with your API key
const key = "YOUR-API-KEY";

// Update here with your server URL
const client = new UnstructuredClient({
    serverURL: "YOUR_API_URL",
    security: {
        apiKeyAuth: key,
    },
});

// Update here with your filename
const filename = "YOUR_FILE_NAME.pdf";
const data = fs.readFileSync(filename);

client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        strategy: Strategy.Auto,
    },
}).then((res: PartitionResponse) => {
    // Print the returned elements only on a successful response
    if (res.statusCode === 200) {
        console.log(res.elements);
    }
}).catch((e) => {
    // API errors carry a status code and body; anything else is logged as-is
    if (e.statusCode) {
        console.log(e.statusCode);
        console.log(e.body);
    } else {
        console.log(e);
    }
});
```
  4. Done! For a deeper dive into the API, see the detailed API Documentation. For more on what partitioning strategies are and why they matter, check out the Partitioning Strategies guide.
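
Once the code in step 3 prints the elements, a natural next step is to filter or summarize them before passing them on to a RAG pipeline. The sketch below is illustrative only: it assumes each element is a dictionary with `type` and `text` keys, and the sample data and the helper names `count_by_type` and `join_text` are hypothetical. Check the key names against your own response before relying on them.

```python
from collections import Counter

# Hypothetical sample shaped like the partition output: a list of dicts.
# The "type" and "text" keys are assumptions; verify them on a real response.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report"},
    {"type": "NarrativeText", "text": "Revenue grew in the third quarter."},
    {"type": "NarrativeText", "text": "Costs stayed flat."},
    {"type": "Table", "text": "Region Revenue"},
]


def count_by_type(elements):
    """Count how many elements of each type are in the list."""
    return Counter(el.get("type", "Unknown") for el in elements)


def join_text(elements, wanted_types=("Title", "NarrativeText")):
    """Join the text of the selected element types, one element per line."""
    return "\n".join(
        el["text"]
        for el in elements
        if el.get("type") in wanted_types and el.get("text")
    )


counts = count_by_type(sample_elements)
body = join_text(sample_elements)

print(dict(counts))
print(body)
```

To try this on real output, replace `sample_elements` with `resp.elements` from the Python example in step 3.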

Get in touch

If you don't find the information you're looking for in the documentation, or require assistance, get in touch with our Support team at [email protected], or join our Slack where our team and community can help you.