title | sidebarTitle | description |
---|---|---|
Unstructured | Overview | **Unstructured** provides tools to ingest and preprocess unstructured documents for `Retrieval Augmented Generation (RAG)` and `Fine Tuning`. |
- Unstructured API - The quickest way to get started with document transformation. We offer Serverless, Azure Marketplace, AWS Marketplace, and Free versions of our API.
- Platform - A fully no-code enterprise platform for getting all your data RAG-ready.
- Open Source - Best for prototyping.
If you're here to quickly transform your unstructured documents into understandable JSON, here's the TL;DR version:
- Get an API key and server URL by signing up on the Unstructured Serverless API page on our website.
- Copy and run one of these commands to install the Unstructured Python or JavaScript API SDK.

```bash
# Python
pip install unstructured-client

# JavaScript/TypeScript
npm install unstructured-client
```
```python
import json

from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared

# Update here with your API key and server URL
client = UnstructuredClient(
    api_key_auth="YOUR_API_KEY",
    server_url="YOUR_API_URL",
)

# Update here with your filename
filename = "YOUR_FILE_NAME.pdf"

# Read the document as bytes so it can be uploaded
with open(filename, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )

# strategy can be AUTO, FAST, HI_RES or OCR_ONLY; learn more in the
# Partitioning Strategies guide
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        strategy=shared.Strategy.AUTO,
    )
)

try:
    resp = client.general.partition(request=req)
    # resp.elements is a list of dictionaries, one per document element
    print(json.dumps(resp.elements, indent=2))
except Exception as e:
    print(e)
```
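The Python snippet above hardcodes the API key and server URL. As a minimal sketch, assuming you would rather keep credentials out of source code, the helper below reads them from environment variables. The variable names `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL` and the function `load_unstructured_settings` are hypothetical names chosen for this example, not something the SDK defines.

```python
import os


def load_unstructured_settings(env=os.environ):
    """Return (api_key, server_url) read from a mapping of environment variables.

    UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL are hypothetical names
    used for illustration; pick whatever names suit your deployment.
    """
    required = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")
    # Collect every missing or empty variable so the error names all of them
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["UNSTRUCTURED_API_KEY"], env["UNSTRUCTURED_API_URL"]
```

With these variables exported in your shell, you could call `api_key, server_url = load_unstructured_settings()` and pass the two values to `UnstructuredClient(api_key_auth=api_key, server_url=server_url)` in place of the placeholder strings.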
```typescript
import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import { Strategy } from "unstructured-client/sdk/models/shared";
import * as fs from "fs";

// Update here with your API key
const key = "YOUR_API_KEY";

// Update here with your server URL
const client = new UnstructuredClient({
    serverURL: "YOUR_API_URL",
    security: {
        apiKeyAuth: key,
    },
});

// Update here with your filename
const filename = "YOUR_FILE_NAME.pdf";
const data = fs.readFileSync(filename);

client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        strategy: Strategy.Auto,
    }
}).then((res: PartitionResponse) => {
    if (res.statusCode === 200) {
        console.log(res.elements);
    }
}).catch((e) => {
    // API errors carry a status code and body; anything else is logged as-is
    if (e.statusCode) {
        console.log(e.statusCode);
        console.log(e.body);
    } else {
        console.log(e);
    }
});
```
- Done! For a deeper dive into the API, see the detailed API Documentation. To learn what partitioning strategies are and why they matter, check out the Partitioning Strategies guide.
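Once you have the JSON output, a common next step is to pull the text out of it. The sketch below assumes each element is a dictionary with `type` and `text` keys, which is how the API typically returns elements. The sample data and the helper name `group_text_by_type` are made up for illustration.

```python
from collections import defaultdict


def group_text_by_type(elements):
    """Group the non-empty text of each element under its element type."""
    grouped = defaultdict(list)
    for element in elements:
        # Treat a missing or None text field as empty
        text = (element.get("text") or "").strip()
        if text:
            grouped[element.get("type", "Unknown")].append(text)
    return dict(grouped)


# Hypothetical sample shaped like a partition response; real output
# also carries fields such as element_id and richer metadata.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "Revenue grew in Q3.", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "   ", "metadata": {"page_number": 2}},
]

grouped = group_text_by_type(sample_elements)
print(grouped)
```

In the Python quickstart, you could pass `resp.elements` to the same helper instead of the sample list. Grouping by type is only one option; which fields you keep depends on what your RAG or fine-tuning pipeline needs.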
If you don't find the information you're looking for in the documentation, or require assistance, get in touch with our Support team at [email protected], or join our Slack where our team and community can help you.