Add Oracle Database Document Loader and Parser #7251
@jacoblee93 we're still working on documentation and tidying up the code but we wanted to get your thoughts and any feedback on the changes we've made so far. Thank you!
Hey there! Thanks for this PR - it looks like the peer dep you're adding has some funny licensing. I don't fully understand the implications of it, and would be cautious because it's Oracle. Will put it on hold for now.
Hi Jacob, just wanted to confirm: are you referring to the licensing of OracleDB? I noticed that the LangChain.py library also uses it as a peer dependency for its implementation of Oracle.
@jacoblee93 do you have any idea why the deployment to Preview-langchainjs-docs is failing? We looked at another PR thread and saw that relative paths should be replaced with absolute paths. We tried that fix, and we may be overlooking something, but we can't pinpoint what's causing the failure. A response would be greatly appreciated.
feat: Add Oracle Database Document Loader and Parser
Description
This PR introduces support for loading and parsing documents from Oracle databases into LangChain.js. By implementing this feature, we align the JavaScript version of LangChain with the Python library, providing a consistent API for Oracle database integration across both ecosystems.
Motivation
Currently, LangChain.js does not support Oracle databases, which prevents JavaScript developers from integrating Oracle data sources into their large language model (LLM) applications. This feature bridges that gap, enabling:
Document and BaseDocumentLoader abstractions.
Adding Oracle database support ensures LangChain.js developers can access Oracle data without relying on external tools or libraries.
Key Features
Oracle Document Loader (OracleDocLoader)
.html).
Document objects.
Metadata Parsing with ParseOracleDocMetadata
<title> and <meta> tags.
File Reader with OracleDocReader
dbms_vector_chain package.
Integration with LangChain
BaseDocumentLoader to create a consistent API.
Validation and Security
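To make the metadata parsing feature listed above concrete, here is a minimal, self-contained sketch of pulling a title and name/content pairs out of <title> and <meta> tags. It is not the PR's ParseOracleDocMetadata implementation, which this description does not show. The names parseHtmlMetadata and readAttribute are hypothetical, and the regex-based approach is an assumption chosen to keep the example dependency-free.

```typescript
// Hypothetical helper for illustration only; this is NOT the PR's
// ParseOracleDocMetadata class, whose internals are not shown here.
type Metadata = Record<string, string>;

function parseHtmlMetadata(html: string): Metadata {
  const metadata: Metadata = {};

  // Capture the text between <title> and </title>, if present.
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (titleMatch) {
    metadata.title = titleMatch[1].trim();
  }

  // Visit each <meta ...> tag and read its name and content attributes.
  const metaTagPattern = /<meta\b[^>]*>/gi;
  for (const tag of html.match(metaTagPattern) ?? []) {
    const name = readAttribute(tag, "name");
    const content = readAttribute(tag, "content");
    if (name !== undefined && content !== undefined) {
      metadata[name] = content;
    }
  }

  return metadata;
}

function readAttribute(tag: string, attribute: string): string | undefined {
  // Matches attribute="value" or attribute='value'; unquoted values are skipped.
  const pattern = new RegExp(
    `\\b${attribute}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`,
    "i"
  );
  const match = pattern.exec(tag);
  if (!match) return undefined;
  return match[1] ?? match[2];
}
```

Calling parseHtmlMetadata on a page's HTML returns a flat object that could be attached to a Document as its metadata. A regex scan like this is only suitable for simple, well-formed input; a real HTML parser would be the safer choice for arbitrary pages.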
Changes Made
OracleDocLoader, ParseOracleDocMetadata, and OracleDocReader classes.
Cookbook format (in progress).
New Files
Source Code
langchain-ai/langchainjs/libs/langchain-community/src/document_loaders/web/oracleai.ts: Contains the implementations of OracleDocLoader, ParseOracleDocMetadata, and OracleDocReader.
Tests
langchain-ai/langchainjs/libs/langchain-community/src/document_loaders/tests/oracleai.test.ts:
ParseOracleDocMetadata.
OracleDocLoader.
Example Data
langchain-ai/langchainjs/libs/langchain-community/src/document_loaders/tests/example_data/oracleai/: Sample data for testing file and directory loading.
Tests and Coverage
Unit Tests
Metadata Parsing Tests:
<title> and <meta> tags.
Document Loading Tests:
Integration Tests
Documentation
API Documentation (in progress)
Community Engagement
Checklist
OracleDocLoader, ParseOracleDocMetadata, and OracleDocReader.
Notes
oracledb as a new dependency for Oracle database connectivity.
Thank you for your time and consideration. Please let me know if you’d like me to refine any part of this PR.