Commit

add docling
baniasbaabe committed Nov 24, 2024
1 parent c10df78 commit bcbefd3
Showing 2 changed files with 43 additions and 1 deletion.
2 changes: 1 addition & 1 deletion book/cooltools/Chapter.ipynb
@@ -2393,7 +2393,7 @@
"source": [
"Transform any Python object into a CLI with `fire`.\n",
"\n",
- "`fire` is a neat library for turn your Python object into a CLI and to make the transition between Python and Bash easier."
+ "`fire` is a neat library for turning your Python object into a CLI and making the transition between Python and Bash easier."
]
},
{
42 changes: 42 additions & 0 deletions book/llm/Chapter.ipynb
@@ -489,6 +489,48 @@
"\n",
"embeddings = list(embedding_model.embed(documents))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convert Files to Markdown & JSON with `docling`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Preparing your data for LLMs is a crucial step in RAG applications.\n",
"\n",
"`docling` simplifies this step by converting popular document formats such as PDF or PPTX to Markdown or JSON.\n",
"\n",
"It uses two models, a layout analysis model and a table structure recognition model, to process the files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install docling"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from docling.document_converter import DocumentConverter\n",
"\n",
"source = \"https://arxiv.org/pdf/2408.09869\"  # local file path or URL\n",
"converter = DocumentConverter()\n",
"result = converter.convert(source)\n",
"print(result.document.export_to_markdown())  # Markdown export\n",
"# Output: \"## Docling Technical Report[...]\""
]
}
],
"metadata": {
Expand Down
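
Note on the new `docling` section: the markdown cell mentions both Markdown and JSON output, but the code cell only exports Markdown. A minimal sketch of writing both is below. It assumes the converted document exposes `export_to_dict()` alongside `export_to_markdown()`, as docling's README showed at the time of writing, so check this against the installed version. `save_exports`, the `stem` parameter, and the output file names are hypothetical and not part of docling.

```python
import json
from pathlib import Path


def save_exports(document, out_dir, stem="document"):
    """Write a converted document to <stem>.md and <stem>.json inside out_dir.

    Assumption: `document` provides export_to_markdown() -> str and
    export_to_dict() -> dict, like the `result.document` object in the cell above.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    md_file = out_path / f"{stem}.md"
    json_file = out_path / f"{stem}.json"

    # Markdown export: handy for chunking and embedding in a RAG pipeline.
    md_file.write_text(document.export_to_markdown(), encoding="utf-8")

    # JSON export: serialize the dictionary form of the document.
    json_file.write_text(
        json.dumps(document.export_to_dict(), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return md_file, json_file


# Usage with docling (requires `pip install docling` and network access):
#   from docling.document_converter import DocumentConverter
#   result = DocumentConverter().convert("https://arxiv.org/pdf/2408.09869")
#   save_exports(result.document, "docling_output")
```

Because the helper depends only on the two export methods, it can be tried with a stand-in object before docling and its models are downloaded.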
