add docling and deepsearch-glm #28093

Closed
wants to merge 27 commits into from
27 commits
043e488 wip add docling and deepsearch-glm (hadim, Nov 3, 2024)
63359b4 use fasttext-for-deepsearch-glm instead of fasttext (hadim, Nov 4, 2024)
074605f Merge branch 'main' into docling-package (hadim, Nov 4, 2024)
f018e80 noarch new syntax (hadim, Nov 5, 2024)
2995845 Merge branch 'main' into docling-package (hadim, Nov 5, 2024)
8facd51 fix syntax (hadim, Nov 5, 2024)
2ffbcb8 Merge branch 'main' into docling-package (hadim, Nov 5, 2024)
0cd6d8b python_min fix (hadim, Nov 5, 2024)
255c058 Merge branch 'docling-package' of https://github.com/hadim/staged-rec… (hadim, Nov 5, 2024)
1d9c214 Merge branch 'main' into docling-package (hadim, Nov 6, 2024)
96ee6ea Merge branch 'main' into docling-package (hadim, Nov 7, 2024)
abf05c2 fix builds (hadim, Nov 7, 2024)
f94d0a6 py310 for docling as min version (hadim, Nov 7, 2024)
6db2f18 Merge branch 'main' into docling-package (hadim, Nov 11, 2024)
a08d2ad python_min (hadim, Nov 11, 2024)
1d2c503 disable windows for deepsearch-glm (hadim, Nov 11, 2024)
c1e19a3 python_min (hadim, Nov 11, 2024)
8fa3f9a Merge branch 'main' into docling-package (hadim, Nov 11, 2024)
e6232e4 python_min=3.10 for docling (hadim, Nov 11, 2024)
8f8c270 Merge branch 'main' into docling-package (hadim, Nov 11, 2024)
bbf4944 Update recipe.yaml (hadim, Nov 12, 2024)
8e068f4 Merge branch 'main' into docling-package (hadim, Nov 12, 2024)
2a137b3 Merge branch 'main' into docling-package (hadim, Nov 14, 2024)
c9651ef Merge branch 'main' into docling-package (hadim, Nov 16, 2024)
07b9ced Merge branch 'main' into docling-package (hadim, Nov 18, 2024)
a3d4d5c Merge branch 'main' into docling-package (hadim, Nov 18, 2024)
7a2732e Merge branch 'main' into docling-package (hadim, Nov 18, 2024)
16 changes: 16 additions & 0 deletions recipes/deepsearch-glm/fix-utfcpp.patch
@@ -0,0 +1,16 @@
diff --git a/cmake/extlib_utf8.git.cmake b/cmake/extlib_utf8.git.cmake
index f35e1e9..26c4d9c 100644
--- a/cmake/extlib_utf8.git.cmake
+++ b/cmake/extlib_utf8.git.cmake
@@ -4,9 +4,9 @@ message(STATUS "entering in extlib_utf8.cmake")
set(ext_name "utf8")

if(USE_SYSTEM_DEPS)
- find_package(utf8cpp REQUIRED)
+ # find_package(utf8cpp REQUIRED)
add_library(${ext_name} INTERFACE IMPORTED)
- add_dependencies(${ext_name} utf8cpp)
+ add_dependencies(${ext_name} utfcpp)

else()

78 changes: 78 additions & 0 deletions recipes/deepsearch-glm/recipe.yaml
@@ -0,0 +1,78 @@
context:
  name: deepsearch-glm
  version: 0.26.1

package:
  name: ${{ name|lower }}
  version: ${{ version }}

source:
  url: https://pypi.org/packages/source/${{ name[0] }}/${{ name }}/deepsearch_glm-${{ version }}.tar.gz
  sha256: c2938e99c4f9f48a8686d3c357778645ec76a78781c89d955720ef78502da830
  patches:
    - fix-utfcpp.patch

build:
  number: 0
  skip: win
  script:
    content: python -m pip install . -vv --no-deps --no-build-isolation
    env:
      USE_SYSTEM_DEPS: "on"

requirements:
  build:
    - if: build_platform != target_platform
      then:
        - python
        - cross-python_${{ target_platform }}

    - ${{ compiler('cxx') }}
    - ${{ compiler('c') }}
    - ${{ stdlib("c") }}
    - cmake
    - ${{ "make" if unix else "ninja" }}
  host:
    - python
    - poetry-core
    - pybind11 >=2.13.1
    - pip
    - fmt
    - cxxopts
    - nlohmann_json
    - loguru-cpp
    - utfcpp
    - fasttext-for-deepsearch-glm
    - json_schema_validator
    - pcre2
    - sentencepiece
    - pkg-config
    - zlib
  run:
    - python
    - docling-core >=2.0
    - tabulate >=0.8.9
    - numpy
    - pandas
    - python-dotenv >=1.0.0
    - tqdm >=4.64.0
    - rich >=13.7.0
    - docutils !=0.21
    - requests
    - ${{ "pywin32 >=305" if win }}

tests:
  - python:
      imports:
        - deepsearch_glm
      pip_check: false

about:
  summary: Create fast graph language models from converted PDF documents for knowledge extraction and Q&A.
  license: MIT
  license_file: LICENSE
  homepage: https://github.com/DS4SD/deepsearch-glm/

extra:
  recipe-maintainers:
    - hadim
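The `source` block above builds the sdist URL from the `context` variables (`name[0]` is the first letter of the project name) and pins the archive's SHA-256. When bumping `version`, both values have to be regenerated. The following is a minimal Python sketch, not part of this PR, of how one could reproduce them; `sdist_url` and `sha256_of` are hypothetical helper names, and replacing `-` with `_` in the archive name is an assumption that mirrors the hardcoded `deepsearch_glm-` prefix in this recipe.

```python
import hashlib
from pathlib import Path


def sdist_url(name: str, version: str) -> str:
    """Rebuild the PyPI sdist URL in the shape used by the recipe's `source.url`.

    Assumption: the archive file name uses the project name with '-' replaced
    by '_' (matching the hardcoded 'deepsearch_glm-' prefix in the recipe).
    """
    filename = f"{name.replace('-', '_')}-{version}.tar.gz"
    return f"https://pypi.org/packages/source/{name[0]}/{name}/{filename}"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a local file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Same context values as the deepsearch-glm and docling recipes.
    print(sdist_url("deepsearch-glm", "0.26.1"))
    print(sdist_url("docling", "2.3.1"))
```

If the downloaded archive is the one the recipe pins, `sha256_of(Path("deepsearch_glm-0.26.1.tar.gz"))` should return the value recorded under `source.sha256`.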
69 changes: 69 additions & 0 deletions recipes/docling/recipe.yaml
@@ -0,0 +1,69 @@
context:
  name: docling
  version: 2.3.1
  python_min: "3.10"

package:
  name: ${{ name|lower }}
  version: ${{ version }}

source:
  url: https://pypi.org/packages/source/${{ name[0] }}/${{ name }}/docling-${{ version }}.tar.gz
  sha256: f68a0f8a97e9f566b4a9140d854886577135e76ccfae2e899c318e57367ab12a

build:
  number: 0
  noarch: python
  script: python -m pip install . -vv --no-deps --no-build-isolation
  python:
    entry_points:
      - docling = docling.cli.main:app

requirements:
  host:
    - python ${{ python_min }}
    - poetry-core
    - pip
  run:
    - python >=${{ python_min }}
    - pydantic >=2.0.0
    - docling-core >=2.3.0
    - docling-ibm-models >=2.0.3
    - deepsearch-glm >=0.26.1
    - filetype >=1.2.0
    - pypdfium2 >=4.30.0
    - pydantic-settings >=2.3.0
    - huggingface_hub >=0.23
    - requests >=2.32.3
    - easyocr >=1.7.0
    - docling-parse >=2.0.2
    - certifi >=2024.7.4
    - rtree >=1.3.0
    - scipy >=1.14.1
    - pyarrow >=16.1.0
    - typer >=0.12.5
    - python-docx >=1.1.2
    - python-pptx >=1.0.2
    - beautifulsoup4 >=4.12.3
    - pandas >=2.1.4
    - marko >=2.1.2
  run_constraints:
    - tesserocr >=2.7.1

tests:
  - python:
      imports:
        - docling
      pip_check: false
  - script:
      - docling --help

about:
  summary: Docling PDF conversion package
  license: MIT
  license_file: LICENSE
  homepage: https://github.com/DS4SD/docling

extra:
  recipe-maintainers:
    - hadim
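The `build.python.entry_points` item `docling = docling.cli.main:app` asks the build to generate a `docling` console script, and the `docling --help` test then runs it. A minimal sketch of how a `name = module:attr` string of this kind maps onto the standard library's `importlib.metadata.EntryPoint` is shown below. `parse_entry_point` is a hypothetical helper, not something the recipe or rattler-build exposes, and the demo resolves a stdlib target (`json:dumps`) instead of docling so it runs without the package installed.

```python
import json
from importlib.metadata import EntryPoint


def parse_entry_point(spec: str, group: str = "console_scripts") -> EntryPoint:
    """Turn 'name = module:attr' into an EntryPoint (hypothetical helper).

    A console script needs a callable, so this helper requires the ':attr' part.
    """
    name, sep, value = spec.partition("=")
    if not sep or not name.strip() or ":" not in value:
        raise ValueError(f"expected 'name = module:attr', got {spec!r}")
    return EntryPoint(name=name.strip(), value=value.strip(), group=group)


if __name__ == "__main__":
    ep = parse_entry_point("docling = docling.cli.main:app")
    # A generated `docling` script would import ep.module and call ep.attr.
    print(ep.name, ep.module, ep.attr)

    # load() imports the module and resolves the attribute; a stdlib target
    # is used here so the example does not need docling installed.
    dumps = parse_entry_point("dumps = json:dumps").load()
    print(dumps is json.dumps)
```

Running the script prints the script name, module path and attribute parsed from the recipe's entry point, followed by `True` for the stdlib lookup.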