Fix dependency conflicts for docker building (#501)
* try to fix the scipy installation error
* remove numpy<2
* add numpy<2; remove pyarrow version limit
* update dep versions
* specify versions for some deps
* remove version limit for pandas and numpy
* reorganize installation order
* remove version limits
* update readme
HYLcool authored Nov 28, 2024
1 parent 4f0f16c commit 6766316
Showing 4 changed files with 24 additions and 13 deletions.
10 changes: 3 additions & 7 deletions Dockerfile
@@ -14,7 +14,7 @@ RUN apt-get update \

# install 3rd-party system dependencies
RUN apt-get update \
- && apt-get install ffmpeg libsm6 libxext6 software-properties-common build-essential cmake -y
+ && apt-get install ffmpeg libsm6 libxext6 software-properties-common build-essential cmake gfortran libopenblas-dev liblapack-dev -y

# prepare the java env
WORKDIR /opt
@@ -33,11 +33,7 @@ WORKDIR /data-juicer
RUN pip install --upgrade setuptools==69.5.1 setuptools_scm \
&& pip install git+https://github.com/xinyu1205/recognize-anything.git --default-timeout 1000

- # install requirements first to better reuse installed library cache
- COPY environments/ environments/
- RUN cat environments/* | grep -v '^#' | xargs pip install --default-timeout 1000
-
# install data-juicer then
COPY . .
- RUN pip install -v -e .[all]
- RUN pip install -v -e .[sandbox]
+ RUN pip install -v -e .[all] --default-timeout 1000
+ RUN pip install -v -e .[sandbox] --default-timeout 1000
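
With the separate `environments/` install step removed, the versions of numpy, scipy, pandas and pyarrow in the image are whatever pip resolves during `pip install -e .[all]`. A minimal sketch for checking what was actually installed, assuming it is run with the image's Python interpreter; the helper name `installed_versions` and the package list are illustrative and not part of this commit.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package list chosen for illustration; adjust to the dependencies of interest.
    for name, version in installed_versions(["numpy", "scipy", "pandas", "pyarrow"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output between two image builds is one way to see whether an unpinned dependency has moved.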
10 changes: 9 additions & 1 deletion README.md
@@ -163,7 +163,7 @@ Table of Contents

## Prerequisites

- - Recommend Python>=3.8,<=3.10
+ - Recommend Python>=3.9,<=3.10
- gcc >= 5 (at least C++14 support)

## Installation
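
The raised lower bound in the prerequisites can be checked at runtime before installing. A minimal sketch, assuming `<=3.10` is meant to include every 3.10.x patch release (the README does not say); the function name `python_supported` is hypothetical.

```python
import sys


def python_supported(version_info=None, lowest=(3, 9), highest=(3, 10)):
    """Return True if the (major, minor) version lies in the recommended range."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor, so 3.10.x counts as within "<=3.10" (assumption).
    major_minor = tuple(version_info[:2])
    return lowest <= major_minor <= highest


if __name__ == "__main__":
    if not python_supported():
        print("Warning: this Python version is outside the recommended 3.9 to 3.10 range")
```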
@@ -386,6 +386,10 @@ python tools/sandbox_starter.py --config configs/demo/sandbox/sandbox.yaml
```shell
# run the data processing directly
docker run --rm \ # remove container after the processing
+ --privileged \
+ --shm-size 256g \
+ --network host \
+ --gpus all \
--name dj \ # name of the container
-v <host_data_path>:<image_data_path> \ # mount data or config directory into the container
-v ~/.cache/:/root/.cache/ \ # mount the cache directory into the container to reuse caches and models (recommended)
@@ -398,6 +402,10 @@ docker run --rm \ # remove container after the processing
```shell
# start the container
docker run -dit \ # run the container in the background
+ --privileged \
+ --shm-size 256g \
+ --network host \
+ --gpus all \
--rm \
--name dj \
-v <host_data_path>:<image_data_path> \
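
The flags added to the README's `docker run` examples can also be assembled programmatically. This is a sketch only: the image name and the command to run are cut off in the hunks above, so they are taken as parameters here, and the function name `docker_run_argv` is hypothetical. It covers the `--rm` variant from the first hunk.

```python
import os
import shlex


def docker_run_argv(image, host_data_path, image_data_path, command=()):
    """Build an argv list for `docker run` with the flags shown in the README hunks."""
    cache_dir = os.path.expanduser("~/.cache/")
    return [
        "docker", "run", "--rm",
        "--privileged",
        "--shm-size", "256g",
        "--network", "host",
        "--gpus", "all",
        "--name", "dj",
        "-v", f"{host_data_path}:{image_data_path}",
        "-v", f"{cache_dir}:/root/.cache/",  # reuse caches and models across runs
        image,
        *command,
    ]


if __name__ == "__main__":
    # "example/image:tag" and both paths are placeholders, not values from the commit.
    argv = docker_run_argv("example/image:tag", "/path/on/host", "/path/in/container")
    print(shlex.join(argv))
```

Building the list in code also sidesteps the annotated form used in the README, where a comment follows each `\`; that form is for reading, and the comments would need to be dropped before pasting into a shell.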
10 changes: 9 additions & 1 deletion README_ZH.md
@@ -144,7 +144,7 @@ Data-Juicer正在积极更新和维护中,我们将定期强化和新增更多

## 前置条件

- * 推荐 Python>=3.8,<=3.10
+ * 推荐 Python>=3.9,<=3.10
* gcc >= 5 (at least C++14 support)

## 安装
@@ -363,6 +363,10 @@ python tools/sandbox_starter.py --config configs/demo/sandbox/sandbox.yaml
```shell
# 直接运行数据处理
docker run --rm \ # 在处理结束后将容器移除
+ --privileged \
+ --shm-size 256g \
+ --network host \
+ --gpus all \
--name dj \ # 容器名称
-v <host_data_path>:<image_data_path> \ # 将本地的数据或者配置目录挂载到容器中
-v ~/.cache/:/root/.cache/ \ # 将 cache 目录挂载到容器以复用 cache 和模型资源(推荐)
@@ -375,6 +379,10 @@ docker run --rm \ # 在处理结束后将容器移除
```shell
# 启动容器
docker run -dit \ # 在后台启动容器
+ --privileged \
+ --shm-size 256g \
+ --network host \
+ --gpus all \
--rm \
--name dj \
-v <host_data_path>:<image_data_path> \
7 changes: 3 additions & 4 deletions environments/minimal_requires.txt
@@ -1,7 +1,7 @@
- fsspec==2023.5.0
- pyarrow<=12.0.0
- pandas==2.0.3
datasets>=2.19.0
+ fsspec==2023.5.0
+ pandas
+ numpy
av
soundfile
librosa>=0.10
@@ -27,6 +27,5 @@ dill==0.3.4
psutil
pydantic>=2.0
Pillow
- numpy<2
fastapi[standard]>=0.100
httpx
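
After this change the requirements file mixes exact pins (`fsspec==2023.5.0`), lower bounds (`datasets>=2.19.0`) and unconstrained names (`pandas`, `numpy`). A minimal sketch for listing which is which, assuming one simple requirement per line; it is a regex simplification and not a full PEP 508 parser, and `classify_requirements` is a hypothetical helper name.

```python
import re

# Distribution name, optional [extras], then whatever specifier text follows.
_REQ = re.compile(r"^([A-Za-z0-9._-]+)(?:\[[^\]]*\])?\s*(.*)$")


def classify_requirements(lines):
    """Label each requirement as 'exact', 'bounded', or 'unconstrained'."""
    result = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _REQ.match(line)
        if match is None:
            continue
        name, spec = match.group(1), match.group(2).strip()
        if spec.startswith("=="):
            result[name] = "exact"
        elif spec:
            result[name] = "bounded"
        else:
            result[name] = "unconstrained"
    return result


if __name__ == "__main__":
    # Assumes the repository root as the working directory.
    with open("environments/minimal_requires.txt") as fh:
        for name, kind in classify_requirements(fh).items():
            print(f"{kind:14} {name}")
```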
