The project was initially developed with a default environment of Linux servers, so running it directly on a Windows machine can be challenging. After encountering some issues, we have compiled a list of problems that might arise on Windows and documented them in this guide. Since the Windows environment is highly fragmented, not all solutions provided here may apply to your specific setup. If you have any questions, please raise them in an issue.
- CPU Environment: for CPU-only usage
- GPU Environment: if you need CUDA acceleration
To run the project smoothly on Windows, perform the following preparations:
- Install ImageMagick:
- Modify configurations:
  - PDF-Extract-Kit/pdf_extract.py:L148: adjust `batch_size` to suit your GPU memory; lower it when you run into an out-of-memory (OOM) error. The relevant line is `dataloader = DataLoader(dataset, batch_size=64, num_workers=0)` (see the sketch after this list).
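For example, on a GPU with limited VRAM the edited line might look like the sketch below; 32 is only an illustrative value (the GPU notes later suggest 64 or 32), and `dataset` is the object already constructed in pdf_extract.py:

```python
# PDF-Extract-Kit/pdf_extract.py, around L148:
# a smaller batch_size lowers peak GPU memory use; num_workers=0 is unchanged.
dataloader = DataLoader(dataset, batch_size=32, num_workers=0)
```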
Use either venv or conda, with Python version recommended as 3.10.
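For example, with conda (the environment name is only an illustration):

```bash
conda create -n pdf-extract-kit python=3.10
conda activate pdf-extract-kit
```

Then install the dependencies: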
```bash
pip install -r requirements+cpu.txt
# For detectron2, compile it yourself as per https://github.com/facebookresearch/detectron2/issues/5114
# or use our precompiled wheel:
pip install https://github.com/opendatalab/PDF-Extract-Kit/raw/main/assets/whl/detectron2-0.6-cp310-cp310-win_amd64.whl
```
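Optionally, confirm that detectron2 imports correctly; it should print the installed version (0.6 for the wheel above):

```bash
python -c "import detectron2; print(detectron2.__version__)"
```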
Set the device to CPU in both configuration files:
- PDF-Extract-Kit/configs/model_configs.yaml:L2: `device: cpu`
- PDF-Extract-Kit/modules/layoutlmv3/layoutlmv3_base_inference.yaml:L72: `DEVICE: cpu`
```bash
python pdf_extract.py --pdf assets/examples/example.pdf
```
- Recommended: CUDA 11.8 and cuDNN 8.7.0 (test other versions yourself if needed); a quick way to verify the installation is shown after this list.
  - CUDA 11.8: https://developer.nvidia.com/cuda-11-8-0-download-archive
  - cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x: https://developer.nvidia.com/rdp/cudnn-archive
- Ensure your GPU has adequate memory, with a minimum of 8GB recommended; ideally, 16GB or more is preferred.
- If the GPU memory is less than 16GB, lower the `batch_size` from the preparation steps above as needed, e.g. to 64 or 32.
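Once CUDA and cuDNN are installed, you can sanity-check the setup from a terminal (optional; `nvcc` is only on PATH if the CUDA installer added it there):

```bash
nvidia-smi        # driver, visible GPUs, and the highest CUDA version the driver supports
nvcc --version    # installed CUDA toolkit version, which should report 11.8
```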
Use either venv or conda, with Python version recommended as 3.10.
```bash
pip install -r requirements+cpu.txt
# For detectron2, compile it yourself as per https://github.com/facebookresearch/detectron2/issues/5114
# or use our precompiled wheel:
pip install https://github.com/opendatalab/PDF-Extract-Kit/raw/main/assets/whl/detectron2-0.6-cp310-cp310-win_amd64.whl
# For GPU usage, reinstall PyTorch with CUDA support:
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
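After the reinstall, you can verify that the CUDA build of PyTorch is active; the command should print a version ending in `+cu118` followed by `True`:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```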
Set the device to CUDA in both configuration files:
- PDF-Extract-Kit/configs/model_configs.yaml:L2: `device: cuda`
- PDF-Extract-Kit/modules/layoutlmv3/layoutlmv3_base_inference.yaml:L72: `DEVICE: cuda`
```bash
python pdf_extract.py --pdf assets/examples/example.pdf
```
If your GPU has 16GB of VRAM or more, you can install paddlepaddle-gpu with the following command; OCR acceleration is enabled automatically once it is installed:
```bash
pip install paddlepaddle-gpu==2.6.1
```
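To verify the GPU build of PaddlePaddle, you can optionally run its built-in self-check:

```bash
python -c "import paddle; paddle.utils.run_check()"
```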