MacOS M1 Chip Failed to use #1197
Comments
It runs well for me on an M4 machine with macOS 15.1.1. My guess is that an unstable PyPI mirror source caused some dependencies to fail during installation. I recommend recreating the conda environment and using a mirror source for the installation.
Finally, I succeeded with this output 🤗! MacOS does not support NVIDIA, so I guess the problem is solved. The problem seems to have been caused by an environment variable that I had set by mistake:
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries
or simply type
Thanks again for this wonderful tool. I will close this issue.
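A minimal sketch of how one might guard against this in Python before launching magic-pdf, assuming the only problem is the OpenGL entry in DYLD_LIBRARY_PATH quoted above. The helper names clean_dyld_path and sanitized_env are hypothetical and are not part of MinerU.

```python
import os

# Path quoted in the comment above.
OPENGL_LIBS = "/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries"


def clean_dyld_path(value, unwanted=OPENGL_LIBS):
    """Return `value` without the unwanted entry and without empty entries.

    Hypothetical helper for illustration only.
    """
    parts = [p for p in value.split(":") if p and p.rstrip("/") != unwanted]
    return ":".join(parts)


def sanitized_env(environ=None):
    """Copy the environment, dropping the OpenGL entry from DYLD_LIBRARY_PATH."""
    env = dict(os.environ if environ is None else environ)
    cleaned = clean_dyld_path(env.get("DYLD_LIBRARY_PATH", ""))
    if cleaned:
        env["DYLD_LIBRARY_PATH"] = cleaned
    else:
        # Nothing left after cleaning, so remove the variable entirely.
        env.pop("DYLD_LIBRARY_PATH", None)
    return env
```

The returned dictionary could then be passed as the env argument of subprocess.run when calling magic-pdf -h, which leaves the shell configuration itself untouched.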
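Sharing a lesson from my own mistake: I use VS Code and did not notice that the Python environment selected in the bottom-right corner was 3.9. Because of that, pip install pdf[all] or [all-cpu] never went smoothly, and since Python 3.9 does not support detectron2, every method I tried raised errors. Then I suddenly noticed the 3.9, switched to 3.10, and it ran immediately. So be sure to confirm first that your environment is Python 3.10. It is simple: python --version will show it.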
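As a rough sketch, the same check can be done from inside the interpreter that the editor actually selected, which avoids confusion when python on the command line and the editor's environment differ. The function names here are hypothetical, and the 3.10 requirement is taken from the comment above rather than from official documentation.

```python
import sys

# Version reported as working in the comment above (an assumption, not an official requirement).
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major.minor version matches `required` exactly."""
    return tuple(version_info[:2]) == tuple(required)


def describe_interpreter():
    """Return a one-line description of the running interpreter."""
    version = ".".join(str(n) for n in sys.version_info[:3])
    status = "ok" if is_supported() else "unexpected version"
    return f"{sys.executable} (Python {version}): {status}"
```

Calling print(describe_interpreter()) in the editor's terminal shows both the interpreter path and its version, so a wrongly selected environment is visible at a glance.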
I tried to follow the workaround from this issue but I can't get it working. To use MinerU for my current project I need it integrated and running on Apple silicon; MPS support would be nice. For now I will stick with marker with an LLM, as it runs on MPS and the run time per scientific paper is less than 1 minute on my MacBook. If you have a better solution for the script, please share it:

import os
import subprocess
import time
import shutil
from termcolor import cprint
from tqdm import tqdm
import platform

# Constants
PDF_DIR = "pdfs"
OUTPUT_DIR = "markdown_output"


def setup_mac_environment():
    """Setup environment variables for M-series Macs"""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        cprint("M-series Mac detected, configuring OpenGL paths...", "cyan")
        opengl_path = "/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries"
        current_path = os.environ.get("DYLD_LIBRARY_PATH", "")
        if opengl_path not in current_path:
            os.environ["DYLD_LIBRARY_PATH"] = f"{current_path}:{opengl_path}" if current_path else opengl_path


def convert_pdf_to_markdown(pdf_path, output_dir):
    try:
        cprint(f"\nStarting conversion of {pdf_path}...", "yellow")

        # Get base filename without extension
        base_name = os.path.splitext(os.path.basename(pdf_path))[0]
        temp_output_dir = os.path.join(output_dir, base_name)
        final_output_path = os.path.join(output_dir, f"{base_name}.md")

        # Remove existing output
        if os.path.exists(temp_output_dir):
            shutil.rmtree(temp_output_dir)
        if os.path.exists(final_output_path):
            os.remove(final_output_path)

        # Convert PDF using magic-pdf
        cprint("Converting with text mode...", "cyan")
        process = subprocess.Popen(
            ["magic-pdf", "-p", pdf_path, "-o", output_dir, "-m", "txt"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True
        )
        stdout, stderr = process.communicate()
        if process.returncode != 0:
            raise Exception(f"Command failed with error: {stderr}")

        # Wait a bit for file system
        time.sleep(1)

        # Check for output in the expected directory structure
        txt_dir = os.path.join(temp_output_dir, "txt")
        if not os.path.exists(txt_dir):
            raise Exception("Output directory structure not found")

        # Combine all text files into one markdown file
        with open(final_output_path, "w", encoding="utf-8") as outfile:
            # Write metadata
            outfile.write(f"# {base_name}\n\n")
            # Process text files in order
            text_files = sorted([f for f in os.listdir(txt_dir) if f.endswith(".txt")])
            for text_file in text_files:
                with open(os.path.join(txt_dir, text_file), "r", encoding="utf-8") as infile:
                    outfile.write(infile.read() + "\n\n")

        # Clean up temporary directory
        shutil.rmtree(temp_output_dir)

        final_size = os.path.getsize(final_output_path)
        cprint(f"\nSuccessfully converted {pdf_path} to {final_output_path} (Size: {final_size/1024:.2f}KB)", "green")
        return True
    except Exception as e:
        cprint(f"\nError converting {pdf_path}: {str(e)}", "red")
        return False


def main():
    try:
        # Setup environment for M-series Macs
        setup_mac_environment()

        # Create output directory if it doesn't exist
        os.makedirs(OUTPUT_DIR, exist_ok=True)

        # Get list of PDF files
        pdf_files = [f for f in os.listdir(PDF_DIR) if f.endswith('.pdf')]
        if not pdf_files:
            cprint("No PDF files found in the pdfs directory!", "red")
            return
        cprint(f"Found {len(pdf_files)} PDF files to convert", "cyan")

        # Process each PDF
        success_count = 0
        for i, pdf_file in enumerate(pdf_files, 1):
            cprint(f"\nProcessing file {i}/{len(pdf_files)}", "cyan")
            pdf_path = os.path.join(PDF_DIR, pdf_file)
            if convert_pdf_to_markdown(pdf_path, OUTPUT_DIR):
                success_count += 1

        cprint(f"\nConversion complete! Successfully converted {success_count}/{len(pdf_files)} files", "green")
    except Exception as e:
        cprint(f"\nAn error occurred: {str(e)}", "red")


if __name__ == "__main__":
    main()
Description of the bug | 错误描述
Hello Xiaomeng,
Thanks very much for providing this wonderful tool "MinerU" which really helps us a lot. All my teammates acknowledge that it is the most powerful, versatile and easy to use PDF parsing tool.
Unfortunately, I ran into a problem when using it on a MacOS M1 machine running Sonoma 14.6.1 (on a Windows platform, we integrated it into our project successfully). As of Dec 5th, 2024, the latest version of magic-pdf is 0.10.5. After installing it according to the official tutorial https://github.com/opendatalab/MinerU (with a separate conda Python 3.10 environment, and with no trouble during pip install or model downloading), I got the following problem when I tried to run
magic-pdf -h
According to the error information, I checked the following folders:
"Libraries Resources _CodeSignature"
Thus, I guess the problem arises from OpenGL support on MacOS M1? But I am not sure what the underlying cause is (forgive me, I am not familiar with computer vision or the OpenGL library). I would really appreciate it if you could provide some help.
By the way, I also found another solved issue, 273, about using MinerU on MacOS Sonoma. Luckily, I succeeded with the version "0.6.2b1":
which gives me exactly the version 0.6.2b1.
However, version 0.6.2b1 seems to be deprecated, and its shell command is incompatible with the latest version 0.10.5. Thus, I really hope to use the latest version 0.10.5 for its more advanced support.
If MinerU does not support the latest version 0.10.5 on MacOS M1, could you give me a more detailed API doc about the usage of the old magic-pdf pdf-command, so that I can work with it in a similar way to the new command magic-pdf -p pdf_path -o output_folder -m auto? For example, how to use ~/magic-pdf.json to change the model directory or enable table-config as in version 0.10.5, since magic-pdf pdf-command requires us to provide the argument --model PATH (which model?)
magic-pdf pdf-command
Thanks a lot.
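For anyone looking at the same question, here is a minimal sketch for inspecting what ~/magic-pdf.json currently contains. The key names "models-dir", "device-mode" and "table-config" are assumptions based on my understanding of the 0.10.x config template and should be checked against the template shipped with your installed version; the function names are hypothetical and not part of magic-pdf.

```python
import json
from pathlib import Path

# Assumed key names; verify them against the magic-pdf.json template for your version.
ASSUMED_KEYS = ("models-dir", "device-mode", "table-config")


def summarize_config(text, keys=ASSUMED_KEYS):
    """Parse JSON text and return {key: value or None} for the keys of interest."""
    data = json.loads(text)
    return {key: data.get(key) for key in keys}


def load_summary(path=None):
    """Summarize the config file; returns None when the file does not exist."""
    path = Path(path) if path is not None else Path.home() / "magic-pdf.json"
    if not path.exists():
        return None
    return summarize_config(path.read_text(encoding="utf-8"))
```

A key that comes back as None is simply absent from the file, which may indicate that the config was written for a different version.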
How to reproduce the bug | 如何复现
Just follow the official guide on MacOS M1:
And the models have been downloaded successfully.
Operating system | 操作系统
MacOS
Python version | Python 版本
3.10
Software version | 软件版本 (magic-pdf --version)
0.10.x
Device mode | 设备模式
cpu