A solution for localizing images #13

jingyig01 · 2021-04-20T03:19:26Z

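Hi,
I used your code and successfully saved a batch of valuable material. Thank you for your work.
Regarding "collect all images that appear on the page, save them locally, and embed all images into the HTML", my approach is as follows:
First use tiebaImageGet to download the thread's images into a local folder (named after the thread's PID), then modify the image src attributes in the HTML file. Images downloaded this way are Tieba thumbnails, which avoids the excessive memory usage caused by the browser loading all the original images at once.

The Python code is as follows: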

import os
import re

from bs4 import BeautifulSoup


def modify_src(folder_path, file_name):
    file_path = os.path.join(folder_path, file_name)

    # Close the file once parsing is done
    with open(file_path, encoding="utf-8") as file:
        soup = BeautifulSoup(file, "html.parser")

    # Some links are http, so accept both schemes
    thread_link = re.compile(r"^https?://tieba\.baidu\.com/p/")
    url = [elm.get_text() for elm in soup.find_all("a", href=thread_link)]
    if not url:
        raise ValueError("no thread link found in " + file_path)

    # get pid (assumes the link text is the thread URL and the PID is its last 10 characters)
    pid = url[0][-10:]

    # modify image src
    # unmodified src: https://imgsa.baidu.com/forum/w%3D580/sign=4d3033fbbdde9c82a665f9875c8080d2/4417d558ccbf6c815f62fb2ab23eb13532fa4035.jpg
    # modified: ./img/6233150605/09d6a94bd11373f0a6c6bb5daa0f4bfbf9ed0488.jpg
    # pattern: ./img/pid/img_name
    # img_name: img["src"][-44:]
    # unmodified emoticon src: https://gsp0.baidu.com/5aAHeD3nKhI2p27j8IqW0jdnxx1xbK/tb/editor/images/client/image_emoticon72.png
    # modified: ../tieba_emoticon/image_emoticon72.png
    for img in soup.find_all("img", src=True):
        if img["src"].endswith(".jpg"):
            img["src"] = "./img/" + pid + "/" + img["src"][-44:]
        elif img["src"].endswith(".png"):
            emoticon_name = img["src"].split("/")[-1]
            img["src"] = "../tieba_emoticon/" + emoticon_name

    with open(file_path, "w", encoding="utf-8") as file:
        file.write(str(soup))
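
The path-rewriting rule in the function above can also be sketched as standalone helpers that use only the standard library, which makes the rule easy to check without an HTML file. The names `extract_pid` and `local_src` are hypothetical and not part of the original post. This sketch assumes the PID is the run of digits after `/p/` in the thread URL, and that the image name is the last segment of the URL path (rather than a fixed 44-character suffix).

```python
import posixpath
import re
from urllib.parse import urlsplit

# Hypothetical helper: pull the thread PID out of a Tieba thread URL.
# Assumes the PID is the digits that follow "/p/"; returns None otherwise.
def extract_pid(thread_url):
    match = re.search(r"^https?://tieba\.baidu\.com/p/(\d+)", thread_url)
    return match.group(1) if match else None


# Hypothetical helper: map a remote image src to the local layout used above.
#   *.jpg -> ./img/<pid>/<file name>
#   *.png -> ../tieba_emoticon/<file name>
# Any other src is returned unchanged.
def local_src(src, pid):
    # Assumption: the file name is the last segment of the URL path.
    name = posixpath.basename(urlsplit(src).path)
    if name.endswith(".jpg"):
        return "./img/" + pid + "/" + name
    if name.endswith(".png"):
        return "../tieba_emoticon/" + name
    return src
```

With helpers like these, the loop over `img` tags would reduce to `img["src"] = local_src(img["src"], pid)`. Taking the last path segment instead of slicing a fixed number of characters should also tolerate file names of a different length, though that case has not been verified against real Tieba pages.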

Emoticon files used: tieba_emoticon.zip

Best regards,
Jingyi

hjhee · 2022-09-11T11:16:43Z

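The method you describe is very clear; I'll look into it when a suitable opportunity comes up. For now, directories are created directly following the structure of the original URL.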
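
One possible reading of "following the structure of the original URL" is sketched below: the host and path segments of the image URL become nested local directories. This is an illustration only and has not been checked against how the project actually lays out its files. The function name `mirror_path` and the default root directory `img` are assumptions.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical helper: build a local relative path that mirrors the URL.
# Assumption: the host name is used as the first directory level.
def mirror_path(url, root="img"):
    parts = urlsplit(url)
    # Drop empty, "." and ".." segments so the result stays under root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    return posixpath.join(root, parts.netloc, *segments)
```

A layout like this needs no per-thread PID and keeps file names unique as long as the remote URLs are unique, at the cost of deeper directory trees.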
