image_downloader_multiprocessing_python

This project uses multiprocessing to download images in batches with Python.

This saved me a lot of time while downloading images.

Installation

Clone the repository to your machine.

git clone https://github.com/nOOBIE-nOOBIE/image_downloader_multiprocessing_python

Install the requirements

pip install -r requirements.txt

Usage

python3 image_downloader.py <filename_with_urls_separated_by_newline.txt> <num_of_process>

This reads all the URLs in the text file and downloads them into a folder named after the file (for example, cats.txt downloads into cats). num_of_process is optional; by default, 10 processes are used.
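The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual image_downloader.py: the helper names (read_urls, output_dir_for, target_path, download) are hypothetical, and it assumes the output folder is created in the current directory and that each file is named after the last segment of its URL path.

```python
import os
import sys
import urllib.request
from multiprocessing import Pool
from urllib.parse import urlparse


def read_urls(list_path):
    """Return the non-empty, stripped lines of the URL list file."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def output_dir_for(list_path):
    """Assumed naming rule: cats.txt -> cats."""
    return os.path.splitext(os.path.basename(list_path))[0]


def target_path(url, out_dir):
    """Assumed naming rule: use the last segment of the URL path."""
    name = os.path.basename(urlparse(url).path) or "unnamed"
    return os.path.join(out_dir, name)


def download(job):
    """Fetch one URL; return (url, error) so one failure does not stop the batch."""
    url, out_dir = job
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            data = response.read()
        with open(target_path(url, out_dir), "wb") as handle:
            handle.write(data)
        return url, None
    except Exception as exc:  # network or disk error
        return url, str(exc)


def main(argv):
    if len(argv) < 2:
        print("usage: python3 sketch.py <urls.txt> [num_of_process]")
        return
    list_path = argv[1]
    # 10 is the default stated in the README.
    num_processes = int(argv[2]) if len(argv) > 2 else 10
    out_dir = output_dir_for(list_path)
    os.makedirs(out_dir, exist_ok=True)
    jobs = [(url, out_dir) for url in read_urls(list_path)]
    with Pool(num_processes) as pool:
        # imap_unordered yields each result as soon as its worker finishes.
        for url, error in pool.imap_unordered(download, jobs):
            print(f"Failed: {url} ({error})" if error else f"Download complete: {url}")


if __name__ == "__main__":
    main(sys.argv)
```

Returning the error from each worker instead of raising it keeps the pool running when a single URL is unreachable, which matters for batches of a thousand or more images.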

Makefile

╰─ make help
image_downloader_aio           download images with asynchronous version
image_downloader_mp            download images with multiprocessing version
nodejs_install                 install nodejs packages
nodejs_clean                   remove node_modules
nodejs_image_downloader        download images with node-js version
clean                          remove all venv, build, coverage and Python artifacts
img-export-dir                 create images export directory
clean-img                      remove images files
clean-pyc                      remove Python file artifacts (*.pyc,*.pyo,*~,__pycache__)

Example

python3 image_downloader.py cats.txt

[Image: cat images downloading]

Benchmark

1183 images in 121.99 seconds with 10 processes.

Asynchronous versus Multiprocessing

╰─ /usr/bin/time -v make image_downloader_mp
find cats -name '*.jpg' -exec rm -f {} +
Nb url images: 1183
MESSAGE: Running 10 process
Downloading: https://cdn.pixabay.com/photo/2017/06/12/19/02/cat-2396473__480.jpg
[...]
Download complete: https://cdn.pixabay.com/photo/2015/07/13/21/54/gray-cat-843916__480.jpg
        Command being timed: "make image_downloader_mp"
        User time (seconds): 39.94
        System time (seconds): 1.81
        Percent of CPU this job got: 113%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:36.74
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 72540
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 31988
        Voluntary context switches: 55631
        Involuntary context switches: 2240
        Swaps: 0
        File system inputs: 0
        File system outputs: 132160
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
╰─ /usr/bin/time -v make image_downloader_aio
find cats -name '*.jpg' -exec rm -f {} +
Nb url images: 1183
Downloading: https://cdn.pixabay.com/photo/2017/06/12/19/02/cat-2396473__480.jpg
[...]
Download complete: https://cdn.pixabay.com/photo/2014/10/29/22/12/cat-508665__480.jpg
        Command being timed: "make image_downloader_aio"
        User time (seconds): 6.26
        System time (seconds): 1.43
        Percent of CPU this job got: 54%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.24
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 60476
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 21267
        Voluntary context switches: 38882
        Involuntary context switches: 185
        Swaps: 0
        File system inputs: 0
        File system outputs: 132632
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
╰─ /usr/bin/time -v make nodejs_image_downloader
find cats -name '*.jpg' -exec rm -f {} +
0
1
2
3
[...]
52
51
50
End.
        Command being timed: "make nodejs_image_downloader"
        User time (seconds): 6.77
        System time (seconds): 2.02
        Percent of CPU this job got: 54%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.19
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 95616
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 23830
        Voluntary context switches: 35553
        Involuntary context switches: 192
        Swaps: 0
        File system inputs: 0
        File system outputs: 132936
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
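
In the runs above, the asynchronous version finished in 14.24 s of wall-clock time against 36.74 s for the multiprocessing version (roughly 2.6 times faster), with 6.26 s of user CPU time against 39.94 s. Downloads are I/O-bound, so a single process that keeps many requests in flight can avoid the overhead of separate worker processes. The sketch below illustrates that idea with the standard library only. It is not the repository's asynchronous script: download_all and fetch_bytes are hypothetical names, and the blocking urllib call is run in worker threads through asyncio.to_thread (Python 3.9+) rather than through a non-blocking HTTP client such as aiohttp, which would be the more usual choice.

```python
import asyncio
import urllib.request


def fetch_bytes(url):
    """Blocking download of one URL with the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


async def download_all(urls, fetch=fetch_bytes, max_concurrent=10):
    """Fetch every URL with at most max_concurrent requests in flight.

    Returns a dict mapping each URL to its bytes, or to the exception raised.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(url):
        async with semaphore:
            try:
                # Run the blocking call in a worker thread so the event loop stays free.
                return url, await asyncio.to_thread(fetch, url)
            except Exception as exc:
                return url, exc

    # gather preserves the order of the input URLs.
    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)
```

Calling asyncio.run(download_all(urls)) from a script returns once every URL has either been fetched or failed. The semaphore plays the role that num_of_process plays in the multiprocessing version: it caps how many downloads run at the same time.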

Images folder sample

cats