The PlantVillage dataset consists of images of healthy and infected plant leaves from 14 different species. There are 38 classes, of which 12 correspond to healthy plants and 26 to infected plants.
The names of the species are:
Apple, Blueberry, Cherry, Corn, Grape, Orange, Peach, Bell Pepper, Potato, Raspberry, Soybean, Squash, Strawberry, and Tomato.
The names of the 38 classes are:
- Apple Scab
- Apple Black Rot
- Apple Cedar Rust
- Apple healthy
- Blueberry healthy
- Cherry healthy
- Cherry Powdery Mildew
- Corn Gray Leaf Spot
- Corn Common Rust
- Corn healthy
- Corn Northern Leaf Blight
- Grape Black Rot
- Grape Black Measles
- Grape Leaf Blight
- Grape healthy
- Orange Huanglongbing
- Peach Bacterial Spot
- Peach healthy
- Bell Pepper Bacterial Spot
- Bell Pepper healthy
- Potato Early Blight
- Potato healthy
- Potato Late Blight
- Raspberry healthy
- Soybean healthy
- Squash Powdery Mildew
- Strawberry healthy
- Strawberry Leaf Scorch
- Tomato Bacterial Spot
- Tomato Early Blight
- Tomato Late Blight
- Tomato Leaf Mold
- Tomato Septoria Leaf Spot
- Tomato Two Spotted Spider Mite
- Tomato Target Spot
- Tomato Mosaic Virus
- Tomato Yellow Leaf Curl Virus
- Tomato healthy
The background class is omitted. For detailed information about the classes, please refer to "An open access repository of images on plant health to enable the development of mobile disease diagnostics" [1].
The dataset we used to derive our versions of the data can be downloaded from [2] and is available under the CC0 1.0 license. The total number of images is 54305. A sample image from each class of PlantVillage-full (256x256) is shown as:
The original dataset does not contain separate train and test splits. We make our versions available under the same license, i.e., CC0 1.0. About 20% of the images from each class are reserved for the test split and the remainder is used for the training split. The total number of training images is 43456 and the number of test images is 10849. These can be downloaded from plant-full.zip.
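Reserving a fixed fraction of each class for the test split can be sketched as follows. This is a minimal illustration, not the procedure used to produce the released splits: the function name `stratified_split`, the seed, and the rounding rule are assumptions, and the default fraction of 0.2 simply reflects the ratio of the image counts reported above (10849 of 54305).

```python
import numpy as np


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class.

    Hypothetical helper for illustration; not part of this repository.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Indices of all samples belonging to the current class.
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return (np.sort(np.array(train_idx, dtype=int)),
            np.sort(np.array(test_idx, dtype=int)))


# Toy example: three classes with 10, 20 and 5 samples.
toy_labels = np.repeat([0, 1, 2], [10, 20, 5])
train_idx, test_idx = stratified_split(toy_labels)
```

Splitting per class rather than over the whole dataset keeps the class proportions of the test split close to those of the training split, which matters here because the classes differ in size.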
The frequency of training (in blue) and test (in orange) images is shown in the histogram:
We have downsampled the original plant leaf images from 256x256 to 32x32, 64x64, and 96x96.
All three downsampled variants are used in our papers [3,4].
They can be downloaded from releases and loaded using numpy as:
import numpy as np
file_path = ''  # path to plant32.npz, plant64.npz, or plant96.npz
npzfile = np.load(file_path)
train_images, train_labels = npzfile['train_images'], npzfile['train_labels']
test_images, test_labels = npzfile['test_images'], npzfile['test_labels']
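After loading, it can be useful to check the array shapes and the number of images per class. The sketch below does this for the four keys used above. The helper name `summarize_npz` is hypothetical, and the code assumes that the label arrays hold non-negative integer class indices; if they turn out to be one-hot encoded, apply `argmax` over the class axis first.

```python
import numpy as np

KEYS = ("train_images", "train_labels", "test_images", "test_labels")


def summarize_npz(file_path):
    """Return shapes, dtypes and per-class counts for the four arrays.

    Hypothetical helper; assumes labels are integer class indices.
    """
    # NpzFile is a context manager, so the archive is closed after reading.
    with np.load(file_path) as npzfile:
        arrays = {key: npzfile[key] for key in KEYS}
    summary = {key: (arr.shape, str(arr.dtype)) for key, arr in arrays.items()}
    for split in ("train", "test"):
        labels = arrays[f"{split}_labels"].ravel().astype(np.int64)
        # Entry i of the result is the number of images with class index i.
        summary[f"{split}_class_counts"] = np.bincount(labels)
    return summary


# Usage (path is a placeholder):
# summary = summarize_npz("plant32.npz")
# print(summary["train_images"], summary["train_class_counts"])
```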
Alternatively, the downsampled versions can be generated with the provided script downsample_script.py by passing the appropriate command line argument (--name plant32 for 32x32, --name plant64 for 64x64, or --name plant96 for 96x96) and setting --data_dir to the folder extracted from the original downloaded file plant-full.zip, which contains the train/ and test/ directories. For example:
python downsample_script.py --name plant32 --data_dir data/
Note: Running downsample_script.py requires the absl_py and tensorflow packages.
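Before running the script, the extracted folder can be sanity-checked by counting the files in its train/ and test/ directories. The sketch below uses only the standard library. It assumes a `<data_dir>/<split>/<class>/<image>` layout with one sub-folder per class, which is an assumption about the archive rather than something stated here, and `count_images` is a hypothetical helper name.

```python
from pathlib import Path


def count_images(data_dir, splits=("train", "test")):
    """Count files per class folder under each split directory.

    Hypothetical helper; assumes data_dir/<split>/<class>/<image> layout.
    """
    counts = {}
    for split in splits:
        split_dir = Path(data_dir) / split
        counts[split] = {
            class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


# Usage (path is a placeholder):
# counts = count_images("data/")
# print(sum(counts["train"].values()), sum(counts["test"].values()))
```

If the layout assumption holds, the two totals should match the 43456 training images and 10849 test images reported above.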
A sample image from each class of PlantVillage (32x32) is shown as:
A sample image from each class of PlantVillage (64x64) is shown as:
A sample image from each class of PlantVillage (96x96) is shown as:
The dataset from which we derive our data is provided by [2]. Kindly cite the paper as:
@article{geetharamani2019identification,
title={Identification of plant leaf diseases using a nine-layer deep convolutional neural network},
author={Geetharamani, G and Pandian, Arun},
journal={Computers \& Electrical Engineering},
volume={76},
pages={323--338},
year={2019},
publisher={Elsevier}
}
If you use the downsampled variants or the train and test splits provided in this repository, kindly cite our paper [4]:
@inproceedings{sahito2022better,
title={Better self-training for image classification through self-supervision},
author={Sahito, Attaullah and Frank, Eibe and Pfahringer, Bernhard},
booktitle={Australasian Joint Conference on Artificial Intelligence},
pages={645--657},
year={2022},
organization={Springer}
}
1. David P. Hughes, Marcel Salathe (2016), An open access repository of images on plant health to enable the development of mobile disease diagnostics, arXiv:1511.08060
2. J, Arun Pandian; Gopal, Geetharamani (2019), "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network", Mendeley Data, V1, doi: 10.17632/tywbtsjrjv.1
3. Sahito A., Frank E., Pfahringer B. (2020) Transfer of Pretrained Model Weights Substantially Improves Semi-supervised Image Classification. In: Gallagher M., Moustafa N., Lakshika E. (eds) AI 2020: Advances in Artificial Intelligence. AI 2020. Lecture Notes in Computer Science, vol 12576. Springer, Cham. doi: 10.1007/978-3-030-64984-5_34
4. Sahito A., Frank E., Pfahringer B. (2021) Better Self-training for Image Classification through Self-supervision. arXiv:2109.00778