Skip to content

Latest commit

 

History

History
656 lines (427 loc) · 36.7 KB

README.md

File metadata and controls

656 lines (427 loc) · 36.7 KB
English | 简体中文

Surface Defect Detection: Dataset & Papers 📌

GitHub Computer Vision in Action License Open Collective Forks Stars

📈 Constantly summarizing open source dataset and critical papers in the field of surface defect research which are of great importance. Important critical papers from year 2017 have been collected and compiled, which can be viewed in the 📂 [Papers] folder. 🐋


Dataset download: Google Drive Google Drive | Baidu Cloud 百度云盘 o7p5

Introduction

At present, surface defect equipment based on machine vision has widely replaced artificial visual inspection in various industrial fields, including 3C, automobiles, home appliances, machinery manufacturing, semiconductors and electronics, chemical, pharmaceutical, aerospace, light industry and other industries. Traditional surface defect detection methods based on machine vision often use conventional image processing algorithms or artificially designed features plus classifiers. Generally speaking, imaging schemes are usually designed by using the different properties of the inspected surface or defects. A reasonable imaging scheme helps to obtain images with uniform illumination and clearly reflect the surface defects of the object. In recent years, many defect detection methods based on deep learning have also been widely used in various industrial scenarios.

Compared with the clear classification, detection and segmentation tasks in computer vision, the requirements for defect detection are very general. In fact, its requirements can be divided into three different levels: "what is the defect" (classification), "where is the defect" (positioning) and "How many defects are" (split).

*** 本项目会持续更新,右上角收藏防丢失 Star ⭐ ~ ***

Star anti-lost

喜欢这个项目吗?请考虑 ❤️ 赞助本项目 以帮助长期维护!

Table of Contents

1. Key Issues in Surface Defect Detection

1)Small Sample Problem

The current deep learning methods are widely used in various computer vision tasks, and surface defect detection is generally regarded as its specific application in the industrial field. In traditional understanding, the reason why deep learning methods cannot be directly applied to surface defect detection is because in a real industrial environment, there are too few industrial defect samples that can be provided.

Compared with the more than 14 million sample data in the ImageNet dataset, the most critical problem faced in surface defect detection is small sample problem. In many real industrial scenarios, there are even only a few or dozens of defective images. In fact, for the small sample problem which is one of the key problems in industrial surface defect detection, there are currently 4 different solutions:

- Data Amplification and Generation

The most commonly used defect image expansion method is to use multiple image processing operations such as mirroring, rotation, translation, distortion, filtering, and contrast adjustment on the original defect samples to obtain more samples. Another more common method is data synthesis, where individual defects are often fused and superimposed on normal (non-defective) samples to form defective samples.

- Network Pre-training and Transfer Learning

Generally speaking, using small samples to train deep learning networks can easily lead to overfitting, so methods based on pre-training networks or transfer learning are currently one of the most commonly used methods for samples.

- Reasonable Network Structure Design

The need for samples can also be greatly reduced by designing a reasonable network structure. Based on the compressed sampling theorem to compress and expand small sample data, we use CNN to directly classify the compressed sampling data features. Compared with the original image input, compressing the input can greatly reduce the network's demand for samples. In addition, the surface defect detection method based on the twin network can also be regarded as a special network design, which can greatly reduce the sample requirement.

- Unsupervised or Semi-supervised Method

In the unsupervised model, only normal samples are used for training, so there is no need for defective samples. The semi-supervised method can use unlabeled samples to solve the network training problem in the case of small samples.

👆 BACK to Table of Contents -->

2)Real-time Problem

The defect detection methods based on deep learning include three main links in industrial applications: data annotation, model training, and model inference. Real-time in actual industrial applications pays more attention to model inference. At present, most defect detection methods are concentrated in the accuracy of classification or recognition, little attention is paid to the efficiency of model inference. There are many methods for accelerating the model, such as model weighting and model pruning. In addition, although the existing deep learning model uses GPU as a general-purpose computing unit(GPGPU), with the development of technology, it is believed that FPGA will become an attractive alternative.

👆 BACK to Table of Contents -->

2. Common Datasets for Industrial Surface Defect Detection

1)Steel Surface: NEU-CLS

NEU-CLS can be used for classification and positioning tasks.

latest access 🔗 - (#16)

The surface defect dataset released by Northeastern University (NEU) collects six typical surface defects of hot-rolled steel strips, namely rolling scale (RS), plaque (Pa), cracking (Cr), pitting surface (PS), inclusions (In) and scratches (Sc). The dataset includes 1,800 grayscale images, six different types of typical surface defects each of which contains 300 samples. For defect detection tasks, the dataset provides annotations that indicate the category and location of the defect in each image. For each defect, the yellow box is the border indicating its location, and the green label is the category score.

Kaggle - Severstal: Steel Defect Detection

Severstal: Steel Defect Detection

Severstal is leading the charge in efficient steel mining and production. They believe the future of metallurgy requires development across the economic, ecological, and social aspects of the industry—and they take corporate responsibility seriously. The company recently created the country’s largest industrial data lake, with petabytes of data that were previously discarded. Severstal is now looking to machine learning to improve automation, increase efficiency, and maintain high quality in their production.

https://www.kaggle.com/c/severstal-steel-defect-detection


👆 BACK to Table of Contents -->

2)Solar Panels: elpv-dataset

A dataset of functional and defective solar cells extracted from EL images of solar modules.


The dataset contains 2,624 samples of 300x300 pixels 8-bit grayscale images of functional and defective solar cells with varying degree of degradations extracted from 44 different solar modules. The defects in the annotated images are either of intrinsic or extrinsic type and are known to reduce the power efficiency of solar modules.

All images are normalized with respect to size and perspective. Additionally, any distortion induced by the camera lens used to capture the EL images was eliminated prior to solar cell extraction.


👆 BACK to Table of Contents -->

3)Metal Surface: KolektorSDD

The dataset is constructed from images of defected electrical commutators that were provided and annotated by Kolektor Group. Specifically, microscopic fractions or cracks were observed on the surface of the plastic embedding in electrical commutators. The surface area of each commutator was captured in eight non-overlapping images. The images were captured in a controlled environment.


The dataset consists of:

  • 50 physical items (defected electrical commutators)
  • 8 surfaces per item
  • Altogether 399 images:
    -- 52 images of visible defect
    -- 347 images without any defect
  • Original images of sizes:
    -- width: 500 px
    -- height: from 1240 to 1270 px
  • For training and evaluation images should be resized to 512 x 1408 px

For each item the defect is only visible in at least one image, while two items have defects on two images, which means there were 52 images where the defects are visible. The remaining 347 images serve as negative examples with non-defective surfaces.


👆 BACK to Table of Contents -->

4)PCB Inspection: DeepPCB

     
an example of the tested image                                         the corresponding template image

Figure 1. PCB Inspection Dataset.


👆 BACK to Table of Contents -->

5)Fabric Defects Dataset: AITEX

This dataset consists of 245 4096x256 pixel images with seven different fabric structures. There are 140 non-defect images in the dataset, 20 of each type of fabric. In addition, there are 105 images of different types of fabric defects (12 types) common in the textile industry. The image size allows users to use different window sizes, thereby the number of samples can be increased. The online dataset also contains segmentation masks of all defective images, so that white pixels represent defective areas and the remaining pixels are black.


👆 BACK to Table of Contents -->

6)Fabric Defect Dataset (Tianchi)

In the actual production process of cloth, due to the influence of various factors, defects such as stains, holes, lint, etc. will occur. In order to ensure the quality of the product, the cloth needs to be inspected for defects.

Fabric defect inspection is an important part of the textile industry's production and quality management. At present, manual inspection is susceptible to subjective factors and lacks consistency, and inspection personnel working for a long time under strong light has a great impact on vision. Due to the wide variety of fabric defects, various morphological changes, and the difficulty of observation and recognition, the intelligent detection of fabric defects has been a technical bottleneck that has plagued the industry for many years.

This dataset covers all kinds of important defects in fabrics in the textile industry, and each picture contains one or more defects. The data includes two types of plain cloth and patterned cloth. Among them, about 8000 pieces of plain cloth data are used for preliminary matches, and about 12,000 pieces of patterned cloth data are used for semi-finals.


👆 BACK to Table of Contents -->

7)Aluminium Profile Surface Defect Dataset(Tianchi)

Due to the influence of various factors in the actual production process of aluminum profile, the surface of the aluminum profile will have cracks, peeling, scratches and other defects, which will seriously affect the quality of the aluminum profile. To ensure product quality, manual visual inspection is required. However, the surface of the aluminum profile itself contains textures, which are not highly distinguishable from defects.

Traditional manual visual inspection methods have many shortcomings, which are very laborious, cannot accurately judge surface defects in time, and have difficult to control the efficiency of quality inspection. In recent years, deep learning has made rapid progress in image recognition and other fields. Aluminum profile manufacturers are eager to use the latest AI technology to innovate the existing quality inspection process, automatically complete quality inspection tasks, reduce the incidence of missed inspections, and improve product quality. AI technology, especially deep learning, makes aluminum profile product production managers completely free from the inability to fully grasp the state of product surface quality.

In the dataset of the competition, there are 10,000 pieces of monitoring image data from aluminum profiles with defects in actual production, and each image contains one or more defects. The sample image for machine learning will clearly identify the type of defect contained in the image.


👆 BACK to Table of Contents -->

8)Weakly Supervised Learning for Industrial Optical Inspection(DAGM 2007)


Dataset introduction:

  • Mainly aimed at miscellaneous defects on textured backgrounds.

  • Training data with weaker supervision.

  • Contains ten data sets, the first six are training data sets, and the last four are test data sets.

  • Each dataset contains 1000 "non-defective" images and 150 "defective" images saved in grayscale 8-bit PNG format. Each data set is generated by a different texture model and defect model.

  • The background texture of the "No Defect" image shows no defect, and the background texture of the "No Defect" image has exactly one marked defect.

  • All datasets have been randomly divided into training and testing sub-data sets of equal size.

  • Weak labels are represented by ellipses, which roughly indicate the defect area.


👆 BACK to Table of Contents -->

9)Cracks on the Surface of the Construction

CrackForest Dataset is an annotated road crack image database which can reflect urban road surface condition in general.


Figure 2. Cracks on the Bridge(left) and Cracks on the Road Surface.

👆 BACK to Table of Contents -->

10)Magnetic Tile Dataset

Magnetic tile dataset by githuber: abin24, which can be downloaded from https://github.com/Charmve/Surface-Defect-Detection/tree/master/Magnetic-Tile-Defect, which was used in their paper "Surface defect saliency of magnetic tile", the paper can be reach by here or here

dataset

Figure 3. An overview of our dataset.

This is also the datasets of the paper "Saliency of magnetic tile surface defects". The images of 6 common magnetic tile defects were collected, and their pixel level ground-truth were labeled.

👆 BACK to Table of Contents -->

11)RSDDs: Rail Surface Defect Datasets

The RSDDs dataset contains two types of datasets: the first is a type I RSDDs dataset captured from the fast lane, which contains 67 challenging images. The second is a Type II RSDDs dataset captured from a normal/heavy transportation track, which contains 128 challenging images.

Each image of the two data sets contains at least one defect, and the background is complex and noisy.

These defects in the RSDDs dataset have been marked by professional human observers in the field of track surface inspection.



👆 BACK to Table of Contents -->

12)Kylberg Texture Dataset v.1.0

Figure 4. Example patches from each one of the 28 texture classes.

Short description

  • 28 texture classes, see Figure 4.
  • 160 unique texture patches per class. (Alternative dataset with 12 rotations per each original patch, 160*12=1920 texture patches per class).
  • Texture patch size: 576x576 pixels.
  • File format: Lossless compressed 8 bit PNG.
  • All patches are normalized with a mean value of 127 and a standard deviation of 40.
  • One directory per texture class.
  • Files are named as follows: blanket1-d-p011-r180.png, where blanket1 is the class name, d original image sample number (possible values are a, b, c, or d), p011 is patch number 11, r180 patch rotated 180 degrees.

🔗 Offical Link: http://www.cb.uu.se/~gustaf/texture/

👆 BACK to Table of Contents -->

13)KTH-TIPS database

Repeat the background texture data set, the sample picture is as follows


👆 BACK to Table of Contents -->

14)Escalator Step Defect Dataset

🔗 Offical Link:https://aistudio.baidu.com/aistudio/datasetdetail/44820


👆 BACK to Table of Contents -->

15)Transmission Line Insulator Dataset

In the data set, Normal_Insulators contains 600 insulator images captured by drones. Defective_Insulators contains defective insulators, and the number of defective images of insulators is 248. The data set includes data sets and labels.

🔗 Offical Link:https://github.com/InsulatorData/InsulatorDataSet

👆 BACK to Table of Contents -->

16)MVTEC ITODD

The MVTec Industrial 3D Object Detection Dataset (MVTec ITODD) is a public dataset for 3D object detection and pose estimation with a strong focus on industrial settings and applications.

The dataset consists of

  • 28 objects and 3500 labeled scenes containing instances of these objects
  • Five sensors (two 3D sensors and three grayscale cameras) observing each scene

More information can be found in this PDF file 🔍.

🔗 Download link https://www.mvtec.com/company/research/datasets/mvtec-itodd

👆 BACK to Table of Contents -->

17)BSData - dataset for Instance Segmentation and industrial Wear Forecasting

The dataset contains 1104 channel 3 images with 394 image-annotations for the surface damage type “pitting”. The annotations made with the annotation tool labelme, are available in JSON format and hence convertible to VOC and COCO format. All images come from two BSD types.

The other BSD type is shown on 325 images with two image-sizes. Since all images of this type have been taken with continuous time the degree of soiling is evolving.

Also, the dataset contains as above mentioned 27 pitting development sequences with every 69 images.

Figure 5. On the left image-examples, on the right associated PNG-Annotations.

🔗 Offical link https://github.com/2Obe/BSData

Sincerely, thank @Beñat Gartzia for his recommendation and all your attention!

👆 BACK to Table of Contents -->

18)The Gear Inspection Dataset

The Gear Inspection Dataset (GID) is a dataset for a competition held by Baidu (China) Co., called the "National Artificial Intelligence Innovation Application Competition." It has two thousand grayscale images with 28575 annotations for three types of defects from a real-world source. Each picture includes defects described in a separate JSON file with the image name, label categories, bounding boxes, and polygons for segmentation. Nevertheless, the tags for labeling categories do not include specific information about their type but only numbers, so spotting their similarities with other related datasets is challenging.

Figure 6. Examples of validation test images and their labels.

🔗 Offical link http://www.aiinnovation.com.cn/#/dataDetail?id=34

Note: The contest dataset is not for commercial use.

👆 BACK to Table of Contents -->

19)AeBAD Aircraft Engine Blade Anomaly Detection

Download link: http://suo.nz/2IU48P

The real-world aero-engine blade anomaly detection (AeBAD) data set consists of two sub-data sets: the single blade data set (AeBAD-S) and the blade video anomaly detection data set (AeBAD-V). Compared with existing datasets, AeBAD has the following two characteristics: 1.) The target samples are not aligned and at different scales. 2.) There is a domain shift in the distribution of normal samples in the test set and training set, where the domain shift is mainly caused by changes in illumination and view.

image

👆 BACK to Table of Contents -->

20)BeanTech Anomaly Detection(BTAD)

Download Link:http://suo.nz/2JEGEi

The BTAD (BeanTech Anomaly Detection) dataset is a real-world industrial anomaly dataset. This dataset contains a total of 2830 real-world images of 3 industrial products.

image

👆 BACK to Table of Contents -->


3. More Inventory of the Best Data Set Sources

I have been collecting surface defect detection data sets, but there are still many data sets that have not been collected. For the data sets not collected in this repo, you can go to the following sites to view. At the same time, everyone is very welcome to share the new data set and become the contributor of this repo.

Contributions welcome

source url Recommended
Kaggle https://www.kaggle.com/datasets ⭐⭐⭐⭐⭐
Paper With Code https://paperwithcode.com/sota ⭐⭐⭐⭐⭐
Registry of Open Data on AWS https://registry.opendata.aws ⭐⭐⭐
Microsoft Research Open Data https://msropendata.com ⭐⭐⭐
Awesome-public-datasets https://github.com/awesomedata/awesome-public-datasets ⭐⭐

👆 BACK to Table of Contents -->


4. Surface Defect Detection Papers

I have collected some articles on surface defect detection. The main objects to be tested are: defects or abnormal objects such as metal surfaces, LCD screens, buildings, and power lines. The methods are mainly classified method, detection method, reconstruction method and generation method. The electronic version (PDF) of the paper is placed under the file named corresponding to the date in the 'Paper' folder.

Go to 📂 [Papers].


👆 BACK to Table of Contents -->

Acknowledgements

You can see this repo now, we should be grateful to the people who originally open sourced the above data set. They have brought great help to our study and research work. The idea of collecting this data set originally came from reading an article on surface defect detection by SFXiang of "AI算法修炼营(AI_SuanFa)", which prompted me to organize a more comprehensive data set. The collection of papers comes from a CSDNer named "庆志的小徒弟". These papers are only until November 19, and I will continue to be improved after that. Hopefully, feel free to CONTRIBUTE.

Finally, I want to thank the open source contributors of the above data set again.

👆 BACK to Table of Contents -->

Download

👆 BACK to Table of Contents -->

Notification

This work is originally contributed by lots of great man for their paper work or industry application. You can only use this dataset for research purpose.

If you have any questions or idea, please let me know 📧 [email protected]

🍮 Community

  • Github discussions 💬 or issues 💭

  • QQ Group: 734758251 (password:哈哈哈)

  • WeChat Group ID: Yida_Zhang2

  • Email: yidazhang1#gmail.com


Go for it!

   Supporting

     Support this project by becoming a sponsor. Your name and/or logo will show up our homepage with a link to your website. 🙏

Sponse this project

Citation

Use this bibtex to cite this repository:

@misc{Surface Defect Detection,
  title={Surface Defect Detection: Dataset and Papers},
  author={Charmve},
  year={2020.09},
  publisher={Github},
  journal={GitHub repository},
  howpublished={\url{https://github.com/Charmve/Surface-Defect-Detection}},
}

Stargazers over time

Stargazers over time


Feel free to ask any questions, open a PR if you feel something can be done differently!

🌟Star this repository🌟

Created by Charmve & maiwei.ai Community | Deployed on Kaggle


* Update on Sep 17, 2021 @Charmve, Star and Fork