Notes on 12 papers, totaling 3062 + 6153 + 1927 + 3092 + 5481 + 4203 + 2868 + 5381 + 2884 + 1975 + 2899 + 3871 = 43796 characters.
In my estimation, these Notes go deep enough to let you explain the papers to someone else, and should cover 80~90% of what you need.
Where earlier and later notes conflict, trust the later one. They were written in a hurry, so small flaws are likely; criticism and discussion are welcome!
If you find them useful, please leave a Star~
The papers listed here are the must-reads for getting started in Detection.
My original intent in writing these Notes was to find a relatively efficient and stable method for building up a basic knowledge system in one direction, and hopefully to carry that method over to my study of other directions.
Going from zero background in Detection to finishing this work took about 3 months.
On the one hand, reaching this level took somewhat too long; but it was my first attempt, which is forgivable. Take the lessons and move on!
On the other hand, after reading these papers closely, I can clearly feel that both the depth and the breadth of my knowledge have grown a lot (though I'm still a beginner).
To sum up, I think I gained the following from the process:

- In terms of results, I have basically built a working understanding of and intuition for the Detection direction; in Computer Vision, I've gotten the first button fastened right.
- In terms of method, when I face a brand-new direction in the future, I now have a methodology and know where to start.
- Thinking about problems carefully, dialectically, and persistently does a lot for one's thinking. Keep asking yourself why.

If I were asked to present these papers now, I believe I could explain them clearly!
Of course, there are also some shortcomings:

- When reading papers from now on, I should read the code alongside. That way, for the hard-to-understand points in a paper, I can go straight to the implementation, and many difficulties resolve themselves. It is also good for my coding skills.
- The concrete workflow also needs optimizing:
  - First, at the very beginning there is no need to do an Overview paper by paper. To get the lay of a whole direction, going straight to surveys is enough, so the so-called first round can be drastically simplified.
  - Second, when reading a paper, bring the code along whenever possible, so the so-called last two rounds can be merged into one. The original "Overview + Paper + Code" then becomes "1~2 Surveys + Paper & Code".
Because GitHub's rendering of formulas is poor, only the source files are provided here. If you want to read them, I recommend downloading this repository for the best experience (if you'd rather not download, you can follow the links to CSDN, though the formatting there is a bit off).
After downloading the source files, you can use the links in [Detection Paper List.md](./Detection Paper List.md) to quickly open the corresponding Notes.

For concrete usage, I recommend that you:

- Use the blog posts as a reference and follow their narrative structure to understand the papers (that structure comes from Andrew Ng's course advice on "Reading Research Papers").
- When something in a paper is unclear, jump to the corresponding part of the blog post.
Each Note is organized into the following sections:

- Topic: the paper's title
- Motivation: the problem the paper tackles, its core Idea, etc.
- Technique: the paper's model and top-level techniques
- Key Element: lower-level techniques, key concepts, and details
- Math: all the mathematical content, including notation and loss functions
- Use Yourself: my personal understanding of the paper's key points, plus ideas that may be useful later
- Relativity: related papers, blog posts, etc.
| Topic | Motivation | Technique | Key Element | Use Yourself / Math |
| --- | --- | --- | --- | --- |
| GoogLeNet | sparse structure & dense matrix | Inception | Inception path<br>1x1 conv | Architecture<br>data augmentation<br>model ensemble<br>Test Time Augmentation<br>Inception |
| ResNet | problems to solve<br>identity mapping | residual learning<br>residual block<br>ResNet Architecture | shortcut connection<br>feature concatenate | ResNet Architecture<br>residual block<br>shortcut connection<br>ResNet V2 |
| DenseNet | Problem to Solve<br>Modifications | DenseNet Architecture<br>Advantages | Dense Block<br>Transition Layers<br>Growth Rate<br>Bottleneck Structure | Bottleneck Structure<br>Feature Reuse<br>Transition Layers |
| Faster RCNN | Problem to Solve<br>Proposal with CNN | Architecture<br>Region Proposal Network<br>RoI Pooling<br>Anchor | anchor/proposals/bbox<br>Pyramids<br>Positive & Negative Label<br>Sampling Strategy<br>4-Step Alternating Training | Architecture<br>Region Proposal Network<br>RoI Pooling<br>Anchor |
| YOLO v1 | Problem to Solve<br>Detection as Regression | YOLO v1 Architecture | Grid Partition<br>Model Output<br>"Responsible"<br>NMS<br>Getting Detections<br>Error Type Analysis<br>Data Augmentation | Classification & Detection<br>Explicit & Implicit Grid<br>Handle Small Object<br>Multi-Task Output<br>Weighted Loss<br>NMS<br>Data Augmentation |
| SSD | Problem to Solve<br>Contributions | SSD Architecture<br>Pros & Cons | Higher Speed<br>What Is Resample<br>Low & High Level Feature<br>Small Object Difficulties<br>Data Flow<br>Anchor & GtBox Matching<br>Foreground & Background<br>Hard Negative Mining<br>NMS<br>Data Augmentation<br>Testing (Inferencing) Step<br>Performance Analysis<br>Model Analysis | Convolution For Speed<br>Feature Pyramids Fashion<br>Positive & Negative Imbalance<br>Zoom In & Zoom Out |
| RetinaNet | Problem to Solve<br>Negative ==> Easy | Focal Loss<br>RetinaNet | Class Imbalance<br>Feature Pyramid Network<br>ResNet-FPN Backbone<br>Classify & Regress FCN<br>Post Processing<br>Anchor Design<br>Anchor & GT Matching<br>prior $\pi$ Initialization<br>Ablation Experiments | Data Analysis<br>Feature Pyramid Network<br>More Scale $\not=$ Better |
| CornerNet | Problem to Solve<br>KeyPoints (Anchor-Free) | CornerNet<br>Stacked Hourglass Network<br>Prediction Module<br>Corner Pooling | Why CornerNet Better?<br>Why Corner Pooling Works<br>Grouping Corners<br>Getting Bounding Box<br>Data Augmentation<br>Ablation Experiments | Loss Function<br>Corner Pooling Math |
| CenterNet (triplets) | Problem to Solve<br>Idea<br>Intuition | CenterNet Architecture<br>Center Pooling<br>Cascade Corner Pooling<br>Central Region Exploration | Baseline: CornerNet<br>Generating BBox<br>Training<br>Inferencing<br>Ablation Experiment<br>Error Analysis<br>Metric AP & AR & FD<br>Small & Medium & Large | Central Region<br>Loss Function |
| CenterNet (Object as Points) | Problem to Solve<br>Idea | CenterNet Architecture | Center Point & Anchor<br>Getting Ground-Truth<br>Model Output<br>Data Augmentation<br>Inference<br>TTA<br>Compared with SOTA<br>Additional Experiments | Loss Function<br>KeyPoint Loss $\text{L}_k$<br>Offset Loss $\text{L}_{off}$<br>Size Loss $\text{L}_{size}$ |
| [FCOS](./[paper reading] FCOS.md) | Idea<br>Contribution | FCOS Architecture<br>Center-ness<br>Multi-Level FPN Prediction | Prediction Head<br>Training Sample & Label<br>Model Output<br>Feature Pyramid<br>Inference<br>Ablation Study<br>FCN & Detection<br>FCOS $vs.$ YOLO v1 | Symbol Definition<br>Loss Function<br>Center-ness<br>Remap of Feature & Image |
| [Cascade R-CNN](./[paper reading] Cascade R-CNN.md) | Problem to Solve<br>Motivation | Architecture<br>Advantages | Degradation of High IoU<br>Quality Match<br>Cascade Regression<br>Cascade Detection<br>Iterative BB Regression<br>Integral Loss<br>Training Strategy<br>Inference Strategy<br>Implementation Details<br>Generalization Capacity<br>Ablation Experiments<br>Comparison with SOTA | Loss Function of Stages<br>Standard BBox Regression<br>Iterative BB Regression<br>Classification |
- Basic Concept
- Sparse Connection of the Structure
- Sparse Structure of Inter-Layer Connections
- Sparse Structure of Feature Connections
- Sparsely/Densely Distributed Feature Sets
- motivation
- problem to solve
- sparse structure (of network)
- dense matrix (of features)
- sparse structure & dense matrix
- problem to solve
- technique
- Inception
- key elements
- Inception path
- 1x1 conv
- use yourself
- Architecture
- data augmentation
- model ensemble
- Test Time Augmentation
- blogs
- modifications
- Inception V2
- summary
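
To make the Inception path and 1x1 conv entries in the GoogLeNet outline above concrete, here is a minimal PyTorch sketch of an Inception-style module. This is my own illustration, not the notes' code; the branch widths `c1`–`c4` are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Four sparse parallel paths concatenated into one dense feature matrix;
    1x1 convs cut channels before the costly 3x3/5x5 convolutions."""
    def __init__(self, in_ch, c1, c2, c3, c4):
        super().__init__()
        self.p1 = nn.Conv2d(in_ch, c1, kernel_size=1)              # 1x1 path
        self.p2 = nn.Sequential(                                   # 1x1 reduce -> 3x3
            nn.Conv2d(in_ch, c2[0], kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1))
        self.p3 = nn.Sequential(                                   # 1x1 reduce -> 5x5
            nn.Conv2d(in_ch, c3[0], kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2))
        self.p4 = nn.Sequential(                                   # pool -> 1x1 project
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, c4, kernel_size=1))

    def forward(self, x):
        # concatenate the paths along the channel axis: the "dense matrix"
        return torch.cat([self.p1(x), self.p2(x), self.p3(x), self.p4(x)], dim=1)
```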
- motivation
- problems to solve
- network degradation
- vanishing/exploding gradient
- Relationship of degradation & gradient vanishing
- identity mapping
- problems to solve
- technique
- residual learning
- residual block
- ResNet Architecture
- key element
- shortcut connection
- feature concatenate
- math
- forward/backward propagation
- residual learning
- address degradation
- address gradient vanishing
- use yourself
- articles
- ResNet V2
- Shattered Gradients (Gradient correlation)
- Ensemble-like behavior
- Hard For Identity Mapping
- blogs
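
For the residual block / shortcut connection entries in the ResNet outline above, here is a minimal PyTorch sketch of residual learning. It is my own illustration, assuming the common BasicBlock layout rather than reproducing the notes' code.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual learning: the block fits F(x) and outputs ReLU(F(x) + x)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # identity shortcut; a 1x1 conv only when the shape changes
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))   # shortcut connection
```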
- Motivation
- Problem to Solve
- Modifications
- Skip Connection
- Feature Concatenate
- Technique
- DenseNet Architecture
- Advantages
- Parameter Efficiency & Model Compactness
- Feature Reuse & Collective Knowledge
- Implicit Deep Supervision
- Diversified Depth
- Key Element
- Dense Block
- Transition Layers
- Components
- Compression
- Growth Rate
- Bottleneck Structure
- Math
- Use Yourself
- Bottleneck Structure
- Transition Layers
- Feature Reuse
- Blogs
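
The Dense Block / Growth Rate entries above boil down to one rule: layer $\ell$ in a dense block receives the concatenation of all earlier outputs, i.e. $k_0 + k(\ell - 1)$ input channels for growth rate $k$. A minimal PyTorch sketch of that rule (my own illustration):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Layer i consumes k0 + k*i channels (all earlier features) and adds k more."""
    def __init__(self, k0, k, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(k0 + k * i),
                nn.ReLU(inplace=True),
                nn.Conv2d(k0 + k * i, k, kernel_size=3, padding=1, bias=False))
            for i in range(num_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # feature reuse
        return torch.cat(features, dim=1)
```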
- Motivation
- Problem to Solve
- Proposal with CNN
- Technique
- Architecture
- Region Proposal Network
- RoI Pooling
- Anchor
- Key Element
- anchor/proposals/bbox
- Pyramids
- Positive & Negative Sample
- Sampling Strategy
- 4-Step Alternating Training
- Math
- Loss Function
- Coordinates Parametrization
- Use Yourself
- Architecture
- Region Proposal Network
- RoI Pooling
- Anchor
- Articles
- R-CNN
- YOLO V1
- Blogs
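
For the Coordinates Parametrization entry in the Faster RCNN outline above: the paper regresses offsets relative to the anchor rather than raw coordinates, which makes the targets scale-invariant:

$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a}$$

where $(x, y, w, h)$ is the predicted box center and size, $(x_a, y_a, w_a, h_a)$ is the anchor, and the regression targets $t^*_x, t^*_y, t^*_w, t^*_h$ are computed the same way from the ground-truth box.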
- Motivation
- Problem to Solve
- Detection as Regression
- Technique
- YOLO v1 Architecture
- Key Element
- Grid Partition
- Model Output
- “Responsible”
- NMS
- Getting Detections
- Error Type Analysis
- Handle Small Object
- Data Augmentation
- Use Yourself
- Classification & Detection
- Explicit & Implicit Grid
- Multi-Task Output
- Loss & Sample & box
- NMS
- Data Augmentation
- Math
- Activation Function
- Loss Function
- Articles
- R-CNN Based
- Deep MultiBox
- OverFeat
- MultiGrasp
- Blogs
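
NMS appears throughout these outlines (YOLO v1 above, SSD and RetinaNet below), so here is a minimal NumPy sketch of greedy NMS, assuming `[x1, y1, x2, y2]` boxes. This is my own illustration, not any paper's reference code.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlaps above the threshold.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,)."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best box against the remaining candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # keep only low-overlap boxes
    return keep
```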
- Motivation
- Problem to Solve
- Contributions
- Technique
- SSD Architecture
- Pros & Cons
- Key Element
- Higher Speed
- What Is Resample
- Low & High Level Feature
- Small Object Difficulties
- Data Flow
- Anchor & GtBox Matching
- Foreground & Background
- Hard Negative Mining
- NMS
- Data Augmentation
- Testing (Inferencing)
- Performance Analysis
- Model Analysis
- Use Yourself
- Convolution For Speed
- Feature Pyramids Fashion
- Positive & Negative Imbalance
- Zoom In & Zoom Out
- Math
- Loss Function
- Multi-Level & Anchor
- Layer Output & Filters
- Model Output
- Blogs
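
For the Hard Negative Mining entry above: SSD keeps all positive anchors but only the highest-loss negatives, at roughly a 3:1 negative:positive ratio per image. A PyTorch sketch of that selection, assuming a hypothetical per-anchor loss tensor (my own illustration):

```python
import torch

def hard_negative_mining(cls_loss, is_pos, neg_pos_ratio=3):
    """Return a mask of anchors to include in the classification loss.
    cls_loss: (B, A) per-anchor loss; is_pos: (B, A) bool positive mask."""
    num_pos = is_pos.sum(dim=1, keepdim=True)       # positives per image
    num_neg = neg_pos_ratio * num_pos               # negatives to keep
    neg_loss = cls_loss.clone()
    neg_loss[is_pos] = 0.0                          # positives never compete
    # rank negatives by loss; an anchor is kept if its rank < num_neg
    _, idx = neg_loss.sort(dim=1, descending=True)
    _, rank = idx.sort(dim=1)
    is_neg = rank < num_neg
    return is_pos | is_neg
```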
- Motivation
- Problem to Solve
- Negative ==> Easy
- Technique
- Focal Loss
- RetinaNet
- Key Element
- Class Imbalance
- Feature Pyramid Network
- ResNet-FPN Backbone
- Classify & Regress FCN
- Post Processing
- Anchor Design
- Anchor & GT Matching
- prior $\pi$ Initialization
- Ablation Experiments
- Math
- Cross Entropy Math
- Standard Cross Entropy
- Balanced Cross Entropy
- Focal Loss Math
- Cross Entropy Math
- Use Yourself
- Data Analysis
- Feature Pyramid Network
- More Scale $\not=$ Better
- Related Work
- Two-Stage Method
- One-Stage Method
- Related Articles
- Blogs
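
The Focal Loss Math above reduces to one formula. With $p_t = p$ when $y = 1$ and $p_t = 1 - p$ otherwise:

$$\mathrm{FL}(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t)$$

The paper's defaults are $\gamma = 2$ and $\alpha = 0.25$; the $(1 - p_t)^{\gamma}$ factor is what down-weights easy (well-classified) examples. The prior $\pi$ Initialization sets the final classification layer's bias to $b = -\log\frac{1 - \pi}{\pi}$ with $\pi = 0.01$, so training starts with every anchor predicted as rare foreground.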
- Motivation
- Problem to Solve
- KeyPoints (Anchor-Free)
- Technique
- CornerNet
- Stacked Hourglass Network
- Prediction Module
- Corner Pooling
- Key Element
- Why CornerNet Better?
- Why Corner Pooling Works
- Grouping Corners
- Getting Bounding Box
- Data Augmentation
- Ablation Experiments
- Math
- Loss Function
- Corner Pooling Math
- Use Yourself
- Network Design
- Intuition & Interpretability
- Divide Task
- TTA
- Related Work
- Two-Stage
- One-Stage
- Blogs
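
For the Corner Pooling Math above: on the top-left heatmap, each location takes the maximum of everything below it in its column (over feature map $f^t$) and everything to its right in its row (over $f^l$), then sums the two:

$$t_{ij} = \max_{i \le k \le H} f^{t}_{kj},\qquad l_{ij} = \max_{j \le k \le W} f^{l}_{ik},\qquad o_{ij} = t_{ij} + l_{ij}$$

This is also why corner pooling works: the evidence for a top-left corner (the object itself) lies to the right of and below the corner, not at it.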
- Motivation
- Problem to Solve
- Idea
- Intuition
- Technique
- CenterNet Architecture
- Center Pooling
- Cascade Corner Pooling
- Central Region Exploration
- Key Element
- Baseline: CornerNet
- Generating BBox
- Training
- Inferencing
- Ablation Experiment
- Error Analysis
- Metric AP & AR & FD
- Small & Medium & Large
- Math
- Central Region
- Loss Function
- Use Yourself
- Related Work
- Anchor-Based Method
- KeyPoint-Based Method
- Two-Stage Method
- One-stage Method
- Problems
- Motivation
- Problem to Solve
- Idea
- Technique
- CenterNet Architecture
- Key Element
- Center Point & Anchor
- Getting Ground-Truth
- Model Output
- Data Augmentation
- Inference
- TTA
- Compared with SOTA
- Additional Experiments
- Math
- Loss Function
- KeyPoint Loss $\text{L}_k$
- Offset Loss $\text{L}_{off}$
- Size Loss $\text{L}_{size}$
- Use Yourself
- Related work
- Anchor-Based Method
- KeyPoint-Based Method
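
The three losses listed in the CenterNet (Object as Points) outline above combine as follows. The keypoint loss is a pixel-wise focal loss on the predicted heatmap $\hat{Y}$ against the Gaussian-splatted ground truth $Y$:

$$\text{L}_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - \hat{Y}_{xyc})^{\alpha} \log \hat{Y}_{xyc} & \text{if } Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta}\, \hat{Y}_{xyc}^{\alpha} \log (1 - \hat{Y}_{xyc}) & \text{otherwise} \end{cases}$$

$$\text{L}_{det} = \text{L}_k + \lambda_{size}\,\text{L}_{size} + \lambda_{off}\,\text{L}_{off}$$

with $\alpha = 2$, $\beta = 4$, $\lambda_{size} = 0.1$, $\lambda_{off} = 1$ in the paper, and $N$ the number of keypoints in the image.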
- Motivation
- Idea
- Contribution
- Techniques
- FCOS Architecture
- Center-ness
- Multi-Level FPN Prediction
- Key Elements
- Prediction Head
- Training Sample & Label
- Model Output
- Feature Pyramid
- Inference
- Ablation Study
- FCN & Detection
- FCOS $vs.$ YOLO v1
- Math
- Symbol Definition
- Loss Function
- Center-ness
- Remap of Feature & Image
- Use Yourself
- Related Work
- Drawbacks of Anchor
- DenseBox-Based
- Anchor-Based Detector
- YOLO v1
- CornerNet
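
The Center-ness entry in the FCOS outline above is a single extra branch. For a location with regression targets $l^*, t^*, r^*, b^*$ (distances to the four sides of its box), its target is:

$$\text{centerness}^{*} = \sqrt{\frac{\min(l^{*}, r^{*})}{\max(l^{*}, r^{*})} \times \frac{\min(t^{*}, b^{*})}{\max(t^{*}, b^{*})}}$$

At inference it multiplies the classification score, so NMS suppresses the low-quality boxes predicted by locations far from an object's center.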
- Motivations
- Problem to Solve
- Motivation
- Techniques
- Architecture
- Advantages
- Key Elements
- Degradation of High IoU
- Quality Match
- Cascade Regression
- Cascade Detection
- Iterative BB Regression
- Integral Loss
- Training Strategy
- Inference Strategy
- Implementation Details
- Generalization Capacity
- Ablation Experiments
- Comparison with SOTA
- Math
- Loss Function of Stages
- Standard BBox Regression
- Iterative BB Regression
- Classification
- Use Yourself
- Related Work
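
For the Cascade Regression entry above: the detector is a sequence of heads, each trained on the progressively better proposals resampled from the previous stage under an increasing IoU threshold, e.g. $u = \{0.5, 0.6, 0.7\}$:

$$f(x, \mathbf{b}) = f_T \circ f_{T-1} \circ \cdots \circ f_1(x, \mathbf{b})$$

Unlike Iterative BB Regression, which applies the same head $f$ several times at test time, each cascade stage $f_t$ is a different head optimized for its own input distribution.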