GitHub - USTC-IMCC/PaperReading: Paper Reading of IMCC groups.

About Us

We are members of the Intelligent Multimedia Content Computing (IMCC) Lab at the University of Science and Technology of China (USTC).

These paper reading reports cover Computer Vision, with special emphasis on Fine-grained Recognition, Weakly-supervised Learning, Causal Inference, Imperfect Data Learning and related topics. We aim to give students, researchers and faculty an opportunity to discuss and keep an eye on current progress in Computer Vision, and to learn how to do high-quality research.

If you are interested in our reports or our lab, please contact Dr. Chuanbin Liu.

Format

Date	Presenter	Venue	Paper Title	Slides
2020.04.12	Chuanbin Liu	NeurIPS 2019	This Looks Like That: Deep Learning for Interpretable Image Recognition	Slides

Date: The date of the report. Please arrange in reverse chronological order.
Presenter: The presenter of the report. You can also provide your personal link.
Venue: The venue where the paper was published (e.g. NeurIPS 2019).
Paper Title: Provide the title and link of this paper.
Slides: Convert your .ppt file to a .pdf file named Presenter_Date (e.g. lcb_20200412) and keep it under 5 MB, since GitHub limits both file size and repository storage. Please also upload the original .ppt file to our Tencent document.
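The naming and size rules above can be checked before uploading. The following is a minimal sketch, not part of the repository: the function names are hypothetical, the name pattern assumes a presenter tag of letters followed by an eight-digit YYYYMMDD date (as in lcb_20200412), and the 5M limit is assumed to mean 5 × 1024 × 1024 bytes.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: the "5M" limit is read as 5 * 1024 * 1024 bytes.
MAX_BYTES = 5 * 1024 * 1024

# Assumption: names look like lcb_20200412.pdf
# (presenter tag made of letters, underscore, YYYYMMDD date).
NAME_PATTERN = re.compile(r"^(?P<presenter>[A-Za-z]+)_(?P<date>\d{8})\.pdf$")


def check_slide_name(filename: str) -> list[str]:
    """Return a list of problems with a slide file name (empty if none)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return [f"{filename}: expected Presenter_Date.pdf, e.g. lcb_20200412.pdf"]
    try:
        # Reject strings such as 20201340 that are not real calendar dates.
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return [f"{filename}: {match.group('date')} is not a valid YYYYMMDD date"]
    return []


def check_slide_file(path: Path) -> list[str]:
    """Check the name and, if the file exists on disk, its size."""
    problems = check_slide_name(path.name)
    if path.is_file() and path.stat().st_size > MAX_BYTES:
        problems.append(f"{path.name}: larger than {MAX_BYTES} bytes")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: check every PDF in a local Slides folder.
    for pdf in sorted(Path("Slides").glob("*.pdf")):
        for problem in check_slide_file(pdf):
            print(problem)
```

Running the script from the repository root prints one line per file that breaks either rule and prints nothing when every file conforms.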

Schedule

Date	Presenter	Venue	Paper Title	Slides
2024.11.19	Zhiying Lu	-	Where Can We Mix? From Atom to Cosmic	Slides
2024.10.10	Yunning Cao	CVPR2024	Compositional Chain-of-Thought Prompting for Large Multimodal Models	Slides
2024.08.28	Yixuan Zhang	Arxiv	xGen-MM (BLIP-3): A Family of Open Large Multimodal Models	Slides
2024.08.21	Yifan Gao	Arxiv	ControlNeXt: Powerful and Efficient Control for Image and Video Generation	Slides
2024.07.16	Zhiying Lu	Arxiv	Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs	Slides
2024.07.09	Yunning Cao	CVPR2024	VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens	Slides
2024.07.02	Yinglu Li	Arxiv	AnyTrans: Translate AnyText in the Image with Large Scale Models	Slides
2024.06.25	Bowei Pu	CVPR2024	Two papers about Video CLIP and Long Video MLLM	Slides
2024.06.11	Yifan Gao	Arxiv	Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering	Slides
2024.06.04	Peicheng Zhou	CVPR2024	Exploration of the reasons for Limiting MLLM performance	Slides
2024.05.28	TianLe Hu	CVPR2024	Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs	Slides
2024.05.21	Yiwei Sun	-	Two papers about Video LLM	Slides
2024.05.14	Yixuan Zhang	-	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	Slides
2024.04.15	Borui Ding	-	Masked Images Are Counterfactual Samples for Robust Fine-tuning	Slides
2024.04.08	Yifan Gao	-	A Survey on Text Image Generation	Slides
2024.03.26	Zhiying Lu	-	Pretrained ViT as Vision Encoder	Slides
2024.03.19	Yunning Cao	CVPR2024	Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs	Slides
2024.03.12	Yiwei Sun	-	A Survey on MLLM: IT, ICL & CoT	Slides
2024.03.05	TianLe Hu	CVPR2024	Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning	Slides
2023.11.21	Zhiying Lu	arxiv	Initializing Models with Larger Ones	Slides
2023.11.07	Tianle Hu	ICCV2023	Waffling around for Performance: Visual Classification with Random Words and Broad Concepts	Slides
2023.11.01	Yifan Gao	-	Image-based Visual Try-on	Slides
2023.10.10	Yiwei Sun	-	A Survey on Compositional Understanding	Slides
2023.09.26	Zhiying Lu	-	I can't believe there is no training!	Slides
2023.09.12	Yunning Cao	ICCV2023	I can’t believe there’s no images! Learning Visual Tasks Using Only Language Supervision	Slides
2023.07.25	Jingyuan Xu	CVPR2022	Grounded Language-Image Pre-Training	Slides
2023.07.11	Yiwei Sun	CVPR2023	Extracting Class Activation Maps from Non-Discriminative Features as well	Slides
2023.07.04	Tianle Hu	CVPR2023	SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer	Slides
2023.06.26	Yixuan Zhang	ICLR2023	Context Autoencoder for Self-Supervised Representation Learning	Slides
2023.06.26	Tianhao Qi	-	A Survey on Controllable Text-to-Image Diffusion Models	Slides
2023.06.19	Borui Ding	ICLR2023	Vision Transformer Adapter for Dense Predictions	Slides
2023.06.12	Yifan Gao	-	A Survey on Vision Prompt Tuning Learning	Slides
2023.06.08	Pandeng Li	-	A Survey on Multi-modal Pretraining	Slides
2023.06.08	Yunning Cao	-	A Survey on Visual Tuning	Slides
2023.06.05	Zhiying Lu	arxiv	VanillaNet: the Power of Minimalism in Deep Learning	Slides
2023.05.29	Yunning Cao	CVPR2023	Texts as Images in Prompt Tuning for Multi-Label Image Recognition	Slides
2023.05.23	Jingyuan Xu	CVPR2023	Aligning Bag of Regions for Open-Vocabulary Object Detection	Slides
2023.05.15	Fanchao Lin	arxiv	A demo survey on recent fundamental models and applications	Slides
2023.05.08	Yifan Gao	-	A Survey on Fine-Grained Self-Supervised Learning	Slides
2023.04.27	Zhiying Lu	CVPR2023	Non-Global Attention Mechanisms In Vision Transformers	Slides
2023.04.10	Yunning Cao	arxiv	Segment Anything	Slides
2023.03.27	Yiwei Sun	-	How to help your ViT learn the inductive bias?	Slides
2023.03.20	Yunyan Yan	-	Regression: Representation Space	Slides
2023.03.13	Jingyuan Xu	ICLR 2023	F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models	Slides
2023.03.06	Yixuan Zhang	ECCV 2022	Adaptive Token Sampling For Efficient Vision Transformers	Slides
2023.02.27	Fanchao Lin	NIPS 2022	Training language models to follow instructions with human feedback	Slides
2023.02.20	Yifan Gao	NIPS 2022	ConvMAE: Masked Convolution Meets Masked Autoencoders	Slides
2023.02.06	Yunyan Yan	CVPR 2022	A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty	Slides
2023.01.03	Yunning Cao	ICLR 2023	Image as Set of Points	Slides
2022.12.19	Yiwei Sun	-	A Survey on FGVC	Slides
2022.12.14	Fanchao Lin	CVPR 2022	Recurrent Dynamic Embedding for Video Object Segmentation	Slides
2022.12.05	Yunyan Yan	AAAI 2019	Gradient Harmonized Single-Stage Detector	Slides
2022.11.28	Yunning Cao	CVPR 2022	Fine-Grained Object Classification via Self-Supervised Pose Alignment	Slides
2022.11.28	Zhiying Lu	ECCV 2022	TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers	Slides
2020.04.12	Chuanbin Liu	NeurIPS 2019	This Looks Like That: Deep Learning for Interpretable Image Recognition	Slides
