We are members of the Intelligent Multimedia Content Computing (IMCC) Lab at the University of Science and Technology of China (USTC).
This is a paper reading report on Computer Vision, with special emphasis on Fine-grained Recognition, Weakly-supervised Learning, Causal Inference, Imperfect Data Learning, and related topics. We aim to give students, researchers, and faculty an opportunity to discuss and keep an eye on current progress in Computer Vision, and to learn how to do high-quality research.
If you are interested in our reports or our lab, please contact Dr. Chuanbin Liu.
Date | Presenter | Venue | Paper Title | Slides |
---|---|---|---|---|
2020.04.12 | Chuanbin Liu | NeurIPS 2019 | This Looks Like That: Deep Learning for Interpretable Image Recognition | Slides |
- Date: The date of the report. Please arrange entries in reverse chronological order.
- Presenter: The presenter of the report. You can also provide a link to your personal page.
- Venue: The venue where the paper was published (e.g. NeurIPS 2019).
- Paper Title: Provide the title of the paper and a link to it.
- Slides: Please convert your .ppt file to a .pdf file named Presenter_Date (e.g. lcb_20200412) and keep it within 5M, since GitHub limits both file size and repository storage. Please also upload your .ppt file to our Tencent Docs.
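The Slides rule above can be checked before uploading with a small script. The following is only a minimal sketch, not an official tool of this repository: the helper name `check_slides`, the exact filename pattern (letters, an underscore, an 8-digit date, `.pdf`), and the reading of "5M" as 5 MiB are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: "5M" is read as 5 MiB.
MAX_BYTES = 5 * 1024 * 1024
# Assumption: Presenter_Date means letters, "_", then YYYYMMDD, e.g. lcb_20200412.pdf.
NAME_PATTERN = re.compile(r"^[A-Za-z]+_\d{8}\.pdf$")


def check_slides(path, size_bytes=None):
    """Return a list of problems found for a slides file (hypothetical helper).

    If size_bytes is None, the size is read from the file on disk.
    """
    path = Path(path)
    problems = []
    if not NAME_PATTERN.match(path.name):
        problems.append(f"{path.name}: expected a name like lcb_20200412.pdf")
    if size_bytes is None:
        size_bytes = path.stat().st_size
    if size_bytes > MAX_BYTES:
        problems.append(f"{path.name}: {size_bytes} bytes exceeds the 5M limit")
    return problems
```

For example, `check_slides("lcb_20200412.pdf")` returns an empty list when the file exists and is small enough, and a list of messages otherwise.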
Date | Presenter | Venue | Paper Title | Slides |
---|---|---|---|---|
2024.11.19 | Zhiying Lu | - | Where Can We Mix? From Atom to Cosmic | Slides |
2024.10.10 | Yunning Cao | CVPR2024 | Compositional Chain-of-Thought Prompting for Large Multimodal Models | Slides |
2024.08.28 | Yixuan Zhang | arXiv | xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | Slides |
2024.08.21 | Yifan Gao | arXiv | ControlNeXt: Powerful and Efficient Control for Image and Video Generation | Slides |
2024.07.16 | Zhiying Lu | arXiv | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Slides |
2024.07.09 | Yunning Cao | CVPR2024 | VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens | Slides |
2024.07.02 | Yinglu Li | arXiv | AnyTrans: Translate AnyText in the Image with Large Scale Models | Slides |
2024.06.25 | Bowei Pu | CVPR2024 | Two papers about Video CLIP and Long Video MLLM | Slides |
2024.06.11 | Yifan Gao | arXiv | Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering | Slides |
2024.06.04 | Peicheng Zhou | CVPR2024 | Exploration of the reasons for Limiting MLLM performance | Slides |
2024.05.28 | TianLe Hu | CVPR2024 | Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs | Slides |
2024.05.21 | Yiwei Sun | - | Two papers about Video LLM | Slides |
2024.05.14 | Yixuan Zhang | - | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Slides |
2024.04.15 | Borui Ding | - | Masked Images Are Counterfactual Samples for Robust Fine-tuning | Slides |
2024.04.08 | Yifan Gao | - | A Survey on Text Image Generation | Slides |
2024.03.26 | Zhiying Lu | - | Pretrained ViT as Vision Encoder | Slides |
2024.03.19 | Yunning Cao | CVPR2024 | Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Slides |
2024.03.12 | Yiwei Sun | - | A Survey on MLLM: IT, ICL & CoT | Slides |
2024.03.05 | TianLe Hu | CVPR2024 | Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning | Slides |
2023.11.21 | Zhiying Lu | arXiv | Initializing Models with Larger Ones | Slides |
2023.11.07 | Tianle Hu | ICCV2023 | Waffling around for Performance: Visual Classification with Random Words and Broad Concepts | Slides |
2023.11.01 | Yifan Gao | - | Image-based Visual Try-on | Slides |
2023.10.10 | Yiwei Sun | - | A Survey on Compositional Understanding | Slides |
2023.09.26 | Zhiying Lu | - | I can't believe there is no training! | Slides |
2023.09.12 | Yunning Cao | ICCV2023 | I can’t believe there’s no images! Learning Visual Tasks Using Only Language Supervision | Slides |
2023.07.25 | Jingyuan Xu | CVPR2022 | Grounded Language-Image Pre-training | Slides |
2023.07.11 | Yiwei Sun | CVPR2023 | Extracting Class Activation Maps from Non-Discriminative Features as well | Slides |
2023.07.04 | Tianle Hu | CVPR2023 | SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer | Slides |
2023.06.26 | Yixuan Zhang | ICLR2023 | Context Autoencoder for Self-Supervised Representation Learning | Slides |
2023.06.26 | Tianhao Qi | - | A Survey on Controllable Text-to-Image Diffusion Models | Slides |
2023.06.19 | Borui Ding | ICLR2023 | Vision Transformer Adapter for Dense Predictions | Slides |
2023.06.12 | Yifan Gao | - | A Survey on Vision Prompt Tuning Learning | Slides |
2023.06.08 | Pandeng Li | - | A Survey on Multi-modal Pretraining | Slides |
2023.06.08 | Yunning Cao | - | A Survey on Visual Tuning | Slides |
2023.06.05 | Zhiying Lu | arXiv | VanillaNet: the Power of Minimalism in Deep Learning | Slides |
2023.05.29 | Yunning Cao | CVPR2023 | Texts as Images in Prompt Tuning for Multi-Label Image Recognition | Slides |
2023.05.23 | Jingyuan Xu | CVPR2023 | Aligning Bag of Regions for Open-Vocabulary Object Detection | Slides |
2023.05.15 | Fanchao Lin | arXiv | A demo survey on recent fundamental models and applications | Slides |
2023.05.08 | Yifan Gao | - | A Survey on Fine-Grained Self-Supervised Learning | Slides |
2023.04.27 | Zhiying Lu | CVPR2023 | Non-Global Attention Mechanisms In Vision Transformers | Slides |
2023.04.10 | Yunning Cao | arXiv | Segment Anything | Slides |
2023.03.27 | Yiwei Sun | - | How to help your ViT learn the inductive bias? | Slides |
2023.03.20 | Yunyan Yan | - | Regression: Representation Space | Slides |
2023.03.13 | Jingyuan Xu | ICLR 2023 | F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models | Slides |
2023.03.06 | Yixuan Zhang | ECCV 2022 | Adaptive Token Sampling For Efficient Vision Transformers | Slides |
2023.02.27 | Fanchao Lin | NeurIPS 2022 | Training language models to follow instructions with human feedback | Slides |
2023.02.20 | Yifan Gao | NeurIPS 2022 | ConvMAE: Masked Convolution Meets Masked Autoencoders | Slides |
2023.02.06 | Yunyan Yan | CVPR 2022 | A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty | Slides |
2023.01.03 | Yunning Cao | ICLR 2023 | Image as Set of Points | Slides |
2022.12.19 | Yiwei Sun | - | A Survey on FGVC | Slides |
2022.12.14 | Fanchao Lin | CVPR 2022 | Recurrent Dynamic Embedding for Video Object Segmentation | Slides |
2022.12.05 | Yunyan Yan | AAAI 2019 | Gradient Harmonized Single-Stage Detector | Slides |
2022.11.28 | Yunning Cao | CVPR 2022 | Fine-Grained Object Classification via Self-Supervised Pose Alignment | Slides |
2022.11.28 | Zhiying Lu | ECCV 2022 | TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers | Slides |
2020.04.12 | Chuanbin Liu | NeurIPS 2019 | This Looks Like That: Deep Learning for Interpretable Image Recognition | Slides |