USTC-IMCC/PaperReading

About Us

We are members of the Intelligent Multimedia Content Computing (IMCC) Lab at the University of Science and Technology of China (USTC).

These paper reading reports cover Computer Vision, with special emphasis on Fine-grained Recognition, Weakly-supervised Learning, Causal Inference, Imperfect Data Learning and related topics. We aim to give students, researchers and faculty an opportunity to discuss and keep an eye on current progress in Computer Vision, and to learn how to do high-quality research.

If you are interested in our reports or our lab, please contact Dr. Chuanbin Liu.

Format

| Date | Presenter | Venue | Paper Title | Slides |
| --- | --- | --- | --- | --- |
| 2020.04.12 | Chuanbin Liu | NeurIPS 2019 | This Looks Like That: Deep Learning for Interpretable Image Recognition | Slides |
  • Date: The date of the report. Please list entries in reverse chronological order (newest first).
  • Presenter: The presenter of the report. You may also link to your personal page.
  • Venue: The publication venue of the paper (e.g. NeurIPS 2019). Use "-" if there is none.
  • Paper Title: The title of the paper, linked to the paper itself.
  • Slides: Please convert your .ppt document to a .pdf named Presenter_Date (e.g. lcb_20200412) and keep it within 5 MB, because GitHub limits both file size and repository storage. Please also upload the original .ppt document to our Tencent document.
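
The naming and size rules above can be checked before uploading. The following is a minimal sketch, not part of the repository: the function names, the regular expression, and the reading of "5M" as 5 MiB are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List

# Assumption: "within 5M" is read as at most 5 MiB.
MAX_BYTES = 5 * 1024 * 1024

# Assumption: Presenter_Date means letters, an underscore, and an
# 8-digit date (YYYYMMDD), as in the example lcb_20200412.
NAME_PATTERN = re.compile(r"[A-Za-z]+_\d{8}\.pdf")


def check_slide_name(filename: str) -> bool:
    """Return True if the file name looks like Presenter_Date.pdf."""
    return NAME_PATTERN.fullmatch(filename) is not None


def check_slide_file(path: Path) -> List[str]:
    """Return a list of problems found for one slide file (empty if none)."""
    problems = []
    if not check_slide_name(path.name):
        problems.append(f"{path.name}: expected a name like lcb_20200412.pdf")
    # The size check only runs when the file actually exists on disk.
    if path.exists() and path.stat().st_size > MAX_BYTES:
        problems.append(f"{path.name}: larger than {MAX_BYTES} bytes")
    return problems
```

For example, `check_slide_file(Path("lcb_20200412.pdf"))` returns an empty list when the name matches and the file is either absent or small enough.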

Schedule

| Date | Presenter | Venue | Paper Title | Slides |
| --- | --- | --- | --- | --- |
| 2024.11.19 | Zhiying Lu | - | Where Can We Mix? From Atom to Cosmic | Slides |
| 2024.10.10 | Yunning Cao | CVPR 2024 | Compositional Chain-of-Thought Prompting for Large Multimodal Models | Slides |
| 2024.08.28 | Yixuan Zhang | arXiv | xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | Slides |
| 2024.08.21 | Yifan Gao | arXiv | ControlNeXt: Powerful and Efficient Control for Image and Video Generation | Slides |
| 2024.07.16 | Zhiying Lu | arXiv | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Slides |
| 2024.07.09 | Yunning Cao | CVPR 2024 | VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens | Slides |
| 2024.07.02 | Yinglu Li | arXiv | AnyTrans: Translate AnyText in the Image with Large Scale Models | Slides |
| 2024.06.25 | Bowei Pu | CVPR 2024 | Two papers about Video CLIP and Long Video MLLM | Slides |
| 2024.06.11 | Yifan Gao | arXiv | Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering | Slides |
| 2024.06.04 | Peicheng Zhou | CVPR 2024 | Exploration of the reasons for Limiting MLLM performance | Slides |
| 2024.05.28 | TianLe Hu | CVPR 2024 | Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs | Slides |
| 2024.05.21 | Yiwei Sun | - | Two papers about Video LLM | Slides |
| 2024.05.14 | Yixuan Zhang | - | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Slides |
| 2024.04.15 | Borui Ding | - | Masked Images Are Counterfactual Samples for Robust Fine-tuning | Slides |
| 2024.04.08 | Yifan Gao | - | A Survey on Text Image Generation | Slides |
| 2024.03.26 | Zhiying Lu | - | Pretrained ViT as Vision Encoder | Slides |
| 2024.03.19 | Yunning Cao | CVPR 2024 | Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Slides |
| 2024.03.12 | Yiwei Sun | - | A Survey on MLLM: IT, ICL & CoT | Slides |
| 2024.03.05 | TianLe Hu | CVPR 2024 | Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning | Slides |
| 2023.11.21 | Zhiying Lu | arXiv | Initializing Models with Larger Ones | Slides |
| 2023.11.07 | Tianle Hu | ICCV 2023 | Waffling around for Performance: Visual Classification with Random Words and Broad Concepts | Slides |
| 2023.11.01 | Yifan Gao | - | Image-based Visual Try-on | Slides |
| 2023.10.10 | Yiwei Sun | - | A Survey on Compositional Understanding | Slides |
| 2023.09.26 | Zhiying Lu | - | I can't believe there is no training! | Slides |
| 2023.09.12 | Yunning Cao | ICCV 2023 | I can’t believe there’s no images! Learning Visual Tasks Using Only Language Supervision | Slides |
| 2023.07.25 | Jingyuan Xu | CVPR 2022 | Grounded Language-Image Pre-Training | Slides |
| 2023.07.11 | Yiwei Sun | CVPR 2023 | Extracting Class Activation Maps from Non-Discriminative Features as well | Slides |
| 2023.07.04 | Tianle Hu | CVPR 2023 | SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer | Slides |
| 2023.06.26 | Yixuan Zhang | ICLR 2023 | Context Autoencoder for Self-Supervised Representation Learning | Slides |
| 2023.06.26 | Tianhao Qi | - | A Survey on Controllable Text-to-Image Diffusion Models | Slides |
| 2023.06.19 | Borui Ding | ICLR 2023 | Vision Transformer Adapter for Dense Predictions | Slides |
| 2023.06.12 | Yifan Gao | - | A Survey on Vision Prompt Tuning Learning | Slides |
| 2023.06.08 | Pandeng Li | - | A Survey on Multi-modal Pretraining | Slides |
| 2023.06.08 | Yunning Cao | - | A Survey on Visual Tuning | Slides |
| 2023.06.05 | Zhiying Lu | arXiv | VanillaNet: the Power of Minimalism in Deep Learning | Slides |
| 2023.05.29 | Yunning Cao | CVPR 2023 | Texts as Images in Prompt Tuning for Multi-Label Image Recognition | Slides |
| 2023.05.23 | Jingyuan Xu | CVPR 2023 | Aligning Bag of Regions for Open-Vocabulary Object Detection | Slides |
| 2023.05.15 | Fanchao Lin | arXiv | A demo survey on recent fundamental models and applications | Slides |
| 2023.05.08 | Yifan Gao | - | A Survey on Fine-Grained Self-Supervised Learning | Slides |
| 2023.04.27 | Zhiying Lu | CVPR 2023 | Non-Global Attention Mechanisms In Vision Transformers | Slides |
| 2023.04.10 | Yunning Cao | arXiv | Segment Anything | Slides |
| 2023.03.27 | Yiwei Sun | - | How to help your ViT learn the inductive bias? | Slides |
| 2023.03.20 | Yunyan Yan | - | Regression: Representation Space | Slides |
| 2023.03.13 | Jingyuan Xu | ICLR 2023 | F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models | Slides |
| 2023.03.06 | Yixuan Zhang | ECCV 2022 | Adaptive Token Sampling For Efficient Vision Transformers | Slides |
| 2023.02.27 | Fanchao Lin | NeurIPS 2022 | Training language models to follow instructions with human feedback | Slides |
| 2023.02.20 | Yifan Gao | NeurIPS 2022 | ConvMAE: Masked Convolution Meets Masked Autoencoders | Slides |
| 2023.02.06 | Yunyan Yan | CVPR 2022 | A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty | Slides |
| 2023.01.03 | Yunning Cao | ICLR 2023 | Image as Set of Points | Slides |
| 2022.12.19 | Yiwei Sun | - | A Survey on FGVC | Slides |
| 2022.12.14 | Fanchao Lin | CVPR 2022 | Recurrent Dynamic Embedding for Video Object Segmentation | Slides |
| 2022.12.05 | Yunyan Yan | AAAI 2019 | Gradient Harmonized Single-Stage Detector | Slides |
| 2022.11.28 | Yunning Cao | CVPR 2022 | Fine-Grained Object Classification via Self-Supervised Pose Alignment | Slides |
| 2022.11.28 | Zhiying Lu | ECCV 2022 | TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers | Slides |
| 2020.04.12 | Chuanbin Liu | NeurIPS 2019 | This Looks Like That: Deep Learning for Interpretable Image Recognition | Slides |

About

Paper Reading of IMCC groups.
