diff --git a/.DS_Store b/.DS_Store index f8a493b3..0ab78ff5 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/Lectures/.DS_Store b/Lectures/.DS_Store index 9926c419..fe476a35 100644 Binary files a/Lectures/.DS_Store and b/Lectures/.DS_Store differ diff --git a/Lectures/S0-L06/.DS_Store b/Lectures/S0-L06/.DS_Store index c9e08d2d..09e45713 100644 Binary files a/Lectures/S0-L06/.DS_Store and b/Lectures/S0-L06/.DS_Store differ diff --git a/Lectures/S0-L06/images/.DS_Store b/Lectures/S0-L06/images/.DS_Store index c5f04459..c84213d7 100644 Binary files a/Lectures/S0-L06/images/.DS_Store and b/Lectures/S0-L06/images/.DS_Store differ diff --git a/Lectures/S0-L14/.DS_Store b/Lectures/S0-L14/.DS_Store new file mode 100644 index 00000000..995cb12a Binary files /dev/null and b/Lectures/S0-L14/.DS_Store differ diff --git a/Lectures/S0-L14/images/.DS_Store b/Lectures/S0-L14/images/.DS_Store new file mode 100644 index 00000000..a207bfd6 Binary files /dev/null and b/Lectures/S0-L14/images/.DS_Store differ diff --git a/Lectures/S0-L14/images/Comprehensive/1.png b/Lectures/S0-L14/images/Comprehensive/1.png new file mode 100644 index 00000000..dad4ddf3 Binary files /dev/null and b/Lectures/S0-L14/images/Comprehensive/1.png differ diff --git a/Lectures/S0-L14/images/Comprehensive/2.png b/Lectures/S0-L14/images/Comprehensive/2.png new file mode 100644 index 00000000..cd10c78d Binary files /dev/null and b/Lectures/S0-L14/images/Comprehensive/2.png differ diff --git a/Lectures/S0-L14/images/Comprehensive/3.png b/Lectures/S0-L14/images/Comprehensive/3.png new file mode 100644 index 00000000..127da24f Binary files /dev/null and b/Lectures/S0-L14/images/Comprehensive/3.png differ diff --git a/Lectures/S0-L14/images/Comprehensive/4.png b/Lectures/S0-L14/images/Comprehensive/4.png new file mode 100644 index 00000000..c7bfeb88 Binary files /dev/null and b/Lectures/S0-L14/images/Comprehensive/4.png differ diff --git a/Lectures/S0-L14/images/RAG_AIGC/1.png b/Lectures/S0-L14/images/RAG_AIGC/1.png new file mode 100644 index 00000000..54160154 Binary files /dev/null and b/Lectures/S0-L14/images/RAG_AIGC/1.png differ diff --git a/Lectures/S0-L14/images/RAG_AIGC/2.png b/Lectures/S0-L14/images/RAG_AIGC/2.png new file mode 100644 index 00000000..6907f782 Binary files /dev/null and b/Lectures/S0-L14/images/RAG_AIGC/2.png differ diff --git a/Lectures/S0-L14/images/RAG_AIGC/3.png b/Lectures/S0-L14/images/RAG_AIGC/3.png new file mode 100644 index 00000000..027c65eb Binary files /dev/null and b/Lectures/S0-L14/images/RAG_AIGC/3.png differ diff --git a/Lectures/S0-L14/images/RAG_AIGC/4.png b/Lectures/S0-L14/images/RAG_AIGC/4.png new file mode 100644 index 00000000..378a3987 Binary files /dev/null and b/Lectures/S0-L14/images/RAG_AIGC/4.png differ diff --git a/Lectures/S0-L14/images/Sora/.DS_Store b/Lectures/S0-L14/images/Sora/.DS_Store new file mode 100644 index 00000000..4d5ae143 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/.DS_Store differ diff --git a/Lectures/S0-L14/images/Sora/01_sora_in_out.png b/Lectures/S0-L14/images/Sora/01_sora_in_out.png new file mode 100644 index 00000000..79c40dae Binary files /dev/null and b/Lectures/S0-L14/images/Sora/01_sora_in_out.png differ diff --git a/Lectures/S0-L14/images/Sora/02_Sora_application.png b/Lectures/S0-L14/images/Sora/02_Sora_application.png new file mode 100644 index 00000000..afbd1f9d Binary files /dev/null and b/Lectures/S0-L14/images/Sora/02_Sora_application.png differ diff --git a/Lectures/S0-L14/images/Sora/03_history.png 
b/Lectures/S0-L14/images/Sora/03_history.png new file mode 100644 index 00000000..a243e720 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/03_history.png differ diff --git a/Lectures/S0-L14/images/Sora/04_sora_framework.png b/Lectures/S0-L14/images/Sora/04_sora_framework.png new file mode 100644 index 00000000..288ddaa8 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/04_sora_framework.png differ diff --git a/Lectures/S0-L14/images/Sora/05_turtle.png b/Lectures/S0-L14/images/Sora/05_turtle.png new file mode 100644 index 00000000..b44b9c19 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/05_turtle.png differ diff --git a/Lectures/S0-L14/images/Sora/06_cropped_training.png b/Lectures/S0-L14/images/Sora/06_cropped_training.png new file mode 100644 index 00000000..0e8964f8 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/06_cropped_training.png differ diff --git a/Lectures/S0-L14/images/Sora/07_compression.png b/Lectures/S0-L14/images/Sora/07_compression.png new file mode 100644 index 00000000..3b127efb Binary files /dev/null and b/Lectures/S0-L14/images/Sora/07_compression.png differ diff --git a/Lectures/S0-L14/images/Sora/08_patchify.png b/Lectures/S0-L14/images/Sora/08_patchify.png new file mode 100644 index 00000000..656afc8b Binary files /dev/null and b/Lectures/S0-L14/images/Sora/08_patchify.png differ diff --git a/Lectures/S0-L14/images/Sora/09_3D_coompression.png b/Lectures/S0-L14/images/Sora/09_3D_coompression.png new file mode 100644 index 00000000..be5a72fc Binary files /dev/null and b/Lectures/S0-L14/images/Sora/09_3D_coompression.png differ diff --git a/Lectures/S0-L14/images/Sora/10_pnp_seq2.png b/Lectures/S0-L14/images/Sora/10_pnp_seq2.png new file mode 100644 index 00000000..76315088 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/10_pnp_seq2.png differ diff --git a/Lectures/S0-L14/images/Sora/11_dit-uvit.png b/Lectures/S0-L14/images/Sora/11_dit-uvit.png new file mode 100644 index 00000000..a43b0680 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/11_dit-uvit.png differ diff --git a/Lectures/S0-L14/images/Sora/12_imagenV.png b/Lectures/S0-L14/images/Sora/12_imagenV.png new file mode 100644 index 00000000..9ca42af0 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/12_imagenV.png differ diff --git a/Lectures/S0-L14/images/Sora/13_text_prompt.png b/Lectures/S0-L14/images/Sora/13_text_prompt.png new file mode 100644 index 00000000..be86fd63 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/13_text_prompt.png differ diff --git a/Lectures/S0-L14/images/Sora/14_image_prompt.png b/Lectures/S0-L14/images/Sora/14_image_prompt.png new file mode 100644 index 00000000..76383601 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/14_image_prompt.png differ diff --git a/Lectures/S0-L14/images/Sora/15_video_prompt.png b/Lectures/S0-L14/images/Sora/15_video_prompt.png new file mode 100644 index 00000000..b03c77b3 Binary files /dev/null and b/Lectures/S0-L14/images/Sora/15_video_prompt.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/1.png b/Lectures/S0-L14/images/Table Reasoning/1.png new file mode 100644 index 00000000..43ec1701 Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/1.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/2.png b/Lectures/S0-L14/images/Table Reasoning/2.png new file mode 100644 index 00000000..00a1878a Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/2.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/3.png 
b/Lectures/S0-L14/images/Table Reasoning/3.png new file mode 100644 index 00000000..073663d1 Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/3.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/4.png b/Lectures/S0-L14/images/Table Reasoning/4.png new file mode 100644 index 00000000..4a93f9e3 Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/4.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/5.png b/Lectures/S0-L14/images/Table Reasoning/5.png new file mode 100644 index 00000000..210ae7f1 Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/5.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/6.png b/Lectures/S0-L14/images/Table Reasoning/6.png new file mode 100644 index 00000000..1dc85329 Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/6.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/7.png b/Lectures/S0-L14/images/Table Reasoning/7.png new file mode 100644 index 00000000..f902cdca Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/7.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/8.png b/Lectures/S0-L14/images/Table Reasoning/8.png new file mode 100644 index 00000000..41b39aac Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/8.png differ diff --git a/Lectures/S0-L14/images/Table Reasoning/9.png b/Lectures/S0-L14/images/Table Reasoning/9.png new file mode 100644 index 00000000..9fe63863 Binary files /dev/null and b/Lectures/S0-L14/images/Table Reasoning/9.png differ diff --git a/Lectures/S0-L14/images/formula/f0.png b/Lectures/S0-L14/images/formula/f0.png new file mode 100644 index 00000000..089b1ede Binary files /dev/null and b/Lectures/S0-L14/images/formula/f0.png differ diff --git a/Lectures/S0-L14/images/formula/f1.png b/Lectures/S0-L14/images/formula/f1.png new file mode 100644 index 00000000..089b1ede Binary files /dev/null and b/Lectures/S0-L14/images/formula/f1.png differ diff --git a/Lectures/S0-L14/images/formula/f2.png b/Lectures/S0-L14/images/formula/f2.png new file mode 100644 index 00000000..2ebfed14 Binary files /dev/null and b/Lectures/S0-L14/images/formula/f2.png differ diff --git a/Lectures/S0-L14/images/formula/f3.png b/Lectures/S0-L14/images/formula/f3.png new file mode 100644 index 00000000..c1728593 Binary files /dev/null and b/Lectures/S0-L14/images/formula/f3.png differ diff --git a/Lectures/S0-L20/.DS_Store b/Lectures/S0-L20/.DS_Store new file mode 100644 index 00000000..b73a5977 Binary files /dev/null and b/Lectures/S0-L20/.DS_Store differ diff --git a/Lectures/S0-L20/images/.DS_Store b/Lectures/S0-L20/images/.DS_Store new file mode 100644 index 00000000..df2dde5c Binary files /dev/null and b/Lectures/S0-L20/images/.DS_Store differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/.DS_Store b/Lectures/S0-L20/images/Skeleton Of Thought/.DS_Store new file mode 100644 index 00000000..5008ddfc Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/.DS_Store differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/01_human_thoughts.png b/Lectures/S0-L20/images/Skeleton Of Thought/01_human_thoughts.png new file mode 100644 index 00000000..ee728502 Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/01_human_thoughts.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/02_SoT.png b/Lectures/S0-L20/images/Skeleton Of Thought/02_SoT.png new file mode 100644 index 00000000..3844245f Binary files /dev/null and 
b/Lectures/S0-L20/images/Skeleton Of Thought/02_SoT.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/03_SoT_Comparison.png b/Lectures/S0-L20/images/Skeleton Of Thought/03_SoT_Comparison.png new file mode 100644 index 00000000..aa83468a Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/03_SoT_Comparison.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/04_Skeleton_Prompt.png b/Lectures/S0-L20/images/Skeleton Of Thought/04_Skeleton_Prompt.png new file mode 100644 index 00000000..4266fd58 Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/04_Skeleton_Prompt.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/05_Point_Expanding_Prompt.png b/Lectures/S0-L20/images/Skeleton Of Thought/05_Point_Expanding_Prompt.png new file mode 100644 index 00000000..fa6f22e5 Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/05_Point_Expanding_Prompt.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/1.png b/Lectures/S0-L20/images/Skeleton Of Thought/1.png new file mode 100644 index 00000000..42ca53fb Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/1.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/2.png b/Lectures/S0-L20/images/Skeleton Of Thought/2.png new file mode 100644 index 00000000..6a9afd1c Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/2.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/3.png b/Lectures/S0-L20/images/Skeleton Of Thought/3.png new file mode 100644 index 00000000..d5896c00 Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/3.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/4.png b/Lectures/S0-L20/images/Skeleton Of Thought/4.png new file mode 100644 index 00000000..87c06064 Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/4.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/5.png b/Lectures/S0-L20/images/Skeleton Of Thought/5.png new file mode 100644 index 00000000..18d3ce75 Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/5.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/6.png b/Lectures/S0-L20/images/Skeleton Of Thought/6.png new file mode 100644 index 00000000..3374489b Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/6.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/7.png b/Lectures/S0-L20/images/Skeleton Of Thought/7.png new file mode 100644 index 00000000..68a2c23a Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/7.png differ diff --git a/Lectures/S0-L20/images/Skeleton Of Thought/8.png b/Lectures/S0-L20/images/Skeleton Of Thought/8.png new file mode 100644 index 00000000..4585050c Binary files /dev/null and b/Lectures/S0-L20/images/Skeleton Of Thought/8.png differ diff --git a/Lectures/S0-L20/images/unleash_prompt_engineering/1.png b/Lectures/S0-L20/images/unleash_prompt_engineering/1.png new file mode 100644 index 00000000..fce28024 Binary files /dev/null and b/Lectures/S0-L20/images/unleash_prompt_engineering/1.png differ diff --git a/Lectures/S0-L20/images/unleash_prompt_engineering/2.png b/Lectures/S0-L20/images/unleash_prompt_engineering/2.png new file mode 100644 index 00000000..24021043 Binary files /dev/null and b/Lectures/S0-L20/images/unleash_prompt_engineering/2.png differ diff --git a/Lectures/S0-L20/images/unleash_prompt_engineering/3.png 
b/Lectures/S0-L20/images/unleash_prompt_engineering/3.png new file mode 100644 index 00000000..007d9b31 Binary files /dev/null and b/Lectures/S0-L20/images/unleash_prompt_engineering/3.png differ diff --git a/Lectures/S0-L20/images/unleash_prompt_engineering/4.png b/Lectures/S0-L20/images/unleash_prompt_engineering/4.png new file mode 100644 index 00000000..202fe5b9 Binary files /dev/null and b/Lectures/S0-L20/images/unleash_prompt_engineering/4.png differ diff --git a/Lectures/S0-L20/images/unleash_prompt_engineering/5.png b/Lectures/S0-L20/images/unleash_prompt_engineering/5.png new file mode 100644 index 00000000..1bd8323d Binary files /dev/null and b/Lectures/S0-L20/images/unleash_prompt_engineering/5.png differ diff --git a/Lectures/S0-L20/images/unleash_prompt_engineering/6.png b/Lectures/S0-L20/images/unleash_prompt_engineering/6.png new file mode 100644 index 00000000..c44bb10f Binary files /dev/null and b/Lectures/S0-L20/images/unleash_prompt_engineering/6.png differ diff --git a/_contents/S0-L14.md b/_contents/S0-L14.md index 394caede..02f4004b 100755 --- a/_contents/S0-L14.md +++ b/_contents/S0-L14.md @@ -1,9 +1,9 @@ --- layout: post -title: Knowledge Augmented FMs -lecture: +title: Knowledge Augmented FMs +lecture: lectureVersion: current -extraContent: +extraContent: notes: team-6 video: team-1 tags: @@ -14,36 +14,446 @@ categories: - FMAdapt --- -In this session, our readings cover: - -## Required Readings: +In this session, our readings cover: +## Required Readings: ### Retrieval-Augmented Generation for AI-Generated Content: A Survey -+ https://arxiv.org/abs/2402.19473v1 -+ The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, scalable foundation model architectures, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces the information retrieval process, which enhances AIGC results by retrieving relevant objects from available data stores, leading to greater accuracy and robustness. In this paper, we comprehensively review existing efforts that integrate RAG technique into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator. We distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. We also summarize additional enhancements methods for RAG, facilitating effective engineering and implementation of RAG systems. Then from another view, we survey on practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. 
Project: this https URL +- https://arxiv.org/abs/2402.19473v1 +- The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, scalable foundation model architectures, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces the information retrieval process, which enhances AIGC results by retrieving relevant objects from available data stores, leading to greater accuracy and robustness. In this paper, we comprehensively review existing efforts that integrate RAG technique into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator. We distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. We also summarize additional enhancements methods for RAG, facilitating effective engineering and implementation of RAG systems. Then from another view, we survey on practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. Project: this https URL ### Retrieval-Augmented Generation for Large Language Models: A Survey - + https://arxiv.org/abs/2312.10997 - + Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in practical applications, such as hallucinations, slow knowledge updates, and lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to the retrieval of relevant information from external knowledge bases before answering questions with LLMs. RAG has been demonstrated to significantly enhance answer accuracy, reduce model hallucination, particularly for knowledge-intensive tasks. By citing sources, users can verify the accuracy of answers and increase trust in model outputs. It also facilitates knowledge updates and the introduction of domain-specific knowledge. RAG effectively combines the parameterized knowledge of LLMs with non-parameterized external knowledge bases, making it one of the most important methods for implementing large language models. This paper outlines the development paradigms of RAG in the era of LLMs, summarizing three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a summary and organization of the three main components of RAG: retriever, generator, and augmentation methods, along with key technologies in each component. Furthermore, it discusses how to evaluate the effectiveness of RAG models, introducing two evaluation methods for RAG, emphasizing key metrics and abilities for evaluation, and presenting the latest automatic evaluation framework. Finally, potential future research directions are introduced from three aspects: vertical optimization, horizontal scalability, and the technical stack and ecosystem of RAG. 
+- https://arxiv.org/abs/2312.10997 +- Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in practical applications, such as hallucinations, slow knowledge updates, and lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to the retrieval of relevant information from external knowledge bases before answering questions with LLMs. RAG has been demonstrated to significantly enhance answer accuracy, reduce model hallucination, particularly for knowledge-intensive tasks. By citing sources, users can verify the accuracy of answers and increase trust in model outputs. It also facilitates knowledge updates and the introduction of domain-specific knowledge. RAG effectively combines the parameterized knowledge of LLMs with non-parameterized external knowledge bases, making it one of the most important methods for implementing large language models. This paper outlines the development paradigms of RAG in the era of LLMs, summarizing three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a summary and organization of the three main components of RAG: retriever, generator, and augmentation methods, along with key technologies in each component. Furthermore, it discusses how to evaluate the effectiveness of RAG models, introducing two evaluation methods for RAG, emphasizing key metrics and abilities for evaluation, and presenting the latest automatic evaluation framework. Finally, potential future research directions are introduced from three aspects: vertical optimization, horizontal scalability, and the technical stack and ecosystem of RAG. -## More Readings: +## More Readings: +### Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models -### A Comprehensive Study of Knowledge Editing for Large Language Models -+ https://arxiv.org/abs/2401.01286 -+ Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. 
Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications. +- Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun +- Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation. +### A Comprehensive Study of Knowledge Editing for Large Language Models +- https://arxiv.org/abs/2401.01286 +- Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications. 
+## Even More ### A Survey of Table Reasoning with Large Language Models -+ Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che -+ https://arxiv.org/abs/2402.08259 -+ Table reasoning, which aims to generate the corresponding answer to the question following the user requirement according to the provided table, and optionally a text description of the table, effectively improving the efficiency of obtaining information. Recently, using Large Language Models (LLMs) has become the mainstream method for table reasoning, because it not only significantly reduces the annotation cost but also exceeds the performance of previous methods. However, existing research still lacks a summary of LLM-based table reasoning works. Due to the existing lack of research, questions about which techniques can improve table reasoning performance in the era of LLMs, why LLMs excel at table reasoning, and how to enhance table reasoning abilities in the future, remain largely unexplored. This gap significantly limits progress in research. To answer the above questions and advance table reasoning research with LLMs, we present this survey to analyze existing research, inspiring future work. In this paper, we analyze the mainstream techniques used to improve table reasoning performance in the LLM era, and the advantages of LLMs compared to pre-LLMs for solving table reasoning. We provide research directions from both the improvement of existing methods and the expansion of practical applications to inspire future research. +- Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che +- https://arxiv.org/abs/2402.08259 +- Table reasoning, which aims to generate the corresponding answer to the question following the user requirement according to the provided table, and optionally a text description of the table, effectively improving the efficiency of obtaining information. Recently, using Large Language Models (LLMs) has become the mainstream method for table reasoning, because it not only significantly reduces the annotation cost but also exceeds the performance of previous methods. However, existing research still lacks a summary of LLM-based table reasoning works. Due to the existing lack of research, questions about which techniques can improve table reasoning performance in the era of LLMs, why LLMs excel at table reasoning, and how to enhance table reasoning abilities in the future, remain largely unexplored. This gap significantly limits progress in research. To answer the above questions and advance table reasoning research with LLMs, we present this survey to analyze existing research, inspiring future work. In this paper, we analyze the mainstream techniques used to improve table reasoning performance in the LLM era, and the advantages of LLMs compared to pre-LLMs for solving table reasoning. We provide research directions from both the improvement of existing methods and the expansion of practical applications to inspire future research. + +# Retrieval-Augmented Generation for​ AI-Generated Content: A Survey​ + +### Motivation and the RAG Process +Artificial Intelligence Generated Content(AIGC) refers to the texts and code generated by Large Language Model, the images generated by DALL-E and Stable-Diffusion, and video generated by Sora. Besides the recent success of AIGC, it continues to face a number of challenges. 
For example, it is difficult to maintain up-to-date knowledge for these models, because model retraining is required before they can generate answers based on new knowledge. In addition, these models struggle to provide long-tail knowledge, and they are at risk of leaking private training data. Retrieval-Augmented Generation (RAG) serves as a mitigation to these problems, because it has an adaptive data repository. With such a data repository, new or long-tail knowledge can be included, and sensitive private data can be kept in the repository rather than encoded in model parameters, so the above challenges can be straightforwardly alleviated. + + +The figure below shows the standard Retrieval-Augmented Generation process. The user's prompt (in any modality) is taken as input for both the retriever and the generator. The retriever has access to a data store and retrieves the data relevant to the prompt for the generator. The generator then takes both the user prompt and the retrieved data as input and eventually generates the results (a minimal code sketch of this flow appears after the taxonomy below). + + + +### Taxonomy of RAG Foundations + +The figure below shows the four major categories of RAG. + +- Query-Based RAG + - It combines the retrieved data and the user's prompt as the input for the generator. + - Examples include REALM, which uses two BERT models for retrieval and generation, and APICoder for text-to-code tasks. +- Latent-Representation Based RAG + - This line of methods lets the generator work with the latent representation of the retrieved data. + - FiD is a commonly used technique that processes each retrieved passage with an encoder individually. + - The benefit of this technique is that it can generate answers after fusing multiple passages in the latent representation. +- Logits-based RAG + - The retrieved data is incorporated into the logits during the decoding process. + - Examples include kNN-LM, which augments the LM with k-nearest-neighbour search, and TRIME. +- Speculative RAG + - This category of RAG decides when to use the retriever to augment the generation process in order to save inference time. + + + +### Taxonomy of RAG Enhancements + +The performance of RAG can be further enhanced by the techniques shown in the figure below. + +- Input Enhancement can be done in the following two ways: + - Query Transformation: the user's input prompt can be enhanced by modifying the query. + - Data Augmentation: the retrieval database can exclude irrelevant data before the retrieval is made. +- Retriever Enhancement + - Recursive Retrieve: a query is split into smaller pieces and the results of multiple retrievals are combined. + - Chunk Optimization: the chunk size is adjusted to achieve better retrieval results. + - Some other techniques include Finetune Retriever, Hybrid Retrieve, Re-ranking and Meta-data Filtering. +- Generator Enhancement + - In a RAG system, the generator is the "upper bound" of the performance, and it is enhanced by methods such as Prompt Engineering, Decoding Tuning and Finetune Generator. +- Result Enhancement + - In some cases, it is possible to rewrite the output in order to improve the performance. +- RAG Pipeline Enhancement + - Within the RAG pipeline, the model can decide when to perform retrieval to obtain the best performance. + - An iterative retrieval process may also further improve the performance. + + + +### Taxonomy of RAG Applications + +RAG is a general-purpose method that can be effectively applied in different domains.
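The common thread across these applications is the retrieve-then-generate flow described earlier. As a concrete anchor, here is a minimal sketch of query-based RAG; the hashed bag-of-words embedding, the in-memory document store, and the `retrieve`/`generate` helper names are illustrative assumptions, not components named by the survey.

```python
import numpy as np

# Toy corpus standing in for an external data store (illustrative only).
DOCS = [
    "RAG retrieves relevant documents and feeds them to a generator.",
    "Sora is a text-to-video diffusion transformer released by OpenAI.",
    "kNN-LM mixes model logits with nearest-neighbour retrieval.",
]

def embed(text, dim=64):
    """Hashed bag-of-words embedding; a real system would use a learned encoder."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query, k=2):
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed(query)
    scores = [float(q @ embed(d)) for d in DOCS]
    top = np.argsort(scores)[::-1][:k]
    return [DOCS[i] for i in top]

def generate(prompt):
    """Placeholder for an LLM call; assumed to exist in a real deployment."""
    return f"[LLM answer conditioned on a prompt of {len(prompt)} characters]"

def query_based_rag(user_prompt):
    # Query-based RAG: concatenate retrieved context with the user prompt.
    context = "\n".join(retrieve(user_prompt))
    augmented = f"Context:\n{context}\n\nQuestion: {user_prompt}\nAnswer:"
    return generate(augmented)

print(query_based_rag("What does RAG do?"))
```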
The figure below shows the areas of its application, ranging from question answering and code generation to text-to-3D and drug discovery. + + # Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models + +## What is Sora? + +Sora is a text-to-video generative AI model released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and shows potential in simulating the physical world. The figure below is an example of the input and output of Sora. + + + +## What can Sora do? + +The implications of Sora extend far beyond mere video creation, offering transformative potential for tasks ranging from automated content generation to complex decision-making processes. The figure below is an overview of practical deployment scenarios. + + + +## History of Generative Video + + + + +## Overview + +Sora is a diffusion transformer with flexible sampling dimensions, as shown in the figure below. It has three parts: + +1. A time-space compressor first maps the original video into latent space. +2. A ViT then processes the tokenized latent representation and outputs the denoised latent representation. +3. A CLIP-like conditioning mechanism receives LLM-augmented user instructions and potentially visual prompts to guide the diffusion model to generate styled or themed videos. + + + +## Data Pre-processing + +### Variable Durations, Resolutions, Aspect Ratios + +Sora can generate videos at flexible sizes or resolutions, ranging from 1920x1080p to 1080x1920p and anything in between. +

+ +Sora is trained on data at their native sizes, which significantly improves composition and framing in the generated videos. The comparison between Sora and a model trained on uniformly cropped square videos demonstrates a clear advantage, as shown in the figure below. Videos produced by Sora exhibit better framing, ensuring subjects are fully captured in the scene. + +
+ +
+ +### Unified Visual Representation + +To effectively process diverse visual inputs including images and videos with varying durations, resolutions, and aspect ratios, Sora patchifies videos by initially compressing videos into a lower-dimensional latent space, followed by decomposing the representation into spacetime patches, as shown in the figure below. + +
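To make the patchify step concrete, the sketch below splits a latent video tensor into non-overlapping spacetime patches and flattens each patch into a token; the tensor layout and the patch sizes are illustrative assumptions, not Sora's actual configuration.

```python
import numpy as np

def spacetime_patchify(latent, pt=2, ph=4, pw=4):
    """Split a latent video of shape (T, H, W, C) into flattened spacetime patches.

    Returns an array of shape (num_patches, pt * ph * pw * C), i.e. one token per
    spacetime patch. Assumes T, H and W are divisible by the patch sizes.
    """
    T, H, W, C = latent.shape
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Group the patch-index axes together, then flatten each patch into a vector.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)

# Example: a 16-frame, 32x32 latent with 8 channels -> 8 * 8 * 8 = 512 tokens.
tokens = spacetime_patchify(np.zeros((16, 32, 32, 8)))
print(tokens.shape)  # (512, 256)
```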

+ +### Video Compression Network + +Sora’s video compression network (or visual encoder) aims to **reduce the dimensionality** of the input data. It is typically built upon a VAE or a Vector Quantised-VAE (VQ-VAE). Because it is challenging for a VAE to map visual data of any size into a unified, fixed-size latent space, there are two implementations. + +- Spatial-patch Compression: Transforming video frames into fixed-size patches + +

+ +- Spatial-temporal-patch Compression: Considers both the spatial and temporal dimensions of the data and captures changes across frames. Compared with pure spatial patchifying, 3D convolution is utilized to achieve spatial-temporal-patch compression, as shown in the figure below. + + + + +### Spacetime Latent Patches + +A remaining concern in the compression network part is: how to handle the **variability** in latent space dimensions (i.e., the number of latent feature chunks or patches from different video types) before feeding patches into the input layers of the diffusion transformer. + +**Patch n’ pack (PNP)** is a possible solution. PNP packs multiple patches from different images into a single sequence, as shown in the figure below. + + + +## Modeling + +### Image Diffusion Transformer + +DiT and U-ViT are among the first works to employ vision transformers for latent diffusion models. DiT employs a multi-head self-attention layer and a pointwise feed-forward network interlaced with some layer norm and scaling layers. DiT incorporates conditioning via adaptive layer norm (AdaLN) with an additional MLP layer for zero-initialization, which initializes each residual block as an identity function and thus greatly stabilizes the training process. + + + + +### Video Diffusion Transformer + +Imagen Video, developed by Google Research, utilizes a cascade of diffusion models, which consists of 7 sub-models that perform text-conditional video generation, spatial super-resolution, and temporal super-resolution, to transform textual prompts into high-definition videos, as shown in the figure below. + + + +Some points worth noting: + +- The Imagen architecture utilizes a 3D U-Net architecture with temporal attention mechanisms and convolution layers to maintain consistency and flow between frames. +- The U-Net is not necessary for the performance of the traditional diffusion architecture. +- Adopting a transformer instead of a U-Net is more flexible, since it can allow for more training data and more model parameters. + +## Language Instruction Following + +Another question is: how does Sora follow user instructions? + +- DALLE-3 uses Contrastive Captioners (CoCa) to train an image captioner, combining a CLIP-style contrastive objective jointly with a language model objective. +- A mismatch between user prompts and image descriptions poses a problem. + - LLMs are used to rewrite descriptions into long descriptions. +- Similar to DALLE-3, Sora uses a video captioner trained to create detailed descriptions for videos. + - Little public description of this captioner is available. + - It likely uses VideoCoCa, which is built on top of CoCa. + +## Prompt Engineering + +### Text Prompt + +Prompt engineering can leverage the model’s natural language understanding ability to decode complex instructions and render them into cohesive, lively, and high-quality video narratives. The figure below is an example. + + + +### Image Prompt + +An image prompt serves as a **visual anchor** for the to-be-generated video’s content. The use of image prompts allows Sora to convert **static images** into **dynamic, narrative-driven videos** by leveraging both visual and textual information. The figure below is an example. + + + +### Video Prompt + +Works like Moonshot and Fast-Vid2Vid demonstrate that a good video prompt needs to be specific and flexible, so that the model gets a clear direction and objectives. + + + +## Trustworthiness + +- Safety Concern + - Large multi-modal models are vulnerable to adversarial attacks due to their high-dimensional nature and ability to take visual input. +- Hallucination is a problem.
+- Fairness and Bias + - How can bias from the training data be mitigated in Sora so that the model operates fairly? +- Privacy preservation + - Can Sora protect user data? +- Alignment + - It is important to ensure human intentions and model behavior are aligned. + - RLHF is used for LLMs; what will be done for Sora? +- Recommendations for future work: + - Integrated Protection of Model and External Security. + - Security Challenges of Multimodal Models. + - The Need for Interdisciplinary Collaboration. + +## Limitations + +- Lacks physical realism, especially in complex scenarios. +- Spatial and temporal misunderstandings. +- Limited human-computer interaction. +- Usage limitations. + +# A Comprehensive Study of Knowledge Editing for Large Language Models + +Large Language Models (LLMs) are the maestros of modern text generation, strikingly mimicking the nuances of human communication. Yet, their brilliance comes with a challenge – the heavyweight computational cost of their expansive learning capacity. As our world shifts, so must our models; their knowledge is a race against time, continuously needing updates to stay accurate and relevant. Enter the realm of knowledge editing – a promising avenue where the agility of model modifications is not just a desire but a necessity for applications demanding precision post-training. This paper journeys through the emerging landscape of knowledge editing techniques, offers a fresh benchmark for evaluating their efficacy, and invites us to peer deeper into the cognitive framework of LLMs, setting the stage for innovations with the groundbreaking EasyEdit framework. We stand on the cusp of an era where the adaptability of AI could redefine its role across industries. + + + +### Knowledge Editing + +The goal is to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. For an original model 𝛳, knowledge *k*, and knowledge editing function *F*, the post-edited model is defined as: + +
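The formula images from the original notes are not reproduced here; based on the definitions in the preceding sentence, the post-edited model can be written as follows (notation assumed to follow the survey):

$$\theta' = F(\theta, k)$$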

+ +

+ +1. **Knowledge Insertion** + +

+

+ +2. **Knowledge Modification** + +

+

+ +3. **Knowledge Erasure** + +

+

+ + + +### Benchmark Data: KnowEdit + +6 datasets on knowledge editing are curated. These encompass a range of editing types, i.e., fact manipulation, sentiment manipulation and hallucination generation. + + + +### Knowledge Editing Evaluation + +- **Edit Success** + +Also termed as Reliability. It is the average accuracy of the edit cases + +- **Portability** + +Whether the edited model can address the effect of an edit + +- **Locality** + +The edited model should not modify the irrelevant examples in out-of-scopes + +- **Generative Capacity** + +Generalization ability of the model after editing. Also, termed ‘fluency’. + + +### Error and Case Analysis + + + +### Limitations of Knowledge Editing + +- The underlying mechanism of Transformers is opaque. Therefore, it is unclear whether or not the existing knowledge editing methods are truly successful. +- Defining the boundaries of the influence of knowledge editing is challenging. It was compared with neurosurgery, where the assessment of the impact of any modifications is complex. +- Keeping pace with the dynamic and fluid nature of knowledge. + + + + +# A Survey of Table Reasoning with Large Language Models + +### Introduction to Table Reasoning​ + +Table reasoning aims to generate accurate answers from tables based on users requirements​. And table reasoning task improves the efficiency of obtaining and processing data from massive amounts of tables​. + + + +### The Rise of LLMs and their Advantages​ + +​Traditional methods relied on rule-based systems or neural networks. With LLMs' vast knowledge and language understanding capabilities, LLMs excel at table reasoning​. + +There are some key advantages of LLMs in Table Reasoning:​ + +- Instruction following ability benefits structure understanding​ + +- Step-by-step reasoning capability benefits schema linking​ + +- Reduced annotation requirements​ + +### Techniques for Improving Performance in LLM era​ + +The authors proposed some techniques for improving performance in LLM era​: + +- Supervised Fine-Tuning​ + +- Result Ensemble​ + +- In-Context Learning​ + +- Instruction Design​ + +- Step-by-Step Reasoning + +### For Supervised Fine-tuning: + +- Fine-tuning LLMs on annotated data to enhance reasoning capabilities​ + + - Using pre-existing datasets or manually labeled data​ + + - Leveraging distilled data generated by other LLMs​ + +- ​In the LLM era, instruction-based and multi-task data fine-tune models for better generalization​ + +
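As an illustration of the instruction-style data mentioned in the last bullet above, the sketch below serializes a table and a question into a single fine-tuning record; the markdown serialization and the field names are assumptions for illustration, not a format prescribed by the survey.

```python
def table_to_markdown(header, rows):
    """Serialize a table into markdown so an LLM can read it as plain text."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)

def build_instruction_example(header, rows, question, answer):
    """Pack one (instruction, input, output) record for supervised fine-tuning."""
    return {
        "instruction": "Answer the question using only the given table.",
        "input": table_to_markdown(header, rows) + f"\n\nQuestion: {question}",
        "output": answer,
    }

example = build_instruction_example(
    header=["Country", "Gold"],
    rows=[["Norway", 16], ["Germany", 12]],
    question="Which country won the most gold medals?",
    answer="Norway",
)
print(example["input"])
```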

+ +### For Result Ensemble: + +- Obtaining diverse results by varying prompts, models, or random seeds​ + +- Selecting the most suitable answer through scoring or verification​ + +- Compared to pre-LLM methods, LLMs can generate diverse results more effectively, often by simply changing instructions, unlike pre-LLM methods requiring aligned fine-tuning and inference instructions. + +
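A minimal sketch of the result-ensemble idea above: sample several answers by varying the instruction (or the sampling seed) and keep the most frequent one. The `ask_llm` helper and the toy answer distribution are placeholder assumptions, not an API from the survey.

```python
import random
from collections import Counter

def ask_llm(question, instruction, seed):
    """Placeholder for a real LLM call; varying instruction/seed yields diverse answers."""
    random.seed(hash((question, instruction, seed)))
    return random.choice(["Norway", "Norway", "Germany"])  # toy, biased toward "Norway"

INSTRUCTIONS = [
    "Answer the question about the table.",
    "Read the table carefully, then answer concisely.",
    "You are a table reasoning expert. Give only the final answer.",
]

def ensemble_answer(question, samples_per_instruction=3):
    votes = Counter()
    for instruction in INSTRUCTIONS:
        for seed in range(samples_per_instruction):
            votes[ask_llm(question, instruction, seed)] += 1
    # Select the most common answer; a scorer or verifier could be used instead.
    return votes.most_common(1)[0][0]

print(ensemble_answer("Which country won the most gold medals?"))
```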

+ +### For In-context Learning: + +- Leveraging LLMs' ability to generate expected answers using suitable prompts​ + +- In-context learning capability of LLMs allows flexible adjustment of prompts suitable for different questions without further fine-tuning​ + +- Reduces labeling overhead while enhancing performance + +

+ +### One Example of In-context Learning: ODIS + +- ODIS + + - Ontology-Guided Domain-Informed Sequencing + + - Uses in-domain demonstrations to enhance model performance by synthesizing in-domain SQL based on SQL similarity + +

+ +The figure above shows an example prompt for 2-shot in-domain text-to-SQL. + +Two in-domain demonstrations are presented prior to the test question. + +### For Instruction Design: + +- Utilizing LLMs' instruction-following ability + +- Instruction design involves instructing LLMs to complete decomposed sub-tasks for table reasoning. + + - Modular decomposition: Breaking tasks into sub-tasks (DATER) + +

+ +### One Example of Instruction Design: DATER + +(Decompose evidence And questions for effective Table-basEd Reasoning)​ + + + +### For Step-by-step Reasoning: + +- Solving complex tasks by incorporating intermediate reasoning stages​ + + - Techniques like Chain-of-Table​ + + - Decomposing questions into simpler sub-questions or predefined operations​ + + - Differs from modular decomposition which breaks tasks into widely different sub-tasks. + +
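As a toy illustration of decomposing a table question into predefined operations, in the spirit of the step-by-step approaches above, the sketch below answers a question through an explicit filter-select-sort chain; the operation set and the example data are illustrative assumptions.

```python
import pandas as pd

table = pd.DataFrame({
    "country": ["Norway", "Germany", "Norway", "Canada"],
    "year": [2018, 2018, 2022, 2022],
    "gold": [14, 14, 16, 4],
})

# Question: "Which country won the most gold medals in 2022?"
# Step-by-step chain of simple, predefined table operations:
step1 = table[table["year"] == 2022]                 # filter: keep 2022 rows
step2 = step1[["country", "gold"]]                   # select: relevant columns only
step3 = step2.sort_values("gold", ascending=False)   # sort: by gold count
answer = step3.iloc[0]["country"]                    # read off the top row

print(answer)  # Norway
```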

+ +

+ +### One Example of Step-by-step Reasoning: Chain-of-Table + + + +### Future Research Directions + +- We can focus on improving table reasoning performance​: + + - Supervised Fine-Tuning: Establishing Diverse Training Data​ + + - Result Ensemble: Sampling Results More Efficiently​ + + - In-Context Learning: Optimizing Prompts Automatically​ + + - Instruction Design: Automatically Refining Design with Verification​ + + - Step-by-Step Reasoning: Mitigating Error Cascade in Multi-Step Reasoning + +- We can focus on expanding practical applications​: + + - Multi-Modal: Enhancing Alignment between Image Tables and Questions​ + - Agent: Cooperating with More Diverse and Suitable Table Agents​ + - Dialogue: Backtracking Sub-tables in Multi-turn Interaction​ + - Retrieval-Augmented Generation: Injecting Knowledge Related to Entities diff --git a/_contents/S0-L20.md b/_contents/S0-L20.md index 08d260fc..dff757e9 100755 --- a/_contents/S0-L20.md +++ b/_contents/S0-L20.md @@ -33,9 +33,264 @@ In this session, our readings cover: + The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM's capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and others parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques. +# Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review + +### Introduction +Models that are built on Large Language Model (LLM) as the backbone are capable of extracting meaningful information that can assist medical diagnosis or creating engaging contents. These models are also referred to as Artificial Intelligence-Generated Content (AIGC). Once the AIGC model is trained, by changing the way we compose the prompts as input to the model, the quality of the models' output can change. In this paper, we focus on techniques of engineering the prompts to achieve higher quality model output from the same AIGC model. 
+ +### Basics of Prompt Engineering + +One basic technique to improve the model output is to **be clear and precise** when writing the prompt; see the example in the figure below. When the prompt is vague, since there are numerous ways a model could respond, the model often ends up with a broad response that is less useful. Being more specific in the prompt can guide it towards the response we are looking for. + +**Role-playing** is another basic technique that is effective in improving the model output. Prompting the model to role-play as a historian may improve the model's output when the question is related to a historical event. Prompting the model to role-play as an expert in AI may have a similar positive effect when the question is about LLMs. + + + +**Few-shot prompting** is also a common prompt engineering technique, where the model is given a few examples with answers in addition to the original question. This relies on the few-shot learning ability that is emergent in large language models, which can be understood as a form of meta learning. + +The authors of the paper also note that **adjusting the temperature and top-p** is essential for prompt engineering. For code generation, where standard patterns are valued, a smaller temperature and top-p are preferred, whereas in creative writing, a larger temperature and top-p may help the model produce original responses. + + + +### Advanced Prompt Engineering + +Chain of Thought prompting induces the model to respond with step-by-step reasoning, which not only improves the quality of the output, but also shows the intermediate steps, which is valuable for high-stakes applications such as medical reasoning. **Zero-shot chain of thought** is a simple yet effective technique, where we only need to append the phrase "Let's think step by step" to the input. **Golden chain of thought** is a technique that utilizes few-shot prompting for chain of thought prompting, by providing ground-truth chain-of-thought solutions as examples in the input to the model. Golden chain of thought can boost the solve rate from 38% to 83% in the case of GPT-4, but the method is limited by the requirement of ground-truth chain-of-thought examples. + +**Self-Consistency** is an extension of chain of thought prompting. After chain of thought prompting, by sampling from the language model decoder and choosing the most self-consistent response, Self-Consistency achieves better performance in rigorous reasoning tasks such as doing proofs. + + + + + +**Knowledge Generation** breaks content generation down into two generation steps: in the first step, the model is prompted only to output information (knowledge) pertinent to the original query; this knowledge is then included in the prompt for the second generation step. + +**Least-to-most prompting** also takes a multi-step generation approach similar to knowledge generation. A given problem is decomposed into numerous sub-problems, and the model outputs a response for each sub-problem. These responses are then included in the prompt to help the model answer the original problem (a minimal sketch of this flow follows below). + + + +**Tree of Thoughts reasoning** constructs the steps of reasoning in a tree structure. This is particularly helpful when we need to break down a problem into steps, and further break down each step into more steps. **Graph of Thoughts** is a generalization of the tree-of-thought structure, where each edge encodes the relation between nodes. Graph of Thoughts may be helpful for problems requiring intricate, multifaceted resolutions. 
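A minimal sketch of the least-to-most idea described above: first ask the model to decompose the problem, then solve the sub-problems in order, feeding earlier answers back into later prompts. The `llm` helper and the prompt wording are illustrative assumptions, not the prompts used in the reviewed papers.

```python
def llm(prompt):
    """Placeholder for a real LLM call (e.g., an API request)."""
    return f"<model response to: {prompt[:40]}...>"

def least_to_most(question):
    # Step 1: ask the model to break the problem into simpler sub-problems.
    decomposition = llm(
        "Break the following problem into a short list of simpler sub-problems, "
        f"one per line:\n{question}"
    )
    sub_problems = [line.strip() for line in decomposition.splitlines() if line.strip()]

    # Step 2: solve sub-problems sequentially, carrying earlier answers as context.
    solved = []
    for sub in sub_problems:
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)
        answer = llm(f"{context}\nQ: {sub}\nA:")
        solved.append((sub, answer))

    # Step 3: answer the original question given all sub-answers.
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)
    return llm(f"{context}\nNow answer the original question: {question}")

print(least_to_most("If a train travels 120 km in 2 hours, how far does it go in 5 hours?"))
```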
+ + + + +**Chain of Verification** corrects a response that may contain false information by prompting the LLM to ask verification questions about the response. The LLM may correct the false information by answering these verification questions, and the answers then help the LLM generate a more accurate response to the original query. + +In addition to the specific techniques mentioned above, there also exist **plug-ins** for ChatGPT, such as Prompt Enhancer, that automatically enhance the prompt for the user. + + + +### Assessing the Efficacy of Prompt Methods + +Benchmarking prompt methods requires evaluating the quality of the LLM's responses, which can be done by humans or by automatic metrics. + +**Subjective evaluations** require human evaluators, with the following pros and cons: +Pros: can assess fluency, accuracy, novelty, and relevance +Cons: inconsistency problem, expensive, time-consuming + +**Objective evaluations** rely on metrics to evaluate the response. Some examples include + - BLEU: BiLingual Evaluation Understudy + - ROUGE: Recall-Oriented Understudy for Gisting Evaluation + - METEOR: Metric for Evaluation of Translation with Explicit ORdering + - BERTScore: a BERT model used as the metric + +Objective evaluations have the following pros and cons: +Pros: automatic evaluation, cheap, quick +Cons: alignment problem + + +Evaluation results from InstructEval show that in few-shot settings, once the examples are specified, providing an additional prompt harms performance, while in zero-shot settings, an expert-written prompt improves performance. + +### Application of Prompt Engineering + +Prompt engineering can help with **assessment in teaching and learning**, where tailored prompts can set the pace for the student. Zero-shot prompting can generate elements such as settings, characters and outlines, allowing for **content creation and editing**. In the domain of **computer programming**, self-debugging prompting outperforms other text-to-SQL models and minimizes the number of attempts. Prompt engineering also significantly reduces the error rate when applied to **reasoning tasks**. Finally, prompt engineering can also support **dataset generation**, where LLMs can be prompted to generate smaller datasets for training domain-specific models. + + + + + + + + + + ### Long context prompting for Claude 2.1 + https://www.anthropic.com/news/claude-2-1-prompting + +# Skeleton Of Thought: Prompting LLMs For Efficient Parallel Generation +## Motivation +LLMs have powerful performance, but their inference speed is low due to: + +- Large model size +- Expensive attention operation +- The sequential decoding approach + +Existing work either compresses or redesigns the model, the serving system, or the hardware. + +This work instead focuses on **the 3rd axis** and proposes **Skeleton-of-Thought for efficient parallel decoding**, without **any changes to LLM models, systems or hardware.** + +## High-level Overview + +The idea comes from how humans answer questions. The steps of human thought can be summarized as below: +1. Derive the skeleton according to protocols and strategies. +2. Add evidence and details to explain each point. +If we visualize these steps, it looks like: +

+ +Based on this, the paper proposes **Skeleton-of-Thought**, shown in the figure below, which includes 3 steps: +1. Prompt the LLM to produce the skeleton. +2. Conduct batched decoding or parallel API calls to expand multiple points in parallel. +3. Aggregate the outputs to get the final answer. +

+ +Evaluated on 12 recently released LLMs, SoT can not only provide considerable speed-ups but also improve the answer quality, as shown in the figure below. + +The y-axis `net win rate` is the difference between the fraction of questions for which SoT-R gives better answers and the fraction for which it gives worse answers than normal generation. + +The x-axis `speed-up` is the ratio between the latency of normal generation and that of SoT-R generation. +
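A small sketch of how these two axes could be computed from per-question results; the data layout is an assumption for illustration.

```python
def net_win_rate(outcomes):
    """outcomes: list of 'win' / 'tie' / 'lose' judgments for SoT vs. normal generation."""
    wins = outcomes.count("win")
    loses = outcomes.count("lose")
    return (wins - loses) / len(outcomes)

def speed_up(normal_latencies, sot_latencies):
    """Ratio of average normal-decoding latency to average SoT latency."""
    avg_normal = sum(normal_latencies) / len(normal_latencies)
    avg_sot = sum(sot_latencies) / len(sot_latencies)
    return avg_normal / avg_sot

print(net_win_rate(["win", "win", "tie", "lose"]))  # 0.25
print(speed_up([10.0, 12.0], [5.0, 6.0]))           # 2.0
```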

+ +## Method +The SoT method has two stages: a `skeleton stage` and a `point-expanding stage`. + +### Skeleton Stage +In the skeleton stage, SoT uses a skeleton prompt to guide the LLM to output a concise skeleton of the answer, so that a list of points can be extracted from the skeleton response. A prompt example is shown in the figure below. +

+ +### Point-expanding Stage +Based on the skeleton, SoT uses a point-expanding prompt to let the LLM expand on each point in parallel. A prompt example is shown in the figure below. After completing all points, SoT concatenates all the point-expanding responses to get the final answer. +
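A minimal sketch of the two-stage flow (skeleton, then parallel point expansion, as discussed in the Parallelization subsection below); the `llm_call` helper and the prompt wording are illustrative assumptions rather than the paper's exact prompts.

```python
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt):
    """Placeholder for an LLM API request or a local forward pass."""
    return f"<response to: {prompt[:30]}...>"

def skeleton_of_thought(question):
    # Stage 1: ask for a short skeleton, one numbered point per line.
    skeleton = llm_call(
        "Give a skeleton of the answer to the question below as 3-8 short numbered points.\n"
        f"Question: {question}"
    )
    points = [p.strip() for p in skeleton.splitlines() if p.strip()]

    # Stage 2: expand every point in parallel (parallel API calls here;
    # batched decoding would play the same role for a locally hosted model).
    def expand(point):
        return llm_call(
            f"Question: {question}\nSkeleton point: {point}\n"
            "Expand this point in 1-2 sentences, without repeating other points."
        )

    with ThreadPoolExecutor(max_workers=8) as pool:
        expansions = list(pool.map(expand, points))

    # Final answer: concatenate the expanded points in skeleton order.
    return "\n".join(expansions)

print(skeleton_of_thought("Why is retrieval augmentation useful for LLMs?"))
```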

+ + +### Parallelization +The authors use parallel point expanding to achieve a speed-up over normal decoding. Specifically: +- For proprietary models with only API access, parallelization is achieved by issuing multiple API calls. +- For open-source models that we can run locally, parallelization is achieved by letting the LLM process point-expanding requests as a batch. + +## Evaluation – Overall Quality + +The evaluation can be assessed from various perspectives. + +- **Evaluation Process:** + + - Present a question and a pair of answers to an LLM judge. + +- **LLM-based evaluation frameworks:** + + - FastChat: general metric. + + - LLMZoo: general metric plus 5 detailed metrics - coherence, diversity, immersion, integrity, and relevance. + +- **Extensions to avoid evaluation bias:** + + - Running the evaluation twice with either ordering of the two answers + + - For each run, a score is assigned: 1 – win; 0 – tie; -1 – lose + + - Sum the two scores to get the final score + +- **Net win rates:** + + - (#win - #lose)/total number of questions + +## Evaluation – Evaluation of Answer Quality + +- **Regarding Overall Quality, based on the figure provided, we can conclude:** + + - There is a discrepancy between the two metrics on win rates. + + - SoT is not worse than the baseline in around 60% of the cases. + + - The lose rates are also pretty high. + +

+ +- **Regarding the quality of each model, the conclusions drawn from the figure indicate:** + + - The red rectangular frame in the figure highlights: Both metrics agree that OpenChat-13B, Vicuna-7B V1.1, Claude, LLaMA2-Chat-13B have **negative net win rates.** + + - The green rectangular frame in the figure highlights: Vicuna-13B V1.3, StableVicuna-13B, and UltraLM-13B have **positive net win rates.** + +

+ +- **Based on the figure, the reasons for bad net win rates can be identified as follows:** + +The question and answer provided by OpenChat-13B in the figure demonstrate that models construct the complete answer during the skeleton stage. And the figure showing the question and answer from Vicuna-7B V1.1 illustrates that models omit details during the point-expanding stage. + +In summary, some strong models have very high-quality answers that are hard to beat. + +

+ +- **Regarding the quality of each question category, our conclusions from the figure are:** + + - The green rectangular frame in the figure highlights: SoT performs relatively well on generic, common-sense, knowledge, and counterfactual questions. + + - The red rectangular frame in the figure highlights: Relatively poorly on writing, fermi, math, and coding. + +

+ +- **Concerning the Quality of Detailed Metrics, the information from the figure reveals:** + + - SoT improves the diversity and relevance while hurting the immersion and coherence. + +

+ +### SoT-R – Definition and Framework + +- **Prompting Router:** + + - Ask the LLM if the desired answer is in a list of independent points. + +

+ +- **Trained Router:** + + - **Annotate** the LIMA training set: a label of 1 or 0. + + - **Fine-tune** a RoBERTa model using the labeled data. + + - Ask the RoBERTa to **classify** if the SoT is suitable for the desired answer. + +## ​SoT-R – Evaluation + +Based on the provided figures, we can understand: + +- SoT-R obtains **lower speed-ups** than SoT. + +- SoT-R significantly **improves the answer quality** on questions where SoT is not suitable. + +- The two types of SoT-R perform similarly to a human router. + +

+ +

+ +## Conclusion  + +Having thoroughly reviewed the paper, we've gained significant insights into the Skeleton of Thought concept. From this, we can derive several conclusions, each from a unique perspective: + +- **Efficient LLM methods at model and system levels:** + + - SoT is a **data-level** technique. + +- **Prompting methods for LLMs:** + + - SoT is the first attempt at exploiting the **power of prompting to improve efficiency.** + +- **Answer quality evaluation:** + + - The answer quality evaluation is far from perfect due to the limited prompt set, the potential bias of GPT-4 judges, and the inherent difficulty of evaluating LLM generations. + +- **Efficiency and overhead of SoT in different scenarios:** + + - **higher costs** due to the increased number of API calls and tokens. + - **computation overhead** + +- **Eliciting or improving LLMs’ ability:** + - Graph-of-Thoughts + diff --git a/_contents/S0-L22.md b/_contents/S0-L22.md index 939173c3..085d5f8c 100755 --- a/_contents/S0-L22.md +++ b/_contents/S0-L22.md @@ -2,7 +2,7 @@ layout: post title: LLM Agents lecture: -lectureVersion: current +lectureVersion: next extraContent: notes: team-2 video: team-2