Chapter 10: Multimodal Foundation Models

Questions and Answers

Q: What are the typical use cases for multimodal foundation models?

A: Text summarization, rewriting, information extraction, question answering (QA) and visual question answering (VQA), detection of toxic or harmful content, classification and content moderation, conversational interfaces, translation, source code generation, reasoning, masking of personally identifiable information (PII), and personalized marketing and ads.

Q: How does image generation differ from image editing and enhancement?

A: Image generation involves creating images from text prompts, while image editing and enhancement modify existing images based on instructions and prompts, supporting use cases like artistic style transfer, domain adaptation, and upscaling.

Q: What are best practices for multimodal prompt engineering for image-based generative AI?

A: Understand the nuances of the foundation model, define the type of image, describe the subject, specify style and artists, be specific about quality, and be expressive in prompt writing.
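As a rough illustration of the practices above, the following sketch assembles an image prompt from its parts: image type, subject, style, artists, and quality descriptors. The `ImagePrompt` class, its field names, and the comma-separated output format are assumptions made for this example and do not come from the book or from any particular model's API. Different foundation models respond to different phrasing, so treat the format as a starting point to adapt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container for the parts of an image prompt."""

    image_type: str                                    # e.g. "photo", "oil painting"
    subject: str                                       # what the image should depict
    style: str = ""                                    # optional artistic style
    artists: list[str] = field(default_factory=list)   # optional artist references
    quality: list[str] = field(default_factory=list)   # optional quality descriptors

    def render(self) -> str:
        """Join the non-empty parts into one comma-separated prompt string."""
        parts = [f"{self.image_type} of {self.subject}"]
        if self.style:
            parts.append(f"{self.style} style")
        if self.artists:
            parts.append("in the style of " + " and ".join(self.artists))
        parts.extend(self.quality)
        return ", ".join(parts)


prompt = ImagePrompt(
    image_type="oil painting",
    subject="a lighthouse on a rocky coast at dusk",
    style="impressionist",
    artists=["Claude Monet"],
    quality=["highly detailed", "soft natural lighting"],
)
print(prompt.render())
```

Keeping each component in its own field makes it easier to vary one aspect at a time, for example changing only the style while holding the subject fixed, when comparing the images a model produces.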

Q: Can you explain inpainting, outpainting, and depth-to-image techniques?

A: Inpainting, outpainting, and depth-to-image are specific tasks within generative AI, but the document does not provide detailed explanations of these techniques.

Q: How does image captioning contribute to visual question answering?

A: Image captioning combines computer vision and natural language processing. This enhances tasks like VQA, where the model must understand both the visual information in an image and the textual content of a question to provide an accurate and relevant answer.

Chapters

  • Chapter 1 - Generative AI Use Cases, Fundamentals, Project Lifecycle
  • Chapter 2 - Prompt Engineering and In-Context Learning
  • Chapter 3 - Large-Language Foundation Models
  • Chapter 4 - Quantization and Distributed Computing
  • Chapter 5 - Fine-Tuning and Evaluation
  • Chapter 6 - Parameter-efficient Fine Tuning (PEFT)
  • Chapter 7 - Fine-tuning using Reinforcement Learning with RLHF
  • Chapter 8 - Optimize and Deploy Generative AI Applications
  • Chapter 9 - Retrieval Augmented Generation (RAG) and Agents
  • Chapter 10 - Multimodal Foundation Models
  • Chapter 11 - Controlled Generation and Fine-Tuning with Stable Diffusion
  • Chapter 12 - Amazon Bedrock Managed Service for Generative AI

Related Resources