The AI Papers (Breakthroughs Leading to GPT-4 and Beyond)

This curriculum traces the development of artificial intelligence and large language models (LLMs) from the early days of AI research to the emergence of GPT-4 and beyond. It highlights pivotal papers and breakthroughs that have shaped the field, focusing on developments relevant to LLMs.

I am also tracking my progress through this curriculum to keep myself accountable; my current position is marked with (I am currently here).

1. Early AI and Symbolic Systems (1940s-1980s)

  1. 1943 – McCulloch and Pitts Model

  2. 1950 – Turing Test

  3. 1956 – Dartmouth Conference (Birth of AI)

  4. 1958 – Perceptron by Frank Rosenblatt

  5. 1950s – Hebbian Learning (Influence on Neural Networks)

    • The Organization of Behavior
    • Author: Donald Hebb
    • Proposed Hebbian learning, often summarized as "cells that fire together wire together" (a minimal update-rule sketch follows this list).
  6. 1969 – Minsky and Papert Critique of Perceptrons

    • Perceptrons
    • Authors: Marvin Minsky and Seymour Papert
    • Demonstrated the limitations of single-layer perceptrons (notably their inability to represent XOR), contributing to a decline in neural network research.
    • (I am currently here).
  7. 1974 – Backpropagation Algorithm (Paul Werbos)

  8. 1980 – Neocognitron (Precursor to CNNs)

  9. 1986 – Backpropagation Popularized

  10. 1989 – Hidden Markov Models (HMMs)
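
A minimal sketch of the Hebbian principle from item 5 above, assuming a single rate-based linear neuron and the plain update Δw = η·x·y; the learning rate, initial weights, and toy inputs are illustrative, not taken from Hebb's book.

```python
import numpy as np

def hebbian_update(w, x, eta=0.1):
    """One Hebbian step: strengthen weights in proportion to the
    co-activation of the input x and the neuron's output y = w·x."""
    y = np.dot(w, x)          # rate-based output of a single linear neuron
    return w + eta * x * y    # "cells that fire together wire together"

# Toy demo: two co-occurring inputs repeatedly presented to the neuron.
w = np.full(2, 0.1)           # small nonzero start so the neuron can fire at all
for _ in range(10):
    w = hebbian_update(w, np.array([1.0, 1.0]))
print(w)  # both weights grow together; plain Hebbian growth is unbounded without normalization
```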

2. Shift to Statistical NLP and Early Machine Learning (1990s-2000s)

  1. 1990s – Emergence of Statistical NLP

    • NLP shifted from rule-based systems to statistical approaches, using n-gram models and probabilistic methods for tasks such as part-of-speech tagging and machine translation (a minimal bigram sketch follows this list).
  2. 1993 – IBM Model 1 for Statistical Machine Translation

  3. 1993 – Class-Based n-gram Models

  4. 1997 – Long Short-Term Memory (LSTM)

    • Long Short-Term Memory
    • Authors: Sepp Hochreiter and Jürgen Schmidhuber
    • Introduced the LSTM architecture, addressing the vanishing gradient problem in RNNs.
  5. 1998 – LeNet and Convolutional Neural Networks (CNNs)

  6. 2003 – Neural Probabilistic Language Model
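
To make the statistical turn described in item 1 concrete, here is a minimal bigram language model with maximum-likelihood estimates; the toy corpus is illustrative, and real systems add smoothing for unseen bigrams.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Estimate P(w_i | w_{i-1}) by maximum likelihood from raw counts."""
    bigram_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigram_counts[prev][cur] += 1
    return {
        prev: {cur: c / sum(nexts.values()) for cur, c in nexts.items()}
        for prev, nexts in bigram_counts.items()
    }

# Toy corpus of three sentences.
model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
print(model["the"])   # {'cat': 0.666..., 'dog': 0.333...}
print(model["cat"])   # {'sat': 0.5, 'ran': 0.5}
```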

3. Deep Learning Breakthroughs and Seq2Seq Models (2010s)

  1. 2012 – AlexNet and the Deep Learning Boom

  2. 2013 – Word2Vec (Efficient Word Representations)

  3. 2014 – Sequence to Sequence (Seq2Seq) Models

  4. 2014 – Gated Recurrent Units (GRUs)

  5. 2014 – Adam Optimizer

  6. 2015 – Attention Mechanism in Neural Networks

  7. 2018 – ELMo (Embeddings from Language Models)

  8. 2018 – ULMFiT (Universal Language Model Fine-tuning)

4. Transformer Revolution and Modern NLP (2017-Present)

  1. 2017 – Transformer Model (Self-Attention)

    • Attention Is All You Need
    • Authors: Ashish Vaswani et al.
    • Introduced the Transformer model, replacing recurrence with self-attention (a scaled dot-product attention sketch follows this list).
  2. 2018 – GPT (Generative Pretrained Transformer)

  3. 2018 – BERT (Bidirectional Transformers)

  4. 2019 – Transformer-XL (Handling Longer Contexts)

  5. 2019 – XLNet (Permutation-Based Pre-training)

  6. 2019 – RoBERTa (Robustly Optimized BERT)

  7. 2019 – T5 (Text-to-Text Transfer Transformer)

  8. 2019 – GPT-2 (OpenAI’s Transformer-based Model)
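
A minimal sketch of the self-attention operation behind item 1 of this section, i.e. softmax(QKᵀ/√d_k)·V from "Attention Is All You Need"; the single head, random toy projections, and dimensions are illustrative simplifications.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # each output mixes all value vectors

# Toy example: 4 tokens, d_model = 8, Q/K/V from random linear projections.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8): one contextualized vector per token
```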

5. Scaling Laws, Emergent Abilities, and GPT-4 (2020-Present)

  1. 2020 – GPT-3 (Few-Shot Learning at Scale)

  2. 2020 – Electra (Efficient Pre-training)

  3. 2020 – Reformer (Efficient Transformers)

  4. 2020 – Scaling Laws for Neural Language Models

  5. 2021 – Switch Transformer (Sparse Mixture-of-Experts)

  6. 2021 – Megatron-Turing NLG 530B

  7. 2021 – Codex and Code Generation

  8. 2022 – Chain-of-Thought Prompting

  9. 2022 – Chinchilla Scaling Laws

  10. 2022 – PaLM (Pathways Language Model)

  11. 2022 – GLaM (Mixture-of-Experts)

  12. 2022 – BLOOM (Open-Access Multilingual Model)

  13. 2022 – Emergent Abilities of Large Language Models

  14. 2022 – Instruction Tuning and RLHF (Human Feedback)

  15. 2023 – GPT-4 (Multimodal Capabilities)

    • GPT-4 Technical Report
    • Organization: OpenAI
    • Described GPT-4, a large-scale, multimodal model capable of processing both text and images.
  16. 2023 – Sparks of AGI in GPT-4 (Microsoft Research)

  17. 2023 – Toolformer: Language Models Using Tools

  18. 2023 – ChatGPT and Instruction Following

    • Organization: OpenAI
    • Demonstrated the effectiveness of fine-tuning LLMs with RLHF to follow instructions and engage in natural dialogues.
  19. 2023 – Self-Consistency in Chain-of-Thought

6. Ethics, Alignment, and Safety in AI

  1. 2016 – Concrete Problems in AI Safety

    • Concrete Problems in AI Safety
    • Authors: Dario Amodei et al.
    • Outlined key challenges in ensuring AI systems operate safely and align with human values.
  2. 2018 – Gender Shades (Bias in AI Systems)

  3. 2020 – Ethical and Social Implications of AI

  4. 2022 – AI Alignment and Interpretability

    • Ongoing research into understanding and interpreting the decision-making processes of LLMs, aiming to align AI outputs with human values.

7. Emerging and Future Directions (2023 and Beyond)

  1. 2024 – Frugal Transformer: Efficient Training at Scale

  2. 2024 – AI on the Edge

  3. 2024 – Federated GPT

  4. 2023 – Generative Agents and Interactive AI Systems

  5. 2023 – Memory-Augmented Models

  6. 2023 – OpenAI Function Calling and Plugins

  7. 2023 – Sparse Expert Models

    • Research into sparse models such as Mixture-of-Experts, which scale efficiently by activating only a subset of expert subnetworks per input (a minimal top-1 gating sketch follows this list).
  8. 2023 – Scaling Instruction Tuning

  9. 2023 – Advances in Multimodal Learning

    • Integration of text, image, audio, and video data in unified models, expanding LLM capabilities.
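
A minimal sketch of the sparse Mixture-of-Experts routing mentioned in item 7 above, assuming a learned softmax gate with top-1 routing; the expert count, dimensions, and random weights are illustrative and omit details such as load balancing.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights):
    """Route each token to its single highest-scoring expert (top-1 gating)."""
    gate_logits = x @ gate_weights                    # (tokens, num_experts)
    chosen = gate_logits.argmax(axis=-1)              # expert index per token
    probs = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax gate probabilities
    out = np.empty_like(x)
    for e, W in enumerate(expert_weights):            # only chosen experts do any work
        mask = chosen == e
        if mask.any():
            out[mask] = (x[mask] @ W) * probs[mask, e:e + 1]
    return out

# Toy example: 6 tokens, d_model = 4, 3 experts (each a simple linear map).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
experts = [rng.normal(size=(4, 4)) for _ in range(3)]
gate = rng.normal(size=(4, 3))
print(moe_layer(tokens, experts, gate).shape)  # (6, 4)
```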

Additional Emerging Areas:

  • Multimodal Models and Unified AI Systems: Development of models like OpenAI's DALL·E and CLIP, integrating multiple modalities.
  • Tool-Using AI and Autonomous Interaction: Enabling models to interact with external tools autonomously, enhancing practical capabilities.
  • Memory-Augmented Models and Retrieval-Augmented Generation (RAG): Combining LLMs with dynamic access to external knowledge bases, allowing real-time information retrieval (a minimal retrieval sketch appears after this list).
  • Self-Supervised Learning and Unsupervised Learning Improvements: Improving the efficiency of self-supervised learning from unstructured data sources.
  • Continuous and Lifelong Learning: AI systems that continuously learn from new data without retraining from scratch, preventing catastrophic forgetting.
  • AI Safety, Alignment, and Ethics: Ensuring AI aligns with human values, with research into RLHF and reducing harmful behaviors.
  • Federated Learning and Decentralized AI: Training AI models across distributed datasets without centralizing data, preserving privacy.
  • Sparsity and Efficient AI Models: Techniques like Sparse Transformers and MoE for computational efficiency, enabling scaling to trillions of parameters.
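
A minimal sketch of the retrieval-augmented generation pattern described above, assuming a toy bag-of-words retriever and a placeholder generate function standing in for the actual LLM call; the documents and function names are illustrative only.

```python
from collections import Counter
import math

DOCS = [
    "LSTMs address the vanishing gradient problem in recurrent networks.",
    "The Transformer replaces recurrence with self-attention.",
    "Chinchilla studied compute-optimal scaling of model and data size.",
]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list:
    """Rank documents by similarity to the query and keep the top k."""
    q = Counter(query.lower().split())
    return sorted(DOCS, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Placeholder standing in for whatever LLM is used.
    return f"[LLM answer conditioned on]\n{prompt}"

question = "What problem do LSTMs address?"
context = "\n".join(retrieve(question))
print(generate(f"Context:\n{context}\n\nQuestion: {question}"))
```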