Commit: Re-cloned to deal with API key issue

dmatekenya committed Nov 6, 2024
1 parent bc55587 commit bee1801
Showing 28 changed files with 7,068 additions and 1,744 deletions.
386 changes: 33 additions & 353 deletions README.md

Large diffs are not rendered by default.

49 changes: 32 additions & 17 deletions docs/_toc.yml
```diff
@@ -2,23 +2,38 @@ format: jb-book
 root: README
 
 parts:
-  - caption: Examples
-    numbered: True
+  - caption: Course Requirements
     chapters:
-      - file: notebooks/world-bank-api.ipynb
-      - file: notebooks/world-bank-package.ipynb
-      - file: notebooks/nasa-apod.ipynb
-      - file: notebooks/bibliography.ipynb
-  - caption: Gallery
+      - file: docs/course-requirements/learning-python
+      - file: docs/course-requirements/python-environment
+      - file: docs/course-requirements/data-science
+      - file: docs/course-requirements/platforms
+  - caption: Tunisia, May 2024
     chapters:
-      - file: docs/gallery
-  - caption: Additional Resources
+      - file: docs/tunisia-may-24/README
+      - file: docs/tunisia-may-24/module-1
+      - file: docs/tunisia-may-24/module-2
+      - file: docs/tunisia-may-24/module-3
+      - file: docs/tunisia-may-24/module-4
+      - file: docs/tunisia-may-24/project-ideas
+      - file: notebooks/tunisia-may-24/README
+        sections:
+          - file: notebooks/tunisia-may-24/1-text2sqL-demo.ipynb
+          - file: notebooks/tunisia-may-24/2-document-classification-with-sklearn.ipynb
+          - file: notebooks/tunisia-may-24/3-intro-langchain.ipynb
+  - caption: Malawi, Upcoming, November 2024
     chapters:
-      - url: https://datapartnership.org
-        title: Development Data Partnership
-      - url: https://wbdatalab.org
-        title: World Bank Data Lab
-      - url: https://www.worldbank.org/en/about/unit/unit-dec
-        title: World Bank DEC
-      - url: https://www.worldbank.org/en/research/dime
-        title: World Bank DIME
+      - file: docs/tunisia-may-24/README
+      - file: docs/tunisia-may-24/module-1
+      - file: docs/tunisia-may-24/module-2
+      - file: docs/tunisia-may-24/module-3
+      - file: docs/tunisia-may-24/module-4
+      - file: docs/tunisia-may-24/project-ideas
+      - file: notebooks/tunisia-may-24/README
+        sections:
+          - file: notebooks/malawi-nov-24/1-text2sqL-demo.ipynb
+          - file: notebooks/malawi-nov-24/2-document-classification-with-sklearn.ipynb
+          - file: notebooks/malawi-nov-24/3-intro-langchain.ipynb
   - caption: Acknowledgements
     chapters:
       - file: docs/team
```
24 changes: 24 additions & 0 deletions docs/course-requirements/data-science.md
@@ -0,0 +1,24 @@

# Data Science Prerequisites

This section outlines the foundational data science skills, including key areas such as machine learning and natural language processing (NLP), that you will need both to complete the exercises in this course and to understand the core LLM concepts taught. These prerequisites provide the essential background for working effectively with LangChain and building LLM-based applications.

## Prerequisite Skills in Data Science
A strong foundation in data science, machine learning, and NLP is crucial for building advanced LLM-based applications. These skills enable efficient data handling, model building, and language processing, all of which are fundamental for working with LLMs in real-world scenarios. Below is a list of recommended skills to help you maximize your learning in this course, followed by a short self-check sketch.

- **Data Science Basics**: Familiarity with data manipulation and analysis, especially using libraries like `pandas` and `numpy`.
- **Machine Learning Fundamentals**: Knowledge of core ML algorithms (e.g., linear regression, decision trees, k-nearest neighbors) and concepts such as overfitting, training/testing splits, and evaluation metrics.
- **Deep Learning Basics**: Basic understanding of neural networks, including feedforward networks and concepts like activation functions, training, and backpropagation.
- **Natural Language Processing (NLP) Basics**: Familiarity with NLP concepts such as tokenization, word embeddings, and basic text processing techniques.
- **Working with ML Frameworks**: Experience with libraries like `scikit-learn` for traditional ML models and `TensorFlow` or `PyTorch` for deep learning.
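
If you want a quick self-check, the sketch below touches several of these skills at once: data handling, a core ML algorithm, a train/test split, and an evaluation metric. The dataset and model choices are illustrative, not course requirements.

```python
# Self-check sketch: a k-nearest-neighbors classifier on a toy dataset,
# with a train/test split and an accuracy score.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out 25% of the data to estimate how the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = KNeighborsClassifier(n_neighbors=3)  # one of the core algorithms listed above
model.fit(X_train, y_train)

print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

If this script reads naturally to you, you have the ML background the course assumes.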

## Recommended Free Resources
To help you build the required skills in data science, machine learning, and NLP, we’ve compiled a list of free resources. These cover essential topics and tools needed to work with LangChain and LLM-based applications effectively. Whether you’re new to these fields or looking to deepen your understanding, these resources will be valuable in building your foundational knowledge.


| Focus | Provider | Duration | Course URL |
|--------------------------|------------------------|------------|------------------------------------------------------------------------------------------------------|
| Machine Learning Basics   | Google Developers      | 8 hours    | [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course)          |
| NLP with Transformers | Hugging Face | 4 hours | [Hugging Face Transformers](https://huggingface.co/learn/nlp-course/chapter1) |
| NLP Basics | fast.ai | 3 hours | [NLP with fast.ai](https://course.fast.ai/) |
| Machine Learning Basics | Coursera (Andrew Ng) | 60 hours | [Coursera ML course](https://www.coursera.org/learn/machine-learning) |
26 changes: 26 additions & 0 deletions docs/course-requirements/learning-python.md
@@ -0,0 +1,26 @@
# Python Programming Prerequisites
In this section, we list the core Python skills required to complete the programming exercises in this course, along with free resources for learning or reviewing them.

## Prerequisite Python Skills
A solid foundation in core Python skills is essential for building LLM-based applications with LangChain. These prerequisites enable efficient coding, debugging, and API interaction, which are critical for working effectively with language models. Below is a list of recommended skills to help you maximize your learning in this course.

- **Basic Python Programming**: Understanding variables, data types, and control structures (loops and conditionals).
- **Functions and Modules**: Ability to create and use functions, import modules, and manage dependencies.
- **Object-Oriented Programming (OOP)**: Familiarity with classes, objects, inheritance, and basic OOP principles.
- **Working with APIs**: Understanding how to make HTTP requests and handle API responses, ideally with libraries like `requests`.
- **File I/O**: Reading from and writing to files, especially working with text files and JSON data.
- **Environment Management**: Experience with virtual environments (`venv`, `conda`) and package management with `pip`.
- **Error Handling**: Understanding of exceptions and error handling in Python.
- **Jupyter Notebooks**: Experience working with Jupyter Notebooks, especially for experimenting with and testing code interactively.

These prerequisites will provide a solid foundation for building applications with LangChain and LLMs.
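
As a rough self-check, the sketch below combines several of these skills: functions, an HTTP request with `requests`, JSON file I/O, and error handling. The endpoint is a public demo API chosen for illustration, not a course resource.

```python
# Self-check sketch: fetch a record from a public demo API, save it as JSON,
# and handle request errors gracefully.
import json

import requests

def fetch_todo(todo_id: int) -> dict:
    """Fetch one record from a public demo API and return it as a dict."""
    url = f"https://jsonplaceholder.typicode.com/todos/{todo_id}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on HTTP error codes
    return response.json()

try:
    todo = fetch_todo(1)
    with open("todo.json", "w") as f:  # file I/O: persist the result as JSON
        json.dump(todo, f, indent=2)
    print(todo["title"])
except requests.RequestException as err:  # error handling for network issues
    print(f"Request failed: {err}")
```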


## Recommended Free Resources
To support you in building the necessary Python skills for this course, we’ve compiled a list of free resources to help you learn or review key concepts. These resources cover everything from basic programming to more advanced topics, ensuring you have a solid foundation for working with LangChain and LLM-based applications. Whether you're new to Python or just need a refresher, these materials will provide valuable guidance.

| Focus | Provider | Duration | Course URL |
|--------------|--------------|------------|-----------------------------------|
| Basic Python | Codecademy | 25 hours | [Codecademy Python](https://www.codecademy.com/learn/learn-python-3) |
| Basic Python | DataCamp | 4 hours | [Python for Data Science](https://www.datacamp.com/courses/intro-to-python-for-data-science) |
| Basic Python | Google | 2 days | [Google Python Course](https://developers.google.com/edu/python) |
| Basic Python | Udemy | 4 hours | [Udemy Python Course](https://www.udemy.com/course/python-for-beginners/) |
90 changes: 90 additions & 0 deletions docs/course-requirements/platforms.md
@@ -0,0 +1,90 @@
# Required Platforms and Access Setup

To complete the course exercises and build applications effectively, you will need access to specific platforms. This document outlines the necessary accounts and API keys or tokens required for each platform, organized into three sections: **LLMs**, **Cloud Compute Platforms**, and **Other** (for additional services like Twilio and GitHub).

## 1. LLM Platforms

In this section, we cover the required access for platforms that provide large language models (LLMs) and related resources.

### OpenAI Developer API Key

To access OpenAI’s models programmatically, you need an OpenAI API key. Follow these steps:

1. **Create an OpenAI Account**
Go to [OpenAI’s website](https://platform.openai.com/signup) to sign up.

2. **Generate an API Key**
- Log in and navigate to [API Keys](https://platform.openai.com/account/api-keys).
- Click on **Create new secret key** to generate a new API key.
- Copy and store the key securely, as it will be needed to authenticate with OpenAI’s API.

3. **Usage and Billing**
OpenAI offers a free trial, but be mindful of usage limits and potential charges.
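
Once you have a key, a minimal sketch like the one below can confirm it works, assuming the `openai` Python package (v1 or later) and the key exported as the `OPENAI_API_KEY` environment variable. The model name is an assumption; substitute any model your account can access.

```python
# Minimal connectivity check for the OpenAI API (assumes `pip install openai`
# and OPENAI_API_KEY set in your environment).
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: replace with a model available to you
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```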

### Hugging Face Token

To access Hugging Face’s models and datasets programmatically, you’ll need a Hugging Face access token.

1. **Create a Hugging Face Account**
Sign up at [Hugging Face’s website](https://huggingface.co/join).

2. **Generate an Access Token**
- Log in, go to **Settings**, and select **Access Tokens**.
- Click **New token**, set a name (e.g., “Course Token”), choose “Read” for access level, and generate the token.
- Copy and save the token for use with Hugging Face’s resources.
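
To confirm the token works, you can use a sketch along these lines, assuming the `huggingface_hub` package and the token exported as `HUGGINGFACEHUB_API_TOKEN` (the variable name used later in the `.env` file).

```python
# Minimal token check for Hugging Face (assumes `pip install huggingface_hub`
# and HUGGINGFACEHUB_API_TOKEN set in your environment).
import os

from huggingface_hub import whoami

info = whoami(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
print(f"Authenticated as: {info['name']}")  # raises an error if the token is invalid
```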

## 2. Cloud Compute Platforms

This section details required access for cloud-based compute resources.

### AWS (Amazon Web Services)

AWS will provide cloud resources for deploying and running applications at scale.

1. **Create an AWS Account**
Go to [AWS’s website](https://aws.amazon.com/) to create an account.

2. **Generate Access Keys**
- Log in to the AWS Management Console.
- Navigate to **IAM (Identity and Access Management)** > **Users** and select your user.
- Under **Security credentials**, click **Create access key**.
- Copy and store your Access Key ID and Secret Access Key securely for connecting to AWS services.

3. **Free Tier Usage**
AWS offers a free tier for new users, which may be sufficient for many course exercises. Monitor usage to avoid unexpected charges.
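
A quick way to confirm your keys are picked up is an STS identity call, sketched below under the assumption that you have installed `boto3` and configured your keys via `aws configure` or the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` environment variables.

```python
# Minimal credentials check for AWS (assumes `pip install boto3` and
# configured access keys).
import boto3

sts = boto3.client("sts")
identity = sts.get_caller_identity()  # fails fast if credentials are missing or invalid
print(f"Authenticated as: {identity['Arn']}")
```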

## 3. Other Platforms

This section includes additional services needed for the course.

### Twilio (for WhatsApp Integration)

Twilio will enable WhatsApp access, allowing you to build and deploy chatbot applications.

1. **Create a Twilio Account**
Sign up at [Twilio’s website](https://www.twilio.com/).

2. **Generate an API Key for WhatsApp**
- After logging in, navigate to **Console** > **API Keys & Tokens**.
- Click on **Create new API Key**, give it a name, and copy the SID and Secret.
- Follow Twilio’s documentation to set up WhatsApp messaging capabilities, including linking your WhatsApp number.

3. **Free Trial**
Twilio offers a free trial with a small amount of credit, allowing you to experiment with WhatsApp API functionality. Be sure to check usage limits.
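
As a sketch of what the integration looks like, the snippet below sends a WhatsApp message through Twilio's sandbox, assuming the `twilio` package and the credentials from your Console. The sender is Twilio's shared sandbox number and the recipient is a placeholder; follow Twilio's WhatsApp sandbox documentation for the exact setup.

```python
# Minimal WhatsApp send via the Twilio sandbox (assumes `pip install twilio`
# and TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN set in your environment).
import os

from twilio.rest import Client

client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

message = client.messages.create(
    from_="whatsapp:+14155238886",  # assumption: Twilio's shared sandbox number
    to="whatsapp:+15551234567",     # placeholder: your own sandbox-joined number
    body="Hello from the course chatbot!",
)
print(message.sid)  # the message SID confirms the request was accepted
```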

### GitHub (for Project Repository Management)

GitHub will be used to manage project files and collaborate on code.

1. **Create a GitHub Account**
Go to [GitHub’s website](https://github.com/) and sign up for an account if you don’t already have one.

2. **Set Up SSH Keys (Optional)**
To simplify authentication, you may want to set up SSH keys.
- Follow the instructions in GitHub's documentation for [generating SSH keys](https://docs.github.com/en/authentication/connecting-to-github-with-ssh).
- Once set up, add the public key to your GitHub account under **Settings** > **SSH and GPG keys**.

3. **Forking and Cloning Repositories**
During the course, you will be working with GitHub repositories. Familiarize yourself with forking and cloning repositories to easily access course materials and project files.

---
70 changes: 70 additions & 0 deletions docs/course-requirements/python-environment.md
@@ -0,0 +1,70 @@
# Python Environment Configuration
In this section, we provide the minimal Python packages required to complete the programming exercises in this course. We say minimal because some of the project work may require additional packages.

## Python Installation
We will be using Python 3.12 for this course. Please refer to the installation options below.

- **Recommended: Installation with Anaconda**. [Download Anaconda](https://www.anaconda.com/download). For more details about Anaconda, refer to this [blog post](https://www.anaconda.com/blog).

- **Alternative: Installation from Python Website**
[Download Python](https://www.python.org/downloads/)

## Python IDE
An IDE (Integrated Development Environment) is a software application that provides programmers with tools for software development, such as a source code editor, compiler, build automation, and debugging tools. Popular Python IDEs include Jupyter Notebook, VS Code, and PyCharm.

### Jupyter Notebook and Google Colab

After installing Python, you can proceed to install Jupyter Notebook, the default IDE for data science and scientific computing. Jupyter Notebook allows you to write code and include documentation with Markdown. If you installed Python via the Anaconda distribution, Jupyter Notebook and other commonly used Python packages come pre-installed, saving you additional setup steps.

In addition to the local Jupyter Notebook installation with Anaconda, you can also use a similar environment on hosted servers like Google Colab. Google Colab is an online Jupyter Notebook accessible via the cloud, offering free GPUs for working with LLMs and other AI-based Python programs.

### Full-Featured IDEs
While Jupyter Notebooks are excellent for interactive data science work, this course focuses on building a chatbot, which calls for a full-featured IDE. Below are some commonly used IDEs:

> 🚀 **VS Code**: Recommended IDE for this course. See [installation instructions](https://code.visualstudio.com).

**Other IDEs**
- **Notepad++**
- **PyCharm**

## Python Environment Setup
### Major Packages
For the most part, we’ll install packages as needed. However, here’s a list of core packages we’ll require:

1. Transformers
2. PyTorch
3. Hugging Face Hub
4. LangChain

The full list of required packages is provided in the `requirements.txt` file.

### Create a Virtual Environment
Create a Python virtual environment to use for this project; this course was developed with Python 3.12. The commands below create a virtual environment, activate it, and install all the Python packages we need for this tutorial:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
```
### Set up the `.env` file
This file keeps your API keys and other secrets out of your source code:
```bash
# OpenAI
OPENAI_API_KEY="<Put your token here>"
# Hugging Face
HUGGINGFACEHUB_API_TOKEN="<Put your token here>"
# Twilio Credentials
TWILIO_ACCOUNT_SID="<Put your token here>"
TWILIO_AUTH_TOKEN="<Put your token here>"
TWILIO_NUMBER="<Put your token here>"
# PostgreSQL connection details
DB_USER="<Put your username here>"
DB_PASSWORD="<Put your password here>"
```
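
To read these values at runtime, one common approach is the `python-dotenv` package (an assumption here, as it is not in the package list above; install it with `pip install python-dotenv`):

```python
# Minimal sketch for loading the .env file into the process environment.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

openai_key = os.getenv("OPENAI_API_KEY")
print("OpenAI key loaded:", bool(openai_key))  # check presence; never print the key itself
```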

23 changes: 23 additions & 0 deletions docs/malawi-nov-24/README.md
@@ -0,0 +1,23 @@
# LLM Application Development with LangChain and Python
In this iteration of the course, participants will explore how to develop advanced applications using Large Language Models (LLMs) with the LangChain framework in Python. Through practical exercises and real-world case studies, the course dives into the technical aspects of building LLM-powered solutions, covering everything from prompt engineering to integrating various data sources and APIs. Participants will gain hands-on experience in creating dynamic applications, including intelligent chatbots and automated workflows.

The course begins by providing a foundational understanding of LLMs: how they are trained and adapted for different domains through techniques like prompt engineering and fine-tuning. It then introduces LangChain, a leading framework for building LLM applications, empowering participants to enhance their business processes with LLMs. Ideal for developers, data scientists, data engineers, analysts, and professionals across industries such as banking, telecommunications, and the public sector, this course equips you with the skills needed to build your first production-grade LLM application.

The course is structured into self-contained modules, each building on the skills learned in previous ones. Each module includes lectures for key concepts, practical labs with programming activities and modifiable recipes, and case studies that showcase real-world applications. To reinforce learning, assessments combine theoretical and programming questions to evaluate the learner's understanding and skills gained.



## Session Details

### Audience
This session targets staff from National Statistical Offices across 13 African countries, including Kenya, Tunisia, Burundi, Niger, Burkina Faso, Senegal, Cameroon, Mali, Côte d'Ivoire, Uganda, Central African Republic (RCA), Tanzania, and Mozambique.

### Organization
The course is divided into three phases, each tailored to maximize learning and engagement:

- **Phase 1: Virtual Session**
This brief, 3-hour virtual session introduces participants to the course content and sparks enthusiasm for the in-person session.

- **Phase 2: In-Person Session**
Conducted over five days, this phase combines two components: a 3-day module on big data, followed by this 2-day LLM course.

- **Phase 3: Project Implementation**
In this phase, participants apply what they learned in the previous sessions by building LLM-based applications, primarily chatbots, to facilitate the dissemination of information.
20 changes: 20 additions & 0 deletions docs/tunisia-may-24/README.md
@@ -0,0 +1,20 @@
# Generative AI and LLMs for Data Literacy

The first iteration of this course was delivered in Tunis, Tunisia, from May 27 to May 31, as part of the Data in Health Program organized by the World Bank Group and the African Development Bank.

## Session Details

### Audience
This session targeted staff from National Statistical Offices across 13 African countries, including Kenya, Tunisia, Burundi, Niger, Burkina Faso, Senegal, Cameroon, Mali, Côte d'Ivoire, Uganda, Central African Republic (RCA), Tanzania, and Mozambique.

### Organization
The course was divided into three phases, each tailored to maximize learning and engagement:

- **Phase 1: Virtual Session**
This brief, 3-hour virtual session introduced participants to the course content and sparked enthusiasm for the in-person session.

- **Phase 2: In-Person Session**
Conducted over five days, this phase combined two components: a 3-day module on big data, followed by this 2-day LLM course.

- **Phase 3: Project Implementation**
In this phase, participants applied what they learned in the previous sessions by building LLM-based applications, primarily chatbots, to facilitate the dissemination of information.