
Merge branch 'staging' into features/penalty-rewards
steffencruz authored Jan 19, 2024
2 parents 8671c38 + cc3c1ee commit 6ddafd0
Showing 17 changed files with 325 additions and 132 deletions.
45 changes: 28 additions & 17 deletions README.md
@@ -18,71 +18,82 @@
This repository is the **official codebase for Bittensor Subnet 1 (SN1) v3.0.0+**. To learn more about the Bittensor project and the underlying mechanics, [read here](https://docs.bittensor.com/).


This code is not yet running on mainnet, but you are welcome to run the incentive mechanism or test out miners on testnet (`--subtensor.network test --netuid 61`). Our estimated release date is Monday 22nd January 2024 📆.

# Introduction

This repo defines an incentive mechanism to create a distributed conversational AI.

Validators and miners are based on large language models (LLMs). The [validation process](#validation) uses **[internet-scale datasets](#tools)** and **[goal-driven](#tasks)** behaviour to drive **[human-like conversations](#agents)**.


# Compute Requirements

1. To run a **validator**, you will need at least 24GB of VRAM.
2. To run the default Zephyr **miner**, you will need at least 18GB of VRAM.


# Validation
The design of the network's incentive mechanism is based on two important requirements:

### 1. Validation should mimic human interactions

It is imperative that the validation process engages with miners in the same way as real users. The reasons for this are as follows:
- Miners will compete and continuously improve at performing the validation task(s), so this fine-tuning should be aligned with the goals of the subnet.
- It should not be possible to distinguish between validation and API client queries, so that miners always serve requests (even when they do not receive emissions for doing so).

In the context of this subnet, miners are required to be intelligent AI assistants that provide helpful and correct responses to a range of queries.

### 2. Reward models should mimic human preferences

In our experience, it is tricky to evaluate whether miner responses are high quality. Existing methods typically rely on using LLMs to score completions given a prompt, but this approach is often exploited and gives rise to many adversarial strategies.

In the present version, the validator produces one or more **reference** answers which all miner responses are compared to. Those which are most similar to the reference answer will attain the highest rewards and ultimately gain the most incentive.

**We presently use a combination of string literal similarity and semantic similarity as the basis for rewarding.**
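
As a rough illustration of this reward shape, the sketch below combines a string-literal score with a semantic score as a weighted sum. This is a sketch, not the subnet's implementation: `embed` is a toy hashed bag-of-words stand-in for a real sentence encoder, and the 0.5 weighting is arbitrary.

```python
from difflib import SequenceMatcher

import numpy as np


def embed(text: str) -> np.ndarray:
    """Toy stand-in for a sentence encoder: hashed bag-of-words counts."""
    vec = np.zeros(512)
    for token in text.lower().split():
        vec[hash(token) % 512] += 1.0
    return vec


def reward(reference: str, completion: str, w_literal: float = 0.5) -> float:
    """Weighted sum of string-literal and (stand-in) semantic similarity."""
    # Literal similarity in [0, 1], from the standard library.
    literal = SequenceMatcher(None, reference, completion).ratio()

    # Cosine similarity between embeddings; non-negative here because the
    # toy vectors are non-negative counts.
    a, b = embed(reference), embed(completion)
    semantic = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    return w_literal * literal + (1 - w_literal) * semantic


print(reward("Paris is the capital of France.", "The capital of France is Paris."))
```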

# Tools
Contexts, which form the basis of conversations, are drawn from external APIs (which we call tools) and ensure that conversations remain grounded in factuality. Contexts are also used to obtain ground-truth answers.

Currently, the tooling stack includes:
1. Wikipedia API
2. StackOverflow
3. mathgenerator

More tooling will be included in future releases.
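
To make the tool flow concrete, here is a minimal sketch of fetching a context from the first tool above. It uses Wikipedia's public REST summary endpoint; the helper name and the lack of error handling are our assumptions, not the repo's code.

```python
import requests


def wikipedia_context(title: str) -> str:
    """Fetch a short grounding context for `title` from Wikipedia."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json().get("extract", "")


print(wikipedia_context("Alan_Turing")[:200])
```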

# Tasks
The validation process supports an ever-growing number of tasks. Tasks drive agent behaviour based on specific goals, such as:
- Question answering
- Summarization
- Code debugging
- Mathematics
- and more

Tasks contain a **query** (a basic question or problem) and a **reference** (an ideal answer); a downstream HumanAgent creates a more nuanced version of the **query**. You can see this in the [diagram below](#validation-diagram).
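
Conceptually, a task can be pictured as a small container like the sketch below. This is a simplified illustration rather than the repo's `Task` class; only `query` and `reference` mirror the description above, and the other fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleTask:
    """Illustrative task: a goal with a basic query and an ideal answer."""

    name: str        # e.g. "question-answering"
    query: str       # basic question/problem handed to the HumanAgent
    reference: str   # ideal answer that miner completions are scored against
    # Optional cleaning steps applied to generations (see prompting/cleaners).
    cleaning_pipeline: List[Dict] = field(default_factory=list)
```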

# Agents

In order to mimic human interactions, validators participate in a roleplaying game where they take on the persona of **random** human users. Equipped with this persona and a task, validators prompt miners in a style and tone similar to that of real users, and drive the conversation in order to reach a pre-defined goal. We refer to these prompts as **challenges**.

Challenges are based on the query, but wrapping the query in an agent persona acts as a lossy "one-way" function. This makes challenges more interesting and less predictable overall.

The [diagram below](#validation-diagram) illustrates the validation flow.

#### Our approach transforms straightforward queries into complex challenges, a process akin to a 'hash function' that requires advanced NLP to resolve. This transformation is crucial for preventing simple lookups in source documents, ensuring that responses require authentic analytical effort.
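
As a toy illustration of this one-way wrapping (not the repo's agent code; the persona fields and wording here are invented):

```python
# Toy sketch: the query is never shown verbatim; it is re-expressed through a
# persona, so recovering the source document by string matching is hard.
def challenge_prompt(persona: str, mood: str, query: str) -> str:
    return (
        f"You are roleplaying as {persona} and you are feeling {mood}. "
        f"Ask an assistant for help with the following, in your own words "
        f"and without quoting it directly: {query!r}"
    )


print(challenge_prompt(
    "a curious high-school student",
    "impatient",
    "Explain how photosynthesis converts light into chemical energy.",
))
```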


# Validation Diagram
![sn1 overview](assets/sn1-overview.png)

# Mining

Miners are scored based on the similarity between their completions and the reference answer. Furthermore, they should utilize the same API tools as the validators in order to closely reproduce the reference answer.

Miner experiments are ongoing - we will share our results on the expected performance of various miners in the coming days!



---
@@ -92,7 +103,7 @@

This repository requires python3.8 or higher. To install, simply clone this repo and run:
```bash
git clone https://github.com/opentensor/prompting.git
cd prompting
python -m pip install -r requirements.txt
python -m pip install -e .
```

34 changes: 14 additions & 20 deletions neurons/miners/zephyr/miner.py
@@ -33,12 +33,10 @@

class ZephyrMiner(Miner):
    """
    Base miner which runs zephyr (https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
    This requires a GPU with at least 20GB of memory.
    To run this miner from the project root directory:

    python neurons/miners/zephyr/miner.py --wallet.name <wallet_name> --wallet.hotkey <wallet_hotkey> --subtensor.network <network> --netuid <netuid> --axon.port <port> --axon.external_port <port> --logging.debug True --neuron.model_id HuggingFaceH4/zephyr-7b-beta --neuron.system_prompt "Hello, I am a chatbot. I am here to help you with your questions." --neuron.max_tokens 64 --neuron.do_sample True --neuron.temperature 0.9 --neuron.top_k 50 --neuron.top_p 0.95 --wandb.on True --wandb.entity sn1 --wandb.project_name miners_experiments
    """

@@ -51,7 +49,7 @@ def add_args(cls, parser: argparse.ArgumentParser):

        parser.add_argument(
            "--neuron.model_id",
            type=str,
            default="HuggingFaceH4/zephyr-7b-beta",
        )

        parser.add_argument(

@@ -84,13 +82,10 @@ def __init__(self, config=None):
            device=self.device,
            mock=self.config.mock,
        )

        self.system_prompt = "You are a friendly chatbot who always responds concisely and helpfully. You are honest about things you don't know."

    async def forward(self, synapse: PromptingSynapse) -> PromptingSynapse:
        """
        Processes the incoming synapse by performing a predefined operation on the input data.
        This method should be replaced with actual logic relevant to the miner's purpose.

@@ -104,13 +99,14 @@ async def forward(

        The 'forward' function is a placeholder and should be overridden with logic that is appropriate for
        the miner's intended operation. This method demonstrates a basic transformation of input data.
        """

        try:
            t0 = time.time()
            bt.logging.debug(f"📧 Message received, forwarding synapse: {synapse}")

            prompt = synapse.messages[-1]
            bt.logging.debug(f"💬 Querying zephyr: {prompt}")

            response = HuggingFaceLLM(
                llm_pipeline=self.llm_pipeline,
                system_prompt=self.system_prompt,

@@ -120,33 +116,31 @@

                top_k=self.config.neuron.top_k,
                top_p=self.config.neuron.top_p,
            ).query(
                message=prompt,  # For now we just take the last message
                role="user",
                disregard_system_prompt=False,
            )

            synapse.completion = response
            synapse_latency = time.time() - t0

            if self.config.wandb.on:
                # TODO: Add system prompt to wandb config and not on every step
                self.log_event(
                    timing=synapse_latency,
                    prompt=prompt,
                    completion=response,
                    system_prompt=self.system_prompt,
                )

            bt.logging.debug(f"✅ Served Response: {response}")
            torch.cuda.empty_cache()

        except Exception as e:
            bt.logging.error(f"Error: {e}")
            synapse.completion = "Error: " + str(e)
        finally:
            return synapse


# This is the main function, which runs the miner.
22 changes: 13 additions & 9 deletions prompting/agent.py
@@ -4,6 +4,7 @@

from dataclasses import asdict
from prompting.tasks import Task
from prompting.llm import HuggingFaceLLM
from prompting.cleaners.cleaner import CleanerPipeline

from prompting.persona import Persona, create_persona

@@ -65,18 +66,23 @@ def __init__(
    def create_challenge(self) -> str:
        """Creates the opening question of the conversation, which is based on the task query but dressed in the persona of the user."""
        t0 = time.time()

        cleaner = None
        if hasattr(self.task, "cleaning_pipeline"):
            cleaner = CleanerPipeline(cleaning_pipeline=self.task.cleaning_pipeline)

        self.challenge = super().query(message="Ask a question related to your goal", cleaner=cleaner)
        self.challenge = self.task.format_challenge(self.challenge)
        self.challenge_time = time.time() - t0

        return self.challenge

    def __state_dict__(self, full=False):
        return {
            "challenge": self.challenge,
            "challenge_time": self.challenge_time,
            **self.task.__state_dict__(full=full),
            **asdict(self.persona),
            "system_prompt": self.system_prompt,

@@ -109,5 +115,3 @@ def update_progress(

                "↪ Agent did not finish its goal, continuing conversation..."
            )
            self.continue_conversation(miner_response=top_response)


Empty file added prompting/cleaners/__init__.py
75 changes: 75 additions & 0 deletions prompting/cleaners/all_cleaners.py
@@ -0,0 +1,75 @@
from abc import ABC, abstractmethod
import re

import bittensor as bt


class BaseCleaner(ABC):
    @abstractmethod
    def __init__(self, **kwargs):
        pass

    @abstractmethod
    def apply(self, generation: str) -> str:
        pass


class RemoveQuotes(BaseCleaner):
    def __init__(self, **kwargs) -> None:
        pass

    def apply(self, generation: str) -> str:
        bt.logging.debug("Removing quotes.")
        return generation.strip("\"'")


class PruneEnding(BaseCleaner):
    def __init__(self, **kwargs):
        pass

    def apply(self, generation: str) -> str:
        punctuation_chars = [".", "?", "!"]

        if not any(char in generation for char in punctuation_chars):
            return generation

        if (
            not generation.endswith(".")
            and not generation.endswith("?")
            and not generation.endswith("!")
        ):
            # Truncate at the last complete sentence, keeping the punctuation (+1).
            index = max(generation.rfind(char) for char in punctuation_chars)
            return generation[: index + 1]
        else:
            return generation


class RemoveRoles(BaseCleaner):
    def __init__(self, **kwargs):
        pass

    def capitalize_sentences(self, input_string):
        """Capitalize the first character after . ! ?"""
        sentences = re.split(r"(?<=[.!?])\s+", input_string)
        # Uppercase only the first character so casing elsewhere (e.g. acronyms)
        # is preserved; str.capitalize() would lowercase the rest of the sentence.
        capitalized_sentences = [
            sentence[0].upper() + sentence[1:] if sentence else sentence
            for sentence in sentences
        ]
        return " ".join(capitalized_sentences)

    def apply(self, generation: str) -> str:
        roles = [
            "User: ",
            "System: ",
            "Assistant: ",
            "Assistant, ",
            "Dear AI, ",
            "Dear AI ",
            "#Question: ",
        ]
        for role in roles:
            if role in generation:
                generation = generation.replace(role, "")

        # LLMs are good at being formal; keep sentences capitalized after
        # removing a role prefix.
        return self.capitalize_sentences(input_string=generation)
53 changes: 53 additions & 0 deletions prompting/cleaners/cleaner.py
@@ -0,0 +1,53 @@
from typing import List, Dict

import bittensor as bt

from prompting.cleaners.all_cleaners import RemoveQuotes, RemoveRoles, PruneEnding

SUPPORTED_CLEANERS = {
    "remove_quotes": RemoveQuotes,
    "remove_roles": RemoveRoles,
    "prune_ending": PruneEnding,
}


class CleanerPipeline:
    def __init__(self, cleaning_pipeline: List[Dict]) -> None:
        """CleanerPipeline is a pipeline that can be applied to any string to
        clean it of unwanted characters, punctuation, etc.

        cleaning_pipeline (List[Dict]): List of dicts that define the cleaning pipeline.
            Dicts MUST have the keyword "name" to be valid.
            Example: [{"name": "remove_quotes", "kwargs": {}}, {"name": "prune_ending", "kwargs": {}}]
        """
        self.cleaning_pipeline = cleaning_pipeline

    def apply(self, generation: str) -> str:
        """Apply the cleaning steps listed in cleaning_pipeline to generation.

        Args:
            generation (str): string generated from LLM or otherwise.
        Returns:
            str: Clean generated string.
        """
        try:
            for cleaner in self.cleaning_pipeline:
                if "name" not in cleaner or cleaner["name"] not in SUPPORTED_CLEANERS:
                    raise ValueError(
                        f"Cleaning pipeline step {cleaner} must have a name and be one of {list(SUPPORTED_CLEANERS)}."
                    )

                func = SUPPORTED_CLEANERS[cleaner["name"]]

                kwargs = cleaner.get("kwargs", {})
                func = func(**kwargs)  # instantiate the cleaner with its kwargs

                # Apply each filter configured for the specific task.
                generation = func.apply(generation=generation)

            return generation

        except Exception as e:
            bt.logging.error(f"Failed to apply cleaning pipeline step {cleaner}: {e}")
            return generation
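
For reference, wiring a pipeline together and applying it to a raw generation looks like this (a usage sketch based on the docstring example above; the sample string is ours):

```python
from prompting.cleaners.cleaner import CleanerPipeline

pipeline = CleanerPipeline(
    cleaning_pipeline=[
        {"name": "remove_quotes", "kwargs": {}},
        {"name": "prune_ending", "kwargs": {}},
    ]
)

raw = '"The answer is 42! And furthermore'
print(pipeline.apply(raw))  # -> The answer is 42!
```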
