
Merge branch 'staging' into features/penalty-rewards
steffencruz authored Jan 19, 2024
2 parents 8671c38 + cc3c1ee commit 6ddafd0
Showing 17 changed files with 325 additions and 132 deletions.
45 changes: 28 additions & 17 deletions README.md
@@ -18,71 +18,82 @@
This repository is the **official codebase for Bittensor Subnet 1 (SN1) v3.0.0+**. To learn more about the Bittensor project and the underlying mechanics, [read here](https://docs.bittensor.com/).


This code is not yet running on mainnet, but you are welcome to run the incentive mechanism or test out miners on testnet (`--subtensor.network test --netuid 61`). Our estimated release date is Monday 22nd January 2024 📆.

# Introduction

This repo defines an incentive mechanism to create a distributed conversational AI.

Validators and miners are based on large language models (LLMs). The [validation process](#validation) uses **[internet-scale datasets](#tools)** and **[goal-driven](#tasks)** behaviour to drive **[human-like conversations](#agents)**.


# Compute Requirements

1. To run a **validator**, you will need at least 24GB of VRAM.
2. To run the default Zephyr **miner**, you will need at least 18GB of VRAM.


# Validation
The design of the network's incentive mechanism is based on two important requirements:

### 1. Validation should mimic human interactions

It is imperative that the validation process engages with miners in the same way as real users. The reasons for this are as follows:
- Miners will compete and continuously improve at performing the validation task(s), so this fine-tuning should be aligned with the goals of the subnet.
- It should not be possible to distinguish between validation and API client queries, so that miners always serve requests (even when they do not receive emissions for doing so).

In the context of this subnet, miners are required to be intelligent AI assistants that provide helpful and correct responses to a range of queries.

### 2. Reward models should mimic human preferences

In our experience, it is tricky to evaluate whether miner responses are high quality. Existing methods typically rely on using LLMs to score completions given a prompt, but this approach is often exploited and gives rise to many adversarial strategies.

In the present version, the validator produces one or more **reference** answers which all miner responses are compared to. Those which are most similar to the reference answer will attain the highest rewards and ultimately gain the most incentive.

**We presently use a combination of string literal similarity and semantic similarity as the basis for rewarding.**
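
As a rough illustration of this reward shape, the sketch below combines a string-literal score with a semantic score as a weighted sum. This is a sketch, not the subnet's implementation: `embed` is a toy hashed bag-of-words stand-in for a real sentence encoder, and the 0.5 weighting is arbitrary.

```python
from difflib import SequenceMatcher

import numpy as np


def embed(text: str) -> np.ndarray:
    """Toy stand-in for a sentence encoder: hashed bag-of-words counts."""
    vec = np.zeros(512)
    for token in text.lower().split():
        vec[hash(token) % 512] += 1.0
    return vec


def reward(reference: str, completion: str, w_literal: float = 0.5) -> float:
    """Weighted sum of string-literal and (stand-in) semantic similarity."""
    # Literal similarity in [0, 1], from the standard library.
    literal = SequenceMatcher(None, reference, completion).ratio()

    # Cosine similarity between embeddings; non-negative here because the
    # toy vectors are non-negative counts.
    a, b = embed(reference), embed(completion)
    semantic = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    return w_literal * literal + (1 - w_literal) * semantic


print(reward("Paris is the capital of France.", "The capital of France is Paris."))
```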

# Tools
Contexts, which form the basis of conversations, are drawn from external APIs (which we call tools) and ensure that conversations remain grounded in factuality. Contexts are also used to obtain ground-truth answers.

Currently, the tooling stack includes:
1. Wikipedia API
2. StackOverflow
3. mathgenerator

More tooling will be included in future releases.
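
To make the tool flow concrete, here is a minimal sketch of fetching a context from the first tool above. It uses Wikipedia's public REST summary endpoint; the helper name and the lack of error handling are our assumptions, not the repo's code.

```python
import requests


def wikipedia_context(title: str) -> str:
    """Fetch a short grounding context for `title` from Wikipedia."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json().get("extract", "")


print(wikipedia_context("Alan_Turing")[:200])
```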

# Tasks
The validation process supports an ever-growing number of tasks. Tasks drive agent behaviour based on specific goals, such as:
- Question answering
- Summarization
- Code debugging
- Mathematics
- and more

Tasks contain a **query** (a basic question or problem) and a **reference** (an ideal answer); a downstream HumanAgent creates a more nuanced version of the **query**. You can see this in the [diagram below](#validation-diagram).
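
Conceptually, a task can be pictured as a small container like the sketch below. This is a simplified illustration rather than the repo's `Task` class; only `query` and `reference` mirror the description above, and the other fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleTask:
    """Illustrative task: a goal with a basic query and an ideal answer."""

    name: str        # e.g. "question-answering"
    query: str       # basic question/problem handed to the HumanAgent
    reference: str   # ideal answer that miner completions are scored against
    # Optional cleaning steps applied to generations (see prompting/cleaners).
    cleaning_pipeline: List[Dict] = field(default_factory=list)
```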

# Agents

In order to mimic human interactions, validators participate in a roleplaying game where they take on the persona of **random** human users. Equipped with this persona and a task, validators prompt miners in a style and tone similar to that of real users, and drive the conversation in order to reach a pre-defined goal. We refer to these prompts as **challenges**.

Challenges are based on the query, but wrapping the query in an agent persona acts as a lossy "one-way" function. This makes challenges more interesting and less predictable overall.

The [diagram below](#validation-diagram) illustrates the validation flow.

#### Our approach transforms straightforward queries into complex challenges, a process akin to a 'hash function' that requires advanced NLP to resolve. This transformation is crucial for preventing simple lookups in source documents, ensuring that responses require authentic analytical effort.
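
As a toy illustration of this one-way wrapping (not the repo's agent code; the persona fields and wording here are invented):

```python
# Toy sketch: the query is never shown verbatim; it is re-expressed through a
# persona, so recovering the source document by string matching is hard.
def challenge_prompt(persona: str, mood: str, query: str) -> str:
    return (
        f"You are roleplaying as {persona} and you are feeling {mood}. "
        f"Ask an assistant for help with the following, in your own words "
        f"and without quoting it directly: {query!r}"
    )


print(challenge_prompt(
    "a curious high-school student",
    "impatient",
    "Explain how photosynthesis converts light into chemical energy.",
))
```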


# Validation Diagram
![sn1 overview](assets/sn1-overview.png)

# Mining

Miners are scored based on the similarity between their completions and the reference answer. Furthermore, they should utilize the same API tools as the validators in order to closely reproduce the reference answer.

Miner experiments are ongoing - we will share our results on the expected performance of various miners in the coming days!



---
@@ -92,7 +103,7 @@

This repository requires python3.8 or higher. To install, simply clone this repo and run:
```bash
git clone https://github.com/opentensor/prompting.git
cd prompting
python -m pip install -r requirements.txt
python -m pip install -e .
```

34 changes: 14 additions & 20 deletions neurons/miners/zephyr/miner.py
@@ -33,12 +33,10 @@

class ZephyrMiner(Miner):
    """
    Base miner which runs zephyr (https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
    This requires a GPU with at least 20GB of memory.
    To run this miner from the project root directory:

    python neurons/miners/zephyr/miner.py --wallet.name <wallet_name> --wallet.hotkey <wallet_hotkey> --subtensor.network <network> --netuid <netuid> --axon.port <port> --axon.external_port <port> --logging.debug True --neuron.model_id HuggingFaceH4/zephyr-7b-beta --neuron.system_prompt "Hello, I am a chatbot. I am here to help you with your questions." --neuron.max_tokens 64 --neuron.do_sample True --neuron.temperature 0.9 --neuron.top_k 50 --neuron.top_p 0.95 --wandb.on True --wandb.entity sn1 --wandb.project_name miners_experiments
    """

@@ -51,7 +49,7 @@ def add_args(cls, parser: argparse.ArgumentParser):

        parser.add_argument(
            "--neuron.model_id",
            type=str,
            default="HuggingFaceH4/zephyr-7b-beta",
        )

        parser.add_argument(

@@ -84,13 +82,10 @@ def __init__(self, config=None):
            device=self.device,
            mock=self.config.mock,
        )

        self.system_prompt = "You are a friendly chatbot who always responds concisely and helpfully. You are honest about things you don't know."

    async def forward(self, synapse: PromptingSynapse) -> PromptingSynapse:
        """
        Processes the incoming synapse by performing a predefined operation on the input data.
        This method should be replaced with actual logic relevant to the miner's purpose.

@@ -104,13 +99,14 @@ async def forward(

        The 'forward' function is a placeholder and should be overridden with logic that is appropriate for
        the miner's intended operation. This method demonstrates a basic transformation of input data.
        """

        try:
            t0 = time.time()
            bt.logging.debug(f"📧 Message received, forwarding synapse: {synapse}")

            prompt = synapse.messages[-1]
            bt.logging.debug(f"💬 Querying zephyr: {prompt}")

            response = HuggingFaceLLM(
                llm_pipeline=self.llm_pipeline,
                system_prompt=self.system_prompt,

@@ -120,33 +116,31 @@

                top_k=self.config.neuron.top_k,
                top_p=self.config.neuron.top_p,
            ).query(
                message=prompt,  # For now we just take the last message
                role="user",
                disregard_system_prompt=False,
            )

            synapse.completion = response
            synapse_latency = time.time() - t0

            if self.config.wandb.on:
                # TODO: Add system prompt to wandb config and not on every step
                self.log_event(
                    timing=synapse_latency,
                    prompt=prompt,
                    completion=response,
                    system_prompt=self.system_prompt,
                )

            bt.logging.debug(f"✅ Served Response: {response}")
            torch.cuda.empty_cache()

        except Exception as e:
            bt.logging.error(f"Error: {e}")
            synapse.completion = "Error: " + str(e)
        finally:
            return synapse


# This is the main function, which runs the miner.
22 changes: 13 additions & 9 deletions prompting/agent.py
@@ -4,6 +4,7 @@

from dataclasses import asdict
from prompting.tasks import Task
from prompting.llm import HuggingFaceLLM
from prompting.cleaners.cleaner import CleanerPipeline

from prompting.persona import Persona, create_persona

@@ -65,18 +66,23 @@ def __init__(
    def create_challenge(self) -> str:
        """Creates the opening question of the conversation, which is based on the task query but dressed in the persona of the user."""
        t0 = time.time()

        cleaner = None
        if hasattr(self.task, "cleaning_pipeline"):
            cleaner = CleanerPipeline(cleaning_pipeline=self.task.cleaning_pipeline)

        self.challenge = super().query(message="Ask a question related to your goal", cleaner=cleaner)
        self.challenge = self.task.format_challenge(self.challenge)
        self.challenge_time = time.time() - t0

        return self.challenge

    def __state_dict__(self, full=False):
        return {
            "challenge": self.challenge,
            "challenge_time": self.challenge_time,
            **self.task.__state_dict__(full=full),
            **asdict(self.persona),
            "system_prompt": self.system_prompt,

@@ -109,5 +115,3 @@ def update_progress(

                "↪ Agent did not finish its goal, continuing conversation..."
            )
            self.continue_conversation(miner_response=top_response)


Empty file added prompting/cleaners/__init__.py
75 changes: 75 additions & 0 deletions prompting/cleaners/all_cleaners.py
@@ -0,0 +1,75 @@
from abc import ABC, abstractmethod
import re

import bittensor as bt


class BaseCleaner(ABC):
    @abstractmethod
    def __init__(self, **kwargs):
        pass

    @abstractmethod
    def apply(self, generation: str) -> str:
        pass


class RemoveQuotes(BaseCleaner):
    def __init__(self, **kwargs) -> None:
        pass

    def apply(self, generation: str) -> str:
        bt.logging.debug("Removing quotes.")
        return generation.strip("\"'")


class PruneEnding(BaseCleaner):
    def __init__(self, **kwargs):
        pass

    def apply(self, generation: str) -> str:
        punctuation_chars = [".", "?", "!"]

        if not any(char in generation for char in punctuation_chars):
            return generation

        if (
            not generation.endswith(".")
            and not generation.endswith("?")
            and not generation.endswith("!")
        ):
            # Truncate at the last complete sentence, keeping the punctuation (+1).
            index = max(generation.rfind(char) for char in punctuation_chars)
            return generation[: index + 1]
        else:
            return generation


class RemoveRoles(BaseCleaner):
    def __init__(self, **kwargs):
        pass

    def capitalize_sentences(self, input_string):
        """Capitalize the first character after . ! ?"""
        sentences = re.split(r"(?<=[.!?])\s+", input_string)
        # Uppercase only the first character so casing elsewhere (e.g. acronyms)
        # is preserved; str.capitalize() would lowercase the rest of the sentence.
        capitalized_sentences = [
            sentence[0].upper() + sentence[1:] if sentence else sentence
            for sentence in sentences
        ]
        return " ".join(capitalized_sentences)

    def apply(self, generation: str) -> str:
        roles = [
            "User: ",
            "System: ",
            "Assistant: ",
            "Assistant, ",
            "Dear AI, ",
            "Dear AI ",
            "#Question: ",
        ]
        for role in roles:
            if role in generation:
                generation = generation.replace(role, "")

        # LLMs are good at being formal; keep sentences capitalized after
        # removing a role prefix.
        return self.capitalize_sentences(input_string=generation)
53 changes: 53 additions & 0 deletions prompting/cleaners/cleaner.py
@@ -0,0 +1,53 @@
from typing import List, Dict

import bittensor as bt

from prompting.cleaners.all_cleaners import RemoveQuotes, RemoveRoles, PruneEnding

SUPPORTED_CLEANERS = {
    "remove_quotes": RemoveQuotes,
    "remove_roles": RemoveRoles,
    "prune_ending": PruneEnding,
}


class CleanerPipeline:
    def __init__(self, cleaning_pipeline: List[Dict]) -> None:
        """CleanerPipeline is a pipeline that can be applied to any string to
        clean it of unwanted characters, punctuation, etc.

        cleaning_pipeline (List[Dict]): List of dicts that define the cleaning pipeline.
            Dicts MUST have the keyword "name" to be valid.
            Example: [{"name": "remove_quotes", "kwargs": {}}, {"name": "prune_ending", "kwargs": {}}]
        """
        self.cleaning_pipeline = cleaning_pipeline

    def apply(self, generation: str) -> str:
        """Apply the cleaning steps listed in cleaning_pipeline to generation.

        Args:
            generation (str): string generated from LLM or otherwise.
        Returns:
            str: Clean generated string.
        """
        try:
            for cleaner in self.cleaning_pipeline:
                if "name" not in cleaner or cleaner["name"] not in SUPPORTED_CLEANERS:
                    raise ValueError(
                        f"Cleaning pipeline step {cleaner} must have a name and be one of {list(SUPPORTED_CLEANERS)}."
                    )

                func = SUPPORTED_CLEANERS[cleaner["name"]]

                kwargs = cleaner.get("kwargs", {})
                func = func(**kwargs)  # instantiate the cleaner with its kwargs

                # Apply each filter configured for the specific task.
                generation = func.apply(generation=generation)

            return generation

        except Exception as e:
            bt.logging.error(f"Failed to apply cleaning pipeline step {cleaner}: {e}")
            return generation
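
For reference, wiring a pipeline together and applying it to a raw generation looks like this (a usage sketch based on the docstring example above; the sample string is ours):

```python
from prompting.cleaners.cleaner import CleanerPipeline

pipeline = CleanerPipeline(
    cleaning_pipeline=[
        {"name": "remove_quotes", "kwargs": {}},
        {"name": "prune_ending", "kwargs": {}},
    ]
)

raw = '"The answer is 42! And furthermore'
print(pipeline.apply(raw))  # -> The answer is 42!
```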
