The netuid
for openkaito
is 5
on mainnet, and 88
on testnet.
Generally, for both validator and miner, you need to prepare your wallet and make your key registered in the subnet. You can find related guidance in the running on mainnet section.
If you are new to bittensor subnet, you are recommended to start by following
In the root folder of this repository, run the following command to install openkaito:
pip install -e .
To use the provided crawler, you need to obtain an API key from Apify. After obtaining the API key, you can write it down in the .env
file.
APIFY_API_KEY='apify_api_xxxxxx'
The key is for both the miner and the validator. For miner, it is used to search and crawl the data needed. For validator, it is used to get the ground truth data for evaluation.
NOTE: You can also create your own crawler to crawl the data you need. But you need to be aware that the data you crawl should be able to pass the integrity check of the validator.
Then you can prepare your dotenv file by:
cp .env.example .env
To be a miner of openkaito, you need to have an Elasticsearch instance running. You can install Elasticsearch by following the instructions here. That includes intalling docker and run the Elasticsearch docker image.
You can refer to the docker official guide to install docker.
In short, you can run the following command to install docker:
Ubuntu
# Uninstall old versions
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
# Install Docker
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Verify that the Docker Engine installation is successful by running the hello-world image
sudo docker run hello-world
Then you have successfully installed and started Docker Engine.
CentOS
# Uninstall old versions
sudo yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
# Install the yum-utils package (which provides the yum-config-manager utility) and set up the repository
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker
sudo yum install docker
# Start Docker
sudo systemctl start docker
# Verify that the Docker Engine installation is successful by running the hello-world image
sudo docker run hello-world
Then you have successfully installed and started Docker Engine.
If you have docker installed, you can run the following command to start an Elasticsearch instance:
sudo docker network create elastic
sudo docker pull docker.elastic.co/elasticsearch/elasticsearch:8.12.1
sudo docker run --name elasticsearch --net elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -t docker.elastic.co/elasticsearch/elasticsearch:8.12.1
After running the above command, you will be prompted the password for the built-in user elastic
. You can write it down in the .env
file.
ELASTICSEARCH_HOST="https://localhost:9200"
ELASTICSEARCH_USERNAME="elastic"
ELASTICSEARCH_PASSWORD="your_password"
If you forget the password, you can reset it by running the following command:
sudo docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
If the Elasticsearch instance exited unexpectedly, you can start it by running the following command:
sudo docker start elasticsearch
You may refer to and run scripts/vector_index_eth_denver_dataset.py
to index the ETH Denver 2024 dataset for semantic search.
This script extracts the Eth Denver dataset (open-sourced by Kaito AI), indexes the documents in Elasticsearch, and indexes the embeddings of the documents in Elasticsearch. It also provides a test query to retrieve the top-k similar documents to the query.
This script is intentionally kept transparent and hackable, and miners may do their own customizations.
python scripts/vector_index_eth_denver_dataset.py
After setting up the Elasticsearch and obtaining the API key, you can start the miner by running the following command:
python neurons/miner.py --netuid 5 --subtensor.network finney --wallet.name miner --wallet.hotkey default --logging.debug --blacklist.force_validator_permit --axon.port 8091
The detailed commandline arguments for the neurons/miner.py
can be obtained by python neurons/miner.py --help
, and are as follows:
usage: miner.py [-h] [--no_prompt] [--wallet.name WALLET.NAME] [--wallet.hotkey WALLET.HOTKEY] [--wallet.path WALLET.PATH]
[--subtensor.network SUBTENSOR.NETWORK] [--subtensor.chain_endpoint SUBTENSOR.CHAIN_ENDPOINT] [--subtensor._mock SUBTENSOR._MOCK]
[--logging.debug] [--logging.trace] [--logging.record_log] [--logging.logging_dir LOGGING.LOGGING_DIR] [--axon.port AXON.PORT]
[--axon.ip AXON.IP] [--axon.external_port AXON.EXTERNAL_PORT] [--axon.external_ip AXON.EXTERNAL_IP]
[--axon.max_workers AXON.MAX_WORKERS] [--netuid NETUID] [--neuron.name NEURON.NAME] [--neuron.device NEURON.DEVICE]
[--neuron.epoch_length NEURON.EPOCH_LENGTH] [--neuron.events_retention_size NEURON.EVENTS_RETENTION_SIZE] [--neuron.dont_save_events]
[--neuron.disable_crawling] [--neuron.crawl_size NEURON.CRAWL_SIZE] [--neuron.search_recall_size NEURON.SEARCH_RECALL_SIZE]
[--blacklist.force_validator_permit] [--blacklist.allow_non_registered] [--config CONFIG] [--strict] [--no_version_checking]
options:
-h, --help show this help message and exit
--no_prompt Set true to avoid prompting the user.
--wallet.name WALLET.NAME
The name of the wallet to unlock for running bittensor (name mock is reserved for mocking this wallet)
--wallet.hotkey WALLET.HOTKEY
The name of the wallet's hotkey.
--wallet.path WALLET.PATH
The path to your bittensor wallets
--subtensor.network SUBTENSOR.NETWORK
The subtensor network flag. The likely choices are: -- finney (main network) -- test (test network) -- archive (archive
network +300 blocks) -- local (local running network) If this option is set it overloads subtensor.chain_endpoint with an
entry point node from that network.
--subtensor.chain_endpoint SUBTENSOR.CHAIN_ENDPOINT
The subtensor endpoint flag. If set, overrides the --network flag.
--subtensor._mock SUBTENSOR._MOCK
If true, uses a mocked connection to the chain.
--logging.debug Turn on bittensor debugging information
--logging.trace Turn on bittensor trace level information
--logging.record_log Turns on logging to file.
--logging.logging_dir LOGGING.LOGGING_DIR
Logging default root directory.
--axon.port AXON.PORT
The local port this axon endpoint is bound to. i.e. 8091
--axon.ip AXON.IP The local ip this axon binds to. ie. [::]
--axon.external_port AXON.EXTERNAL_PORT
The public port this axon broadcasts to the network. i.e. 8091
--axon.external_ip AXON.EXTERNAL_IP
The external ip this axon broadcasts to the network to. ie. [::]
--axon.max_workers AXON.MAX_WORKERS
The maximum number connection handler threads working simultaneously on this endpoint. The grpc server distributes new worker
threads to service requests up to this number.
--netuid NETUID Subnet netuid
--neuron.name NEURON.NAME
Trials for this neuron go in neuron.root / (wallet_cold - wallet_hot) / neuron.name.
--neuron.device NEURON.DEVICE
Device to run on.
--neuron.epoch_length NEURON.EPOCH_LENGTH
The default epoch length (how often we set weights, measured in 12 second blocks).
--neuron.events_retention_size NEURON.EVENTS_RETENTION_SIZE
Events retention size.
--neuron.dont_save_events
If set, we dont save events to a log file.
--neuron.disable_crawling
If set, we disable crawling when receiving a search request.
--neuron.crawl_size NEURON.CRAWL_SIZE
The number of documents to crawl when receiving each query.
--neuron.search_recall_size NEURON.SEARCH_RECALL_SIZE
The number of search results to retrieve for ranking.
--blacklist.force_validator_permit
If set, we will force incoming requests to have a permit.
--blacklist.allow_non_registered
If set, miners will accept queries from non registered entities. (Dangerous!)
--config CONFIG If set, defaults are overridden by passed file.
--strict If flagged, config will check that only exact arguments have been set.
--no_version_checking
Set ``true`` to stop cli version checking.
We provide scripts/search_evaluation.py
to quick preview your search engine performance. You can run the following command to evaluate the search engine's performance:
python scripts/search_evaluation.py --query 'BTC' --size 5
Note the score may not always be the save even if everything is kept unchanged. But you can always improve your search engine performance systematically to gain a statistically significant improvement.
To obtain better miner performance, you can consider the following options:
- crawl and index more data, the more data you have, the better the search engine performance you can achieve
- adjust the apify crawler running parameters, e.g., increase the size of on search crawling
- implement a continuous crawler, instead of or in addition to the on search crawling, to crawl data and ingest into the Elasticsearch instance
- build better ranking model
- customize the provied ranking model, e.g., tune the parameters for
length_weight
andage_weight
- implement a better ranking model, e.g., integrating LLM
- customize the provied ranking model, e.g., tune the parameters for
- improve the recall stage
- tune the parameters for the search query, e.g., adjust the
size
parameter viaMINER_SEARCH_RECALL_SIZE
in.env
- build better index for the data you crawled and ingested into Elasticsearch, e.g., Knowledge Graph
- tune the parameters for the search query, e.g., adjust the
- any other advanced improvements you can think of
You may need to setup your wallet and hotkey according to running on mainnet.
To use the LLM ranking result evaluation, you need to obtain an API key from OpenAI. After obtaining the API key, you can write it down in the .env
file.
OPENAI_API_KEY="sk-xxxxxx"
Don't forget to also obtain and set the
APIFY_API_KEY
as mentioned in the above General section.
Log in to Weights & Biases and generate a key in your account settings.
Set the key WANDB_API_KEY
in the .env
file.
WANDB_API_KEY="your_wandb_api_key"
To enable validator auto update with github repo, you can install pm2
and jq
.
Ubuntu
# Install pm2 and jq
sudo apt-get install jq npm
sudo npm install -g pm2
CentOS
# Install jq
sudo yum install jq
# Install npm and pm2
sudo yum install nodejs20
sudo npm install -g pm2
You can start the validator by running the following command, enabling validator auto update with github repo:
# Run the run.sh to enable auto update
pm2 start run.sh --name openkaito_validator_autoupdate -- --netuid 5 --subtensor.network finney --wallet.name <your_wallet_name> --wallet.hotkey <your_hotkey> --logging.debug
You may pass in more command line arguments to the validator
command, e.g., --axon.port <your_axon_port>
etc., if needed.
This will run two PM2 process: one for the neurons/validator.py
which is called openkaito_validator_main_process
, and one for the run.sh
script which is called openkaito_validator_autoupdate
. The script will check for updates every 30 minutes, if there is an update then it will pull it, install it, restart openkaito_validator_main_process
and then restart itself.
The detailed commandline arguments for the validator
can be obtained by python neurons/validator.py --help
, and are as follows:
usage: validator.py [-h] [--no_prompt] [--wallet.name WALLET.NAME] [--wallet.hotkey WALLET.HOTKEY] [--wallet.path WALLET.PATH]
[--subtensor.network SUBTENSOR.NETWORK] [--subtensor.chain_endpoint SUBTENSOR.CHAIN_ENDPOINT] [--subtensor._mock SUBTENSOR._MOCK]
[--logging.debug] [--logging.trace] [--logging.record_log] [--logging.logging_dir LOGGING.LOGGING_DIR] [--axon.port AXON.PORT]
[--axon.ip AXON.IP] [--axon.external_port AXON.EXTERNAL_PORT] [--axon.external_ip AXON.EXTERNAL_IP]
[--axon.max_workers AXON.MAX_WORKERS] [--netuid NETUID] [--neuron.name NEURON.NAME] [--neuron.device NEURON.DEVICE]
[--neuron.epoch_length NEURON.EPOCH_LENGTH] [--neuron.events_retention_size NEURON.EVENTS_RETENTION_SIZE]
[--neuron.dont_save_events] [--neuron.num_concurrent_forwards NEURON.NUM_CONCURRENT_FORWARDS]
[--neuron.sample_size NEURON.SAMPLE_SIZE] [--neuron.search_request_interval NEURON.SEARCH_REQUEST_INTERVAL]
[--neuron.search_result_size NEURON.SEARCH_RESULT_SIZE] [--neuron.disable_set_weights]
[--neuron.moving_average_alpha NEURON.MOVING_AVERAGE_ALPHA] [--neuron.axon_off]
[--neuron.vpermit_tao_limit NEURON.VPERMIT_TAO_LIMIT] [--config CONFIG] [--strict] [--no_version_checking]
options:
-h, --help show this help message and exit
--no_prompt Set true to avoid prompting the user.
--wallet.name WALLET.NAME
The name of the wallet to unlock for running bittensor (name mock is reserved for mocking this wallet)
--wallet.hotkey WALLET.HOTKEY
The name of the wallet's hotkey.
--wallet.path WALLET.PATH
The path to your bittensor wallets
--subtensor.network SUBTENSOR.NETWORK
The subtensor network flag. The likely choices are: -- finney (main network) -- test (test network) -- archive (archive
network +300 blocks) -- local (local running network) If this option is set it overloads subtensor.chain_endpoint with an
entry point node from that network.
--subtensor.chain_endpoint SUBTENSOR.CHAIN_ENDPOINT
The subtensor endpoint flag. If set, overrides the --network flag.
--subtensor._mock SUBTENSOR._MOCK
If true, uses a mocked connection to the chain.
--logging.debug Turn on bittensor debugging information
--logging.trace Turn on bittensor trace level information
--logging.record_log Turns on logging to file.
--logging.logging_dir LOGGING.LOGGING_DIR
Logging default root directory.
--axon.port AXON.PORT
The local port this axon endpoint is bound to. i.e. 8091
--axon.ip AXON.IP The local ip this axon binds to. ie. [::]
--axon.external_port AXON.EXTERNAL_PORT
The public port this axon broadcasts to the network. i.e. 8091
--axon.external_ip AXON.EXTERNAL_IP
The external ip this axon broadcasts to the network to. ie. [::]
--axon.max_workers AXON.MAX_WORKERS
The maximum number connection handler threads working simultaneously on this endpoint. The grpc server distributes new worker
threads to service requests up to this number.
--netuid NETUID Subnet netuid
--neuron.name NEURON.NAME
Trials for this neuron go in neuron.root / (wallet_cold - wallet_hot) / neuron.name.
--neuron.device NEURON.DEVICE
Device to run on.
--neuron.epoch_length NEURON.EPOCH_LENGTH
The default epoch length (how often we set weights, measured in 12 second blocks).
--neuron.events_retention_size NEURON.EVENTS_RETENTION_SIZE
Events retention size.
--neuron.dont_save_events
If set, we dont save events to a log file.
--neuron.num_concurrent_forwards NEURON.NUM_CONCURRENT_FORWARDS
The number of concurrent forwards running at any time.
--neuron.sample_size NEURON.SAMPLE_SIZE
The number of miners to query in a single step.
--neuron.search_request_interval NEURON.SEARCH_REQUEST_INTERVAL
The interval seconds between search requests.
--neuron.search_result_size NEURON.SEARCH_RESULT_SIZE
The number of search results required for each miner to return.
--neuron.disable_set_weights
Disables setting weights.
--neuron.moving_average_alpha NEURON.MOVING_AVERAGE_ALPHA
Moving average alpha parameter, how much to add of the new observation.
--neuron.axon_off, --axon_off
Set this flag to not attempt to serve an Axon.
--neuron.vpermit_tao_limit NEURON.VPERMIT_TAO_LIMIT
The maximum number of TAO allowed to query a validator with a vpermit.
--config CONFIG If set, defaults are overridden by passed file.
--strict If flagged, config will check that only exact arguments have been set.
--no_version_checking
Set ``true`` to stop cli version checking.
To monitor your validator process, use the following pm2 commands to monitor the status and logs of your process:
pm2 status
pm2 logs <id>