This page provides basic tutorials about the usage of PyRetri. For installation instructions and dataset preparation, please see INSTALL.md.
After the gallery set and query set are separated, we package the information of each sub-dataset in pickle format for further processing. We use a different type for each folder structure: general, oxford, and reid.
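The packaged file is just a pickled Python object; its exact schema is a PyRetri implementation detail. As a rough illustration only (the record layout below is hypothetical, not the library's actual structure), writing and reading such a file looks like this:

```python
import os
import pickle
import tempfile

# Hypothetical structure for illustration: one record per image.
# PyRetri's real data json layout may differ.
records = [
    {"path": "class_A/img_001.jpg", "label": "class_A"},
    {"path": "class_B/img_042.jpg", "label": "class_B"},
]

path = os.path.join(tempfile.mkdtemp(), "toy_gallery.json")

# Despite the .json suffix used by the tooling, the payload is pickle.
with open(path, "wb") as f:
    pickle.dump(records, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(len(loaded))         # 2
print(loaded[0]["label"])  # class_A
```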
A general object recognition dataset collects images with the same label in one directory, and its folder structure should look like this:
# type: general
general_recognition
├── class A
│ ├── XXX.jpg
│ └── ···
├── class B
│ ├── XXX.jpg
│ └── ···
└── ···
Oxford5k is a typical dataset in the image retrieval field, and its folder structure is as follows:
# type: oxford
oxford
├── gt
│ ├── XXX.txt
│ └── ···
└── images
├── XXX.jpg
└── ···
Person re-identification datasets come with the query set and gallery set already split, and the folder structure should look like this:
# type: reid
person_re_identification
├── bounding_box_test
│ ├── XXX.jpg
│ └── ···
├── query
│ ├── XXX.jpg
│ └── ···
└── ···
After choosing the appropriate type, you can generate the data jsons by:
python3 main/make_data_json.py [-d ${dataset}] [-sp ${save_path}] [-t ${type}] [-gt ${ground_truth}]
Arguments:
- data: Path of the dataset for generating the data json file.
- save_path: Path for saving the output file.
- type: Type of the dataset. For datasets collecting images with the same label in one directory, use general; for the Oxford dataset, use oxford; for re-id datasets, use reid.
- ground_truth: Optional. Path of the ground-truth information, which is required when generating the data json file of the Oxford dataset.
Examples:
# for dataset collecting images with the same label in one directory
python3 main/make_data_json.py -d /data/caltech101/gallery/ -sp data_jsons/caltech_gallery.json -t general
python3 main/make_data_json.py -d /data/caltech101/query/ -sp data_jsons/caltech_query.json -t general
# for oxford dataset
python3 main/make_data_json.py -d /data/cbir/oxford/gallery/ -sp data_jsons/oxford_gallery.json -t oxford -gt /data/cbir/oxford/gt/
python3 main/make_data_json.py -d /data/cbir/oxford/query/ -sp data_jsons/oxford_query.json -t oxford -gt /data/cbir/oxford/gt/
# for re-id dataset
python3 main/make_data_json.py -d /data/market1501/bounding_box_test/ -sp data_jsons/market_gallery.json -t reid
python3 main/make_data_json.py -d /data/market1501/query/ -sp data_jsons/market_query.json -t reid
Note: the Oxford dataset stores the ground truth of each query image in txt files, so remember to pass the gt path when generating the Oxford data json files.
All outputs (features and labels) will be saved to the target directory in pickle format.
Extract features for each data json file by:
python3 main/extract_feature.py [-dj ${data_json}] [-sp ${save_path}] [-cfg ${config_file}] [-si ${save_interval}]
Arguments:
- data_json: Path of the data json file to be extracted.
- save_path: Path for saving the output features in pickle format.
- config_file: Path of the configuration file in yaml format.
- save_interval: Optional. Number of features saved in each part file; 5000 by default.
Examples:
# extract features of the gallery set and query set
python3 main/extract_feature.py -dj data_jsons/caltech_gallery.json -sp /data/features/caltech/gallery/ -cfg configs/caltech.yaml
python3 main/extract_feature.py -dj data_jsons/caltech_query.json -sp /data/features/caltech/query/ -cfg configs/caltech.yaml
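Because features are written out in parts of save_interval entries each, downstream code has to merge the part files before use. A minimal sketch of the idea (the part-file naming here is invented for illustration, not PyRetri's actual convention):

```python
import os
import pickle
import tempfile

save_interval = 3  # PyRetri's default is 5000; small here for illustration
features = [[float(i)] for i in range(8)]  # 8 toy feature vectors

out_dir = tempfile.mkdtemp()

# Write the features in parts of save_interval entries each.
for part, start in enumerate(range(0, len(features), save_interval)):
    with open(os.path.join(out_dir, f"part_{part}.pkl"), "wb") as f:
        pickle.dump(features[start:start + save_interval], f)

# Merge the parts back, in order.
merged = []
for name in sorted(os.listdir(out_dir)):
    with open(os.path.join(out_dir, name), "rb") as f:
        merged.extend(pickle.load(f))

print(len(merged))  # 8
```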
The path of query set features and gallery set features is specified in the config file.
After extracting gallery set features and query set features, you can index the query set features by:
python3 main/index.py [-cfg ${config_file}]
Arguments:
- config_file: Path of the configuration file in yaml format.
Examples:
python3 main/index.py -cfg configs/caltech.yaml
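The query and gallery feature paths are read from the yaml config. As a sketch only (the key names below are assumptions based on the feature paths used in the examples above; check your own config file):

```yaml
index:
  query_fea_dir: "/data/features/caltech/query"
  gallery_fea_dir: "/data/features/caltech/gallery"
```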
For visualizing results and analyzing bad cases, we provide a script for single-image queries, so you can visualize or save the retrieval results easily. Run single-image indexing by:
python3 main/single_index.py [-cfg ${config_file}]
Arguments:
- config_file: Path of the configuration file in yaml format.
Examples:
python3 main/single_index.py -cfg configs/caltech.yaml
Please see single_index.py for more details.
We categorize the retrieval process into four components:
- model: the pre-trained model for feature extraction.
- extract: assigns which layer(s) to output, including splitter functions and aggregation methods.
- index: indexes features, including dimension processing, feature enhancement, distance metrics, and re-ranking.
- evaluate: evaluates retrieval results, outputting recall and mAP.
Here we show how to add your own model to extract features.
- Create your model file pyretri/models/backbone/backbone_impl/reid_baseline.py.
import torch.nn as nn
from ..backbone_base import BackboneBase
from ...registry import BACKBONES

@BACKBONES.register
class ft_net(BackboneBase):
    def __init__(self):
        super(ft_net, self).__init__()
        # build the layers of your backbone here

    def forward(self, x):
        # return the feature map(s) for the extractor to aggregate
        pass
or
import torch.nn as nn
from ..backbone_base import BackboneBase
from ...registry import BACKBONES

class FT_NET(BackboneBase):
    def __init__(self):
        super(FT_NET, self).__init__()
        # build the layers of your backbone here

    def forward(self, x):
        # return the feature map(s) for the extractor to aggregate
        pass

@BACKBONES.register
def ft_net():
    model = FT_NET()
    return model
- Import the module in pyretri/models/backbone/__init__.py.
from .backbone_impl.reid_baseline import ft_net

__all__ = [
    'ft_net',
]
- Use it in your config file.
model:
  name: "ft_net"
  ft_net:
    load_checkpoint: "/data/my_model_zoo/res50_market1501.pth"
Since the tricks used in each stage have a significant impact on retrieval performance, we provide pipeline combinations search scripts to help users find promising combinations of approaches with various hyper-parameters.
cd search/
We decompose the search space into three sub search spaces: pre_process, extract and index, each of which corresponds to a specified file. Search space is defined by adding methods with hyper-parameters to a specified dict. You can add a search operator as follows:
pre_processes.add(
    "PadResize224",
    {
        "batch_size": 32,
        "folder": {
            "name": "Folder"
        },
        "collate_fn": {
            "name": "CollateFn"
        },
        "transformers": {
            "names": ["PadResize", "ToTensor", "Normalize"],
            "PadResize": {
                "size": 224,
                "padding_v": [124, 116, 104]
            },
            "Normalize": {
                "mean": [0.485, 0.456, 0.406],
                "std": [0.229, 0.224, 0.225]
            }
        }
    }
)
By doing this, a pre_process operator named "PadResize224" is added to the data_process sub search space and will be searched in the following process.
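Conceptually, each sub search space is a named dict of operators, and add registers one entry. A simplified stand-in (not PyRetri's actual search-space class; the second operator name is invented) behaves like:

```python
class SearchSpace:
    """Toy stand-in for a sub search space: a named dict of operators."""

    def __init__(self):
        self._ops = {}

    def add(self, name, hyper_params):
        # Each operator is a name plus its hyper-parameter dict;
        # the search later enumerates every registered operator.
        self._ops[name] = hyper_params

    def names(self):
        return list(self._ops)

pre_processes = SearchSpace()
pre_processes.add("PadResize224", {"batch_size": 32})
pre_processes.add("Shorter256Center224", {"batch_size": 32})  # hypothetical name

print(pre_processes.names())  # ['PadResize224', 'Shorter256Center224']
```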
Similar to the image retrieval pipeline, combinations search includes two stages: search for feature extraction and search for indexing.
Search for the feature extraction combinations by:
python3 search_extract.py [-sp ${save_path}] [-sm ${search_modules}]
Arguments:
- save_path: Path for saving the output features in pickle format.
- search_modules: Name of the folder containing the search space files.
Examples:
python3 search_extract.py -sp /data/features/gap_gmp_gem_crow_spoc/ -sm search_modules
Search for the indexing combinations by:
python3 search_index.py [-fd ${fea_dir}] [-sm ${search_modules}] [-sp ${save_path}]
Arguments:
- fea_dir: Path of the features extracted by the feature extraction combinations search.
- search_modules: Name of the folder containing the search space files.
- save_path: Path for saving the retrieval results of each combination.
Examples:
python3 search_index.py -fd /data/features/gap_gmp_gem_crow_spoc/ -sm search_modules -sp /data/features/gap_gmp_gem_crow_spoc_result.json
We provide two ways to inspect the search results. One saves all the search results to a csv file for further analysis; the other shows only the search results matching given keywords. You can define the keywords as follows:
keywords = {
    'data_name': ['market'],
    'pre_process_name': list(),
    'model_name': list(),
    'feature_map_name': list(),
    'aggregator_name': list(),
    'post_process_name': ['no_fea_process', 'l2_normalize', 'pca_whiten', 'pca_wo_whiten'],
}
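An empty list means "no constraint on this field". A sketch of how such keyword filtering can work (the result-record fields here are invented for illustration, not the actual result schema):

```python
def match(record, keywords):
    # A record matches if, for every field, either no keyword is given
    # (empty list) or the record's value is one of the allowed keywords.
    return all(
        not allowed or record.get(field) in allowed
        for field, allowed in keywords.items()
    )

results = [
    {"data_name": "market", "post_process_name": "l2_normalize"},
    {"data_name": "duke", "post_process_name": "l2_normalize"},
]

keywords = {
    "data_name": ["market"],
    "post_process_name": [],  # empty: accept any post-process
}

matched = [r for r in results if match(r, keywords)]
print(len(matched))  # 1
```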
Show the search results by:
python3 show_search_results.py [-r ${result_json_path}]
Arguments:
- result_json_path: Path of the result json file.
Examples:
python3 show_search_results.py -r /data/features/gap_gmp_gem_crow_spoc_result.json
See show_search_results.py for more details.