With the help of open object detection and utilizing motion information (Kalman filters), multi-object tracking can be performed. Due to time constraints, currently only GroundingDINO, GLIP, and Detic combined with ByteTrack are supported for tracking.
tracking_demo.py
: for multi-object tracking inference on video or image folders.
This project referenced GroundingDINO. Thanks!
## Base Development Environment Setup
```shell
conda create -n mmtracking-sam python=3.8 -y
conda activate mmtracking-sam
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
git clone https://github.com/open-mmlab/playground.git
pip install -U openmim
mim install "mmcv>=2.0.0"
# build from source
cd playground
git clone -b tracking https://github.com/open-mmlab/mmdetection.git
cd mmdetection; pip install -e .; cd ..
cd playground
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install git+https://github.com/IDEA-Research/GroundingDINO.git
cd playground
pip install einops shapely timm yacs tensorboardX ftfy prettytable pymongo transformers nltk inflect scipy pycocotools opencv-python matplotlib
git clone https://github.com/microsoft/GLIP.git
cd GLIP; python setup.py build develop --user; cd ..
pip install git+https://github.com/facebookresearch/segment-anything.git
cd playground
wget https://download.openmmlab.com/playground/mmtracking/tracking_demo.zip
unzip tracking_demo.zip
mkdir ../models
wget -P ../models/ https://download.openmmlab.com/mmdetection/v3.0/detic/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k_20230120-0d301978.pth
wget -P ../models/ https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget -P ../models/ https://penzhanwu2bbs.blob.core.windows.net/data/GLIPv1_Open/models/glip_a_tiny_o365.pth
wget -P ../models/ https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
Using GroundingDINO as an example only.
cd mmtracking_open_detection
# input a video
python tracking_demo.py "../tracking_demo/mot_challenge_track.mp4" "configs/GroundingDINO_SwinB.cfg.py" "../models/groundingdino_swinb_cogcoor.pth" --text_prompt "person . rider . car . truck . bus . train . motorcycle . bicycle ." --out-dir "outputs/mot_challenge"
# input a images folder
python tracking_demo.py "../tracking_demo/bdd_val_track" "configs/GroundingDINO_SwinB.cfg.py" "../models/groundingdino_swinb_cogcoor.pth" --text_prompt "person . rider . car . truck . bus . train . motorcycle . bicycle ." --out-dir "outputs/bdd100k" --fps 30
cd mmtracking_open_detection
# input a video
python tracking_demo.py "../tracking_demo/bdd_val_track" "configs/GroundingDINO_SwinB.cfg.py" "../models/groundingdino_swinb_cogcoor.pth" --text_prompt "person . rider . car . truck . bus . train . motorcycle . bicycle ." --out-dir "outputs/bdd100k" --fps 30 --mots