extract the feature bank of the data set #3
I'm sorry, it seems to be my fault: I forgot to set CHECKPOINT_TYPE: caffe2 when the model was SLOWFAST_8x8_R50_KINETICS.pkl. Now I have a new question. Why do lines 147-148 of extract_feature.py build the loaders as train_loader = loader.construct_loader(cfg, ...) and test_loader = loader.construct_loader(cfg, "test")? Should "test" be "val"?
For the following problem: if you set AVA.FEATURE_EXTRACT: True in the config file during feature extraction for LFB, then the train, val and test splits all behave the same as "test" mode, where the scale function resizes the frames and skips the crop operation, since we want to extract features from all boxes in a frame.
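The split behavior described above can be sketched as a small decision function. This is illustrative only, not the repo's actual API: the function name and the op labels are stand-ins, but the logic matches the comment (with AVA.FEATURE_EXTRACT on, every split falls through to the test-style resize-only path, so no box is cropped away).

```python
# Illustrative sketch of the split behavior described in the comment
# above; names are hypothetical, not the repo's real API.

def spatial_ops(split, feature_extract):
    """Return the list of spatial ops applied to a clip for a given split."""
    if feature_extract or split == "test":
        # resize the short side, keep aspect ratio, no cropping at all
        return ["scale"]
    if split == "train":
        # normal training-time augmentation
        return ["random_scale", "random_crop", "random_flip"]
    # val: deterministic resize + center crop
    return ["scale", "center_crop"]

print(spatial_ops("train", feature_extract=True))   # -> ['scale']
print(spatial_ops("val", feature_extract=False))    # -> ['scale', 'center_crop']
```

With feature_extract=True the "train" split gets exactly the same ops as "test", which is why the loader mode string matters less than the flag during extraction.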
I see what you mean. When extracting features, you need to set the mode to test and keep the aspect ratio of the original image, so that features can be extracted for as many boxes as possible. Hmm, this is my own problem, and I didn't express my confusion clearly. What I also want to know is about lines 152 to 155 of tools/extract_feature.py: feature_bank = { 'train': dict(), 'test': dict() }. The extracted features come from the train dataset and the val dataset, so should 'test' in 'test': dict() be 'val'?
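For concreteness, here is a toy sketch of the bank layout being asked about. The key/value shapes are hypothetical (the real bank stores per-box feature tensors); the point is that the 'test' sub-dict is simply keyed by the loader mode string, even though the "test" loader actually reads the val split.

```python
# Toy sketch of the feature-bank structure from tools/extract_feature.py.
# The sub-dict keys mirror the loader mode strings ("train"/"test"); the
# entry keys and values below are invented for illustration.

feature_bank = {"train": dict(), "test": dict()}

# hypothetical entries: (video_id, timestamp) -> list of per-box features
feature_bank["train"][("vid_0001", 902)] = [[0.1] * 4, [0.2] * 4]  # 2 boxes
feature_bank["test"][("vid_0042", 1173)] = [[0.3] * 4]             # 1 box

print(sorted(feature_bank))  # -> ['test', 'train']
```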
Hmm, it seems that your source videos do not share the same frame resolution? Currently the processing flow applies a resize operation while keeping the aspect ratio, so it does not guarantee that the resized frames are of the same resolution (see here); please manually check the frame resolutions.
As you said, my video resolutions are indeed different, because there are many different video resolutions in the AVA dataset. If you follow the data processing method of https://github.com/TencentYoutuResearch/ActionDetection-LSTC/blob/main/dataset.md, which is the same as https://github.com/facebookresearch/video-long-term-feature-banks/blob/main/dataset.md, the extracted frames keep the same resolution as the original video. That's why I didn't make all frames the same resolution, but I'll follow your suggestion and try it.
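A quick way to confirm the mixed-resolution diagnosis is to group frame files by their image size before running extraction. The helper below is stdlib-only and takes the size lookup as a parameter; in practice you would pass something like Pillow's `lambda p: Image.open(p).size`. The paths and sizes in the demo are made up.

```python
# Diagnostic sketch: group frames by resolution so mixed sizes -- the
# cause of the "stack expects each tensor to be equal size" error -- are
# easy to spot. get_size is a stand-in for a real image-size reader.
from collections import defaultdict

def group_by_resolution(frame_paths, get_size):
    groups = defaultdict(list)
    for path in frame_paths:
        groups[get_size(path)].append(path)
    return dict(groups)

# toy stand-in sizes for illustration
sizes = {
    "a/0001.jpg": (302, 224),
    "a/0002.jpg": (302, 224),
    "b/0001.jpg": (328, 224),
}
groups = group_by_resolution(sizes, sizes.get)
print({res: len(ps) for res, ps in groups.items()})
# -> {(302, 224): 2, (328, 224): 1}
```

If this report shows more than one resolution per batch, either re-extract the frames at a fixed size or resize them to a common resolution before feature extraction.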
3. Some other tips for feature extraction. As far as I'm concerned, the config files and repository are not fully prepared for feature extraction. To do this, you have to do the following steps:
Thank you very much for your advice; it is very helpful to me. I want to train and converge quickly by using the pre-trained model you provided. It will be even better when you can access the server, check the source code, and upload the configuration file for extracting features. I look forward to your update.
------------------ Original Message ------------------
From: "TencentYoutuResearch/ActionDetection-LSTC" ***@***.***>
Sent: Friday, September 2, 2022, 11:37 PM
Subject: Re: [TencentYoutuResearch/ActionDetection-LSTC] extract the feature bank of the data set (Issue #3)
Why do lines 147-148 of extract_feature.py build the loaders as train_loader = loader.construct_loader(cfg, ...) and test_loader = loader.construct_loader(cfg, "test")? Should "test" be "val"?
For the following problem: if you set AVA.FEATURE_EXTRACT: True in the config file during feature extraction for LFB, then the train, val and test splits all behave the same as "test" mode, where the scale function resizes the frames and skips the crop operation, since we want to extract features from all boxes in a frame.
RuntimeError: stack expects each tensor to be equal size, but got [3, 8, 224, 302] at entry 0 and [3, 8, 224, 328] at entry 4
Hmm, it seems that your source videos do not share the same frame resolution? Currently the processing flow applies a resize operation while keeping the aspect ratio, so it does not guarantee that the resized frames are of the same resolution (see here); please manually check the frame resolutions.
Some other tips for feature extraction. As far as I'm concerned, the config files and repository are not fully prepared for feature extraction. To do this, you have to do the following steps:
A. Train a model on your custom dataset without the feature bank first. At this stage, you should check that:
AVA.FEATURE_BANK_DIM equals the output dim of the basic SlowFast
AVA.FEATURE_EXTRACT is False
B. Load the model trained in step A and run tools/extract_feature.py. You should check some parameters in the config files at this stage:
AVA.FEATURE_EXTRACT is True
I'm sorry that due to a temporary migration, I currently have no access to the server and the original code. If necessary, I will check the feature extraction process again and upload a config file and documentation about feature extraction when I regain access to the original code.
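The two-stage checklist above can be phrased as a couple of assertions over the config. This is a hedged sketch: the config object here is a plain stand-in for the repo's yacs CfgNode, and MODEL.OUTPUT_DIM is an invented attribute standing in for the basic SlowFast output dimension.

```python
# Sketch of the stage-A / stage-B config checklist; cfg is a stand-in
# for the repo's yacs CfgNode, and MODEL.OUTPUT_DIM is hypothetical.
from types import SimpleNamespace

def check_stage(cfg, stage):
    if stage == "A":  # train the base model, no feature bank yet
        assert cfg.AVA.FEATURE_EXTRACT is False
        assert cfg.AVA.FEATURE_BANK_DIM == cfg.MODEL.OUTPUT_DIM, \
            "bank dim must match the basic SlowFast output dim"
    elif stage == "B":  # run tools/extract_feature.py with stage-A weights
        assert cfg.AVA.FEATURE_EXTRACT is True

cfg = SimpleNamespace(
    AVA=SimpleNamespace(FEATURE_EXTRACT=False, FEATURE_BANK_DIM=2304),
    MODEL=SimpleNamespace(OUTPUT_DIM=2304),
)
check_stage(cfg, "A")              # stage-A config passes
cfg.AVA.FEATURE_EXTRACT = True     # flip the flag for extraction
check_stage(cfg, "B")              # stage-B config passes
```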
Hello, when I tried to extract the feature bank of the dataset, I encountered a new problem. My yaml configuration file and the error I encountered are shown below. Is the pre-training weight wrong? I hope you can give me an answer.
SLOWFAST_32x2_BANK1.yaml:
TRAIN:
ENABLE: False
DATASET: ava
BATCH_SIZE: 64
EVAL_PERIOD: 20
CHECKPOINT_PERIOD: 1
AUTO_RESUME: True
CHECKPOINT_FILE_PATH: "/home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/lstc-resnet50.pyth"
DATA:
NUM_FRAMES: 32
SAMPLING_RATE: 2
TRAIN_JITTER_SCALES: [256, 320]
TRAIN_CROP_SIZE: 224
TEST_CROP_SIZE: 224
INPUT_CHANNEL_NUM: [3, 3]
DETECTION:
ENABLE: True
ALIGNED: True
AVA:
FEATURE_EXTRACTION: True
FRAME_DIR: '/home/tuxiangone/bang/Behavior_Model/SlowFast/data/ava/frames'
FRAME_LIST_DIR: '/home/tuxiangone/bang/Behavior_Model/SlowFast/data/ava/frame_lists'
ANNOTATION_DIR: '/home/tuxiangone/bang/Behavior_Model/SlowFast/data/ava/annotations'
DETECTION_SCORE_THRESH: 0.8
TRAIN_PREDICT_BOX_LISTS: [
"ava_train_v2.2.csv",
"ava_train_predicted_boxes.csv",
]
TEST_PREDICT_BOX_LISTS: ["ava_val_predicted_boxes.csv"]
TEST_GT_BOX_LISTS: ["ava_val_v2.2.csv"]
FEATURE_BANK_PATH: "output/feature_bank"
SLIDING_WINDOW_SIZE: 15
GATHER_BANK: False
SLOWFAST:
ALPHA: 4
BETA_INV: 8
FUSION_CONV_CHANNEL_RATIO: 2
FUSION_KERNEL_SZ: 7
RESNET:
ZERO_INIT_FINAL_BN: True
WIDTH_PER_GROUP: 64
NUM_GROUPS: 1
DEPTH: 50
TRANS_FUNC: bottleneck_transform
STRIDE_1X1: False
NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [2, 2]]
SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [1, 1]]
NONLOCAL:
LOCATION: [[[], []], [[], []], [[], []], [[], []]]
GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
INSTANTIATION: dot_product
POOL: [[[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]]]
BN:
USE_PRECISE_STATS: False
FREEZE: False
NUM_BATCHES_PRECISE: 200
SOLVER:
BASE_LR: 0.1
LR_POLICY: steps_with_relative_lrs
STEPS: [0, 10, 15, 20]
LRS: [1, 0.1, 0.01, 0.001]
MAX_EPOCH: 20
MOMENTUM: 0.9
WEIGHT_DECAY: 1e-7
WARMUP_EPOCHS: 5.0
WARMUP_START_LR: 0.000125
OPTIMIZING_METHOD: sgd
MODEL:
NUM_CLASSES: 80
ARCH: slowfast
MODEL_NAME: BankContext
LOSS_FUNC: bce
DROPOUT_RATE: 0.5
HEAD_ACT: sigmoid
TEST:
ENABLE: True
DATASET: ava
BATCH_SIZE: 8
CHECKPOINT_FILE_PATH: "/home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/SLOWFAST_8x8_R50_KINETICS.pkl"
DATA_LOADER:
NUM_WORKERS: 2
PIN_MEMORY: True
NUM_GPUS: 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: "output/raw_bank"
CACHE:
ENABLE: True
LOG_MODEL_INFO: False
My error:
[INFO: checkpoint.py: 401]: loading checkpoint from /home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/SLOWFAST_8x8_R50_KINETICS.pkl
[09/01 16:48:34][INFO] slowfast.utils.checkpoint: 401: loading checkpoint from /home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/SLOWFAST_8x8_R50_KINETICS.pkl
Traceback (most recent call last):
File "tools/extract_feature.py", line 175, in
launch_job(
File "/home/tuxiangone/bang/Behavior_Model/LSTC/build/lib/slowfast/utils/misc.py", line 307, in launch_job
func(cfg=cfg)
File "tools/extract_feature.py", line 144, in extract_feature
cu.load_test_checkpoint(cfg, model)
File "/home/tuxiangone/bang/Behavior_Model/LSTC/build/lib/slowfast/utils/checkpoint.py", line 405, in load_test_checkpoint
load_checkpoint(
File "/home/tuxiangone/bang/Behavior_Model/LSTC/build/lib/slowfast/utils/checkpoint.py", line 268, in load_checkpoint
checkpoint = torch.load(f, map_location="cpu")
File "/home/tuxiangone/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/tuxiangone/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte
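This UnicodeDecodeError is consistent with the CHECKPOINT_TYPE fix mentioned earlier in the thread: SLOWFAST_8x8_R50_KINETICS.pkl is a Caffe2/Detectron-style pickle rather than a torch.save archive, so torch.load's default reader chokes on non-UTF-8 bytes. Such files are typically opened with plain pickle and encoding="latin1", which is the kind of path the caffe2 checkpoint type switches the loader to. A minimal sketch (the toy dict and file name are illustrative, not the real weight layout):

```python
# Sketch of how a Caffe2-style .pkl checkpoint is typically read:
# plain pickle with latin1 encoding, not torch.load. The toy payload
# below stands in for the real weight blobs.
import os
import pickle
import tempfile

def load_caffe2_checkpoint(path):
    with open(path, "rb") as f:
        # latin1 lets Python 3 unpickle byte strings written by Python 2
        return pickle.load(f, encoding="latin1")

# round-trip demo with a toy dict standing in for the real blobs
path = os.path.join(tempfile.gettempdir(), "toy_caffe2.pkl")
with open(path, "wb") as f:
    pickle.dump({"blobs": {"conv1_w": [0.0, 1.0]}}, f, protocol=2)

ckpt = load_caffe2_checkpoint(path)
print(sorted(ckpt["blobs"]))  # -> ['conv1_w']
```

In short, pointing TEST.CHECKPOINT_FILE_PATH at a .pkl while the loader still uses torch.load will fail exactly like the traceback above; the checkpoint type in the config has to match the file format.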