Though research has shown the complementarity of camera- and inertial-based data, datasets offering both modalities remain scarce. In this paper we introduce WEAR, a multimodal benchmark dataset for both vision- and wearable-based Human Activity Recognition (HAR). The dataset comprises data from 18 participants performing a total of 18 different workout activities, with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 10 different outdoor locations. WEAR features a diverse set of activities which are low in inter-class similarity and, unlike previous egocentric datasets, are not defined by human-object interactions, nor do they originate from inherently distinct activity categories. The provided benchmark results reveal that single-modality architectures have different strengths and weaknesses in their prediction performance. Further, in light of the recent success of transformer-based video action detection models, we demonstrate their versatility by applying them in plain fashion using vision, inertial and combined (vision + inertial) features as input. Results show that vision transformers are not only able to produce competitive results using only inertial data, but can also serve as an architecture to fuse both modalities by means of simple concatenation, with the multimodal approach producing the highest average mAP, precision and close-to-best F1-scores. Up until now, vision-based transformers have been explored neither in inertial nor in multimodal human activity recognition, making our approach the first to do so. An arXiv version of our paper can be found at this link.
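To make the fusion by simple concatenation mentioned above concrete, here is a minimal sketch of combining per-clip vision and inertial feature vectors into a single feature sequence. The feature dimensions and array names are illustrative assumptions, not the exact pipeline of this repository:

```python
import numpy as np

# Hypothetical per-clip features: T clips, each with a visual (e.g. I3D) and
# an inertial feature vector. Dimensions are illustrative, not the paper's.
T = 100                                   # number of temporal clips
vision_feats = np.random.randn(T, 2048)   # e.g. video features per clip
inertial_feats = np.random.randn(T, 128)  # e.g. inertial features per clip

# Simple fusion: concatenate along the feature dimension so every clip is
# described by a single (2048 + 128)-dimensional vector, which can then be
# fed to a temporal action detection model unchanged.
combined_feats = np.concatenate([vision_feats, inertial_feats], axis=1)
print(combined_feats.shape)  # (100, 2176)
```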
- TriDet - vision-based temporal action detection implementation
- TemporalMaxer - vision-based temporal action detection implementation
- Application of extracted inertial features instead of raw inertial features for multimodal HAR (see the sketch after this list)
- Surpassing the authors' benchmark results with state-of-the-art vision-based temporal action detection models
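As a rough illustration of the difference between raw and extracted inertial features, the sketch below summarises sliding windows of 3-axis acceleration data by simple per-axis statistics. Window size, stride and the chosen statistics are assumptions for demonstration purposes and do not necessarily match the feature extraction used in this repository:

```python
import numpy as np

def sliding_window_stats(acc, window=50, stride=25):
    """Turn raw 3-axis acceleration (N, 3) into per-window statistical features.

    Each window of `window` samples is summarised by its per-axis mean and
    standard deviation, yielding a (num_windows, 6) feature matrix.
    """
    feats = []
    for start in range(0, len(acc) - window + 1, stride):
        w = acc[start:start + window]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.stack(feats)

# Example with synthetic accelerometer data (1000 samples, 3 axes).
raw_acc = np.random.randn(1000, 3)
features = sliding_window_stats(raw_acc)
print(features.shape)  # (39, 6)
```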
- 18/04/2023: provided code to reproduce experiments.
- 12/04/2023: initial commit and arXiv uploaded.
Please follow the instructions in the INSTALL.md file.
The full dataset can be downloaded via this link.
The download folder is divided into three subdirectories:
- annotations (> 1MB): JSON files containing per-subject annotations in THUMOS14 style (see the loading sketch after this list)
- processed (15GB): precomputed I3D, inertial and combined per-subject features
- raw (130GB): raw, per-subject video and inertial data
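As an orientation for working with the download, here is a minimal sketch of loading a per-subject annotation file and a precomputed feature file. The file names, extensions and the exact JSON layout are assumptions based on the THUMOS14-style format, so paths and keys may differ from the actual release:

```python
import json
import numpy as np

# Hypothetical paths; adjust to the actual folder layout of the download.
with open("annotations/sbj_0.json") as f:
    annos = json.load(f)

# THUMOS14-style files typically store a database of videos with a list of
# labelled segments; inspect the keys of your file to confirm the layout.
print(list(annos.keys()))

# Precomputed features are commonly stored as one array per subject,
# shaped (num_clips, feature_dim); the .npy extension is an assumption.
feats = np.load("processed/combined_features/sbj_0.npy")
print(feats.shape)
```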
Once the requirements are installed, experiments can be reproduced by running the main.py script:
python main.py --config ./configs/60_frames_30_stride/actionformer_combined.yaml --seed 1 --eval_type split
Each config file represents one type of experiment. Each experiment was run three times using three different random seeds (i.e. 1, 2 and 3).
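For convenience, reproducing the three-seed setup for a single config could be scripted as in the following sketch. The loop simply shells out to main.py; it is not part of the repository, and the config path is only the example from above:

```python
import subprocess

# Example config from above; any other config file can be substituted.
CONFIG = "./configs/60_frames_30_stride/actionformer_combined.yaml"

# Run the same experiment once per random seed used in the experiments.
for seed in (1, 2, 3):
    subprocess.run(
        ["python", "main.py",
         "--config", CONFIG,
         "--seed", str(seed),
         "--eval_type", "split"],
        check=True,  # raise if a run fails
    )
```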
In order to log experiments to Neptune.ai, please provide your project and api_token information in your local deployment (see lines 33-34 in main.py).
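For orientation, a typical Neptune initialisation looks roughly like the sketch below. Whether main.py uses exactly this call and these variable names is an assumption; only the project string and API token are the values you need to supply:

```python
import neptune

# Assumption: main.py creates a Neptune run roughly like this (lines 33-34
# hold the project and api_token values). Replace both with your own.
run = neptune.init_run(
    project="my-workspace/wear-experiments",  # format: "<workspace>/<project>"
    api_token="YOUR_NEPTUNE_API_TOKEN",
)

# Metrics and parameters can then be logged via dictionary-style access.
run["parameters/seed"] = 1
run.stop()
```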
Marius Bock ([email protected])
@article{bock2023wear,
  title={WEAR: A Multimodal Dataset for Wearable and Egocentric Video Activity Recognition},
  author={Bock, Marius and Moeller, Michael and Van Laerhoven, Kristof and Kuehne, Hilde},
  journal={CoRR},
  volume={abs/2304.05088},
  year={2023},
  url={https://arxiv.org/abs/2304.05088}
}