3D Skeleton Datasets for Online Action Detection

Article by William Mocaër

New version of the post (with video examples) here : https://www-shadoc.irisa.fr/oad-datasets/

Online action detection is a challenging task in computer vision that involves recognizing and localizing human actions in real-time video streams. To facilitate research in this area, several datasets have been developed, each offering unique characteristics and challenges. In this article, we present six datasets widely used for skeleton-based online action detection: G3D, OAD, MSRC-12, MAD, Chalearn, and PKU-MMD. Download links are provided for each dataset.

Warning: Only the skeleton modality can be downloaded here. For the other modalities, please refer to the original websites mentioned in the corresponding sections below.

All datasets are converted to the same format (see the format section). Split files (train/validation/test) are also included in the zip archives.

Format

Data

  • 1 file = 1 sequence
  • 1 line = 1 frame
  • 1 line = X groups of 3 position values (x, y, z), where X is the joint count. (Joints follow the traditional Kinect order; the first joint is the root (hip center / spine base).)
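As an illustration, a sequence file in this format can be loaded as in the following sketch. The joint count (25 for Kinect v2) and the delimiter handling are assumptions; adapt them to the actual files.

```python
# Minimal sketch: load one sequence file (1 line = 1 frame) into a
# (n_frames, n_joints, 3) array. Joint count and delimiter are assumptions.
import numpy as np

def load_sequence(path, n_joints=25):
    frames = []
    with open(path) as f:
        for line in f:
            # Tolerate either space- or comma-separated values.
            values = [float(v) for v in line.replace(',', ' ').split()]
            if not values:
                continue
            frames.append(np.asarray(values).reshape(n_joints, 3))
    return np.stack(frames)  # first joint = root (hip center / spine base)
```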

Label

  • 1 file = 1 sequence
  • Name of the file is exactly the same as that of the corresponding data file
  • 1 line = 1 gesture
  • 1 line is decomposed into 3 or 4 values: « class id, start frame, end frame[, action point frame] ».
    For PKU-MMD, the 4th element is a « confidence »:

Note that the confidence is either 1 or 2, for slight and strong recommendation respectively.

https://www.icst.pku.edu.cn/struct/Projects/PKUMMD.html
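A label file of this form can be parsed as in the following sketch (comma-separated values are assumed, matching the line format above):

```python
# Sketch: parse a label file (1 line = 1 gesture) into tuples of
# (class id, start frame, end frame[, action point frame or confidence]).
def load_labels(path):
    gestures = []
    with open(path) as f:
        for line in f:
            parts = [p.strip() for p in line.split(',') if p.strip()]
            if parts:
                gestures.append(tuple(int(p) for p in parts))  # 3 or 4 ints
    return gestures
```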

Actions.csv

contains the list of the action classes

  • 1 line = 1 class
  • id 0 is always « nothing » (it is never used to label anything in the label files)
  • format of a line is
id;class name

More than one Actions.csv file can be provided if necessary. For example, an « Action1pers.csv » is provided with PKU-MMD, giving new ids for the actions that concern only the 1-skeleton sequence subset. A « Label1pers » folder with the corresponding ids is then provided.
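Reading such a class list is straightforward; a minimal sketch (the « id;class name » layout follows the line format above):

```python
# Sketch: read an Actions.csv file ("id;class name" per line) into a
# dict mapping class id to class name; id 0 is the « nothing » class.
def load_actions(path):
    classes = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                cid, name = line.split(';', 1)
                classes[int(cid)] = name.strip()
    return classes
```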

Split files

A split file contains 4 or 6 lines: the lists of files that should be used for training, [validation], and testing. Files are separated by a comma « , ».

Example :

Train files:
1.txt,2.txt, ...
[Validation files : 
3.txt, .... ]
Test files:
4.txt, 8.txt, ....
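Such a split file can be parsed by pairing each header line with the file list that follows it; a sketch (header spellings are assumed to match the example above):

```python
# Sketch: parse a split file into {"train": [...], "validation": [...],
# "test": [...]}; the validation block is optional (4- or 6-line files).
def load_split(path):
    with open(path) as f:
        lines = [l.strip() for l in f if l.strip()]
    splits = {}
    # Lines alternate: a header ("Train files:") then its file list.
    for header, listing in zip(lines[0::2], lines[1::2]):
        key = header.split()[0].lower()  # "train" / "validation" / "test"
        splits[key] = [n.strip() for n in listing.split(',') if n.strip()]
    return splits
```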

Dataset descriptions

G3D Dataset

Full name : Gaming 3D

Paper :

V. Bloom, D. Makris and V. Argyriou, "G3D: A gaming action dataset and real time action recognition evaluation framework," 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 2012, pp. 7-12, doi: 10.1109/CVPRW.2012.6239175.

Original link : http://velastin.dynu.com/G3D/G3D.html

The dataset consists of 20 gaming actions performed by 10 subjects, separated into 7 categories: fighting, golf, tennis, bowling, FPS, driving, and miscellaneous actions. Fighting is the category most used in the literature.
Each of the 30 sequences in the fighting category contains the same five actions, in the same order.
G3D has both frame-level and action-point annotations.

To enhance the evaluation and further challenge recognition systems, we propose an extended, « re-arranged » test set. The new sequences are identified by « _test…. » in the data. Be sure not to use the corresponding original sequences for training, since the new test set is generated from these 3 original sequences. Please see splitFighting_unbiased for a correct split.

Some errors, mentioned in the PDF included in the .zip, have been corrected in our version of the dataset.

Download G3D : https://www.irisa.fr/intuidoc/data/database/G3D.zip

OAD

Full name : Online Action Detection

Paper :

Li, Y., Lan, C., Xing, J., Zeng, W., Yuan, C., Liu, J. (2016). Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) Computer Vision – ECCV 2016. ECCV 2016. Lecture Notes in Computer Science(), vol 9911. Springer, Cham. https://doi.org/10.1007/978-3-319-46478-7_13

Original link : https://www.icst.pku.edu.cn/struct/Projects/OAD.html

Collected with Kinect v2, 10 classes, 700 action instances, 59 sequences, 8 fps.

Download OAD : https://www.irisa.fr/intuidoc/data/database/OAD.zip

MSRC-12 and MSRC6_IconicC4

Full name : Microsoft Research Cambridge-12

Paper :

Simon Fothergill, Helena Mentis, Pushmeet Kohli, and Sebastian Nowozin. 2012. Instructing people for training gestural interactive systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12). Association for Computing Machinery, New York, NY, USA, 1737–1746.

Original link: https://www.microsoft.com/en-us/download/details.aspx?id=52283

5 instruction modalities:

  • Images
  • Text
  • Video
  • Images + Text
  • Video + Text

12 classes, 30 subjects, 2 categories: iconic and metaphoric. 30 fps.

The subset MSRC6_IconicC4 contains only the iconic category and only the C4 modality (Video + Text). It has been used for some experiments in the literature, e.g. early recognition (Boulahia et al. 2018 RFIAP), and by Bloom et al.


Download MSRC-12 : https://www.irisa.fr/intuidoc/data/database/MSRC12.zip
Download only the subset MSRC6_IconicC4 : https://www.irisa.fr/intuidoc/data/database/MSRC6_IconicC4.zip

MAD

Full name: Multi-Modal Action Detection

Paper:

Huang, D., Yao, S., Wang, Y., De La Torre, F. (2014). Sequential Max-Margin Event Detectors. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_27

Original link : http://humansensing.cs.cmu.edu/mad/download.html

40 sequences. 35 actions, always in the same order.

To enhance the evaluation and further challenge recognition systems, we propose an extended, « re-arranged » test set. The new sequences are identified by « _test…. » in the data. Be sure not to use the corresponding original sequences for training. Please see split_unbiased for a correct split.

Download MAD : https://www.irisa.fr/intuidoc/data/database/MAD.zip

Chalearn (2013) / Montalbano V1

Full name : ChaLearn Gesture dataset (2013). It appears to have been renamed Montalbano V1 later.

Paper:

Sergio Escalera, Jordi Gonzàlez, Xavier Baró, Miguel Reyes, Oscar Lopes, Isabelle Guyon, Vassilis Athitsos, and Hugo Escalante. 2013. Multi-modal gesture recognition challenge 2013: dataset and results. In Proceedings of the 15th ACM on International conference on multimodal interaction (ICMI '13). Association for Computing Machinery, New York, NY, USA, 445–452. https://doi.org/10.1145/2522848.2532595

Original link: http://sunai.uoc.edu/chalearn [Not available anymore]

New Original link : https://chalearnlap.cvc.uab.cat/dataset/12/data/8/description/

  • 27 users
  • 20 Italian gesture classes
  • 20 fps
  • the gestures are performed in continuous sequences lasting 1-2 minutes (8-20 actions)
  • a single user is recorded
  • Number of sequences:
    • development: 393 (7,754 gestures)
    • validation: 287 (3,362 gestures)
    • test: 276 (2,742 gestures) → no per-frame annotation; not used in most works

Download Chalearn : https://www.irisa.fr/intuidoc/data/database/Chalearn.zip

PKU-MMD

Full name: PKU (Peking University) Multi-Modality Dataset

Paper:

Chunhui Liu, Yueyu Hu, Yanghao Li, Sijie Song, and Jiaying Liu. PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding. arXiv preprint arXiv:1703.07475, 2017.

Original link: https://www.icst.pku.edu.cn/struct/Projects/PKUMMD.html

The largest action detection dataset for 3D data:

  • 51 actions, decomposed into 43 single-skeleton classes and 8 two-skeleton classes.
  • 30 fps
  • Kinect V2
  • 1076 long sequences; the same performances are recorded from 3 views (1076/3 distinct performances)
  • 57 subjects
  • Two protocols: cross-view and cross-subject

Download PKUMMD : https://www.irisa.fr/intuidoc/data/database/PKUMMD.zip

Contact

For any questions, contact William Mocaër or Eric Anquetil

william.mocaer@irisa.fr, eric.anquetil@irisa.fr