The goal of the ARCA project has been to automatically analyze human activities observed in videos, a capability that forms the basis for novel applications. It could be used, for instance, to create short videos that summarize daily activities to support patients suffering from Alzheimer's disease. It could also be used for education, e.g., by providing a video analysis for a trainee in a hospital that shows whether the tasks have been executed correctly. The analysis of complex activities in videos, however, is very challenging. Activities range in duration from minutes to hours. They involve interactions with several objects that change their appearance and shape, such as food during cooking. They are also composed of many sub-activities, which can happen simultaneously or in varying orders.
While the majority of recent works in action recognition have focused on developing better feature encoding techniques for classifying sub-activities in short video clips of a few seconds, this project went further. It aimed to develop a higher-level representation of complex activities that overcomes the limitations of current approaches. This includes handling large temporal variations as well as recognizing and localizing complex activities in videos. A second objective has been to learn a representation from videos that is not limited to a specific application but can be reused and adapted to a new setting. The third objective has been to synthesize human motion or poses from only a list of human actions or a textual description. This demonstrates that the model can not only interpret data but also generate it.
Li S., Zhou Y., Yi J., and Gall J., Spatial-Temporal Consistency Network for Low-Latency Trajectory Forecasting (PDF, Supplementary Material), International Conference on Computer Vision (ICCV'21), To appear.
Behrmann N., Fayyaz M., Gall J., and Noroozi M., Long Short View Feature Decomposition via Contrastive Video Representation Learning (PDF, Supplementary Material), International Conference on Computer Vision (ICCV'21), To appear.
Biswas S. and Gall J., Multiple Instance Triplet Loss for Weakly Supervised Multi-Label Action Localisation of Interacting Persons (PDF), Understanding Social Behavior in Dyadic and Small Group Interactions Workshop, To appear.
Souri Y., Fayyaz M., Minciullo L., Francesca G., and Gall J., Fast Weakly Supervised Action Segmentation Using Mutual Consistency (PDF, Code), IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021. ©IEEE
Li Z., Abu Farha Y., and Gall J., Temporal Action Segmentation from Timestamp Supervision (PDF, Supplementary Material, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'21), To appear.
Fayyaz M., Bahrami E., Diba A., Noroozi M., Adeli E., van Gool L., and Gall J., 3D CNNs with Adaptive Temporal Feature Resolutions (PDF, Supplementary Material, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'21), To appear.
Zatsarynna O., Abu Farha Y., and Gall J., Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos (PDF), IEEE Workshop on Precognition: Seeing through the Future, To appear.
Li S., Yi J., Abu Farha Y., and Gall J., Pose Refinement Graph Convolutional Network for Skeleton-based Action Recognition (PDF, Code), IEEE Robotics and Automation Letters (RA-L), Vol. 6, No. 2, 1028-1035, 2021. ©IEEE
Sushko V., Schönfeld E., Zhang D., Gall J., Schiele B., and Khoreva A., You Only Need Adversarial Supervision for Semantic Image Synthesis (PDF, Code), International Conference on Learning Representations (ICLR'21), 2021.
Behrmann N., Gall J., and Noroozi M., Unsupervised Video Representation Learning by Bidirectional Feature Prediction (PDF), Winter Conference on Applications of Computer Vision (WACV'21), 1669-1678, 2021. ©IEEE
Biswas S. and Gall J., Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting (PDF, Supplementary Material, Code), Asian Conference on Computer Vision (ACCV'20), Springer, LNCS 12626, 547-561, 2021. ©Springer-Verlag
Kwon O.-H., Tanke J., and Gall J., Recursive Bayesian Filtering for Multiple Human Pose Tracking from Multiple Cameras (PDF), Asian Conference on Computer Vision (ACCV'20), Springer, LNCS 12623, 438-453, 2021. ©Springer-Verlag
Li S., Abu Farha Y., Liu Y., Cheng M.-M., and Gall J., MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation (PDF, MS-TCN Code, MS-TCN++ Code), IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020. ©IEEE
Abu Farha Y., Ke Q., Schiele B., and Gall J., Long-Term Anticipation of Activities with Cycle Consistency (PDF, Supplementary Material), DAGM German Conference on Pattern Recognition (GCPR'20), Springer, LNCS 12544, 159-173, 2021. ©Springer-Verlag
Zhang Y., Briq R., Tanke J., and Gall J., Adversarial Synthesis of Human Pose From Text (PDF, Supplementary Material), DAGM German Conference on Pattern Recognition (GCPR'20), Springer, LNCS 12544, 145-158, 2021. ©Springer-Verlag
Rafi U., Doering A., Leibe B., and Gall J., Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos (PDF, Supplementary Material), European Conference on Computer Vision (ECCV'20), Springer, LNCS 12365, 36-52, 2020. ©Springer-Verlag
Diba A., Fayyaz M., Sharma V., Paluri M., Gall J., Stiefelhagen R., and van Gool L., Large Scale Holistic Video Understanding (PDF, Supplementary Material, Data), European Conference on Computer Vision (ECCV'20), Springer, LNCS 12350, 593-610, 2020. ©Springer-Verlag
Fayyaz M. and Gall J., SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation (PDF, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'20), 498-507, 2020. ©IEEE
Kuehne H., Richard A., and Gall J., A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation (PDF, Code), IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 42, No. 4, 765-779, 2020. ©IEEE
Panareda Busto P., Iqbal A., and Gall J., Open Set Domain Adaptation for Image and Action Recognition (PDF, Supplementary Material, Slides, Code), IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 42, No. 2, 413-429, 2020. ©IEEE
Ruiz A. H., Gall J., and Moreno-Noguer F., Human Motion Prediction via Spatio-Temporal Inpainting (PDF), International Conference on Computer Vision (ICCV'19), 7133-7142, 2019. ©IEEE
Abu Farha Y. and Gall J., Uncertainty-Aware Anticipation of Activities (PDF), International Workshop on Human Behaviour Understanding, 1197-1204, 2019. ©IEEE
Richard A., Iqbal A., and Gall J., Enhancing Temporal Action Localization with Transfer Learning from Action Recognition (PDF), Workshop and Challenge on Comprehensive Video Understanding in the Wild, 1533-1540, 2019. ©IEEE
Sawatzky J., Banerjee D., and Gall J., Harvesting Information from Captions for Weakly Supervised Semantic Segmentation (PDF), Workshop on Cross-Modal Learning in Real World, 4481-4490, 2019. ©IEEE
Iqbal A. and Gall J., Level Selector Network for Optimizing Accuracy-Specificity Trade-offs (PDF), International Workshop on Large Scale Holistic Video Understanding, 1466-1473, 2019. ©IEEE
Panareda Busto P. and Gall J., Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data (PDF, Code, Supplementary Material), German Conference on Pattern Recognition (GCPR'19), Springer, LNCS 11824, 107-121, 2019. ©Springer-Verlag
Tanke J. and Gall J., Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views (PDF, Code), German Conference on Pattern Recognition (GCPR'19), Springer, LNCS 11824, 537-550, 2019. ©Springer-Verlag
Biswas S., Souri Y., and Gall J., Hierarchical Graph-RNNs for Action Detection of Multiple Activities (PDF), IEEE International Conference on Image Processing (ICIP'19), 1-5, 2019. ©IEEE
Thoker F. and Gall J., Cross-modal Knowledge Distillation for Action Recognition (PDF), IEEE International Conference on Image Processing (ICIP'19), 6-10, 2019. ©IEEE
Abu Farha Y. and Gall J., MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation (PDF, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'19), 3570-3579, 2019. ©IEEE
Sawatzky J., Souri Y., Grund C., and Gall J., What Object Should I Use? - Task Driven Object Detection (PDF, Supplementary Material, Code/Data), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'19), 7597-7606, 2019. ©IEEE
Kukleva A., Kuehne H., Sener F., and Gall J., Unsupervised Learning of Action Classes with Continuous Temporal Embedding (PDF, Supplementary Material, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'19), 12058-12066, 2019. ©IEEE
Sabokrou M., Pourreza M., Fayyaz M., Entezari R., Fathy M., Gall J., and Adeli E., AVID: Adversarial Visual Irregularity Detection (PDF, Code), Asian Conference on Computer Vision (ACCV'18), Springer, LNCS 11269, 169-184, 2018. ©Springer-Verlag
Briq R., Moeller M., and Gall J., Convolutional Simplex Projection Network for Weakly Supervised Semantic Segmentation (PDF, Code), British Machine Vision Conference (BMVC'18), 2018.
Doering A., Iqbal U., and Gall J., Joint Flow: Temporal Flow Fields for Multi Person Tracking (PDF), British Machine Vision Conference (BMVC'18), 2018.
Rafi U., Gall J., and Leibe B., Direct Shot Correspondence Matching (PDF), British Machine Vision Conference (BMVC'18), 2018.
Iqbal U., Molchanov P., Breuel T., Gall J., and Kautz J., Hand Pose Estimation via Latent 2.5D Heatmap Regression (PDF), European Conference on Computer Vision (ECCV'18), Springer, LNCS 11215, 125-143, 2018. ©Springer-Verlag
Diba A., Fayyaz M., Sharma V., Arzani M., Yousefzadeh R., Gall J., and van Gool L., Spatio-Temporal Channel Correlation Networks for Action Classification (PDF), European Conference on Computer Vision (ECCV'18), Springer, LNCS 11208, 299-315, 2018. ©Springer-Verlag
Iqbal U., Doering A., Yasin H., Krüger B., Weber A., and Gall J., A Dual-Source Approach for 3D Human Pose Estimation from Single Images (PDF, Code), Computer Vision and Image Understanding, Vol. 172, 37-49, Elsevier, 2018. ©Elsevier
Richard A., Kuehne H., Iqbal A., and Gall J., NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning (PDF, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), 7386-7395, 2018. ©IEEE
Abu Farha Y., Richard A., and Gall J., When will you do what? - Anticipating Temporal Occurrences of Activities (PDF, Code, Video), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), 5343-5352, 2018. ©IEEE
Richard A., Kuehne H., and Gall J., Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints (PDF, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), 5987-5996, 2018. ©IEEE
Andriluka M., Iqbal U., Insafutdinov E., Pishchulin L., Milan A., Gall J., and Schiele B., PoseTrack: A Benchmark for Human Pose Estimation and Tracking (PDF, Data), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), 5167-5176, 2018. ©IEEE
Biswas S. and Gall J., Structural Recurrent Neural Network (SRNN) for Group Activity Analysis (PDF), IEEE Winter Conference on Applications of Computer Vision (WACV'18), 1625-1632, 2018. ©IEEE
Kuehne H., Richard A., and Gall J., Weakly Supervised Learning of Actions from Transcripts (PDF), Computer Vision and Image Understanding, Special Issue on Language in Vision, Vol. 163, 78-89, Elsevier, 2017. ©Elsevier
Panareda Busto P. and Gall J., Open Set Domain Adaptation (PDF, Supplementary Material, Slides, Code), International Conference on Computer Vision (ICCV'17), 754-763, 2017. ©IEEE (Marr Prize Honorable Mention)
Iqbal A., Richard A., Kuehne H., and Gall J., Recurrent Residual Learning for Action Recognition (PDF), German Conference on Pattern Recognition (GCPR'17), Springer, LNCS 10496, 126-137, 2017. ©Springer-Verlag
Richard A., Kuehne H., and Gall J., Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling (PDF, Code), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), 1273-1282, 2017. ©IEEE
Iqbal U., Milan A., and Gall J., PoseTrack: Joint Multi-Person Pose Estimation and Tracking (PDF, Data/Code, PoseTrack Challenge), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), 4654-4663, 2017. ©IEEE
Iqbal U., Garbade M., and Gall J., Pose for Action - Action for Pose (PDF, Code), IEEE International Conference on Automatic Face and Gesture Recognition (FG'17), 438-445, 2017. ©IEEE
Richard A. and Gall J., A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition (PDF, Code), Computer Vision and Image Understanding, Special Issue on Image and Video Understanding in Big Data, Vol. 156, 79-91, Elsevier, 2017. ©Elsevier
Iqbal U. and Gall J., Multi-Person Pose Estimation with Local Joint-to-Person Associations (PDF), International Workshop on Crowd Understanding, Springer, LNCS 9914, 627-642, 2016. ©Springer-Verlag
Rafi U., Kostrikov I., Gall J., and Leibe B., An Efficient Convolutional Network for Human Pose Estimation (PDF, Code), British Machine Vision Conference (BMVC'16), 2016.
Garbade M. and Gall J., Handcrafting vs Deep Learning: An Evaluation of NTraj+ Features for Pose Based Action Recognition (PDF), Workshop on New Challenges in Neural Computation and Machine Learning (NC2), 2016.
Principal Investigator:
Prof. Dr. Juergen Gall
Postdocs:
Hildegard Kühne
Umer Rafi
Ph.D. students:
Shi-Jie Li
Julian Tanke
Andreas Doering
Yazan Abu Farha
Rania Briq
Mohsen Fayyaz
Yaser Souri
Mian Ahsan Iqbal
Sovan Biswas
Fadime Sener
Alexander Richard
Umar Iqbal