VidSitu Dataset: Situation Recognition in Videos
Really interesting! This is the building block that Azure Video Indexer is missing, and adding it would be a big leap forward.

VidSitu is a large-scale dataset of diverse 10-second movie clips depicting complex situations, where a situation is a collection of related events. Events in each video are richly annotated at 2-second intervals with verbs, semantic roles, entity co-references, and event relations.
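To make that annotation structure concrete, here is a minimal Python sketch of how one annotated clip could be represented. It is not the dataset's actual file format: the `Event` class, its field names, the `Arg0`/`Arg1` role labels, the entity ids, and the relation label are all hypothetical, chosen only for illustration. The only facts taken from the description are the 10-second clip length and the 2-second annotation interval.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

CLIP_SECONDS = 10   # clip length, from the dataset description
EVENT_SECONDS = 2   # annotation interval, from the dataset description


@dataclass
class Event:
    """One annotated 2-second event (hypothetical schema, not the real format)."""
    start: int
    end: int
    verb: str
    # Maps a semantic role name to an entity id; names here are assumptions.
    roles: Dict[str, str] = field(default_factory=dict)
    # Relation of this event to another event in the clip; label is an assumption.
    relation: Optional[str] = None


def event_windows(clip_seconds: int = CLIP_SECONDS,
                  step: int = EVENT_SECONDS) -> List[Tuple[int, int]]:
    """Split a clip into consecutive (start, end) annotation windows."""
    return [(t, t + step) for t in range(0, clip_seconds, step)]


def coreferent_entities(events: List[Event]) -> Dict[str, List[int]]:
    """Return entities that fill a role in more than one event, with event indices."""
    seen: Dict[str, set] = {}
    for i, ev in enumerate(events):
        for entity in ev.roles.values():
            seen.setdefault(entity, set()).add(i)
    return {e: sorted(idx) for e, idx in seen.items() if len(idx) > 1}


# Illustrative annotations for the first two windows of a clip.
windows = event_windows()
events = [
    Event(*windows[0], verb="run", roles={"Arg0": "person_1"}),
    Event(*windows[1], verb="catch",
          roles={"Arg0": "person_2", "Arg1": "person_1"},
          relation="enabled_by"),
]
shared = coreferent_entities(events)
```

With these settings a clip yields five annotation windows, and `shared` records that `person_1` appears in both events, which is the kind of cross-event link that entity co-reference annotations capture.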
via Deep Learning Weekly: read the source article