Department of Computer Science, University of Verona (ITALY)
The project focuses on a specific kind of crowd, the spectator crowd. This type of social gathering is formed by people "interested in watching something specific that they came to see" [ ].
Unlike the generic crowds usually analyzed in computer vision, the spectator crowd has unique distinguishing features that demand specific analysis techniques.
The project aims at exploring all the unique features of the spectator crowd, leading either to brand-new tasks or to classical computer vision applications approached from a new perspective.
The research builds on a project started during an international hockey competition, which produced the first spectator crowd dataset: S-HOCK. The S-HOCK dataset and some applications to the spectator crowd were published at CVPR 2015 [ ]. For the dataset and more information, please see here.
The data collection campaign focused on 4 hockey matches held in Trento (Italy) during the 26th Winter Universiade.
We used 5 cameras: a full HD camera (1920x1080, 30 fps, focal length 4 mm) for the ice rink, another one for a panoramic view of all the bleachers, and
3 high-resolution cameras (1280x1024, 30 fps, focal length 12 mm) focusing on different parts of the spectator crowd.
From each match we selected a pool of sequences representing a wide and uniform spectrum of situations, e.g. tens of instances of
goals, shots on goal, saves, faults, timeouts (each sequence contains more than one event).
Each sequence has been annotated frame by frame, spectator by spectator, by a first annotator, using the ViPER format.
The annotator had to perform three different macro tasks: detection (localizing the body and the head), posture annotation and action annotation,
capturing fine-grained actions such as hands on hips, clapping hands, watching the cellphone, etc., for a total of more than 100 million annotations.
After each match, we asked a uniformly distributed sample of the spectators (30%) to fill in a simple questionnaire with three questions.
In this work we present a set of possible applications on S-HOCK. In particular, we focus on two classical tasks, namely people detection and head pose estimation, and on one application that is more interesting from the social point of view, namely spectator categorization.
People detection is a standard and still open research topic in computer vision, with HOG features and
the Deformable Part Model (DPM) as workhorses, and plenty of alternative algorithms. Unfortunately, most of the methods in the literature are not directly usable in our
scenario, mostly for two reasons: low resolution (a person has an average size of 70x110 pixels) and occlusions (usually only the upper body is visible, rarely the
entire body, and sometimes only the face).
We provide 5 different baselines for people detection: HOG+SVM, HASC+SVM, ACF, DPM and CUBD.
On top of all these methods, we propose an extension based on the strong prior available for this kind of crowd, i.e. the people are "constrained" by the environment to arrange themselves in a grid: the seats on the bleachers. Assuming a regular grid (with the camera perpendicular to the plane of the bleachers and ignoring distortion effects) and accounting for the fact that people are more likely to be located on the same rows and columns, we can simply add to the detection confidence map the average of the map over the rows and over the columns.
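The grid prior described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes the confidence map is already rectified so that seat rows and columns align with the array axes, and the function name `add_grid_prior` and the `weight` parameter are hypothetical.

```python
import numpy as np


def add_grid_prior(conf_map, weight=1.0):
    """Add row-wise and column-wise averages to a detection confidence map.

    Assumption: conf_map is a 2-D array whose axes are aligned with the
    seat rows and columns. `weight` is a hypothetical knob (not from the
    source) that scales the contribution of the prior.
    """
    conf_map = np.asarray(conf_map, dtype=float)
    # Average confidence of each row, shape (H, 1), broadcast along columns.
    row_mean = conf_map.mean(axis=1, keepdims=True)
    # Average confidence of each column, shape (1, W), broadcast along rows.
    col_mean = conf_map.mean(axis=0, keepdims=True)
    return conf_map + weight * (row_mean + col_mean)


if __name__ == "__main__":
    # Toy map: one strongly populated row with a weak (e.g. occluded) seat.
    conf = np.array([
        [0.0, 0.0, 0.0, 0.0],
        [0.9, 0.8, 0.2, 0.9],
        [0.0, 0.0, 0.0, 0.0],
    ])
    print(add_grid_prior(conf))
```

In the toy map, the weak cell in the populated row receives a larger boost than cells in the empty rows, which is consistent with the recall gains reported when the prior is used.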
Method | Prec. (no prior) | Rec. (no prior) | F1 (no prior) | Prec. (with prior) | Rec. (with prior) | F1 (with prior) |
---|---|---|---|---|---|---|
HOG+SVM | 0.743 | 0.561 | 0.639 | 0.662 | 0.709 | 0.684 |
HASC+SVM | 0.365 | 0.642 | 0.465 | 0.357 | 0.685 | 0.469 |
ACF | 0.491 | 0.622 | 0.548 | 0.524 | 0.649 | 0.580 |
DPM | 0.502 | 0.429 | 0.463 | 0.423 | 0.618 | 0.502 |
CUBD | 0.840 | 0.303 | 0.444 | 0.613 | 0.553 | 0.581 |
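As a quick sanity check on the detection table, the F1 column is the harmonic mean of precision and recall. The helper below is a minimal sketch (the name `f1_from_pr` is hypothetical); because the tabulated precision and recall are themselves rounded, recomputed values may differ from the table in the third decimal place.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # HOG+SVM without the grid prior, values taken from the table.
    print(round(f1_from_pr(0.743, 0.561), 3))
```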
Method | AVG Accuracy | Training time (sec) | Testing time (sec) |
---|---|---|---|
Orozco et al. | 0.368 | 105303 | 6263 |
WArCo | 0.376 | 186888 | 87557 |
CNN | 0.346 | 16106 | 68 |
SAE | 0.348 | 9384 | 3 |
CNN+EACH | 0.354 | 16106 | 68 |
SAE+EACH | 0.363 | 9384 | 3 |
The spectator categorization task consists of finding the supporters of each team, exploiting the fact that fans of the same team behave similarly at certain events (a goal, etc.), and that this behavior differs strongly from that of the other team's supporters.
[Figure: Ground Truth | AS2007 | MMS2010 | Our]

Method | AS2007 | MMS2010 | Our |
---|---|---|---|
Accuracy | 0.592 | 0.559 | 0.621 |
In this work we propose a new framework for Automated Spectator Crowd Analysis. As a first step we focus on individual frames, clustering local flow measures into spatial regions. The clustering is then extended along the temporal axis, looking for non-random spatio-temporal clusters; for this purpose, the Lempel-Ziv complexity is used. In this way, choral activities can emerge, indicating for example fan groups supporting different teams. Finally, entropic measures are adopted to quantify the degree of excitement of such groups.
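To give an idea of how Lempel-Ziv complexity separates regular from irregular sequences, here is a minimal sketch of the classical LZ76 phrase-counting scheme applied to a string of symbols. How the flow measures are turned into symbol sequences is not specified here, so the binary strings below are purely an assumption, and the function name `lempel_ziv_complexity` is hypothetical.

```python
def lempel_ziv_complexity(s):
    """Count the phrases in the LZ76 parsing of a string.

    Each new phrase is the shortest substring starting at the current
    position that does not already occur in the text preceding its
    last symbol. Assumption: every symbol is a single character.
    """
    n = len(s)
    i = 0       # start of the current phrase
    count = 0   # number of phrases found so far
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


if __name__ == "__main__":
    # A constant sequence parses into very few phrases...
    print(lempel_ziv_complexity("0000000000000000"))
    # ...while a less regular one of the same length needs more.
    print(lempel_ziv_complexity("0001101001000101"))
```

Low values indicate sequences that are highly predictable from their own past, while higher values indicate less regular behavior; in practice the raw count is usually normalized by the sequence length before sequences of different lengths are compared.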