Case Study: Counting People in Motion

Counting is one of the basic tasks of computer vision (CV). CV uses image processing algorithms to make visual sense of the world, counting anything from cars and signs to products and people. At StreamLogic, people are what we are most often interested in counting, because it helps us understand customer demographics and behavior. We count people to measure attendance at events, to determine capacity usage of rooms or public transportation, to analyze shopping habits, and more.

The Project

Counting individuals can be tricky because, most of the time, people are in motion. That's why, for this case study, we used the StreamLogic platform to create a solution for counting the unique individuals passing by a street camera, as recorded in this sample video. The solution demonstrates the use of deep learning models, object tracking, and unsupervised learning. The key challenges of the application were detecting people across frames and identifying multiple appearances of the same person.

Use Cases


StreamLogic counts people to determine optimal capacity, helping prevent overcrowding and reduce fire hazards. The technology can also help identify when a person is somewhere they should not be. Both capabilities help alleviate safety concerns.


Counting people also provides insight into day-to-day, week-to-week, and even seasonal business trends. If you need to know how many resources to allocate at a given time, an accurate count of people helps you anticipate business needs and make sounder decisions.

The Process



Detecting People

Our first step is to determine how to identify people within an image. To accomplish this, we chose to use face detection rather than detecting the whole body. In many applications, the body is only partially visible and faces are more often completely in view. Face detection has also been studied for longer and has become widespread in the last decade, making the technology more reliable.

With StreamLogic, we can choose between face detection algorithms based on traditional image processing (a feature descriptor called the histogram of oriented gradients, or HOG) or on more recent deep neural networks (DNNs). Both approaches use machine learning to train a model that takes an image as input and outputs a list of bounding boxes around the regions of the image that contain a human face. Since pre-trained models are readily available, we chose a pre-trained DNN model for this project.



Detecting People Across Frames

With StreamLogic’s face detection algorithm, we can easily count the number of faces in a single image. However, this introduces a new challenge: how do we extend this capability to video?

Since video is just a sequence of individual frames, we can apply the face detection algorithm to each frame. And because we are counting people, we only need to find faces that have not already been counted in previous frames. One simple way to do this is to keep the locations of the faces from the previous frame, detect faces in the new frame, and discard any new face that overlaps a previous face by more than some threshold.
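As a sketch of that overlap test, here is a minimal Python implementation using intersection-over-union (IoU). The box format, threshold value, and function names are illustrative, not part of the StreamLogic API:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def new_faces(current, previous, threshold=0.5):
    """Keep only detections that do not overlap any previous-frame face."""
    return [box for box in current
            if all(iou(box, prev) < threshold for prev in previous)]
```

A face detected near the same spot as a previous one is treated as already counted; everything else is considered new.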


While this approach provides an estimate of the number of people, it has a couple of key issues. In a crowded scene, overlap may not be a good indication of which faces are the same across frames, which can lead to undercounting. Another issue is that face detection is computationally expensive, so it is preferable to avoid applying it to every frame.

Solution: Object Tracking

A more sophisticated approach to detecting people across frames is to employ object tracking. Object tracking algorithms are developed specifically for tracking unique objects from one frame to the next in a video. These algorithms can run faster than face detection and are based not just on the object's location, but also on its visual appearance. That's why StreamLogic employs the tracking algorithm available in the dlib C++ library. This algorithm tracks an object by searching for it in an area around its location in the previous frame. Not only can it handle objects moving between frames, it can also follow objects that grow or shrink in the image as they move closer or farther away.
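To make the search-window idea concrete, here is a toy Python sketch. The integer-grid "frames", sum-of-squared-differences scoring, and fixed search radius are deliberate simplifications; a real correlation tracker like dlib's also adapts its model and handles scale changes:

```python
def track(frame, template, prev_pos, search_radius=2):
    """Find the position near prev_pos whose patch best matches template.

    frame: 2D list of pixel intensities; template: smaller 2D list;
    prev_pos: (row, col) of the object in the previous frame.
    """
    h, w = len(template), len(template[0])
    best, best_score = prev_pos, float("inf")
    py, px = prev_pos
    for y in range(max(0, py - search_radius), py + search_radius + 1):
        for x in range(max(0, px - search_radius), px + search_radius + 1):
            if y + h > len(frame) or x + w > len(frame[0]):
                continue  # patch would fall outside the frame
            score = sum((frame[y + i][x + j] - template[i][j]) ** 2
                        for i in range(h) for j in range(w))
            if score < best_score:
                best, best_score = (y, x), score
    return best
```

Because the search is restricted to a small window around the previous location, each update is far cheaper than re-running a detector over the whole frame.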



Algorithm I

We combined the face detection method with object tracking and created the following algorithm to count the people moving past the street camera in the sample video:


  1. Initialize the count and the object tracker.
  2. For each video frame:
     1. Update the object tracker with the new frame.
     2. Every Nth video frame:
        1. Run face detection.
        2. Match detected faces with the locations of objects already being tracked.
        3. Increment the count for each new face.
        4. Add each new face to the object tracker.
        5. Remove unmatched objects from the tracker.
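The steps above can be sketched in Python. Everything here is a stand-in: the tracker is a toy that simply remembers boxes rather than re-locating them, detection is an arbitrary callable, and the final removal step is omitted for brevity:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

class SimpleTracker:
    """Toy stand-in for a dlib-style tracker: it only remembers boxes."""
    def __init__(self):
        self.tracked = []
    def update(self, frame):
        pass  # a real tracker would re-locate each box in the new frame
    def add(self, box):
        self.tracked.append(box)

def count_people(frames, detect_faces, every_n=2, overlap=0.5):
    """Sketch of Algorithm I: detect every Nth frame, count only new faces."""
    tracker, count = SimpleTracker(), 0
    for i, frame in enumerate(frames):
        tracker.update(frame)                  # step 2.1
        if i % every_n == 0:                   # step 2.2
            detections = detect_faces(frame)
            new = [d for d in detections       # steps 2.2.1-2.2.2
                   if all(iou(d, t) < overlap for t in tracker.tracked)]
            count += len(new)                  # step 2.2.3
            for box in new:
                tracker.add(box)               # step 2.2.4
            # step 2.2.5 (removing unmatched objects) is omitted in this toy
    return count
```

The key structure is that the cheap tracker update runs on every frame, while the expensive detector only runs on every Nth frame.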


This algorithm is a great start when it comes to counting people, but it still suffers from one major problem: as people move through the video, particularly in a crowd, faces may be occluded by other people or objects. For example, in the clip below, the man in the middle is initially visible, but the woman in front of him blocks him as she passes.

As a result, his face is not detected during those frames. When the man's face reappears once the woman has moved ahead, it is counted again as a new face. Consequently, Algorithm I will frequently overcount people. To illustrate: there are 8 unique people in the sample clip, and this algorithm counts 13.

Solution: Deduplication Via Clustering

To combat this overcounting, we need a way to determine whether two face images are of the same person. Fortunately, there are a number of ways to compare two images for similarity. For face images specifically, Carnegie Mellon University (CMU) has developed a model called OpenFace. The CMU model takes a face image as input and produces a numeric signature for the face. As with handwritten signatures, the OpenFace "signature" for two images of the same person's face is not exactly the same, but the two can be compared for similarity.
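Since the signature is just a numeric vector, comparing two of them can be as simple as measuring the distance between the vectors. The tiny 3-element vectors and the threshold below are made up for illustration; a real OpenFace signature has many more dimensions:

```python
import math

def distance(sig_a, sig_b):
    """Euclidean distance between two face signatures (numeric vectors)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))

def same_person(sig_a, sig_b, threshold=0.6):
    """Treat two faces as the same person if their signatures are close."""
    return distance(sig_a, sig_b) < threshold
```

Two images of the same person should produce nearby vectors; images of different people should land far apart.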

We could have incorporated face similarity into Algorithm I directly, comparing each new face to previously seen faces as the video plays. Instead, with the StreamLogic platform, we decided to split the problem into two stages:

  1. Use Algorithm I to collect faces as they appear in the video.
  2. Deduplicate the collected faces once, using the OpenFace similarity measure.

The deduplication stage is based on clustering, a machine learning technique for grouping a set of things based on how similar they are. Clustering can be applied to any set of objects, as long as you have a function that computes the similarity between two of them. OpenFace gives us exactly that for face images, because its output is a high-dimensional numeric vector.
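As a minimal sketch of the deduplication idea, here is a greedy threshold-based grouping in pure Python. The toy 2-D vectors and the threshold are illustrative, and this greedy scheme is a simplification standing in for whatever clustering algorithm the platform actually uses:

```python
import math

def euclidean(a, b):
    """Distance between two signature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_faces(signatures, threshold=0.6):
    """Greedily group face signatures: each face joins the first cluster
    whose representative is within `threshold`, else starts a new cluster."""
    clusters = []  # each cluster is a list of signatures
    for sig in signatures:
        for group in clusters:
            if euclidean(sig, group[0]) < threshold:
                group.append(sig)
                break
        else:
            clusters.append([sig])
    return clusters

def count_unique(signatures, threshold=0.6):
    """The person count is simply the number of clusters."""
    return len(cluster_faces(signatures, threshold))
```

Multiple captures of the same person collapse into one cluster, so repeated appearances no longer inflate the count.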

The following image shows a two-dimensional visualization of the OpenFace vectors generated for the faces extracted from the sample video:

Each point represents one face and the color represents the cluster it was assigned to.

Using a clustering algorithm, we organize the faces into clusters based on how similar they are. Because we can expect one cluster per unique person, the number of clusters is the number of unique people.



Algorithm II

We combined Algorithm I with the deduplication stage described above to produce the final algorithm:

  1. Run Algorithm I to collect all new face images appearing in the video.
  2. Cluster the face images using the OpenFace similarity measure.
  3. Set the person count to the number of clusters.


After incorporating the clustering stage, the output for the sample video is 9 individuals, much closer to the correct total of 8. The final algorithm is thus far more accurate than Algorithm I, which did not use clustering and counted 13 people in the sample video. What's more, we accomplished Algorithm II using only readily available models, without any training. Moving forward, accuracy could be further improved by training (or fine-tuning) models for the specific context of the application.