This article describes how to benchmark video analytics and how to differentiate between video analytics solutions.
Overview:
1 How to measure video analytics performance?
1.1 Robustness
1.2 Detection distance
1.3 Number of objects
1.4 Features
1.5 Scenarios
1.6 Ease of use
2.1 Representativeness
2.2 Camera installation
2.3 Evaluation on stored video footage
In this section, the main criteria for video analytics evaluation are presented.
Robustness can be determined by counting the following three cases:
True, missed and false detection / alarm
On the left, the object is detected properly. In the middle, while there is an object, the video analytics has missed it. On the right, the video analytics falsely detects an object which is not there.
Both false alarms as well as missed alarms have to be considered in the evaluation of robustness.
In case of intrusion detection, false alerts are very time-consuming and annoying and should therefore be minimized as much as possible. If too many false alerts occur, operators have been known to shut down the video analytics system completely, as they were otherwise no longer able to fulfil their monitoring tasks. Any missed alarm, on the other hand, means the video analytics did not fulfil its task at all and intruders could enter the premises unhindered. The ratio of true alarms to false alarms is typically very unbalanced: while a single intruder in three months is already a lot, video analytics can easily generate a multitude of false alarms per day.
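To make such a comparison concrete, the three counts introduced above can be condensed into a detection rate and a normalized false alarm figure. The following is a minimal sketch in Python; the function name, the example counts and the per-24-hours normalization are illustrative assumptions, not taken from any specific product or standard.

```python
# Minimal sketch (hypothetical counts): summarising robustness from the three
# cases introduced above -- true, missed and false detections / alarms.

def robustness_summary(true_alarms: int, missed_alarms: int, false_alarms: int,
                       test_hours: float) -> dict:
    """Derive simple robustness figures from benchmark counts."""
    total_real_events = true_alarms + missed_alarms
    detection_rate = true_alarms / total_real_events if total_real_events else float("nan")
    # False alarms are usually normalised per time unit, since real intrusions
    # are rare compared to the length of the observation period.
    false_alarms_per_24h = false_alarms / test_hours * 24 if test_hours else float("nan")
    return {
        "detection_rate": detection_rate,
        "missed_alarms": missed_alarms,
        "false_alarms_per_24h": false_alarms_per_24h,
    }

# Example: 9 of 10 staged intrusions detected, 12 false alarms in a two-week test.
print(robustness_summary(true_alarms=9, missed_alarms=1, false_alarms=12,
                         test_hours=14 * 24))
```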
There is usually a trade-off between the sensitivity of a video analytics algorithm, which ensures the detection of all objects / alarms, and its false alarm robustness: higher sensitivity often means more false alarms, and higher false alarm robustness often results in lower sensitivity. For example, a video analytics that provides large detection distances needs to be more sensitive in order to detect objects covered by only a few pixels, and thus has more potential to detect false objects than a video analytics that has a reduced detection range and only detects objects covered by many pixels to start with. Trading sensitivity for robustness, or vice versa, might make a solution workable for a specific task, but it does not result in better performance per se. Real progress is only achieved if both sensitivity and false alarm robustness are maintained and improved. Note that some video analytics focus strongly on reducing false alerts as much as possible, while others focus on ensuring that every intruder will be detected, or on finding a good trade-off between sensitivity and false alarm robustness.
Detection distance measures the area in which an object / alarm can be reliably detected.
Detection distance typically depends on
For applications like people counting and traffic applications, the number of objects that can be detected and tracked simultaneously is important. In busy scenes, objects easily occlude each other, so the amount of occlusion at which an object can still be detected is also important.
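A common way to quantify occlusion in such a benchmark is the fraction of an object's bounding box that is covered by another object. The sketch below is a hypothetical illustration assuming axis-aligned boxes given as (x, y, width, height); it is not taken from any particular video analytics product.

```python
# Minimal sketch (assumed box format: x, y, width, height in pixels):
# how much of one object is hidden by another, e.g. to report the occlusion
# level at which an object was still detected in a counting benchmark.

def occluded_fraction(box: tuple, occluder: tuple) -> float:
    """Fraction of `box` covered by `occluder` (0.0 = fully visible, 1.0 = fully hidden)."""
    x1, y1, w1, h1 = box
    x2, y2, w2, h2 = occluder
    ix = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))   # horizontal overlap
    iy = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))   # vertical overlap
    area = w1 * h1
    return (ix * iy) / area if area > 0 else 0.0

# Example: a person partially covered by another person standing in front of them.
print(occluded_fraction((100, 80, 40, 120), (120, 90, 50, 130)))
```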
Typical and useful features for intrusion detection are
Beyond that, other features include
Besides the configuration for live alarming, some video analytics offer a search through recorded alarms, or allow the configuration to be changed for a full forensic search. As a full forensic search can also be used to optimize the configuration, it is valuable for live alarming as well.
As video analytics is embedded in the full video surveillance environment, it also needs to be checked whether the target video management system supports the video analytics, and to what depth. For Bosch video analytics, check the integration partner program (IPP) web page (http://ipp.boschsecurity.com) for the integration status.
In addition to comparing configuration options, it should also be investigated whether the video analytics can cope with different scenarios like
Many video analytics have severe trouble when several objects are close to each other and are not able to separate these objects. In that case, any evaluation of the number of objects (counting, queuing) or of their paths (loitering) cannot work. Using top-down camera views to minimize occlusions is advisable, but does not solve the problem completely. Some video analytics assume that only people are within an area and are thus able to separate them using knowledge about the perspective in the scene. However, any shopping cart or car will then be detected as a multitude of people as well, so this is only applicable in people-only areas.
Another very challenging and often unsupported class of scenarios involves moving backgrounds like water / waves, elevators, conveyor belts and doors.
Ease of use is related to both robustness and the amount of features. The more robust a system is, the less it needs to be configured to achieve satisfactory results. The more features a system has, the more configuration options are available which might make the system seem more complex at first glance.
Typical measures to determine the ease of use are
One of the most complex and time-consuming tasks during video analytics setup can be the calibration, i.e. teaching the camera about the perspective in the scene. The perspective describes that an object close to the camera appears larger than the same object further away from the camera. Calibration also allows the 2D camera image to be transformed back into 3D measures like metric size, speed and geolocation. Typically, a calibration assumes that the camera is looking at a single, flat, horizontal ground plane only. Any objects walking on stairs, hills or additional levels will not be handled correctly when using the calibration information. The calibration can be fully described via the video sensor size, the focal length, the camera angles with respect to the ground plane (tilt & roll angle) and the camera installation height. If only the size and speed of objects in the video are of interest, rather than the full calibration and the exact object location on the ground plane, then a partial calibration, typically using 2-3 human markers, can be done.
It is advisable to verify that the calibration is correct, regardless of the method used. All methods can achieve the same accuracy levels if used expertly.
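As a rough illustration of what the full calibration enables, the sketch below back-projects an image point onto the ground plane using only the focal length, tilt angle and installation height. It assumes a flat horizontal ground plane, a roll angle of zero and an undistorted pinhole camera; all function names and numbers are hypothetical and only meant to show the geometry involved.

```python
# Minimal sketch (simplified pinhole model, roll angle assumed 0, single flat
# horizontal ground plane as stated above): mapping an image point to a
# ground-plane position from focal length, tilt angle and installation height.
import math

def image_to_ground(u: float, v: float, focal_px: float,
                    tilt_deg: float, height_m: float):
    """Map an image point (u, v), in pixels relative to the principal point
    (u right, v down), to ground-plane coordinates (x, y) in metres.
    Returns None for points on or above the horizon."""
    tilt = math.radians(tilt_deg)
    # Ray direction in world coordinates (x right, y forward, z up),
    # for a camera tilted down by `tilt` and mounted `height_m` above ground.
    dx = u
    dy = focal_px * math.cos(tilt) - v * math.sin(tilt)
    dz = -(focal_px * math.sin(tilt) + v * math.cos(tilt))
    if dz >= 0:          # ray does not hit the ground plane
        return None
    t = height_m / -dz   # scale so the ray reaches z = 0
    return (t * dx, t * dy)

# Example: a point 100 px below the image centre, 4 m mounting height, 30 deg tilt.
print(image_to_ground(u=0, v=100, focal_px=1200, tilt_deg=30, height_m=4.0))
```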
Video analytics typically run 24/7 in all weather and lighting conditions, in a multitude of different setups concerning perspective, background and complexity of the scene. It is impossible to test all of them in a benchmark setup, but effort should nonetheless be made to provide as wide a variety as is possible and necessary for the video analytics task being benchmarked.
Typical outdoor challenges for video analytics are
When planning a real camera installation for a video analytics benchmark, please keep the following in mind:
An evaluation on stored video footage has the advantage that, once the video footage is collected, a wide variety of scenarios and environment conditions can be tested. However, nowadays many video analytics are located directly within the camera. The only way to feed the video footage to those video analytics is to display it on a monitor and film the monitor screen with the camera. This has the following disadvantages: