- The answer to India's traffic problem could be to identify the small areas that are prime spots for congestion, analyse the traffic at these points and then look at appropriate solutions
- IISc Bengaluru researchers have developed TraCount, an automated system that analyses the images of a congested area using surveillance cameras and provides an accurate count of the number of vehicles
- TraCount uses convolutional neural networks, a class of artificial neural networks loosely modelled after the human brain, to get around the problem of occlusion due to distance and to detect vehicles accurately
Traffic battles have been an indelible part of urban life in India. The biggest arterial roads of most Indian cities (Mumbai, Delhi, Bengaluru, Chennai, Kolkata) are usually the most congested, and we have no option but to resign ourselves to wasting precious hours stuck in traffic, breathing in the exhaust fumes from the vehicles around us (unless we roll up the windows and switch on the AC). Unfortunately, there seems to be no solution to this unrelenting mass of tangled automobiles fighting their way through bottlenecks.
How do we tackle this alarming problem with its myriad of frightening consequences, ranging from premature depreciation of vehicles and increased carbon emissions to health problems and frayed tempers? The answer could be — as Shiv Surya and Dr Venkatesh Babu from the Indian Institute of Science (IISc), Bengaluru, suggest — to first identify the small areas that are the prime spots for congestion and analyse the traffic at these points. Once that is done, we could look at appropriate solutions — either building alternate routes (more infrastructure) or making an adjustment in the traffic signal timings.
To take this first step, the IISc researchers have developed TraCount, an automated vehicle counting system that analyses images of a congested area from the surveillance cameras installed there and provides an accurate count of the vehicles contributing to the traffic. For a human being, it is quite easy to identify a car partially hidden behind a tree or another car. For a computer, however, such occlusion can often lead to confused predictions. Moreover, the vehicles farthest from the camera are harder to detect because they appear small and are heavily occluded. The researchers presented their work at the Indian Conference on Computer Vision, Graphics and Image Processing.
TraCount addresses these problems with the help of convolutional neural networks (CNNs), a class of artificial neural networks (ANNs), which are loosely modelled after the human brain. ANNs classify data by being ‘trained’ to learn the relationship between a set of inputs and their labels. CNNs are computationally better suited to processing images. They consist of layers of learnable filters that are activated by certain visual features or concepts in an image. For instance, a filter that detects vertical edges would respond to the edges of a table, including its legs.
The TraCount model consists of repeating blocks, each made up of a convolutional layer, a nonlinearity layer and a pooling layer. The convolutional layer consists of ‘learnable’ filters. In the first layers, these are activated when they recognise simple visual features such as the edge of a surface or a blotch of colour. Filters in the deeper convolutional layers produce a strong activation, or ‘fire’, for higher-level visual concepts such as the wheel of a vehicle. Pooling aggregates the strongest responses and reduces the volume of data, thus reducing the computational load.
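The convolution, nonlinearity and pooling steps described above can be illustrated with a minimal sketch in plain Python. It is not TraCount's code: the 6x6 toy image and the hand-set `vertical_edge` filter are assumptions made for illustration (in a real CNN the filter weights are learned during training), and ReLU and 2x2 max pooling are assumed as the nonlinearity and pooling operations.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Nonlinearity: keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the strongest response in each size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Toy image (assumption): dark left half, bright right half,
# so there is a single vertical edge down the middle.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-set vertical edge filter (assumption; real filters are learned).
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = relu(conv2d(image, vertical_edge))  # 4x4 map, strong at the edge
pooled = max_pool(feature_map)                    # 2x2 map, a quarter of the data

print(feature_map[0])  # [0, 3, 3, 0]
print(pooled)          # [[3, 3], [3, 3]]
```

The filter responds only where the image changes from dark to bright, and pooling shrinks the 4x4 feature map to 2x2 while keeping the strongest responses.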
TraCount comprises two shallow fully convolutional (FC) sub-networks fused with a deep monolithic FC network. “A monolithic CNN is a CNN with each filter operating on the feature maps of all filters on a previous layer. A feature map is the output of one filter applied to the input from the previous layer,” explains Surya. Fully convolutional networks differ from the usual CNNs in that they have no fully connected layers at the end. They are suitable when the goal is to predict an image-like output (in this case, a density map of how vehicles are distributed in an image) rather than merely the count of vehicles.
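To see how a density map relates to a vehicle count, consider the following sketch. It reflects a common convention in density-based counting, not a detail confirmed by the article: each vehicle is assumed to contribute a small Gaussian blob that sums to 1, so summing the whole map gives the count. The function `density_map`, the map size, the vehicle positions and `sigma` are all hypothetical.

```python
import math


def density_map(height, width, centres, sigma=1.5):
    """Build a map in which every vehicle centre adds a blob summing to 1."""
    dmap = [[0.0] * width for _ in range(height)]
    for cy, cx in centres:
        # Unnormalised Gaussian centred on the vehicle.
        blob = [
            [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, blob))
        # Normalise within the image so each vehicle contributes exactly 1.
        for y in range(height):
            for x in range(width):
                dmap[y][x] += blob[y][x] / total
    return dmap


# Hypothetical vehicle centres (row, column); the last two overlap heavily.
centres = [(3, 4), (10, 12), (11, 13)]
dmap = density_map(16, 20, centres)

# The count is recovered by summing the map over all pixels.
estimated_count = sum(map(sum, dmap))
print(round(estimated_count, 6))  # 3.0
```

Because the map keeps spatial information, two heavily overlapping vehicles still add up to a count of two, even where a detector might see a single object.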
Vehicles appear at various scales and shapes, and distant vehicles are often occluded. Detecting them reliably requires different receptive fields to handle the large variations in scale. “The size of the filter is one of the factors that determines its receptive field, which is the region of the image that the filter operates on its input,” says Surya. To augment the ability of the monolithic FC network to handle vehicles of various sizes, two smaller FC sub-networks with varied receptive fields are used. For better prediction accuracy, the predictions of the sub-networks are combined with those of the deeper monolithic network.
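How filter size and depth shape the receptive field can be worked out with simple arithmetic. The sketch below applies the standard recurrence for stacked convolution and pooling layers; the layer configurations are hypothetical and are not TraCount's actual architecture, which the article does not specify.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    field = 1  # a single input pixel to start with
    jump = 1   # distance in input pixels between neighbouring units
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


# Hypothetical configurations, for illustration only.
deep_network = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]  # conv, pool, conv, pool, conv
shallow_a = [(5, 1)]                                     # one 5x5 convolution
shallow_b = [(7, 1), (2, 2), (3, 1)]                     # 7x7 conv, pool, 3x3 conv

print(receptive_field(deep_network))  # 18
print(receptive_field(shallow_a))     # 5
print(receptive_field(shallow_b))     # 12
```

Under these assumed settings, each network sees a different-sized patch of the image, which is the property that lets a combination of networks cover both small, distant vehicles and large, nearby ones.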
The team tested TraCount on the TRANCOS dataset, which consists of 1,244 images of vehicular traffic. The Mean Absolute Error metric, a quantity used to measure how close forecasts or predictions are to the eventual outcomes, was used to evaluate the performance of the system. TraCount’s novel architecture reduced this error by more than 4% compared to the baseline architecture and was also an improvement over using a single deep monolithic FC network.
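The Mean Absolute Error is the average of the absolute differences between the true and predicted counts. A minimal sketch follows; the counts are made-up numbers for illustration, not results from TRANCOS or TraCount.

```python
def mean_absolute_error(true_counts, predicted_counts):
    """Average absolute difference between true and predicted counts."""
    if not true_counts or len(true_counts) != len(predicted_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_counts, predicted_counts))
    return total / len(true_counts)


# Made-up vehicle counts for four images (illustrative only).
true_counts = [30, 12, 45, 8]
predicted_counts = [27.5, 14.0, 41.0, 8.5]

mae = mean_absolute_error(true_counts, predicted_counts)
print(mae)  # 2.25
```

Here the predictions are off by 2.5, 2, 4 and 0.5 vehicles, so the model is wrong by 2.25 vehicles per image on average.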
Talking about future work to improve TraCount, Surya mentions the use of attention modules. Attention modules help the CNN focus on the relevant parts of an image. “Attention can help a network to look at a region in the image and its context and disambiguate background that is misleading. This disambiguation can potentially arise from having networks that give feedback and use attention to increase or decrease context,” he says.
With TraCount paving the way for better handling of traffic in metropolitan areas, we can probably look forward to a shorter and smoother commute to work and breathe easy, quite literally.
Lead visual: Angela Anthony Pereira
The ‘Science Language’ series is sourced from ResearchMatters.in, a portal that aims to make science accessible to mainstream audiences. The articles here may have been run past the researchers whose work is covered, as is common practice in science journals, to ensure accuracy.