Mini Review Volume 4 Issue 3
Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel
Correspondence: Assaf Livne, Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel
Received: October 20, 2017 | Published: May 23, 2018
Citation: Livne A, Baruch A, Guterman H. Thoughts on object detection using convolutional neural networks for forward-looking sonar. Int Rob Auto J. 2018;4(3):182-184. DOI: 10.15406/iratj.2018.04.00120
This work reviews the problem of object detection in Forward-Looking Sonar images. In the underwater realm, most imaging is done by acoustic means, i.e. sonar. Forward-Looking Sonar images usually have a very low Signal to Noise Ratio, so object detection in them is still an open problem. The article introduces our database and some conclusions gathered from working with it. It also presents results from a Convolutional Neural Network designed for Forward-Looking Sonar images.
Keywords: convolutional neural networks, forward-looking sonar, autonomous underwater vehicle, machine learning
Abbreviations: AUV, autonomous underwater vehicle; SNR, signal to noise ratio; FLS, forward-looking sonar; CNN, convolutional neural network; RPN, region proposal network; LAR, laboratory for autonomous robotics; FPS, frames per second
The Hydrocamel II Autonomous Underwater Vehicle (AUV) was developed and built at the Laboratory for Autonomous Robotics (LAR),1 at Ben-Gurion University. The AUV carries a Forward-Looking Sonar (FLS), which is used for tasks such as obstacle avoidance,2 target tracking, infrastructure inspection3 and navigation.4 In the underwater world it is crucial to detect and classify objects such as pipes, rocks, corals and mines. However, object detection and classification in sonar images is a difficult task due to the low Signal to Noise Ratio (SNR). Furthermore, due to the nature of the application, the image processing has to run in real-time on the AUV's available hardware. An FLS can capture high-resolution images of underwater scenes, but their interpretation is complex. Generic object detection in such images has not been solved; most of the published detectors are not robust to a variety of objects and run far from real-time.3,5–10 Since the 2012 ImageNet challenge, the use of classic image processing tools for classification and detection in regular color images has decreased dramatically, and the academic community now uses deep neural networks for those tasks. Detection and classification algorithms based on CNNs have produced the top-performing object detectors and classifiers in real-world color images.11 The main purpose of this article is to present findings gathered over the years about object detection using Convolutional Neural Networks (CNNs) for FLS images (Figure 1). Any system designed to detect objects in FLS images should meet the requirements of an embedded system that runs in real-time with low power consumption. The article also summarizes the results of a CNN designed for object detection in FLS images.
Classification and detection in FLS images
Over the past few years, underwater vehicles have greatly improved as a tool for undersea exploration. With the new generation of high-definition FLS, the detection problem can be addressed again: the new sensors provide better acoustic imagery at higher frame rates. However, the characteristics of the sonar data introduce their own difficulties into the object detection problem. There is a large body of literature on object classification in the marine domain; some approaches use template matching5–7 while others use engineered feature extractors8–10 with a trained classifier. Some researchers suggest that FLS image features should be based on shadows or highlights.3 In order to use template matching, one needs to define a template for each class, which indicates how the class should appear in an FLS image; a maximum likelihood function is then used to differentiate between the classes. These methods can reach close to 95% accuracy, but they are hard to generalize to a large number of classes. Among the engineered feature extractors, one can find cascades based on Haar features11 and even general-purpose descriptors such as SIFT.12 These methods are faster but less accurate. All of these techniques represent classical image processing. It has been shown that in regular camera images CNN concepts improve the results: using a CNN instead of classical image processing improved both the accuracy and the robustness of object detection.11
Object classification and detection in color images using CNN concepts
Since Krizhevsky's13 work it has become well established that object classification in images can be done with CNNs. While Krizhevsky13 was one of the first to demonstrate this in 2012, additional networks were later designed for the object detection task.14,15 In the 2016 ImageNet16 classification challenge the error was under 3%, which indicates the power of CNNs. Faster R-CNN17 combined a region proposal network (RPN) with a classification network to address the detection problem, and runs at about 7 Frames per Second (FPS). YOLO18 tackled the detection problem by increasing the FPS for real-time implementations: a single neural network predicts bounding boxes and class probabilities directly from the full image in one evaluation, reaching 45 FPS. CNN techniques have thus shown great results for object detection in regular camera images, both in accuracy and in runtime. Still, as discussed in the next sections, there is room for improvement in sonar imaging.
Valdenegro-Toro19–21 used CNNs for detection and localization in FLS images. In his work, an ARIS Explorer 3000 sonar was employed to capture a database of 2500 FLS images. The dataset, which contains different objects in a water tank, was used to train a small CNN for detection and classification. In Valdenegro-Toro's21 words: "This key finding signals that deep and convolutional neural networks are a clear direction for future research in sonar image processing". Juhwan et al.22,23 applied a CNN to FLS images to localize a remotely operated vehicle (ROV) linked to the AUV through a tether cable. Two sets of labeled images, ROV images and background images, were collected. Using the CNN they recognized the ROV and improved its localization, and with the same architecture as YOLO they managed to track the ROV at 5 FPS. Their conclusion was: "it shows that applying machine learning algorithms on processing sonar image is much more useful". These two examples show the potential of CNN concepts for FLS images, but neither of them provides a complete solution to the object detection task for AUVs.
The Hydrocamel II carries a BlueView M900x FLS. The sonar has a range of 100 meters and a field of view of 130 degrees. The sonar image is a matrix with a width of 900 pixels and a height of 896 pixels; this article only discusses horizontal FLS images. Each pixel can be transformed into an angle and a distance from the AUV, and from there into a Cartesian coordinate system (a minimal sketch of this conversion is given after Table 1). The data was taken from a series of experiments in the Red and Mediterranean Seas and was annotated manually. The database is built from 118,000 sonar images, in which 1,000 objects were tagged; 50 objects were metallic-like objects and the rest were large rocks or unknown objects (Table 1). A small number of objects per image is common, but it is a problem for training the network: because of the low ratio between true-labeled and false-labeled images, the training process easily converges to the False System, a system that returns false for every image. Controlling the ratio between true-labeled and false-labeled images prevents this problem. Another issue is that in raw FLS images the pixel-to-meter factor depends on the pixel position, meaning that objects farther from the AUV occupy a smaller pixel area than objects near the AUV. This can fool the CNN because all the objects are labeled with the same notation. One solution is to work in the Cartesian coordinate system, but then the borders of the image become a problem: in Cartesian coordinates, the image is a sector of a circle, as shown in Figure 2. The last issue that should be mentioned is that today there is no open FLS image database that can be used to compare different methods for object detection. An open dataset would speed up the development of perception algorithms in the AUV community.
Sonar videos | 1000
Sonar images | 118000
Image size | 900 × 896 pixels
Objects annotated | 1000
% of metallic objects | 5
% of large rocks | 95

Table 1 Sonar image database details
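To make the geometry above concrete, the following is a minimal sketch of the pixel-to-polar-to-Cartesian conversion for the sonar parameters quoted in the text (100 m range, 130 degree field of view, 900 × 896 pixel images). The assumption that rows index range and columns index bearing, and the exact direction conventions, depend on the sonar driver and are not specified in the original article.

```python
import numpy as np

# Sonar parameters taken from the text; the image layout below is an assumption.
IMG_WIDTH = 900      # columns, assumed to index bearing (beam angle)
IMG_HEIGHT = 896     # rows, assumed to index range (distance from the AUV)
MAX_RANGE_M = 100.0  # maximum sonar range in meters
FOV_DEG = 130.0      # horizontal field of view in degrees


def pixel_to_polar(row, col):
    """Map a raw image pixel to (range in meters, bearing in radians)."""
    rng = (row / (IMG_HEIGHT - 1)) * MAX_RANGE_M
    bearing = np.deg2rad((col / (IMG_WIDTH - 1) - 0.5) * FOV_DEG)
    return rng, bearing


def polar_to_cartesian(rng, bearing):
    """Map (range, bearing) to Cartesian coordinates (x forward, y to starboard)."""
    return rng * np.cos(bearing), rng * np.sin(bearing)


def pixel_cross_range_m(row):
    """Cross-range footprint of one pixel at a given row.

    The footprint grows linearly with range, which is why an object far from
    the AUV covers fewer pixels than the same object nearby.
    """
    rng, _ = pixel_to_polar(row, 0)
    return rng * np.deg2rad(FOV_DEG) / (IMG_WIDTH - 1)


if __name__ == "__main__":
    r, b = pixel_to_polar(row=448, col=450)   # mid-range pixel on the center beam
    print(r, np.rad2deg(b))                   # ~50 m, ~0 degrees
    print(polar_to_cartesian(r, b))           # ~(50, 0) meters
    print(pixel_cross_range_m(448))           # per-pixel footprint at ~50 m
```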
The basic CNN structure used in this article is composed of three convolutional layers and three fully connected layers, with a dropout layer and several pooling layers; the ReLU activation function is used between the layers. The first convolutional layer uses three-by-three kernels with 32 filters, the second uses three-by-three kernels with 64 filters, and the last uses five-by-five kernels with 128 filters. The design is described in Figure 3. The weights were initialized from a zero-mean normal distribution with a variance of 0.001, and the biases were initialized to the constant 0.01. For training and testing, the images were cropped into N smaller slices of size M×M; this was done to address the scale issue described in the previous section. Every slice that contains a labeled object received the true label and the rest received the false label. The data was kept in the radial coordinate system. Finally, for every true-labeled slice, two false-labeled slices were randomly added. Three datasets were trained and tested on the basic CNN structure; details are summarized in Table 2, and a sketch of the architecture follows the table. Training the basic CNN structure for 5,000 iterations with a batch size of 128 gave the results in Figure 4. The basic CNN structure converges to only one of two outcomes: the False System, which returns false for every input, or the True System, which returns true for every input. Changing the ratio between false-labeled and true-labeled images did not yield different results. These results indicate that the CNN could not learn the features of the objects; it may be that because of the low SNR the features cannot be observed in a single image.
Dataset name | M | # of train slices | # of test slices
Database20-r | 20 | 1163052 | 129228
Database50-r | 50 | 45441 | 5049
Database100-r | 100 | 10692 | 1188

Table 2 Summary of the different datasets
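The following is a minimal PyTorch sketch of the basic CNN structure described above: three convolutional layers (3×3 with 32 filters, 3×3 with 64 filters, 5×5 with 128 filters), pooling, dropout, three fully connected layers, ReLU activations, weights drawn from a zero-mean normal distribution with variance 0.001 and biases set to 0.01. The placement of the pooling layers, the dropout rate and the widths of the fully connected layers are assumptions, since the original text only specifies the kernel sizes, filter counts and initialization.

```python
import torch
import torch.nn as nn


class BasicSonarCNN(nn.Module):
    """Binary object / no-object classifier for M x M sonar slices."""

    def __init__(self, slice_size=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
        )
        flat = 128 * (slice_size // 4) ** 2  # spatial size after two 2x2 poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 2),  # true (object) / false (background)
        )
        self._init_weights()

    def _init_weights(self):
        # Zero-mean normal weights with variance 0.001 (std ~0.032), biases 0.01,
        # as described in the text.
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.normal_(m.weight, mean=0.0, std=0.001 ** 0.5)
                nn.init.constant_(m.bias, 0.01)

    def forward(self, x):
        return self.classifier(self.features(x))


# Example: a batch of 128 single-channel 50 x 50 slices, as in Database50-r.
model = BasicSonarCNN(slice_size=50)
logits = model(torch.randn(128, 1, 50, 50))
print(logits.shape)  # torch.Size([128, 2])
```

For Database50-r, 5,000 iterations with a batch size of 128 correspond to roughly 14 passes over the 45,441 training slices.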
This article discussed the object detection problem in FLS images. Some findings about the feature structure were described in detail, and a basic CNN structure was introduced and tested. The results show that much more work remains to be done. A deeper network should be considered, but power consumption could be a problem for AUVs. In contrast to regular camera images, the features in FLS images are harder to learn, and the basic CNN structure did not yield usable results. A recurrent network may be worth considering for this type of problem.
None.
The authors declare there is no conflict of interest.