Mini Review Volume 4 Issue 3
Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel
Correspondence: Assaf Livne, Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel
Received: October 20, 2017 | Published: May 23, 2018
Citation: Livne A, Baruch A, Guterman H. Thoughts on object detection using convolutional neural networks for forward-looking sonar. Int Rob Auto J. 2018;4(3):182-184. DOI: 10.15406/iratj.2018.04.00120
This work reviews the problem of object detection in Forward-Looking Sonar images. In the underwater realm, most imaging is done by acoustic means, i.e. sonar. Forward-Looking Sonar images usually have a very low Signal to Noise Ratio, so object detection in them is still an open problem. The article introduces our database and some conclusions gathered from working with it. It also presents results from a Convolutional Neural Network designed for Forward-Looking Sonar images.
Keywords: convolutional neural networks, forward-looking sonar, autonomous underwater vehicle, machine learning
Abbreviations: AUV, autonomous underwater vehicle; SNR, signal to noise ratio; FLS, forward-looking sonar; CNN, convolutional neural network; RPN, region proposal network; LAR, laboratory for autonomous robotics; FPS, frames per second
The Hydrocamel II Autonomous Underwater Vehicle (AUV) was developed and built at the Laboratory for Autonomous Robotics (LAR),1 at Ben-Gurion University. The AUV carries a Forward-Looking Sonar (FLS), which is used for tasks such as obstacle avoidance,2 target tracking, infrastructure inspection3 and navigation.4 In the underwater world it is crucial to detect and classify objects such as pipes, rocks, corals and mines. However, object detection and classification in sonar images is a difficult task due to the low Signal to Noise Ratio (SNR). Furthermore, due to the nature of the application, the image processing has to run in real-time on the AUV's available hardware. An FLS can capture high-resolution images of underwater scenes, but their interpretation is complex. Generic object detection in such images has not been solved; most of the published detectors are not robust to a variety of objects and run far from real-time.3,5–10 Since the 2012 ImageNet challenge, the use of classic image processing tools for classification and detection in regular color images has decreased dramatically, and the academic community now uses deep neural networks for those tasks. Detection and classification algorithms based on CNNs have produced the top-performing object detectors and classifiers in real-world color images.11 The main purpose of this article is to present findings gathered over the years about object detection using Convolutional Neural Networks (CNNs) for FLS images (Figure 1). Any system designed to detect objects in FLS images should meet the requirements of an embedded system that runs in real-time with low power consumption. The article also summarizes the results of a CNN designed for object detection in FLS images.
Classification and detection in FLS images
Over the past few years, underwater vehicles have greatly improved as a tool for undersea exploration. With the new generation of high-definition FLS, the detection problem can be addressed again: the new sensors provide better acoustic imagery at higher frame rates. However, the characteristics of the sonar data introduce their own difficulties into the object detection problem. There is a large body of literature on object classification in the marine domain; some approaches use template matching5–7 while others use engineered feature extractors8–10 with a trained classifier. Some researchers suggest that FLS image features should be based on shadows or highlights.3 In order to use template matching, one needs to define a template for each class, which indicates how the class should appear in an FLS image; a maximum likelihood function is then used to differentiate between the classes. These methods can reach close to 95% accuracy, but they are hard to generalize to a large number of classes. Among the engineered feature extractors, one can find cascades based on Haar features11 and even general-purpose descriptors such as SIFT.12 These methods are faster but less accurate. All of these techniques represent classical image processing. It has been shown that in regular camera images CNN concepts improve the results: using a CNN instead of classical image processing improved both the accuracy and the robustness of object detection.11
Object classification and detection in color images using CNN concepts
Since Krizhevsky's13 work it has become well established that object classification in images can be done with CNNs. While Krizhevsky13 was one of the first to demonstrate this in 2012, additional networks were later designed for the object detection task.14,15 In the 2016 ImageNet16 classification challenge the error was under 3%, which indicates the power of CNNs. Faster R-CNN17 combined a region proposal network (RPN) with a classification network to address the detection problem, and runs at about 7 Frames per Second (FPS). YOLO18 tackled the detection problem by increasing the FPS for real-time implementations: a single neural network predicts bounding boxes and class probabilities directly from the full image in one evaluation, reaching 45 FPS. CNN techniques have thus shown great results for object detection in regular camera images, both in accuracy and in runtime. Still, as discussed in the next sections, there is room for improvement in sonar imaging.
Valdenegro-Toro19–21 used CNNs for detection and localization in FLS images. In his work, an ARIS Explorer 3000 sonar was employed to capture a database of 2500 FLS images. The dataset, which contains different objects in a water tank, was used to train a small CNN for detection and classification. In Valdenegro-Toro's21 words: "This key finding signals that deep and convolutional neural networks are a clear direction for future research in sonar image processing". Juhwan et al.22,23 applied a CNN to FLS images to localize a remotely operated vehicle (ROV) linked to the AUV through a tether cable. Two sets of labeled images, ROV images and background images, were collected. Using the CNN they recognized the ROV and improved its localization, and with the same architecture as YOLO they managed to track the ROV at 5 FPS. Their conclusion was: "it shows that applying machine learning algorithms on processing sonar image is much more useful". These two examples show the potential of CNN concepts for FLS images, but neither of them provides a complete solution to the object detection task for AUVs.
The Hydrocamel II carries a BlueView M900x FLS. The sonar has a range of 100 meters and a field of view of 130 degrees. The sonar image is a matrix with a width of 900 pixels and a height of 896 pixels; this article only discusses horizontal FLS images. Each pixel can be transformed into an angle and a distance from the AUV, and from there into a Cartesian coordinate system (a minimal sketch of this conversion is given after Table 1). The data was taken from a series of experiments in the Red and Mediterranean Seas and was annotated manually. The database is built from 118,000 sonar images, in which 1,000 objects were tagged; 50 objects were metallic-like objects and the rest were large rocks or unknown objects (Table 1). A small number of objects per image is common, but it is a problem for training the network: because of the low ratio between true-labeled and false-labeled images, the training process easily converges to the False System, a system that returns false for every image. Controlling the ratio between true-labeled and false-labeled images prevents this problem. Another issue is that in raw FLS images the pixel-to-meter factor depends on the pixel position, meaning that objects farther from the AUV occupy a smaller pixel area than objects near the AUV. This can fool the CNN because all the objects are labeled with the same notation. One solution is to work in the Cartesian coordinate system, but then the borders of the image become a problem: in Cartesian coordinates, the image is a sector of a circle, as shown in Figure 2. The last issue that should be mentioned is that today there is no open FLS image database that can be used to compare different methods for object detection. An open dataset would speed up the development of perception algorithms in the AUV community.
Sonar videos | 1000
Sonar images | 118000
Image size | 900 × 896 pixels
Objects annotated | 1000
% of metallic objects | 5
% of large rocks | 95

Table 1 Sonar image database details
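To make the geometry above concrete, the following is a minimal sketch of the pixel-to-polar-to-Cartesian conversion for the sonar parameters quoted in the text (100 m range, 130 degree field of view, 900 × 896 pixel images). The assumption that rows index range and columns index bearing, and the exact direction conventions, depend on the sonar driver and are not specified in the original article.

```python
import numpy as np

# Sonar parameters taken from the text; the image layout below is an assumption.
IMG_WIDTH = 900      # columns, assumed to index bearing (beam angle)
IMG_HEIGHT = 896     # rows, assumed to index range (distance from the AUV)
MAX_RANGE_M = 100.0  # maximum sonar range in meters
FOV_DEG = 130.0      # horizontal field of view in degrees


def pixel_to_polar(row, col):
    """Map a raw image pixel to (range in meters, bearing in radians)."""
    rng = (row / (IMG_HEIGHT - 1)) * MAX_RANGE_M
    bearing = np.deg2rad((col / (IMG_WIDTH - 1) - 0.5) * FOV_DEG)
    return rng, bearing


def polar_to_cartesian(rng, bearing):
    """Map (range, bearing) to Cartesian coordinates (x forward, y to starboard)."""
    return rng * np.cos(bearing), rng * np.sin(bearing)


def pixel_cross_range_m(row):
    """Cross-range footprint of one pixel at a given row.

    The footprint grows linearly with range, which is why an object far from
    the AUV covers fewer pixels than the same object nearby.
    """
    rng, _ = pixel_to_polar(row, 0)
    return rng * np.deg2rad(FOV_DEG) / (IMG_WIDTH - 1)


if __name__ == "__main__":
    r, b = pixel_to_polar(row=448, col=450)   # mid-range pixel on the center beam
    print(r, np.rad2deg(b))                   # ~50 m, ~0 degrees
    print(polar_to_cartesian(r, b))           # ~(50, 0) meters
    print(pixel_cross_range_m(448))           # per-pixel footprint at ~50 m
```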
The basic CNN structure used in this article is composed of three convolutional layers and three fully connected layers, with a dropout layer and several pooling layers; the ReLU activation function is used between the layers. The first convolutional layer uses three-by-three kernels with 32 filters, the second uses three-by-three kernels with 64 filters, and the last uses five-by-five kernels with 128 filters. The design is described in Figure 3. The weights were initialized from a zero-mean normal distribution with a variance of 0.001, and the biases were initialized to the constant 0.01. For training and testing, the images were cropped into N smaller slices of size M×M; this was done to address the scale issue described in the previous section. Every slice that contains a labeled object received the true label and the rest received the false label. The data was kept in the radial coordinate system. Finally, for every true-labeled slice, two false-labeled slices were randomly added. Three datasets were trained and tested on the basic CNN structure; details are summarized in Table 2, and a sketch of the architecture follows the table. Training the basic CNN structure for 5,000 iterations with a batch size of 128 gave the results in Figure 4. The basic CNN structure converges to only one of two outcomes: the False System, which returns false for every input, or the True System, which returns true for every input. Changing the ratio between false-labeled and true-labeled images did not yield different results. These results indicate that the CNN could not learn the features of the objects; it may be that because of the low SNR the features cannot be observed in a single image.
Dataset name | M | # of train slices | # of test slices
Database20-r | 20 | 1163052 | 129228
Database50-r | 50 | 45441 | 5049
Database100-r | 100 | 10692 | 1188

Table 2 Summary of the different datasets
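The following is a minimal PyTorch sketch of the basic CNN structure described above: three convolutional layers (3×3 with 32 filters, 3×3 with 64 filters, 5×5 with 128 filters), pooling, dropout, three fully connected layers, ReLU activations, weights drawn from a zero-mean normal distribution with variance 0.001 and biases set to 0.01. The placement of the pooling layers, the dropout rate and the widths of the fully connected layers are assumptions, since the original text only specifies the kernel sizes, filter counts and initialization.

```python
import torch
import torch.nn as nn


class BasicSonarCNN(nn.Module):
    """Binary object / no-object classifier for M x M sonar slices."""

    def __init__(self, slice_size=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
        )
        flat = 128 * (slice_size // 4) ** 2  # spatial size after two 2x2 poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 2),  # true (object) / false (background)
        )
        self._init_weights()

    def _init_weights(self):
        # Zero-mean normal weights with variance 0.001 (std ~0.032), biases 0.01,
        # as described in the text.
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.normal_(m.weight, mean=0.0, std=0.001 ** 0.5)
                nn.init.constant_(m.bias, 0.01)

    def forward(self, x):
        return self.classifier(self.features(x))


# Example: a batch of 128 single-channel 50 x 50 slices, as in Database50-r.
model = BasicSonarCNN(slice_size=50)
logits = model(torch.randn(128, 1, 50, 50))
print(logits.shape)  # torch.Size([128, 2])
```

For Database50-r, 5,000 iterations with a batch size of 128 correspond to roughly 14 passes over the 45,441 training slices.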
This article discussed the object detection problem in FLS images. Some findings about the feature structure were described in detail, and a basic CNN structure was introduced and tested. The results show that much more work remains to be done. A deeper network should be considered, but power consumption could be a problem for AUVs. In contrast to regular camera images, the features in FLS images are harder to learn, and the basic CNN structure did not yield usable results. A recurrent network may be worth considering for this type of problem.
None.
The authors declare there is no conflict of interest.