Overview of Object Detection and Tracking based on Block Matching Techniques

Object tracking is one of the vital fields of computer vision; it follows a moving object through a video sequence. Object detection is used to detect the object present in the video and to find its exact location. Object tracking can be applied in various fields, including video surveillance, robot vision, traffic monitoring, automated civil or military surveillance systems, human-computer interaction, vehicle navigation, biomedical image analysis, medical imaging and many more. An object tracking algorithm must track the object in each frame of the video. A common approach is background subtraction, which eliminates the static background and leaves a foreground region showing the presence of the desired object. Block matching is the most popular technique for computing motion vectors between two frames of a video sequence, and different search techniques are available to compute these motion vectors. Still, there is scope for improvement in modifying or developing a new shape pattern for block matching motion estimation to find and track the object in the video. This paper presents several object detection and tracking methods and shows how block matching can be used to track an object in a video.


I. INTRODUCTION
Detection of an object is typically the first step in the tracking process. Tracking methods need an object detection mechanism, applied either in every frame or when the object first appears in the video. Object tracking is the process of locating the object of interest in a video sequence. The object is tracked by monitoring its motion in the video. A video is a sequential collection of image frames. Each frame can be divided into two sets of objects: foreground and background objects. The foreground objects are the moving objects, such as a bird, a car or a person, and the background consists of the static things. The complete process of object tracking can be divided into three steps, object detection, object classification and object tracking, as depicted in Figure 1.
Object detection: It is done to find the region of interest in the frames of the video. Various methodologies exist for object detection from a video sequence. Of the three major classes of moving object detection techniques, namely frame differencing, optical flow and background subtraction, the last is somewhat more robust than the others [1] [2].
Object classification: The detected object can then be classified into one of various classes of moving objects. The approaches to classifying moving objects are shape-based, motion-based, color-based and texture-based classification.
Object tracking: The aim is to generate the trajectory of an object by locating its position in every frame of the video. The approaches to tracking an object are point tracking, kernel tracking and silhouette tracking. Keeping track of a moving object is challenging, and many factors make it difficult [3]: noise in the image, complex object motion, and the loss of information caused by projecting the 3D world onto a 2D image, among others [3].
The paper is organized as follows. Section II gives an overview of object detection. Section III gives an overview of object tracking. Section IV describes related work on object tracking. Section V gives an overview of the block matching technique and elaborates on the different block matching techniques and the cost functions associated with them. Section VI describes related work on object tracking using block matching. Section VII discusses and summarizes the conclusions drawn from the existing approaches to object tracking and block matching. Finally, the paper ends with the future scope in Section VIII, followed by the references.

II. OVERVIEW OF OBJECT DETECTION
Detecting the object in the video is the first step in tracking it. Detection of a moving object of interest can be achieved by different existing techniques such as frame differencing, optical flow, background subtraction, segmentation and point detectors [3], as shown in Figure 2.
Frame differencing: The difference between two consecutive frames is computed to find the moving object. This calculation is simple and easy to implement.
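Frame differencing can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of integers; the function name `frame_difference` and the threshold of 25 are illustrative choices, not values taken from the surveyed papers.

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Return a binary motion mask: 1 where |curr - prev| exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Two tiny 4x4 grayscale frames; a bright 2x2 object appears in the second one.
prev_frame = [[10] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
for y in (1, 2):
    for x in (1, 2):
        curr_frame[y][x] = 200

mask = frame_difference(prev_frame, curr_frame)
for row in mask:
    print(row)
```

The mask marks only the pixels that changed between the two frames, so an object that stops moving disappears from it, which is one reason background subtraction is often preferred.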

Optical flow: This method [1] calculates the optical flow field of the image and clusters it according to the characteristics of the optical flow distribution. Optical flow provides complete information about the movement and can separate the moving object from the background better.
Background subtraction: It is achieved by building a background model and then finding the deviation of each incoming frame from it [3] [4]. A change with respect to the background model denotes the moving object. A background frame without any object is captured first; afterward, when a moving object enters, a second frame is captured [2]. Subtracting the background frame from the second frame gives the difference between the two frames, from which the position of the moving object can be obtained, as shown in Figure 3. Background subtraction techniques can be divided into two categories: recursive and non-recursive [1] [4]. Recursive techniques do not maintain a buffer for background estimation. Instead, they recursively update a single background model based on each input frame [1], and therefore require less storage. Non-recursive techniques, on the other hand, use a sliding-window approach for background estimation: a buffer of previous video frames is stored, and the background image is estimated from the variation of each pixel within the buffer. Non-recursive techniques are independent of the history beyond the frames stored in the buffer.
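The recursive variant can be sketched as follows. The running-average update is an assumed choice of recursive model, picked for illustration because it keeps a single background image; the names `update_background`, `foreground_mask` and `bounding_box` and the parameters `alpha` and `threshold` are hypothetical.

```python
def update_background(background, frame, alpha=0.05):
    """Recursive update: blend the new frame into the single background model."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that deviate from the background model by more than threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the foreground pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


background = [[20.0] * 6 for _ in range(5)]   # empty scene, 6 wide and 5 high
frame = [row[:] for row in background]
for y in (2, 3):
    for x in (3, 4):
        frame[y][x] = 180.0                   # a moving object enters

mask = foreground_mask(background, frame)
box = bounding_box(mask)                      # position of the moving object
background = update_background(background, frame)
```

A small `alpha` makes the model adapt slowly, so a briefly visible object barely changes the background, while gradual illumination changes are still absorbed over many frames.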
Fig. 3. Background Subtraction [18]
Segmentation: The aim of image segmentation is to divide the image into similar regions. The two major problems faced by segmentation are (i) the criteria for a good partition and (ii) a systematic way of achieving an efficient partition. Mean shift clustering, segmentation using graph cuts and active contours are different segmentation techniques [3].
Point detectors: They are used to detect interest points in images, that is, points with an expressive texture. These interest points are used in the context of motion, stereo and tracking [3].

III. OVERVIEW OF OBJECT TRACKING
Object tracking is locating the position of the object of interest in each frame of a sequence. A sequence of frames forms a video, and each frame undergoes the process of object tracking. Object extraction, recognition, tracking and decisions about the tracked object can all be part of object tracking. Object tracking can be categorized as point tracking, kernel tracking and silhouette tracking [1]. These three methods can be further divided into subtypes, as depicted in Figure 4.

Point tracking: The moving objects are represented by feature points during tracking. It requires detection in every frame, and the detected objects are associated as points across the frames. Point correspondence is a difficult problem, particularly in the presence of occlusions and false detections [3]. Point tracking is divided into deterministic and statistical methods [3].
(1) Deterministic method: This method uses qualitative motion heuristics to connect each object in the previous frame with a single object in the current frame. It assigns a cost to associating each object in frame (f-1) with a single object in frame (f), using a set of motion constraints [3].
(2) Statistical method: It is also called the probabilistic method. The position of an object in each frame is obtained with a detection mechanism, and the method takes the measurement and its uncertainties into account when establishing correspondence. Statistical methods thereby address tracking problems such as noise in the measurements and random perturbations of the object motion.
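The deterministic correspondence described in (1) can be sketched with a greedy nearest-neighbour assignment. This is only one possible realization: the Euclidean distance as the association cost, the `max_displacement` proximity constraint and the function name `match_points` are assumptions made for illustration, and a greedy assignment is not guaranteed to minimize the total cost.

```python
import math


def match_points(prev_points, curr_points, max_displacement=20.0):
    """Greedily pair each previous point with at most one current point.

    The cost is the Euclidean distance; pairs farther apart than
    max_displacement are rejected (a simple proximity constraint).
    Returns a dict {index_in_prev: index_in_curr}.
    """
    candidates = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev_points)
        for j, c in enumerate(curr_points)
    )
    matches, used_prev, used_curr = {}, set(), set()
    for cost, i, j in candidates:
        if cost > max_displacement:
            break  # all remaining pairs are even farther apart
        if i not in used_prev and j not in used_curr:
            matches[i] = j
            used_prev.add(i)
            used_curr.add(j)
    return matches


prev_points = [(10, 10), (50, 40)]               # objects in frame f-1
curr_points = [(52, 43), (12, 11), (200, 200)]   # last point is a false detection
matches = match_points(prev_points, curr_points)
print(matches)
```

The false detection at (200, 200) stays unmatched because it violates the proximity constraint for every object in the previous frame.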
Kernel tracking: It is performed by computing the motion of the object's region from frame to frame. Based on the object representation (shape and appearance) and the number of objects tracked, kernel tracking methods are divided into two subcategories: template-based models and multi-view appearance models.
(1) Template-based models: This method searches the image for the object template defined in the previous frame. Templates and density-based appearance models are commonly used because of their simplicity and low computational cost.
(2) Multi-view appearance models: This is a newer approach for objects that show different views in different frames of the video. Other methods have difficulty tracking an object across views, because their appearance model represents only the information gathered from the most recent observations of the object. The object appears different from different views, and if its view changes during tracking, such a model is no longer valid.
Silhouette-based tracking: Complex objects such as hands and fingers are difficult to describe by geometric shapes. This method provides shape descriptors for such objects and aims at finding the object region in each frame with the help of an object model generated from the previous frames. The model can take the form of the object edges, the object contour or a color histogram. Shape matching and contour tracking are the two main categories of silhouette tracking [3].
(1) Shape matching: It is somewhat similar to template-based tracking in the kernel approach. The object silhouette and its associated model are searched for in the current frame. In this approach, the search is done by checking the similarity of the object with the model generated from two successive frames.
(2) Contour tracking: This method iteratively evolves an initial contour in the previous frame to its new position in the current frame [1]. It requires that some part of the object in the current frame overlaps the object region in the previous frame. Contour tracking can be performed using two different approaches: (i) state space models that model the contour shape and its motion, and (ii) direct minimization of the contour energy using techniques such as gradient descent.

IV. RELATED WORK ON OBJECT TRACKING
Parekh et al. [1] have surveyed various methods for detecting and tracking an object in a video sequence. The detection approaches, which include frame differencing, background subtraction and optical flow, are briefly explained, and each has advantages and drawbacks. The moving objects can also be classified based on shape, color, texture and motion. The tracking approaches covered by the authors include point tracking, kernel-based tracking and silhouette-based tracking. Of these methods, background subtraction is the simplest and provides the most complete information about the object. A brief review of object tracking methods is presented by Yilmaz [3]. The tracking methods are divided into three categories based on object representation, namely point correspondence, primitive geometric models and contour evolution, and all of them require detecting the object at some point. The review then turns to object detection approaches, which include point detectors, background subtraction, segmentation and supervised learning. Issues related to object tracking, such as occlusion and views through multiple cameras, are also discussed.
Background subtraction is a widely used approach for detecting moving objects from static cameras [2]. Humans can easily detect the objects present in an image or video, but it is not so easy for a machine to do the same; this requires more intelligent machines. One way is to use the concept of the Steiner tree [5]. The background subtraction method is widely used to detect a moving object and is simple, accurate and takes little computational time [6,7]. Karasulu et al. [4] and Zhang and Ding [11] have discussed the background subtraction method, the mean-shift method, the mean-shift filtering method and temporal differencing for object detection. This is followed by the challenges that occur while tracking an object in a video, such as dynamic background, occlusion, illumination, shadows, the speed of the moving object and weather. Object tracking has many applications. One of them is presented by Singla [9], who explains motion detection using the frame differencing method. The objective of this approach is to detect the moving objects from the difference between the current frame and the reference frame. The frame difference method uses a pixel-based difference to find the moving object.
Another approach for object detection and tracking is introduced by Zhang and Ding [11], in which a median filter is first used to obtain the background image of the video and to remove noise from the video sequence. An adaptive background subtraction algorithm is then used for detecting and tracking the moving objects. Adaptive background updating is briefly described in that paper. A classification of tracking algorithms, along with the advantages and challenges of each method, is presented by Chau et al. [12]. The trackers are divided into three categories that together give a complete overview of tracking algorithms: point tracking, appearance tracking and silhouette tracking. Object detection and tracking can be applied to video surveillance, where the focus is on real-time object detection and tracking. The design of a video surveillance system is based on the automatic identification of events of interest. A video surveillance system includes three phases of processing: the extraction of moving objects, followed by object tracking and recognition [13].

V. OVERVIEW OF BLOCK MATCHING TECHNIQUE
A block matching algorithm (BMA) is a technique in which similar blocks in a sequence of video frames are located for the purpose of motion vector estimation. Motion estimation is the process of determining motion vectors from neighboring frames in a video sequence. The purpose of a block matching algorithm is to find, for a block in one frame, a matching block in some other frame. Block matching involves partitioning the current frame into a number of macro blocks and comparing each macro block with the corresponding block in a reference frame. A vector is created that maps the movement of a macro block from one location to another. These motion vectors give the displacement of the block. This displacement is used to exploit the temporal redundancy in the video sequence, which improves the chances of detecting motion. Block matching algorithms include three step search (TSS), new three step search (NTSS), simple and efficient search, four step search (FSS) and diamond search (DS), among others. Block matching algorithms use an evaluation metric to determine whether a given block in one frame matches the search block in another frame. The metric for matching one macro block with another is based on a minimum cost function criterion.
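The basic procedure can be sketched with an exhaustive (full) search, which tests every candidate displacement inside a search window. The sketch assumes grayscale frames as nested lists and uses the sum of absolute differences (SAD) as the cost; the function names, the block size of 4 and the search range of 2 are illustrative assumptions.

```python
def sad(current, reference, cx, cy, rx, ry, block):
    """Sum of absolute differences between two block x block macro blocks."""
    return sum(
        abs(current[cy + dy][cx + dx] - reference[ry + dy][rx + dx])
        for dy in range(block)
        for dx in range(block)
    )


def full_search(current, reference, cx, cy, block=4, search=2):
    """Exhaustively test every displacement within +/- search pixels.

    Returns ((dx, dy), cost), where (dx, dy) points from the current block
    at (cx, cy) to its best match in the reference frame.
    """
    height, width = len(reference), len(reference[0])
    best = ((0, 0), sad(current, reference, cx, cy, cx, cy, block))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - block and 0 <= ry <= height - block:
                cost = sad(current, reference, cx, cy, rx, ry, block)
                if cost < best[1]:
                    best = ((dx, dy), cost)
    return best


def make_frame(obj_x, obj_y, size=12, block=4):
    """Black frame containing one textured block x block object."""
    frame = [[0] * size for _ in range(size)]
    for dy in range(block):
        for dx in range(block):
            frame[obj_y + dy][obj_x + dx] = 100 + 10 * dy + dx
    return frame


reference = make_frame(3, 4)   # object position in the previous frame
current = make_frame(5, 5)     # object moved by (+2, +1) in the current frame
motion_vector, cost = full_search(current, reference, cx=5, cy=5)
print(motion_vector, cost)
```

Because the vector here points from the current block back to its match in the reference frame, it is the negative of the object's motion. The faster algorithms listed above differ only in which candidate displacements they test.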
Some of the most popular cost functions, which are also inexpensive to compute, are:

Mean difference or Mean Absolute Difference (MAD):
The mean absolute difference (MAD) is the average of the absolute differences between the corresponding pixels of two blocks. Mathematically, it is given by:

$$\mathrm{MAD} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right|$$

The mean squared error (MSE) is the average of the squares of the errors and is given by:

$$\mathrm{MSE} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( C_{ij} - R_{ij} \right)^2$$

In these equations, N is the side length of the macro block, and C_{ij} and R_{ij} are the pixels being compared in the current macro block and the reference macro block, respectively.
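As a small worked example, the two cost functions can be computed for a pair of 2x2 blocks. The function names `mad` and `mse` and the pixel values are made up for illustration; the blocks are assumed to be square nested lists of equal size.

```python
def mad(current_block, reference_block):
    """Mean absolute difference between two N x N blocks."""
    n = len(current_block)
    total = sum(
        abs(c - r)
        for c_row, r_row in zip(current_block, reference_block)
        for c, r in zip(c_row, r_row)
    )
    return total / (n * n)


def mse(current_block, reference_block):
    """Mean squared error between two N x N blocks."""
    n = len(current_block)
    total = sum(
        (c - r) ** 2
        for c_row, r_row in zip(current_block, reference_block)
        for c, r in zip(c_row, r_row)
    )
    return total / (n * n)


current_block = [[10, 20], [30, 40]]
reference_block = [[12, 20], [27, 44]]

# Pixel differences are 2, 0, 3 and 4.
print(mad(current_block, reference_block))  # (2 + 0 + 3 + 4) / 4 = 2.25
print(mse(current_block, reference_block))  # (4 + 0 + 9 + 16) / 4 = 7.25
```

MSE penalizes the single large difference more strongly than MAD does, at the price of one multiplication per pixel.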

Peak signal-to-noise ratio (PSNR):
The motion-compensated image, created from the motion vectors and the macro blocks of the reference frame, is characterized by the peak signal-to-noise ratio (PSNR), given by:

$$\mathrm{PSNR} = 10 \log_{10} \frac{(\text{peak-to-peak value of original data})^2}{\mathrm{MSE}}$$

To track the object in the video sequence, background subtraction is performed as shown in Figure 3. If the moving object is present in both adjacent frames, the tracking area will be overestimated, as shown in Figure 5, which illustrates the redundancy generated in the video sequence. To overcome this redundancy, a block matching algorithm is used, in which motion estimation is applied to adjust the size of the tracking area. The basic idea of the BMA is to divide the current frame into small blocks of equal size. For each block, we find the block within the search area of the previous frame that best matches the current block. The matched block from the previous frame is selected as the motion source of the current block, and the relative position of these two blocks gives the motion vector (MV). When all the motion vectors of the tracking area have been computed, the most frequently occurring motion vector is selected for correcting the tracking area, as shown in Figure 6.
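The final correction step can be sketched as follows. Selecting the most frequent motion vector follows the description above, but the trimming rule in `shrink_tracking_area` is a hypothetical interpretation of "adjusting the tracking area size": it assumes the overestimated box covers the object in both frames and that each motion vector is the object's displacement from the previous to the current frame.

```python
from collections import Counter


def dominant_motion_vector(motion_vectors):
    """Return the motion vector that occurs most often among the blocks."""
    return Counter(motion_vectors).most_common(1)[0][0]


def shrink_tracking_area(box, motion_vector):
    """Trim an overestimated (x_min, y_min, x_max, y_max) box.

    Assumption: the box spans the object's old and new positions, so the
    side the object has left behind is cut by the displacement.
    """
    x_min, y_min, x_max, y_max = box
    dx, dy = motion_vector
    if dx > 0:
        x_min += dx      # object moved right: drop the vacated left columns
    else:
        x_max += dx      # object moved left (or not at all)
    if dy > 0:
        y_min += dy      # object moved down: drop the vacated top rows
    else:
        y_max += dy
    return (x_min, y_min, x_max, y_max)


# Motion vectors of the blocks inside the tracking area (two are outliers).
block_vectors = [(2, 1), (2, 1), (0, 0), (2, 1), (1, 1)]
dominant = dominant_motion_vector(block_vectors)

# The object occupied x 3..6, y 4..7 and now occupies x 5..8, y 5..8,
# so the overestimated area is the union of both positions.
overestimated = (3, 4, 8, 8)
corrected = shrink_tracking_area(overestimated, dominant)
print(dominant, corrected)
```

Taking the most frequent vector rather than the mean keeps a few mismatched blocks from distorting the correction.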

VI. RELATED WORK ON OBJECT TRACKING USING BLOCK MATCHING ALGORITHM
Block matching is a standard technique for encoding motion in video compression algorithms, and Gyaourova et al. [15] explore its abilities when applied to object tracking. Their experiment had two goals: (i) exploring the performance of the motion estimation algorithm and (ii) improving the motion estimation and detection performance by using different block matching algorithms (BMA) to obtain good object tracking results. A comparative study of block matching is given by Hussain and Haque [17]. Six different block matching algorithms are discussed, implemented and compared, ranging from the basic Exhaustive Search to recent fast adaptive algorithms such as Adaptive Rood Pattern Search [16,17].
In [18], various block matching algorithms are explained, full search and three step search motion estimation are implemented, and the two algorithms are compared on the basis of computational complexity and PSNR. The results show that the computational complexity of three step search is 10 times lower than that of the full search algorithm [18]. A comparison among fast block matching algorithms (FBMAs) for motion estimation and object tracking is derived from the experiments of Sherie et al. [20]. A fast BMA is developed and compared with other BMAs, and it is found to be the most efficient among them. Similarities are found between the results for motion estimation and for object tracking on standard test data sets. Another object tracking technique based on block matching is presented in [21]. Its aim is to improve tracking accuracy while keeping the tracking process fast. The technique is based on finding motion vectors, and the modified block matching algorithm considers only the area of motion, which reduces the computational cost [21]. Based on the various methods and categories available for object detection and tracking, Sugandi et al. [22] have designed an object tracking methodology that uses region-based tracking. Region-based tracking algorithms track objects based on the changes in the image regions that correspond to the moving objects. Regions with motion are detected by subtracting the background from the current image [22].
The diamond search algorithm and its modified versions for motion estimation use a small diamond shape pattern (SDSP) and a large diamond shape pattern (LDSP) and are applied in video processing [23,24,25]. Diamond search is the basic algorithm, which has been extended into cross-diamond search and novel cross-diamond search. All these algorithms differ in their search points. The fewer the search points, the more efficient the algorithm. Block matching motion estimation is one of the most popular and efficient techniques for removing the temporal redundancy between two frames: the current frame is divided into blocks, and for each block the best-matched block is searched for in an available previous frame. The motion vector (MV) is the displacement between the candidate block and its best match in the previous frame. Motion estimation is a technique [30] that tries to minimize the temporal redundancy between the successive frames of a video. It is computationally very expensive and accounts for about 75% of the computational cost of the overall video encoding process. The three-step search (TSS) and its improved variants for block matching are defined in terms of the search points required [32].
Block matching algorithms are popular for their simplicity and effectiveness. They are widely adopted for motion analysis of objects, for object tracking and for video compression and processing [28] [34]. Block matching motion estimation is used for video compression and fractal coding [33,34]. Video compression standards are used for video coding, and motion estimation and compensation are employed to minimize the temporal redundancy between the frames of the video. Motion estimation means finding the motion vectors of an object in an image. Data compression is achieved on sequential images with the help of the information about the moving object. Several techniques are available to estimate the motion between two frames. Motion is an important feature in tracking an object in a video. Motion estimation involves many challenges, such as computational complexity. The time complexity can be reduced by using smaller block sizes and by computing the motion vectors in parallel [35,36,37]. These approaches may help to speed up the process of finding the motion vectors.
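How a fast search reduces the number of search points can be sketched with the three step search. The version below is a simplified textbook form that assumes step sizes of 4, 2 and 1, a SAD cost and a toy frame with one uniform object. With these steps it evaluates at most 25 candidates, whereas an exhaustive search over the same range of +/- 7 pixels evaluates 225, which is in line with the saving reported in [18]. All names and parameters are illustrative.

```python
def sad(current, reference, cx, cy, rx, ry, block):
    """Sum of absolute differences between two block x block macro blocks."""
    return sum(
        abs(current[cy + dy][cx + dx] - reference[ry + dy][rx + dx])
        for dy in range(block)
        for dx in range(block)
    )


def three_step_search(current, reference, cx, cy, block=4, step=4):
    """Test 8 neighbours around the best point so far, then halve the step.

    Returns ((dx, dy), cost, checked), where checked counts evaluated points.
    """
    height, width = len(reference), len(reference[0])
    best_dx, best_dy = 0, 0
    best_cost = sad(current, reference, cx, cy, cx, cy, block)
    checked = 1
    while step >= 1:
        centre_dx, centre_dy = best_dx, best_dy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # the centre was already evaluated
                dx, dy = centre_dx + ox, centre_dy + oy
                rx, ry = cx + dx, cy + dy
                if not (0 <= rx <= width - block and 0 <= ry <= height - block):
                    continue  # candidate block would leave the frame
                cost = sad(current, reference, cx, cy, rx, ry, block)
                checked += 1
                if cost < best_cost:
                    best_dx, best_dy, best_cost = dx, dy, cost
        step //= 2
    return (best_dx, best_dy), best_cost, checked


def make_frame(obj_x, obj_y, size=24, block=4, value=200):
    """Black frame containing one uniform block x block object."""
    frame = [[0] * size for _ in range(size)]
    for y in range(obj_y, obj_y + block):
        for x in range(obj_x, obj_x + block):
            frame[y][x] = value
    return frame


current = make_frame(10, 10)    # object position in the current frame
reference = make_frame(16, 7)   # object position in the reference frame
mv, cost, checked = three_step_search(current, reference, cx=10, cy=10)
print(mv, cost, checked)
```

The search is a heuristic: it follows the locally best candidate at each step, so on frames whose matching cost has several local minima it can miss the match that a full search would find.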

Fig. 2. Object Detection Techniques
The displacement of the objects is the only fundamental source of information. A detailed explanation of the various methods is as follows: