Detection and dimension of moving objects using single camera applied to the round timber measurement

The paper addresses the problem of automatic geometry evaluation of logs moving along a conveyor. A video sequence obtained from a single camera is used as the input data. The principal restrictions on the target objects and the requirements for video recording of the manufacturing process are formulated on the basis of datasets of more than 0.5M video images. The authors' method for video sequence segmentation with respect to log tracking is presented. The algorithm is based on a combination of background subtraction techniques and probabilistic methods. The next part of the paper is devoted to log geometry estimation methods. The authors' algorithm for recovering the log geometry is based on the detection, isolation and approximation of log boundaries. The results of the research have been implemented in a conveyor-tracking system for automatic log sorting.


I. INTRODUCTION
The topical problem of determining solid body geometry with machine vision techniques is connected with the development of fast and precise methods for measuring object form and dimensions from two-dimensional images. The peculiarity of the given task is the measurement of log volume as the logs pass along the conveyor. The input data for the measurement algorithm is a digitized video sequence obtained from a camera mounted over the conveyor. It should be mentioned that this problem can be solved rather successfully by 3D scanning with an expensive laser scanner and dedicated methods for processing its output data [1]. This paper presents another approach, which is less expensive in terms of the required technical equipment: data on the objects of interest is obtained from one video camera (Fig. 1).
Log geometry determination is a complex task. On the one hand, it involves the development of mathematical algorithms for video processing that can adequately represent, in real time, the processes related to the observed objects. This group includes segmentation, detection and tracking methods. On the other hand, it is necessary to investigate methods for geometry estimation and 3D structure recovery of the object of interest. Implementation of 3D structure recovery is the principal requirement for the successful development of a machine vision system for automatic round timber sorting.
This paper presents a log detection algorithm that develops the previously suggested approach based on a combination of background subtraction and probabilistic methods. The filtering of false positives at the level of pixels and of regions of connected pixels is presented. A method of log video tracking is considered, in which several observed logs are detected and tracked efficiently by predicting the object position in consecutive frames. Finally, a method for finding and approximating object boundaries to restore the geometry of logs is given.
The paper is structured as follows. Related works are analyzed and discussed in Part 2. Part 3 gives an overview of the authors' method for segmentation, detection and isolation of the geometric features of logs. The results of the experiments and their discussion are given in Part 4. Part 5 presents the conclusions of the research.

II. RELATED WORK
The first stage of image sequence processing is the isolation of moving objects in the scene from the background. The well-known methods performing this operation can be roughly divided into three main groups: background subtraction methods [2,3], probabilistic methods [4][5][6][12][13][14] and frame difference methods [11,19]. Each group has its own advantages and disadvantages, so it is necessary to select the method or combination of methods suited to the given task in order to achieve the optimum efficiency of the system. The specifics of isolating logs passing along the conveyor are the following: strict constraints on algorithm speed (real-time mode); a dynamically changing background (due to the moving parts of the conveyor); low contrast of the scene; and probable overlap of the objects of interest, which hinders their separation.
The next stage is the determination of the direction and velocity of the objects of interest. The techniques considered in the research are the cross-correlation function, phase correlation and the Lucas-Kanade method [10,11,19]. These methods are widely used for motion analysis in real-time surveillance and control systems.
A large number of methods have been developed for recovering the analyzed scene. They permit estimation of the properties of 3D objects from 2D projections with sufficient precision, depending on the restrictions on the objects in the scene and on the recording conditions. The surveyed methods can be divided into several groups according to the source of data on the analyzed scene, that is, the features that permit structure recovery: motion [11,16], texture and silhouettes [26], contours of the objects of interest [15] or scene luminosity [19,25]. Structure recovery from motion [18] involves searching the image for key-point regions in the form of corners or blobs [16,17], determining the correspondence between the detected regions, computing their location and forming the surface of the objects. The approach based on form determination from scene luminosity data is presented in [25]. It determines the surface form by calculating the relation between the intensity (luminosity) of a surface element and the direction of the surface normal according to Lambert's cosine law. The Lambertian reflectance method relates light source power, surface albedo and the distances between surface, sensor and light sources; it can be successfully implemented in tasks where these parameters are known a priori or determined by a calibration procedure.
Analysis of the video sequence of the given technological process shows that many image features suited for making hypotheses about the geometry and dimensions of an object cannot, in general, be used in the given task, since they are subject to many factors, such as luminosity, form distortion, reflectivity of the object surface, etc. For example, the surface of a log can be textured or machined, which influences its reflectivity. This restriction does not permit the use of motion-based or luminosity-based methods for object form recovery in the given task. Thus the surface recovery method based on the silhouettes of the object [15,26] was selected.

A. Image segmentation and object detection
Literature data and analysis of the log movement video sequences show that background subtraction and statistics-based methods are the most appropriate for log segmentation. The former group of methods extracts foreground objects by subtracting a pattern called the background model from the current frame of the video sequence, thus forming the subtractive image. The subtractive image of two images can be defined as follows:

$$D(i,j) = \begin{cases} 1, & |I(i,j) - B(i,j)| > p, \\ 0, & \text{otherwise}, \end{cases}$$

where p is a preset threshold, D(i,j) is the subtractive (binary) image, I(i,j) is the video frame and B(i,j) is the background model at each pixel (i,j) of the images.
To account for background changes, the background model has to be periodically estimated and updated. For this purpose the Gaussian smoothing method is used, which sequentially calculates the deviation of each frame pixel from the pixel value of the periodically updated background model. Each pixel of the background model is assumed to be described by an expectation value $\mu(i,j)$ and a dispersion $\sigma^2(i,j)$. Random processes can be described by the Gaussian distribution; however, the expectation value and dispersion can be determined without knowledge of the probability distribution law by averaging a finite number of measurements:

$$\mu(i,j) = \frac{1}{n}\sum_{t=1}^{n} I_t(i,j), \qquad \sigma^2(i,j) = \frac{1}{n}\sum_{t=1}^{n}\bigl(I_t(i,j) - \mu(i,j)\bigr)^2,$$

where $I_t(i,j)$ is the random process for pixel (i,j) at the instant t.
In this way the background model is initialized during the first n frames, so the expectation value and mean square deviation are calculated over n frames. A pixel is assigned to a foreground object when the difference between the mean square deviation of the background pixel and the deviation of the current pixel exceeds the threshold p. To account for scene changes, the background is updated with an infinite impulse response filter, where the parameter α defines the sensitivity of the background model to changes in external conditions. The selection of the optimal threshold p and parameter α is considered in Part 4 of this paper.
Thus the segmentation algorithm implements the following procedure for background and foreground separation (Fig. 2): preliminary formation of the background model; updating of the background model in real-time mode; log isolation at the pixel level.
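The three steps of this procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `init_background` and `segment_and_update` are hypothetical, the foreground test is the absolute-difference threshold used for the subtractive image, and the exponential running update of the expectation and dispersion is one common form of an infinite impulse response filter, assumed here because the exact update equations are not reproduced in the text. The defaults for the threshold and the sensitivity parameter (called `alpha` in the code) are the values selected in Part 4.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def init_background(frames: List[Frame]) -> Tuple[Frame, Frame]:
    """Estimate the per-pixel expectation and dispersion over the first n frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[i][j] for f in frames) / n for j in range(width)]
            for i in range(height)]
    var = [[sum((f[i][j] - mean[i][j]) ** 2 for f in frames) / n
            for j in range(width)] for i in range(height)]
    return mean, var


def segment_and_update(frame: Frame, mean: Frame, var: Frame,
                       p: float = 8.0, alpha: float = 0.004) -> Frame:
    """Return the binary foreground mask and update the model in place."""
    mask = []
    for i, row in enumerate(frame):
        mask_row = []
        for j, value in enumerate(row):
            diff = value - mean[i][j]
            # Pixel-level isolation: threshold the absolute difference.
            mask_row.append(1 if abs(diff) > p else 0)
            # Assumed IIR update: exponential running mean and dispersion.
            mean[i][j] = (1.0 - alpha) * mean[i][j] + alpha * value
            var[i][j] = (1.0 - alpha) * var[i][j] + alpha * diff * diff
        mask.append(mask_row)
    return mask
```

A small `alpha` makes the model adapt slowly, so a log that stays in view for a few frames is not absorbed into the background, while gradual lighting changes are still followed.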
The next stage is log detection. Noise can be removed from the obtained foreground image by fast and simple morphological operations such as dilation and erosion [11]. The remaining connected components are then combined into blobs and the minimal bounding rectangle is calculated [21] for each region; in doing so, small regions are excluded from consideration. After the foreground objects have been isolated, they should be matched with the objects in the previous frame. At this stage the problem of log tracking across sequential video frames has to be solved. It can be reduced to the assignment problem if the matching of a pair of contiguous frames is formulated as an optimization problem with an objective function whose minimum provides the best matching. The assignment problem can be solved with the tools of combinatorial optimization [22]. In general this problem is stated as follows: let two sets U and V of the same size and a cost function C be given. It is necessary to assign each element of one set to exactly one element of the other in such a manner that the cost function is minimal.
In the context of the given task the sum of the Euclidean distances between the log images in two contiguous frames is to be minimized. Hence the output of the algorithm, in terms of a bipartite graph, is a list of edges of the minimum-weight matching directed from U to V. Parameters such as the shape similarity and location of blobs, as well as the dimensions and location of their bounding rectangles, are used as metrics in the given task.
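The assignment formulation can be illustrated with a small brute-force sketch. It assumes that each blob is reduced to a single 2D point (for example its centroid) and that the cost is the plain sum of Euclidean distances; the function name `match_blobs` is hypothetical, and the shape and bounding-rectangle metrics mentioned above are not modeled here.

```python
from itertools import permutations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def match_blobs(prev: Sequence[Point],
                curr: Sequence[Point]) -> Tuple[List[Tuple[int, int]], float]:
    """Match blobs of two frames by minimizing the summed Euclidean distance.

    Returns (pairs, cost), where each pair is (index in prev, index in curr).
    Every permutation is tried, so this is only practical for a few objects.
    """
    if len(prev) != len(curr):
        raise ValueError("both sets must have the same size")
    best_cost = float("inf")
    best_perm: Tuple[int, ...] = ()
    for perm in permutations(range(len(curr))):
        cost = sum(dist(prev[i], curr[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost
```

Exhaustive search grows factorially with the number of objects. For more than a handful of logs a polynomial-time solver such as the Hungarian algorithm (available, for instance, as `scipy.optimize.linear_sum_assignment`) would be the usual choice.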
Two common cases are possible during object matching:
1. A one-to-one correspondence between the objects in the current and previous frames is established.
2. A full or partial correspondence between the objects in the current frame and those in the previous frame cannot be established. This case corresponds to an object disappearing from the video sequence, a new object appearing, two or more objects overlapping, or an object splitting into several blobs.
To handle overlapping or merged objects, the objects are separated by predicting the location of each object in the current frame from its location in the previous frame [23].

B. Contour extraction and parameters estimation
The main parameters of a log to be determined are its diameter and length. The length of the log is defined as the sum of its shifts determined for each pair of contiguous frames in the video sequence as the log moves. The magnitude and direction of the shift are determined by matching contiguous frames. The idea of matching is to determine the spatial transformation $g: S \to T$ and the brightness transformation $f: R \to R$ that map the image $I_t$ onto the image $I_{t+1}$ in such a way that points belonging to the [...] In the given task the magnitude and direction of the log movement are determined in real-time mode by a group of methods based on phase correlation [19,20].
To determine the log diameter, an algorithm for detecting the log boundaries by line-by-line image scanning was developed. Under the assumption that the object is elongated and linear, with vertical orientation, a search for the points belonging to the right and left boundaries of the log is applied to each line of the log binary image (Fig. 3). As a result, two sets containing the points of the probable right and left boundaries of the log are obtained after processing each line of the current frame. The Mahalanobis distance [7] is used to determine the diameter (the distance between points u and v of the right and left boundaries), which is defined as follows:

$$d(u,v) = \sqrt{(u - v)^{T} S^{-1} (u - v)}.$$

The matrix S can be interpreted as a correcting coefficient that accounts for the slope angle of the log relative to the vertical; if S is the identity matrix, the Mahalanobis distance is equal to the Euclidean distance and the log is strictly vertical. To calculate this coefficient, the mathematical tools of the theory of moments of inertia [6,11,19] are used. The sets of diameters obtained for each frame, bound to the log movement, are stored in the resulting log accumulator D. The accumulator D is defined as a set of ordered pairs $(x,y) \in X \times Y$, where Y is the set of diameters and X is the set of lengths. The set of diameters Y might contain not only the required points of the log boundaries but also points of other objects, such as conveyor parts, knots or bark, which distort the log form. To exclude these elements, three methods for fitting the noisy data to the log geometry were examined: the random sample consensus (RANSAC) method [9], non-parametric locally weighted scatterplot smoothing (LOWESS) [8] and polynomial regression [24]. The results of the comparison of the methods and their discussion are presented in Part 4.
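The outlier rejection step can be illustrated with a minimal RANSAC sketch. It assumes a straight-line model of diameter against position along the log, which is a simplification chosen for illustration; the model actually fitted by the authors is not stated. The function name `ransac_line` and the values of the tolerance, iteration count and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (position along the log, measured diameter)


def ransac_line(points: Sequence[Point], n_iter: int = 200,
                tol: float = 0.5, seed: int = 0) -> Tuple[float, float, List[Point]]:
    """Fit y = a*x + b while ignoring outliers such as knots or bark.

    Returns (a, b, inliers). Each iteration fits a line through two random
    points and counts the points within `tol` of it; the largest consensus
    set is then refitted by ordinary least squares.
    """
    rng = random.Random(seed)
    best: List[Point] = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(list(points), 2)
        if x1 == x2:
            continue  # vertical candidate line, not representable as y(x)
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no valid consensus set found")
    # Least-squares refit on the consensus set.
    n = len(best)
    mean_x = sum(x for x, _ in best) / n
    mean_y = sum(y for _, y in best) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in best)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in best)
    a = sxy / sxx
    return a, mean_y - a * mean_x, best
```

Because only the consensus set enters the final fit, isolated spikes do not pull the estimated boundary toward them, which is the property that distinguishes this approach from plain polynomial regression.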

C. Log model reconstruction
An unambiguous reconstruction of the 3D shape of an object from its contour in a 2D image is impossible [19]. However, a reasonable approximation of the objects of interest can be obtained given an appropriate model and suitable recording conditions. For this purpose some assumptions that hold true in practice and simplify the algorithm development are introduced:
1. The log is a generalized cylinder whose surface is generated by moving a cross-section along the line of symmetry; the radius of the cross-section can vary smoothly.
2. The internal and external calibration parameters of the camera are given.
3. The camera is directed downward to observe the log in such a way that the image plane is parallel to the conveyor plane, and the distance between the conveyor plane and the camera is given.
To recover the 3D structure of the observed object, the 3D coordinates of the points whose projections in the image lie on the silhouette boundaries have to be determined. The photo and video cameras used in technical systems form an image according to the central projection law. This projection of 3D space onto a plane is not one-to-one, since all 3D points along a projection ray are mapped to one point of the 2D image. The authors' method for log structure determination is based on the assumption that the physical dimensions of the log, presented in the image as a silhouette, can be determined from the fact that the section of a body of revolution perpendicular to the line of symmetry is a circle. The description of the method is given below. The radius R can be determined with well-known trigonometric expressions, where the focal distance f is known after calibration and the distance Z to the conveyor can be determined at the stage of installation and start-up work.
Thus the 3D structure of a log can be recovered by determining all the radii forming the generalized cylinder. The log volume in this case can be defined as the sum of the volumes of the conical frustum sections along the log length:

$$V = \sum_{i} \frac{\pi l_i}{3}\left(R_i^2 + R_i r_i + r_i^2\right),$$

where $R_i$ and $r_i$ are the upper and lower radii of the i-th log section and $l_i$ is the section length.
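The volume summation over conical frustum sections can be sketched as follows. The sketch assumes that consecutive sections share a radius, so that m sections are described by m + 1 radii; the function name `log_volume` is hypothetical.

```python
import math
from typing import Sequence


def log_volume(radii: Sequence[float], section_lengths: Sequence[float]) -> float:
    """Sum the volumes of conical frustums stacked along the log axis.

    radii[k] and radii[k + 1] are the two end radii of section k, and
    section_lengths[k] is its length, all in the same length unit.
    """
    if len(radii) != len(section_lengths) + 1:
        raise ValueError("expected one more radius than sections")
    volume = 0.0
    for k, length in enumerate(section_lengths):
        upper, lower = radii[k], radii[k + 1]
        # Standard frustum volume: pi * l / 3 * (R^2 + R*r + r^2).
        volume += math.pi * length / 3.0 * (upper * upper + upper * lower + lower * lower)
    return volume
```

For equal end radii every section reduces to a cylinder, which gives a quick sanity check of the summation.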

IV. RESULTS AND DISCUSSION
Experiments on real data, with varying input parameters of the algorithms and analysis methods, were carried out in order to estimate the quality of log detection and the accuracy of the dimension determination. The first experiment was dedicated to estimating the log segmentation quality. The idea of the experiment is the following. For all images in the sample the reference location of the object of interest is marked out with pixel accuracy (Fig. 5b) and recorded in a database. Then the same images (Fig. 5a) are input to the detection algorithm at various values of the threshold p and the background model sensitivity α. The F-score, which is based on the concepts of precision and recall, is used to evaluate the algorithm:

$$F_\beta = \frac{(1+\beta^2)\cdot \mathrm{precision}\cdot \mathrm{recall}}{\beta^2\cdot \mathrm{precision} + \mathrm{recall}}, \qquad \mathrm{precision} = \frac{TP}{TP+FP}, \qquad \mathrm{recall} = \frac{TP}{TP+FN},$$

where TP is the number of true-positive predictions, TN the number of true-negative predictions, FP the number of false-positive predictions and FN the number of false-negative predictions.
The parameter β lies in the range 0 < β < 1 if priority is given to precision; otherwise β > 1. In the given task priority is given to recall, since the accuracy of detecting the log silhouette boundaries relies on a minimum rate of type II errors. Thus β = 2 was used.
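The effect of the weighting parameter can be checked with a short sketch of the standard F-score computed from pixel counts. The function name `f_beta` is hypothetical, and returning 0 when there are no true positives is a convention assumed here to avoid division by zero.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-score from true-positive, false-positive and false-negative counts.

    beta > 1 weights recall more heavily; beta < 1 favors precision.
    """
    if tp == 0:
        return 0.0  # assumed convention: no correct detections gives score 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

With 9 true positives, 1 false positive and 3 false negatives, precision is 0.9 and recall is 0.75; a weight of 2 pulls the score toward the lower recall value, so missed log pixels are penalized more than spurious ones.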
The metrics of the segmentation algorithm applied to the test video set are illustrated in Fig. 6. The charts show how the algorithm characteristics vary with the threshold p (Fig. 6a) and the sensitivity parameter α (Fig. 6b). The F-scores in both charts have a clearly defined global maximum, at which the algorithm provides an acceptable compromise between precision and recall for log segmentation. For this reason the further investigations use the detector with threshold p = 8 and parameter α = 0.004. For the task of log boundary approximation, Y is the set of diameters and X is the set of lengths. The results of applying the examined regression methods are shown in Fig. 7.
The noise rate in the input data is high (Fig. 7d, blue column) because of the low contrast of some logs and the adverse impact of conveyor elements and bark. The main disadvantage of polynomial regression is its sensitivity to spikes in the input data. With a polynomial of degree k > 1, a significant deviation of the approximating function from the real boundary of the log is observed near the minimum and maximum x values (edge effect). The methods based on locally weighted smoothing and random sampling are less sensitive to spikes and to the edge effect. The average results of the regression algorithms are shown in Table I. The approximation error is calculated as the mean square error between the diameters obtained by the algorithm and their nominal values. Analysis of Table I allows us to conclude that RANSAC has the best performance among the examined methods. The RANSAC method is tolerant to noisy input data and approximates the true dependence with the minimum mean square error of 0.045 ± 0.041.
The log detection and dimension algorithm introduced in this paper was implemented in C++. It was tested on a PC with an Intel Core i7 processor at 2800 MHz, 6 GB of DDR RAM and a GeForce GTS 450 graphics card. The algorithm processes a video sequence with a frame size of 384×288 at 25 frames per second. Thus the algorithm meets the requirement for use in a real-time machine vision system for round timber sorting.

V. CONCLUSIONS AND FURTHER WORK
The problem of determining the dimensions and form of logs as they pass along the conveyor was considered in this paper. The principal feature of this task is that the input data, in the form of a digitized video sequence, is obtained with a single camera. The results of log segmentation allow us to conclude that the image can be separated into background and foreground regions with quite simple subtraction methods. These methods perform well in the case of a static background. When global changes of the scene occur, such as movement of the conveyor parts or bark, or camera vibration, pixels unrelated to any log can be selected even with a periodically updated background model. Morphological operations partially solve this problem. The result of the segmentation can be regarded as satisfactory, since the algorithm provides a true positive rate of 96.9% with a false positive rate of $2.9 \cdot 10^{-2}$.
The results of the regression and log surface reconstruction experiments show that RANSAC has the best performance among the examined methods. Moreover, the use of RANSAC makes it possible to eliminate the effects of improper segmentation.
Further development of this project lies in adapting the system to a two-camera mode for log surface reconstruction with higher accuracy. This approach also provides an opportunity to estimate not only quantitative characteristics (volume, length) but also quality characteristics of logs, such as crook, ovality and butt swell.

Fig. 1 Sample images from the video sequence of the logs passing through the conveyor

Fig. 2 Log segmentation: a) background expectation value; b) background mean square deviation; c) input frame; d) background model subtraction result (log silhouette)

Fig. 4 illustrates the process of capturing a log in the image P at a height Z over the conveyor plane E. Points a and b are the images of the boundary points A and B of the circular cross-section of the conic surface of a log. These points are located at the distances ao = r and bo = r from the central point o. The segments SA and SB are tangents to the circle of radius R. The problem is to find the real radius R of the object from the given value r.
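One way to solve this tangent-line problem is sketched below. It is an illustration under explicit assumptions, not the authors' formula: the circular cross-section is taken to rest on the conveyor plane with its center on the optical axis, so the center lies at a distance Z - R from the projection center S; the image half-width r and the focal distance f are expressed in the same units (for example pixels). With the half-angle θ between the optical axis and a tangent ray, tan θ = r / f and sin θ = R / (Z - R), which gives R = Z sin θ / (1 + sin θ). The function name `radius_from_silhouette` is hypothetical.

```python
import math


def radius_from_silhouette(r_img: float, focal: float, height: float) -> float:
    """Recover the real radius R from the silhouette half-width r.

    Assumes the circle rests on the conveyor plane, centered under the camera:
        tan(theta) = r_img / focal
        sin(theta) = R / (height - R)
    so that R = height * sin(theta) / (1 + sin(theta)).
    """
    sin_theta = r_img / math.hypot(r_img, focal)
    return height * sin_theta / (1.0 + sin_theta)
```

Under these assumptions the result is smaller than the simple similar-triangle estimate r·Z/f, because the tangent points lie above the conveyor plane and therefore closer to the camera.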

Fig. 3 Result of the log boundaries detection

Fig. 4 Result of the log boundaries detection

The second experiment was dedicated to the problem of recovering the real log boundaries from noisy input data. This problem can be formulated in terms of regression analysis as follows. A set of objects X and a set of possible responses Y are given. There exists a true dependence $y^{*}: X \to Y$ whose values are known for the test sample only. A transformation $y: X \to Y$ that provides the minimum mean square error on the test sample is to be found:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\bigl(y(x_i) - y^{*}(x_i)\bigr)^2, \qquad (15)$$

where $y(x_i)$ is the diameter obtained by the examined algorithm, $y^{*}(x_i)$ is the nominal value of the diameter and m is the number of points in the test sample. The nominal diameter values were determined manually for each test log.

Fig. 6 Detector adjustment: a) binarization threshold p; b) background sensitivity parameter α

TABLE I. MEAN SQUARE ERROR FOR THE REGRESSION METHODS