Binary Segmentation Methods for Identifying Boundaries of Spatial Domains

Nishanthi Raveendran; Georgy Sofronov

Binary Segmentation Methods for Identifying Boundaries of Spatial Domains

Nishanthi Raveendran, Georgy Sofronov

DOI: http://dx.doi.org/10.15439/2017F206

Citation: Communication Papers of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 13, pages 95–102 (2017)

Full text

Abstract. Spatial clustering is an important component of spatial data analysis which aims in identifying the boundaries of domains and their number. It is commonly used in disease surveillance, spatial epidemiology, population genetics, landscape ecology, crime analysis and many other fields. In this paper, we focus on identifying homogeneous sub-regions in binary data, which indicate the presence or absence of a certain plant species which are observed over a two-dimensional lattice. The problem of finding regional homogeneous domains is known as segmentation, partitioning or clustering. To solve this clustering problem we propose to use the change-point methodology. We develop new methods based on a binary segmentation algorithm, which is a well-known multiple change-point detection method. The proposed algorithms are applied to artificially generated data to illustrate their usefulness. Our results show that the proposed methodologies are effective in identifying multiple domains and their boundaries in two dimensional spatial data.

References

T. Y. Yang and T.B Swartz, “Application of binary segmentation to the estimation of quantal response curves and spatial intensity,” Biometrical journal, vol. 4, 2005, pp. 489–501. https://doi.org/10.1002/bimj.200310136
S. S. Chen and P. S. Gopalakrishnan, “Clustering via the Bayesian information criterion with applications in speech recognition,” In Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1988 IEEE International Conference on, vol. 2, 1998, pp. 645–648. https://doi.org/10.1109/ICASSP.1998.675347
A. K. Tung, J. Hou and J. Han, “Spatial clustering in the presence of obstacles,” In Data Engineering, 2001. Proceedings. 17th International Conference on, IEEE, 2001, pp. 359–367. https://doi.org/10.1109/ICDE.2001.914848
R. E. Gangnon and M.K Clayton, “Bayesian detection and modeling of spatial disease clustering,” Biometrics, vol. 3, 2000, pp. 922–935. https://doi.org/10.1111/j.0006-341X.2000.00922.x
C. Anderson, D. Lee and N. Dean, “Bayesian cluster detection via adjacency modelling.” Spatial and spatio-temporal epidemiology, 2016, pp. 11–20. https://doi.org/10.1016/j.sste.2015.11.005
B. Beckage, L. Joseph, P. Belisle, D. B. Wolfson and W. J. Platt, “Bayesian change-point analyses in ecology,” New Phytologist, vol. 2, 2007, pp. 11–20. https://doi.org/10.1111/j.1469-8137.2007.01991.x
I. López, M. Gámez, J. Garay, T. Standovár and Z. Varga, “Applications of change-point problem to the detection of plant patches,” Acta biotheoretica, vol. 1, 2010, pp. 51–63. https://doi.org/10.1007/s10441-009-9093-x
S. Tripathi and R. S. Govindaraju, “Change detection in rainfall and temperature patterns over India,” In Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data, ACM, 2009, pp. 133–141. https://doi.org/10.1145/1601966.1601988
J. D. Helterbrand, N. Cressie and J. L. Davidson, “A statistical approach to identifying closed object boundaries in images,” Advances in applied probability, vol. 4, 1994, pp. 831–854. https://doi.org/10.1017/S0001867800026641
Y. Wang, “Change curve estimation via wavelets,” Journal of the American Statistical Association, vol. 441, 1998, pp. 163–172. http://dx.doi.org/10.1080/01621459.1998.10474098
G. J. Upton and B. Fingleton, “Spatial data analysis by example. vol. 1: Point pattern and quantitative data,” Chichester: Wiley, vol. 1, 1985.
A. D. Cliff and J. K. Ord, Spatial processes: models & applications, Taylor & Francis, 1981.
N. Cressie, Statistics for spatial data, John Wiley and Sons, 2015.
A. J. Scott and M. Knott, “A cluster analysis method for grouping means in the analysis of variance,” Biometrics, 1974, pp. 507–512. https://doi.org/10.2307/2529204
A. Sen and M. S. Srivastava, “On tests for detecting change in mean,” The Annals of statistics, vol. 1, 1975, pp. 98–108. https://doi.org/10.1214/aos/1176343001
L. Vostrikova, “Detection of the disorder in multidimensional random-point problems,” Doklady Akademii Nauk SSSR, vol. 2, 1998, pp. 270–274.
I. A. Eckley, P. Fearnhead and R. Killick, “Analysis of changepoint models,” Bayesian Time Series Models, 2011, pp. 205–224.
E. S. Venkatraman, Consistency results in multiple change-point problems, PhD thesis, to the Department of Statistics. Stanford University. 1992.
R. Killick, P. Fearnhead and I. A. Eckley “Optimal detection of change-points with a linear computational cost,” Journal of the American Statistical Association, vol. 500, 2012, pp. 1590–1598. http://dx.doi.org/10.1080/01621459.2012.737745
W. Li, “DNA segmentation as a model selection process,” In proceedings of the fifth annual international conference on Computational biology, ACM, 2001, pp. 204–210. http://dx.doi.org/10.1145/369133.369202
H. Akaike, “A new look at the statistical model identification,” IEEE transactions on automatic control, vol. 6, 1974, pp. 716–723. http://dx.doi.org/10.1109/TAC.1974.1100705
G. Schwarz, “Estimating the dimension of a model,” The annals of statistics, vol. 2, 1978, pp. 461–464. http://dx.doi.org/10.1214/aos/1176344136
J. Chen, A. Gupta and J. Pan “Information criterion and change point problem for regular models,” Sankhyā: The Indian Journal of Statistics, 2006, pp. 252–282.
J. Chen and A. K. Gupta, “Testing and locating variance changepoints with application to stock prices,” Journal of the American Statistical association, vol. 438, 1997, pp. 739–747. http://dx.doi.org/10.1080/01621459.1997.10474026
T. Young Yang and L. Kuo, “Bayesian binary segmentation procedure for a poisson process with multiple changepoints,” Journal of Computational and Graphical Statistics, vol. 4, 1998, pp. 772–785. http://dx.doi.org/10.1198/106186001317243449
R. Killick, I. A. Eckley, P. Jonathan and U. Chester, “Efficient detection of multiple changepoints within an oceano-graphic time series,” In Proceedings of the 58th World Science Congress of ISI, 2011.
P. Fryzlewicz, “Wild binary segmentation for multiple change-point detection,” The Annals of Statistics, vol. 6, 2014, pp. 2243–2281. http://dx.doi.org/10.1214/14-AOS1245
W. J. R. M. Priyadarshana, The cross-entropy method and multiple change-point detection in genomic sequences, PhD thesis, Macquarie University, 2015.
H. Cho and P. Fryzlewicz, “Multiple-change-point detection for high dimensional time series via sparsified binary segmentation,” Journal of the Royal Statistical Society: series B (statistical methodology), vol. 2, 2015, pp. 475–507. http://dx.doi.org/10.1111/rssb.12079
A. B. Olshen, E. Venkatraman, R. Lucito and M. Wigler,“Circular binary segmentation for the analysis of array-based DNA copy number data,” Biostatistics, vol. 4, 2004, pp. 557–572. https://doi.org/10.1093/biostatistics/kxh008
J. Chen and A. K. Gupta, Parametric statistical change point analysis: with applications to genetics, medicine, and finance, Springer Science & Business Media, 2011.
D. V. Hinkley and E. A. Hinkley, “Inference about the change-point in a sequence of binomial variables,” Biometrika, vol. 3, 1970, pp. 477–488. https://doi.org/10.2307/2334766
B. Jackson, J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan and T. T. Tsai,“An algorithm for optimal partitioning of data on an interval,” IEEE Signal Processing Letters, vol. 12, 2005, pp. 105–108. https://doi.org/10.1109/LSP.2001.838216
J. Pan and J. Chen, “Application of modified information criterion to multiple change point problems,” Journal of multivariate analysis, vol. 10, 2006, pp. 2221–2241. https://doi.org/10.1016/j.jmva.2006.05.009
A. N. Shiryaev, “On optimum methods in quickest detection problems,” Theory of Probability & Its Applications, vol. 1, 1963, pp. 22–46. https://doi.org/10.1137/1108002
N. R. Zhang and D. O. Siegmund, “A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data,” Biometrics, vol. 1, 2007, pp. 22–32. https://doi.org/10.1111/j.1541-0420.2006.00662.x
T. Polushina and G. Sofronov, “Change-point detection in binary Markov DNA sequences by the Cross-Entropy method,” In Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on, IEEE, vol. 2, 2014, pp. 471–478. https://doi.org/10.15439/2014F88
G. Yu Sofronov, G. E. Evans, J. M. Keith and D. P. Kroese, “Identifying change-points in biological sequences via sequential importance sampling,” In Environmental Modeling & Assessment, vol. 5, 2009, pp. 577–584. https://doi.org/10.1007/s10666-008-9160-8
G. Sofronov, “Change-Point Modelling in Biological Sequences via the Bayesian Adaptive Independent Sampler,” In International Proceedings of Computer Science and Information Technology, vol. 5, 2011, pp. 122–126.
M. Manuguerra, G. Sofronov, M. Tani and G. Heller, “Monte Carlo methods in spatio-temporal regression modeling of migration in the EU,” In Computational Intelligence for Financial Engineering & Economics (CIFEr), 2013 IEEE Conference on, IEEE, 2013, pp. 128–134. https://doi.org/10.1109/CIFEr.2013.6611708