Logo PTI
Polish Information Processing Society
Logo FedCSIS

Annals of Computer Science and Information Systems, Volume 13

Communication Papers of the 2017 Federated Conference on Computer Science and Information Systems

Binary Segmentation Methods for Identifying Boundaries of Spatial Domains

,

DOI: http://dx.doi.org/10.15439/2017F206

Citation: Communication Papers of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 13, pages 95102 ()

Full text

Abstract. Spatial clustering is an important component of spatial data analysis which aims in identifying the boundaries of domains and their number. It is commonly used in disease surveillance, spatial epidemiology, population genetics, landscape ecology, crime analysis and many other fields. In this paper, we focus on identifying homogeneous sub-regions in binary data, which indicate the presence or absence of a certain plant species which are observed over a two-dimensional lattice. The problem of finding regional homogeneous domains is known as segmentation, partitioning or clustering. To solve this clustering problem we propose to use the change-point methodology. We develop new methods based on a binary segmentation algorithm, which is a well-known multiple change-point detection method. The proposed algorithms are applied to artificially generated data to illustrate their usefulness. Our results show that the proposed methodologies are effective in identifying multiple domains and their boundaries in two dimensional spatial data.

References

  1. T. Y. Yang and T.B Swartz, “Application of binary segmentation to the estimation of quantal response curves and spatial intensity,” Biometrical journal, vol. 4, 2005, pp. 489–501. https://doi.org/10.1002/bimj.200310136
  2. S. S. Chen and P. S. Gopalakrishnan, “Clustering via the Bayesian information criterion with applications in speech recognition,” In Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1988 IEEE International Conference on, vol. 2, 1998, pp. 645–648. https://doi.org/10.1109/ICASSP.1998.675347
  3. A. K. Tung, J. Hou and J. Han, “Spatial clustering in the presence of obstacles,” In Data Engineering, 2001. Proceedings. 17th International Conference on, IEEE, 2001, pp. 359–367. https://doi.org/10.1109/ICDE.2001.914848
  4. R. E. Gangnon and M.K Clayton, “Bayesian detection and modeling of spatial disease clustering,” Biometrics, vol. 3, 2000, pp. 922–935. https://doi.org/10.1111/j.0006-341X.2000.00922.x
  5. C. Anderson, D. Lee and N. Dean, “Bayesian cluster detection via adjacency modelling.” Spatial and spatio-temporal epidemiology, 2016, pp. 11–20. https://doi.org/10.1016/j.sste.2015.11.005
  6. B. Beckage, L. Joseph, P. Belisle, D. B. Wolfson and W. J. Platt, “Bayesian change-point analyses in ecology,” New Phytologist, vol. 2, 2007, pp. 11–20. https://doi.org/10.1111/j.1469-8137.2007.01991.x
  7. I. López, M. Gámez, J. Garay, T. Standovár and Z. Varga, “Applications of change-point problem to the detection of plant patches,” Acta biotheoretica, vol. 1, 2010, pp. 51–63. https://doi.org/10.1007/s10441-009-9093-x
  8. S. Tripathi and R. S. Govindaraju, “Change detection in rainfall and temperature patterns over India,” In Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data, ACM, 2009, pp. 133–141. https://doi.org/10.1145/1601966.1601988
  9. J. D. Helterbrand, N. Cressie and J. L. Davidson, “A statistical approach to identifying closed object boundaries in images,” Advances in applied probability, vol. 4, 1994, pp. 831–854. https://doi.org/10.1017/S0001867800026641
  10. Y. Wang, “Change curve estimation via wavelets,” Journal of the American Statistical Association, vol. 441, 1998, pp. 163–172. http://dx.doi.org/10.1080/01621459.1998.10474098
  11. G. J. Upton and B. Fingleton, “Spatial data analysis by example. vol. 1: Point pattern and quantitative data,” Chichester: Wiley, vol. 1, 1985.
  12. A. D. Cliff and J. K. Ord, Spatial processes: models & applications, Taylor & Francis, 1981.
  13. N. Cressie, Statistics for spatial data, John Wiley and Sons, 2015.
  14. A. J. Scott and M. Knott, “A cluster analysis method for grouping means in the analysis of variance,” Biometrics, 1974, pp. 507–512. https://doi.org/10.2307/2529204
  15. A. Sen and M. S. Srivastava, “On tests for detecting change in mean,” The Annals of statistics, vol. 1, 1975, pp. 98–108. https://doi.org/10.1214/aos/1176343001
  16. L. Vostrikova, “Detection of the disorder in multidimensional random-point problems,” Doklady Akademii Nauk SSSR, vol. 2, 1998, pp. 270–274.
  17. I. A. Eckley, P. Fearnhead and R. Killick, “Analysis of changepoint models,” Bayesian Time Series Models, 2011, pp. 205–224.
  18. E. S. Venkatraman, Consistency results in multiple change-point problems, PhD thesis, to the Department of Statistics. Stanford University. 1992.
  19. R. Killick, P. Fearnhead and I. A. Eckley “Optimal detection of change-points with a linear computational cost,” Journal of the American Statistical Association, vol. 500, 2012, pp. 1590–1598. http://dx.doi.org/10.1080/01621459.2012.737745
  20. W. Li, “DNA segmentation as a model selection process,” In proceedings of the fifth annual international conference on Computational biology, ACM, 2001, pp. 204–210. http://dx.doi.org/10.1145/369133.369202
  21. H. Akaike, “A new look at the statistical model identification,” IEEE transactions on automatic control, vol. 6, 1974, pp. 716–723. http://dx.doi.org/10.1109/TAC.1974.1100705
  22. G. Schwarz, “Estimating the dimension of a model,” The annals of statistics, vol. 2, 1978, pp. 461–464. http://dx.doi.org/10.1214/aos/1176344136
  23. J. Chen, A. Gupta and J. Pan “Information criterion and change point problem for regular models,” Sankhyā: The Indian Journal of Statistics, 2006, pp. 252–282.
  24. J. Chen and A. K. Gupta, “Testing and locating variance changepoints with application to stock prices,” Journal of the American Statistical association, vol. 438, 1997, pp. 739–747. http://dx.doi.org/10.1080/01621459.1997.10474026
  25. T. Young Yang and L. Kuo, “Bayesian binary segmentation procedure for a poisson process with multiple changepoints,” Journal of Computational and Graphical Statistics, vol. 4, 1998, pp. 772–785. http://dx.doi.org/10.1198/106186001317243449
  26. R. Killick, I. A. Eckley, P. Jonathan and U. Chester, “Efficient detection of multiple changepoints within an oceano-graphic time series,” In Proceedings of the 58th World Science Congress of ISI, 2011.
  27. P. Fryzlewicz, “Wild binary segmentation for multiple change-point detection,” The Annals of Statistics, vol. 6, 2014, pp. 2243–2281. http://dx.doi.org/10.1214/14-AOS1245
  28. W. J. R. M. Priyadarshana, The cross-entropy method and multiple change-point detection in genomic sequences, PhD thesis, Macquarie University, 2015.
  29. H. Cho and P. Fryzlewicz, “Multiple-change-point detection for high dimensional time series via sparsified binary segmentation,” Journal of the Royal Statistical Society: series B (statistical methodology), vol. 2, 2015, pp. 475–507. http://dx.doi.org/10.1111/rssb.12079
  30. A. B. Olshen, E. Venkatraman, R. Lucito and M. Wigler,“Circular binary segmentation for the analysis of array-based DNA copy number data,” Biostatistics, vol. 4, 2004, pp. 557–572. https://doi.org/10.1093/biostatistics/kxh008
  31. J. Chen and A. K. Gupta, Parametric statistical change point analysis: with applications to genetics, medicine, and finance, Springer Science & Business Media, 2011.
  32. D. V. Hinkley and E. A. Hinkley, “Inference about the change-point in a sequence of binomial variables,” Biometrika, vol. 3, 1970, pp. 477–488. https://doi.org/10.2307/2334766
  33. B. Jackson, J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan and T. T. Tsai,“An algorithm for optimal partitioning of data on an interval,” IEEE Signal Processing Letters, vol. 12, 2005, pp. 105–108. https://doi.org/10.1109/LSP.2001.838216
  34. J. Pan and J. Chen, “Application of modified information criterion to multiple change point problems,” Journal of multivariate analysis, vol. 10, 2006, pp. 2221–2241. https://doi.org/10.1016/j.jmva.2006.05.009
  35. A. N. Shiryaev, “On optimum methods in quickest detection problems,” Theory of Probability & Its Applications, vol. 1, 1963, pp. 22–46. https://doi.org/10.1137/1108002
  36. N. R. Zhang and D. O. Siegmund, “A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data,” Biometrics, vol. 1, 2007, pp. 22–32. https://doi.org/10.1111/j.1541-0420.2006.00662.x
  37. T. Polushina and G. Sofronov, “Change-point detection in binary Markov DNA sequences by the Cross-Entropy method,” In Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on, IEEE, vol. 2, 2014, pp. 471–478. https://doi.org/10.15439/2014F88
  38. G. Yu Sofronov, G. E. Evans, J. M. Keith and D. P. Kroese, “Identifying change-points in biological sequences via sequential importance sampling,” In Environmental Modeling & Assessment, vol. 5, 2009, pp. 577–584. https://doi.org/10.1007/s10666-008-9160-8
  39. G. Sofronov, “Change-Point Modelling in Biological Sequences via the Bayesian Adaptive Independent Sampler,” In International Proceedings of Computer Science and Information Technology, vol. 5, 2011, pp. 122–126.
  40. M. Manuguerra, G. Sofronov, M. Tani and G. Heller, “Monte Carlo methods in spatio-temporal regression modeling of migration in the EU,” In Computational Intelligence for Financial Engineering & Economics (CIFEr), 2013 IEEE Conference on, IEEE, 2013, pp. 128–134. https://doi.org/10.1109/CIFEr.2013.6611708