Logo PTI
Polish Information Processing Society
Logo FedCSIS

Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

Properties and Limits of Supercombinator Set Acquired from Context-free Grammar Samples

,

DOI: http://dx.doi.org/10.15439/2017F249

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 711720 ()

Full text

Abstract. We present an improved version of algorithm that can transform any context-free grammar into a supercombinator form. Such a form is composed only of lambda calculus' supercombinators that are enriched by grammar operations. The main properties of this form are non-redundancy and scalability. We show the improvements that we've made to create smaller supercombinator set than in our previous algorithm's version. We present experiments performed on Context-free grammars obtained by transformation from Groningen meaning bank corpus. Experiments confirm that our form has a theoretical maximum limit of possible supercombinators. That limit is a mathematical sequence called Catalan number. We show that in some cases we are able to reach that limit if we use large enough input data source and we limit the size of supercombinator permitted into the final set. We also describe another benefit of our algorithm, which is the identification of most reoccurring structures in the input set.

References

  1. R. J. M. Hughes, “Super-combinators a new implementation method for applicative languages,” in Proceedings of the 1982 ACM symposium on LISP and functional programming. ACM, 1982. http://dx.doi.org/10.1145/800068.802129 pp. 1–10. [Online]. Available: http://dx.doi.org/10.1145/800068.802129
  2. P. Klint, R. Lämmel, and C. Verhoef, “Toward an engineering discipline for grammarware,” ACM Trans. Softw. Eng. Methodol., vol. 14, no. 3, pp. 331–380, Jul. 2005. http://dx.doi.org/10.1145/1072997.1073000. [Online]. Available: http://doi.acm.org/10.1145/1072997.1073000
  3. J. Kollár, M. Sičák, and M. Spišiak, “Towards machine mind evolution,” in Computer Science and Information Systems (FedCSIS), 2015 Federated Conference on. IEEE, 2015. http://dx.doi.org/10.15439/2015F210 pp. 985–990. [Online]. Available: http://dx.doi.org/10.15439/2015F210
  4. M. Sičák and J. Kollár, “Supercombinator set construction from a context-free representation of text,” in Computer Science and Information Systems (FedCSIS), 2016 Federated Conference on. IEEE, 2016. http://dx.doi.org/10.15439/2016F334 pp. 503–512. [Online]. Available: http://dx.doi.org/10.15439/2016F334
  5. C. G. Nevill-Manning and I. H. Witten, “Identifying hierarchical strcture in sequences: A linear-time algorithm,” J. Artif. Intell. Res.(JAIR), vol. 7, pp. 67–82, 1997. http://dx.doi.org/10.1613/jair.374. [Online]. Available: http://dx.doi.org/10.1613/jair.374
  6. V. Basile, J. Bos, K. Evang, and N. Venhuizen, “Developing a large semantically annotated corpus,” in LREC 2012, Eighth International Conference on Language Resources and Evaluation, 2012. [Online]. Available: https://hal.inria.fr/hal-01389432
  7. E. M. Gold, “Language identification in the limit,” Information and control, vol. 10, no. 5, pp. 447–474, 1967. http://dx.doi.org/10.1016/S0019-9958(67)91165-5. [Online]. Available: http://dx.doi.org/10.1016/S0019-9958(67)91165-5
  8. L. Onnis, H. R. Waterfall, and S. Edelman, “Learn locally, act globally: Learning language from variation set cues,” Cognition, vol. 109, no. 3, pp. 423–430, 2008. http://dx.doi.org/10.1016/j.cognition.2008.10.004. [Online]. Available: http://dx.doi.org/10.1016/j.cognition.2008.10.004
  9. A. Stevenson and J. R. Cordy, “Grammatical inference in software engineering: an overview of the state of the art,” in Software Language Engineering. Springer, 2013, pp. 204–223. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-36089-3_12
  10. J. Kollár, M. Spišiak, and M. Sičák, “Abstract language of the machine mind,” Acta Electrotechnica et Informatica, vol. 15, no. 3, pp. 24–31, 2015. http://dx.doi.org/10.15546/aeei-2015-0025. [Online]. Available: http://dx.doi.org/10.15546/aeei-2015-0025
  11. M. Sičák, “Higher order regular expressions,” in Engineering of Modern Electric Systems (EMES), 2015 13th International Conference on. IEEE, 2015. http://dx.doi.org/10.1109/EMES.2015.7158427 pp. 1–4. [Online]. Available: http://dx.doi.org/10.1109/EMES.2015.7158427
  12. M. Steedman and J. Baldridge, “Combinatory categorial grammar,” Non-Transformational Syntax: Formal and Explicit Models of Grammar. Wiley-Blackwell, 2011.
  13. D. Hrnčič, M. Mernik, B. R. Bryant, and F. Javed, “A memetic grammar inference algorithm for language learning,” Applied Soft Computing, vol. 12, no. 3, pp. 1006–1020, 2012. http://dx.doi.org/10.1016/j.asoc.2011.11.024. [Online]. Available: http://dx.doi.org/10.1016/j.asoc.2011.11.024
  14. A. Clark, “Distributional learning of some context-free languages with a minimally adequate teacher,” in Grammatical Inference: Theoretical Results and Applications. Springer, 2010, pp. 24–37. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-15488-1_4
  15. A. Dubey, P. Jalote, and S. K. Aggarwal, “Learning context-free grammar rules from a set of program,” IET software, vol. 2, no. 3, pp. 223–240, 2008. http://dx.doi.org/10.1049/iet-sen:20070061. [Online]. Available: http://dx.doi.org/10.1049/iet-sen:20070061
  16. C. De La Higuera, “A bibliographical study of grammatical inference,” Pattern recognition, vol. 38, no. 9, pp. 1332–1348, 2005. http://dx.doi.org/10.1016/j.patcog.2005.01.003. [Online]. Available: http://dx.doi.org/10.1016/j.patcog.2005.01.003
  17. J. F. Power and B. A. Malloy, “A metrics suite for grammar-based software,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 16, no. 6, pp. 405–426, 2004. http://dx.doi.org/10.1002/smr.293. [Online]. Available: https://doi.org/10.1002/smr.293
  18. M. Črepinšek, T. Kosar, M. Mernik, J. Cervelle, R. Forax, and G. Roussel, “On automata and language based grammar metrics,” Computer Science and Information Systems, vol. 7, no. 2, pp. 309–329, 2010. http://dx.doi.org/10.2298/CSIS1002309C. [Online]. Available: https://doi.org/10.2298/CSIS1002309C
  19. N. Carvalho, J. J. Almeida, M. J. Pereira, and P. Henriques, “Probabilistic synset based concept location,” in SLATe’12—Symposium on Languages, Applications and Technologies. Alberto Simões and Ricardo Queirós and Daniela da Cruz, 2012. http://dx.doi.org/10198/7062 pp. 239–253. [Online]. Available: http://hdl.handle.net/10198/7062
  20. S. Ristić, S. Kordić, M. Čeliković, V. Dimitrieski, and I. Luković, “A model-driven approach to data structure conceptualization,” in Proceedings of the 2015 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 5. IEEE, 2015. http://dx.doi.org/10.15439/2015F224 pp. 977–984. [Online]. Available: http://dx.doi.org/10.15439/2015F224
  21. D. Lakatos, J. Poruban, and M. Bacikova, “Declarative specification of references in dsls,” in Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on. IEEE, 2013, pp. 1527–1534.
  22. S. Chodarev, “Development of human-friendly notation for xml-based languages,” in Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 8. IEEE, 2016. http://dx.doi.org/10.15439/2016F530 pp. 1565–1571. [Online]. Available: http://dx.doi.org/10.15439/2016F530