Properties and Limits of Supercombinator Set Acquired from Context-free Grammar Samples

Michal Sičák; Ján Kollár

Properties and Limits of Supercombinator Set Acquired from Context-free Grammar Samples

Michal Sičák, Ján Kollár

DOI: http://dx.doi.org/10.15439/2017F249

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 711–720 (2017)

Full text

Abstract. We present an improved version of algorithm that can transform any context-free grammar into a supercombinator form. Such a form is composed only of lambda calculus' supercombinators that are enriched by grammar operations. The main properties of this form are non-redundancy and scalability. We show the improvements that we've made to create smaller supercombinator set than in our previous algorithm's version. We present experiments performed on Context-free grammars obtained by transformation from Groningen meaning bank corpus. Experiments confirm that our form has a theoretical maximum limit of possible supercombinators. That limit is a mathematical sequence called Catalan number. We show that in some cases we are able to reach that limit if we use large enough input data source and we limit the size of supercombinator permitted into the final set. We also describe another benefit of our algorithm, which is the identification of most reoccurring structures in the input set.

References

R. J. M. Hughes, “Super-combinators a new implementation method for applicative languages,” in Proceedings of the 1982 ACM symposium on LISP and functional programming. ACM, 1982. http://dx.doi.org/10.1145/800068.802129 pp. 1–10. [Online]. Available: http://dx.doi.org/10.1145/800068.802129
P. Klint, R. Lämmel, and C. Verhoef, “Toward an engineering discipline for grammarware,” ACM Trans. Softw. Eng. Methodol., vol. 14, no. 3, pp. 331–380, Jul. 2005. http://dx.doi.org/10.1145/1072997.1073000. [Online]. Available: http://doi.acm.org/10.1145/1072997.1073000
J. Kollár, M. Sičák, and M. Spišiak, “Towards machine mind evolution,” in Computer Science and Information Systems (FedCSIS), 2015 Federated Conference on. IEEE, 2015. http://dx.doi.org/10.15439/2015F210 pp. 985–990. [Online]. Available: http://dx.doi.org/10.15439/2015F210
M. Sičák and J. Kollár, “Supercombinator set construction from a context-free representation of text,” in Computer Science and Information Systems (FedCSIS), 2016 Federated Conference on. IEEE, 2016. http://dx.doi.org/10.15439/2016F334 pp. 503–512. [Online]. Available: http://dx.doi.org/10.15439/2016F334
C. G. Nevill-Manning and I. H. Witten, “Identifying hierarchical strcture in sequences: A linear-time algorithm,” J. Artif. Intell. Res.(JAIR), vol. 7, pp. 67–82, 1997. http://dx.doi.org/10.1613/jair.374. [Online]. Available: http://dx.doi.org/10.1613/jair.374
V. Basile, J. Bos, K. Evang, and N. Venhuizen, “Developing a large semantically annotated corpus,” in LREC 2012, Eighth International Conference on Language Resources and Evaluation, 2012. [Online]. Available: https://hal.inria.fr/hal-01389432
E. M. Gold, “Language identification in the limit,” Information and control, vol. 10, no. 5, pp. 447–474, 1967. http://dx.doi.org/10.1016/S0019-9958(67)91165-5. [Online]. Available: http://dx.doi.org/10.1016/S0019-9958(67)91165-5
L. Onnis, H. R. Waterfall, and S. Edelman, “Learn locally, act globally: Learning language from variation set cues,” Cognition, vol. 109, no. 3, pp. 423–430, 2008. http://dx.doi.org/10.1016/j.cognition.2008.10.004. [Online]. Available: http://dx.doi.org/10.1016/j.cognition.2008.10.004
A. Stevenson and J. R. Cordy, “Grammatical inference in software engineering: an overview of the state of the art,” in Software Language Engineering. Springer, 2013, pp. 204–223. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-36089-3_12
J. Kollár, M. Spišiak, and M. Sičák, “Abstract language of the machine mind,” Acta Electrotechnica et Informatica, vol. 15, no. 3, pp. 24–31, 2015. http://dx.doi.org/10.15546/aeei-2015-0025. [Online]. Available: http://dx.doi.org/10.15546/aeei-2015-0025
M. Sičák, “Higher order regular expressions,” in Engineering of Modern Electric Systems (EMES), 2015 13th International Conference on. IEEE, 2015. http://dx.doi.org/10.1109/EMES.2015.7158427 pp. 1–4. [Online]. Available: http://dx.doi.org/10.1109/EMES.2015.7158427
M. Steedman and J. Baldridge, “Combinatory categorial grammar,” Non-Transformational Syntax: Formal and Explicit Models of Grammar. Wiley-Blackwell, 2011.
D. Hrnčič, M. Mernik, B. R. Bryant, and F. Javed, “A memetic grammar inference algorithm for language learning,” Applied Soft Computing, vol. 12, no. 3, pp. 1006–1020, 2012. http://dx.doi.org/10.1016/j.asoc.2011.11.024. [Online]. Available: http://dx.doi.org/10.1016/j.asoc.2011.11.024
A. Clark, “Distributional learning of some context-free languages with a minimally adequate teacher,” in Grammatical Inference: Theoretical Results and Applications. Springer, 2010, pp. 24–37. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-15488-1_4
A. Dubey, P. Jalote, and S. K. Aggarwal, “Learning context-free grammar rules from a set of program,” IET software, vol. 2, no. 3, pp. 223–240, 2008. http://dx.doi.org/10.1049/iet-sen:20070061. [Online]. Available: http://dx.doi.org/10.1049/iet-sen:20070061
C. De La Higuera, “A bibliographical study of grammatical inference,” Pattern recognition, vol. 38, no. 9, pp. 1332–1348, 2005. http://dx.doi.org/10.1016/j.patcog.2005.01.003. [Online]. Available: http://dx.doi.org/10.1016/j.patcog.2005.01.003
J. F. Power and B. A. Malloy, “A metrics suite for grammar-based software,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 16, no. 6, pp. 405–426, 2004. http://dx.doi.org/10.1002/smr.293. [Online]. Available: https://doi.org/10.1002/smr.293
M. Črepinšek, T. Kosar, M. Mernik, J. Cervelle, R. Forax, and G. Roussel, “On automata and language based grammar metrics,” Computer Science and Information Systems, vol. 7, no. 2, pp. 309–329, 2010. http://dx.doi.org/10.2298/CSIS1002309C. [Online]. Available: https://doi.org/10.2298/CSIS1002309C
N. Carvalho, J. J. Almeida, M. J. Pereira, and P. Henriques, “Probabilistic synset based concept location,” in SLATe’12—Symposium on Languages, Applications and Technologies. Alberto Simões and Ricardo Queirós and Daniela da Cruz, 2012. http://dx.doi.org/10198/7062 pp. 239–253. [Online]. Available: http://hdl.handle.net/10198/7062
S. Ristić, S. Kordić, M. Čeliković, V. Dimitrieski, and I. Luković, “A model-driven approach to data structure conceptualization,” in Proceedings of the 2015 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 5. IEEE, 2015. http://dx.doi.org/10.15439/2015F224 pp. 977–984. [Online]. Available: http://dx.doi.org/10.15439/2015F224
D. Lakatos, J. Poruban, and M. Bacikova, “Declarative specification of references in dsls,” in Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on. IEEE, 2013, pp. 1527–1534.
S. Chodarev, “Development of human-friendly notation for xml-based languages,” in Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 8. IEEE, 2016. http://dx.doi.org/10.15439/2016F530 pp. 1565–1571. [Online]. Available: http://dx.doi.org/10.15439/2016F530