Parsing with Earley Virtual Machines

Audrius Šaikūnas

Parsing with Earley Virtual Machines

Audrius Šaikūnas

DOI: http://dx.doi.org/10.15439/2017F162

Citation: Communication Papers of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 13, pages 165–173 (2017)

Full text

Abstract. Earley parser is a well-known parsing method used to analyse context-free grammars. While being less efficient in practical contexts than other generalized context-free parsing algorithms, such as GLR, it is also more general. As such it can be used as a foundation to build more complex parsing algorithms. We present a new, virtual machine based approach to parsing, heavily based on the original Earley parser. We present several variations of the Earley Virtual Machine, each with increasing feature set. The final iteration of the Earley Virtual Machine is capable of parsing context-free grammars with data-dependant constraints and support grammars with regular right hand sides and regular lookahead. We show how to translate grammars into virtual machine instruction sequences that are then used by the parsing algorithm. Additionally, we present two methods for constructing shared packed parse forests based on the parse input.

References

J. Earley, “An efficient context-free parsing algorithm,” Commun. ACM, vol. 13, no. 2, pp. 94–102, 1970.
T. Jim and Y. Mandelbaum, “Efficient earley parsing with regular right-hand sides,” Electronic Notes in Theoretical Computer Science, vol. 253, no. 7, pp. 135 – 148, 2010.
T. Jim, Y. Mandelbaum, and D. Walker, “Semantics and algorithms for data-dependent grammars,” in Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ser. POPL ’10. New York, United States: ACM, 2010, pp. 417–430.
P. Stansifer and M. Wand, “Parsing reflective grammars,” in Proceedings of the Eleventh Workshop on Language Descriptions, Tools and Applications, ser. LDTA ’11. New York, United States: ACM, 2011, pp. 10:1–10:7.
J. Aycock and N. Horspool, Directly-Executable Earley Parsing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001, pp. 229–243.
E. Scott, “Sppf-style parsing from earley recognisers,” Electron. Notes Theor. Comput. Sci., vol. 203, no. 2, pp. 53–67, Apr. 2008.
E. Scott and A. Johnstone, “Right nulled glr parsers,” ACM Trans. Program. Lang. Syst., vol. 28, no. 4, pp. 577–618, Jul. 2006.
T. Jim and Y. Mandelbaum, “A new method for dependent parsing,” in Proceedings of the 20th European Conference on Programming Languages and Systems: Part of the Joint European Conferences on Theory and Practice of Software, ser. ESOP’11/ETAPS’11. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 378–397.
G. Economopoulos, P. Klint, and J. Vinju, “Faster scannerless glr parsing,” in In Proceedings of the 18th International Conference on Compiler Construction (CC. Springer-Verlag, 2009.
B. Ford, “Packrat parsing: a practical linear-time algorithm with back-tracking,” Master’s thesis, Massachusetts Institute of Technology, 2002.
——, “Parsing expression grammars: A recognition-based syntactic foundation,” SIGPLAN Not., vol. 39, no. 1, pp. 111–122, 2004.
S. Medeiros and R. Ierusalimschy, “A parsing machine for pegs,” in Proceedings of the 2008 Symposium on Dynamic Languages, ser. DLS’08. New York, NY, USA: ACM, 2008, pp. 2:1–2:12.
D. E. Knuth, “Top-down syntax analysis,” Acta Inf., vol. 1, no. 2, pp. 79–110, Jun. 1971.
R. Cox. (2009) Regular expression matching: the virtual machine approach. [Online]. Available: https://swtch.com/~rsc/regexp/regexp2.html