Polish Information Processing Society
Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

Unsupervised tool for quantification of progress in L2 English phraseological

DOI: http://dx.doi.org/10.15439/2017F433

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 383–388

Abstract. This study aimed to reduce the considerable effort required to analyze phraseological writing competence by developing an automatic text evaluation tool. We attempted to measure both second language (L2) writing proficiency and text quality. To this end, we adapted the CollGram technique, which looks up each pair of adjacent tokens (bigram) in a reference corpus to determine its frequency and then calculates the t-score and related information. We used the Level 3 Corpus of Contemporary American English as the reference corpus. Our solution performed well in writing evaluation and is freely available as a web service or as source code for other researchers.
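
The bigram scoring step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: it assumes the common t-score formulation t = (O − E) / √O, where O is the observed bigram frequency in the reference corpus and E = f(w1) · f(w2) / N is the frequency expected under independence. The function name `bigram_t_scores`, the whitespace tokenization, and the toy reference corpus are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bigram_t_scores(reference_tokens, text_tokens):
    """Score each bigram of a text against a reference corpus.

    Assumed formulation (not taken from the paper's tool):
        t = (O - E) / sqrt(O),  E = f(w1) * f(w2) / N
    Bigrams absent from the reference corpus get None.
    """
    n = len(reference_tokens)
    unigram_freq = Counter(reference_tokens)
    bigram_freq = Counter(zip(reference_tokens, reference_tokens[1:]))

    scores = {}
    for bigram in set(zip(text_tokens, text_tokens[1:])):
        observed = bigram_freq.get(bigram, 0)
        if observed == 0:
            # No evidence in the reference corpus: t-score is undefined.
            scores[bigram] = None
            continue
        w1, w2 = bigram
        expected = unigram_freq[w1] * unigram_freq[w2] / n
        scores[bigram] = (observed - expected) / math.sqrt(observed)
    return scores


# Toy reference corpus and learner text (illustrative data only).
reference = "the cat sat on the mat the cat ran".split()
learner_text = "the cat flew".split()
scores = bigram_t_scores(reference, learner_text)
```

In this sketch, bigrams that never occur in the reference corpus are kept separate (as `None`) rather than given a score, since the t-score is undefined when the observed frequency is zero. A real reference corpus would be far larger, and its frequencies would normally be precomputed rather than counted on each call.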

