Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 15

Proceedings of the 2018 Federated Conference on Computer Science and Information Systems

A Comparative Study of Classifying Legal Documents with Neural Networks


DOI: http://dx.doi.org/10.15439/2018F227

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds.). ACSIS, Vol. 15, pages 515–522


Abstract. In recent years, deep learning has shown promising results in the field of natural language processing (NLP). Neural networks (NNs) such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used for various NLP tasks, including sentiment analysis, information retrieval, and document classification. In this paper, we present the Supreme Court Classifier (SCC), a system that applies these methods to the problem of classifying legal court opinions. We compare traditional machine learning methods with recent NN-based methods. We also present a CNN used with pre-trained word vectors that improves on the state of the art applied to our dataset. We train and evaluate our system using the Washington University School of Law Supreme Court Database (SCDB). Our best system (word2vec + CNN) achieves 72.4% accuracy when classifying court decisions into 15 broad SCDB categories and 31.9% accuracy when classifying among 279 finer-grained SCDB categories.
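For readers unfamiliar with the architecture named in the abstract, the sketch below shows what a "word2vec + CNN" text classifier of the kind described (in the spirit of Kim's 2014 sentence-classification CNN) typically looks like. This is a minimal illustration assuming PyTorch; the hyperparameters (kernel sizes, filter count, dropout rate), the stand-in `pretrained` embedding matrix, and the random inputs are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch of a CNN text classifier over pre-trained word vectors.
# Hyperparameters and the embedding matrix are hypothetical, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, pretrained: torch.Tensor, num_classes: int,
                 kernel_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        # Embedding table initialized from pre-trained word2vec vectors,
        # fine-tuned during training (freeze=True would keep it static).
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)
        embed_dim = pretrained.size(1)
        # One 1-D convolution per kernel size, slid over the token axis.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer word indices.
        x = self.embed(token_ids).transpose(1, 2)   # (batch, dim, seq_len)
        # Max-over-time pooling reduces each feature map to one value.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                    # class logits

# Usage with random stand-in data: a 10,000-word vocabulary, 300-d vectors,
# and the 15 broad SCDB categories mentioned in the abstract.
model = TextCNN(pretrained=torch.randn(10_000, 300), num_classes=15)
logits = model(torch.randint(0, 10_000, (8, 512)))  # batch of 8 documents
print(logits.shape)  # torch.Size([8, 15])
```

The max-over-time pooling step is what makes this design a natural fit for documents of varying length: each filter contributes a single feature regardless of how long the opinion is, so the final linear layer always sees a fixed-size vector.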
