Paper Abstract: LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information such as distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings. LexNLP is designed for use in both academic research and industrial applications, and is distributed at https://github.com/LexPredict/lexpredict-lexnlp.
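To give a flavor of the kind of structured extraction described in (iii), here is a minimal, stdlib-only sketch of pulling "Month DD, YYYY" dates out of text. This is an illustrative toy, not LexNLP's actual API or implementation, and the pattern covers only one common date format:

```python
import re
from datetime import datetime, date

# Illustrative sketch only: a minimal date extractor in the spirit of
# LexNLP's structured-information extraction (not its actual API).
# Matches dates written like "March 15, 2017".
DATE_PATTERN = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+(\d{1,2}),\s+(\d{4})\b"
)

def get_dates(text):
    """Return datetime.date objects for 'Month DD, YYYY' mentions in text."""
    found = []
    for month, day, year in DATE_PATTERN.findall(text):
        found.append(datetime.strptime(f"{month} {day} {year}", "%B %d %Y").date())
    return found

sample = "This Agreement is effective as of March 15, 2017."
print(get_dates(sample))  # [datetime.date(2017, 3, 15)]
```

A production extractor (as in LexNLP) would of course handle many more surface forms, relative dates, and ambiguous formats.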
Beyond the specific prize attached to the upcoming hackathon event, we welcome anyone who (just for fun) would like to take a crack at this challenge.
Our LexPredict Challenge is an opportunity to develop basic tools for processing contracts.
Specifically, you will use the sample contract data below to develop algorithms to:
(1) identify the parties to an agreement
(2) identify the effective date segment and the date itself
(3) identify the termination clause segment(s) and date(s)
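As a rough starting point for task (1), one could look for the conventional recital language that introduces the parties. The sketch below is a hedged, stdlib-only illustration built around the common "by and between A and B" phrasing; the pattern and its coverage are assumptions, not a complete solution:

```python
import re

# Illustrative sketch only: many contracts open with a recital like
# "This Agreement is entered into by and between Acme Corp. and Widget LLC".
# Real contracts vary widely, so this single pattern is far from exhaustive.
PARTY_PATTERN = re.compile(
    r"by and between\s+(.+?)\s+and\s+(.+?)(?:[.,;]|$)",
    re.IGNORECASE,
)

def get_parties(text):
    """Return the two party strings from a 'by and between' recital, or None."""
    match = PARTY_PATTERN.search(text)
    return match.groups() if match else None

recital = ("This Agreement is entered into by and between "
           "Acme Corp. and Widget LLC, effective as of the date below.")
print(get_parties(recital))  # ('Acme Corp.', 'Widget LLC')
```

A serious entry would go well beyond this, e.g. using defined-term parentheticals ("Seller", "Buyer"), named-entity recognition, or sequence models trained on the sample contract data.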
At LexPredict, we have built this simple (and other more complex) technology for use in commercial applications. This is an opportunity to use this challenge to produce open source content which can be used by all (including in the Legal Analytics Course).
From the Abstract: “We explore the idea that authoring a piece of text is an act of maximizing one’s expected utility. To make this idea concrete, we consider the societally important decisions of the Supreme Court of the United States. Extensive past work in quantitative political science provides a framework for empirically modeling the decisions of justices and how they relate to text. We incorporate into such a model texts authored by amici curiae (“friends of the court” separate from the litigants) who seek to weigh in on the decision, then explicitly model their goals in a random utility model. We demonstrate the benefits of this approach in improved vote prediction and the ability to perform counterfactual analysis.” (HT: R.C. Richards from Legal Informatics Blog)
This upcoming week and next week I have the pleasure of teaching “Complex Systems Models in the Social Sciences” here at the University of Michigan ICPSR Summer Program in Quantitative Methods. The field of complex systems is very diverse and it is difficult to do complete justice to the range of scholarship conducted under this umbrella in a short survey course. However, we strive to cover the canonical topics such as computational game theory and computational modeling, network science, natural language processing, randomness vs. determinism, diffusion, cascades, emergence, empirical approaches to study complexity (including measurement), social epidemiology, non-linear dynamics, etc. Click here or on the image above to access my course materials!
This is an ongoing project with Adam Wyner (Dept. of Computer Science @ University of Aberdeen) and Wim Peters (Dept. of Computer Science + NLP Group @ University of Sheffield) … our initial pilot project was presented at the 2013 Jurix Conference. Slides are located here and the case study paper for the pilot project is located here. Hoping for more to come on this project in 2014!