law-net

What can we learn by applying network and text analysis to the law? This project contains code to analyze legal text and citation networks using data generously provided by CourtListener and the Supreme Court Database.

Some interesting networks include

  • Supreme Court citation network (27,885 nodes, 234,312 directed edges)
  • Federal Appellate circuit (959,985 nodes, 6,649,916 directed edges)
  • any one of the over 400 jurisdiction subnetworks listed on CourtListener

These all have accompanying opinion text files as well as additional node metadata, such as the case date and the hand-coded issue area (for SCOTUS).

We recently gave a presentation about our exploratory analysis at the PyData Carolinas conference.

Our code

You can load the SCOTUS subnetwork (saved in this directory as a .graphml file) with igraph:

import igraph as ig
G = ig.Graph.Read_GraphML('scotus_network.graphml')
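
As a quick sanity check on a .graphml file (for example, to compare it against the node and edge counts listed above), one option is to count its node and edge elements directly. This is only a minimal sketch using the Python standard library; the helper name `count_graphml` is hypothetical and not part of this repo.

```python
import xml.etree.ElementTree as ET


def count_graphml(source):
    """Return (n_nodes, n_edges) for a GraphML file.

    `source` is a file path or a binary file-like object.
    `count_graphml` is a hypothetical helper, not part of this repo.
    """
    n_nodes = 0
    n_edges = 0
    # iterparse reads the file incrementally instead of building the tree up front
    for _, elem in ET.iterparse(source, events=("end",)):
        # Drop the XML namespace prefix, if any: "{ns}node" -> "node"
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "node":
            n_nodes += 1
            elem.clear()  # discard attributes/children once counted
        elif tag == "edge":
            n_edges += 1
            elem.clear()
    return n_nodes, n_edges
```

For the SCOTUS file this would be called as `count_graphml('scotus_network.graphml')`; the result should agree with the counts above only if the file corresponds to the same snapshot of the data.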

User beware: we have not yet made the code clean/robust/user friendly/pleasant/etc. -- we will get to this soon. If you have trouble with something, please reach out to Iain ([email protected]).

To download more data, see download_data.ipynb. This notebook allows you to work with other jurisdiction subnetworks and the opinion text files. Note that you have to change the two directory paths at the top of the notebook.

One of the functions in download_data.ipynb will set up a data directory. I suggest putting data_dir outside your copy of the GitHub repo and outside Dropbox. GitHub doesn't like large data files, and Dropbox might slow things down if you do a lot of reading and writing (e.g. for some NLP operations).
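
The advice about keeping data_dir outside the repo can be checked in code before any data is downloaded. The sketch below is an illustration only: `check_data_dir` and its `repo_dir` argument are hypothetical names, not functions from download_data.ipynb, and it does not create whatever subdirectories the notebook itself sets up.

```python
import os


def check_data_dir(data_dir, repo_dir):
    """Create data_dir if needed, refusing any location inside repo_dir.

    Hypothetical helper for illustration; not part of download_data.ipynb.
    """
    data_dir = os.path.realpath(data_dir)
    repo_dir = os.path.realpath(repo_dir)

    # data_dir is "inside" the repo if it is the repo itself or a path below it
    inside_repo = data_dir == repo_dir or data_dir.startswith(repo_dir + os.sep)
    if inside_repo:
        raise ValueError("data_dir should live outside the repo: %s" % data_dir)

    # isdir check (rather than exist_ok) keeps this usable on Python 2.7 as well
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    return data_dir
```

The same prefix check could be repeated against a Dropbox folder if you keep the repo there.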

About the data

Currently we are using data from CourtListener (CL) and the Supreme Court Database (SCDB):

  • the citation network comes from CL

  • opinion texts come from CL

  • some case metadata (jurisdiction, date, judges) comes from CL

  • additional case metadata comes from SCDB

    • for issueArea we have coded Missing as 0. Only SCOTUS cases can have issueArea.
  • we identify cases by their CourtListener opinion id

    • CL opinion ids and cluster ids are not necessarily the same. One cluster can have many opinions.
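
Because Missing is coded as 0 for issueArea, any summary by issue area should drop the zeros first. The following is a minimal sketch of that step on a plain list of values; `issue_area_counts` is a hypothetical helper, and it assumes the values are numeric (with igraph they might come from something like `G.vs['issueArea']`, assuming the attribute is stored under that name in the .graphml file).

```python
from collections import Counter

MISSING_ISSUE_AREA = 0  # Missing is coded as 0 (see the note on issueArea above)


def issue_area_counts(issue_areas):
    """Count cases per issueArea, skipping cases coded as Missing.

    Hypothetical helper; assumes every value is numeric (int or float).
    """
    codes = (int(a) for a in issue_areas)  # e.g. 2.0 -> 2
    return Counter(c for c in codes if c != MISSING_ISSUE_AREA)
```

Casting to `int` is a design choice for the case where the attribute was read back as a float; it would need adjusting if some values are not numeric.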

Code dependencies

The code is written in Python 2.7. You need

  • Anaconda

  • igraph

  • nltk

    • after installing nltk, run the following commands in Python

    import nltk

    nltk.download()

Our group

If you are interested in collaborating, feel free to reach out to us! This is a collaboration between:

Anna Zhao

Bill Shi

Brendan Schneiderman

Ethan Koch

Iain Carmichael

James Jushchuk

James Wudel

Michael Kim

Shankar Bhamidi