A judicial surprise

I’ve been scraping court decisions from the New York, California, and Texas high courts (they’re the biggest legal markets). Much to my surprise, when comparing 100 decisions relating to property law, I found that New York was the least wordy. A quick wc of the JSON files I exported from Mongo showed 33,000 words for the 100 NY decisions vs 50,000 words for Texas vs 83,000 words for California. California state courts are famously liberal (just like their federal courts), but clearly liberal leanings don’t account for the word count, given that Texas sits squarely between New York and California.
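One caveat on that wc number: run against raw JSON, it also counts field names and metadata, not just the opinions. Here’s a minimal sketch of counting only the decision text. The field name "text" and the helper names are my assumptions for illustration, not the actual schema of the export, and the sample records are made up.

```python
import json

def count_words(records, text_field="text"):
    """Total whitespace-delimited words across decision records.

    `text_field` is a hypothetical field name; adjust to the real schema.
    """
    return sum(len(str(r.get(text_field, "")).split()) for r in records)

def load_records(path):
    """Read an export with one JSON document per line.

    Assumes a line-per-document file; an export written as a single
    JSON array would need json.load instead.
    """
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

# Tiny made-up records, only to show the call shape (not real decisions).
sample = {
    "NY": [{"text": "The town board denied the permit."}],
    "TX": [{"text": "The trial court entered judgment on the contract claim."}],
}
totals = {state: count_words(recs) for state, recs in sample.items()}
```

With real exports, something like count_words(load_records(path)) per state would give totals comparable to the wc figures, minus the JSON overhead.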

I also noticed a few interesting things in their word clouds:



Texas

Here’s what jumped out at me.

  • This word cloud is missing the big defend we see in the other two word clouds. It’s there, but it’s tiny.
  • Plaintiff also gets pretty short shrift. Do people get referred to by their names or non-trial roles more often in TX decisions?
  • On the other hand, trial seems to get more play in TX than elsewhere. Maybe in CA and NY there’s far more motion practice and appeals related to those non-trial-type questions of law? Is that another way of saying NY and CA could be more litigious, because trials become less important?
  • I found tort and contract in this word cloud but not in the others. Perhaps most property-related issues in TX actually come under tort/contract law, and this isn’t the case in other states?




California

A couple of things to notice:

  • constitutional stuff is reasonably prominent. This is the same in NY, but not in TX. Is this because high real estate values make it worthwhile to do takings-type lawsuits, or is there another reason?
  • community comes up a lot in CA, as it does in TX, but not in NY.

New York


And rounding out this super-scientific word cloud survey, here’s what grabbed my attention in the NY cloud.

  • law is only important in the NY word cloud. That’s a funny one. Is this a signature of the uptight Northeast approach to, like, legal stuff?
  • town is much more prominent, or perhaps only present, in the NY cloud (don’t trust my hasty assessment). Does this mean there’s more municipal litigation in NY? CA and TX seemed to prefer community.


A couple of interesting uniformities across the word clouds:

  • Seems like all courts like to talk about themselves, because court is always a very popular word. In fairness, they may also be referring to lower courts earlier in a process in the case of appeals decisions. But still.
  • person got fairly important coverage in all states, which makes me think the states all have a lot of decisions talking about a theoretical person to give legal examples. I take this as a sign of progress, as I would guess (and would love to verify some day if the sources are available) that language is probably fairly gendered if you look at older cases than what I scraped (these cases were mostly from the last thirty years).
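Since I’m eyeballing word sizes in all of these comparisons, a more trustworthy check would be actual counts. Here’s a rough sketch that turns raw counts into a rate per 10,000 words, so that California’s longer decisions don’t win just by being longer. The function name and sample sentence are made up for illustration, and it matches exact lowercase tokens, whereas the clouds may be showing stems (like defend), so the numbers wouldn’t line up exactly.

```python
import re
from collections import Counter

def term_rates(text, terms, per=10_000):
    """Occurrences of each term per `per` words of text.

    Tokens are runs of ASCII letters, lowercased; no stemming is done.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {t: counts[t] * per / total for t in terms}

# Made-up sentence, not from a real decision.
rates = term_rates(
    "The town sued the town board in court",
    ["town", "court", "community"],
)
```

Running this over each state’s combined text with terms like town, community, and law would show whether the differences I think I see are really there.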

So that’s some fun with judicial scraping and word clouds.

