Document changes and enhancements made to Harvard data during import #61

Open
mlissner opened this issue Jul 30, 2024 · 2 comments

@mlissner
Member

@quevon24 and @flooie, we're working with folks from Harvard (and others) to bring our system up to parity with theirs.

A big question that has come up several times is: what did we do to enhance, change, or otherwise modify the Harvard data while importing it?

Is it possible to document that here so that we can merge our changes in with those Harvard has recently made?

@flooie flooie self-assigned this Aug 6, 2024
@flooie flooie transferred this issue from freelawproject/courtlistener Aug 6, 2024
@flooie flooie added the question Further information is requested label Aug 6, 2024
@flooie
Contributor

flooie commented Aug 6, 2024

@mlissner

All of the changes to the source data are in this repository. They generally fall into three categories:

  1. Small OCR mistakes: wrong dates and typos
  2. Citation fixes, etc.
  3. Structural fixes

The structural fixes mostly take two forms.

Wrapping the small opinion in an opinion tag: we found lots of empty opinions.

Updating the tags: we found that lots of opinions were not correctly wrapped as opinions. Often the opinion would start in the headmatter, or opinion content like concurrences would not be identified as opinions and would just be tags in between the majority and dissents. For this we mostly used an ML model to make a good guess at what something should be, so that we could import it properly.

In PR 54 we addressed some footnote issues, where footnote text was disconnected from the opinion and did not follow standard practice. Where we could identify such footnotes, we reconnected them.

I believe we also linked directly to case.law for cases where CaseLaw indicated that no opinion was found. For those we added a link to the PDF so users could see the opinion themselves.

@mlissner
Member Author

mlissner commented Aug 6, 2024

Very helpful, thanks Bill. So I think the changes that the folks at LIL made are all to the body of the case. Can you comment on which of our fixes could intersect with that, if any? And if some do, can you say whether you're confident that we would have our fixes in this repo?

Looking at the things you posted above, I'm thinking maybe, just maybe, this is easier than we think, if LIL made fixes to one part of the JSON, and we made them to another.
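
To make that idea concrete, here is a minimal sketch of a key-level three-way merge, assuming each case is a flat JSON object and that we still have the original Harvard record to compare against. The field names (`decision_date`, `casebody`) and the sample values are hypothetical placeholders, not the real schema, and this is not how either project merges data today.

```python
# Sketch only: field names and values below are illustrative assumptions.
_MISSING = object()  # distinguishes "key absent" from an explicit null


def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base` key by key.

    Returns (merged, conflicts), where `conflicts` lists the keys that both
    sides changed to different values. Our value is kept provisionally there.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (or neither touched the key)
            value = o
        elif o == b:      # only their side changed it
            value = t
        elif t == b:      # only our side changed it
            value = o
        else:             # both changed it, differently: needs a human
            conflicts.append(key)
            value = o
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


base = {"decision_date": "1901-13-02", "casebody": "<p>orignal text</p>"}
ours = {"decision_date": "1901-12-02", "casebody": "<p>orignal text</p>"}
theirs = {"decision_date": "1901-13-02", "casebody": "<p>original text</p>"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

If the two sets of fixes really do touch different parts of the JSON, `conflicts` comes back empty and the merge is mechanical. Any key that shows up in `conflicts` is one where our fix and LIL's overlap and someone would have to look at it by hand.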

@jtmst jtmst moved this from To do to In progress in Harvard-Kind-FLP Collaboration Aug 7, 2024