Issue 299 match citations without page number #1226
This looks good technically, but I'm nervous about false positives otherwise. One way to reduce false positives would be to add per volume dates to the reporters_db so that something like 22 U.S. can have a date range associated with it. Short of that, it'd be nice to have some sense of how bad the false positive problem would be if we implemented this, but the only way to do that would be to keep a record of the cases that are matched as part of this and then to go back and check through them. Some kind of record-keeping process might not be a bad next step if the reporters_db approach doesn't work. |
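A minimal sketch of the per-volume date idea described above, assuming a lookup keyed by (reporter, volume). The `VOLUME_DATES` table, the `VolumeRange` type, and `is_plausible_match` are all hypothetical names for illustration; they are not part of reporters_db, and the date range shown is a placeholder value rather than data taken from it.

```python
from datetime import date
from typing import Dict, NamedTuple, Tuple


class VolumeRange(NamedTuple):
    """Earliest and latest decision dates published in one reporter volume."""
    start: date
    end: date


# Hypothetical table: (reporter abbreviation, volume) -> date range.
# The values are illustrative placeholders, not real reporters_db data.
VOLUME_DATES: Dict[Tuple[str, int], VolumeRange] = {
    ("U.S.", 22): VolumeRange(date(1824, 1, 1), date(1824, 12, 31)),
}


def is_plausible_match(reporter: str, volume: int, decision_date: date) -> bool:
    """Return False only when a known volume range excludes the candidate.

    If no range is recorded for the volume, the candidate cannot be ruled
    out, so it is kept.
    """
    volume_range = VOLUME_DATES.get((reporter, volume))
    if volume_range is None:
        return True
    return volume_range.start <= decision_date <= volume_range.end
```

A candidate case whose decision date falls outside the volume's range would be dropped before it is recorded as a match, which is one way the false positive rate could be reduced.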
@mattdahl, just checking in. Guessing corona has you pinned, but thought I'd ask what you thought about this one. I guess the questions are:
|
I don't know, I don't have a good sense of what the false positive rate would be. I think adding the reporter years to the reporters_db would be a great improvement though (would also improve matching performance generally). However, I don't think I have time right now to tackle that, though it depends on how tricky it looks. What's the format of those CSVs that you mentioned? |
Adding reporter years would be easy. @mattdahl it would take one pass through the DB to provide it in whatever format you would want it. |
I made an issue for the reporter thing, here: freelawproject/reporters-db#19 @mattdahl, if you were willing to take the citator part over the finish line, I think @flooie could build up the volume-reporter-date range stuff for you. What do you think? |
Yes, I'm certainly willing to be responsible for integrating the volume years into the citator -- that should be easy -- if @flooie is willing to deal with generating them! |
EXCELLENT! Thank you both! This will be cool. |
I just closed our freelawproject/reporters-db#19, because as @brianwc says:
Instead of doing the work of getting per-volume dates into our reporter DB, we can just do the above. What do you think @malteos ? |
I suspect this is probably the right conclusion (I have not fully digested the issue), but I am slightly confused on this. We're talking about citations like Maybe it's not so important to understand that, though, so maybe just ignore me. |
@mlissner Ah, this is obvious now that you point this out. That's an easy heuristic to use. However, I think a window of +/- 1 year is too small -- this 2015 opinion (for example) still doesn't have a page number assigned in the U.S. reporter, and that's from 5 years ago. So if we grabbed the year from an opinion citing it today, we'd have to look back at least 5 years to find it. (N.B., This has always infuriated me -- why would it take 5+ years for an official citation to be assigned to a case?? Maybe I've been doing something wrong?) @johnhawkinson I admit I have no idea about the principles behind when (or why) opinions are missing volumes or page numbers (or both); all I can say is that I encounter citations in the form |
The proper span of time varies with the reporter. The U.S. reporter is probably the worst of them, and hasn't published since 2012 (570 U.S.; see https://www.supremecourt.gov/opinions/boundvolumes.aspx). It probably should get a different lookaround window than any other reporter. If it's not too distracting, what's an example source for I'm not sure that any other slip opinions (other than the Supreme Court and the US Reporter) do that. So maybe it's all a special case. At least be aware of that. |
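To make the per-reporter window concrete, here is a rough sketch assuming the candidate years are derived from the year of the citing opinion. `DEFAULT_LOOKBACK_YEARS`, `LOOKBACK_YEARS`, and `candidate_years` are hypothetical names, the specific numbers are placeholders rather than tuned values, and capping the window at the citing year is an assumption of this sketch.

```python
from typing import Dict

# Hypothetical defaults: most reporters get a short lookback window.
DEFAULT_LOOKBACK_YEARS = 1

# Hypothetical per-reporter overrides. The U.S. reporter gets a much wider
# window because its page numbers can lag by years; 8 is a placeholder.
LOOKBACK_YEARS: Dict[str, int] = {
    "U.S.": 8,
}


def candidate_years(reporter: str, citing_year: int) -> range:
    """Years in which a page-less cited case may have been decided.

    Assumes the cited case was decided no later than the citing opinion,
    so the window only extends backwards from the citing year.
    """
    lookback = LOOKBACK_YEARS.get(reporter, DEFAULT_LOOKBACK_YEARS)
    return range(citing_year - lookback, citing_year + 1)
```

For example, `candidate_years("U.S.", 2020)` covers 2012 through 2020, while a reporter without an override only covers 2019 and 2020.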
Ah, interesting! I confess to being a bit of an "elitist" -- my research is really only about the Supreme Court -- so I don't read a lot of opinions from other courts 😅 For an example of the missing page citation, Carpenter v. United States cites |
Well, unbeknownst to me, @flooie got a big part of the reporter thing done yesterday. He now has reporter data for most of the Westlaw corpus in a JSON file. He's going to expand it with Lexis too and we'll be able to do this the hard way instead of the cheat (@brianwc's) way. I don't think we'll add the data from Case.law for the moment. Beyond being useful here, we just decided that the output is interesting enough in itself that we're probably going to do a blog post about it that'll enable people to answer things like, "Is SCOTUS indeed the slowest court to get citations?" |
(All these changes belong in eyecite now, closing.) |
Re: issue #299. Basically this PR does two things:
1 U.S. ____
)I'm not an expert on Solr so maybe there's more we could do with number 2; I just kept it pretty simple here.