
Figure out how to include authority data with bibliographic data during ES index #4

Open
4 tasks
domm opened this issue Jul 22, 2024 · 4 comments

Comments

@domm

domm commented Jul 22, 2024

In Koha Chat we were told that it should be possible to merge authority data with bibliographic data into the Elasticsearch document during indexing.

  • Is this indeed possible?
  • Are the fields being merged hardcoded?
    • If yes, we need to improve Koha to get those fields from some config/Syspref (this will need to be a separate Koha ticket)
  • Actually merge 034s and 034t from authority into biblio
@tadzik
Collaborator

tadzik commented Aug 30, 2024

Koha currently only does this for the "see from" and "see also" fields. It's hardcoded and hidden behind the IncludeSeeFromInSearches and IncludeSeeAlsoFromInSearches preferences. The code handling these (Koha::Filter::MARC::EmbedSeeFromHeadings) is tailored specifically to these two.

According to cait:

It's off by default.

It probably makes the index bigger, and maybe the search results more confusing, because you don't see why you found a record. I think the geographic data is much smaller than all the different name forms of "Goethe" that you would not want to show in a record. Different use cases.

All that to say... if you develop some magic, I'd make it separate.

Which, to me, sounds like "build a new thing instead of extending the existing one, and it'll probably be safe to have it on by default". I'd still keep it at least a little bit generic, since I imagine it being useful for other things as well. But for our case, simple logic like this would probably suffice: if a record contains a 651 (SUBJECT ADDED ENTRY--GEOGRAPHIC NAME), look up the authority it references and treat its geographic fields as if we had found them in the biblio record itself.
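A minimal sketch of that lookup logic, written in Python rather than Koha's Perl and with MARC records modeled as plain lists of `(tag, [(code, value), ...])` tuples. The `authorities` dict stands in for an authority lookup by id (in Koha, something like `Koha::MetadataRecord::Authority->get_from_authid`), and `rules` is a simplified, hypothetical stand-in for Koha's search field mappings; `embed_geographic_names` is an illustrative name, not a Koha function.

```python
def subfield(subfields, code):
    """Return the first value of subfield `code` in a field's subfield list, or None."""
    for c, value in subfields:
        if c == code:
            return value
    return None


def embed_geographic_names(biblio, authorities, rules, document):
    """Hypothetical helper: for each 651 in `biblio`, follow its $9 to an authority
    record and index that authority's 034 subfields as if they were in the biblio."""
    mappings = rules.get("034", {})  # simplified stand-in for Koha's field mappings
    for tag, subfields in biblio:
        if tag != "651":
            continue
        authid = subfield(subfields, "9")  # $9 holds the linked authority id
        authority = authorities.get(authid) if authid else None
        if authority is None:
            continue  # no link, or a dangling link: nothing to embed
        for auth_tag, auth_subfields in authority:
            if auth_tag != "034":
                continue
            for code, value in auth_subfields:
                for target in mappings.get(code, []):
                    document.setdefault(target, []).append(value)
    return document


# Illustrative data only: authority "42" is a made-up geographic authority.
authorities = {"42": [("151", [("a", "Wien")]), ("034", [("s", "48.2"), ("t", "16.37")])]}
biblio = [("245", [("a", "Example map")]), ("651", [("a", "Wien"), ("9", "42")])]
rules = {"034": {"s": ["lat"], "t": ["lon"]}}
doc = embed_geographic_names(biblio, authorities, rules, {})
```

The biblio record itself never gains a 034; only the generated search document does, which keeps the stored MARC untouched.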

I'll build a prototype of this and see how it works out.

@domm
Author

domm commented Aug 30, 2024

Koha currently only does this for the "see from" and "see also" fields. It's hardcoded and hidden behind the IncludeSeeFromInSearches and IncludeSeeAlsoFromInSearches preferences. The code handling these (Koha::Filter::MARC::EmbedSeeFromHeadings) is tailored specifically to these two.

That's what I assumed.

Which, to me, sounds like "build a new thing instead of extending the existing one, and it'll probably be safe to have it on by default". I'd still keep it at least a little bit generic, since I imagine it being useful for other things as well. But for our case, simple logic like this would probably suffice: if a record contains a 651 (SUBJECT ADDED ENTRY--GEOGRAPHIC NAME), look up the authority it references and treat its geographic fields as if we had found them in the biblio record itself.

Generally, I agree.

In the current case (Geologische Bundesanstalt) we'll have to convert their data into MARC ourselves, so we can make sure that the geographic authorities will in fact be referenced from 651. (FYI, these can be geonames (i.e. a city/region, ...) or numbers pointing to a "Kartenblatt" (map sheet number) like https://www.bev.gv.at/Services/Produkte/Landkarten/OEK25V-UTM.html)

I'll build a prototype of this and see how it works out.

For a prototype, a hardcoded mapping is enough. But I think it would make sense to plan the prototype so that we can later replace the hardcoded mapping with mapping(s) coming from a config file. Though defining these mappings for e.g. a polygon will be interesting :-)

Anyway, we could then also use this to e.g. append some data to a bibliographic record, so maybe the mappings will also need to define a "method".

So something like:

{
  "651": [
    { "auth": "034s", "elastic": "lat", "method": "set_geopoint" },
    { "auth": "034t", "elastic": "lon", "method": "set_geopoint" },
    { "auth": "034defg", "elastic": "coordinates", "method": "set_georectangle" }
  ],
  "100": [
    { "auth": "123x", "elastic": "author.name", "method": "append" }
  ]
}

But again, this is probably premature for now.
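To get a feel for how such a config could drive indexing, here is a rough Python sketch that dispatches on the `"method"` key. It assumes the authority side uses field 034 (coded cartographic data: $d west, $e east, $f north, $g south, $s/$t G-ring latitude/longitude) with values already in decimal degrees, which 034 does not guarantee. The handler names follow the config idea above, but their behavior, the `location` field name, and `apply_mappings` are all hypothetical. Output shapes follow Elasticsearch's `geo_point` object form (`{"lat": ..., "lon": ...}`) and `geo_shape` envelope form (`[[west, north], [east, south]]`).

```python
import re

# Hypothetical mapping config, modeled on the JSON idea above.
MAPPINGS = {
    "651": [
        {"auth": "034s", "elastic": "lat", "method": "set_geopoint"},
        {"auth": "034t", "elastic": "lon", "method": "set_geopoint"},
        {"auth": "034defg", "elastic": "coordinates", "method": "set_georectangle"},
    ],
}

GEOPOINT_FIELD = "location"  # hypothetical ES field of type geo_point


def parse_auth_spec(spec):
    """Split an 'auth' spec such as '034defg' into ('034', 'defg')."""
    m = re.fullmatch(r"(\d{3})([a-z0-9]+)", spec)
    if not m:
        raise ValueError(f"bad auth spec: {spec!r}")
    return m.group(1), m.group(2)


def first(record, tag, code):
    """First value of tag$code in a record of (tag, [(code, value), ...]) tuples."""
    for t, subfields in record:
        if t == tag:
            for c, v in subfields:
                if c == code:
                    return v
    return None


def set_geopoint(doc, name, tag, codes, auth):
    # Writes one component ("lat" or "lon") of the geo_point object.
    value = first(auth, tag, codes[0])
    if value is not None:
        doc.setdefault(GEOPOINT_FIELD, {})[name] = float(value)


def set_georectangle(doc, name, tag, codes, auth):
    # Expects exactly four codes in west, east, north, south order (034 $d $e $f $g).
    west, east, north, south = (first(auth, tag, c) for c in codes)
    if None in (west, east, north, south):
        return  # incomplete bounding box: skip rather than index garbage
    doc[name] = {
        "type": "envelope",
        "coordinates": [[float(west), float(north)], [float(east), float(south)]],
    }


def append(doc, name, tag, codes, auth):
    # Appends every matching subfield value to a list-valued field.
    for c in codes:
        value = first(auth, tag, c)
        if value is not None:
            doc.setdefault(name, []).append(value)


METHODS = {"set_geopoint": set_geopoint, "set_georectangle": set_georectangle, "append": append}


def apply_mappings(biblio_tag, auth, doc, mappings=MAPPINGS):
    """Apply every rule configured for `biblio_tag` to the linked authority record."""
    for rule in mappings.get(biblio_tag, []):
        tag, codes = parse_auth_spec(rule["auth"])
        METHODS[rule["method"]](doc, rule["elastic"], tag, codes, auth)
    return doc


# Made-up authority with a bounding box and a G-ring point, for illustration only.
auth = [("034", [("d", "13.0"), ("e", "13.5"), ("f", "47.9"), ("g", "47.6"),
                 ("s", "47.75"), ("t", "13.25")])]
doc = apply_mappings("651", auth, {})
```

Keeping the handlers in a plain dispatch table means a new "method" (say, for a polygon) is one more function plus one more entry, without touching the indexing loop.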

@tadzik
Collaborator

tadzik commented Aug 30, 2024

Quick and dirty, but this seems to work:

diff --git a/Koha/SearchEngine/Elasticsearch.pm b/Koha/SearchEngine/Elasticsearch.pm
index af9bdd97a1..08d100cf45 100644
--- a/Koha/SearchEngine/Elasticsearch.pm
+++ b/Koha/SearchEngine/Elasticsearch.pm
@@ -621,6 +621,10 @@ sub marc_records_to_documents {
                         $altscript = 1;
                     }
                 }
+                # Handle references to GEOGR_NAME authorities
+                if ($marcflavour eq 'marc21' && $tag eq '651') {
+                    $self->embed_geographic_name($field, $record_document, $data_fields_rules);
+                }
 
                 my $data_field_rules = $data_fields_rules->{$tag};
                 if ($data_field_rules) {
@@ -853,6 +857,40 @@ sub marc_records_to_documents {
     return \@record_documents;
 }
 
+sub embed_geographic_name {
+    my ($self, $field, $record_document, $rules) = @_;
+
+    my $authid = $field->subfield('9');
+    return unless $authid;
+    my $authority = Koha::MetadataRecord::Authority->get_from_authid($authid);
+    return unless $authority;
+
+    my $tag = '034';
+
+    my $auth_marc = $authority->record;
+    my @coordinate_fields = $auth_marc->field($tag);
+
+    for my $field (@coordinate_fields) {
+        my $data_field_rules = $rules->{$tag};
+        if ($data_field_rules) {
+            my $subfields_mappings = $data_field_rules->{subfields};
+            my $wildcard_mappings = $subfields_mappings->{'*'};
+            foreach my $subfield ($field->subfields()) {
+                my ($code, $data) = @{$subfield};
+                my $mappings = $subfields_mappings->{$code} // [];
+                if (@{$mappings}) {
+                    $self->_process_mappings($mappings, $data, $record_document, {
+                            data_source => 'subfield',
+                            code => $code,
+                            field => $field
+                        }
+                    );
+                }
+            }
+        }
+    }
+}
+
 =head2 _marc_to_array($record)
 
     my @fields = _marc_to_array($record)

@tadzik
Collaborator

tadzik commented Sep 3, 2024

Submitted as a bug with a patch to the Koha Bugzilla for further discussion: https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=37821
