Sync Temporary Goobi Ingest objects with Preservica #2465

Open
1 of 4 tasks
sshetenhelm opened this issue Apr 17, 2023 · 38 comments
Assignees
Labels
blocked waiting waiting on external resources

Comments

@sshetenhelm

sshetenhelm commented Apr 17, 2023

Follow up to #2426

All objects uploaded with the Goobi Temporary Ingests need to be synced to their files in Preservica. A spreadsheet with these is here https://docs.google.com/spreadsheets/d/1iqsayDnZz_ur8dH5iUHGap9kDfQkn2vMJ1dQms0hXBc/edit#gid=1783466688

Some materials have not yet been ingested into Preservica, so they must be ingested there before we can reassociate them in DCS. These are:

  • Law
  • Medical

The following are already in Preservica and could be reassociated now:

  • Music
  • MSSA
@motropuk

We should bring this into the sprint, as we are working on it. I have rewritten this ticket because the Law and Medical objects are being ingested into Preservica right now by DPS. The Music and MSSA items are already in Preservica, so they could be reassociated now. I would be happy to take this on.

@motropuk motropuk self-assigned this Jun 15, 2023
@motropuk

Music and MSSA items have now been updated with Preservica info in Prod. The next steps are to get the Law and Medical content into Preservica and connected up.

@motropuk

The Music batch is actually taking a while, so we need to keep an eye on it: https://collections.library.yale.edu/management/batch_processes/13334

@motropuk

motropuk commented Jul 24, 2023

Music update batch process stalled at OID 32205563. As this parent failed, it seems the batch itself then stalled. There were 88 parents on the manifest to update. It got to number 25 (which failed) and then didn't process any more. We should discuss whether this is intended functionality. If it is, the job needs more status reporting to make it clear that one parent failed and that the rest of the batch was therefore not processed.

Batch: https://collections.library.yale.edu/management/batch_processes/13334

Remaining parents to update

Music_parent_preservica_update.csv

Will run an updated manifest with the remaining parents to update.
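
To make the status-reporting request above concrete, here is a minimal sketch of a batch loop that records a failure per parent and keeps going, instead of stopping at the first failed OID. This is not how DCS currently works, and every name in it (`BatchReport`, `run_batch`, `update_parent`) is hypothetical; it only illustrates the behaviour we might want to discuss.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class BatchReport:
    """Per-batch outcome: which parent OIDs updated and which failed (with the reason)."""
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # OID -> error message

    def summary(self) -> str:
        total = len(self.succeeded) + len(self.failed)
        return f"{len(self.succeeded)} of {total} parents updated, {len(self.failed)} failed"


def run_batch(oids: Iterable[str], update_parent: Callable[[str], None]) -> BatchReport:
    """Try every parent; record a failure and continue rather than stalling the batch."""
    report = BatchReport()
    for oid in oids:
        try:
            update_parent(oid)  # hypothetical per-parent update step
        except Exception as exc:
            report.failed[oid] = str(exc)
        else:
            report.succeeded.append(oid)
    return report
```

With a report like this, the batch page could show which parent failed and why, and the remaining parents on the manifest would still be attempted.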

@motropuk

Update on the above: it looks like the batch failed due to issues with API availability in Preservica. DPS are investigating a fix. Even though the first 24 parents say they completed, when you go into the parents it looks as if neither the Solr records nor the PDFs have regenerated. So the fix is probably to resolve the Preservica API issues and then run the whole parent update job for Music again.

Music_parent_preservica_update.csv

@DraxIndustries79 DraxIndustries79 added the waiting waiting on external resources label Jul 27, 2023
@DraxIndustries79
Contributor

Waiting for previous ticket to close

@motropuk motropuk removed the waiting waiting on external resources label Aug 16, 2023
@motropuk

Issue with https://collections.library.yale.edu/management/parent_objects/10022080 but different to the one before.

This was a parent I tried to update previously and it brought in the new children from Preservica but also left the existing children.

Tried a straight resync (https://collections.library.yale.edu/management/batch_processes/13493) and it did nothing. It said that DCS matched Preservica?

So then I cleared out all of the existing children for this parent, so the parent was left with 0 children. Then ran a resync with Preservica.

This brought in the children, but their sort order does not match Preservica.

DCS
Image

Preservica
https://github.com/yalelibrary/YUL-DC/assets/13023486/0b32879b-0153-451e-b80f-8f83c00b02c0

The parent is still not displaying in Blacklight (https://collections.library.yale.edu/catalog/10022080), but I don't know if this is because jobs are still running. There is a large backlog of jobs (mainly PDF jobs) which might be holding this up.

@motropuk

Testing script and batch process CSVs to test the Preservica issues in DCS UAT - for @K8Sewell

Testing script for DCSPreservica Issues - 08162023.pdf
create_parent_objects_preservicatesting.csv
update_parent_objects_preservicatesting.csv

@K8Sewell

Getting some Preservica errors (https://collections-uat.library.yale.edu/management/batch_processes/1504) but still investigating. Will retry the process and see if I can discern what is hanging us up.

@K8Sewell

K8Sewell commented Aug 21, 2023

Still working through some Preservica issues, so putting this down for a little bit while Preservica comes back online. Resolved, but still working on confirming parity with objects in the test Preservica instance.

Testing Results
Create Parent Script - Successful match of parent object 900148858 to test Preservica folder 13527050_39002126219543, and of 900149766 to folder 13527069_39002126219543

Update Parent Script - Failed - kept old child objects instead of removing them. Will craft some tests that should reveal why they are not being removed as expected.

@K8Sewell

PR ready for review - yalelibrary/yul-dc-management#1247

@K8Sewell

Deployed to Test with release v2.63.1, but it will need to be deployed to UAT for testing.

@K8Sewell

Not the result I expected. This should have found the old child records and cleared them out. Taking back to in progress.

Image

@K8Sewell

PR ready for review yalelibrary/yul-dc-management#1251 It's not elegant but it will get us past the issue we had with the last attempt.

@K8Sewell

Deployed to UAT for testing with release v2.63.2

@K8Sewell

Failing for a checksum mismatch now so taking back to in progress

Image

@K8Sewell

K8Sewell commented Aug 28, 2023

I think the issue is fixed. Although an error was raised because of a checksum mismatch, the parent object 900124050 now matches the 46 child objects in Test Preservica for structural object ...76868, and they appear to be in the correct order as well. I'm currently testing the other parent object, 900099833, that is up for update testing. The before screenshot below shows both the old and the Preservica child objects, but hopefully once this object has processed (waiting on a few delayed jobs) we will see only the expected 54 child objects for structural object ...babeb.

Before

Image

@K8Sewell

I'm not 100% sure, but I think I need to wait for the issue with test Preservica to be resolved before I can test this. It's skipping the import due to a timeout. Right now I'm not able to log in to the test Preservica instance on Firefox or Chrome.

Image

@motropuk

motropuk commented Aug 29, 2023

@K8Sewell I have reported the Preservica Test outage to our digital preservation folks. They will work on a fix.

@K8Sewell

Looks like we are in a good place again. The last parent object to update has the correct 54 children now (instead of 108).

Image

Image

@sshetenhelm
Author

Need to roll work into PROD but still keep the ticket for other things. Can split out.

@sshetenhelm
Author

Spawning jobs again.

@motropuk

I tried to resync this object again in Production and it is still not working as expected. The notable issue is that the sort order is still wrong (https://collections.library.yale.edu/management/parent_objects/10022080). Additionally, the parent will not display in Blacklight. Is this possibly just an issue with this parent that we need to fix?

@K8Sewell

Can we change the Bitstream filename over in Preservica? That's how the caption and ordering are created, and thus what is throwing the sort order off. I am unable to find the matching record in Preservica test, so if anyone has a link to that, it would be greatly appreciated. I'd like to confirm that the bitstream filename matches what is in Preservica for parent object 10022080, change it from _1 to _01 so that it captures the correct ordering, and then try updating the parent to confirm the sort order gets fixed. In the meantime I can draft up some logic that will adjust the filename to avoid this order issue, but it feels like a bit of overkill now that I have an idea why the sort order was incorrect.
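
As a rough illustration of the "_1 to _01" idea, the sketch below zero-pads a trailing numeric suffix in a filename. The filename pattern (an underscore, digits, then an optional extension) and the example names are assumptions, not confirmed Preservica bitstream names, and `pad_sequence_suffix` is a hypothetical helper rather than anything in DCS.

```python
import re

# Assumed pattern: "<anything>_<digits>" followed by an optional ".ext" at the end.
_SUFFIX = re.compile(r"_(\d+)(\.[^.]+)?$")


def pad_sequence_suffix(filename: str, width: int = 2) -> str:
    """Zero-pad the trailing sequence number so string sorting matches numeric order."""
    def _pad(match: re.Match) -> str:
        extension = match.group(2) or ""
        return f"_{match.group(1).zfill(width)}{extension}"

    return _SUFFIX.sub(_pad, filename)


print(pad_sequence_suffix("10022080_1.tif"))   # hypothetical filename
print(pad_sequence_suffix("10022080_10.tif"))  # already two digits, left unchanged
```

Note that a fixed width only helps up to that many digits: with `width=2`, a parent with 100 or more children would need `width=3`.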

@K8Sewell

11 days of work as of 9.18.

As per David, "so it looks like the API returns as lexicographic sorting, which is alphabetic sorting of the numbers instead of numerically"

Will split the work of adjusting how we interpret the sort order returned by the API into a separate ticket, #2621.
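
For anyone picking up the sorting ticket, this small sketch shows the difference between lexicographic and numeric ordering on filenames. The filenames are made up for illustration, and `natural_key` is a hypothetical helper, not existing DCS code.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and digit runs so digit runs compare as integers."""
    parts = re.split(r"(\d+)", name)
    # With one capture group, re.split puts the digit runs at the odd indexes.
    return [int(part) if index % 2 else part.lower() for index, part in enumerate(parts)]


names = ["page_1.tif", "page_10.tif", "page_2.tif"]  # hypothetical filenames

print(sorted(names))                   # lexicographic: page_1, page_10, page_2
print(sorted(names, key=natural_key))  # numeric-aware: page_1, page_2, page_10
```

Sorting with a key like this on the DCS side would avoid having to rename bitstreams in Preservica, at the cost of DCS order no longer being a direct copy of the API order.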

@DraxIndustries79 DraxIndustries79 added the waiting waiting on external resources label Sep 22, 2023
@DraxIndustries79
Contributor

Waiting for sync issue to be resolved

@sshetenhelm
Author

Just a note to say that Medical objects are now in Preservica.

I may try to resync one or two objects, to see if we have a similar issue as the one MUS object we are trying to resolve.

@sshetenhelm
Author

Attempted to sync parent 32320833 with Preservica files. Received 'Unable to login' error in DCS. Confirmed with Digital Preservation that the object has the correct security tag in Preservica (one that the DCS user s_dcs_medical should have access to), and that the correct structure and representation type were added on the spreadsheet. Will likely need to investigate on our end, as the Preservica stuff should be fine.

https://collections.library.yale.edu/management/batch_processes/14323

@sshetenhelm
Author

Still experiencing the login issue with Medical.

@sshetenhelm sshetenhelm removed the waiting waiting on external resources label Dec 6, 2023
@sshetenhelm
Author

Login issue with Medical fixed, will start work on these again.

@sshetenhelm
Author

sshetenhelm commented Dec 7, 2023

For Medical, tried to update parent 32320833

Received the following error:
Parent OID: 32320833 because of Request error 404 <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Error><ExtendedMessage>No Information Object with ref but there another type of entity with the ref</ExtendedMessage><MessageKey>entity.does.not.exist</MessageKey></Error> for /structural-objects/426e5201-1d21-4b44-8b01-0a660383ee59

As you can see from the error, I put in "Structural" and not "Information", so I don't understand why it's telling me there is no Information object with that ref but a different type of entity with ref.

In Preservica, the object is a Structural Object with a Preservation representation type. As such, it seems like this is a DCS issue. Could we please investigate?
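
To help whoever investigates, here is a small sketch that pulls the message key and extended message out of the XML error body quoted above, using only the Python standard library. It assumes the error body has no XML namespace, as in the text pasted here; `parse_error` is a hypothetical helper and says nothing about how DCS handles the response today.

```python
import xml.etree.ElementTree as ET

# Error body copied from the batch process message above.
ERROR_BODY = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<Error><ExtendedMessage>No Information Object with ref but there another "
    "type of entity with the ref</ExtendedMessage>"
    "<MessageKey>entity.does.not.exist</MessageKey></Error>"
)


def parse_error(body: str) -> dict:
    """Return the message key and extended message from an <Error> document."""
    root = ET.fromstring(body.encode("utf-8"))
    return {
        "key": root.findtext("MessageKey"),
        "message": root.findtext("ExtendedMessage"),
    }


print(parse_error(ERROR_BODY)["key"])
```

Logging the key and the request path together would make it easier to see whether the entity type in the request matches the one named in the message.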

@sshetenhelm
Author

Created ticket #2691 for the above issue.

@sshetenhelm
Author

@motropuk Do you remember if your child OIDs were retained when you synced your Goobi objects, or were they replaced?

Today, I:

Used ‘Update Parent Objects’ batch process to add Preservica information to Parent 32329442, which had one child object (32329443).

Parent now has two child objects with the following oids:
33093723
33093724

The caption for both children is 32329443.tif.

The old child, 32329443, appears to have been deleted; it is no longer in the Child Objects data table.

I resynced with Preservica, and it retained both new children with the new OIDs. The folder in Preservica located at the assigned UUID only has one image.

We should not add Preservica information for any more Medical objects until we can confirm that (a) the original child OID can be retained and (b) the correct number of children are created for each object. These issues might be solved with #2510 ?
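
The two conditions above, (a) and (b), could be checked after each sync with something like the sketch below. It is a minimal illustration only: `check_resync` is a hypothetical helper, and the child OID lists and expected count would have to come from DCS and Preservica by whatever means we already use.

```python
def check_resync(children_before, children_after, expected_count):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    after = set(children_after)

    # (a) every original child OID should still be present after the sync
    lost = [oid for oid in children_before if oid not in after]
    if lost:
        problems.append(f"original child OIDs not retained: {', '.join(lost)}")

    # (b) the number of children should match the number of images in Preservica
    if len(children_after) != expected_count:
        problems.append(f"expected child count {expected_count}, found {len(children_after)}")

    return problems


# Values from parent 32329442 as described above (one image in the Preservica folder).
for problem in check_resync(["32329443"], ["33093723", "33093724"], expected_count=1):
    print(problem)
```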

@motropuk

motropuk commented Jan 4, 2024

@sshetenhelm good question. Honestly, I can't remember, or at least I don't know if I checked. I think for the Music objects I was not too concerned whether the child OIDs were updated, so I didn't pay enough attention to that.

@sshetenhelm
Author

A sample of Preservica-reassociated Music objects have:

  • double children (Preservica and original images)
  • new child OIDs (for Preservica-ingested images)

Affected parents include:
https://collections.library.yale.edu/catalog/32204414
https://collections.library.yale.edu/catalog/32204693
https://collections.library.yale.edu/catalog/32202808

We should put this ticket back on hold until these issues are rectified. Will push back to backlog and pull in #2510 to fix errors (#2510 includes specifications to retain child OIDs and remove existing child objects without Preservica info).

@sshetenhelm
Author

The alternative is to decide whether we are comfortable creating new child OIDs for all objects and then manually deleting the prior "double" images.

@motropuk

motropuk commented Jan 5, 2024

@motropuk for the Music objects, it is fine to delete the double children, leaving just the Preservica-ingested images, to clean those parents up. Otherwise, it sounds like working on #2510 first is the best way forward.

@sshetenhelm
Author

Created #2703 for cleaning up parents

@sshetenhelm sshetenhelm added waiting waiting on external resources blocked labels Jan 5, 2024