Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Syncing of DB entries to Translation.io #553

Open
aaronskiba opened this issue Dec 12, 2023 · 4 comments
Open

Optimize Syncing of DB entries to Translation.io #553

aaronskiba opened this issue Dec 12, 2023 · 4 comments
Assignees

Comments

@aaronskiba
Copy link
Collaborator

aaronskiba commented Dec 12, 2023

In config/initializers/translation.rb, we have database tables and columns specified inside of the config.db_fields hash. Currently, all db entries corresponding to those (table, column) values are being sync'd to translation.io.

We do not need to sync all of these entries. Rather, we only need the ones that pertain to the default funder org.

This feature should be optimized so only fields related to the default funder org are sync'd to translation.io.

@aaronskiba
Copy link
Collaborator Author

#745 adds an ActiveRecord query within config/initializers/translation.rb. It queries all of the templates-associated tables ('templates', 'phases', 'sections', 'questions', 'annotations', and 'question_options') and filters them be the app's default org.

    Template.includes(phases: { sections: { questions: %i[annotations question_options] } })
            .where(org_id: Org.find_by!(name: default_org_name).id)
            .pluck(:title, :description,
                   'phases.title', 'phases.description',
                   'sections.title', 'sections.description',
                   'questions.text', 'questions.default_value',
                   'annotations.text',
                   'question_options.text')
            .flatten.uniq

The tables and columns here correspond with the templates-related tables that were originally in the config.db_fields hash.

Here is the output when executing the proc that this ActiveRecord query is contained within:

2.7.6 :001 > x = templates_data_for_default_org_proc.call
Creating scope :publicly_visible. Overwriting existing method Template.publicly_visible.
Creating scope :organisationally_visible. Overwriting existing method Template.organisationally_visible.
  Org Load (0.6ms) 
SELECT "orgs".*
FROM "orgs"
WHERE "orgs"."name" = $1 LIMIT $2  [["name", "Portage Network"], ["LIMIT", 1]]
   (213.6ms) 
SELECT "templates"."title", "templates"."description", "phases"."title", "phases"."description", "sections"."title", "sections"."description", "questions"."text", "questions"."default_value", "annotations"."text", "question_options"."text"
FROM "templates"
LEFT OUTER JOIN "phases" ON "phases"."template_id" = "templates"."id"
LEFT OUTER JOIN "sections" ON "sections"."phase_id" = "phases"."id"
LEFT OUTER JOIN "questions" ON "questions"."section_id" = "sections"."id"
LEFT OUTER JOIN "annotations" ON "annotations"."question_id" = "questions"."id"
LEFT OUTER JOIN "question_options" ON "question_options"."question_id" = "questions"."id"
WHERE "templates"."org_id" = $1  [["org_id", 8]]
 => ["CASRAI Data in Publications DMP", "<p>This is an hypothetical data management plan to support the implementation of an open access...

@aaronskiba
Copy link
Collaborator Author

aaronskiba commented May 9, 2024

2.7.6 :001 > Template.includes(phases: { sections: { questions: %i[annotations question_options] } }).pluck(:title, :description,
2.7.6 :002 >   'phases.title', 'phases.description',
2.7.6 :003 >   'sections.title', 'sections.description',
2.7.6 :004 >   'questions.text', 'questions.default_value',
2.7.6 :005 >   'annotations.text',
2.7.6 :006 >                      'question_options.text').flatten.uniq.count
Creating scope :publicly_visible. Overwriting existing method Template.publicly_visible.
Creating scope :organisationally_visible. Overwriting existing method Template.organisationally_visible.
   (744.2ms)  SELECT "templates"."title", "templates"."description", "phases"."title", "phases"."description", "sections"."title", "sections"."description", "questions"."text", "questions"."default_value", "annotations"."text", "question_options"."text" FROM "templates" LEFT OUTER JOIN "phases" ON "phases"."template_id" = "templates"."id" LEFT OUTER JOIN "sections" ON "sections"."phase_id" = "phases"."id" LEFT OUTER JOIN "questions" ON "questions"."section_id" = "sections"."id" LEFT OUTER JOIN "annotations" ON "annotations"."question_id" = "questions"."id" LEFT OUTER JOIN "question_options" ON "question_options"."question_id" = "questions"."id"
 => 5779
2.7.6 :007 > templates_data_for_default_org_proc.call.count
  Org Load (0.8ms)  SELECT "orgs".* FROM "orgs" WHERE "orgs"."name" = $1 LIMIT $2  [["name", "Portage Network"], ["LIMIT", 1]]
   (205.5ms)  SELECT "templates"."title", "templates"."description", "phases"."title", "phases"."description", "sections"."title", "sections"."description", "questions"."text", "questions"."default_value", "annotations"."text", "question_options"."text" FROM "templates" LEFT OUTER JOIN "phases" ON "phases"."template_id" = "templates"."id" LEFT OUTER JOIN "sections" ON "sections"."phase_id" = "phases"."id" LEFT OUTER JOIN "questions" ON "questions"."section_id" = "sections"."id" LEFT OUTER JOIN "annotations" ON "annotations"."question_id" = "questions"."id" LEFT OUTER JOIN "question_options" ON "question_options"."question_id" = "questions"."id" WHERE "templates"."org_id" = $1  [["org_id", 8]]
 => 1557 

On integration branch:

aaron@ubuntu:~/Documents/GitHub/roadmap
$ bundle exec rake translation:sync
...
271 keys/strings are in Translation.io but not in your current branch.

On PR #745 With gem 'translation' branch: 'aaron/features/callable-queries' set in Gemfile:

aaron@ubuntu:~/Documents/GitHub/roadmap
$ bundle exec rake translation:sync
...
4476 keys/strings are in Translation.io but not in your current branch.

4222 (5779 - 1557) db entries are filtered out when we apply the .where(org_id: Org.find_by!(name: default_org_name).id) filter.

There are 4205 (4476 - 271) additional purge-able translation.io keys when we apply the default org filter to the templates-related tables.

The 17 entry discrepancy (4222 - 4205) may not be an issue; of the 4222 template-related entries being filtered out due to the .where(org_id: Org.find_by!(name: default_org_name).id) condition, some of these exact entries may exist in other synced db tables, or as gettext (_()) strings within the codebase. As an example of this, let template.title = 'xyz' for a template where template.org_id != Org.find_by!(name: default_org_name).id but not for any templates where template.org_id == Org.find_by!(name: default_org_name).id. 'xyz' would account for one of the 4222 entry elements that are filtered out when applying .where(org_id: Org.find_by!(name: default_org_name).id). However, if _('xyz') exists somewhere in the codebase, or if 'xyz' exists in another table column that we are syncing to the db, then it would NOT be purge-able, and would therefore be part of the 17 discrepancies we are encountering.

@aaronskiba
Copy link
Collaborator Author

The below query returns all distinct template-related-entries in the db, but first removes any of those entries which are also associated with a default_org template:

def test
  default_org_name = Rails.application.secrets.default_funder_name
  proc do
    # Return all templates data pertaining to default_org
    Template.includes(phases: { sections: { questions: %i[annotations question_options] } })
            .pluck(:title, :description,
                   'phases.title', 'phases.description',
                   'sections.title', 'sections.description',
                   'questions.text', 'questions.default_value',
                   'annotations.text',
                   'question_options.text')
            .flatten.uniq - Template.includes(phases: { sections: { questions: %i[annotations question_options] } })
                                    .where(org_id: Org.find_by!(name: default_org_name).id)
                                    .pluck(:title, :description,
                                           'phases.title', 'phases.description',
                                           'sections.title', 'sections.description',
                                           'questions.text', 'questions.default_value',
                                           'annotations.text',
                                           'question_options.text')
                                    .flatten.uniq
  end
end
aaron@ubuntu:~/Documents/GitHub/roadmap
$ rails c
Running via Spring preloader in process 803131
Loading development environment (Rails 6.1.7.6)
irb: warn: can't alias context from irb_context.
2.7.6 :001 > x = test.call
Creating scope :publicly_visible. Overwriting existing method Template.publicly_visible.
   (763.8ms)  SELECT "templates"."title", "templates"."description", "phases"."title", "phases"."description", "sections"."title", "sections"."description", "questions"."id" LEFT OUTER JOIN "sections" ON "sections"."phase_id" = "phases"."id" LEFT OUTER JOIN "questions" ON "questions"."section_id" = "sections"."id" LEFT OUTER JOIN "annotations" ON "annotations"."question_id" = "questions"."id" LEFT OUTER JOIN "question_options" ON "question_options"."question_id" = "questions"."id"
  Org Load (1.1ms)  SELECT "orgs".* FROM "orgs" WHERE "orgs"."name" = $1 LIMIT $2  [["name", "Portage Network"], ["LIMIT", 1]]
   (199.1ms)  SELECT "templates"."title", "templates"."description", "phases"."title", "phases"."description", "sections"."title", "sections"."description", "questions"."text", "questions"."default_value", "annotations"."text", "question_options"."text" FROM "templates" LEFT OUTER JOIN "phases" ON "phases"."template_id" = "templates"."id" LEFT OUTER JOIN "sections" ON "sections"."phase_id" = "phases"."id" LEFT OUTER JOIN "questions" ON "questions"."section_id" = "sections"."id" LEFT OUTER JOIN "annotations" ON "annotations"."question_id" = "questions"."id" LEFT OUTER JOIN "question_options" ON "question_options"."question_id" = "questions"."id" WHERE "templates"."org_id" = $1  [["org_id", 8]]
 => ["MICYRN Data Management Plan Template", "The template is to help researchers associated with MICYRN member organizations to create data management plan that... 
2.7.6 :002 > x.count
 => 4222 
2.7.6 :003 > x.include?('Test')
 => true 
2.7.6 :004 > 
// app/views/paginable/plans/_index.html.erb
<%= plan.visibility === 'is_test' ? _('Test') : sanitize(display_visibility(plan.visibility)) %>

So 'Test' is an actual one of the 17 discrepancies that we are noticing.

@aaronskiba
Copy link
Collaborator Author

Testing the plucked columns in steps:

.pluck(:title, :description) (templates)
Screenshot from 2024-05-10 11-36-20

2.7.6 :002 > a = Template.includes(phases: { sections: { questions: %i[annotations question_options] } }).where(org_id: Org.find_by!(name: Rails.application.secrets.default_funder_name).id).pluck(:title, :description).flatten.uniq
2.7.6 :003 > a.count
 => 67

'phases.title', 'phases.description'
Screenshot from 2024-05-10 11-45-25

 2.7.6 :004 > b = Template.includes(phases: { sections: { questions: %i[annotations question_options] } }).where(org_id: Org.fi
nd_by!(name: Rails.application.secrets.default_funder_name).id).pluck(:title, :description, 'phases.title', 'phases.descriptio
n').flatten.uniq
2.7.6 :005 > (b-a).count
 => 50

'sections.title', 'sections.description'
Screenshot from 2024-05-10 12-00-15

 c = Template.includes(phases: { sections: { questions: %i[annotations question_options] } }).where(org_id: Org.fi
nd_by!(name: Rails.application.secrets.default_funder_name).id).pluck(:title, :description, 'phases.title', 'phases.descriptio
n', 'sections.title', 'sections.description').flatten.uniq
 2.7.6 :007 > (c-b).count
 => 164

'questions.text', 'questions.default_value',
Screenshot from 2024-05-10 12-49-28

 2.7.6 :008 > d = Template.includes(phases: { sections: { questions: %i[annotations question_options] } }).where(org_id: Org.fi
nd_by!(name: Rails.application.secrets.default_funder_name).id).pluck(:title, :description, 'phases.title', 'phases.descriptio
n', 'sections.title', 'sections.description','questions.text','questions.default_value').flatten.uniq
2.7.6 :009 > (d-c).count
 => 534

'annotations.text'
Screenshot from 2024-05-10 13-00-47

 2.7.6 :010 > e = Template.includes(phases: { sections: { questions: %i[annotations question_options] } }).where(org_id: Org.fi
nd_by!(name: Rails.application.secrets.default_funder_name).id).pluck(:title, :description, 'phases.title', 'phases.descriptio
n', 'sections.title', 'sections.description','questions.text','questions.default_value','annotations.text').flatten.uniq
2.7.6 :011 > (e-d).count
 => 721

'question_options.text'
Screenshot from 2024-05-10 13-05-59

 2.7.6 :012 > f = Template.includes(phases: { sections: { questions: %i[annotations question_options] } }).where(org_id: Org.fi
nd_by!(name: Rails.application.secrets.default_funder_name).id).pluck(:title, :description, 'phases.title', 'phases.descriptio
n', 'sections.title', 'sections.description','questions.text','questions.default_value','annotations.text','question_options.t
ext').flatten.uniq
2.7.6 :013 > (f-e).count
 => 21

Total translation.io uploads: 64 + 48 + 150 + 534 + 721 + 18 = 1535


`rails console total: 67 + 50 + 164 + 534 + 721 + 21 = 1557`
templates_data_for_default_org_proc.call.count
=> 1557

So based on my db dump (dmp_prod.20240312110101.sql), some of the templates-related entries we are trying to upload already exist either in existing _() strings and/or as db entries that correspond with the following:

'Theme' => %w[title description],
'QuestionFormat' => %w[title description],
'ResearchDomain' => %w[label]

Specifically, there are 22 of these entries: 3 from the templates table, 2 from the phases table, 14 from the sections table, and 3 from the question_options table (3 + 2 + 14 + 3 = 22).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant