-
Notifications
You must be signed in to change notification settings - Fork 9
Fluent-to-PO / PO-to-Fluent conversion actions #650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
Generated by 🚫 Danger |
"MIME-Version: 1.0\n", | ||
"Content-Type: text/plain; charset=UTF-8\n", | ||
"Content-Transfer-Encoding: 8bit\n", | ||
"Plural-Forms: #{plural_rule || 'nplurals=2; plural=(n != 1);'}\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the Plural-Forms
header is being added but, as explained in the description, I've decided to make a first iteration to get the translation process running (already fulfilling the current requirements) and haven't implemented more complex structures like Fluent plurals.
FastlaneCore::ConfigItem.new( | ||
key: :input_file, | ||
description: 'Path to the input PO (.po) file', | ||
optional: false, | ||
type: String | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 What about using fastlane's built-in verify_block:
to check that the input file exists, and then remove that existence test on line 18?
|
||
# Handle comments | ||
if line.start_with?('#') | ||
current_comment = current_comment ? "#{current_comment}\n#{line[1..].strip}" : line[1..].strip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 What about initializing current_comment
to []
instead of nil
(on line 21 and 29) then just use current_comment << line[1..].strip
here to append to the current comment lines, and .join
the lines at the end on L49? Imho that would make it more readable by avoiding the need for this ternary operator + string interpolation with repeated parts here.
end | ||
|
||
# Handle key-value pairs | ||
next unless line.include?('=') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ This means that we won't be able to handle multiline text, like:
headline-title =
The best way to bring
your data always with you
While I think we probably definitively should.
e.g. Some logic like: continue parsing the next lines after the first line you find containing =
, accumulating/appending the subsequent lines' content into an array… until you stumble upon another line with an =
, or a comment line, at which point only can trim the common indent of the lines you accumulated, join them, and finally add that key, value
pair as a entries << FluentEntry.new(…)
and reset the current key = nil
and value = ''
for the next one.
|
||
key, value = line.split('=', 2) | ||
key = key.strip | ||
value = value.strip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 I think it'd be nice to detect unsupported syntaxes—and UI.user_error!
in those cases—rather than silently ignoring those and producing incorrect .po
files that would accidentally trim some strings for translation or similar.
e.g. if we match a {.*->
in one of the lines of the value
, then warn that we don't yet support this advanced syntax in .ftl
files.
And if we later make a PR to iterate on this action to start supporting just the plurals selectors (but no other selectors or function or anything more complex) we could always adjust this detection so that it won't error anymore on plural selectors but would still error on any other more complex syntaxes still unsupported.
That being said, as long as you implement the support for parsing multiline values like I mentioned above, I think worst case scenario the selectors you'd find in the .ftl
source file would just be treated as the literal msgstr
value in the .po
file and thus in GlotPress… which wouldn't be that dramatic maybe (as opposed to their presence creating bugs in GlotPress or parts of the original string to be translated being lost during the conversion etc).
But my main worry is not that we don't support some advanced features of the Fluent format, we've clearly established that we don't plan to support all the features anyway, and I know you mentioned in your PR description that this PR is an initial implementation without even plural support and I'm fine with that. But my worry is that at some point the developers working on wordpress-rs
will start using a Fluent feature that this action does not support, and instead of being warned about it the worst thing that we want to prevent is for the parsing logic accidentally ignores parts of the original string supposed to be translated, and that only part of the string ends up in GlotPress… completely silently without the devs being warned by the tooling that we failed to parse the Fluent input file because it was using a feature we didn't recognize/support and that it resulted in invalid / misinterpreted parsing due to limited implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be nice to detect unsupported syntaxes—and UI.user_error! in those cases—rather than silently ignoring those and producing incorrect .po
Great idea 👍 I'm going to add that!
Fluent has an impressive list of features while our use case is, at least for now, rather simple.
I had a few back-and-forths on implementing plurals but that easily also gets complex while my focus is having an initial working version and a localization workflow for our current use case asap.
But indeed... making what is supported clearer with errors to avoid long running mistakes (overflowing even to translators) is a must 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think that if you implement proper parsing for multiline values, that should already cover most of my concern though.
- For example if the Fluent file just have a regular string (without any particular advanced selector for plural or whatnot) but that is multiline, we don't want the parsing logic to only parse the first line and totally ignore the rest of the string in the Fluent file and only send part of the string to GlotPress. But if you implement multiline parsing that would solve that risk of cutting the text short.
- And if the Fluent file happens to use selectors or advanced syntax like that that we don't support (as in: don't provide any particular treatment for), then as long as you implement multiline support, worst case scenario the content of that value, selector syntax and all, will be presented in GlotPress as a singular value verbatim.
For an initial implementation of this project, I don't really mind if the translators would end up seeing a verbatim string like below in their GlotPress strings to be translated.
{ $unreadEmails ->
[one] You have one unread email.
*[other] You have { $unreadEmails } unread emails.
}
Which is what would happen as long as you just implement support for parsing values spanning multiple lines in the .ftl
file, And translators would just translate this as if it was a singular string, as long as they'd keep the syntax consistent like they keep %1$@
consistent for iOS strings.
Hell, even if in the initial version we don't add any warning if we detect the presence of ->
to warn about the action not supporting plurals properly, I'd still be fine with that
What would be wayyy more problematic is if you didn't implement multiline and that would result in GlotPress only having { $unreadEmails ->
shown to translators as the text that would need to be translated, and that the whole You have one unread email
actual text needing translation would be totally lost in the process.
end | ||
|
||
# Add the Fluent entry | ||
fluent_content += "#{entry.msgid} = #{entry.msgstr}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ What would happen if entry.msgstr
contains literal newlines (multiline text)? Shouldn't subsequent lines be indented in the .ftl
generated content?
# Check if a string contains Fluent variables (e.g., {$variable}) | ||
def self.contains_variables?(text) | ||
text.match?(/\{\$[^}]+\}/) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where this is being used in the current implementation 🤔
Maybe it's a leftover of you planning to implement that "detection of unsupported syntax" I'm talking about above but you ended up forgetting about it?
Though even if that was for this, I don't think the use of a variable in the values of the Fluent's file would be a problem at all: we should be ok to just pass those unchanged to GlotPress (and translators would see, say, Error deleting post "{$title}. Try again?"
as a string to translate, would just have to keep the {$title}
verbatim in their translation just like they keep %1$s
verbatim for iOS/Android translations), and that would still work.
While what we'd wand to detect to warn about unsupported syntax is not the use of a Fluent variable, but the use of selectors via ->
syntax instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch -- In fact, we already use a few variables. I think this was a leftover for supporting plurals in a very early implementation.
While I replied to your other comment about showing an error on unsupported syntax, I was also thinking about catching selectors 👍
# Use a simple mapping based on the number of plural categories | ||
# This covers the most common cases without hardcoding complex expressions | ||
case nplurals | ||
when 1 | ||
# Languages like Chinese, Japanese, Korean | ||
'nplurals=1; plural=0;' | ||
when 3 | ||
# Slavic languages (Russian, Polish, etc.) | ||
# Use a generic 3-form rule that works for most Slavic languages | ||
'nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);' | ||
when 4 | ||
# Languages like Slovenian | ||
'nplurals=4; plural=(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3);' | ||
when 6 | ||
# Arabic and a few other languages | ||
'nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);' | ||
else | ||
# Most Germanic and Romance languages (English, German, Spanish, etc.) and fallback | ||
'nplurals=2; plural=(n != 1);' | ||
end | ||
rescue StandardError | ||
# If TwitterCldr fails for any reason, fallback to English-style rule | ||
'nplurals=2; plural=(n != 1);' | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done the exercise, but would be worth checking that all those cases correctly cover at least our Mag16, with the same plural rules used by GlotPress itself (e.g. when it exports translations for each of those 16 locales as .po
format) vs here (and in particular that none of the .po
files handled by GlotPress for any of our Mag16 would have a rule with one of those nplurals
but with a different plural=
rule for that locale that wasn't accounted for here)
#. Welcome text | ||
msgid "welcome-message" | ||
msgstr "Welcome to our app!" | ||
|
||
#. Error messages | ||
msgid "error-network" | ||
msgstr "Network error occurred" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Your tests need to cover more cases, in particular:
- entries with multiple
msgstr
lines for the quite common case of when the.po
files encodes long strings on multiplemsgstr
lines instead of a single one - entries with
msgctxt
—even if that context info ends up being ignored during the process we're testing of converting to Fluent, we need to ensure its presence in the source.po
won't cause unintended side effects on the Fluent file - entries with
msgstr[0]
,msgstr[1]
etc for plurals—again, even if I know that currently this PR intentionally doesn't support plurals yet, we still need to write tests to check how the action would react if it encountered a.po
file with some (would itUI.user_error!('That action does not support plurals yet')
? Or incorrectly generates an entry with an empty value in the Fluent file, without any warning to the user, because it couldn't find amsgstr
without index? Or crash if in those casesGetText::POEntry#msgstr
ends up being an Array instead of a String and your implementation didn't account for that?) - entries comment types other than
#.
(e.g.# Foo
,#,bar
and whatever other category of comments.po
supports)—to ensure they are ignored, as opposed to being parsed as part of the translator comment you want to include in the Fluent file
…
{ 'msgid' => 'welcome-message', 'msgstr' => 'Welcome to our app!', 'translator-comments' => 'Welcome text' }, | ||
{ 'msgid' => 'error-network', 'msgstr' => 'Network error occurred', 'translator-comments' => 'Error messages' }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ We need more varied example for the tests to have decent coverage there as well.
- Examples of
msgstr
containing multiline text (text with\n
in them) in particular - Examples of entries containing Fluent variables—even if we don't expect them to be treated any particularly, e.g. a
{ 'msgid' => 'greetings', 'msgstr' => 'Hello {$name}!' }
is totally OK in the.po
entries and should happily translate togreetings = Hello {$name}!
in the Fluent file, it'd still be nice to have a test case to cover this case and actually validate that they are not treated in any particular way or cause any trouble
Side note: since this PR is only the first iteration for solving AINFRA-572 (and you might consider creating another PR later to provide basic support for plurals?), do you really want to use the |
|
||
module Fastlane | ||
module Actions | ||
class FluentToPoAction < Action |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: While it's nice to have concise name for actions so they don't become a mouthful, I feel like we could use a bit more descriptive action names—especially so that their goal is not too mysterious when we see those action names without much context attached, like when called in a Fastfile
or when running fastlane actions
to list all the available actions…
Maybe convert_fluent_translation_file_to_po_format
/ convert_po_translation_file_to_fluent_format
, or convert_strings_file_fluent_to_po
/ convert_strings_file_po_to_fluent
? Sure, that's a bit more verbose to type, but also less ambiguous—especially for people who never heard of Fluent and that file format and would see that action in the while without knowing that Fluent and Po are terms related to translations and strings file formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed 👍 I also like the side effect of grouping similar actions by name (convert_*
).
What about convert_fluent_strings_file_to_po
vs. convert_po_strings_file_to_fluent
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I like that naming suggestion
Moved PR back to Draft after a few discussions with @AliSoftware about the actions generating / parsing the intermediate files and possible alternatives. |
Related:
What does it do?
This PR introduces two new Fastlane actions that enable bidirectional conversion between Fluent localization files (.ftl) (used by the project wordpress-rs) and PO files (.po/.pot), facilitating integration with GlotPress translation workflows.
The new actions:
FluentToPoAction
- Converts Fluent files to PO format. Designed as the first step in a translation workflow where Fluent files need to be sent to translators in GlotPress.PoToFluentAction
- Provides the final step in the translation workflow, converting completed translations back to the original Fluent format.Current Limitations
It's worth noting that this implementation provides basic Fluent support focused on simple message conversion and the start of a translation workflow.
Fluent's full specification includes advanced features like asymmetric localization with complex plural categories, grammatical cases, terms with variants, and locale-specific logic that can handle sophisticated grammar requirements across different languages.
The current conversion actions handle straightforward message mappings, enough for the current use case of wordpress-rs.
Checklist before requesting a review
bundle exec rubocop
to test for code style violations and recommendations.specs/*_spec.rb
) if applicable.bundle exec rspec
to run the whole test suite and ensure all your tests pass.CHANGELOG.md
file to describe your changes under the appropriate existing###
subsection of the existing## Trunk
section.MIGRATION.md
file to describe how the changes will affect the migration from the previous major version and what the clients will need to change and consider.