Skip to content

Fluent-to-PO / PO-to-Fluent conversion actions #650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 11 commits into
base: trunk
Choose a base branch
from

Conversation

iangmaia
Copy link
Contributor

@iangmaia iangmaia commented Jun 4, 2025

Related:

What does it do?

This PR introduces two new Fastlane actions that enable bidirectional conversion between Fluent localization files (.ftl) (used by the project wordpress-rs) and PO files (.po/.pot), facilitating integration with GlotPress translation workflows.

The new actions:

  • FluentToPoAction - Converts Fluent files to PO format. Designed as the first step in a translation workflow where Fluent files need to be sent to translators in GlotPress.
  • PoToFluentAction - Provides the final step in the translation workflow, converting completed translations back to the original Fluent format.

Current Limitations

It's worth noting that this implementation provides basic Fluent support focused on simple message conversion and the start of a translation workflow.
Fluent's full specification includes advanced features like asymmetric localization with complex plural categories, grammatical cases, terms with variants, and locale-specific logic that can handle sophisticated grammar requirements across different languages.
The current conversion actions handle straightforward message mappings, enough for the current use case of wordpress-rs.

Checklist before requesting a review

  • Run bundle exec rubocop to test for code style violations and recommendations.
  • Add Unit Tests (aka specs/*_spec.rb) if applicable.
  • Run bundle exec rspec to run the whole test suite and ensure all your tests pass.
  • Make sure you added an entry in the CHANGELOG.md file to describe your changes under the appropriate existing ### subsection of the existing ## Trunk section.
  • If applicable, add an entry in the MIGRATION.md file to describe how the changes will affect the migration from the previous major version and what the clients will need to change and consider.

@iangmaia iangmaia self-assigned this Jun 4, 2025
@iangmaia iangmaia added the enhancement New feature or request label Jun 4, 2025
@dangermattic
Copy link
Collaborator

dangermattic commented Jun 4, 2025

1 Warning
⚠️ This PR is larger than 500 lines of changes. Please consider splitting it into smaller PRs for easier and faster reviews.

Generated by 🚫 Danger

@iangmaia iangmaia marked this pull request as ready for review June 4, 2025 17:04
"MIME-Version: 1.0\n",
"Content-Type: text/plain; charset=UTF-8\n",
"Content-Transfer-Encoding: 8bit\n",
"Plural-Forms: #{plural_rule || 'nplurals=2; plural=(n != 1);'}\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the Plural-Forms header is being added but, as explained in the description, I've decided to make a first iteration to get the translation process running (already fulfilling the current requirements) and haven't implemented more complex structures like Fluent plurals.

Comment on lines +47 to +52
FastlaneCore::ConfigItem.new(
key: :input_file,
description: 'Path to the input PO (.po) file',
optional: false,
type: String
),
Copy link
Contributor

@AliSoftware AliSoftware Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 What about using fastlane's built-in verify_block: to check that the input file exists, and then remove that existence test on line 18?


# Handle comments
if line.start_with?('#')
current_comment = current_comment ? "#{current_comment}\n#{line[1..].strip}" : line[1..].strip
Copy link
Contributor

@AliSoftware AliSoftware Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 What about initializing current_comment to [] instead of nil (on line 21 and 29) then just use current_comment << line[1..].strip here to append to the current comment lines, and .join the lines at the end on L49? Imho that would make it more readable by avoiding the need for this ternary operator + string interpolation with repeated parts here.

end

# Handle key-value pairs
next unless line.include?('=')
Copy link
Contributor

@AliSoftware AliSoftware Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This means that we won't be able to handle multiline text, like:

headline-title =
    The best way to bring
    your data always with you

While I think we probably definitively should.

e.g. Some logic like: continue parsing the next lines after the first line you find containing =, accumulating/appending the subsequent lines' content into an array… until you stumble upon another line with an =, or a comment line, at which point only can trim the common indent of the lines you accumulated, join them, and finally add that key, value pair as a entries << FluentEntry.new(…) and reset the current key = nil and value = '' for the next one.


key, value = line.split('=', 2)
key = key.strip
value = value.strip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 I think it'd be nice to detect unsupported syntaxes—and UI.user_error! in those cases—rather than silently ignoring those and producing incorrect .po files that would accidentally trim some strings for translation or similar.

e.g. if we match a {.*-> in one of the lines of the value, then warn that we don't yet support this advanced syntax in .ftl files.

And if we later make a PR to iterate on this action to start supporting just the plurals selectors (but no other selectors or function or anything more complex) we could always adjust this detection so that it won't error anymore on plural selectors but would still error on any other more complex syntaxes still unsupported.


That being said, as long as you implement the support for parsing multiline values like I mentioned above, I think worst case scenario the selectors you'd find in the .ftl source file would just be treated as the literal msgstr value in the .po file and thus in GlotPress… which wouldn't be that dramatic maybe (as opposed to their presence creating bugs in GlotPress or parts of the original string to be translated being lost during the conversion etc).

But my main worry is not that we don't support some advanced features of the Fluent format, we've clearly established that we don't plan to support all the features anyway, and I know you mentioned in your PR description that this PR is an initial implementation without even plural support and I'm fine with that. But my worry is that at some point the developers working on wordpress-rs will start using a Fluent feature that this action does not support, and instead of being warned about it the worst thing that we want to prevent is for the parsing logic accidentally ignores parts of the original string supposed to be translated, and that only part of the string ends up in GlotPress… completely silently without the devs being warned by the tooling that we failed to parse the Fluent input file because it was using a feature we didn't recognize/support and that it resulted in invalid / misinterpreted parsing due to limited implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to detect unsupported syntaxes—and UI.user_error! in those cases—rather than silently ignoring those and producing incorrect .po

Great idea 👍 I'm going to add that!

Fluent has an impressive list of features while our use case is, at least for now, rather simple.
I had a few back-and-forths on implementing plurals but that easily also gets complex while my focus is having an initial working version and a localization workflow for our current use case asap.

But indeed... making what is supported clearer with errors to avoid long running mistakes (overflowing even to translators) is a must 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think that if you implement proper parsing for multiline values, that should already cover most of my concern though.

  • For example if the Fluent file just have a regular string (without any particular advanced selector for plural or whatnot) but that is multiline, we don't want the parsing logic to only parse the first line and totally ignore the rest of the string in the Fluent file and only send part of the string to GlotPress. But if you implement multiline parsing that would solve that risk of cutting the text short.
  • And if the Fluent file happens to use selectors or advanced syntax like that that we don't support (as in: don't provide any particular treatment for), then as long as you implement multiline support, worst case scenario the content of that value, selector syntax and all, will be presented in GlotPress as a singular value verbatim.

For an initial implementation of this project, I don't really mind if the translators would end up seeing a verbatim string like below in their GlotPress strings to be translated.

{ $unreadEmails ->
   [one] You have one unread email.
  *[other] You have { $unreadEmails } unread emails.
}

Which is what would happen as long as you just implement support for parsing values spanning multiple lines in the .ftl file, And translators would just translate this as if it was a singular string, as long as they'd keep the syntax consistent like they keep %1$@ consistent for iOS strings.

Hell, even if in the initial version we don't add any warning if we detect the presence of -> to warn about the action not supporting plurals properly, I'd still be fine with that

What would be wayyy more problematic is if you didn't implement multiline and that would result in GlotPress only having { $unreadEmails -> shown to translators as the text that would need to be translated, and that the whole You have one unread email actual text needing translation would be totally lost in the process.

end

# Add the Fluent entry
fluent_content += "#{entry.msgid} = #{entry.msgstr}\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ What would happen if entry.msgstr contains literal newlines (multiline text)? Shouldn't subsequent lines be indented in the .ftl generated content?

Comment on lines +143 to +146
# Check if a string contains Fluent variables (e.g., {$variable})
def self.contains_variables?(text)
text.match?(/\{\$[^}]+\}/)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where this is being used in the current implementation 🤔

Maybe it's a leftover of you planning to implement that "detection of unsupported syntax" I'm talking about above but you ended up forgetting about it?

Though even if that was for this, I don't think the use of a variable in the values of the Fluent's file would be a problem at all: we should be ok to just pass those unchanged to GlotPress (and translators would see, say, Error deleting post "{$title}. Try again?" as a string to translate, would just have to keep the {$title} verbatim in their translation just like they keep %1$s verbatim for iOS/Android translations), and that would still work.

While what we'd wand to detect to warn about unsupported syntax is not the use of a Fluent variable, but the use of selectors via -> syntax instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch -- In fact, we already use a few variables. I think this was a leftover for supporting plurals in a very early implementation.
While I replied to your other comment about showing an error on unsupported syntax, I was also thinking about catching selectors 👍

Comment on lines +159 to +182
# Use a simple mapping based on the number of plural categories
# This covers the most common cases without hardcoding complex expressions
case nplurals
when 1
# Languages like Chinese, Japanese, Korean
'nplurals=1; plural=0;'
when 3
# Slavic languages (Russian, Polish, etc.)
# Use a generic 3-form rule that works for most Slavic languages
'nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);'
when 4
# Languages like Slovenian
'nplurals=4; plural=(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3);'
when 6
# Arabic and a few other languages
'nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);'
else
# Most Germanic and Romance languages (English, German, Spanish, etc.) and fallback
'nplurals=2; plural=(n != 1);'
end
rescue StandardError
# If TwitterCldr fails for any reason, fallback to English-style rule
'nplurals=2; plural=(n != 1);'
end
Copy link
Contributor

@AliSoftware AliSoftware Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done the exercise, but would be worth checking that all those cases correctly cover at least our Mag16, with the same plural rules used by GlotPress itself (e.g. when it exports translations for each of those 16 locales as .po format) vs here (and in particular that none of the .po files handled by GlotPress for any of our Mag16 would have a rule with one of those nplurals but with a different plural= rule for that locale that wasn't accounted for here)

Comment on lines +29 to +35
#. Welcome text
msgid "welcome-message"
msgstr "Welcome to our app!"

#. Error messages
msgid "error-network"
msgstr "Network error occurred"
Copy link
Contributor

@AliSoftware AliSoftware Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Your tests need to cover more cases, in particular:

  • entries with multiple msgstr lines for the quite common case of when the .po files encodes long strings on multiple msgstr lines instead of a single one
  • entries with msgctxt—even if that context info ends up being ignored during the process we're testing of converting to Fluent, we need to ensure its presence in the source .po won't cause unintended side effects on the Fluent file
  • entries with msgstr[0], msgstr[1] etc for plurals—again, even if I know that currently this PR intentionally doesn't support plurals yet, we still need to write tests to check how the action would react if it encountered a .po file with some (would it UI.user_error!('That action does not support plurals yet')? Or incorrectly generates an entry with an empty value in the Fluent file, without any warning to the user, because it couldn't find a msgstr without index? Or crash if in those cases GetText::POEntry#msgstr ends up being an Array instead of a String and your implementation didn't account for that?)
  • entries comment types other than #. (e.g. # Foo, #,bar and whatever other category of comments .po supports)—to ensure they are ignored, as opposed to being parsed as part of the translator comment you want to include in the Fluent file

Comment on lines +8 to +9
{ 'msgid' => 'welcome-message', 'msgstr' => 'Welcome to our app!', 'translator-comments' => 'Welcome text' },
{ 'msgid' => 'error-network', 'msgstr' => 'Network error occurred', 'translator-comments' => 'Error messages' },
Copy link
Contributor

@AliSoftware AliSoftware Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ We need more varied example for the tests to have decent coverage there as well.

  • Examples of msgstr containing multiline text (text with \n in them) in particular
  • Examples of entries containing Fluent variables—even if we don't expect them to be treated any particularly, e.g. a { 'msgid' => 'greetings', 'msgstr' => 'Hello {$name}!' } is totally OK in the .po entries and should happily translate to greetings = Hello {$name}! in the Fluent file, it'd still be nice to have a test case to cover this case and actually validate that they are not treated in any particular way or cause any trouble

@AliSoftware
Copy link
Contributor

Side note: since this PR is only the first iteration for solving AINFRA-572 (and you might consider creating another PR later to provide basic support for plurals?), do you really want to use the Closes: magic keyword for this Linear issue in your PR description, as opposed to only closing the Linear issue once you'll also have landed your subsequent PR? Or do you plan to create a separate Linear issue for the follow-up iterations of that implementation?


module Fastlane
module Actions
class FluentToPoAction < Action
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: While it's nice to have concise name for actions so they don't become a mouthful, I feel like we could use a bit more descriptive action names—especially so that their goal is not too mysterious when we see those action names without much context attached, like when called in a Fastfile or when running fastlane actions to list all the available actions…

Maybe convert_fluent_translation_file_to_po_format / convert_po_translation_file_to_fluent_format, or convert_strings_file_fluent_to_po / convert_strings_file_po_to_fluent? Sure, that's a bit more verbose to type, but also less ambiguous—especially for people who never heard of Fluent and that file format and would see that action in the while without knowing that Fluent and Po are terms related to translations and strings file formats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 👍 I also like the side effect of grouping similar actions by name (convert_*).
What about convert_fluent_strings_file_to_po vs. convert_po_strings_file_to_fluent? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I like that naming suggestion

@iangmaia iangmaia marked this pull request as draft June 5, 2025 09:44
@iangmaia
Copy link
Contributor Author

iangmaia commented Jun 5, 2025

Moved PR back to Draft after a few discussions with @AliSoftware about the actions generating / parsing the intermediate files and possible alternatives.

@iangmaia iangmaia added Do Not Merge e.g. if it depends on another PR to be merged first, or if that PR is only to trigger CI on forks, … and removed Do Not Merge e.g. if it depends on another PR to be merged first, or if that PR is only to trigger CI on forks, … labels Jun 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Do Not Merge e.g. if it depends on another PR to be merged first, or if that PR is only to trigger CI on forks, … enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants