Fluent-to-PO / PO-to-Fluent conversion actions #650

iangmaia · 2025-06-04T17:00:14Z

Fixes AINFRA-572: Implement tool to convert Fluent file format to a GlotPress-compatible interchange format.
PR using the new actions: [Tooling] Add GlotPress integration for Fluent localization files Automattic/wordpress-rs#756

What does it do?

This PR introduces two new Fastlane actions that enable bidirectional conversion between Fluent localization files (.ftl) (used by the project wordpress-rs) and PO files (.po/.pot), facilitating integration with GlotPress translation workflows.

The new actions:

FluentToPoAction - Converts Fluent files to PO format. Designed as the first step in a translation workflow where Fluent files need to be sent to translators in GlotPress.
PoToFluentAction - Provides the final step in the translation workflow, converting completed translations back to the original Fluent format.

Current Limitations

It's worth noting that this implementation provides basic Fluent support focused on simple message conversion and the start of a translation workflow.
Fluent's full specification includes advanced features like asymmetric localization with complex plural categories, grammatical cases, terms with variants, and locale-specific logic that can handle sophisticated grammar requirements across different languages.
The current conversion actions handle straightforward message mappings, enough for the current use case of wordpress-rs.

Checklist before requesting a review

Run bundle exec rubocop to test for code style violations and recommendations.
Add Unit Tests (aka specs/*_spec.rb) if applicable.
Run bundle exec rspec to run the whole test suite and ensure all your tests pass.
Make sure you added an entry in the CHANGELOG.md file to describe your changes under the appropriate existing ### subsection of the existing ## Trunk section.
If applicable, add an entry in the MIGRATION.md file to describe how the changes will affect the migration from the previous major version and what the clients will need to change and consider.

dangermattic · 2025-06-04T17:01:04Z

	1 Warning
⚠️	This PR is larger than 500 lines of changes. Please consider splitting it into smaller PRs for easier and faster reviews.

Generated by 🚫 Danger

iangmaia · 2025-06-04T17:08:13Z

lib/fastlane/plugin/wpmreleasetoolkit/helper/fluent_localization_helper.rb

+          "MIME-Version: 1.0\n",
+          "Content-Type: text/plain; charset=UTF-8\n",
+          "Content-Transfer-Encoding: 8bit\n",
+          "Plural-Forms: #{plural_rule || 'nplurals=2; plural=(n != 1);'}\n",


Note that the Plural-Forms header is being added but, as explained in the description, I've decided to make a first iteration to get the translation process running (already fulfilling the current requirements) and haven't implemented more complex structures like Fluent plurals.

AliSoftware · 2025-06-04T17:23:18Z

lib/fastlane/plugin/wpmreleasetoolkit/actions/common/po_to_fluent_action.rb

+          FastlaneCore::ConfigItem.new(
+            key: :input_file,
+            description: 'Path to the input PO (.po) file',
+            optional: false,
+            type: String
+          ),


💡 What about using fastlane's built-in verify_block: to check that the input file exists, and then remove that existence test on line 18?

AliSoftware · 2025-06-04T17:25:07Z

lib/fastlane/plugin/wpmreleasetoolkit/helper/fluent_localization_helper.rb

+
+          # Handle comments
+          if line.start_with?('#')
+            current_comment = current_comment ? "#{current_comment}\n#{line[1..].strip}" : line[1..].strip


💡 What about initializing current_comment to [] instead of nil (on line 21 and 29) then just use current_comment << line[1..].strip here to append to the current comment lines, and .join the lines at the end on L49? Imho that would make it more readable by avoiding the need for this ternary operator + string interpolation with repeated parts here.

AliSoftware · 2025-06-04T17:31:41Z

lib/fastlane/plugin/wpmreleasetoolkit/helper/fluent_localization_helper.rb

+          end
+
+          # Handle key-value pairs
+          next unless line.include?('=')


❌ This means that we won't be able to handle multiline text, like:

headline-title = The best way to bring your data always with you

While I think we ~~probably~~ definitively should.

e.g. Some logic like: continue parsing the next lines after the first line you find containing =, accumulating/appending the subsequent lines' content into an array… until you stumble upon another line with an =, or a comment line, at which point only can trim the common indent of the lines you accumulated, join them, and finally add that key, value pair as a entries << FluentEntry.new(…) and reset the current key = nil and value = '' for the next one.

AliSoftware · 2025-06-04T17:42:36Z

lib/fastlane/plugin/wpmreleasetoolkit/helper/fluent_localization_helper.rb

+
+          key, value = line.split('=', 2)
+          key = key.strip
+          value = value.strip


💡 I think it'd be nice to detect unsupported syntaxes—and UI.user_error! in those cases—rather than silently ignoring those and producing incorrect .po files that would accidentally trim some strings for translation or similar.

e.g. if we match a {.*-> in one of the lines of the value, then warn that we don't yet support this advanced syntax in .ftl files.

And if we later make a PR to iterate on this action to start supporting just the plurals selectors (but no other selectors or function or anything more complex) we could always adjust this detection so that it won't error anymore on plural selectors but would still error on any other more complex syntaxes still unsupported.

That being said, as long as you implement the support for parsing multiline values like I mentioned above, I think worst case scenario the selectors you'd find in the .ftl source file would just be treated as the literal msgstr value in the .po file and thus in GlotPress… which wouldn't be that dramatic maybe (as opposed to their presence creating bugs in GlotPress or parts of the original string to be translated being lost during the conversion etc).

But my main worry is not that we don't support some advanced features of the Fluent format, we've clearly established that we don't plan to support all the features anyway, and I know you mentioned in your PR description that this PR is an initial implementation without even plural support and I'm fine with that. But my worry is that at some point the developers working on wordpress-rs will start using a Fluent feature that this action does not support, and instead of being warned about it the worst thing that we want to prevent is for the parsing logic accidentally ignores parts of the original string supposed to be translated, and that only part of the string ends up in GlotPress… completely silently without the devs being warned by the tooling that we failed to parse the Fluent input file because it was using a feature we didn't recognize/support and that it resulted in invalid / misinterpreted parsing due to limited implementation.

I think it'd be nice to detect unsupported syntaxes—and UI.user_error! in those cases—rather than silently ignoring those and producing incorrect .po

Great idea 👍 I'm going to add that!

Fluent has an impressive list of features while our use case is, at least for now, rather simple.
I had a few back-and-forths on implementing plurals but that easily also gets complex while my focus is having an initial working version and a localization workflow for our current use case asap.

But indeed... making what is supported clearer with errors to avoid long running mistakes (overflowing even to translators) is a must 👍

Again, I think that if you implement proper parsing for multiline values, that should already cover most of my concern though.

For example if the Fluent file just have a regular string (without any particular advanced selector for plural or whatnot) but that is multiline, we don't want the parsing logic to only parse the first line and totally ignore the rest of the string in the Fluent file and only send part of the string to GlotPress. But if you implement multiline parsing that would solve that risk of cutting the text short.

And if the Fluent file happens to use selectors or advanced syntax like that that we don't support (as in: don't provide any particular treatment for), then as long as you implement multiline support, worst case scenario the content of that value, selector syntax and all, will be presented in GlotPress as a singular value verbatim.

For an initial implementation of this project, I don't really mind if the translators would end up seeing a verbatim string like below in their GlotPress strings to be translated.

{ $unreadEmails -> [one] You have one unread email. *[other] You have { $unreadEmails } unread emails. }

Which is what would happen as long as you just implement support for parsing values spanning multiple lines in the .ftl file, And translators would just translate this as if it was a singular string, as long as they'd keep the syntax consistent like they keep %1$@ consistent for iOS strings.

Hell, even if in the initial version we don't add any warning if we detect the presence of -> to warn about the action not supporting plurals properly, I'd still be fine with that

What would be wayyy more problematic is if you didn't implement multiline and that would result in GlotPress only having { $unreadEmails -> shown to translators as the text that would need to be translated, and that the whole You have one unread email actual text needing translation would be totally lost in the process.

AliSoftware · 2025-06-04T17:48:38Z

lib/fastlane/plugin/wpmreleasetoolkit/helper/fluent_localization_helper.rb

+          end
+
+          # Add the Fluent entry
+          fluent_content += "#{entry.msgid} = #{entry.msgstr}\n"


❌ What would happen if entry.msgstr contains literal newlines (multiline text)? Shouldn't subsequent lines be indented in the .ftl generated content?

AliSoftware · 2025-06-04T18:01:22Z

lib/fastlane/plugin/wpmreleasetoolkit/helper/fluent_localization_helper.rb

+      # Check if a string contains Fluent variables (e.g., {$variable})
+      def self.contains_variables?(text)
+        text.match?(/\{\$[^}]+\}/)
+      end


I don't see where this is being used in the current implementation 🤔

Maybe it's a leftover of you planning to implement that "detection of unsupported syntax" I'm talking about above but you ended up forgetting about it?

Though even if that was for this, I don't think the use of a variable in the values of the Fluent's file would be a problem at all: we should be ok to just pass those unchanged to GlotPress (and translators would see, say, Error deleting post "{$title}. Try again?" as a string to translate, would just have to keep the {$title} verbatim in their translation just like they keep %1$s verbatim for iOS/Android translations), and that would still work.

While what we'd wand to detect to warn about unsupported syntax is not the use of a Fluent variable, but the use of selectors via -> syntax instead.

Nice catch -- In fact, we already use a few variables. I think this was a leftover for supporting plurals in a very early implementation.
While I replied to your other comment about showing an error on unsupported syntax, I was also thinking about catching selectors 👍

AliSoftware · 2025-06-04T18:04:10Z

lib/fastlane/plugin/wpmreleasetoolkit/helper/fluent_localization_helper.rb

+        # Use a simple mapping based on the number of plural categories
+        # This covers the most common cases without hardcoding complex expressions
+        case nplurals
+        when 1
+          # Languages like Chinese, Japanese, Korean
+          'nplurals=1; plural=0;'
+        when 3
+          # Slavic languages (Russian, Polish, etc.)
+          # Use a generic 3-form rule that works for most Slavic languages
+          'nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);'
+        when 4
+          # Languages like Slovenian
+          'nplurals=4; plural=(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3);'
+        when 6
+          # Arabic and a few other languages
+          'nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);'
+        else
+          # Most Germanic and Romance languages (English, German, Spanish, etc.) and fallback
+          'nplurals=2; plural=(n != 1);'
+        end
+      rescue StandardError
+        # If TwitterCldr fails for any reason, fallback to English-style rule
+        'nplurals=2; plural=(n != 1);'
+      end


I haven't done the exercise, but would be worth checking that all those cases correctly cover at least our Mag16, with the same plural rules used by GlotPress itself (e.g. when it exports translations for each of those 16 locales as .po format) vs here (and in particular that none of the .po files handled by GlotPress for any of our Mag16 would have a rule with one of those nplurals but with a different plural= rule for that locale that wasn't accounted for here)

AliSoftware · 2025-06-04T18:14:57Z

spec/fluent_to_po_action_spec.rb

+      #. Welcome text
+      msgid "welcome-message"
+      msgstr "Welcome to our app!"
+
+      #. Error messages
+      msgid "error-network"
+      msgstr "Network error occurred"


❌ Your tests need to cover more cases, in particular:

entries with multiple msgstr lines for the quite common case of when the .po files encodes long strings on multiple msgstr lines instead of a single one

entries with msgctxt—even if that context info ends up being ignored during the process we're testing of converting to Fluent, we need to ensure its presence in the source .po won't cause unintended side effects on the Fluent file

entries with msgstr[0], msgstr[1] etc for plurals—again, even if I know that currently this PR intentionally doesn't support plurals yet, we still need to write tests to check how the action would react if it encountered a .po file with some (would it UI.user_error!('That action does not support plurals yet')? Or incorrectly generates an entry with an empty value in the Fluent file, without any warning to the user, because it couldn't find a msgstr without index? Or crash if in those cases GetText::POEntry#msgstr ends up being an Array instead of a String and your implementation didn't account for that?)

entries comment types other than #. (e.g. # Foo, #,bar and whatever other category of comments .po supports)—to ensure they are ignored, as opposed to being parsed as part of the translator comment you want to include in the Fluent file
…

AliSoftware · 2025-06-04T18:18:37Z

spec/po_to_fluent_action_spec.rb

+      { 'msgid' => 'welcome-message', 'msgstr' => 'Welcome to our app!', 'translator-comments' => 'Welcome text' },
+      { 'msgid' => 'error-network', 'msgstr' => 'Network error occurred', 'translator-comments' => 'Error messages' },


❌ We need more varied example for the tests to have decent coverage there as well.

Examples of msgstr containing multiline text (text with \n in them) in particular

Examples of entries containing Fluent variables—even if we don't expect them to be treated any particularly, e.g. a { 'msgid' => 'greetings', 'msgstr' => 'Hello {$name}!' } is totally OK in the .po entries and should happily translate to greetings = Hello {$name}! in the Fluent file, it'd still be nice to have a test case to cover this case and actually validate that they are not treated in any particular way or cause any trouble

AliSoftware · 2025-06-04T18:32:15Z

Side note: since this PR is only the first iteration for solving AINFRA-572 (and you might consider creating another PR later to provide basic support for plurals?), do you really want to use the Closes: magic keyword for this Linear issue in your PR description, as opposed to only closing the Linear issue once you'll also have landed your subsequent PR? Or do you plan to create a separate Linear issue for the follow-up iterations of that implementation?

AliSoftware · 2025-06-04T18:44:44Z

lib/fastlane/plugin/wpmreleasetoolkit/actions/common/fluent_to_po_action.rb

+
+module Fastlane
+  module Actions
+    class FluentToPoAction < Action


Nitpick: While it's nice to have concise name for actions so they don't become a mouthful, I feel like we could use a bit more descriptive action names—especially so that their goal is not too mysterious when we see those action names without much context attached, like when called in a Fastfile or when running fastlane actions to list all the available actions…

Maybe convert_fluent_translation_file_to_po_format / convert_po_translation_file_to_fluent_format, or convert_strings_file_fluent_to_po / convert_strings_file_po_to_fluent? Sure, that's a bit more verbose to type, but also less ambiguous—especially for people who never heard of Fluent and that file format and would see that action in the while without knowing that Fluent and Po are terms related to translations and strings file formats?

Agreed 👍 I also like the side effect of grouping similar actions by name (convert_*).
What about convert_fluent_strings_file_to_po vs. convert_po_strings_file_to_fluent? 🤔

Yep, I like that naming suggestion

iangmaia · 2025-06-05T09:46:33Z

Moved PR back to Draft after a few discussions with @AliSoftware about the actions generating / parsing the intermediate files and possible alternatives.

iangmaia added 10 commits June 2, 2025 21:34

Add Fluent to PO conversion actions for GlotPress integration

49e2c72

Better handle multi-line comments

b52a3c6

Add generated comment to PO header

be90081

Add allow_empty_file to po 2 fluent action

97999a4

Replace poparser with gettext

8187521

Remove unneeded comments

6d2d686

Add FluentLocalizationHelper tests

9271cd4

Add PoToFluentAction tests

46068d0

Add tests for FluentToPoAction

36ebf05

Fix duplicated branch with generic fallback

9a1bd6a

iangmaia self-assigned this Jun 4, 2025

iangmaia added the enhancement New feature or request label Jun 4, 2025

Update CHANGELOG.md

5525869

iangmaia marked this pull request as ready for review June 4, 2025 17:04

iangmaia commented Jun 4, 2025

View reviewed changes

iangmaia requested a review from a team June 4, 2025 17:08

iangmaia mentioned this pull request Jun 4, 2025

[Tooling] Add GlotPress integration for Fluent localization files Automattic/wordpress-rs#756

Closed

AliSoftware requested changes Jun 4, 2025

View reviewed changes

AliSoftware reviewed Jun 4, 2025

View reviewed changes

iangmaia marked this pull request as draft June 5, 2025 09:44

iangmaia added Do Not Merge e.g. if it depends on another PR to be merged first, or if that PR is only to trigger CI on forks, … and removed Do Not Merge e.g. if it depends on another PR to be merged first, or if that PR is only to trigger CI on forks, … labels Jun 14, 2025

		{ 'msgid' => 'welcome-message', 'msgstr' => 'Welcome to our app!', 'translator-comments' => 'Welcome text' },
		{ 'msgid' => 'error-network', 'msgstr' => 'Network error occurred', 'translator-comments' => 'Error messages' },

Fluent-to-PO / PO-to-Fluent conversion actions #650

Are you sure you want to change the base?

Fluent-to-PO / PO-to-Fluent conversion actions #650

Uh oh!

Conversation

iangmaia commented Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does it do?

Current Limitations

Checklist before requesting a review

Uh oh!

dangermattic commented Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AliSoftware Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AliSoftware Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AliSoftware Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AliSoftware Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AliSoftware Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AliSoftware Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AliSoftware commented Jun 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iangmaia commented Jun 5, 2025

Uh oh!

Uh oh!

iangmaia commented Jun 4, 2025 •

edited

Loading

dangermattic commented Jun 4, 2025 •

edited

Loading

AliSoftware Jun 4, 2025 •

edited

Loading

AliSoftware Jun 4, 2025 •

edited

Loading

AliSoftware Jun 4, 2025 •

edited

Loading

AliSoftware Jun 4, 2025 •

edited

Loading

AliSoftware Jun 4, 2025 •

edited

Loading

AliSoftware Jun 4, 2025 •

edited

Loading