Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM IR] Add support for Unicode strings #9764

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

grendello
Copy link
Contributor

Another portion of code taken from #9572 to make it smaller.

This adds support for Unicode strings to the LLVM IR generator, together with de-duplication and
support for outputting the same string encoded both as UTF-8 and Unicode.

@grendello grendello requested review from Copilot and jonpryor February 6, 2025 19:42
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 5 out of 10 changed files in this pull request and generated 1 comment.

Files not reviewed (5)
  • src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrModule.cs: Evaluated as low risk
  • src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrStringManager.cs: Evaluated as low risk
  • src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrVariable.cs: Evaluated as low risk
  • src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/MemberInfoUtilities.cs: Evaluated as low risk
  • src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrComposer.cs: Evaluated as low risk
Comments suppressed due to low confidence (3)

src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/StringHolder.cs:21

  • This line will throw a NullReferenceException if value is null. It should return a new StringHolder with null data instead.
if (value == null) { return new StringHolder ((string)value); }

src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/StringHolder.cs:5

  • The behavior introduced in the StringHolder class should be covered by tests to ensure that the comparison and equality operations work as expected.
class StringHolder : IComparable, IComparable<StringHolder>, IEquatable<StringHolder>

src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/NativeAssemblerAttribute.cs:41

  • The phrase 'an UTF16' should be corrected to 'a UTF16'.
/// the string is output as an UTF16 in

@@ -299,7 +304,7 @@ public void Add (LlvmIrGlobalVariable variable)
EnsureValidGlobalVariableType (variable);

if (IsStringVariable (variable)) {
AddStringGlobalVariable (variable);
AddStringGlobalVariable ((LlvmIrStringVariable)variable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast is breaking the world, e.g. BuildApplicationAndClean(True, "apk"):

…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007: System.InvalidCastException: Unable to cast object of type 'Xamarin.Android.Tasks.LLVMIR.LlvmIrGlobalVariable' to type 'Xamarin.Android.Tasks.LLVMIR.LlvmIrStringVariable'.
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Xamarin.Android.Tasks.LLVMIR.LlvmIrModule.Add(LlvmIrGlobalVariable variable) in /Users/builder/azdo/_work/4/s/xamarin-android/src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrModule.cs:line 307
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Xamarin.Android.Tasks.LLVMIR.LlvmIrModule.AddGlobalVariable(Type type, String name, Object value, LlvmIrVariableOptions options, String comment) in /Users/builder/azdo/_work/4/s/xamarin-android/src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrModule.cs:line 274
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Xamarin.Android.Tasks.LLVMIR.LlvmIrModule.AddGlobalVariable(String name, Object value, LlvmIrVariableOptions options, String comment) in /Users/builder/azdo/_work/4/s/xamarin-android/src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrModule.cs:line 262
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Xamarin.Android.Tasks.ApplicationConfigNativeAssemblyGenerator.Construct(LlvmIrModule module) in /Users/builder/azdo/_work/4/s/xamarin-android/src/Xamarin.Android.Build.Tasks/Utilities/ApplicationConfigNativeAssemblyGenerator.cs:line 209
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Xamarin.Android.Tasks.LLVMIR.LlvmIrComposer.Construct() in /Users/builder/azdo/_work/4/s/xamarin-android/src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrComposer.cs:line 27
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Xamarin.Android.Tasks.GeneratePackageManagerJava.AddEnvironment() in /Users/builder/azdo/_work/4/s/xamarin-android/src/Xamarin.Android.Build.Tasks/Tasks/GeneratePackageManagerJava.cs:line 344
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Xamarin.Android.Tasks.GeneratePackageManagerJava.RunTask() in /Users/builder/azdo/_work/4/s/xamarin-android/src/Xamarin.Android.Build.Tasks/Tasks/GeneratePackageManagerJava.cs:line 128
…\tools\Xamarin.Android.Common.targets(1741,3): error XAGPM7007:    at Microsoft.Android.Build.Tasks.AndroidTask.Execute() in /Users/builder/azdo/_work/4/s/xamarin-android/external/xamarin-android-tools/src/Microsoft.Android.Build.BaseTasks/AndroidTask.cs:line 25

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have to bring over some more changes from #9572, haven't had time for this today. Will update on Monday.

@grendello grendello force-pushed the dev/grendel/llvmirgen-strings branch 2 times, most recently from 3cba69d to 1b44c07 Compare February 10, 2025 19:22
@grendello grendello force-pushed the dev/grendel/llvmirgen-strings branch 3 times, most recently from 89e6295 to 1869215 Compare February 12, 2025 14:08
@@ -203,6 +203,8 @@ public ApplicationConfigNativeAssemblyGenerator (IDictionary<string, string> env

protected override void Construct (LlvmIrModule module)
{
module.DefaultStringGroup = "env";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a "default string group"? Anything to link to in the commit message? (Is this why .apkdesc files needed updating?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal implementation detail, it makes the string manager put strings not assigned to a particular group in this one. It's done this way because all LLVM IR source files are generated separately and we can't be sure that (local) string constant names (e.g. the former-default .str.X where X is a per-module counter) wouldn't clash between the different generated files. So this is a way to "namespace" strings. If we were able to keep a shared state between all LLVM IR generators, it wouldn't be necessary.

enum LlvmIrStringEncoding
{
UTF8,
Unicode,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly Q, but should we use UTF16LE instead of Unicode, seeing as "Unicode" doesn't necessarily mean anything…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the managed land calls the encoding, so I followed suit.

return stringVar;
}

LlvmIrStringGroup? group;
string groupPrefix;
if (String.IsNullOrEmpty (groupName) || String.Compare ("str", groupName, StringComparison.Ordinal) == 0) {
group = defaultGroup;
groupPrefix = ".str";
groupPrefix = $".{defaultGroupName}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is where the "default string group" comes in?

What does this do to the resulting LLVM-IR?

Copy link
Contributor Author

@grendello grendello Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It allows us to "namespace" strings, as described above - their names mustn't clash between modules. It may result in some change of the final .so size, but it shouldn't (there's a small chance we will have duplicate strings in different .ll files)


namespace Xamarin.Android.Tasks.LLVMIR;

class StringHolder : IComparable, IComparable<StringHolder>, IEquatable<StringHolder>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to use a record? You'll get IEquatable<T> for free!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, but I don't want to change it anymore :)

return holder;
}

public int CompareTo (object obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that CompareTo(StringHolder?) already has a null check, this could be simplified to:

public int CompareTo (object obj) => CompareTo (obj as StringHolder);

return hc ^ Encoding.GetHashCode ();
}

public override bool Equals (object obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with CompareTo(), this can be simplified to:

public override bool Equals (object obj) => Equals (obj as StringHolder);

@grendello
Copy link
Contributor Author

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@grendello
Copy link
Contributor Author

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@grendello
Copy link
Contributor Author

An example of string "namespacing" (both from the same build):

; From marshal_methods.arm64-v8a.ll
@.mm.0 = dso_local constant [102 x i8] c"Android.App.Activity, Mono.Android, Version=0.0.0.0, Culture=neutral, PublicKeyToken=84e04ff9cfb79065\00", align 1
@.mm.1 = dso_local constant [101 x i8] c"Java.IO.InputStream, Mono.Android, Version=0.0.0.0, Culture=neutral, PublicKeyToken=84e04ff9cfb79065\00", align 1
@.mm.2 = dso_local constant [102 x i8] c"Java.IO.OutputStream, Mono.Android, Version=0.0.0.0, Culture=neutral, PublicKeyToken=84e04ff9cfb79065\00", align 1
@.mm.3 = dso_local constant [108 x i8] c"Java.Lang.IRunnableInvoker, Mono.Android, Version=0.0.0.0, Culture=neutral, PublicKeyToken=84e04ff9cfb79065\00", align 1
@.mm.4 = dso_local constant [122 x i8] c"Java.Interop.TypeManager/JavaTypeManager, Mono.Android, Version=0.0.0.0, Culture=neutral, PublicKeyToken=84e04ff9cfb79065\00", align 1
@.mm.5 = dso_local constant [40 x i8] c"get_function_pointer MUST be specified\0A\00", align 1
; from environment.arm64-v8a.ll
@.env.0 = dso_local constant [7 x i8] c"normal\00", align 1
; from typemaps.arm64-v8a.ll
@.tmr.0 = dso_local constant [22 x i8] c"android/os/BaseBundle\00", align 1
@.tmr.1 = dso_local constant [18 x i8] c"android/os/Bundle\00", align 1
@.tmr.2 = dso_local constant [19 x i8] c"android/os/Handler\00", align 1
@.tmr.3 = dso_local constant [18 x i8] c"android/os/Looper\00", align 1
@.tmr.4 = dso_local constant [29 x i8] c"android/os/PersistableBundle\00", align 1

All the above strings use the default string group for the given generator/module. If any of them were symbols with scope other than the given translation unit, their names wouldn't clash.

Since all the files are generated from separate tasks, we don't have a global LLVM IR state which
can keep track of strings and ensure that there are no duplicate symbol names.  To prevent potential
clashes, each generator now sets the default string group name which is unique for each module.

In the future, we should try to manage strings globally (which would also result in more de-duplication)
@grendello grendello force-pushed the dev/grendel/llvmirgen-strings branch from f090c31 to 5f9205f Compare February 13, 2025 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants