Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Baked data: use VarULE to store data when specified #6133

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

sffc
Copy link
Member

@sffc sffc commented Feb 16, 2025

#5230

I did it by creating a trait, MaybeAsVarULE, which is implemented on data structs, similar to how Bake and Serialize are implemented on data structs.

The trait has an associated type that I can leverage in the baked exporter without having to do dyn Any downcasting.

I tried to keep my understanding of @robertbastian's position in mind while writing this PR (don't add anything hyper-specific to bake in the provider crate), and I think I achieved this, since the thing I added is on the same level as the existing format-specific traits.

See the hello world data for an example. I didn't apply it to the other data markers yet.

@sffc sffc requested review from Manishearth, robertbastian and a team as code owners February 16, 2025 00:55
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this design of sticking it on the export traits is good: it avoids cluttering core data marker infra whilst still providing extra metadata for export.

fn maybe_as_varule(&self) -> Option<&Self::VarULE>;
}

/// Implements [`MaybeAsVarULE`] on a type that is NOT representable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: the reverse default might be better UX for ICU4X devs so they do not accidentally feel forced to structure all of their data structs to support VarULE. I know Younies has found this confusing before.

but I figure there's a reason behind this choice here. if it's just macroability I think the macro can be done differently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reverse default

I don't understand what you mean. Do you mean the name of the trait? The name of the macro? Something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro opts out of this system by default.

I'd rather have the data marker macro opt in by default, with a flag to opt out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so this is primarily feedback about what the macro is named and what flags it takes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the name, just the default behavior, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically: if I define a new data struct, the default behavior of the data struct/marker macros should not make me think I need to impl VarULE. The non-None impl of this trait should be opt in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro does what I think you want it to do? It implements the trait to return None. If you want not None, there is no macro; implement the trait manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, I'm talking about the data_marker/data_struct macros. And I see the source of my confusion: The data_struct macro does handle this correctly, however in list/properties we have data structs that do not use the macro (why?) and they need to explicitly call it.

I saw explicit calls to this macro and thought the trait was opt-out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied @robertbastian's suggestion to create a new icu_provider::marker::data_struct! macro rules.

utils/zerovec/src/ule/plain.rs Outdated Show resolved Hide resolved
provider/core/src/marker.rs Outdated Show resolved Hide resolved
@sffc
Copy link
Member Author

sffc commented Feb 16, 2025

Created #6135 to reduce the diff

robertbastian pushed a commit that referenced this pull request Feb 17, 2025
Toward #5230

Reduces the diff in #6133

Not intended to be controversial.

Autosubmit = true
@@ -41,6 +42,21 @@ impl Default for HelloWorld<'_> {
}
}

impl MaybeAsVarULE for HelloWorld<'_> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: I think allowing implementation of this trait on arbitrary data structs is going to be super confusing (and it should definitely not use From to do the reverse).

HelloWorldV1::DataStruct should just be Cow<'static, str>, and this trait should be implemented for Cow<'static, str> in icu_provider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay implementing it on Cow, but we still need the no-op version of the trait implement on other things. I don't see a reason to not allow it to be implemented on arbitrary data structs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I am still allowing the trait to be implemented on any data struct)

@@ -69,6 +69,8 @@ data_marker!(
ListFormatterPatterns<'static>,
);

icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this is an opt-out, but it should be opt-in

Copy link
Member Author

@sffc sffc Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without specialization, I need to implement the trait on all data structs, and use runtime Option to determine if it is actually implement or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how specialization is relevant for opt out vs opt in? We're using a macro, the macro can default to generating a blank impl, and one can ask the macro to not do so.

This is my comment above about the reverse default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how specialization is relevant for opt out vs opt in

Ideally I would have a blanket impl that returns None and then a small number of specific impls that return Some. Instead, everyone needs to decide how to implement the trait, which I think @robertbastian doesn't like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My objection at least is that people need to explicitly opt out: it's not an objection about the trait architecture, it's an objection about the macro architecture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just make this

Suggested change
icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>);
icu_provider::data_struct!(ListFormatterPatterns<'_>);

Then we can use something like

icu_provider::data_struct!(HelloWorld<'_>, encoding: |v| v.message.as_str(), |e| Self { message: Cow::Borrowed(v) });

to do the optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did something like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we did the marker name cleanup first, we didn't have to keep maintaining this macro

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the contrary, I want to land this before marker cleanup so that we can make a decision on a case-by-case basis on whether to apply the optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applying this optimization is not 2.0 blocking. This is scope creep.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applying this optimization is not 2.0 blocking. This is scope creep.

I consider it 2.0 to have a design for this. This is a specific design proposal. It involves changing the bounds of ExportableDataPayload, which seems like something that would be smoother to do in a major release.

@sffc sffc requested a review from robertbastian February 17, 2025 18:02
@sffc sffc requested a review from Manishearth February 17, 2025 19:59
@sffc
Copy link
Member Author

sffc commented Feb 17, 2025

This PR is passing CI and ready from my perspective. I am unclear whether the open comments are bikeshed issues or fundamental design issues, and how you hope that they be resolved.

@Manishearth
Copy link
Member

@sffc From my side, the trait naming is a bikeshed that can be deferred, and the macro opt-in/opt-out is a concern I'd prefer solved here but is not a major design issue.

The overall design looks good.

Manishearth
Manishearth previously approved these changes Feb 18, 2025
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to land from my end, would prefer the trait be renamed, but can happen later.

///
/// For most data structs, a default implementation available via
/// [`does_not_deref_to_varule`] can be used.
pub trait MaybeAsVarULE {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this trait can be behind the export feature

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion with @robertbastian:

  • The maybe_as_var_ule function can be gated, but not the whole trait, because we need the associated type
  • Add from_varule to replace the From bound
  • Use a never type (empty enum) for the non-implemented case; this lets the from_varule match on it (and be implementable)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure how that gating would work, but will have a look when you have it ready. Would rather not have implementors of this forced to enable the export feature, i think

(yes, that might mean not gating it, that's fine by me)

but no strong opinion, let's see how the gating works

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I do the feature-gating, to make sure we is on the same page, here are some facts

  1. The VarULE associated type is used at runtime in data structs that apply the optimization
  2. The maybe_as_varule function is only used during data export
  3. Most component crates have a "datagen" feature; maybe we can assume it is all component crates

So if we want to move forward with the gating, there are a few choices

  1. Feature-gate the function in the trait and the impls. In icu_provider, it is in the "export" feature, and in the impls, it is in the "datagen" feature of the specific crates.
  2. Do not feature-gate the trait in icu_provider. Feature-gate the impls except when the impl is the non-default impl, since that is the case when it is used at runtime.
  3. Both 1 and 2.

I think I prefer either option 0 (don't feature-gate) or option 2 (feature-gate the whole trait).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not feature gating. This is a trait impl and I'd rather keep it clean.

I'm happy to have the impl itself gated on datagen: the Encode impls could pull in additional deps, but I think the trait itself should always be available to avoid specifying transitive deps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Encode impls could pull in additional deps

This is compelling to me.

I would prefer not to "split" the trait with a function enabled only on certain features enabled. That sounds like a bit of a disaster since the features need to be always 100% in sync. If the icu_provider trait enables its export feature, but a crate doesn't have its datagen feature enabled, then you break code.

So I would like to keep the trait unconditionally exported from icu_provider and conditionally implemented, in whole, in component crates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aligned

@@ -69,6 +69,8 @@ data_marker!(
ListFormatterPatterns<'static>,
);

icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just make this

Suggested change
icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>);
icu_provider::data_struct!(ListFormatterPatterns<'_>);

Then we can use something like

icu_provider::data_struct!(HelloWorld<'_>, encoding: |v| v.message.as_str(), |e| Self { message: Cow::Borrowed(v) });

to do the optimization.

use zerovec::VarZeroVec;

#[cfg(feature = "export")]
pub(crate) enum BakedValue<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation: this optimisation should also be applied to the binary search data store, or we should delete it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok if I make an issue to fix the binary search baked backend instead of deleting or updating it in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

get_index(self.trie, id, attributes_prefix_match)
// Safety: Allowed since `i` came from the trie and the field safety invariant
.map(|i| unsafe { self.values.get_unchecked(i) })
.map(From::from)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: please introduce a specific trait for this instead of abusing From.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a possible design. I don't think I like it.

As discussed in DM, we need 2 different associated types with this design: one that is a reference and one that is a VarULE. The one that is a reference, which I named EncodeAsVarULE,

I disagree that my use of From is an "abuse" of the trait, but I acknowledge the point of view that From looks like it might do some sort of transformation on the string. I believe that ZeroFrom does not carry similar baggage. I would prefer to revert to something closer to what I had before, with a single associated type, but using ZeroFrom.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ would like to highlight this open question

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the proposed conclusion on #6133 (comment), I want to double-down on my position that we should use two traits, one for datagen and one for runtime. I would like the datagen trait to be the MaybeExportAsVarULE and the runtime trait to be ZeroFrom. I am willing to use custom traits in both situations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this could be 3 traits, since there are 3 unique, disjoint things we need, in different situations:

  1. maybe_as_varule -> Option<(impl VarULE or impl EncodeAsVarULE)>
    • Needed in datagen only
  2. pub type VarULE
    • Needed in datagen for all structs and at runtime for structs that use it
  3. from_varule -> Self
    • Needed at runtime for structs that use it

A potential design for this would be 3 traits: (2) would be a base trait and (1) and (3) would extend it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ZeroFrom as the runtime trait sounds good to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently exploring this constellation of traits:

pub trait MaybeAsVarULE {
    type VarULE: ?Sized + VarULE;
}

#[cfg(feature = "export")]
pub trait MaybeEncodeAsVarULE: MaybeAsVarULE {
    fn maybe_encode_as_varule(&self) -> Box<Self::VarULE>;
}

pub trait FromVarULE<'a>: MaybeAsVarULE {
    fn from_varule(varule: &'a Self::VarULE) -> Self;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the deficiency in using ZeroFrom? We have tons of magical derive infra for that, and none for a new trait.

Copy link
Member Author

@sffc sffc Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The trait makes it more explicit that it is the reverse operation as MaybeEncodeAsVarULE
  2. A hand-written impl is sometimes required since there may be intermediate fields involved; for example, if the data struct is Cow<str>, the auto-derived impl will work, but not if the data struct is HelloWorld(Cow<str>)
  3. @robertbastian was opposed to using ZeroFrom

(none of these points are super compelling to me and I am happy to switch to ZeroFrom)

@@ -69,7 +69,7 @@ data_marker!(
ListFormatterPatterns<'static>,
);

icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>);
icu_provider::marker::data_struct!(ListFormatterPatterns<'_>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be in the marker module

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can export this from icu_provider. Do you also have an opinion on where the trait and type (MaybeExportAsVarULE, NeverVarULE) should be exported?

}
}
};
($ty:path, varule: $varule:path, as_varule: $as_varule:path, from_varule: $from_varule:path) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
($ty:path, varule: $varule:path, as_varule: $as_varule:path, from_varule: $from_varule:path) => {
($ty:ty, varule: $varule:ty, as_varule: $as_varule:expr, from_varule: $from_varule:expr) => {

Comment on lines 44 to 60
impl<'a> HelloWorld<'a> {
fn as_varule(&'a self) -> &'a str {
&self.message
}
fn from_varule(message: &'a str) -> Self {
Self {
message: Cow::Borrowed(message)
}
}
}

crate::marker::data_struct!(
HelloWorld<'a>,
varule: str,
as_varule: HelloWorld::as_varule,
from_varule: HelloWorld::from_varule
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
impl<'a> HelloWorld<'a> {
fn as_varule(&'a self) -> &'a str {
&self.message
}
fn from_varule(message: &'a str) -> Self {
Self {
message: Cow::Borrowed(message)
}
}
}
crate::marker::data_struct!(
HelloWorld<'a>,
varule: str,
as_varule: HelloWorld::as_varule,
from_varule: HelloWorld::from_varule
);
crate::marker::data_struct!(
HelloWorld<'a>,
varule: str,
as_varule: |v| &v.message,
from_varule: |v| Self { message: Cow::Borrowed(v) }
);

for<'a> <M::DataStruct as Yokeable<'a>>::Output: Bake + BakeSize + serde::Serialize + PartialEq,
for<'a> <M::DataStruct as Yokeable<'a>>::Output:
Bake + BakeSize + serde::Serialize + MaybeExportAsVarULE<'a> + PartialEq,
for<'a> <<M::DataStruct as Yokeable<'a>>::Output as MaybeExportAsVarULE<'a>>::VarULE: VarULE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not a fan of naming associated types by their bound. ...::VarULE: VarULE does not really tell me what any of this is. Can you come up with clear names for these associated type, like we renamed Yokeable to DataStruct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It tells you what it literally is (it is an associated VarULE type, which is 100% correct), just not what it does.

I don't have any other name that is as precise and correct as this. But if you want to brainstorm, and if you like the DataStruct themed naming:

  • DataStructAsVarULE
  • OptionalVarULE
  • CompactDataStruct
  • DataStruct (it is on a different trait so using the same name is possible)

M::DataStruct: MaybeExportAsVarULE<'static>,
<M::DataStruct as MaybeExportAsVarULE<'static>>::VarULE: VarULE,
<M::DataStruct as MaybeExportAsVarULE<'static>>::EncodeAsVarULE:
From<&'static <M::DataStruct as MaybeExportAsVarULE<'static>>::VarULE>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there still a From bound?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to transform &str (a &Self::VarULE) to &str (a Self::EncodeAsVarULE)

@@ -359,6 +365,11 @@ fn data_struct_impl(attr: DataStructArgs, input: DeriveInput) -> TokenStream2 {
}
}

// TODO: Allow this to be configured
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update comment to "TODO Allow the automatic impl of MaybeAsVarULE to be configured" since it's not obvious from the macro anymore

for<'a> <M::DataStruct as Yokeable<'a>>::Output: Bake + BakeSize + serde::Serialize + PartialEq,
for<'a> <M::DataStruct as Yokeable<'a>>::Output:
Bake + BakeSize + serde::Serialize + MaybeExportAsVarULE<'a> + PartialEq,
for<'a> <<M::DataStruct as Yokeable<'a>>::Output as MaybeExportAsVarULE<'a>>::VarULE: VarULE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

/// if it cannot be represented as [`VarULE`].
///
/// This is expected to be `&'a Self::VarULE` for compatible types.
type EncodeAsVarULE: 'a;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: can the complicated Encode bound be moved here?

};
}
#[doc(inline)]
pub use __data_struct as data_struct;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I'm assuming that the intent is for this macro to over time take over all purposes of the old proc macro?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, most of what the old proc macro does is now done by data_marker!

Comment on lines 185 to 202
/// Safety checklist for VarULE:
///
/// 1. No uninitialized or padding bytes (empty type)
/// 2. Type is an empty enum
/// 3. `VarULE::validate_bytes()` always returns errors
/// 4. `VarULE::validate_bytes()` always returns errors
/// 5. `VarULE::from_bytes_unchecked()` returns the same pointer
/// 6. All other methods are left with their default impl
/// 7. Byte equality is semantic equality
unsafe impl VarULE for NeverVarULE {
fn validate_bytes(_bytes: &[u8]) -> Result<(), UleError> {
Err(UleError::parse::<Self>())
}
unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Self {
// Note: this returns a thin pointer instead of a fat pointer
&*(bytes.as_ptr() as *const Self)
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Manishearth Flagging this code for a more detailed unsafe review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be an empty enum IMO, there are far too many tricky things there. Let's just use a ZST and have it always error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty enum is required in order to implement @robertbastian's preferred solution (discussed in the meeting this morning) where from_varule is always implemented; the empty enum means that types that don't support VarULE can match on it. However, I would prefer a different set of traits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you concerned about from_bytes_unchecked taking in a fat pointer and returning a thin pointer?

Copy link
Member

@Manishearth Manishearth Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unsafe review for zerovec becomes much trickier when you start considering uninhabited types, since it's insta-UB to have an Empty, &Empty, or other such types floating around. We have not reviewed the zerovec crate for this, and I am unwilling to perform such a deep review and updation of invariants for this change.

Generally uninhabited types do not mix well with unsafe code. It's typical to assume they won't be introduced around this type of unsafe code when reviewing it, and opting uninhabited types into unsafe traits is a major red flag that requires reevaluation of this assumption.

validate_bytes always failing makes this plausibly correct, but I'm not able to quickly verify that in a nonglobal way. We have a lot of code using these traits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO just abort instead of matching on the empty enum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will remove the condition and impl for uninhabited types.

Are you concerned about from_bytes_unchecked taking in a fat pointer and returning a thin pointer?

^ can you address this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: I can do either of the following

  1. Implement ULE on a ZST like (), and use [()] as the placeholder VarULE type (this is what I did in an earlier version of this PR)
  2. Implement VarULE on the ZST and use () directly as the placeholder VarULE, but then it is a thin pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I hadn't noticed that had changed as well. I am somewhat concerned about that, we do assume VarULE types are DSTs but I don't actually think we document it as a requirement on the trait, or have unsafe code relying on it.

So it should be fine. But the [ZST] approach is also good.

@@ -261,7 +261,7 @@ where
/// Safety checklist for `VarULE`:
///
/// 1. The type *must not* include any uninitialized or padding bytes.
/// 2. The type must have an alignment of 1 byte.
/// 2. The type must have an alignment of 1 byte, or it is a ZST or an empty enum.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Manishearth Flagging this code for a more detailed unsafe review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's exclude empty enum

"or it is a ZST that is safe to construct" so that we disallow free construction of token ZSTs

@sffc sffc requested a review from Manishearth February 18, 2025 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants