-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Baked data: use VarULE to store data when specified #6133
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this design of sticking it on the export traits is good: it avoids cluttering core data marker infra whilst still providing extra metadata for export.
provider/core/src/marker.rs
Outdated
fn maybe_as_varule(&self) -> Option<&Self::VarULE>; | ||
} | ||
|
||
/// Implements [`MaybeAsVarULE`] on a type that is NOT representable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: the reverse default might be better UX for ICU4X devs so they do not accidentally feel forced to structure all of their data structs to support VarULE. I know Younies has found this confusing before.
but I figure there's a reason behind this choice here. if it's just macroability I think the macro can be done differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reverse default
I don't understand what you mean. Do you mean the name of the trait? The name of the macro? Something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro opts out of this system by default.
I'd rather have the data marker macro opt in by default, with a flag to opt out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so this is primarily feedback about what the macro is named and what flags it takes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the name, just the default behavior, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically: if I define a new data struct, the default behavior of the data struct/marker macros should not make me think I need to impl VarULE. The non-None impl of this trait should be opt in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro does what I think you want it to do? It implements the trait to return None. If you want not None, there is no macro; implement the trait manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no, I'm talking about the data_marker/data_struct macros. And I see the source of my confusion: The data_struct macro does handle this correctly, however in list/properties we have data structs that do not use the macro (why?) and they need to explicitly call it.
I saw explicit calls to this macro and thought the trait was opt-out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied @robertbastian's suggestion to create a new icu_provider::marker::data_struct!
macro rules.
Created #6135 to reduce the diff |
provider/core/src/hello_world.rs
Outdated
@@ -41,6 +42,21 @@ impl Default for HelloWorld<'_> { | |||
} | |||
} | |||
|
|||
impl MaybeAsVarULE for HelloWorld<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: I think allowing implementation of this trait on arbitrary data structs is going to be super confusing (and it should definitely not use From
to do the reverse).
HelloWorldV1::DataStruct
should just be Cow<'static, str>
, and this trait should be implemented for Cow<'static, str>
in icu_provider
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay implementing it on Cow
, but we still need the no-op version of the trait implement on other things. I don't see a reason to not allow it to be implemented on arbitrary data structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I am still allowing the trait to be implemented on any data struct)
components/list/src/provider/mod.rs
Outdated
@@ -69,6 +69,8 @@ data_marker!( | |||
ListFormatterPatterns<'static>, | |||
); | |||
|
|||
icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this is an opt-out, but it should be opt-in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without specialization, I need to implement the trait on all data structs, and use runtime Option
to determine if it is actually implement or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how specialization is relevant for opt out vs opt in? We're using a macro, the macro can default to generating a blank impl, and one can ask the macro to not do so.
This is my comment above about the reverse default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how specialization is relevant for opt out vs opt in
Ideally I would have a blanket impl that returns None and then a small number of specific impls that return Some. Instead, everyone needs to decide how to implement the trait, which I think @robertbastian doesn't like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My objection at least is that people need to explicitly opt out: it's not an objection about the trait architecture, it's an objection about the macro architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just make this
icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>); | |
icu_provider::data_struct!(ListFormatterPatterns<'_>); |
Then we can use something like
icu_provider::data_struct!(HelloWorld<'_>, encoding: |v| v.message.as_str(), |e| Self { message: Cow::Borrowed(v) });
to do the optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we did the marker name cleanup first, we didn't have to keep maintaining this macro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the contrary, I want to land this before marker cleanup so that we can make a decision on a case-by-case basis on whether to apply the optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying this optimization is not 2.0 blocking. This is scope creep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying this optimization is not 2.0 blocking. This is scope creep.
I consider it 2.0 to have a design for this. This is a specific design proposal. It involves changing the bounds of ExportableDataPayload
, which seems like something that would be smoother to do in a major release.
This PR is passing CI and ready from my perspective. I am unclear whether the open comments are bikeshed issues or fundamental design issues, and how you hope that they be resolved. |
@sffc From my side, the trait naming is a bikeshed that can be deferred, and the macro opt-in/opt-out is a concern I'd prefer solved here but is not a major design issue. The overall design looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK to land from my end, would prefer the trait be renamed, but can happen later.
provider/core/src/marker.rs
Outdated
/// | ||
/// For most data structs, a default implementation available via | ||
/// [`does_not_deref_to_varule`] can be used. | ||
pub trait MaybeAsVarULE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this trait can be behind the export
feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion with @robertbastian:
- The
maybe_as_var_ule
function can be gated, but not the whole trait, because we need the associated type - Add
from_varule
to replace theFrom
bound - Use a never type (empty enum) for the non-implemented case; this lets the
from_varule
match on it (and be implementable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure how that gating would work, but will have a look when you have it ready. Would rather not have implementors of this forced to enable the export feature, i think
(yes, that might mean not gating it, that's fine by me)
but no strong opinion, let's see how the gating works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I do the feature-gating, to make sure we is on the same page, here are some facts
- The
VarULE
associated type is used at runtime in data structs that apply the optimization - The
maybe_as_varule
function is only used during data export - Most component crates have a
"datagen"
feature; maybe we can assume it is all component crates
So if we want to move forward with the gating, there are a few choices
- Feature-gate the function in the trait and the impls. In
icu_provider
, it is in the"export"
feature, and in the impls, it is in the"datagen"
feature of the specific crates. - Do not feature-gate the trait in icu_provider. Feature-gate the impls except when the impl is the non-default impl, since that is the case when it is used at runtime.
- Both 1 and 2.
I think I prefer either option 0 (don't feature-gate) or option 2 (feature-gate the whole trait).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not feature gating. This is a trait impl and I'd rather keep it clean.
I'm happy to have the impl itself gated on datagen: the Encode impls could pull in additional deps, but I think the trait itself should always be available to avoid specifying transitive deps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Encode impls could pull in additional deps
This is compelling to me.
I would prefer not to "split" the trait with a function enabled only on certain features enabled. That sounds like a bit of a disaster since the features need to be always 100% in sync. If the icu_provider trait enables its export feature, but a crate doesn't have its datagen feature enabled, then you break code.
So I would like to keep the trait unconditionally exported from icu_provider and conditionally implemented, in whole, in component crates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aligned
components/list/src/provider/mod.rs
Outdated
@@ -69,6 +69,8 @@ data_marker!( | |||
ListFormatterPatterns<'static>, | |||
); | |||
|
|||
icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just make this
icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>); | |
icu_provider::data_struct!(ListFormatterPatterns<'_>); |
Then we can use something like
icu_provider::data_struct!(HelloWorld<'_>, encoding: |v| v.message.as_str(), |e| Self { message: Cow::Borrowed(v) });
to do the optimization.
provider/baked/src/zerotrie.rs
Outdated
use zerovec::VarZeroVec; | ||
|
||
#[cfg(feature = "export")] | ||
pub(crate) enum BakedValue<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
observation: this optimisation should also be applied to the binary search data store, or we should delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok if I make an issue to fix the binary search baked backend instead of deleting or updating it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
provider/baked/src/zerotrie.rs
Outdated
get_index(self.trie, id, attributes_prefix_match) | ||
// Safety: Allowed since `i` came from the trie and the field safety invariant | ||
.map(|i| unsafe { self.values.get_unchecked(i) }) | ||
.map(From::from) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: please introduce a specific trait for this instead of abusing From
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a possible design. I don't think I like it.
As discussed in DM, we need 2 different associated types with this design: one that is a reference and one that is a VarULE. The one that is a reference, which I named EncodeAsVarULE
,
I disagree that my use of From
is an "abuse" of the trait, but I acknowledge the point of view that From
looks like it might do some sort of transformation on the string. I believe that ZeroFrom
does not carry similar baggage. I would prefer to revert to something closer to what I had before, with a single associated type, but using ZeroFrom
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ would like to highlight this open question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the proposed conclusion on #6133 (comment), I want to double-down on my position that we should use two traits, one for datagen and one for runtime. I would like the datagen trait to be the MaybeExportAsVarULE
and the runtime trait to be ZeroFrom
. I am willing to use custom traits in both situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this could be 3 traits, since there are 3 unique, disjoint things we need, in different situations:
maybe_as_varule -> Option<(impl VarULE or impl EncodeAsVarULE)>
- Needed in datagen only
pub type VarULE
- Needed in datagen for all structs and at runtime for structs that use it
from_varule -> Self
- Needed at runtime for structs that use it
A potential design for this would be 3 traits: (2) would be a base trait and (1) and (3) would extend it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ZeroFrom as the runtime trait sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently exploring this constellation of traits:
pub trait MaybeAsVarULE {
type VarULE: ?Sized + VarULE;
}
#[cfg(feature = "export")]
pub trait MaybeEncodeAsVarULE: MaybeAsVarULE {
fn maybe_encode_as_varule(&self) -> Box<Self::VarULE>;
}
pub trait FromVarULE<'a>: MaybeAsVarULE {
fn from_varule(varule: &'a Self::VarULE) -> Self;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the deficiency in using ZeroFrom? We have tons of magical derive infra for that, and none for a new trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The trait makes it more explicit that it is the reverse operation as
MaybeEncodeAsVarULE
- A hand-written impl is sometimes required since there may be intermediate fields involved; for example, if the data struct is
Cow<str>
, the auto-derived impl will work, but not if the data struct isHelloWorld(Cow<str>)
- @robertbastian was opposed to using ZeroFrom
(none of these points are super compelling to me and I am happy to switch to ZeroFrom)
components/list/src/provider/mod.rs
Outdated
@@ -69,7 +69,7 @@ data_marker!( | |||
ListFormatterPatterns<'static>, | |||
); | |||
|
|||
icu_provider::marker::does_not_deref_to_varule!(ListFormatterPatterns<'_>); | |||
icu_provider::marker::data_struct!(ListFormatterPatterns<'_>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not be in the marker
module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can export this from icu_provider
. Do you also have an opinion on where the trait and type (MaybeExportAsVarULE
, NeverVarULE
) should be exported?
provider/core/src/marker.rs
Outdated
} | ||
} | ||
}; | ||
($ty:path, varule: $varule:path, as_varule: $as_varule:path, from_varule: $from_varule:path) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
($ty:path, varule: $varule:path, as_varule: $as_varule:path, from_varule: $from_varule:path) => { | |
($ty:ty, varule: $varule:ty, as_varule: $as_varule:expr, from_varule: $from_varule:expr) => { |
provider/core/src/hello_world.rs
Outdated
impl<'a> HelloWorld<'a> { | ||
fn as_varule(&'a self) -> &'a str { | ||
&self.message | ||
} | ||
fn from_varule(message: &'a str) -> Self { | ||
Self { | ||
message: Cow::Borrowed(message) | ||
} | ||
} | ||
} | ||
|
||
crate::marker::data_struct!( | ||
HelloWorld<'a>, | ||
varule: str, | ||
as_varule: HelloWorld::as_varule, | ||
from_varule: HelloWorld::from_varule | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl<'a> HelloWorld<'a> { | |
fn as_varule(&'a self) -> &'a str { | |
&self.message | |
} | |
fn from_varule(message: &'a str) -> Self { | |
Self { | |
message: Cow::Borrowed(message) | |
} | |
} | |
} | |
crate::marker::data_struct!( | |
HelloWorld<'a>, | |
varule: str, | |
as_varule: HelloWorld::as_varule, | |
from_varule: HelloWorld::from_varule | |
); | |
crate::marker::data_struct!( | |
HelloWorld<'a>, | |
varule: str, | |
as_varule: |v| &v.message, | |
from_varule: |v| Self { message: Cow::Borrowed(v) } | |
); |
provider/core/src/export/payload.rs
Outdated
for<'a> <M::DataStruct as Yokeable<'a>>::Output: Bake + BakeSize + serde::Serialize + PartialEq, | ||
for<'a> <M::DataStruct as Yokeable<'a>>::Output: | ||
Bake + BakeSize + serde::Serialize + MaybeExportAsVarULE<'a> + PartialEq, | ||
for<'a> <<M::DataStruct as Yokeable<'a>>::Output as MaybeExportAsVarULE<'a>>::VarULE: VarULE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not a fan of naming associated types by their bound. ...::VarULE: VarULE
does not really tell me what any of this is. Can you come up with clear names for these associated type, like we renamed Yokeable
to DataStruct
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tells you what it literally is (it is an associated VarULE type, which is 100% correct), just not what it does.
I don't have any other name that is as precise and correct as this. But if you want to brainstorm, and if you like the DataStruct
themed naming:
DataStructAsVarULE
OptionalVarULE
CompactDataStruct
DataStruct
(it is on a different trait so using the same name is possible)
provider/baked/src/zerotrie.rs
Outdated
M::DataStruct: MaybeExportAsVarULE<'static>, | ||
<M::DataStruct as MaybeExportAsVarULE<'static>>::VarULE: VarULE, | ||
<M::DataStruct as MaybeExportAsVarULE<'static>>::EncodeAsVarULE: | ||
From<&'static <M::DataStruct as MaybeExportAsVarULE<'static>>::VarULE>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there still a From
bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to transform &str
(a &Self::VarULE
) to &str
(a Self::EncodeAsVarULE
)
@@ -359,6 +365,11 @@ fn data_struct_impl(attr: DataStructArgs, input: DeriveInput) -> TokenStream2 { | |||
} | |||
} | |||
|
|||
// TODO: Allow this to be configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: update comment to "TODO Allow the automatic impl of MaybeAsVarULE to be configured" since it's not obvious from the macro anymore
provider/core/src/export/payload.rs
Outdated
for<'a> <M::DataStruct as Yokeable<'a>>::Output: Bake + BakeSize + serde::Serialize + PartialEq, | ||
for<'a> <M::DataStruct as Yokeable<'a>>::Output: | ||
Bake + BakeSize + serde::Serialize + MaybeExportAsVarULE<'a> + PartialEq, | ||
for<'a> <<M::DataStruct as Yokeable<'a>>::Output as MaybeExportAsVarULE<'a>>::VarULE: VarULE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
provider/core/src/marker.rs
Outdated
/// if it cannot be represented as [`VarULE`]. | ||
/// | ||
/// This is expected to be `&'a Self::VarULE` for compatible types. | ||
type EncodeAsVarULE: 'a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: can the complicated Encode bound be moved here?
provider/core/src/marker.rs
Outdated
}; | ||
} | ||
#[doc(inline)] | ||
pub use __data_struct as data_struct; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I'm assuming that the intent is for this macro to over time take over all purposes of the old proc macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, most of what the old proc macro does is now done by data_marker!
provider/core/src/marker.rs
Outdated
/// Safety checklist for VarULE: | ||
/// | ||
/// 1. No uninitialized or padding bytes (empty type) | ||
/// 2. Type is an empty enum | ||
/// 3. `VarULE::validate_bytes()` always returns errors | ||
/// 4. `VarULE::validate_bytes()` always returns errors | ||
/// 5. `VarULE::from_bytes_unchecked()` returns the same pointer | ||
/// 6. All other methods are left with their default impl | ||
/// 7. Byte equality is semantic equality | ||
unsafe impl VarULE for NeverVarULE { | ||
fn validate_bytes(_bytes: &[u8]) -> Result<(), UleError> { | ||
Err(UleError::parse::<Self>()) | ||
} | ||
unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Self { | ||
// Note: this returns a thin pointer instead of a fat pointer | ||
&*(bytes.as_ptr() as *const Self) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Manishearth Flagging this code for a more detailed unsafe review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be an empty enum IMO, there are far too many tricky things there. Let's just use a ZST and have it always error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty enum is required in order to implement @robertbastian's preferred solution (discussed in the meeting this morning) where from_varule
is always implemented; the empty enum means that types that don't support VarULE can match on it. However, I would prefer a different set of traits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you concerned about from_bytes_unchecked
taking in a fat pointer and returning a thin pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unsafe review for zerovec becomes much trickier when you start considering uninhabited types, since it's insta-UB to have an Empty
, &Empty
, or other such types floating around. We have not reviewed the zerovec crate for this, and I am unwilling to perform such a deep review and updation of invariants for this change.
Generally uninhabited types do not mix well with unsafe code. It's typical to assume they won't be introduced around this type of unsafe code when reviewing it, and opting uninhabited types into unsafe traits is a major red flag that requires reevaluation of this assumption.
validate_bytes
always failing makes this plausibly correct, but I'm not able to quickly verify that in a nonglobal way. We have a lot of code using these traits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO just abort instead of matching on the empty enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will remove the condition and impl for uninhabited types.
Are you concerned about
from_bytes_unchecked
taking in a fat pointer and returning a thin pointer?
^ can you address this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: I can do either of the following
- Implement
ULE
on a ZST like()
, and use[()]
as the placeholder VarULE type (this is what I did in an earlier version of this PR) - Implement
VarULE
on the ZST and use()
directly as the placeholder VarULE, but then it is a thin pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I hadn't noticed that had changed as well. I am somewhat concerned about that, we do assume VarULE types are DSTs but I don't actually think we document it as a requirement on the trait, or have unsafe code relying on it.
So it should be fine. But the [ZST]
approach is also good.
@@ -261,7 +261,7 @@ where | |||
/// Safety checklist for `VarULE`: | |||
/// | |||
/// 1. The type *must not* include any uninitialized or padding bytes. | |||
/// 2. The type must have an alignment of 1 byte. | |||
/// 2. The type must have an alignment of 1 byte, or it is a ZST or an empty enum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Manishearth Flagging this code for a more detailed unsafe review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's exclude empty enum
"or it is a ZST that is safe to construct" so that we disallow free construction of token ZSTs
#5230
I did it by creating a trait,
MaybeAsVarULE
, which is implemented on data structs, similar to howBake
andSerialize
are implemented on data structs.The trait has an associated type that I can leverage in the baked exporter without having to do
dyn Any
downcasting.I tried to keep my understanding of @robertbastian's position in mind while writing this PR (don't add anything hyper-specific to bake in the provider crate), and I think I achieved this, since the thing I added is on the same level as the existing format-specific traits.
See the hello world data for an example. I didn't apply it to the other data markers yet.