Support the same Regexps to be Regexp.linear_time? as CRuby #3858
* See #3858
* We need to pass a `java.lang.String` to TRegex; in this case we can pass it as raw bytes, since we also pass the encoding name to TRegex.
* Remove the `UnsupportedCharsetException` catch clause, as no `Charset` should be involved in this conversion since the migration to TruffleString.
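As a Ruby-level analogy only (this is not TruffleRuby's Java/TRegex interop code), the reasoning behind passing raw bytes together with the encoding name can be checked directly: a string's bytes plus its encoding name are enough to rebuild it exactly, with no charset decoding step that could fail or rewrite invalid byte sequences. The helpers `split_raw` and `rebuild` are hypothetical names used for illustration.

```ruby
# Hypothetical helpers illustrating "raw bytes + encoding name" as a lossless
# representation of a Ruby String. Not TruffleRuby's actual TRegex interop code.

# Returns a binary (ASCII-8BIT) copy of the bytes and the encoding's name.
def split_raw(str)
  [str.b, str.encoding.name]
end

# Reattaches the encoding by name; no bytes are transcoded or replaced.
def rebuild(raw_bytes, encoding_name)
  raw_bytes.dup.force_encoding(encoding_name)
end

samples = [
  "héllo",                                # valid UTF-8
  "日本".encode("Shift_JIS"),              # a non-UTF-8 encoding
  "\xff\xfe".dup.force_encoding("UTF-8"), # invalid UTF-8 byte sequence
]

samples.each do |s|
  bytes, name = split_raw(s)
  restored = rebuild(bytes, name)
  puts "#{name}: identical=#{restored == s && restored.bytes == s.bytes}"
end
```

Because nothing is decoded, even the invalid UTF-8 sample survives unchanged, which is the property that makes a charset-related exception handler unnecessary in this kind of conversion.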
Need to look at a new run now that #3859 has been merged.

@eregon Great! That brings it from 28 pending tests (23

All seven of the remaining pending
Thank you for checking! (I was planning to check soon, but this is very helpful.) The 7 remaining ones are all Unicode properties currently unsupported by TRegex:

I'll check with the TRegex folks whether it would make sense to implement them in TRegex or not. These Regexps are from:
https://github.com/ruby/net-imap/blob/master/lib/net/imap/stringprep/saslprep_tables.rb

@nevans Do you know if these are used by typical net-imap usage, only by tests, or something else? Given their location I suppose they are production (non-test) code, but I just wanted to confirm.
FWIW, I found there is

EDIT: @jirkamarsik said:

There are two cases: For AGE like
@eregon All of the tables and regexps are needed for a complete stringprep (RFC 3454) implementation, but Net::IMAP only needs the SASLprep (RFC 4013) and "trace" (RFC 4505) profiles. The "trace" profile is only used by the

Although these SASL mechanisms are not supported by the most popular publicly hosted IMAP servers (e.g. Gmail, Office365, Yahoo), they are supported by open source servers like Dovecot and Cyrus and by many smaller email hosts like Fastmail. For the providers that do support them,

For what it's worth: stringprep has been obsoleted by PRECIS (RFC 8264). In practice, though, each application and protocol that uses stringprep needs to explicitly publish updated specifications stating that stringprep has been replaced by PRECIS, and as far as I know, none of the updated IMAP/SASL specifications have converted over. Also, I think PRECIS might lean even more heavily on Unicode character properties than stringprep did.
Also, yes, @jirkamarsik is correct. I should've written

I'm guessing the BIDI Regexps could be much more succinctly compressed as:

```ruby
BIDI_R_AL = /[\p{Bidi_Class=R}\p{Bidi_Class=AL}&&\p{AGE=3.2}]/u
BIDI_NOT_R_AL = /[^\p{Bidi_Class=R}\p{Bidi_Class=AL}&&\p{AGE=3.2}]/u
BIDI_L = /[\p{Bidi_Class=L}&&\p{AGE=3.2}]/u

# If a string contains any RandALCat character, the string MUST NOT
# contain any LCat character.
BIDI_FAILS_REQ2 = Regexp.union(
  /#{BIDI_R_AL}.*?#{BIDI_L}/mu, # RandALCat followed by LCat
  /#{BIDI_L}.*?#{BIDI_R_AL}/mu, # RandALCat preceded by LCat
)

# If a string contains any RandALCat character, a RandALCat
# character MUST be the first character of the string, and a
# RandALCat character MUST be the last character of the string.
BIDI_FAILS_REQ3 =
  # contains RandALCat:
  Regexp.union(
    /\A#{BIDI_NOT_R_AL}.*?#{BIDI_R_AL}/mu, # but doesn't start with RandALCat
    /#{BIDI_R_AL}.*?#{BIDI_NOT_R_AL}\z/mu, # but doesn't end with RandALCat
  )

BIDI_FAILS = Regexp.union(BIDI_FAILS_REQ2, BIDI_FAILS_REQ3)
```
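The structure of that proposal (two requirement unions combined into one `BIDI_FAILS` regexp) can be exercised in plain Ruby. The sketch below keeps the same `Regexp.union` construction but, as a loudly labeled assumption, substitutes the script properties `\p{Hebrew}`/`\p{Arabic}` for RandALCat and `\p{Latin}` for LCat, because the `Bidi_Class` and `AGE=3.2` properties may not be supported by every Ruby regexp engine. These stand-ins are not the RFC 3454 tables, and `bidi_fails?` is a hypothetical helper.

```ruby
# ASSUMPTION: script properties stand in for the real RFC 3454 bidi tables.
# This only demonstrates the regexp structure, not correct stringprep behavior.
R_AL     = /[\p{Hebrew}\p{Arabic}]/  # stand-in for RandALCat
NOT_R_AL = /[^\p{Hebrew}\p{Arabic}]/ # stand-in for "not RandALCat"
L_CAT    = /\p{Latin}/               # stand-in for LCat

# Requirement 2: RandALCat and LCat must not both appear.
FAILS_REQ2 = Regexp.union(
  /#{R_AL}.*?#{L_CAT}/m, # RandALCat followed by LCat
  /#{L_CAT}.*?#{R_AL}/m, # RandALCat preceded by LCat
)

# Requirement 3: if RandALCat appears, it must be both first and last.
FAILS_REQ3 = Regexp.union(
  /\A#{NOT_R_AL}.*?#{R_AL}/m, # contains RandALCat but doesn't start with one
  /#{R_AL}.*?#{NOT_R_AL}\z/m, # contains RandALCat but doesn't end with one
)

FAILS = Regexp.union(FAILS_REQ2, FAILS_REQ3)

# Hypothetical helper: true when the string violates requirement 2 or 3.
def bidi_fails?(str)
  FAILS.match?(str)
end

["abc", "שלום", "שלום abc", "1שלום", "שלום2"].each do |s|
  puts "#{s.inspect} fails: #{bidi_fails?(s)}"
end
```

An all-Latin or all-Hebrew string passes, while mixing the two, or starting or ending an RTL string with a neutral character such as a digit, fails.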
We noticed in ruby/net-imap#470 that some Regexps are not `Regexp.linear_time?` on TruffleRuby, but they should be.
See the output in CI of that PR/repo, e.g. https://github.com/ruby/net-imap/actions/runs/14917675208/job/41906831286#step:5:54
One specific one I noticed is `/[\x80-\xff\r\n]/n`, which I have a fix for. (cc @nevans)
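For reference, that character class matches a single byte in the range 0x80..0xFF, or a CR or LF, in a binary (ASCII-8BIT) string. A minimal sketch of using it, with the `Regexp.linear_time?` check (available on CRuby 3.2 and later) guarded so the script still runs elsewhere. `contains_8bit_or_crlf?` is a hypothetical helper name, not part of net-imap.

```ruby
# Matches one byte that is either non-ASCII (0x80..0xFF) or a CR/LF.
# The /n flag makes this a binary (ASCII-8BIT) regexp.
EIGHT_BIT_OR_CRLF = /[\x80-\xff\r\n]/n

# Hypothetical helper: compare raw bytes so non-ASCII UTF-8 input doesn't
# raise Encoding::CompatibilityError against the binary regexp.
def contains_8bit_or_crlf?(str)
  EIGHT_BIT_OR_CRLF.match?(str.b)
end

puts contains_8bit_or_crlf?("plain ascii")
puts contains_8bit_or_crlf?("line1\r\nline2")
puts contains_8bit_or_crlf?("café")

if Regexp.respond_to?(:linear_time?)
  # A lone character class uses none of the features (such as backreferences)
  # that CRuby excludes from linear-time matching, so CRuby should report true.
  puts "linear_time? #{Regexp.linear_time?(EIGHT_BIT_OR_CRLF)}"
end
```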