Some regular expressions take a long time to match certain input strings, to the point where the time it takes to match a string of length n is proportional to n^k or even 2^n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.
The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.
Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.
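As an illustration of such an ambiguous repetition (a generic textbook pattern, not taken from the flagged code), the sketch below times /^(a+)+$/ against an input that almost matches. On a backtracking engine, each additional character roughly doubles the running time:

// Illustrative only (not the flagged expression): the inner "a+" is itself a
// repetition, so the outer "+" can split a run of "a"s in exponentially many
// ways. A near-match that fails at the last character forces the engine to
// try all of those splits.
const ambiguous = /^(a+)+$/;
const almostMatch = 'a'.repeat(28) + '!'; // the trailing '!' makes the overall match fail

console.time('catastrophic backtracking');
ambiguous.test(almostMatch); // typically takes seconds; a few more 'a's pushes it to minutes
console.timeEnd('catastrophic backtracking');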
The flagged expression is in logdna-agent/lib/utils.js, line 41 (commit 2313b15).

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.
Proof of Concept
Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:
text.replace(/^\s+|\s+$/g, ''); // BAD
The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.
This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes. Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/g), or just by using the built-in trim method (text.trim()).
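A minimal sketch of the two fixes mentioned above, together with a rough timing comparison against the ambiguous pattern (the variable names and input sizes are illustrative, exact timings vary by engine, and the look-behind fix requires ES2018 support):

// A long whitespace run that is NOT at the end of the string is re-scanned
// once per character of the run by the ambiguous pattern, so this call is
// quadratic in the run length.
const slow = 'a' + ' '.repeat(30000) + 'b';
console.time('ambiguous trim');
slow.replace(/^\s+|\s+$/g, '');
console.timeEnd('ambiguous trim');

// Fix 1: the negative look-behind (?<!\s) only lets "\s+$" start matching at
// the beginning of the trailing whitespace run, removing the ambiguity.
console.time('unambiguous trim');
slow.replace(/^\s+|(?<!\s)\s+$/g, '');
console.timeEnd('unambiguous trim');

// Fix 2: the built-in trim() avoids the regular expression entirely.
const trimmed = slow.trim();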
Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.
A similar ambiguity, which is harder to remove by rewriting, appears in the following check: the two \d+ parts around the optional E can split a long digit sequence in many different ways, so an input such as "0." followed by many digits and a trailing non-digit character triggers quadratic backtracking:
/^0\.\d+E?\d+$/.test(str) // BAD
Sometimes it is unclear how a regular expression can be rewritten to avoid the problem. In such cases, it often suffices to limit the length of the input string. For instance, the following regular expression is used to match numbers, and on some non-number inputs it can have quadratic time complexity:
/^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$/.test(str) // BAD
It is not immediately obvious how to rewrite this regular expression to avoid the problem. However, you can mitigate performance issues by limiting the length to 1000 characters, which will always finish in a reasonable amount of time.
if (str.length > 1000) {
    throw new Error("Input too long");
}
/^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$/.test(str)
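One way to package that mitigation (the function name and helper structure are illustrative, not taken from the original code):

const NUMBER_RE = /^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$/;
const MAX_LENGTH = 1000; // limit suggested above

// Hypothetical helper: reject oversized input before the regular expression
// runs, so the quadratic worst case is bounded by a small, fixed input size.
function looksLikeNumber(str) {
  if (str.length > MAX_LENGTH) {
    throw new Error("Input too long");
  }
  return NUMBER_RE.test(str);
}

console.log(looksLikeNumber("1.5e10"));  // true
console.log(looksLikeNumber("0.1.2.3")); // false, and returns quickly because the input is short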
References
OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.