grep-regex: improve literal detection with -w
When the -w/--word-regexp was used, ripgrep would in many cases fail to
apply literal optimizations. This occurs specifically when the regex
given by the user is an alternation of literals with no common prefixes
or suffixes, e.g.,
rg -w 'foo|bar|baz|quux'
In this case, the inner literal detector fails. Normally, this would
result in literal prefixes being detected by the regex engine. But
because of the -w/--word-regexp flag, the actual regex that we run ends
up looking like this:
(^|\W)(foo|bar|baz|quux)($|\W)
which of course defeats any prefix or suffix literal optimizations in
the regex crate's somewhat naive extractor. (A better extractor could
still do literal optimizations in the above case.)
So this commit fixes this by falling back to prefix or suffix literals
when they're available instead of prematurely giving up and assuming the
regex engine will do the rest.
This commit is contained in:
@@ -49,7 +49,7 @@ impl RegexMatcherBuilder {
|
||||
let fast_line_regex = chir.fast_line_regex()?;
|
||||
let non_matching_bytes = chir.non_matching_bytes();
|
||||
if let Some(ref re) = fast_line_regex {
|
||||
trace!("extracted fast line regex: {:?}", re);
|
||||
debug!("extracted fast line regex: {:?}", re);
|
||||
}
|
||||
|
||||
let matcher = RegexMatcherImpl::new(&chir)?;
|
||||
|
||||
Reference in New Issue
Block a user