searcher: fix performance bug with -A/--after-context when searching stdin
This was a crazy subtle bug where ripgrep could slow down exponentially as increasingly larger values of `-A/--after-context` were used. But, interestingly, this would only occur when searching `stdin` and _not_ when searching the same data as a regular file. This confounded me because ripgrep, pretty early on, erases the difference between searching a single file and `stdin`. So it wasn't like there were different code paths. And I mistakenly assumed that they would otherwise behave the same, since they are both just treated as streams.

But... it turns out that calling `read` on `stdin` versus a regular file behaves differently. At least on my Linux system, with `stdin`, `read` never seems to fill the buffer with more than 64K. But with a regular file, `read` pretty reliably fills as much of the caller's buffer as it can. Of course, `read` isn't *required* to fill up the caller's buffer, and ripgrep is generally fine with that.

But when `-A/--after-context` is used with a very large value---big enough that the default buffer capacity is too small---then more heap memory needs to be allocated to correctly handle all cases. This can result in passing buffers bigger than 64K to `read`. While we *correctly* handle `read` calls that don't fill the buffer, it turns out that when the buffer isn't filled, we get into a pathological case where we aren't processing as many bytes as we could. That is, because `-A/--after-context` causes us to keep a lot of bytes around while we roll the buffer, and because reading from `stdin` gives us fewer bytes than normal, we weren't amortizing our `read` calls as well as we should have been. Indeed, our buffer capacity increases specifically take this amortization into account, but we weren't taking advantage of it.

We fix this by putting `read` into an inner loop that ensures our buffer gets filled up.
This fixes the performance bug:

```
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A9999) | wc -l
real    1.330
user    0.767
sys     0.559
maxmem  29 MB
faults  0
10000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A9999) | wc -l
real    2.355
user    0.860
sys     0.613
maxmem  29 MB
faults  0
10000

$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A99999) | wc -l
real    3.636
user    3.091
sys     0.537
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A99999) | wc -l
real    4.918
user    3.236
sys     0.710
maxmem  29 MB
faults  0
100000

$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A999999) | wc -l
real    5.430
user    4.666
sys     0.750
maxmem  51 MB
faults  0
1000000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A999999) | wc -l
real    6.894
user    4.907
sys     0.850
maxmem  51 MB
faults  0
1000000
```

For comparison, here is GNU grep:

```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l
real    1.466
user    0.159
sys     0.839
maxmem  29 MB
faults  0
10000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l
real    1.663
user    0.166
sys     0.941
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l
real    1.631
user    0.204
sys     0.910
maxmem  29 MB
faults  0
1000000
```

GNU grep is still notably faster. We'll fix that in the next commit.

Fixes #3184
```diff
@@ -8,6 +8,8 @@ Performance improvements:
     Don't resolve helper binaries on Windows when `-z/--search-zip` isn't used.
 * [PERF #2865](https://github.com/BurntSushi/ripgrep/pull/2865):
     Avoid using path canonicalization on Windows when emitting hyperlinks.
+* [PERF #3184](https://github.com/BurntSushi/ripgrep/pull/3184):
+    Improve performance of large values with `-A/--after-context`.

 Bug fixes:
```
```diff
@@ -415,21 +415,26 @@ impl LineBuffer {
         assert_eq!(self.pos, 0);
         loop {
             self.ensure_capacity()?;
-            let readlen = rdr.read(self.free_buffer().as_bytes_mut())?;
-            if readlen == 0 {
-                // We're only done reading for good once the caller has
-                // consumed everything.
-                self.last_lineterm = self.end;
-                return Ok(!self.buffer().is_empty());
+            let oldend = self.end;
+            while !self.free_buffer().is_empty() {
+                let readlen = rdr.read(self.free_buffer())?;
+                if readlen == 0 {
+                    break;
+                }
+                self.end += readlen;
             }

             // Get a mutable view into the bytes we've just read. These are
             // the bytes that we do binary detection on, and also the bytes we
             // search to find the last line terminator. We need a mutable slice
             // in the case of binary conversion.
-            let oldend = self.end;
-            self.end += readlen;
             let newbytes = &mut self.buf[oldend..self.end];
+            if newbytes.is_empty() {
+                self.last_lineterm = self.end;
+                // We're only done reading for good once the caller has
+                // consumed everything.
+                return Ok(!self.buffer().is_empty());
+            }

             // Binary detection.
             match self.config.binary {
```
```diff
@@ -737,7 +737,7 @@ d
     // Namely, it will *always* detect binary data in the current buffer
     // before searching it. Thus, the total number of bytes searched is
     // smaller than below.
-    let exp = "0:a\n\nbyte count:262146\nbinary offset:262153\n";
+    let exp = "0:a\n\nbyte count:262142\nbinary offset:262153\n";
     // In contrast, the slice readers (for multi line as well) will only
     // look for binary data in the initial chunk of bytes. After that
     // point, it only looks for binary data in matches. Note though that
```
```diff
@@ -771,7 +771,7 @@ d
     haystack.push_str("a\x00a\n");
     haystack.push_str("a\n");

-    let exp = "0:a\n\nbyte count:262146\nbinary offset:262149\n";
+    let exp = "0:a\n\nbyte count:262142\nbinary offset:262149\n";
     // The binary offset for the Slice readers corresponds to the binary
     // data in `a\x00a\n` since the first line with binary data
     // (`b\x00b\n`) isn't part of a match, and is therefore undetected.
```