289 Commits

Author SHA1 Message Date
Ian McKellar
85edf4c796 ignore: only stat .jj if we actually care
I was comparing the work being done by fd and find and noticed (with
`strace -f -c -S` calls) that fd was doing a ton of failed `statx`
calls. Upon closer inspection it was stating `.jj` even though I
was passing `--no-ignore`. Eventually I turned up this check in
`Ignore::add_child_path` that was doing stat on `.jj` regardless of
whether the options request it.

With this patch it'll only stat `.jj` if that's relevant to the query.

PR #3212
2025-10-30 13:29:58 -04:00
Andrew Gallant
36b7597693 changelog: start next section 2025-10-22 09:02:40 -04:00
Andrew Gallant
f5be160839 changelog: 15.1.0 2025-10-22 08:21:34 -04:00
Andrew Gallant
d47663b1b4 searcher: fix regression with --line-buffered flag
In my fix for #3184, I actually had two fixes. One was a tweak to how we
read data and the other was a tweak to how we determined how much of the
buffer we needed to keep around. It turns out that fixing #3184 only
required the latter fix, found in commit
d4b77a8d89. The former fix also helped the
specific case of #3184, but it ended up regressing `--line-buffered`.

Specifically, previous to 8c6595c215 (the
first fix), we would do one `read` syscall. This call might not fill our
caller provided buffer. And in particular, `stdin` seemed to fill fewer
bytes than reading from a file. So the "fix" was to put `read` in a loop
and keep calling it until the caller provided buffer was full or until
the stream was exhausted. This helped alleviate #3184 by amortizing
`read` syscalls better.

But of course, in retrospect, this change is clearly contrary to how
`--line-buffered` works. We specifically do _not_ want to wait around
until the buffer is full. We want to read what we can, search it and
move on.

So this reverts the first fix but leaves the second, which still
keeps #3184 fixed and also fixes #3194 (the regression).

This reverts commit 8c6595c215.

Fixes #3194
2025-10-19 11:06:39 -04:00
Andrew Gallant
f09b55b8e7 changelog: start next section 2025-10-15 23:32:00 -04:00
Andrew Gallant
3780168c13 changelog: 15.0.0 2025-10-15 22:53:30 -04:00
Andrew Gallant
79d393a302 release: remove riscv64 and powerpc64 artifacts
Their CI workflows broke for different reasons.

I perceive these as niche platforms that aren't worth blocking
a release on. And not worth my time investigating CI problems.
2025-10-15 22:42:51 -04:00
Andrew Gallant
63209ae0b9 printer: fix --stats for --json
Somehow, the JSON printer seems to have never emitted correct summary
statistics. And I believe #3178 is the first time anyone has ever
reported it. I believe this bug has persisted for years. That's
surprising.

Anyway, the problem here was that we were bailing out of `finish()` on
the sink if we weren't supposed to print anything. But we bailed out
before we tallied our summary statistics. Obviously we shouldn't do
that.

Fixes #3178
2025-10-15 21:21:20 -04:00
Andrew Gallant
b610d1cb15 ignore: fix global gitignore bug that arises with absolute paths
The `ignore` crate currently handles two different kinds of "global"
gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile`
and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to
ripgrep's `--ignore-file` flag).

In contrast to any other kind of gitignore file, these gitignore files
should have their patterns interpreted relative to the current working
directory. (Arguably there are other choices we could make here, e.g.,
based on the paths given. But the `ignore` infrastructure can't handle
that, and it's not clearly correct to me.) Normally, a gitignore file
has its patterns interpreted relative to where the gitignore file is.
This relative interpretation matters for patterns like `/foo`, which are
anchored to _some_ directory.

Previously, we would generally get the global gitignores correct because
it's most common to use ripgrep without providing a path. Thus, it
searches the current working directory. In this case, no stripping of
the paths is needed in order for the gitignore patterns to be applied
directly.

But if one provides an absolute path (or something else) to ripgrep to
search, the paths aren't stripped correctly. Indeed, in the core, I had
just given up and not provided a "root" path to these global gitignores.
So it had no hope of getting this correct.

We fix this assigning the CWD to the `Gitignore` values created from
global gitignore files. This was a painful thing to do because we'd
ideally:

1. Call `std::env::current_dir()` at most once for each traversal.
2. Provide a way to avoid the library calling `std::env::current_dir()`
   at all. (Since this is global process state and folks might want to
   set it to different values for $reasons.)

The `ignore` crate's internals are a total mess. But I think I've
addressed the above 2 points in a semver compatible manner.

Fixes #3179
2025-10-15 19:44:23 -04:00
Andrew Gallant
8c6595c215 searcher: fix performance bug with -A/--after-context when searching stdin
This was a crazy subtle bug where ripgrep could slow down exponentially
as increasingly larger values of `-A/--after-context` were used. But,
interestingly, this would only occur when searching `stdin` and _not_
when searching the same data as a regular file.

This confounded me because ripgrep, pretty early on, erases the
difference between searching a single file and `stdin`. So it wasn't
like there were different code paths. And I mistakenly assumed that they
would otherwise behave the same as they are just treated as streams.

But... it turns out that running `read` on a `stdin` versus a regular
file seems to behave differently. At least on my Linux system, with
`stdin`, `read` never seems to fill the buffer with more than 64K. But
with a regular file, `read` pretty reliably fills the caller's buffer
with as much space as declared.

Of course, it is expected that `read` doesn't *have* to fill up the
caller's buffer, and ripgrep is generally fine with that. But when
`-A/--after-context` is used with a very large value---big enough that
the default buffer capacity is too small---then more heap memory needs
to be allocated to correctly handle all cases. This can result in
passing buffers bigger than 64K to `read`.

While we *correctly* handle `read` calls that don't fill the buffer,
it turns out that if we don't fill the buffer, then we get into a
pathological case where we aren't processing as many bytes as we could.
That is, because of the `-A/--after-context` causing us to keep a lot of
bytes around while we roll the buffer and because reading from `stdin`
gives us fewer bytes than normal, we weren't amortizing our `read` calls
as well as we should have been. Indeed, our buffer capacity increases
specifically take this amortization into account, but we weren't taking
advantage of it.

We fix this by putting `read` into an inner loop that ensures our
buffer gets filled up. This fixes the performance bug:

```
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A9999) | wc -l

real    1.330
user    0.767
sys     0.559
maxmem  29 MB
faults  0
10000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A9999) | wc -l

real    2.355
user    0.860
sys     0.613
maxmem  29 MB
faults  0
10000

$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A99999) | wc -l

real    3.636
user    3.091
sys     0.537
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A99999) | wc -l

real    4.918
user    3.236
sys     0.710
maxmem  29 MB
faults  0
100000

$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A999999) | wc -l

real    5.430
user    4.666
sys     0.750
maxmem  51 MB
faults  0
1000000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A999999) | wc -l

real    6.894
user    4.907
sys     0.850
maxmem  51 MB
faults  0
1000000
```

For comparison, here is GNU grep:

```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l

real    1.466
user    0.159
sys     0.839
maxmem  29 MB
faults  0
10000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l

real    1.663
user    0.166
sys     0.941
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l

real    1.631
user    0.204
sys     0.910
maxmem  29 MB
faults  0
1000000
```

GNU grep is still notably faster. We'll fix that in the next commit.

Fixes #3184
2025-10-14 14:27:43 -04:00
Andrew Gallant
de2567a4c7 printer: fix panic in replacements in look-around corner case
The abstraction boundary fuck up is the gift that keeps on giving. It
turns out that the invariant that the match would never exceed the range
given is not always true. So we kludge around it.

Also, update the CHANGELOG to include the fix for #2111.

Fixes #3180
2025-10-12 17:25:19 -04:00
Andrew Gallant
5c42c8c48f test: add regression test for fixed bug
It turns out that #2094 was fixed in my `--max-count` refactor a few
commits back. This commit adds a regression test for it.

Closes #2094
2025-10-12 12:45:34 -04:00
Andrew Gallant
293ef80eaf test: add another regression test for gitignore matching bug
I believe this was also fixed by #2933.

Closes #2770
2025-10-10 22:06:59 -04:00
Andrew Gallant
fa80aab6b0 test: add regression test for fixed gitignore bug
I believe this was actually fixed by #2933.

Closes #3067
2025-10-10 22:06:59 -04:00
mariano-m13
7c2161d687 release: add binaries for riscv64gc-unknown-linux-gnu target
Note that we skip lz4/brotli/zstd tests on RISC-V.

The CI runs RISC-V tests using cross/QEMU emulation. The decompression
tools (lz4, brotli, zstd) are x86_64 binaries on the host that cannot
execute in the RISC-V QEMU environment.

Skip these three tests at compile-time on RISC-V to avoid test failures.
The -z/--search-zip functionality itself works correctly on real RISC-V
hardware where native decompression tools are available.

PR #3165
2025-10-10 20:50:28 -04:00
Andrew Gallant
096f79ab98 deps: update everything
This includes an update to `regex 1.12.1`, which fixes a couple of
outstanding bugs in ripgrep.

Fixes #2750, Fixes #3135
2025-10-10 20:13:29 -04:00
Andrew Gallant
0407e104f6 ignore: fix problem with searching whitelisted hidden files
... specifically, when the whitelist comes from a _parent_ gitignore
file.

Our handling of parent gitignores is pretty ham-fisted and has been a
source of some unfortunate bugs. The problem is that we need to strip
the parent path from the path we're searching in order to correctly
apply the globs. But getting this stripping correct seems to be a subtle
affair.

Fixes #3173
2025-10-08 21:16:59 -04:00
Andrew Gallant
1b07c6616a cli: document that -c/--count can be inconsistent with -l/--files-with-matches
This is unfortunate, but is a known bug that I don't think can be fixed
without either making `-l/--files-with-matches` much slower or changing
what "binary filtering" means by default.

In this PR, we document this inconsistency since users may find it quite
surprising. The actual work-around is to disable binary filtering with
the `--binary` flag.

We add a test confirming this behavior.

Closes #3131
2025-09-22 20:24:53 -04:00
Andrew Gallant
c1fc6a5eb8 release: build aarch64 artifacts for macos on GitHub Actions
GitHub now supports this natively, so there's no need for me to do it
any more.

Fixes #3155
2025-09-22 11:56:33 -04:00
Isaac
64174b8e68 printer: preserve line terminator when using --crlf and --replace
Ref #3097, Closes #3100
2025-09-19 21:08:19 -04:00
Pavel Safronov
a6e0be3c90 searcher: move "max matches" from printer to searcher
This is a bit of a brutal change, but I believe is necessary in order to
fix a bug in how we handle the "max matches" limit in multi-line mode
while simultaneously handling context lines correctly.

The main problem here is that "max matches" refers to the shorter of
"one match per line" or "a single match." In typical grep, matches
*can't* span multiple lines, so there's never a difference. But in
multi-line mode, they can. So match counts necessarily must be handled
differently for multi-line mode.

The printer was previously responsible for this. But for $reasons, the
printer is fundamentally not in charge of how matches are found and
reported.

See my comments in #3094 for even more context.

This is a breaking change for `grep-printer`.

Fixes #3076, Closes #3094
2025-09-19 21:08:19 -04:00
Andrew Gallant
74959a14cb man: escape all hyphens in flag names
Apparently, if we don't do this, some roff renderers with use a special
Unicode hyphen. That in turn makes searching a man page not work as one
would expect.

Fixes #3140
2025-09-19 21:08:19 -04:00
dana
78383de9b2 complete/zsh: improve --hyperlink-format completion
Also don't re-define helper functions if they exist.

Closes #3102
2025-09-19 21:08:19 -04:00
Ilya Grigoriev
519c1bd5cf complete: improvements for the --hyperlink-format flag
The goal is to make the completion for `rg --hyperlink-format v<TAB>`
work in the fish shell.

These are not exhaustive (the user can also specify custom formats).
This is somewhat unfortunate, but is probably better than not doing
anything at all.

The `grep+` value necessitated a change to a test.

Closes #3096
2025-09-19 21:08:19 -04:00
emrebengue
99fe884536 colors: add highlight type support for matching lines
This lets users highlight non-matching text in matching lines.

Closes #3024, Closes #3107
2025-09-19 21:08:19 -04:00
Andrew Gallant
126bbeab8c printer: fix handling of has_match for summary printer
Previously, `Quiet` mode in the summary printer always acted like
"print matching paths," except without the printing. This happened even
if we wanted to "print non-matching paths." Since this only afflicted
quiet mode, this had the effect of flipping the exit status when
`--files-without-match --quiet` was used.

Fixes #3108, Ref #3118
2025-09-19 21:08:19 -04:00
Andrew Gallant
4df1298127 globset: fix bug where trailing . in file name was incorrectly handled
I'm not sure why I did this, but I think I was trying to imitate the
contract of [`std::path::Path::file_name`]:

> Returns None if the path terminates in `..`.

But the status quo clearly did not implement this. And as a result, if
you have a glob that ends in a `.`, it was instead treated as the empty
string (which only matches the empty string).

We fix this by implementing the semantic from the standard library
correctly.

Fixes #2990

[`std::path::Path::file_name`]: https://doc.rust-lang.org/std/path/struct.Path.html#method.file_name
2025-09-19 21:08:19 -04:00
Andrew Gallant
4ab1862dc0 stats: fix case where "bytes searched" could be wrong
Specifically, if the search was instructed to quit early, we might not
have correctly marked the number of bytes consumed.

I don't think this bug occurs when memory maps are used to read the
haystack.

Closes #2944
2025-09-19 21:08:19 -04:00
Luke Sandberg
5f5da48307 globset: support nested alternates
For example, `**/{node_modules/**/*/{ts,js},crates/**/*.{rs,toml}`.

I originally didn't add this I think for implementation simplicity, but
it turns out that it really isn't much work to do. There might have also
been some odd behavior in the regex engine for dealing with empty
alternates, but that has all been long fixed.

Closes #3048, Closes #3112
2025-09-19 21:08:19 -04:00
Andrew Gallant
d199058e77 cli: make rg -vf file behave sensibly
Previously, when `file` is empty (literally empty, as in, zero byte),
`rg -f file` and `rg -vf file` would behave identically. This is odd
and also doesn't match how GNU grep behaves. It's also not logically
correct. An empty file means _zero_ patterns which is an empty set. An
empty set matches nothing. Inverting the empty set should result in
matching everything.

This was because of an errant optimization that lets ripgrep quit early
if it can statically detect that no matches are possible.

Moreover, there was *also* a bug in how we constructed the PCRE2 pattern
when there are zero patterns. PCRE2 doesn't have a concept of sets of
patterns (unlike the `regex` crate), so we need to fake it with an empty
character class.

Fixes #1332, Fixes #3001, Closes #3041
2025-09-19 21:08:19 -04:00
Josh Cotton
bb0cbae312 ci: add aarch64 Windows
This also adds a new release artifact for aarch64 Windows.

Closes #2943, Closes #3038
2025-09-19 21:08:19 -04:00
Stephan Badragan
292bc54e64 printer: support -r/--replace with --json
This adds a `replacement` field to each submatch object in the JSON
output. In effect, this extends the `-r/--replace` flag so that it works
with `--json`.

This adds a new field instead of replacing the match text (which is how
the standard printer works) for maximum flexibility. This way, consumers
of the JSON output can access the original match text (and always rely
on it corresponding to the original match text) while also getting the
replacement text without needing to do the replacement themselves.

Closes #1872, Closes #2883
2025-09-19 21:08:19 -04:00
Lucas Trzesniewski
119407d0a9 printer: use std::path::absolute on Windows
This specifically avoids touching the file system, which can lead to
fairly dramatic speed-ups in large repositories with lots of matches.

Closes #2865
2025-09-19 21:08:19 -04:00
Thomas Otto
75970fd16b ignore: don't process command line arguments in reverse order
When searching in parallel with many more arguments than threads, the
first arguments are searched last -- unlike in the -j1 case.

This is unexpected for users who know about the parallel nature of rg
and think they can give the scheduler a hint by positioning larger
input files (L1, L2, ..) before smaller ones (█, ██). Instead, this can
result in sub-optimal thread usage and thus longer runtime (simplified
example with 2 threads):

 T1:  █ ██ █ █ █ █ ██ █ █ █ █ █ ██ ╠═════════════L1════════════╣
 T2:  █ █ ██ █ █ ██ █ █ █ ██ █ █ ╠═════L2════╣

                                       ┏━━━━┳━━━━┳━━━━┳━━━━┓
This is caused by assigning work to    ┃ T1 ┃ T2 ┃ T3 ┃ T4 ┃
 per-thread stacks in a round-robin    ┡━━━━╇━━━━╇━━━━╇━━━━┩
              manner, starting here  → │ L1 │ L2 │ L3 │ L4 │ ↵
                                       ├────├────┼────┼────┤
                                       │ s5 │ s6 │ s7 │ s8 │ ↵
                                       ├────┼────┼────┼────┤
                                       ╷ .. ╷ .. ╷ .. ╷ .. ╷
                                       ├────┼────┼────┼────┤
                                       │ st │ su │ sv │ sw │ ↵
                                       ├────┼────┼────┼────┘
                                       │ sx │ sy │ sz │
                                       └────┴────┴────┘
   and then processing them bottom-up:   ↥    ↥    ↥    ↥

                                       ╷ .. ╷ .. ╷ .. ╷ .. ╷
This patch reverses the input order    ├────┼────┼────┼────┤
so the two reversals cancel each other │ s7 │ s6 │ s5 │ L4 │ ↵
out. Now at least the first N          ├────┼────┼────┼────┘
arguments, N=number-of-threads, are    │ L3 │ L2 │ L1 │
processed before any others (then      └────┴────┴────┘
work-stealing may happen):

 T1:  ╠═════════════L1════════════╣ █ ██ █ █ █ █ █ █ ██
 T2:  ╠═════L2════╣ █ █ ██ █ █ ██ █ █ █ ██ █ █ ██ █ █ █

(With some more shuffling T1 could always be assigned L1 etc., but
that would mostly be for optics).

Closes #2849
2025-09-19 21:08:19 -04:00
Christoph Badura
380809f1e2 ignore/types: add Makefile.*
The *BSD build systems make use of "Makefile.inc" a lot. Make the
"make" type recognize this file by default. And more generally,
`Makefile.*` seems to be a convention, so just generalize it.

Closes #2846
2025-09-19 21:08:19 -04:00
Matt Kulukundis
94ea38da30 ignore: support .jj as well as .git
This makes it so the presence of `.jj` will cause ripgrep to treat it
as a VCS directory, just as if `.git` were present. This is useful for
ripgrep's default behavior when working with jj repositories that don't
have a `.git` but do have `.gitignore`. Namely, ripgrep requires the
presence of a VCS repository in order to respect `.gitignore`.

We don't handle clone-specific exclude rules for jj repositories without
`.git` though. It seems it isn't 100% set yet where we can find
those[1].

Closes #2842

[1]: https://github.com/BurntSushi/ripgrep/pull/2842#discussion_r2020076722
2025-09-19 21:08:19 -04:00
Tor Shepherd
da672f87e8 color: add italic to style attributes
Closes #2841
2025-09-19 21:08:19 -04:00
Stephen Albert-Moore
483628469a ignore/gitignore: skip BOM at start of ignore file
This matches Git's behavior.

Fixes #2177, Closes #2782
2025-09-19 21:08:19 -04:00
ChristopherYoung
14f4957b3d ignore: fix filtering searching subdir or .ignore in parent dir
The previous code deleted too many parts of the path when constructing
the absolute path, resulting in a shortened final path. This patch
creates the correct absolute path by only removing the necessary parts.

Fixes #829, Fixes #2731, Fixes #2747, Fixes #2778, Fixes #2836, Fixes #2933, Fixes #3144
Closes #2933
2025-09-19 21:08:19 -04:00
Jan Verbeek
f722268814 complete/fish: Take RIPGREP_CONFIG_PATH into account
The fish completions now also pay attention to the configuration file
to determine whether to suggest negation options and not just to the
current command line.

This doesn't cover all edge cases. For example the config file is
cached, and so changes may not take effect until the next shell
session. But the cases it doesn't cover are hopefully very rare.

Closes #2708
2025-09-19 21:08:19 -04:00
Andrew Gallant
8bd5950296 changelog: add next section 2024-09-08 22:32:09 -04:00
Andrew Gallant
c009652e77 changelog: 14.1.1 2024-09-08 22:13:53 -04:00
Andrew Gallant
3f68a8f3d7 changelog: 14.1.1 2024-09-08 22:03:22 -04:00
Andrew Gallant
e9abbc1a02 cargo: nuke 'simd-accel' from orbit
This feature causes nothing but problems and is frequently broken. The
only optimization it was enabling were SIMD optimizations for
transcoding. In particular, for UTF-16 transcoding. This is performed by
the [`encoding_rs`](https://github.com/hsivonen/encoding_rs) crate,
which specifically uses unstable portable SIMD APIs instead of the
stable non-portable SIMD APIs.

SIMD optimizations that apply to search have long been making use of
stable APIs, and are automatically enabled when your target supports
them. This is, IMO, the correct user experience and one that
`encoding_rs` refuses to support. I'm done dealing with it, so
transcoding will only use scalar code until the SIMD optimizations in
`encoding_rs` work on stable. (This doesn't mean that `encoding_rs` has
to change. This could also be fixed by stabilizing `std::simd`.)

Fixes #2748
2024-03-07 09:47:43 -05:00
Alex Touchet
648a65f197 doc: add missing date in changelog
PR #2704
2024-01-06 17:49:18 -05:00
Andrew Gallant
bdf01f46a6 changelog: start next section 2024-01-06 14:41:45 -05:00
Andrew Gallant
1fa76d2a42 changelog: add 14.1.0 blurb 2024-01-06 14:31:16 -05:00
Andrew Gallant
f02a50a69d changelog: various updates 2024-01-06 13:59:52 -05:00
fe9lix
b9c774937f ignore: fix reference cycle for compiled matchers
It looks like there is a reference cycle caused by the compiled
matchers (compiled HashMap holds ref to Ignore and Ignore holds ref
to HashMap). Using weak refs fixes issue #2690 in my test project.
Also confirmed via before and after when profiling the code, see the
attached screenshots in #2692.

Fixes #2690
2024-01-06 12:50:42 -05:00
Jan Verbeek
e0a85678e1 complete/fish: improve shell completions for fish
- Stop using `-n __fish_use_subcommand`. This had the effect of
ignoring options if a positional argument has already been given, but
that's not how ripgrep works.

- Only suggest negation options if the option they're negating is
passed (e.g., only complete `--no-pcre2` if `--pcre2` is present). The
zsh completions already do this.

- Take into account whether an option takes an argument. If an option
is not a switch then it won't suggest further options until the
argument is given, e.g. `-C<tab>` won't suggest options but `-i<tab>`
will.

- Suggest correct arguments for options. We already completed a fixed
set of choices where available, but now we go further:

  - Filenames are only suggested for options that take filenames.

  - `--pre` and `--hostname-bin` suggest binaries from `$PATH`.

  - `-t`/`--type`/&c use `--type-list` for suggestions, like in zsh,
  with a preview of the glob patterns.

  - `--encoding` uses a hardcoded list extracted from the zsh
  completions. This has been refactored into a separate file, and the
  range globs (`{1..5}`) replaced by comma globs (`{1,2,3,4,5}`) since
  those work in both shells. I verified that this produces the same
  list as before in zsh, and the same list in fish (albeit in a
  different order).

PR #2684
2024-01-06 10:39:35 -05:00