ripgrep

Author	SHA1	Message	Date
Ian McKellar	85edf4c796	ignore: only stat `.jj` if we actually care I was comparing the work being done by fd and find and noticed (with `strace -f -c -S` calls) that fd was doing a ton of failed `statx` calls. Upon closer inspection it was stating `.jj` even though I was passing `--no-ignore`. Eventually I turned up this check in `Ignore::add_child_path` that was doing stat on `.jj` regardless of whether the options request it. With this patch it'll only stat `.jj` if that's relevant to the query. PR #3212	2025-10-30 13:29:58 -04:00
Andrew Gallant	36b7597693	changelog: start next section	2025-10-22 09:02:40 -04:00
Andrew Gallant	f5be160839	changelog: 15.1.0	2025-10-22 08:21:34 -04:00
Andrew Gallant	d47663b1b4	searcher: fix regression with `--line-buffered` flag In my fix for #3184, I actually had two fixes. One was a tweak to how we read data and the other was a tweak to how we determined how much of the buffer we needed to keep around. It turns out that fixing #3184 only required the latter fix, found in commit `d4b77a8d89`. The former fix also helped the specific case of #3184, but it ended up regressing `--line-buffered`. Specifically, previous to `8c6595c215` (the first fix), we would do one `read` syscall. This call might not fill our caller provided buffer. And in particular, `stdin` seemed to fill fewer bytes than reading from a file. So the "fix" was to put `read` in a loop and keep calling it until the caller provided buffer was full or until the stream was exhausted. This helped alleviate #3184 by amortizing `read` syscalls better. But of course, in retrospect, this change is clearly contrary to how `--line-buffered` works. We specifically do _not_ want to wait around until the buffer is full. We want to read what we can, search it and move on. So this reverts the first fix but leaves the second, which still keeps #3184 fixed and also fixes #3194 (the regression). This reverts commit `8c6595c215`. Fixes #3194	2025-10-19 11:06:39 -04:00
Andrew Gallant	f09b55b8e7	changelog: start next section	2025-10-15 23:32:00 -04:00
Andrew Gallant	3780168c13	changelog: 15.0.0	2025-10-15 22:53:30 -04:00
Andrew Gallant	79d393a302	release: remove riscv64 and powerpc64 artifacts Their CI workflows broke for different reasons. I perceive these as niche platforms that aren't worth blocking a release on. And not worth my time investigating CI problems.	2025-10-15 22:42:51 -04:00
Andrew Gallant	63209ae0b9	printer: fix `--stats` for `--json` Somehow, the JSON printer seems to have never emitted correct summary statistics. And I believe #3178 is the first time anyone has ever reported it. I believe this bug has persisted for years. That's surprising. Anyway, the problem here was that we were bailing out of `finish()` on the sink if we weren't supposed to print anything. But we bailed out before we tallied our summary statistics. Obviously we shouldn't do that. Fixes #3178	2025-10-15 21:21:20 -04:00
Andrew Gallant	b610d1cb15	ignore: fix global gitignore bug that arises with absolute paths The `ignore` crate currently handles two different kinds of "global" gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile` and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to ripgrep's `--ignore-file` flag). In contrast to any other kind of gitignore file, these gitignore files should have their patterns interpreted relative to the current working directory. (Arguably there are other choices we could make here, e.g., based on the paths given. But the `ignore` infrastructure can't handle that, and it's not clearly correct to me.) Normally, a gitignore file has its patterns interpreted relative to where the gitignore file is. This relative interpretation matters for patterns like `/foo`, which are anchored to _some_ directory. Previously, we would generally get the global gitignores correct because it's most common to use ripgrep without providing a path. Thus, it searches the current working directory. In this case, no stripping of the paths is needed in order for the gitignore patterns to be applied directly. But if one provides an absolute path (or something else) to ripgrep to search, the paths aren't stripped correctly. Indeed, in the core, I had just given up and not provided a "root" path to these global gitignores. So it had no hope of getting this correct. We fix this by assigning the CWD to the `Gitignore` values created from global gitignore files. This was a painful thing to do because we'd ideally: 1. Call `std::env::current_dir()` at most once for each traversal. 2. Provide a way to avoid the library calling `std::env::current_dir()` at all. (Since this is global process state and folks might want to set it to different values for $reasons.) The `ignore` crate's internals are a total mess. But I think I've addressed the above 2 points in a semver compatible manner. Fixes #3179	2025-10-15 19:44:23 -04:00
Andrew Gallant	8c6595c215	searcher: fix performance bug with `-A/--after-context` when searching `stdin` This was a crazy subtle bug where ripgrep could slow down exponentially as increasingly larger values of `-A/--after-context` were used. But, interestingly, this would only occur when searching `stdin` and _not_ when searching the same data as a regular file. This confounded me because ripgrep, pretty early on, erases the difference between searching a single file and `stdin`. So it wasn't like there were different code paths. And I mistakenly assumed that they would otherwise behave the same as they are just treated as streams. But... it turns out that running `read` on a `stdin` versus a regular file seems to behave differently. At least on my Linux system, with `stdin`, `read` never seems to fill the buffer with more than 64K. But with a regular file, `read` pretty reliably fills the caller's buffer with as much space as declared. Of course, it is expected that `read` doesn't have to fill up the caller's buffer, and ripgrep is generally fine with that. But when `-A/--after-context` is used with a very large value---big enough that the default buffer capacity is too small---then more heap memory needs to be allocated to correctly handle all cases. This can result in passing buffers bigger than 64K to `read`. While we correctly handle `read` calls that don't fill the buffer, it turns out that if we don't fill the buffer, then we get into a pathological case where we aren't processing as many bytes as we could. That is, because of the `-A/--after-context` causing us to keep a lot of bytes around while we roll the buffer and because reading from `stdin` gives us fewer bytes than normal, we weren't amortizing our `read` calls as well as we should have been. Indeed, our buffer capacity increases specifically take this amortization into account, but we weren't taking advantage of it. 
We fix this by putting `read` into an inner loop that ensures our buffer gets filled up. This fixes the performance bug:
```
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A9999) | wc -l
real 1.330 user 0.767 sys 0.559 maxmem 29 MB faults 0
10000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A9999) | wc -l
real 2.355 user 0.860 sys 0.613 maxmem 29 MB faults 0
10000
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A99999) | wc -l
real 3.636 user 3.091 sys 0.537 maxmem 29 MB faults 0
100000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A99999) | wc -l
real 4.918 user 3.236 sys 0.710 maxmem 29 MB faults 0
100000
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A999999) | wc -l
real 5.430 user 4.666 sys 0.750 maxmem 51 MB faults 0
1000000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A999999) | wc -l
real 6.894 user 4.907 sys 0.850 maxmem 51 MB faults 0
1000000
```
For comparison, here is GNU grep:
```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l
real 1.466 user 0.159 sys 0.839 maxmem 29 MB faults 0
10000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l
real 1.663 user 0.166 sys 0.941 maxmem 29 MB faults 0
100000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l
real 1.631 user 0.204 sys 0.910 maxmem 29 MB faults 0
1000000
```
GNU grep is still notably faster. We'll fix that in the next commit. Fixes #3184	2025-10-14 14:27:43 -04:00
Andrew Gallant	de2567a4c7	printer: fix panic in replacements in look-around corner case The abstraction boundary fuck up is the gift that keeps on giving. It turns out that the invariant that the match would never exceed the range given is not always true. So we kludge around it. Also, update the CHANGELOG to include the fix for #2111. Fixes #3180	2025-10-12 17:25:19 -04:00
Andrew Gallant	5c42c8c48f	test: add regression test for fixed bug It turns out that #2094 was fixed in my `--max-count` refactor a few commits back. This commit adds a regression test for it. Closes #2094	2025-10-12 12:45:34 -04:00
Andrew Gallant	293ef80eaf	test: add another regression test for gitignore matching bug I believe this was also fixed by #2933. Closes #2770	2025-10-10 22:06:59 -04:00
Andrew Gallant	fa80aab6b0	test: add regression test for fixed gitignore bug I believe this was actually fixed by #2933. Closes #3067	2025-10-10 22:06:59 -04:00
mariano-m13	7c2161d687	release: add binaries for `riscv64gc-unknown-linux-gnu` target Note that we skip lz4/brotli/zstd tests on RISC-V. The CI runs RISC-V tests using cross/QEMU emulation. The decompression tools (lz4, brotli, zstd) are x86_64 binaries on the host that cannot execute in the RISC-V QEMU environment. Skip these three tests at compile-time on RISC-V to avoid test failures. The -z/--search-zip functionality itself works correctly on real RISC-V hardware where native decompression tools are available. PR #3165	2025-10-10 20:50:28 -04:00
Andrew Gallant	096f79ab98	deps: update everything This includes an update to `regex 1.12.1`, which fixes a couple of outstanding bugs in ripgrep. Fixes #2750, Fixes #3135	2025-10-10 20:13:29 -04:00
Andrew Gallant	0407e104f6	ignore: fix problem with searching whitelisted hidden files ... specifically, when the whitelist comes from a _parent_ gitignore file. Our handling of parent gitignores is pretty ham-fisted and has been a source of some unfortunate bugs. The problem is that we need to strip the parent path from the path we're searching in order to correctly apply the globs. But getting this stripping correct seems to be a subtle affair. Fixes #3173	2025-10-08 21:16:59 -04:00
Andrew Gallant	1b07c6616a	cli: document that `-c/--count` can be inconsistent with `-l/--files-with-matches` This is unfortunate, but is a known bug that I don't think can be fixed without either making `-l/--files-with-matches` much slower or changing what "binary filtering" means by default. In this PR, we document this inconsistency since users may find it quite surprising. The actual work-around is to disable binary filtering with the `--binary` flag. We add a test confirming this behavior. Closes #3131	2025-09-22 20:24:53 -04:00
Andrew Gallant	c1fc6a5eb8	release: build aarch64 artifacts for macos on GitHub Actions GitHub now supports this natively, so there's no need for me to do it any more. Fixes #3155	2025-09-22 11:56:33 -04:00
Isaac	64174b8e68	printer: preserve line terminator when using `--crlf` and `--replace` Ref #3097, Closes #3100	2025-09-19 21:08:19 -04:00
Pavel Safronov	a6e0be3c90	searcher: move "max matches" from printer to searcher This is a bit of a brutal change, but I believe is necessary in order to fix a bug in how we handle the "max matches" limit in multi-line mode while simultaneously handling context lines correctly. The main problem here is that "max matches" refers to the shorter of "one match per line" or "a single match." In typical grep, matches can't span multiple lines, so there's never a difference. But in multi-line mode, they can. So match counts necessarily must be handled differently for multi-line mode. The printer was previously responsible for this. But for $reasons, the printer is fundamentally not in charge of how matches are found and reported. See my comments in #3094 for even more context. This is a breaking change for `grep-printer`. Fixes #3076, Closes #3094	2025-09-19 21:08:19 -04:00
Andrew Gallant	74959a14cb	man: escape all hyphens in flag names Apparently, if we don't do this, some roff renderers will use a special Unicode hyphen. That in turn makes searching a man page not work as one would expect. Fixes #3140	2025-09-19 21:08:19 -04:00
dana	78383de9b2	complete/zsh: improve --hyperlink-format completion Also don't re-define helper functions if they exist. Closes #3102	2025-09-19 21:08:19 -04:00
Ilya Grigoriev	519c1bd5cf	complete: improvements for the `--hyperlink-format` flag The goal is to make the completion for `rg --hyperlink-format v<TAB>` work in the fish shell. These are not exhaustive (the user can also specify custom formats). This is somewhat unfortunate, but is probably better than not doing anything at all. The `grep+` value necessitated a change to a test. Closes #3096	2025-09-19 21:08:19 -04:00
emrebengue	99fe884536	colors: add `highlight` type support for matching lines This lets users highlight non-matching text in matching lines. Closes #3024, Closes #3107	2025-09-19 21:08:19 -04:00
Andrew Gallant	126bbeab8c	printer: fix handling of `has_match` for summary printer Previously, `Quiet` mode in the summary printer always acted like "print matching paths," except without the printing. This happened even if we wanted to "print non-matching paths." Since this only afflicted quiet mode, this had the effect of flipping the exit status when `--files-without-match --quiet` was used. Fixes #3108, Ref #3118	2025-09-19 21:08:19 -04:00
Andrew Gallant	4df1298127	globset: fix bug where trailing `.` in file name was incorrectly handled I'm not sure why I did this, but I think I was trying to imitate the contract of [`std::path::Path::file_name`]: > Returns None if the path terminates in `..`. But the status quo clearly did not implement this. And as a result, if you have a glob that ends in a `.`, it was instead treated as the empty string (which only matches the empty string). We fix this by implementing the semantic from the standard library correctly. Fixes #2990 [`std::path::Path::file_name`]: https://doc.rust-lang.org/std/path/struct.Path.html#method.file_name	2025-09-19 21:08:19 -04:00
Andrew Gallant	4ab1862dc0	stats: fix case where "bytes searched" could be wrong Specifically, if the search was instructed to quit early, we might not have correctly marked the number of bytes consumed. I don't think this bug occurs when memory maps are used to read the haystack. Closes #2944	2025-09-19 21:08:19 -04:00
Luke Sandberg	5f5da48307	globset: support nested alternates For example, `/{node_modules///{ts,js},crates//.{rs,toml}`. I originally didn't add this I think for implementation simplicity, but it turns out that it really isn't much work to do. There might have also been some odd behavior in the regex engine for dealing with empty alternates, but that has all been long fixed. Closes #3048, Closes #3112	2025-09-19 21:08:19 -04:00
Andrew Gallant	d199058e77	cli: make `rg -vf file` behave sensibly Previously, when `file` is empty (literally empty, as in, zero byte), `rg -f file` and `rg -vf file` would behave identically. This is odd and also doesn't match how GNU grep behaves. It's also not logically correct. An empty file means _zero_ patterns which is an empty set. An empty set matches nothing. Inverting the empty set should result in matching everything. This was because of an errant optimization that lets ripgrep quit early if it can statically detect that no matches are possible. Moreover, there was also a bug in how we constructed the PCRE2 pattern when there are zero patterns. PCRE2 doesn't have a concept of sets of patterns (unlike the `regex` crate), so we need to fake it with an empty character class. Fixes #1332, Fixes #3001, Closes #3041	2025-09-19 21:08:19 -04:00
Josh Cotton	bb0cbae312	ci: add aarch64 Windows This also adds a new release artifact for aarch64 Windows. Closes #2943, Closes #3038	2025-09-19 21:08:19 -04:00
Stephan Badragan	292bc54e64	printer: support `-r/--replace` with `--json` This adds a `replacement` field to each submatch object in the JSON output. In effect, this extends the `-r/--replace` flag so that it works with `--json`. This adds a new field instead of replacing the match text (which is how the standard printer works) for maximum flexibility. This way, consumers of the JSON output can access the original match text (and always rely on it corresponding to the original match text) while also getting the replacement text without needing to do the replacement themselves. Closes #1872, Closes #2883	2025-09-19 21:08:19 -04:00
Lucas Trzesniewski	119407d0a9	printer: use std::path::absolute on Windows This specifically avoids touching the file system, which can lead to fairly dramatic speed-ups in large repositories with lots of matches. Closes #2865	2025-09-19 21:08:19 -04:00
Thomas Otto	75970fd16b	ignore: don't process command line arguments in reverse order When searching in parallel with many more arguments than threads, the first arguments are searched last -- unlike in the -j1 case. This is unexpected for users who know about the parallel nature of rg and think they can give the scheduler a hint by positioning larger input files (L1, L2, ..) before smaller ones (█, ██). Instead, this can result in sub-optimal thread usage and thus longer runtime (simplified example with 2 threads): T1: █ ██ █ █ █ █ ██ █ █ █ █ █ ██ ╠═════════════L1════════════╣ T2: █ █ ██ █ █ ██ █ █ █ ██ █ █ ╠═════L2════╣ This is caused by assigning work to per-thread stacks in a round-robin manner, starting with the first argument (L1 to T1, L2 to T2, L3 to T3, L4 to T4, then s5 to T1, and so on), and then processing them bottom-up. [Diagram: per-thread stacks for T1 to T4 before and after the patch.] This patch reverses the input order so the two reversals cancel each other out. Now at least the first N arguments, N=number-of-threads, are processed before any others (then work-stealing may happen): T1: ╠═════════════L1════════════╣ █ ██ █ █ █ █ █ █ ██ T2: ╠═════L2════╣ █ █ ██ █ █ ██ █ █ █ ██ █ █ ██ █ █ █ (With some more shuffling T1 could always be assigned L1 etc., but that would mostly be for optics). Closes #2849	2025-09-19 21:08:19 -04:00
Christoph Badura	380809f1e2	ignore/types: add Makefile.* The BSD build systems make use of "Makefile.inc" a lot. Make the "make" type recognize this file by default. And more generally, `Makefile.*` seems to be a convention, so just generalize it. Closes #2846	2025-09-19 21:08:19 -04:00
Matt Kulukundis	94ea38da30	ignore: support `.jj` as well as `.git` This makes it so the presence of `.jj` will cause ripgrep to treat it as a VCS directory, just as if `.git` were present. This is useful for ripgrep's default behavior when working with jj repositories that don't have a `.git` but do have `.gitignore`. Namely, ripgrep requires the presence of a VCS repository in order to respect `.gitignore`. We don't handle clone-specific exclude rules for jj repositories without `.git` though. It seems it isn't 100% set yet where we can find those[1]. Closes #2842 [1]: https://github.com/BurntSushi/ripgrep/pull/2842#discussion_r2020076722	2025-09-19 21:08:19 -04:00
Tor Shepherd	da672f87e8	color: add italic to style attributes Closes #2841	2025-09-19 21:08:19 -04:00
Stephen Albert-Moore	483628469a	ignore/gitignore: skip BOM at start of ignore file This matches Git's behavior. Fixes #2177, Closes #2782	2025-09-19 21:08:19 -04:00
ChristopherYoung	14f4957b3d	ignore: fix filtering searching subdir or .ignore in parent dir The previous code deleted too many parts of the path when constructing the absolute path, resulting in a shortened final path. This patch creates the correct absolute path by only removing the necessary parts. Fixes #829, Fixes #2731, Fixes #2747, Fixes #2778, Fixes #2836, Fixes #2933, Fixes #3144 Closes #2933	2025-09-19 21:08:19 -04:00
Jan Verbeek	f722268814	complete/fish: Take RIPGREP_CONFIG_PATH into account The fish completions now also pay attention to the configuration file to determine whether to suggest negation options and not just to the current command line. This doesn't cover all edge cases. For example the config file is cached, and so changes may not take effect until the next shell session. But the cases it doesn't cover are hopefully very rare. Closes #2708	2025-09-19 21:08:19 -04:00
Andrew Gallant	8bd5950296	changelog: add next section	2024-09-08 22:32:09 -04:00
Andrew Gallant	c009652e77	changelog: 14.1.1	2024-09-08 22:13:53 -04:00
Andrew Gallant	3f68a8f3d7	changelog: 14.1.1	2024-09-08 22:03:22 -04:00
Andrew Gallant	e9abbc1a02	cargo: nuke 'simd-accel' from orbit This feature causes nothing but problems and is frequently broken. The only optimization it was enabling were SIMD optimizations for transcoding. In particular, for UTF-16 transcoding. This is performed by the [`encoding_rs`](https://github.com/hsivonen/encoding_rs) crate, which specifically uses unstable portable SIMD APIs instead of the stable non-portable SIMD APIs. SIMD optimizations that apply to search have long been making use of stable APIs, and are automatically enabled when your target supports them. This is, IMO, the correct user experience and one that `encoding_rs` refuses to support. I'm done dealing with it, so transcoding will only use scalar code until the SIMD optimizations in `encoding_rs` work on stable. (This doesn't mean that `encoding_rs` has to change. This could also be fixed by stabilizing `std::simd`.) Fixes #2748	2024-03-07 09:47:43 -05:00
Alex Touchet	648a65f197	doc: add missing date in changelog PR #2704	2024-01-06 17:49:18 -05:00
Andrew Gallant	bdf01f46a6	changelog: start next section	2024-01-06 14:41:45 -05:00
Andrew Gallant	1fa76d2a42	changelog: add 14.1.0 blurb	2024-01-06 14:31:16 -05:00
Andrew Gallant	f02a50a69d	changelog: various updates	2024-01-06 13:59:52 -05:00
fe9lix	b9c774937f	ignore: fix reference cycle for compiled matchers It looks like there is a reference cycle caused by the compiled matchers (compiled HashMap holds ref to Ignore and Ignore holds ref to HashMap). Using weak refs fixes issue #2690 in my test project. Also confirmed via before and after when profiling the code, see the attached screenshots in #2692. Fixes #2690	2024-01-06 12:50:42 -05:00
Jan Verbeek	e0a85678e1	complete/fish: improve shell completions for fish - Stop using `-n __fish_use_subcommand`. This had the effect of ignoring options if a positional argument has already been given, but that's not how ripgrep works. - Only suggest negation options if the option they're negating is passed (e.g., only complete `--no-pcre2` if `--pcre2` is present). The zsh completions already do this. - Take into account whether an option takes an argument. If an option is not a switch then it won't suggest further options until the argument is given, e.g. `-C<tab>` won't suggest options but `-i<tab>` will. - Suggest correct arguments for options. We already completed a fixed set of choices where available, but now we go further: - Filenames are only suggested for options that take filenames. - `--pre` and `--hostname-bin` suggest binaries from `$PATH`. - `-t`/`--type`/&c use `--type-list` for suggestions, like in zsh, with a preview of the glob patterns. - `--encoding` uses a hardcoded list extracted from the zsh completions. This has been refactored into a separate file, and the range globs (`{1..5}`) replaced by comma globs (`{1,2,3,4,5}`) since those work in both shells. I verified that this produces the same list as before in zsh, and the same list in fish (albeit in a different order). PR #2684	2024-01-06 10:39:35 -05:00
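
The reference cycle described in commit `b9c774937f` (a map of compiled matchers holding a reference to its owner, and the owner holding the map) can be illustrated with a minimal sketch. The `Cache` and `Matcher` types and the `build` function below are hypothetical stand-ins, not the `ignore` crate's real types, and the sketch uses `Rc` for brevity where a multi-threaded crate would more plausibly use `Arc`. It only shows why a weak back-reference lets the owner be freed.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::{Rc, Weak};

// Hypothetical stand-in for a compiled matcher (not the real `ignore` type).
struct Matcher {
    // A strong `Rc<Cache>` here would form a cycle: cache -> matcher -> cache,
    // and neither value would ever be dropped. A `Weak` breaks the cycle.
    cache: Weak<Cache>,
}

// Hypothetical stand-in for the map of compiled matchers.
struct Cache {
    compiled: RefCell<HashMap<String, Rc<Matcher>>>,
}

fn build() -> (Rc<Cache>, Rc<Matcher>) {
    let cache = Rc::new(Cache { compiled: RefCell::new(HashMap::new()) });
    let matcher = Rc::new(Matcher { cache: Rc::downgrade(&cache) });
    cache
        .compiled
        .borrow_mut()
        .insert("*.rs".to_string(), Rc::clone(&matcher));
    (cache, matcher)
}

fn main() {
    let (cache, matcher) = build();
    // The weak back-reference does not count as an owner of the cache.
    assert_eq!(Rc::strong_count(&cache), 1);
    assert!(matcher.cache.upgrade().is_some());

    // Dropping the only strong handle frees the cache, so the weak
    // reference can no longer be upgraded.
    drop(cache);
    assert!(matcher.cache.upgrade().is_none());
    println!("cache freed");
}
```

With a strong back-reference instead of `Weak`, the strong count would never reach zero and the memory would stay allocated for the life of the process.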


289 Commits
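
Commits `8c6595c215` and `d47663b1b4` above turn on the difference between issuing one `read` call and looping until the caller's buffer is full. The sketch below is only an illustration of that difference, not ripgrep's searcher code: `read_once`, `read_until_full`, and the `Chunked` reader (which imitates a pipe that returns a few bytes per call) are all hypothetical names.

```rust
use std::io::{self, Read};

/// One `read` call: returns whatever the source hands over right now.
/// This is the behavior a line-buffered search wants.
fn read_once<R: Read>(rdr: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    rdr.read(buf)
}

/// Keep calling `read` until the buffer is full or the stream is exhausted.
/// This amortizes syscalls better, but blocks until enough data arrives.
fn read_until_full<R: Read>(rdr: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match rdr.read(&mut buf[filled..]) {
            Ok(0) => break, // end of stream
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Hypothetical reader that yields at most `chunk` bytes per call,
/// imitating a pipe that never fills the caller's buffer.
struct Chunked<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl<'a> Read for Chunked<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let data = [b'x'; 10];
    let mut buf = [0u8; 8];

    // A single call returns only one chunk.
    let mut a = Chunked { data: &data, chunk: 3 };
    assert_eq!(read_once(&mut a, &mut buf)?, 3);

    // The loop keeps going until the 8-byte buffer is full.
    let mut b = Chunked { data: &data, chunk: 3 };
    assert_eq!(read_until_full(&mut b, &mut buf)?, 8);
    Ok(())
}
```

The looping variant is what the first fix introduced and what the later commit reverted: on an interactive stream it would wait for the buffer to fill instead of searching the bytes already available, which is contrary to `--line-buffered`.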