This makes it so the caller can more easily refactor from
single-threaded to multi-threaded walking. If they want to support both,
this makes it easier to do so with a single initialization code-path. In
particular, it side-steps the need to put everything into an `Arc`.
This is not a breaking change because it strictly increases the number
of allowed inputs to `WalkParallel::run`.
Closes#1410, Closes#1432
Git looks for this file in GIT_COMMON_DIR, which is usually the same
as GIT_DIR (.git). However, when searching inside a linked worktree,
.git is usually a file that contains the path of the actual git dir,
which in turn contains a file "commondir" which references the directory
where info/exclude may reside, alongside other configuration shared across
all worktrees. This directory is usually the git dir of the main worktree.
Unlike git this does *not* read environment variables GIT_DIR and
GIT_COMMON_DIR, because it is not clear how to interpret them when
searching multiple repositories.
Fixes#1445, Closes#1446
This commit adds a simple `.exists()` check for `.gitignore`,
`.ignore`, and other similar files before actually calling
`File::open(…)` in `GitIgnoreBuilder::add`.
The reason is that a simple existence check via `stat` can be faster
than actually trying to `open` the file, see
https://stackoverflow.com/a/12774387/704831. As we typically expect(?)
the number of directories *without* ignore files to be much larger
than the number of directories *with* ignore files, this leads to an
overall speedup.
The performance gain is not huge for `rg`, but can be quite significant
if more `.gitignore`-like files are added via
`add_custom_ignore_filename`. The speedup is *larger* for folders with
*low* files-per-directory ratios.
Note though that we do not do this check on Windows until a specific
analysis there suggests this is beneficial. Namely, Windows generally
has slower file system operations, so it's not clear whether this
speculative check is actually a benefit or not.
Benchmark results
-----------------
`rg --files` in my home folder (200k results, 6.5 files per directory):
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 396.4 ± 3.2 | 390.9 | 400.0 | 1.05 |
| `./rg-feature --files` | 376.0 ± 3.6 | 369.3 | 383.5 | 1.00 |
`rg --files --hidden` in my home folder (800k results, 5.4
files per directory)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files --hidden` | 1.575 ± 0.012 | 1.560 | 1.597 | 1.06 |
| `./rg-feature --files --hidden` | 1.479 ± 0.011 | 1.464 | 1.496 | 1.00 |
`rg --files` in the chromium-79.0.3915.2 source tree (300k results, 12.7 files per
directory)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `~/rg-master --files` | 445.2 ± 5.3 | 435.6 | 453.0 | 1.04 |
| `~/rg-feature --files` | 428.9 ± 7.0 | 418.2 | 440.0 | 1.00 |
`rg --files` in the linux-5.3 source tree (65k results, 15.1
files per directory)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 94.5 ± 1.9 | 89.8 | 98.5 | 1.02 |
| `./rg-feature --files` | 92.6 ± 2.7 | 88.4 | 98.7 | 1.00 |
Closes#1381
.org_archive is the default extension for Org archive files, created when
entries from an Org-mode file are archived (see
<https://orgmode.org/org.html#Moving-subtrees>). These files are still in Org
mode format, so it's worth searching them at the same time as non-archive Org
mode files.
PR #1475
We were only using it to create temporary directories for `ignore`
tests, but it pulls in a bunch of dependencies and we don't really need
randomness. So just use our own simple wrapper instead.
Currently the crate assumes that exactly one of `cfg(windows)` or
`cfg(unix)` is true, but this is not actually the case, for instance
when compiling for `wasm32`.
Implement the missing functions so that the crate can compile on other
platforms, even though those functions will always return an error.
PR #1327
This includes:
*.dtd for Document Type Definitions
*.xsl and *.xslt for XSL Transformation descriptions
*.xsd for XML Schema definitions
*.xjb for JAXB bindings
*.rng for Relax NG files
*.sch for Schematron files
PR #1243
This commit fixes a bug where ripgrep only treated files beginning with
a `.` as hidden. On Windows, we continue this tradition, but
additionally check whether a file has the special Windows "hidden"
attribute set. If so, we treat it as a hidden file.
In order to make this work without an additional stat call, we had to
rearrange some of the plumbing from the directory traverser.
Fixes#1154
Previously, `man gitignore` specified that `**` was invalid unless it
was used in one of a few specific circumstances, i.e., `**`, `a/**`,
`**/b` or `a/**/b`. That is, `**` always had to be surrounded by either
a path separator or the beginning/end of the pattern.
It turns out that git itself has treated `**` outside the above contexts
as valid for quite a while, so there was an inconsistency between the
spec `man gitignore` and the implementation, and it wasn't clear which
was actually correct.
@okdana filed a bug against git[1] and got this fixed. The spec was wrong,
which has now been fixed [2] and updated[2].
This commit brings ripgrep in line with git and treats `**` outside of
the above contexts as two consecutive `*` patterns. We deprecate the
`InvalidRecursive` error since it is no longer used.
Fixes#373, Fixes#1098
[1] - https://public-inbox.org/git/C16A9F17-0375-42F9-90A9-A92C9F3D8BBA@dana.is
[2] - 627186d020
[3] - https://git-scm.com/docs/gitignore
This fixes a bug where repeated use of ** didn't behave as it should. In
particular, each use of `**` added a new requirement directory depth
requirement. For example, something like `**/**/b` would match
`foo/bar/b`, but it wouldn't match `foo/b` even though it should. In
particular, `**` semantics demand "infinite" depth, so repeated uses of
`**` should just coalesce as if only one was given.
We do this coalescing in the parser. It's a little tricky because we
treat `**/a`, `a/**` and `a/**/b` as distinct tokens with their own
regex conversions. We also test the crap out of it.
Fixes#1174
When deciding whether to add the `**/` prefix or not, we should choose
not to add it if the pattern is simply a bare `**`. Previously, we were
only not adding it if it was `**/`, which is correct, but we also need
to do it for `**` since `**` can already match anywhere.
There's likely a more principled solution to this, but this works for
now.
Fixes#1173
The --ignore-file-case-insensitive flag causes all
.gitignore/.rgignore/.ignore files to have their globs matched without
regard for case. Because this introduces a potentially significant
performance regression, this is always disabled by default. Users that
need case insensitive matching can enable it on a case by case basis.
Closes#1164, Closes#1170