This optimization wasn't tested too carefully, and it seems to result
in a massive amount of file handles open simultaneously. This is likely
a result of the parallel iterator, where many directories are being
traversed simultaneously.
Fixes#648
This commit fixes the symlink loop checker in the parallel directory
traverser to open fewer handles at the expense of keeping handles held
open longer.
This roughly matches the corresponding change in walkdir:
5bcc5b87eeFixes#633
This adds a new walk type in the `ignore` crate, `WalkParallel`, which
provides a way for recursively iterating over a set of paths in parallel
while respecting various ignore rules.
The API is a bit strange, as a closure producing a closure isn't
something one often sees, but it does seem to work well.
This also allowed us to simplify much of the worker logic in ripgrep
proper, where MultiWorker is now gone.
The name has_ignores is not descriptive in my opinion. I think
has_any_ignore_options more clearly states this method's purpose. I also
considered simply IgnoreOptions::any though I went with the more verbose
option.
This PR introduces a new sub-crate, `ignore`, which primarily provides a
fast recursive directory iterator that respects ignore files like
gitignore and other configurable filtering rules based on globs or even
file types.
This results in a substantial source of complexity moved out of ripgrep's
core and into a reusable component that others can now (hopefully)
benefit from.
While much of the ignore code carried over from ripgrep's core, a
substantial portion of it was rewritten with the following goals in
mind:
1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable for parallelizing
directory iteration. (Indeed, writing the parallel iterator is the
next step.)
Fixes#9, #44, #45