Namely, passing a directory to --ignore-file caused ripgrep to allocate
memory without bound.
The issue was that I got a bit overzealous with partial error
reporting. Namely, when processing a gitignore file, we should try
to use every pattern even if some patterns are invalid globs (e.g.,
a**b). In the process, I applied the same logic to I/O errors. In this
case, it manifest by attempting to read lines from a directory, which
appears to yield Results forever, where each Result is an error of the
form "you can't read from a directory silly." Since I treated it as a
partial error, ripgrep was just spinning and accruing each error in
memory, which caused the OOM killer to kick in.
Fixes#228
This PR introduces a new sub-crate, `ignore`, which primarily provides a
fast recursive directory iterator that respects ignore files like
gitignore and other configurable filtering rules based on globs or even
file types.
This results in a substantial source of complexity moved out of ripgrep's
core and into a reusable component that others can now (hopefully)
benefit from.
While much of the ignore code carried over from ripgrep's core, a
substantial portion of it was rewritten with the following goals in
mind:
1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable for parallelizing
directory iteration. (Indeed, writing the parallel iterator is the
next step.)
Fixes#9, #44, #45