ripgrep

Author	SHA1	Message	Date
Andrew Gallant	e36b65a11a	windows: fix OneDrive traversals This commit fixes a bug on Windows where directory traversals were completely broken when attempting to scan OneDrive directories that use the "file on demand" strategy. The specific problem was that Rust's standard library treats OneDrive directories as reparse points instead of directories, which causes methods like `FileType::is_file` and `FileType::is_dir` to always return false, even when retrieved via methods like `metadata` that purport to follow symbolic links. We fix this by peppering our code with checks on the underlying file attributes exposed by Windows. We consider an entry a directory if and only if the directory bit is set on the attributes. We are careful to make sure that the code remains the same on non-Windows platforms. Note that we also bump the dependency on `walkdir`, which contains a similar fix for its traversals. This bug is recorded upstream: https://github.com/rust-lang/rust/issues/46484 Upstream also has a pending PR: https://github.com/rust-lang/rust/pull/47956 Fixes #705	2018-02-01 21:11:02 -05:00
ptzz	3cb4d1337e	ignore: support custom file names This commit adds support for ignore files with custom names. This allows for application specific ignorefile names, e.g. using `.fdignore` for `fd`. See also: https://github.com/BurntSushi/ripgrep/issues/673 See also: https://github.com/sharkdp/fd/issues/156	2018-01-29 16:06:05 -05:00
Balaji Sivaraman	b6177f0459	cleanup: replace try! with ?	2018-01-01 09:22:35 -05:00
Andrew Gallant	5714dbde09	ignore: partially revert symlink loop check optimization This optimization wasn't tested too carefully, and it seems to result in a massive amount of file handles open simultaneously. This is likely a result of the parallel iterator, where many directories are being traversed simultaneously. Fixes #648	2017-10-22 10:31:34 -04:00
Andrew Gallant	1bf9d29259	ignore: be fastidious with file handles This commit fixes the symlink loop checker in the parallel directory traverser to open fewer handles at the expense of keeping handles held open longer. This roughly matches the corresponding change in walkdir: `5bcc5b87ee` Fixes #633	2017-10-21 22:40:10 -04:00
Andrew Gallant	cd575d99f8	ignore: upgrade to walkdir 2 The uninteresting bits of this commit involve mechanical changes for updates to walkdir 2. The more interesting bits of this commit are the breaking changes, although none of them should require any significant change on users of this library. The breaking changes are as follows: * `DirEntry::path_is_symbolic_link` has been renamed to `DirEntry::path_is_symlink`. This matches the conventions in the standard library, and also the corresponding name change in walkdir. * Removed the `From<walkdir::Error> for ignore::Error` impl. This was intended to only be used internally, but was the only thing that made `walkdir` a public dependency of `ignore`. Therefore, we remove it since it seems unnecessary. * Renamed `WalkBuilder::sort_by` to `WalkBuilder::sort_by_file_name`, and changed the type of the comparator from `Fn(&OsString, &OsString) -> cmp::Ordering + 'static` to `Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static`. The corresponding change in `walkdir` retains the `sort_by` name, but gives the comparator a pair of `&DirEntry` values instead of a pair of `&OsStr` values. Ideally, `ignore` would hand off its own pair of `&ignore::DirEntry` values, but this requires more design work. So for now, we retain previous functionality, but leave room to make a proper `sort_by` method. [breaking-change]	2017-10-21 22:40:09 -04:00
Alex Burka	a5f82e8826	ignore: add grouped toggle for standard filters	2017-09-02 12:28:59 -04:00
Alex Burka	82d101907a	ignore: document git_global enabled by default	2017-08-26 14:49:40 -04:00
Jordan Danford	c8a5a7a3f4	Fix minor grammar issues in docs for `ignore::Walk`	2017-07-06 06:58:14 -04:00
Marc Tiehuis	71585f6d47	Reduce unnecessary stat calls for max_filesize	2017-03-08 10:17:18 -05:00
tiehuis	49fd668712	Add file size exclusion to walker A maximum filesize can be specified as an argument to a `WalkBuilder`. If a file exceeds the specified size it will be ignored as part of the resulting file/directory set. The filesize limit never applies to directories.	2017-03-08 10:17:18 -05:00
Andrew Gallant	461e0c4e33	Don't search stdout redirected file. When running ripgrep like this: rg foo > output we must be careful not to search `output` since ripgrep is actively writing to it. Searching it can cause massive blowups where the file grows without bound. While this is conceptually easy to fix (check the inode of the redirection and the inode of the file you're about to search), there are a few problems with it. First, inodes are a Unix thing, so we need a Windows specific solution to this as well. To resolve this concern, I created a new crate, `same-file`, which provides a cross platform abstraction. Second, stat'ing every file is costly. This is not avoidable on Windows, but on Unix, we can get the inode number directly from directory traversal. However, this information wasn't exposed, but now it is (through both the ignore and walkdir crates). Fixes #286	2017-01-09 16:12:08 -05:00
Andrew Gallant	b65a8c353b	Add --sort-files flag. When used, parallelism is disabled but the results are sorted by file path. Closes #263	2017-01-06 22:43:59 -05:00
Andrew Gallant	95cea77625	Tweak the parallel directory iterator. This commit fixes two issues. First, the iterator was executing the callback for every child of a directory in a single thread. Therefore, if the walker was run over a single directory, then no parallelism was used. We tweak the iterator slightly so that we don't fall into this trap. The second issue is a bit more subtle. In particular, we don't use the blocking semantics of MsQueue because we don't know when iteration finishes. This means that if there are a bunch of idle workers because there is no work available to them, then they will spin and burn the CPU. One case where this crops up is if you pipe the output of ripgrep into `less` and the total number of files to search is fewer than the number of threads ripgrep uses. We "fix" this with a very stupid heuristic: when the queue yields no work, we sleep the thread for 1ms. This still pegs the CPU, but not nearly as much as before. If one really wants to avoid this behavior when using ripgrep, then `-j1` can be used to disable parallelism. Fixes #258	2017-01-06 21:43:49 -05:00
Andrew Gallant	bb70f96743	Fix a non-termination bug. This was a very silly bug. Instead of creating a particular atomic once and cloning it, we created a new value for each worker. Fixes #279	2016-12-12 06:55:49 -05:00
Andrew Gallant	7282706b42	Fix bug reading root symlink. When given an explicit file path on the command line like `foo` where `foo` is a symlink, ripgrep should follow it even if `-L` isn't set. This is consistent with the behavior of `foo/`. Fixes #256	2016-12-05 20:05:57 -05:00
Andrew Gallant	5b73dcc8ab	Rework parallelism in directory iterator. Previously, ignore::WalkParallel would invoke the callback for all explicitly given file paths in a single thread, which effectively meant that `rg pattern foo bar baz ...` didn't actually search foo, bar and baz in parallel. The code was structured that way to avoid spinning up workers if no directory paths were given. The original intention was probably to have a separate pool of threads responsible for searching, but ripgrep ended up just reusing the ignore::WalkParallel workers themselves for searching, and was thereby subjected to its sub-par performance in this case. The code has been restructured so that file paths are sent to the workers, which brings back parallelism. Fixes #226	2016-11-09 17:19:40 -05:00
Andrew Gallant	2dce0dc0df	Fix a bug with handling --ignore-file. Namely, passing a directory to --ignore-file caused ripgrep to allocate memory without bound. The issue was that I got a bit overzealous with partial error reporting. Namely, when processing a gitignore file, we should try to use every pattern even if some patterns are invalid globs (e.g., a**b). In the process, I applied the same logic to I/O errors. In this case, it manifested by attempting to read lines from a directory, which appears to yield Results forever, where each Result is an error of the form "you can't read from a directory silly." Since I treated it as a partial error, ripgrep was just spinning and accruing each error in memory, which caused the OOM killer to kick in. Fixes #228	2016-11-09 16:45:23 -05:00
Andrew Gallant	b272be25fa	Add parallel recursive directory iterator. This adds a new walk type in the `ignore` crate, `WalkParallel`, which provides a way for recursively iterating over a set of paths in parallel while respecting various ignore rules. The API is a bit strange, as a closure producing a closure isn't something one often sees, but it does seem to work well. This also allowed us to simplify much of the worker logic in ripgrep proper, where MultiWorker is now gone.	2016-11-05 21:45:55 -04:00
Andrew Gallant	d79add341b	Move all gitignore matching to separate crate. This PR introduces a new sub-crate, `ignore`, which primarily provides a fast recursive directory iterator that respects ignore files like gitignore and other configurable filtering rules based on globs or even file types. This results in a substantial source of complexity moved out of ripgrep's core and into a reusable component that others can now (hopefully) benefit from. While much of the ignore code carried over from ripgrep's core, a substantial portion of it was rewritten with the following goals in mind: 1. Reuse matchers built from gitignore files across directory iteration. 2. Design the matcher data structure to be amenable for parallelizing directory iteration. (Indeed, writing the parallel iterator is the next step.) Fixes #9, #44, #45	2016-10-29 20:48:59 -04:00

20 Commits