Commit Graph

22 Commits

Author SHA1 Message Date
Thomas Otto
75970fd16b ignore: don't process command line arguments in reverse order
When searching in parallel with many more arguments than threads, the
first arguments are searched last -- unlike in the -j1 case.

This is unexpected for users who know about the parallel nature of rg
and think they can give the scheduler a hint by positioning larger
input files (L1, L2, ..) before smaller ones (█, ██). Instead, this can
result in sub-optimal thread usage and thus longer runtime (simplified
example with 2 threads):

 T1:  █ ██ █ █ █ █ ██ █ █ █ █ █ ██ ╠═════════════L1════════════╣
 T2:  █ █ ██ █ █ ██ █ █ █ ██ █ █ ╠═════L2════╣

                                       ┏━━━━┳━━━━┳━━━━┳━━━━┓
This is caused by assigning work to    ┃ T1 ┃ T2 ┃ T3 ┃ T4 ┃
 per-thread stacks in a round-robin    ┡━━━━╇━━━━╇━━━━╇━━━━┩
              manner, starting here  → │ L1 │ L2 │ L3 │ L4 │ ↵
                                       ├────┼────┼────┼────┤
                                       │ s5 │ s6 │ s7 │ s8 │ ↵
                                       ├────┼────┼────┼────┤
                                       ╷ .. ╷ .. ╷ .. ╷ .. ╷
                                       ├────┼────┼────┼────┤
                                       │ st │ su │ sv │ sw │ ↵
                                       ├────┼────┼────┼────┘
                                       │ sx │ sy │ sz │
                                       └────┴────┴────┘
   and then processing them bottom-up:   ↥    ↥    ↥    ↥

                                       ╷ .. ╷ .. ╷ .. ╷ .. ╷
This patch reverses the input order    ├────┼────┼────┼────┤
so the two reversals cancel each other │ s7 │ s6 │ s5 │ L4 │ ↵
out. Now at least the first N          ├────┼────┼────┼────┘
arguments, N=number-of-threads, are    │ L3 │ L2 │ L1 │
processed before any others (then      └────┴────┴────┘
work-stealing may happen):

 T1:  ╠═════════════L1════════════╣ █ ██ █ █ █ █ █ █ ██
 T2:  ╠═════L2════╣ █ █ ██ █ █ ██ █ █ █ ██ █ █ ██ █ █ █

(With some more shuffling T1 could always be assigned L1 etc., but
that would mostly be for optics).
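
A minimal sketch of the idea (illustrative only, not ripgrep's actual code):
reversing the inputs before the round-robin assignment means the per-thread
LIFO stacks pop the original first arguments first.

```rust
// Illustrative sketch only: round-robin assignment to per-thread LIFO stacks.
// Reversing the inputs first means the two reversals cancel, so the first N
// arguments (N = number of threads) end up on top and are popped first.
fn distribute<T>(inputs: Vec<T>, num_threads: usize) -> Vec<Vec<T>> {
    let mut stacks: Vec<Vec<T>> = (0..num_threads).map(|_| Vec::new()).collect();
    for (i, input) in inputs.into_iter().rev().enumerate() {
        stacks[i % num_threads].push(input);
    }
    // Each worker later pops from the top of its own stack, so the
    // last-pushed items -- the original first arguments -- run first.
    stacks
}
```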

Closes #2849
2025-09-19 21:08:19 -04:00
cgzones
3ad0e83471 ignore/walk: correct build_parallel() documentation
The returned closure should return `WalkState`, not `()`.
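
For context, a minimal usage sketch: the closure passed to `run` builds a
per-thread visitor, and it is that visitor which returns a `WalkState`
(Continue/Skip/Quit), not `()`.

```rust
use ignore::{WalkBuilder, WalkState};

fn main() {
    WalkBuilder::new("./").build_parallel().run(|| {
        // Each call to this outer closure builds one per-thread visitor...
        Box::new(|result| {
            if let Ok(entry) = result {
                println!("{}", entry.path().display());
            }
            // ...and the visitor returns a WalkState, as the fixed docs say.
            WalkState::Continue
        })
    });
}
```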

Closes #2767
2024-03-27 14:50:05 -04:00
Andrew Gallant
59212d08d3 style: fix new lints
The Rust compiler seems to have gotten smarter at finding unused or
redundant imports.
2024-03-07 09:37:48 -05:00
Tavian Barnes
6d7550d58e ignore: Avoid contention on num_pending
Previously, every worker would increment the shared num_pending count on
every new work item, and decrement it once that item was finished, leading
to lots of contention. Now, we only track the number of workers actively
running, so there is no contention except when workers go to sleep or
wake up.
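
A rough sketch of the scheme (the type and method names are illustrative,
not the crate's actual API): the counter tracks active workers rather than
pending items, so it is only touched when a worker sleeps or wakes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts workers that are currently running, not outstanding work items.
struct ActiveWorkers(AtomicUsize);

impl ActiveWorkers {
    fn new(num_threads: usize) -> Self {
        ActiveWorkers(AtomicUsize::new(num_threads))
    }

    // Called when a worker runs out of work and goes to sleep. Returns true
    // if it was the last active worker, i.e. the traversal is finished.
    fn sleep(&self) -> bool {
        self.0.fetch_sub(1, Ordering::SeqCst) == 1
    }

    // Called when a sleeping worker is woken up by newly available work.
    fn wake(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```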

Closes #2642
2023-11-21 18:39:32 -05:00
Tavian Barnes
53679e4c43 ignore: simplify the work-stealing strategy
There's no particular reason for this change. I happened to be looking
at the code again and realized that stealing from your left neighbour
or your right neighbour shouldn't make a difference (and indeed perf is
the same in my benchmarks).

Closes #2624
2023-11-20 23:51:53 -05:00
Andrew Gallant
f16ea0812d ignore: polish
Like previous commits, we do a bit of polishing and bring the style up
to my current practice.
2023-10-09 20:29:52 -04:00
Andrew Gallant
3ad7a0d95e crates: remove hard-coded links
And use rustdoc's native intra-crate links. So much nicer.
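
A hypothetical before/after to illustrate the change (the item and URL here
are made up, not taken from the actual diff):

```rust
// Before: a hard-coded URL that has to be kept in sync by hand.
/// See <https://docs.rs/ignore/latest/ignore/struct.WalkBuilder.html>.
pub fn before() {}

// After: an intra-doc link that rustdoc resolves and checks itself.
/// See [`crate::WalkBuilder`].
pub fn after() {}
```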
2023-10-09 20:29:52 -04:00
Tavian Barnes
d938e955af ignore: use work-stealing stack instead of Arc<Mutex<Vec<_>>>
This represents yet another iteration on how `ignore` enqueues and
distributes work in parallel. The original implementation used a
multi-producer/multi-consumer thread safe queue from crossbeam. At some
point, I migrated to a simple `Arc<Mutex<Vec<_>>>` and treated it as a
stack so that we did depth first traversal. This helped with memory
usage in very wide directories.

But it turns out that a naive stack-behind-a-mutex can be quite a bit
slower than something that's a little smarter, such as a work-stealing
stack used in this commit. My hypothesis for why this helps is that
without the stealing component, work distribution can get stuck in
sub-optimal configurations that depend on which directory entries get
assigned to a particular worker. It's likely that this can result in
some workers getting "more" work than others just by chance, leaving the
rest idle. But the work-stealing approach heads that off.

This does re-introduce a dependency on parts of crossbeam which is kind
of a bummer, but it's carrying its weight for now.
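
The general shape of the technique, sketched with `crossbeam-deque` (an
illustration of the approach, not the crate's actual worker code): each
thread pops from its own LIFO deque and falls back to stealing when that
runs dry.

```rust
use crossbeam_deque::{Steal, Stealer, Worker};

// Illustrative only: find the next work item for one thread.
fn find_work<T>(local: &Worker<T>, stealers: &[Stealer<T>]) -> Option<T> {
    // Fast path: pop from our own stack (depth-first, no contention).
    if let Some(item) = local.pop() {
        return Some(item);
    }
    // Slow path: try to steal from another worker's stack.
    for stealer in stealers {
        loop {
            match stealer.steal() {
                Steal::Success(item) => return Some(item),
                Steal::Empty => break,
                Steal::Retry => continue,
            }
        }
    }
    None
}
```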

Closes #1823, Closes #2591
Ref https://github.com/sharkdp/fd/issues/28
2023-09-20 11:52:42 -04:00
Christian Vallentin
6cd9479634 ignore: implement FusedIterator for Walk
PR #2567
2023-08-28 22:55:19 -04:00
Michal Terepeta
cb7501ff11 doc: clarify the comment on Worker.work_done
We call `work_done` only once the work has been actually performed
(otherwise `num_pending` could go to 0 before the actual work is done).
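
In other words (the names below are illustrative, not the crate's actual
API), the decrement has to come after the work is performed:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Decrementing only after processing keeps num_pending above zero while the
// item -- which might itself enqueue more work -- is still in flight.
fn handle<W>(work: W, num_pending: &AtomicUsize, process: impl FnOnce(W)) {
    process(work);                              // actually do the work first
    num_pending.fetch_sub(1, Ordering::SeqCst); // then report "work done"
}
```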

Closes #2039
2023-07-08 18:52:42 -04:00
Andrew Gallant
e95254a86f deps: remove ignore's dependency on crossbeam-utils
Scoped threads are now part of std.
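
For reference, `std::thread::scope` (stable since Rust 1.63) covers the use
case that previously needed `crossbeam_utils::thread::scope`; a minimal
example:

```rust
use std::thread;

fn main() {
    let paths = vec!["a.txt", "b.txt", "c.txt"];
    thread::scope(|s| {
        // Scoped threads may borrow `paths` from the enclosing stack frame.
        for path in &paths {
            s.spawn(move || println!("searching {path}"));
        }
    }); // every spawned thread is joined before the scope returns
}
```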
2023-01-05 08:51:08 -05:00
Dave Rolsky
515f120b5c doc: fix typo
PR #2313
2022-09-24 13:23:59 -04:00
Andrew Gallant
e824531e38 edition: manual changes
This is mostly just about removing 'extern crate' everywhere and fixing
the fallout.
2021-06-01 21:07:37 -04:00
Andrew Gallant
af54069c51 edition: run 'cargo fix --edition --edition-idioms --all' 2021-06-01 21:07:37 -04:00
Andrew Gallant
459a9c5637 edition: initial 'cargo fix --edition' run 2021-06-01 21:07:37 -04:00
Richard Khoury
a28e664abd ignore: check ignore rules before issuing stat calls
This seems like an obvious optimization but becomes critical when
filesystem operations even as simple as stat can result in significant
overheads; an example of this was a bespoke filesystem layer in Windows
that hosted files remotely and would download them on-demand when
particular filesystem operations occurred. Users of this system who
ensured correct file-type filters were being used could still get
unnecessary file access resulting in large downloads.
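
A simplified sketch of the reordering (not the crate's real code): consult
the in-memory ignore and file-type rules before issuing the potentially
expensive `stat`.

```rust
use std::fs;
use std::path::Path;

// Illustrative only: the ignore check is cheap and purely in-memory, so it
// runs first; fs::metadata (a stat call) happens only for surviving paths.
fn should_search(path: &Path, is_ignored: impl Fn(&Path) -> bool) -> std::io::Result<bool> {
    if is_ignored(path) {
        return Ok(false); // skipped without touching the filesystem
    }
    let md = fs::metadata(path)?;
    Ok(md.is_file())
}
```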

Fixes #1657, Closes #1660
2021-05-31 21:51:18 -04:00
Casey Rodarmor
793c1179cc ignore: allow filtering with predicate
Adds `WalkBuilder::filter_entry` that takes a predicate to be applied to
all entries. If the predicate returns `false` on a given entry, that
entry and all children will be skipped.
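
A small usage sketch (the directory name is just an example):

```rust
use ignore::WalkBuilder;

fn main() {
    // Prune any directory named "target": returning false from the predicate
    // skips that entry and everything underneath it.
    let walker = WalkBuilder::new("./")
        .filter_entry(|entry| entry.file_name() != "target")
        .build();
    for result in walker {
        if let Ok(entry) = result {
            println!("{}", entry.path().display());
        }
    }
}
```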

Fixes #1555, Closes #1557
2020-05-08 23:24:40 -04:00
Andrew Gallant
139f186e57 crates/ignore: switch to depth first traversal
This replaces the use of channels in the parallel directory traversal
with a simple stack. The primary motivation for this change is to reduce
peak memory usage. In particular, when using a channel (which is a
queue), we wind up visiting files in a breadth first fashion. Using a
stack switches us to a depth first traversal. While there are no real
intrinsic differences, depth first traversal generally tends to use less
memory because directory trees are more commonly wide than they are
deep.

In particular, the queue/stack size itself is not the only concern. In
one recent case documented in #1550, a user wanted to search all Rust
crates. The directory structure was shallow but extremely wide, with a
single directory containing all crates. This in turn results in
descending into each of those directories and building a gitignore
matcher for each (since most crates have `.gitignore` files) before ever
searching a single file. This means that ripgrep has all such matchers
in memory simultaneously, which winds up using quite a bit of memory.

In a depth first traversal, peak memory usage is much lower because
gitignore matches are built and discarded more quickly. In the case of
searching all crates, the peak memory usage decrease is dramatic. On my
system, it shrinks by an order of magnitude, from almost 1GB to 50MB. The
decline in peak memory usage is consistent across other use cases as
well, but is typically more modest. For example, searching the Linux
repo has a 50% decrease in peak memory usage and searching the Chromium
repo has a 25% decrease in peak memory usage.

Search times generally remain unchanged, although some ad hoc benchmarks
that I typically run have gotten a bit slower. As far as I can tell,
this appears to be a result of scheduling changes. Namely, the depth first
traversal seems to result in searching some very large files towards the
end of the search, which reduces the effectiveness of parallelism and
makes the overall search take longer. This seems to suggest that a stack
isn't optimal. It would instead perhaps be better to prioritize
searching larger files first, but it's not quite clear how to do this
without introducing more overhead (getting the file size for each file
requires a stat call).
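
A much-simplified illustration of the queue-vs-stack difference (not the
crate's actual traversal): pushing and popping children from the same end of
a `Vec` yields depth-first order, so one subtree's gitignore matchers can be
built and dropped before the next subtree is touched.

```rust
use std::fs;
use std::path::PathBuf;

fn walk_depth_first(root: PathBuf, mut visit: impl FnMut(&PathBuf)) -> std::io::Result<()> {
    let mut stack = vec![root]; // a FIFO queue here would give breadth-first order
    while let Some(path) = stack.pop() {
        visit(&path);
        if path.is_dir() {
            for entry in fs::read_dir(&path)? {
                stack.push(entry?.path());
            }
        }
    }
    Ok(())
}
```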

Fixes #1550
2020-04-18 11:33:03 -04:00
Andrew Gallant
4176050cdd ignore: another simplification
Again, thanks to @zsugabubus!
2020-02-20 17:26:34 -05:00
Andrew Gallant
109460fce2 ignore: simplify parallel worker initialization
We can just ask the channel whether any work has been loaded. Normally
querying a channel for its length is a strong predictor of bugs, but in
this case, we do it before we ever attempt a `recv`, so it should work.

Kudos to @zsugabubus for suggesting this!
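
Sketch of the idea with a `crossbeam-channel` receiver (illustrative names,
not the crate's actual code):

```rust
use crossbeam_channel::Receiver;

// Illustrative only: decide a worker's initial state by asking the channel
// whether any work has been loaded. Checking a channel's length is normally
// racy, but this runs once, before the worker's first `recv`, so the answer
// cannot be invalidated by this worker's own consumption.
fn starts_waiting<T>(work: &Receiver<T>) -> bool {
    work.is_empty()
}
```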
2020-02-20 16:50:41 -05:00
Andrew Gallant
f314b0d55f ignore: fix parallel traversal
It turns out that the previous version wasn't quite correct. Namely, it
was possible for the following sequence to occur:

1. Consider that all workers, except for one, are `waiting`.
2. The last remaining worker finds one more job to do and sends it on
   the channel.
3. One of the previously `waiting` workers wakes up from the job that
   the last running worker sent, but `self.resume()` has not been
   called yet.
4. The last worker, from (2), calls `get_work` and sees that the
   channel has nothing on it, so it executes `self.waiting() ==
   1`. Since the worker in (3) hasn't called `self.resume()` yet,
   `self.waiting() == 1` evaluates to true.
5. This sets off a chain reaction that stops all workers, despite the
   fact that (3) got more work (which could itself spawn more work).

The end result is that the traversal may terminate while there are still
outstanding work items to process. This problem was observed through
spurious failures in CI. I was not actually able to reproduce the bug
locally.

We fix this by changing our strategy to detect termination using a
counter. Namely, we increment the counter just before sending new work
and decrement the counter just after finishing work. In this way, we
guarantee that the counter only ever reaches 0 once there is no more
work to process.
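
The counting discipline, sketched with illustrative names (not the crate's
actual types): increment before the work becomes visible to other workers,
decrement only after it has been fully processed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::Sender;

fn send_work<T>(tx: &Sender<T>, work: T, pending: &AtomicUsize) {
    pending.fetch_add(1, Ordering::SeqCst); // count the item first...
    tx.send(work).unwrap();                 // ...then make it visible to others
}

// Called after a work item has been fully processed (including sending any
// child work, which send_work already counted). Returns true when this was
// the last outstanding item, i.e. the traversal can terminate.
fn finish_work(pending: &AtomicUsize) -> bool {
    pending.fetch_sub(1, Ordering::SeqCst) == 1
}
```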

See #1337 for more discussion. Many thanks to @zsugabubus for helping me
work through this.
2020-02-20 16:07:51 -05:00
Andrew Gallant
fdd8510fdd repo: move all source code in crates directory
The top-level listing was just getting a bit too long for my taste. So
put all of the code in one directory and shrink the large top-level mess
to a small top-level mess.

NOTE: This commit only contains renames. The subsequent commit will
actually make ripgrep build again. We do it this way with the naive hope
that this will make it easier for git history to track the renames.
Sigh.
2020-02-17 19:24:53 -05:00