deps: drop bytecount in favor of memchr_iter(..).count()
As of the memchr 2.6 release, its Iterator::count method is specialized
to only count the number of occurrences instead of finding the offset of
each occurrence. This replaces ripgrep's use of the bytecount crate.
While micro-benchmarks suggest that memchr's method has better
throughput than bytecount, it turned out to be an illusion. Namely, on a
~13GB haystack prior to this change:
$ time rg-bytecount 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
441450441:- You killed my friend, my best friend, my lifelong friend!
real 1.473
user 1.186
sys 0.286
maxmem 12512 MB
faults 0
And then after:
$ time rg 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
441450441:- You killed my friend, my best friend, my lifelong friend!
real 1.532
user 1.280
sys 0.250
maxmem 12512 MB
faults 0
But perf is just about in the same ballpark. That's good enough for me
at the moment in order to drop the extra dependency.
I did this because the marginal cost of adding the Iterator::count()
specialization to memchr was extremely small.
This commit is contained in:
8
Cargo.lock
generated
8
Cargo.lock
generated
@@ -40,12 +40,6 @@ dependencies = [
|
||||
"serde",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "bytecount"
|
||||
version = "0.6.3"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "2c676a478f63e9fa2dd5368a42f28bba0d6c560b775f38583c8bbaa7fcd67c9c"
|
||||
|
||||
[[package]]
|
||||
name = "cc"
|
||||
version = "1.0.83"
|
||||
@@ -215,12 +209,12 @@ name = "grep-searcher"
|
||||
version = "0.1.11"
|
||||
dependencies = [
|
||||
"bstr",
|
||||
"bytecount",
|
||||
"encoding_rs",
|
||||
"encoding_rs_io",
|
||||
"grep-matcher",
|
||||
"grep-regex",
|
||||
"log",
|
||||
"memchr",
|
||||
"memmap2",
|
||||
"regex",
|
||||
]
|
||||
|
||||
Reference in New Issue
Block a user