10740 Commits

Author SHA1 Message Date
Yann Collet
b880f20d52
Merge pull request #4171 from facebook/lvl3_ratio+
Improve compression ratio of levels 3 & 4
2024-10-17 11:39:41 -07:00
Yann Collet
41d870fbbf updated regression tests results 2024-10-17 11:06:26 -07:00
Yann Collet
ff8e98bebe enable regression tests at pull request time
was transferred from circleci,
but was only triggered on push into dev,
i.e. after pull request is merged.
2024-10-17 09:45:16 -07:00
Yann Collet
47d4f5662d rewrite code in the manner suggested by @terrelln 2024-10-17 09:37:23 -07:00
Yann Collet
61d08b0e42 fix test
a margin of 4 is insufficient to guarantee compression success.
2024-10-17 09:37:23 -07:00
Yann Collet
6326775166 slightly improved compression ratio at levels 3 & 4
The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
2024-10-17 09:37:23 -07:00
Yann Collet
18a42190c2
Merge pull request #4170 from facebook/dict_cSpeed
Improve dictionary compression speed
2024-10-16 17:36:49 -07:00
Yann Collet
730d2dce41 fix test 2024-10-15 18:44:40 -07:00
Yann Collet
c2abfc5ba4 minor improvement to level 3 dictionary compression ratio 2024-10-15 17:58:33 -07:00
Yann Collet
e63896eb58 small dictionary compression speed improvement
not as good as small-blocks improvement,
but generally positive.
2024-10-15 17:48:35 -07:00
Yann Collet
def3ee9548
Merge pull request #4167 from facebook/ci_m32test_faster
attempt to make 32-bit tests faster
2024-10-12 01:57:55 -07:00
Yann Collet
e6740355e3 attempt parallel test running with -j 2024-10-11 18:01:28 -07:00
Yann Collet
6f2e29a234 measure if -O2 makes the test complete faster 2024-10-11 17:30:55 -07:00
Yann Collet
1024aa9252 attempt to make 32-bit tests faster
this is the longest CI test, reaching ~40 min on the last PR
2024-10-11 16:24:25 -07:00
Yann Collet
8c38bda935
Merge pull request #4165 from facebook/cspeed_cmov
Improve compression speed on small blocks
2024-10-11 16:20:19 -07:00
Yann Collet
8e5823b65c rename variable
findMatch -> matchFound
since it's a test, as opposed to an active search operation.
suggested by @terrelln
2024-10-11 15:38:12 -07:00
Yann Collet
83de00316c fixed parameter ordering in dfast
noticed by @terrelln
2024-10-11 15:36:15 -07:00
Yann Collet
7ba43091b8
Merge pull request #4164 from facebook/spec_043
spec update: huffman prefix code paragraph
2024-10-10 16:56:02 -07:00
Yann Collet
fa1fcb08ab minor: better variable naming 2024-10-10 16:07:20 -07:00
Yann Collet
3e7c66acd1 added ascending order example 2024-10-09 01:06:24 -07:00
Yann Collet
d45aee43f4 make __asm__ __GNUC__-specific 2024-10-08 16:38:35 -07:00
Yann Collet
741b860fc1 store dummy bytes within ZSTD_match4Found_cmov()
feels more logical, better contained
2024-10-08 16:34:40 -07:00
Yann Collet
197c258a79 introduce memory barrier to force test order
suggested by @terrelln
2024-10-08 15:54:48 -07:00
Yann Collet
186b132495 made search strategy switchable
between cmov and branch
and use a simple heuristic based on wlog to select between them.

note: performance is not good on clang (yet)
2024-10-08 13:52:56 -07:00
Yann Collet
2cc600bab2 refactor search into an inline function
for easier swapping with a parameter
2024-10-08 11:10:48 -07:00
Yann Collet
3b343dcfb1 refactor huffman prefix code paragraph 2024-10-07 17:15:07 -07:00
Yann Collet
1e7fa242f4 minor refactor zstd_fast
make hot variables more local
2024-10-07 11:22:40 -07:00
Yann Collet
da23998e9a
Merge pull request #4160 from facebook/fix_nightly
fix dependency for nightly github actions tests
2024-10-03 21:02:39 -07:00
Yann Collet
b84653fc83 fix dependency for nightly github actions tests 2024-10-03 15:10:16 -07:00
Yann Collet
b7e1eef048
Merge pull request #4159 from facebook/spec_refactor_fse
specification update
2024-10-03 14:54:16 -07:00
Yann Collet
a8b86d024a refactor documentation of the FSE decoding table build process 2024-10-02 23:09:06 -07:00
Yann Collet
75b0f5f4f5
Merge pull request #4153 from artem/fix-meson-includes
meson: Do not export private headers in libzstd_dep to avoid name clash
2024-10-02 16:51:44 -07:00
Yann Collet
dda3cdfdec
Merge pull request #4156 from facebook/rm_circleci
removing nightly tests built on circleci
2024-10-02 16:51:15 -07:00
Yann Collet
751bf1ffd8
Merge pull request #4157 from facebook/fix_result_c
fix incorrect pointer manipulation
2024-10-02 16:50:45 -07:00
Yann Collet
dcc8fd0472
Merge pull request #4158 from facebook/benchzstd_fclose
fix missing fclose()
2024-10-02 16:49:43 -07:00
Yann Collet
8edd147686 fix missing fclose()
fix #4151
2024-10-01 09:52:45 -07:00
Yann Collet
de6cc98e07 fix incorrect pointer manipulation
fix #4155
2024-10-01 09:25:26 -07:00
Yann Collet
3d5d3f5630 removing nightly tests built on circleci 2024-09-30 21:38:29 -07:00
Yann Collet
27bf1362fe
Merge pull request #4154 from dearblue/freebsd-14.1
Update FreeBSD VM image to 14.1
2024-09-30 11:54:32 -07:00
Artem Labazov
ccc02a9a77 meson: Fix contrib and tests build 2024-09-30 18:05:57 +03:00
Artem Labazov
d2d49a1161 meson: Do not export private headers in libzstd_dep to avoid name clash
This way libzstd_dep does not override, for instance, <xxhash.h>
2024-09-30 17:03:42 +03:00
dearblue
a3b5c4521c Update FreeBSD VM image to 14.1
FreeBSD 14.0 reaches end of life on 2024-09-30.
The updated 14.1 is scheduled to reach end of life on 2025-03-31.

ref. https://www.freebsd.org/releases/14.2R/schedule/
2024-09-30 22:45:17 +09:00
Yann Collet
984d11a4d1
Merge pull request #4146 from facebook/dictBench_Doc
update documentation: specify that Dictionary can be used for benchmark
2024-09-27 13:44:42 -07:00
Yann Collet
d2212c680a
Merge pull request #4013 from elasota/spec-clarify-offset-code-overflow
Specify that decoders may reject non-zero probabilities for larger offset codes than implementation supports
2024-09-27 13:42:32 -07:00
Yann Collet
039f404faa update documentation to specify that Dictionary can be used for benchmark
fix #4139
2024-09-25 16:56:01 -07:00
inventor500
9215de52c7 Included suggestion from @neheb 2024-09-25 09:51:05 -07:00
inventor500
a8b544d460 Fixed warning when compiling pzstd with CPPFLAGS=-Wunused-result and CXXFLAGS=-std=c++17 2024-09-25 09:51:05 -07:00
Yann Collet
bc96d4b077
Merge pull request #4119 from xionghul/dev
Fix zstd-pgo run error
2024-09-24 17:55:43 -07:00
Yann Collet
d27a4cd4ac
Merge pull request #4143 from facebook/fix_dictsizemin_dic
fix doc nit: ZDICT_DICTSIZE_MIN
2024-09-24 17:55:25 -07:00
Ilya Tokar
e8fce38954 Optimize compression by avoiding unpredictable branches
Avoid unpredictable branch. Use conditional move to generate the address
that is guaranteed to be safe and compare unconditionally.
Instead of

if (idx < limit && x[idx] == val) // mispredicted idx < limit branch

Do

addr = cmov(safe,x+idx)
if (*addr == val && idx < limit) // almost always false so well predicted

Using microbenchmarks from https://github.com/google/fleetbench,
I get a ~10% speed-up:

name                                                                                          old cpu/op   new cpu/op    delta
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:15                                     1.46ns ± 3%   1.31ns ± 7%   -9.88%  (p=0.000 n=35+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:16                                     1.41ns ± 3%   1.28ns ± 3%   -9.56%  (p=0.000 n=36+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:15                                     1.61ns ± 1%   1.43ns ± 3%  -10.70%  (p=0.000 n=30+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:16                                     1.54ns ± 2%   1.39ns ± 3%   -9.21%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:15                                     1.82ns ± 2%   1.61ns ± 3%  -11.31%  (p=0.000 n=37+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:16                                     1.73ns ± 3%   1.56ns ± 3%   -9.50%  (p=0.000 n=38+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:15                                     2.12ns ± 2%   1.79ns ± 3%  -15.55%  (p=0.000 n=34+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:16                                     1.99ns ± 3%   1.72ns ± 3%  -13.70%  (p=0.000 n=38+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:15                                      3.22ns ± 3%   2.94ns ± 3%   -8.67%  (p=0.000 n=38+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:16                                      3.19ns ± 4%   2.86ns ± 4%  -10.55%  (p=0.000 n=40+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:15                                      2.60ns ± 3%   2.22ns ± 3%  -14.53%  (p=0.000 n=40+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:16                                      2.46ns ± 3%   2.13ns ± 2%  -13.67%  (p=0.000 n=39+36)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:15                                      2.69ns ± 3%   2.46ns ± 3%   -8.63%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:16                                      2.63ns ± 3%   2.36ns ± 3%  -10.47%  (p=0.000 n=40+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:15                                      3.20ns ± 2%   2.95ns ± 3%   -7.94%  (p=0.000 n=35+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:16                                      3.20ns ± 4%   2.87ns ± 4%  -10.33%  (p=0.000 n=40+40)

I've also measured the impact on internal workloads and saw a similar
~10% improvement in performance, measured by CPU usage per byte of data.
2024-09-20 16:07:01 -04:00
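
The two forms described in commit e8fce38954 can be sketched in plain C as follows. This is only an illustration of the idea, not the code that landed in zstd: the names match_branch, match_select and safe are hypothetical, and a ternary stands in for the cmov, which a compiler may or may not lower to a real conditional move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: the load depends on the bounds check, so a hard-to-predict
 * `idx < limit` test becomes a hard-to-predict branch. */
static int match_branch(const uint8_t* x, size_t idx, size_t limit, uint8_t val)
{
    return (idx < limit) && (x[idx] == val);
}

/* Select form: always load from an address known to be readable, then test
 * the loaded value first. `safe` is a hypothetical caller-provided byte that
 * is used whenever idx is out of range. */
static int match_select(const uint8_t* x, size_t idx, size_t limit,
                        uint8_t val, const uint8_t* safe)
{
    assert(safe != NULL);
    /* Candidate for a conditional move: no memory access happens here. */
    const uint8_t* addr = (idx < limit) ? x + idx : safe;
    /* The value test is almost always false, hence well predicted; the
     * bounds check is only evaluated when the values happen to match. */
    return (*addr == val) && (idx < limit);
}
```

Both functions return the same result for every input; the second only changes which condition the processor has to predict. Whether it is faster depends on the compiler and the target, as the later "performance is not good on clang (yet)" note in this log suggests.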