Releasing czlib and zstd Go Bindings
To commemorate the third annual GopherCon US in Denver this week, we’re releasing cgo bindings to two compression libraries that we’ve been using in production at Datadog for a while now: czlib and zstd.
czlib started as a fork of the vitess project’s cgzip package. Our primary data pipeline uses zlib-compressed messages, but the standard library’s pure Go implementation can be significantly slower than the C zlib library. To close this gap, we modified a few flags in cgzip so that it encodes and decodes with zlib wrapping rather than gzip headers.
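To make the difference between the two wrappers concrete, here is a minimal sketch that uses only the Go standard library (not czlib or cgzip) to frame the same payload both ways. The helper names zlibWrap and gzipWrap are made up for this illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"compress/zlib"
	"fmt"
)

// zlibWrap compresses data as a DEFLATE stream in zlib framing (RFC 1950).
// Hypothetical helper for illustration only; it is not part of czlib.
func zlibWrap(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gzipWrap compresses data as a DEFLATE stream in gzip framing (RFC 1952).
// Hypothetical helper for illustration only.
func gzipWrap(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	msg := bytes.Repeat([]byte("hello gophers "), 64)

	z, err := zlibWrap(msg)
	if err != nil {
		panic(err)
	}
	g, err := gzipWrap(msg)
	if err != nil {
		panic(err)
	}

	// Print the leading bytes of each stream so the framing is visible.
	fmt.Printf("zlib starts with % x (%d bytes total)\n", z[:2], len(z))
	fmt.Printf("gzip starts with % x (%d bytes total)\n", g[:2], len(g))
}
```

A zlib stream begins with a two-byte header whose first byte is typically 0x78, while a gzip stream begins with the magic bytes 0x1f 0x8b followed by a longer header. A reader expecting one wrapper will reject the other, which is why the wrapping has to match what the pipeline already produces.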
We detailed some of czlib’s more novel design decisions, including its batch interfaces, in our general blog post on performance in Go a couple of years ago. Performance varies quite a bit among the interfaces, so it pays to benchmark with a message that is typical for your system by running the czlib benchmark suite:
PAYLOAD=path_to_message go test -run=NONE -bench .
Here are recent benchmark results from go1.7beta2 for compression and decompression using czlib’s non-streaming interface, its streaming interface, and the standard library’s compress/zlib, showing the variance in performance:
# using a 2KB plaintext message
BenchmarkCompress-4              30000       47415 ns/op     44.42 MB/s
BenchmarkCompressStream-4        20000       61732 ns/op     34.11 MB/s
BenchmarkCompressStdZlib-4        5000      227182 ns/op      9.27 MB/s
BenchmarkDecompress-4           200000        8238 ns/op    255.62 MB/s
BenchmarkDecompressStream-4     100000       18352 ns/op    114.75 MB/s
BenchmarkDecompressStdZlib-4     50000       31565 ns/op     66.72 MB/s

# using a 1.7MB plaintext message
BenchmarkCompress-4                 20    69808144 ns/op     24.70 MB/s
BenchmarkCompressStream-4           20    73170819 ns/op     23.56 MB/s
BenchmarkCompressStdZlib-4          20    70498763 ns/op     24.46 MB/s
BenchmarkDecompress-4              200     6709252 ns/op    256.98 MB/s
BenchmarkDecompressStream-4        200     6891833 ns/op    250.18 MB/s
BenchmarkDecompressStdZlib-4       100    14256445 ns/op    120.94 MB/s
zstd, short for Zstandard, is a relatively new, fast compression library from Yann Collet, the author of lz4. Its format was recently finalized, and a 1.0 release is pending. It compresses slightly faster than zlib at level 6 with a slightly better ratio, and it decompresses much faster, making it a great general-purpose zlib replacement.
The zstd library supports some interfaces that are common in more advanced compression libraries, such as stream compression, compression levels, and pre-computed dictionaries. These are all exposed by our zstd Go binding, with the dictionary builder available in the upstream repos. The binding intentionally mimics the zlib interface, and aside from a few functions that do not return an error in zstd, it is functionally a drop-in replacement. It also exposes a fixed-length batch compression interface present in the underlying library, very similar to the lz4 interface.