===============================================================
 Blosc: A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted
:Contact: [email protected]
:URL: http://www.blosc.org
:Gitter: |gitter|
:Travis CI: |travis|
:Appveyor: |appveyor|

.. |gitter| image:: https://badges.gitter.im/Blosc/c-blosc.svg
        :alt: Join the chat at https://gitter.im/Blosc/c-blosc
        :target: https://gitter.im/Blosc/c-blosc?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |travis| image:: https://travis-ci.org/Blosc/c-blosc.svg?branch=master
        :target: https://travis-ci.org/Blosc/c-blosc

.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/3mlyjc1ak0lbkmte?svg=true
        :target: https://ci.appveyor.com/project/FrancescAlted/c-blosc/branch/master


What is it?
===========

Blosc [1]_ is a high performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call. Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible. In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there. It also leverages, if available, SIMD
instructions (SSE2, AVX2) and the multi-threading capabilities of CPUs,
in order to accelerate the compression / decompression process as much
as possible.
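
As a rough illustration of the blocking idea, the following sketch
walks a buffer in fixed-size blocks and hands each block to a
callback. It is not Blosc's implementation: the names
``for_each_block``, ``sum_block`` and ``SKETCH_BLOCKSIZE`` are
hypothetical, and the 32 KiB block size is only an assumed value
close to a typical L1 data cache size.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical block size: 32 KiB, close to a typical L1 data cache.
     Blosc chooses its own block sizes; this value is only illustrative. */
  #define SKETCH_BLOCKSIZE (32 * 1024)

  /* Callback type: handles one block of at most `blocksize` bytes. */
  typedef void (*block_fn)(const uint8_t *block, size_t len, void *ctx);

  /* Walk `src` in consecutive blocks and hand each one to `fn`.
     Returns the number of blocks visited; the last one may be shorter. */
  static size_t for_each_block(const uint8_t *src, size_t nbytes,
                               size_t blocksize, block_fn fn, void *ctx)
  {
    size_t nblocks = 0;
    size_t offset;

    assert(blocksize > 0);
    for (offset = 0; offset < nbytes; offset += blocksize) {
      size_t len = nbytes - offset;
      if (len > blocksize) len = blocksize;
      fn(src + offset, len, ctx);
      nblocks++;
    }
    return nblocks;
  }

  /* Example per-block work: add every byte to a uint64_t total in ctx. */
  static void sum_block(const uint8_t *block, size_t len, void *ctx)
  {
    uint64_t *total = (uint64_t *)ctx;
    size_t i;
    for (i = 0; i < len; i++) *total += block[i];
  }

In a real compressor the callback would compress or decompress the
block. The point of the sketch is only that each unit of work touches
a region small enough to stay in cache while it is processed.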

Blosc is actually a metacompressor, meaning that it can use a range
of compression libraries for performing the actual
compression/decompression. Right now, it comes with integrated support
for BloscLZ (the original one), LZ4, LZ4HC, Snappy, Zlib and Zstd. Blosc
comes with full sources for all compressors, so in case it does not find
the libraries installed on your system, it will compile them from the
included sources and they will be integrated into the Blosc library
anyway. That means that you can count on having all supported
compressors integrated in Blosc on all supported platforms.

You can see some benchmarks of Blosc performance in [3]_.

Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for
details.

.. [1] http://www.blosc.org
.. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf
.. [3] http://blosc.org/synthetic-benchmarks.html

Meta-compression and other advantages over existing compressors
===============================================================

C-Blosc is not like other compressors: it should rather be called a
meta-compressor. This is so because it can use different compressors
and filters (programs that generally improve compression ratio). At
any rate, it can also be called a compressor because it already comes
with several compressors and filters, so it can actually work as one.

Currently C-Blosc comes with support for BloscLZ, a compressor heavily
based on FastLZ (http://fastlz.org/), LZ4 and LZ4HC
(https://github.com/Cyan4973/lz4), Snappy
(https://github.com/google/snappy) and Zlib (http://www.zlib.net/), as
well as highly optimized (they can use SSE2 or AVX2 instructions, if
available) shuffle and bitshuffle filters (for info on how and why
shuffling works, see slide 17 of
http://www.slideshare.net/PyData/blosc-py-data-2014). However,
different compressors or filters may be added in the future.
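
To give an idea of what a byte shuffle does, here is a plain C
sketch with no SIMD. It is not the code in ``shuffle*.c``, and the
names ``sketch_shuffle`` and ``sketch_unshuffle`` are hypothetical.
The filter gathers the first byte of every element, then the second
byte of every element, and so on. When the elements are, say, small
integers stored in 4-byte slots, the mostly-zero high bytes end up in
long runs that compress well.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Byte j of element i moves from src[i * typesize + j] to
     dest[j * nelems + i]. Trailing bytes that do not fill a whole
     element are copied unchanged. */
  static void sketch_shuffle(size_t typesize, size_t nbytes,
                             const uint8_t *src, uint8_t *dest)
  {
    size_t nelems, done, i, j;

    assert(typesize > 0);
    nelems = nbytes / typesize;
    for (j = 0; j < typesize; j++)
      for (i = 0; i < nelems; i++)
        dest[j * nelems + i] = src[i * typesize + j];
    done = nelems * typesize;
    memcpy(dest + done, src + done, nbytes - done);
  }

  /* Inverse of sketch_shuffle(): restores the original byte order. */
  static void sketch_unshuffle(size_t typesize, size_t nbytes,
                               const uint8_t *src, uint8_t *dest)
  {
    size_t nelems, done, i, j;

    assert(typesize > 0);
    nelems = nbytes / typesize;
    for (j = 0; j < typesize; j++)
      for (i = 0; i < nelems; i++)
        dest[i * typesize + j] = src[j * nelems + i];
    done = nelems * typesize;
    memcpy(dest + done, src + done, nbytes - done);
  }

For example, the three 4-byte elements ``01 00 00 00``, ``02 00 00 00``
and ``03 00 00 00`` become ``01 02 03`` followed by nine zero bytes.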

C-Blosc is in charge of coordinating the different compressors and
filters so that they can leverage the blocking technique (described
above) as well as multi-threaded execution (if several cores are
available) automatically. As a result, every compressor and filter
works at very high speeds, even if it was not initially designed
for doing blocking or multi-threading.

Other advantages of Blosc are:

* Meant for binary data: can take advantage of the type size
  meta-information for improved compression ratio (using the
  integrated shuffle and bitshuffle filters).

* Small overhead on non-compressible data: only a maximum of (16 + 4 *
  nthreads) additional bytes over the source buffer length are needed
  to compress *any kind of input*.

* Maximum destination length: contrary to many other compressors,
  both compression and decompression routines accept a maximum size
  for the destination buffer.

When taken together, all these features set Blosc apart from other
similar solutions.
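
The overhead bound from the list above translates directly into a
safe size for a destination buffer. The helper below is only a sketch
of that arithmetic; ``sketch_max_dest_size`` is a hypothetical name,
not part of the Blosc API.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  /* Worst-case compressed size according to the bound quoted above:
     the source length plus (16 + 4 * nthreads) bytes. */
  static size_t sketch_max_dest_size(size_t nbytes, unsigned nthreads)
  {
    assert(nthreads > 0);
    return nbytes + 16 + 4 * (size_t)nthreads;
  }

For a 1,000,000-byte source buffer and 4 threads this gives
1,000,032 bytes, so even incompressible input costs only 32 extra
bytes.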

Compiling your application with a minimalistic Blosc
====================================================

The minimal Blosc consists of the following files (in `blosc/ directory
<https://github.com/Blosc/c-blosc/tree/master/blosc>`_)::

    blosc.h and blosc.c        -- the main routines
    shuffle*.h and shuffle*.c  -- the shuffle code
    blosclz.h and blosclz.c    -- the blosclz compressor

Just add these files to your project in order to use Blosc. For
information on compression and decompression routines, see `blosc.h
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_.

To compile using GCC (4.9 or higher recommended) on Unix:

.. code-block:: console

  $ gcc -O3 -mavx2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread

Using Windows and MINGW:

.. code-block:: console

  $ gcc -O3 -mavx2 -o myprog myprog.c -Iblosc blosc\*.c

Using Windows and MSVC (2013 or higher recommended):

.. code-block:: console

  $ cl /Ox /Femyprog.exe /Iblosc myprog.c blosc\*.c

In the `examples/ directory
<https://github.com/Blosc/c-blosc/tree/master/examples>`_ you can find
more hints on how to link your app with Blosc.

I have not tried to compile this with compilers other than GCC, clang,
MINGW, Intel ICC or MSVC yet. Please report your experiences with your
own platforms.

Adding support for other compressors with a minimalistic Blosc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The official cmake files (see below) for Blosc try hard to include
support for LZ4, LZ4HC, Snappy and Zlib inside the Blosc library, so
using them is just a matter of calling the appropriate
`blosc_set_compressor() API call
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_. See
an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

Having said this, it is also easy to use a minimalistic Blosc and just
add the symbols HAVE_LZ4 (will include both LZ4 and LZ4HC),
HAVE_SNAPPY and HAVE_ZLIB during compilation, as well as the
appropriate libraries. For example, to compile with a minimalistic
Blosc but with added Zlib support, do:

.. code-block:: console

  $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread -DHAVE_ZLIB -lz

In the `bench/ directory
<https://github.com/Blosc/c-blosc/tree/master/bench>`_ there are a
couple of Makefiles (one for UNIX and the other for MinGW) with more
complete building examples, like switching between libraries or
internal sources for the compressors.

Supported platforms
~~~~~~~~~~~~~~~~~~~

Blosc is meant to support all platforms where a C89 compliant C
compiler can be found. The most tested ones are Intel
(Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones such as
the IBM Blue Gene Q embedded "A2" processor are reported to work too.

Compiling the Blosc library with CMake
======================================

Blosc can also be built, tested and installed using CMake_. Although
this procedure might seem a bit more involved than the one described
above, it is the most general because it allows integrating
compressors other than BloscLZ, either from libraries or from internal
sources. Hence, serious library developers are encouraged to use this
way.

The following procedure describes the "out of source" build.

Create the build directory and move into it:

.. code-block:: console

  $ mkdir build
  $ cd build

Now run CMake configuration and optionally specify the installation
directory (e.g. '/usr' or '/usr/local'):

.. code-block:: console

  $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake allows configuring Blosc in many different ways, like preferring
internal or external sources for compressors or enabling/disabling
them. Please note that configuration can also be performed using UI
tools provided by CMake_ (ccmake or cmake-gui):

.. code-block:: console

  $ ccmake ..      # run a curses-based interface
  $ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

.. code-block:: console

  $ cmake --build .
  $ ctest
  $ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with
header files, will be installed into the specified
CMAKE_INSTALL_PREFIX.

.. _CMake: http://www.cmake.org

Once you have compiled your Blosc library, you can easily link your
apps with it as shown in the `examples/ directory
<https://github.com/Blosc/c-blosc/blob/master/examples>`_.

Adding support for other compressors (LZ4, LZ4HC, Snappy, Zlib) with CMake
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The CMake files in Blosc are configured to automatically detect other
compressors like LZ4, LZ4HC, Snappy or Zlib by default. So as long as
the libraries and the header files for these libraries are accessible,
these will be used by default. See an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

*Note on Zlib*: the library should be easily found on UNIX systems,
although on Windows, you can help CMake to find it by setting the
environment variable 'ZLIB_ROOT' to where zlib 'include' and 'lib'
directories are. Also, make sure that the Zlib DLL library is in your
'\Windows' directory.

However, the full sources for LZ4, LZ4HC, Snappy and Zlib have been
included in Blosc too. So, in general, you should not worry about not
having (or CMake not finding) the libraries on your system because in
this case, their sources will be automatically compiled for you. That
effectively means that you can be confident in having complete
support for all the supported compression libraries on all supported
platforms.

If you want to force Blosc to use external libraries instead of
the included compression sources:

.. code-block:: console

  $ cmake -DPREFER_EXTERNAL_LZ4=ON ..

You can also disable support for some compression libraries:

.. code-block:: console

  $ cmake -DDEACTIVATE_SNAPPY=ON ..

Mac OSX troubleshooting
~~~~~~~~~~~~~~~~~~~~~~~

If you run into compilation troubles when using Mac OSX, please make
sure that you have installed the command line developer tools. You
can always install them with:

.. code-block:: console

  $ xcode-select --install

Wrapper for Python
==================

Blosc has an official wrapper for Python. See:

https://github.com/Blosc/python-blosc

Command line interface and serialization format for Blosc
=========================================================

Blosc can be used from the command line by using Bloscpack. See:

https://github.com/Blosc/bloscpack

Filter for HDF5
===============

For those who want to use Blosc as a filter in the HDF5 library,
there is a sample implementation in the blosc/hdf5 project at:

https://github.com/Blosc/hdf5

Mailing list
============

There is an official mailing list for Blosc at:

[email protected]
http://groups.google.es/group/blosc

Acknowledgments
===============

See THANKS.rst.


----

  **Enjoy data!**