Changeset 981e22c for thirdparty/blosc/README.rst
- Timestamp:
- 08/26/16 19:35:26 (8 years ago)
- Branches:
- master, pympi
- Children:
- 8ebc79b
- Parents:
- cda87e9
- git-author:
- Hal Finkel <hfinkel@…> (08/26/16 19:35:26)
- git-committer:
- Hal Finkel <hfinkel@…> (08/26/16 19:35:26)
- File:
- 1 edited
thirdparty/blosc/README.rst
--- thirdparty/blosc/README.rst (r00587dc)
+++ thirdparty/blosc/README.rst (r981e22c)
@@ -4,6 +4,20 @@
 
 :Author: Francesc Alted
-:Contact: f [email protected]
+:Contact: f[email protected]
 :URL: http://www.blosc.org
+:Gitter: |gitter|
+:Travis CI: |travis|
+:Appveyor: |appveyor|
+
+.. |gitter| image:: https://badges.gitter.im/Blosc/c-blosc.svg
+        :alt: Join the chat at https://gitter.im/Blosc/c-blosc
+        :target: https://gitter.im/Blosc/c-blosc?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
+
+.. |travis| image:: https://travis-ci.org/Blosc/c-blosc.svg?branch=master
+        :target: https://travis-ci.org/Blosc/c-blosc
+
+.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/3mlyjc1ak0lbkmte?svg=true
+        :target: https://ci.appveyor.com/project/FrancescAlted/c-blosc/branch/master
+
 
 What is it?
@@ -18,12 +32,23 @@
 
 It uses the blocking technique (as described in [2]_) to reduce
-activity on the memory bus as much as possible.
+activity on the memory bus as much as possible. In short, this
 technique works by dividing datasets in blocks that are small enough
 to fit in caches of modern processors and perform compression /
 decompression there. It also leverages, if available, SIMD
-instructions (SSE2) and multi-threading capabilities of CPUs, in order
-to accelerate the compression / decompression process to a maximum.
-
-You can see some recent benchmarks about Blosc performance in [3]_
+instructions (SSE2, AVX2) and multi-threading capabilities of CPUs, in
+order to accelerate the compression / decompression process to a
+maximum.
+
+Blosc is actually a metacompressor, that meaning that it can use a range
+of compression libraries for performing the actual
+compression/decompression. Right now, it comes with integrated support
+for BloscLZ (the original one), LZ4, LZ4HC, Snappy, Zlib and Zstd. Blosc
+comes with full sources for all compressors, so in case it does not find
+the libraries installed in your system, it will compile from the
+included sources and they will be integrated into the Blosc library
+anyway. That means that you can trust in having all supported
+compressors integrated in Blosc in all supported platforms.
+
+You can see some benchmarks about Blosc performance in [3]_
 
 Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for
@@ -32,25 +57,30 @@
 .. [1] http://www.blosc.org
 .. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf
-.. [3] http://blosc.org/trac/wiki/SyntheticBenchmarks
+.. [3] http://blosc.org/synthetic-benchmarks.html
 
 Meta-compression and other advantages over existing compressors
 ===============================================================
 
-Blosc is not like other compressors: it should rather be called a
+C-Blosc is not like other compressors: it should rather be called a
 meta-compressor. This is so because it can use different compressors
-and pre-conditioners (programs that generally improve compression
-ratio). At any rate, it can also be called a compressor because it
-happens that it already integrates one compressor and one
-pre-conditioner, so it can actually work like so.
-
-Currently it uses BloscLZ, a compressor heavily based on FastLZ
-(http://fastlz.org/), and a highly optimized (it can use SSE2
-instructions, if available) Shuffle pre-conditioner. However,
-different compressors or pre-conditioners may be added in the future.
-
-Blosc is in charge of coordinating the compressor and pre-conditioners
-so that they can leverage the blocking technique (described above) as
-well as multi-threaded execution (if several cores are available)
-automatically. That makes that every compressor and pre-conditioner
+and filters (programs that generally improve compression ratio). At
+any rate, it can also be called a compressor because it happens that
+it already comes with several compressor and filters, so it can
+actually work like so.
+
+Currently C-Blosc comes with support of BloscLZ, a compressor heavily
+based on FastLZ (http://fastlz.org/), LZ4 and LZ4HC
+(https://github.com/Cyan4973/lz4), Snappy
+(https://github.com/google/snappy) and Zlib (http://www.zlib.net/), as
+well as a highly optimized (it can use SSE2 or AVX2 instructions, if
+available) shuffle and bitshuffle filters (for info on how and why
+shuffling works, see slide 17 of
+http://www.slideshare.net/PyData/blosc-py-data-2014). However,
+different compressors or filters may be added in the future.
+
+C-Blosc is in charge of coordinating the different compressor and
+filters so that they can leverage the blocking technique (described
+above) as well as multi-threaded execution (if several cores are
+available) automatically. That makes that every compressor and filter
 will work at very high speeds, even if it was not initially designed
 for doing blocking or multi-threading.
@@ -60,39 +90,36 @@
   * Meant for binary data: can take advantage of the type size
     meta-information for improved compression ratio (using the
-    integrated shuffle pre-conditioner).
-
-  * Small overhead on non-compressible data: only a maximum of 16
-    additional bytes over the source buffer length are needed to
-    compress *every* input.
-
-  * Maximum destination length: contrarily to many other
-    compressors, both compression and decompression routines have
-    support for maximum size lengths for the destination buffer.
-
-  * Replacement for memcpy(): it supports a 0 compression level that
-    does not compress at all and only adds 16 bytes of overhead. In
-    this mode Blosc can copy memory usually faster than a plain
-    memcpy().
+    integrated shuffle and bitshuffle filters).
+
+  * Small overhead on non-compressible data: only a maximum of (16 + 4 *
+    nthreads) additional bytes over the source buffer length are needed
+    to compress *any kind of input*.
+
+  * Maximum destination length: contrarily to many other compressors,
+    both compression and decompression routines have support for maximum
+    size lengths for the destination buffer.
 
 When taken together, all these features set Blosc apart from other
 similar solutions.
 
-Compiling your application with Blosc
-=====================================
-
-Blosc consists of the next files (in blosc/ directory)::
-
-    blosc.h and blosc.c -- the main routines
-    blosclz.h and blosclz.c -- the actual compressor
-    shuffle.h and shuffle.c -- the shuffle code
+Compiling your application with a minimalistic Blosc
+====================================================
+
+The minimal Blosc consists of the next files (in `blosc/ directory
+<https://github.com/Blosc/c-blosc/tree/master/blosc>`_)::
+
+    blosc.h and blosc.c -- the main routines
+    shuffle*.h and shuffle*.c -- the shuffle code
+    blosclz.h and blosclz.c -- the blosclz compressor
 
 Just add these files to your project in order to use Blosc. For
-information on compression and decompression routines, see blosc.h.
-
-To compile using GCC (4.4 or higher recommended) on Unix:
-
-.. code-block:: console
-
-  $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -lpthread
+information on compression and decompression routines, see `blosc.h
+<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_.
+
+To compile using GCC (4.9 or higher recommended) on Unix:
+
+.. code-block:: console
+
+  $ gcc -O3 -mavx2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread
 
 Using Windows and MINGW:
@@ -100,43 +127,65 @@
 .. code-block:: console
 
-  $ gcc -O3 -msse2 -o myprog myprog.c blosc\*.c
-
-Using Windows and MSVC (2008 or higher recommended):
-
-.. code-block:: console
-
-  $ cl /Ox /Femyprog.exe myprog.c blosc\*.c
-
-A simple usage example is the benchmark in the bench/bench.c file.
-Also, another example for using Blosc as a generic HDF5 filter is in
-the hdf5/ directory.
-
-I have not tried to compile this with compilers other than GCC, MINGW,
-Intel ICC or MSVC yet. Please report your experiences with your own
-platforms.
-
-Testing Blosc
-=============
-
-Go to the test/ directory and issue:
-
-.. code-block:: console
-
-  $ make test
-
-These tests are very basic, and only valid for platforms where GNU
-make/gcc tools are available. If you really want to test Blosc the
-hard way, look at:
-
-http://blosc.org/trac/wiki/SyntheticBenchmarks
-
-where instructions on how to intensively test (and benchmark) Blosc
-are given. If while running these tests you get some error, please
-report it back!
+  $ gcc -O3 -mavx2 -o myprog myprog.c -Iblosc blosc\*.c
+
+Using Windows and MSVC (2013 or higher recommended):
+
+.. code-block:: console
+
+  $ cl /Ox /Femyprog.exe /Iblosc myprog.c blosc\*.c
+
+In the `examples/ directory
+<https://github.com/Blosc/c-blosc/tree/master/examples>`_ you can find
+more hints on how to link your app with Blosc.
+
+I have not tried to compile this with compilers other than GCC, clang,
+MINGW, Intel ICC or MSVC yet. Please report your experiences with your
+own platforms.
+
+Adding support for other compressors with a minimalistic Blosc
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The official cmake files (see below) for Blosc try hard to include
+support for LZ4, LZ4HC, Snappy, Zlib inside the Blosc library, so
+using them is just a matter of calling the appropriate
+`blosc_set_compressor() API call
+<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_. See
+an `example here
+<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.
+
+Having said this, it is also easy to use a minimalistic Blosc and just
+add the symbols HAVE_LZ4 (will include both LZ4 and LZ4HC),
+HAVE_SNAPPY and HAVE_ZLIB during compilation as well as the
+appropriate libraries. For example, for compiling with minimalistic
+Blosc but with added Zlib support do:
+
+.. code-block:: console
+
+  $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread -DHAVE_ZLIB -lz
+
+In the `bench/ directory
+<https://github.com/Blosc/c-blosc/tree/master/bench>`_ there a couple
+of Makefile files (one for UNIX and the other for MinGW) with more
+complete building examples, like switching between libraries or
+internal sources for the compressors.
+
+Supported platforms
+~~~~~~~~~~~~~~~~~~~
+
+Blosc is meant to support all platforms where a C89 compliant C
+compiler can be found. The ones that are mostly tested are Intel
+(Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones as IBM
+Blue Gene Q embedded "A2" processor are reported to work too.
 
 Compiling the Blosc library with CMake
 ======================================
 
-Blosc can also be built, tested and installed using CMake_.
+Blosc can also be built, tested and installed using CMake_. Although
+this procedure might seem a bit more involved than the one described
+above, it is the most general because it allows to integrate other
+compressors than BloscLZ either from libraries or from internal
+sources. Hence, serious library developers are encouraged to use this
+way.
+
 The following procedure describes the "out of source" build.
 
@@ -148,18 +197,20 @@
   $ cd build
 
-Configure Blosc in release mode (enable optimizations) specifying the
-installation directory:
-
-.. code-block:: console
-
-  $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=INSTALL_DIR \
-      PATH_TO_BLOSC_SOURCE_DIR
-
-Please note that configuration can also be performed using UI tools
-provided by CMake_ (ccmake or cmake-gui):
-
-.. code-block:: console
-
-  $ cmake-gui PATH_TO_BLOSC_SOURCE_DIR
+Now run CMake configuration and optionally specify the installation
+directory (e.g. '/usr' or '/usr/local'):
+
+.. code-block:: console
+
+  $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..
+
+CMake allows to configure Blosc in many different ways, like prefering
+internal or external sources for compressors or enabling/disabling
+them. Please note that configuration can also be performed using UI
+tools provided by CMake_ (ccmake or cmake-gui):
+
+.. code-block:: console
+
+  $ ccmake .. # run a curses-based interface
+  $ cmake-gui .. # run a graphical interface
 
 Build, test and install Blosc:
@@ -167,12 +218,64 @@
 .. code-block:: console
 
-  $ make
-  $ make test
-  $ make install
+  $ cmake --build .
+  $ ctest
+  $ cmake --build . --target install
 
 The static and dynamic version of the Blosc library, together with
-header files, will be installed into the specified INSTALL_DIR.
+header files, will be installed into the specified
+CMAKE_INSTALL_PREFIX.
 
 .. _CMake: http://www.cmake.org
+
+Once you have compiled your Blosc library, you can easily link your
+apps with it as shown in the `example/ directory
+<https://github.com/Blosc/c-blosc/blob/master/examples>`_.
+
+Adding support for other compressors (LZ4, LZ4HC, Snappy, Zlib) with CMake
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The CMake files in Blosc are configured to automatically detect other
+compressors like LZ4, LZ4HC, Snappy or Zlib by default. So as long as
+the libraries and the header files for these libraries are accessible,
+these will be used by default. See an `example here
+<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.
+
+*Note on Zlib*: the library should be easily found on UNIX systems,
+although on Windows, you can help CMake to find it by setting the
+environment variable 'ZLIB_ROOT' to where zlib 'include' and 'lib'
+directories are. Also, make sure that Zlib DDL library is in your
+'\Windows' directory.
+
+However, the full sources for LZ4, LZ4HC, Snappy and Zlib have been
+included in Blosc too. So, in general, you should not worry about not
+having (or CMake not finding) the libraries in your system because in
+this case, their sources will be automatically compiled for you. That
+effectively means that you can be confident in having a complete
+support for all the supported compression libraries in all supported
+platforms.
+
+If you want to force Blosc to use external libraries instead of
+the included compression sources:
+
+.. code-block:: console
+
+  $ cmake -DPREFER_EXTERNAL_LZ4=ON ..
+
+You can also disable support for some compression libraries:
+
+.. code-block:: console
+
+  $ cmake -DDEACTIVATE_SNAPPY=ON ..
+
+Mac OSX troubleshooting
+~~~~~~~~~~~~~~~~~~~~~~~
+
+If you run into compilation troubles when using Mac OSX, please make
+sure that you have installed the command line developer tools. You
+can always install them with:
+
+.. code-block:: console
+
+  $ xcode-select --install
 
 Wrapper for Python
@@ -181,11 +284,20 @@
 Blosc has an official wrapper for Python. See:
 
-https://github.com/FrancescAlted/python-blosc
+https://github.com/Blosc/python-blosc
+
+Command line interface and serialization format for Blosc
+=========================================================
+
+Blosc can be used from command line by using Bloscpack. See:
+
+https://github.com/Blosc/bloscpack
 
 Filter for HDF5
 ===============
 
-For those that want to use Blosc as a filter in the HDF5 library,
-there is a sample implementation in the hdf5/ directory.
+For those who want to use Blosc as a filter in the HDF5 library,
+there is a sample implementation in the blosc/hdf5 project in:
+
+https://github.com/Blosc/hdf5
 
 Mailing list
@@ -200,19 +312,5 @@
 ===============
 
-I'd like to thank the PyTables community that have collaborated in the
-exhaustive testing of Blosc. With an aggregate amount of more than 300 TB of
-different datasets compressed *and* decompressed successfully, I can say that
-Blosc is pretty safe now and ready for production purposes.
-
-Other important contributions:
-
-* Thibault North contributed a way to call Blosc from different threads in a
-  safe way.
-
-* The cmake support was a contribution of Thibault North, Antonio Valentino
-  and Mark Wiebe.
-
-* Valentin Haenel did a terrific work fixing typos and improving docs and the
-  plotting script.
+See THANKS.rst.
 
 
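
The README text in this changeset mentions the shuffle filter and the use of type-size meta-information, but does not show what shuffling does to a buffer. The following is a minimal sketch in plain C, not the SSE2/AVX2-optimized code in blosc/shuffle*.c and not part of the Blosc API: the names byte_shuffle, byte_unshuffle and shuffle_roundtrip_ok are hypothetical and exist only for this illustration. It assumes the usual byte-shuffle layout, in which byte j of every element is gathered into one contiguous run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Blosc function): gather byte j of every
 * element into one contiguous run, so that similar bytes end up adjacent. */
void byte_shuffle(size_t typesize, size_t nelems,
                  const unsigned char *src, unsigned char *dest)
{
    size_t i, j;
    assert(typesize > 0);
    for (i = 0; i < nelems; i++)
        for (j = 0; j < typesize; j++)
            dest[j * nelems + i] = src[i * typesize + j];
}

/* Hypothetical inverse of byte_shuffle: restore the element-wise layout. */
void byte_unshuffle(size_t typesize, size_t nelems,
                    const unsigned char *src, unsigned char *dest)
{
    size_t i, j;
    assert(typesize > 0);
    for (i = 0; i < nelems; i++)
        for (j = 0; j < typesize; j++)
            dest[i * typesize + j] = src[j * nelems + i];
}

/* Returns 1 when shuffling and then unshuffling restores the input. */
int shuffle_roundtrip_ok(void)
{
    /* Four 4-byte elements, written byte by byte so the demo does not
     * depend on the endianness of the host. */
    const unsigned char in[16] = {
        1, 0, 0, 0,   2, 0, 0, 0,   3, 0, 0, 0,   4, 0, 0, 0
    };
    unsigned char shuffled[16];
    unsigned char out[16];

    byte_shuffle(4, 4, in, shuffled);
    /* shuffled now holds 1 2 3 4 followed by twelve zero bytes. */
    byte_unshuffle(4, 4, shuffled, out);
    return memcmp(in, out, sizeof in) == 0;
}
```

After shuffling, the twelve zero bytes of the sample buffer sit in one run, which is the kind of regularity a general-purpose compressor can exploit; this is the intuition behind the "type size meta-information" bullet in the README.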