===============================================================
 Blosc: A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted
:Contact: [email protected]
:URL: http://www.blosc.org
:Gitter: |gitter|
:Travis CI: |travis|
:Appveyor: |appveyor|

.. |gitter| image:: https://badges.gitter.im/Blosc/c-blosc.svg
        :alt: Join the chat at https://gitter.im/Blosc/c-blosc
        :target: https://gitter.im/Blosc/c-blosc?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |travis| image:: https://travis-ci.org/Blosc/c-blosc.svg?branch=master
        :target: https://travis-ci.org/Blosc/c-blosc

.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/3mlyjc1ak0lbkmte?svg=true
        :target: https://ci.appveyor.com/project/FrancescAlted/c-blosc/branch/master


What is it?
===========

Blosc [1]_ is a high-performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call.  Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible.  In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression
and decompression there.  It also leverages, if available, SIMD
instructions (SSE2, AVX2) and the multi-threading capabilities of CPUs
to make compression and decompression as fast as possible.
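
The blocking idea can be pictured with a small C sketch.  It is not
Blosc's implementation: ``process_in_blocks``, ``block_fn`` and
``remember_len`` are hypothetical names, and the block size used in the
usage note is an arbitrary value rather than the one Blosc selects.  The
buffer is walked in fixed-size blocks (the last one holds the leftover
bytes), so each call of the per-block routine only touches a slice of
memory small enough to stay in cache.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical callback type: called once per block with its start and
      length; `ctx` carries caller state. */
   typedef void (*block_fn)(const unsigned char *block, size_t len, void *ctx);

   /* Walk `src` in blocks of at most `blocksize` bytes (the last block holds
      whatever is left over) and return the number of blocks visited. */
   static size_t process_in_blocks(const unsigned char *src, size_t nbytes,
                                   size_t blocksize, block_fn fn, void *ctx)
   {
       size_t offset = 0;
       size_t nblocks = 0;

       assert(blocksize > 0);
       while (offset < nbytes) {
           size_t len = nbytes - offset;
           if (len > blocksize)
               len = blocksize;
           fn(src + offset, len, ctx);  /* each call sees one small block */
           offset += len;
           nblocks++;
       }
       return nblocks;
   }

   /* Example callback: remember the length of the most recent block. */
   static void remember_len(const unsigned char *block, size_t len, void *ctx)
   {
       (void)block;
       *(size_t *)ctx = len;
   }

For a 100000-byte buffer and 32768-byte blocks, the callback runs four
times and the last block is 1696 bytes long (100000 - 3 * 32768).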

Blosc is actually a metacompressor, which means that it can use a
range of compression libraries to perform the actual compression and
decompression.  Right now, it comes with integrated support for
BloscLZ (the original one), LZ4, LZ4HC, Snappy, Zlib and Zstd.  Blosc
comes with full sources for all compressors, so if it does not find
the libraries installed on your system, it will compile them from the
included sources and integrate them into the Blosc library anyway.
That means that you can count on having all supported compressors
integrated in Blosc on all supported platforms.

You can see some benchmarks of Blosc performance in [3]_.

Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for
details.

.. [1] http://www.blosc.org
.. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf
.. [3] http://blosc.org/synthetic-benchmarks.html

Meta-compression and other advantages over existing compressors
===============================================================

C-Blosc is not like other compressors: it should rather be called a
meta-compressor.  This is because it can use different compressors
and filters (programs that generally improve the compression ratio).
That said, it can also be called a compressor, because it already
ships with several compressors and filters and can therefore work as
one on its own.

Currently C-Blosc comes with support for BloscLZ, a compressor heavily
based on FastLZ (http://fastlz.org/), LZ4 and LZ4HC
(https://github.com/Cyan4973/lz4), Snappy
(https://github.com/google/snappy) and Zlib (http://www.zlib.net/), as
well as highly optimized shuffle and bitshuffle filters (which can use
SSE2 or AVX2 instructions, if available).  For information on how and
why shuffling works, see slide 17 of
http://www.slideshare.net/PyData/blosc-py-data-2014.  More compressors
or filters may be added in the future.
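
To make the shuffle idea concrete, the following C sketch implements a
plain byte shuffle and its inverse.  It is a scalar illustration only,
not the SSE2/AVX2 code shipped with C-Blosc, and ``byte_shuffle`` and
``byte_unshuffle`` are hypothetical names.  Byte ``j`` of each
``typesize``-byte element is moved into the ``j``-th run of the output,
so bytes that tend to be similar (for example, the high-order bytes of
small integers) end up next to each other, which usually helps the
compressor that runs afterwards.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Byte shuffle: for `nelems` elements of `typesize` bytes each, write
      byte j of element i to position j * nelems + i of `dest`. */
   static void byte_shuffle(size_t typesize, size_t nelems,
                            const unsigned char *src, unsigned char *dest)
   {
       size_t i, j;

       assert(src != dest);  /* the transform cannot run in place */
       for (i = 0; i < nelems; i++)
           for (j = 0; j < typesize; j++)
               dest[j * nelems + i] = src[i * typesize + j];
   }

   /* Inverse transform: gather the bytes back into their elements. */
   static void byte_unshuffle(size_t typesize, size_t nelems,
                              const unsigned char *src, unsigned char *dest)
   {
       size_t i, j;

       assert(src != dest);
       for (i = 0; i < nelems; i++)
           for (j = 0; j < typesize; j++)
               dest[i * typesize + j] = src[j * nelems + i];
   }

For two 4-byte little-endian integers with values 1 and 2 (bytes
``01 00 00 00`` and ``02 00 00 00``), the shuffled buffer is
``01 02 00 00 00 00 00 00``: the six zero bytes now form a single run.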

C-Blosc is in charge of coordinating the different compressors and
filters so that they can leverage the blocking technique (described
above) as well as multi-threaded execution (if several cores are
available) automatically.  This lets every compressor and filter work
at very high speeds, even if it was not initially designed for
blocking or multi-threading.
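
One simple way to hand blocks to several threads is a static
partition, where each thread receives a contiguous range of block
indices.  The C sketch below only computes those ranges and starts no
threads; ``thread_block_range`` is a hypothetical helper, and this is
one possible scheme under that assumption, not a description of exactly
how C-Blosc schedules its work.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical helper: compute the half-open range [*start, *stop) of
      block indices that thread `tid` handles when `nblocks` blocks are
      shared by `nthreads` threads.  Every thread is assigned
      ceil(nblocks / nthreads) consecutive blocks; the last threads may
      get fewer blocks, or none. */
   static void thread_block_range(size_t nblocks, size_t nthreads, size_t tid,
                                  size_t *start, size_t *stop)
   {
       size_t per_thread;

       assert(nthreads > 0 && tid < nthreads);
       per_thread = (nblocks + nthreads - 1) / nthreads;
       *start = tid * per_thread;
       *stop = *start + per_thread;
       if (*start > nblocks)
           *start = nblocks;  /* no work left for this thread */
       if (*stop > nblocks)
           *stop = nblocks;
   }

With 10 blocks and 4 threads, threads 0 to 2 get three blocks each and
thread 3 gets the single remaining block.  Each thread could then work
through its own range, assuming, as in the blocking scheme described
above, that blocks can be processed independently of one another.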

Other advantages of Blosc are:

* Meant for binary data: it can take advantage of the type size
  meta-information for an improved compression ratio (using the
  integrated shuffle and bitshuffle filters).

* Small overhead on non-compressible data: at most
  (16 + 4 * nthreads) additional bytes over the source buffer length
  are needed to compress *any kind of input*.

* Maximum destination length: unlike many other compressors, both the
  compression and decompression routines support a maximum size for
  the destination buffer.
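
The overhead bound from the list above translates directly into a safe
size for the destination buffer.  The C sketch below just encodes that
formula, the input length plus 16 + 4 * nthreads bytes;
``max_compressed_size`` is a hypothetical helper, and real code should
prefer the overhead constant that ``blosc.h`` provides for the Blosc
version in use (``BLOSC_MAX_OVERHEAD``).

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical helper: a destination capacity that is always large
      enough according to the bound quoted in this README (input length
      plus 16 + 4 * nthreads bytes).  Returns 0 if the sum would not fit
      in a size_t. */
   static size_t max_compressed_size(size_t nbytes, size_t nthreads)
   {
       size_t overhead;

       assert(nthreads > 0);
       overhead = 16 + 4 * nthreads;
       if (nbytes > SIZE_MAX - overhead)
           return 0;  /* would overflow */
       return nbytes + overhead;
   }

For example, ``max_compressed_size(1000, 4)`` returns 1032, so a
1000-byte input compressed with 4 threads fits in a 1032-byte
destination buffer.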

When taken together, all these features set Blosc apart from other
similar solutions.

Compiling your application with a minimalistic Blosc
====================================================

The minimal Blosc consists of the following files (in the `blosc/ directory
<https://github.com/Blosc/c-blosc/tree/master/blosc>`_)::

    blosc.h and blosc.c        -- the main routines
    shuffle*.h and shuffle*.c  -- the shuffle code
    blosclz.h and blosclz.c    -- the blosclz compressor

Just add these files to your project in order to use Blosc.  For
information on the compression and decompression routines, see `blosc.h
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_.

To compile using GCC (4.9 or higher recommended) on Unix:

.. code-block:: console

   $ gcc -O3 -mavx2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread

Using Windows and MINGW:

.. code-block:: console

   $ gcc -O3 -mavx2 -o myprog myprog.c -Iblosc blosc\*.c

Using Windows and MSVC (2013 or higher recommended):

.. code-block:: console

  $ cl /Ox /Femyprog.exe /Iblosc myprog.c blosc\*.c

In the `examples/ directory
<https://github.com/Blosc/c-blosc/tree/master/examples>`_ you can find
more hints on how to link your app with Blosc.

I have not tried to compile this with compilers other than GCC, clang,
MINGW, Intel ICC or MSVC yet.  Please report your experiences with your
own platforms.

Adding support for other compressors with a minimalistic Blosc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The official CMake files (see below) for Blosc try hard to include
support for LZ4, LZ4HC, Snappy and Zlib inside the Blosc library, so
using them is just a matter of calling the appropriate
`blosc_set_compressor() API call
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_.  See
an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

Having said this, it is also easy to use a minimalistic Blosc and just
define the symbols HAVE_LZ4 (which enables both LZ4 and LZ4HC),
HAVE_SNAPPY and HAVE_ZLIB during compilation, as well as link the
appropriate libraries.  For example, to compile a minimalistic Blosc
with added Zlib support:

.. code-block:: console

   $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread -DHAVE_ZLIB -lz
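
The ``HAVE_*`` symbols are ordinary preprocessor switches.  The C
sketch below shows the general pattern: a compressor table whose
entries only exist when the matching symbol is defined, plus a lookup
by name that returns -1 for unknown or unavailable codecs, loosely
mirroring how ``blosc_set_compressor()`` reports failure.  The table,
``find_codec`` and the numeric codes are hypothetical and do not
reproduce Blosc's internal tables.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <string.h>

   /* Hypothetical table entry: a compressor name and its numeric code. */
   struct codec_entry {
       const char *name;
       int code;
   };

   /* Entries exist only when the matching HAVE_* symbol is defined. */
   static const struct codec_entry codecs[] = {
       {"blosclz", 0},  /* always built in */
   #ifdef HAVE_LZ4
       {"lz4", 1},
       {"lz4hc", 2},    /* HAVE_LZ4 enables both LZ4 and LZ4HC */
   #endif
   #ifdef HAVE_SNAPPY
       {"snappy", 3},
   #endif
   #ifdef HAVE_ZLIB
       {"zlib", 4},
   #endif
   };

   /* Return the code for `name`, or -1 if it is unknown or not built in. */
   static int find_codec(const char *name)
   {
       size_t i;

       assert(name != NULL);
       for (i = 0; i < sizeof codecs / sizeof codecs[0]; i++)
           if (strcmp(codecs[i].name, name) == 0)
               return codecs[i].code;
       return -1;
   }

Compiled with ``-DHAVE_ZLIB``, as in the command above,
``find_codec("zlib")`` returns 4; without that flag it returns -1.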

In the `bench/ directory
<https://github.com/Blosc/c-blosc/tree/master/bench>`_ there are a
couple of Makefiles (one for UNIX and the other for MinGW) with more
complete build examples, such as switching between libraries or
internal sources for the compressors.

Supported platforms
~~~~~~~~~~~~~~~~~~~

Blosc is meant to support all platforms where a C89-compliant C
compiler can be found.  The most tested ones are Intel (Linux, Mac OSX
and Windows) and ARM (Linux), but exotic ones, such as the IBM Blue
Gene/Q embedded "A2" processor, are reported to work too.

Compiling the Blosc library with CMake
======================================

Blosc can also be built, tested and installed using CMake_.  Although
this procedure might seem a bit more involved than the one described
above, it is the most general because it allows integrating
compressors other than BloscLZ, either from libraries or from internal
sources.  Hence, serious library developers are encouraged to use this
method.

The following procedure describes the "out of source" build.

Create the build directory and move into it:

.. code-block:: console

  $ mkdir build
  $ cd build

Now run the CMake configuration and optionally specify the installation
directory (e.g. '/usr' or '/usr/local'):

.. code-block:: console

  $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake lets you configure Blosc in many different ways, such as
preferring internal or external sources for compressors or
enabling/disabling them.  Please note that configuration can also be
performed using the UI tools provided by CMake_ (ccmake or cmake-gui):

.. code-block:: console

  $ ccmake ..      # run a curses-based interface
  $ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

.. code-block:: console

  $ cmake --build .
  $ ctest
  $ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with
the header files, will be installed into the specified
CMAKE_INSTALL_PREFIX.

.. _CMake: http://www.cmake.org

Once you have compiled your Blosc library, you can easily link your
apps with it as shown in the `examples/ directory
<https://github.com/Blosc/c-blosc/blob/master/examples>`_.

Adding support for other compressors (LZ4, LZ4HC, Snappy, Zlib) with CMake
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The CMake files in Blosc are configured to automatically detect other
compressors like LZ4, LZ4HC, Snappy or Zlib.  So, as long as the
libraries and their header files are accessible, they will be used by
default.  See an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

*Note on Zlib*: the library should be easily found on UNIX systems.
On Windows, you can help CMake find it by setting the environment
variable 'ZLIB_ROOT' to the directory containing the zlib 'include'
and 'lib' directories.  Also, make sure that the Zlib DLL is in your
``\Windows`` directory.

However, the full sources for LZ4, LZ4HC, Snappy and Zlib have been
included in Blosc too.  So, in general, you should not worry about not
having (or CMake not finding) the libraries on your system, because in
that case their sources will be automatically compiled for you.  That
effectively means that you can be confident of having complete
support for all the supported compression libraries on all supported
platforms.

If you want to force Blosc to use external libraries instead of
the included compression sources:

.. code-block:: console

  $ cmake -DPREFER_EXTERNAL_LZ4=ON ..

You can also disable support for some compression libraries:

.. code-block:: console

  $ cmake -DDEACTIVATE_SNAPPY=ON ..

Mac OSX troubleshooting
~~~~~~~~~~~~~~~~~~~~~~~

If you run into compilation troubles when using Mac OSX, please make
sure that you have installed the command line developer tools.  You
can always install them with:

.. code-block:: console

  $ xcode-select --install

Wrapper for Python
==================

Blosc has an official wrapper for Python.  See:

https://github.com/Blosc/python-blosc

Command line interface and serialization format for Blosc
=========================================================

Blosc can be used from the command line by using Bloscpack.  See:

https://github.com/Blosc/bloscpack

Filter for HDF5
===============

For those who want to use Blosc as a filter in the HDF5 library,
there is a sample implementation in the blosc/hdf5 project at:

https://github.com/Blosc/hdf5

Mailing list
============

There is an official mailing list for Blosc at:

[email protected]
http://groups.google.es/group/blosc

Acknowledgments
===============

See THANKS.rst.


----

  **Enjoy data!**