source: thirdparty/blosc/README.rst @ 00587dc

Revision 00587dc, 6.9 KB checked in by Hal Finkel <hfinkel@…>, 10 years ago

Initial Commit (gio-base-20150317)

===============================================================
 Blosc: A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted
:Contact: [email protected]
:URL: http://www.blosc.org

What is it?
===========

Blosc [1]_ is a high-performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call.  Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible.  In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there.  It also leverages, if available, SIMD
instructions (SSE2) and the multi-threading capabilities of CPUs, in
order to accelerate the compression / decompression process as much as
possible.
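
As a rough illustration of the blocking idea (not Blosc's actual code),
the sketch below walks a buffer in fixed-size blocks and hands each one
to a callback.  The names ``for_each_block``, ``block_fn`` and
``count_bytes`` are hypothetical, and the block size is left to the
caller; how Blosc itself picks its block sizes is not covered here.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical per-block callback: processes `len` bytes at `block`. */
   typedef void (*block_fn)(const unsigned char *block, size_t len, void *ctx);

   /* Walk `nbytes` of `src` in chunks of at most `blocksize` bytes and
      return the number of blocks visited. */
   size_t for_each_block(const unsigned char *src, size_t nbytes,
                         size_t blocksize, block_fn fn, void *ctx)
   {
       size_t nblocks = 0;
       if (blocksize == 0)
           return 0;                 /* nothing sensible to do */
       for (size_t off = 0; off < nbytes; off += blocksize) {
           /* The last block may be shorter than `blocksize`. */
           size_t len = nbytes - off < blocksize ? nbytes - off : blocksize;
           fn(src + off, len, ctx);
           nblocks++;
       }
       return nblocks;
   }

   /* Example callback: accumulate the total number of bytes seen. */
   void count_bytes(const unsigned char *block, size_t len, void *ctx)
   {
       (void)block;
       *(size_t *)ctx += len;
   }

With a block size chosen to fit in a CPU cache, each callback invocation
only touches data that can stay cache-resident while it is processed,
for example ``for_each_block(buf, nbytes, 32 * 1024, count_bytes, &total)``.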

You can see some recent benchmarks of Blosc performance in [3]_.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for
details.

.. [1] http://www.blosc.org
.. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf
.. [3] http://blosc.org/trac/wiki/SyntheticBenchmarks

Meta-compression and other advantages over existing compressors
===============================================================

Blosc is not like other compressors: it should rather be called a
meta-compressor.  This is because it can use different compressors
and pre-conditioners (programs that generally improve compression
ratio).  At any rate, it can also be called a compressor because it
already integrates one compressor and one pre-conditioner, so it can
work as a compressor on its own.

Currently it uses BloscLZ, a compressor heavily based on FastLZ
(http://fastlz.org/), and a highly optimized Shuffle pre-conditioner
(it can use SSE2 instructions, if available). However, different
compressors or pre-conditioners may be added in the future.
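
To give an idea of what a shuffle pre-conditioner does, here is a
minimal scalar sketch, assuming the usual byte-transposition scheme: the
first byte of every element is stored first, then the second byte of
every element, and so on.  The function names are hypothetical and this
is not the optimized code in shuffle.c; the sketch also assumes that the
buffer length is a multiple of the type size.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Byte-shuffle: gather byte 0 of every element, then byte 1, etc.
      `typesize` is the element width in bytes; `nbytes` is assumed to be
      a multiple of `typesize` to keep the sketch short. */
   void shuffle_sketch(size_t typesize, size_t nbytes,
                       const uint8_t *src, uint8_t *dest)
   {
       size_t nelems = nbytes / typesize;
       for (size_t j = 0; j < typesize; j++)
           for (size_t i = 0; i < nelems; i++)
               dest[j * nelems + i] = src[i * typesize + j];
   }

   /* Inverse transform: scatter the byte planes back into elements. */
   void unshuffle_sketch(size_t typesize, size_t nbytes,
                         const uint8_t *src, uint8_t *dest)
   {
       size_t nelems = nbytes / typesize;
       for (size_t i = 0; i < nelems; i++)
           for (size_t j = 0; j < typesize; j++)
               dest[i * typesize + j] = src[j * nelems + i];
   }

For an array of small 4-byte integers, most high-order bytes are zero,
so after shuffling they end up in long runs of identical bytes, which a
compressor such as BloscLZ can typically exploit.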

Blosc is in charge of coordinating the compressor and pre-conditioners
so that they can leverage the blocking technique (described above) as
well as multi-threaded execution (if several cores are available)
automatically. As a result, every compressor and pre-conditioner can
work at very high speeds, even if it was not initially designed for
blocking or multi-threading.

Other advantages of Blosc are:

* Meant for binary data: can take advantage of the type size
  meta-information for improved compression ratio (using the
  integrated shuffle pre-conditioner).

* Small overhead on non-compressible data: at most 16 additional
  bytes over the source buffer length are needed to compress
  *every* input.

* Maximum destination length: unlike many other compressors, both
  the compression and decompression routines accept a maximum size
  for the destination buffer.

* Replacement for memcpy(): it supports a 0 compression level that
  does not compress at all and only adds 16 bytes of overhead. In
  this mode Blosc can usually copy memory faster than a plain
  memcpy().

When taken together, all these features set Blosc apart from other
similar solutions.
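
The 16-byte bound mentioned in the list above suggests a simple way to
size a destination buffer that is large enough for any input.  The
helper below is a sketch under that assumption;
``SKETCH_MAX_OVERHEAD`` and ``max_compressed_size`` are hypothetical
names, so check blosc.h for the constants and routines the library
itself provides.

.. code-block:: c

   #include <stddef.h>

   /* Assumption: the 16-byte maximum overhead stated in this README. */
   #define SKETCH_MAX_OVERHEAD 16

   /* Worst-case compressed size for an input of `nbytes` bytes. */
   size_t max_compressed_size(size_t nbytes)
   {
       return nbytes + SKETCH_MAX_OVERHEAD;
   }

Allocating ``max_compressed_size(nbytes)`` bytes for the destination
means compression should not fail for lack of space, even on
incompressible data.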

Compiling your application with Blosc
=====================================

Blosc consists of the following files (in the blosc/ directory)::

    blosc.h and blosc.c      -- the main routines
    blosclz.h and blosclz.c  -- the actual compressor
    shuffle.h and shuffle.c  -- the shuffle code

Just add these files to your project in order to use Blosc.  For
information on the compression and decompression routines, see blosc.h.

To compile using GCC (4.4 or higher recommended) on Unix:

.. code-block:: console

   $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -lpthread

Using Windows and MINGW:

.. code-block:: console

   $ gcc -O3 -msse2 -o myprog myprog.c blosc\*.c

Using Windows and MSVC (2008 or higher recommended):

.. code-block:: console

  $ cl /Ox /Femyprog.exe myprog.c blosc\*.c

A simple usage example is the benchmark in the bench/bench.c file.
Another example, using Blosc as a generic HDF5 filter, is in the
hdf5/ directory.

I have not yet tried to compile this with compilers other than GCC,
MINGW, Intel ICC or MSVC. Please report your experiences with your own
platforms.

Testing Blosc
=============

Go to the test/ directory and run:

.. code-block:: console

  $ make test

These tests are very basic, and only valid for platforms where the GNU
make/gcc tools are available.  If you really want to test Blosc the
hard way, look at:

http://blosc.org/trac/wiki/SyntheticBenchmarks

where instructions on how to intensively test (and benchmark) Blosc
are given.  If you get an error while running these tests, please
report it!

Compiling the Blosc library with CMake
======================================

Blosc can also be built, tested and installed using CMake_.
The following procedure describes an "out of source" build.

Create the build directory and move into it:

.. code-block:: console

  $ mkdir build
  $ cd build

Configure Blosc in release mode (enable optimizations) specifying the
installation directory:

.. code-block:: console

  $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=INSTALL_DIR \
      PATH_TO_BLOSC_SOURCE_DIR

Please note that configuration can also be performed using the UI tools
provided by CMake_ (ccmake or cmake-gui):

.. code-block:: console

  $ cmake-gui PATH_TO_BLOSC_SOURCE_DIR

Build, test and install Blosc:

.. code-block:: console

  $ make
  $ make test
  $ make install

The static and dynamic versions of the Blosc library, together with
the header files, will be installed into the specified INSTALL_DIR.

.. _CMake: http://www.cmake.org

Wrapper for Python
==================

Blosc has an official wrapper for Python.  See:

https://github.com/FrancescAlted/python-blosc

Filter for HDF5
===============

For those who want to use Blosc as a filter in the HDF5 library,
there is a sample implementation in the hdf5/ directory.

Mailing list
============

There is an official mailing list for Blosc at:

[email protected]
http://groups.google.es/group/blosc

Acknowledgments
===============

I'd like to thank the PyTables community, which has collaborated in the
exhaustive testing of Blosc.  With an aggregate of more than 300 TB of
different datasets compressed *and* decompressed successfully, I can say that
Blosc is pretty safe now and ready for production purposes.

Other important contributions:

* Thibault North contributed a way to call Blosc from different threads in a
  safe way.

* The CMake support was a contribution of Thibault North, Antonio Valentino
  and Mark Wiebe.

* Valentin Haenel did terrific work fixing typos and improving the docs and
  the plotting script.


----

  **Enjoy data!**