===============================================================
Blosc: A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted
:Contact: [email protected]
:URL: http://www.blosc.org
:Gitter: |gitter|
:Travis CI: |travis|
:Appveyor: |appveyor|

.. |gitter| image:: https://badges.gitter.im/Blosc/c-blosc.svg
   :alt: Join the chat at https://gitter.im/Blosc/c-blosc
   :target: https://gitter.im/Blosc/c-blosc?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |travis| image:: https://travis-ci.org/Blosc/c-blosc.svg?branch=master
   :target: https://travis-ci.org/Blosc/c-blosc

.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/3mlyjc1ak0lbkmte?svg=true
   :target: https://ci.appveyor.com/project/FrancescAlted/c-blosc/branch/master


What is it?
===========

Blosc [1]_ is a high-performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call. Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible. In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there. It also leverages, if available, SIMD
instructions (SSE2, AVX2) and the multi-threading capabilities of CPUs
to accelerate the compression / decompression process as much as
possible.
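
The blocking technique can be sketched in a few lines of portable C.
This is only an illustration under simple assumptions, not Blosc's
internal code. The names for_each_block(), block_fn and count_bytes()
are hypothetical. The caller also passes in the block size, whereas a
real implementation would derive it from the cache size and the type
size.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical per-block callback: receives one block of `len` bytes. */
   typedef void (*block_fn)(const unsigned char *block, size_t len, void *ctx);

   /* Split `src` into consecutive blocks of at most `blocksize` bytes and
      process each one in turn.  If `blocksize` is chosen so that a block
      fits in the CPU cache, each block is fetched from main memory once
      and all further work on it happens in cache.  Returns the number of
      blocks processed. */
   size_t for_each_block(const unsigned char *src, size_t nbytes,
                         size_t blocksize, block_fn fn, void *ctx)
   {
     size_t nblocks = 0;
     size_t offset;

     if (blocksize == 0) {
       return 0;                     /* refuse a degenerate block size */
     }
     for (offset = 0; offset < nbytes; offset += blocksize) {
       size_t len = nbytes - offset;
       if (len > blocksize) {
         len = blocksize;            /* only the last block can be shorter */
       }
       fn(src + offset, len, ctx);
       nblocks++;
     }
     return nblocks;
   }

   /* Example callback: accumulate the total number of bytes seen. */
   void count_bytes(const unsigned char *block, size_t len, void *ctx)
   {
     (void)block;
     *(size_t *)ctx += len;
   }

Every block is processed independently, so a multi-threaded version
could hand different blocks to different threads. This sketch simply
walks them in order.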

Blosc is actually a metacompressor, meaning that it can use a range
of compression libraries to perform the actual
compression/decompression. Right now, it comes with integrated support
for BloscLZ (the original one), LZ4, LZ4HC, Snappy, Zlib and Zstd. Blosc
ships with the full sources for all these compressors, so if it does not
find the libraries installed on your system, it will compile them from
the included sources and integrate them into the Blosc library anyway.
This means you can rely on having all supported compressors integrated
in Blosc on all supported platforms.

You can see some benchmarks of Blosc performance in [3]_.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for
details.

.. [1] http://www.blosc.org
.. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf
.. [3] http://blosc.org/synthetic-benchmarks.html

Meta-compression and other advantages over existing compressors
===============================================================

C-Blosc is not like other compressors: it is better described as a
meta-compressor, because it can use different compressors and filters
(programs that generally improve the compression ratio). That said, it
can also be called a compressor, because it already ships with several
compressors and filters and can therefore work as one on its own.

Currently C-Blosc comes with support for BloscLZ, a compressor heavily
based on FastLZ (http://fastlz.org/), LZ4 and LZ4HC
(https://github.com/Cyan4973/lz4), Snappy
(https://github.com/google/snappy) and Zlib (http://www.zlib.net/), as
well as highly optimized shuffle and bitshuffle filters (which can use
SSE2 or AVX2 instructions, if available). For information on how and
why shuffling works, see slide 17 of
http://www.slideshare.net/PyData/blosc-py-data-2014. Other compressors
or filters may be added in the future.
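
To make the shuffle filter more concrete, here is a minimal, portable
sketch of a byte shuffle and its inverse. It is not Blosc's
SIMD-accelerated implementation, and the function names shuffle_bytes()
and unshuffle_bytes() are hypothetical. The transform regroups n
elements of typesize bytes each so that all first bytes come first,
then all second bytes, and so on.

.. code-block:: c

   #include <stddef.h>

   /* Byte shuffle: byte j of element i moves to position j * nelems + i.
      Trailing bytes that do not form a whole element are copied as-is. */
   void shuffle_bytes(size_t typesize, size_t nbytes,
                      const unsigned char *src, unsigned char *dest)
   {
     size_t nelems = typesize > 0 ? nbytes / typesize : 0;
     size_t i, j;

     for (i = 0; i < nelems; i++) {
       for (j = 0; j < typesize; j++) {
         dest[j * nelems + i] = src[i * typesize + j];
       }
     }
     for (i = nelems * typesize; i < nbytes; i++) {
       dest[i] = src[i];             /* leftover bytes stay in place */
     }
   }

   /* Inverse transform: restores the original element layout. */
   void unshuffle_bytes(size_t typesize, size_t nbytes,
                        const unsigned char *src, unsigned char *dest)
   {
     size_t nelems = typesize > 0 ? nbytes / typesize : 0;
     size_t i, j;

     for (i = 0; i < nelems; i++) {
       for (j = 0; j < typesize; j++) {
         dest[i * typesize + j] = src[j * nelems + i];
       }
     }
     for (i = nelems * typesize; i < nbytes; i++) {
       dest[i] = src[i];
     }
   }

For example, with small 32-bit little-endian integers, the upper three
bytes of each element are mostly zero. After shuffling, those zeros end
up in long contiguous runs, which is the kind of redundancy a fast
compressor can exploit.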

C-Blosc coordinates the different compressors and filters so that they
automatically leverage the blocking technique (described above) as well
as multi-threaded execution (if several cores are available). As a
result, every compressor and filter can work at very high speeds, even
if it was not originally designed for blocking or multi-threading.
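
The coordination described above can be pictured as a codec table plus
a block driver. The sketch below assumes a hypothetical common signature
for all codecs. It registers only a trivial "store" codec (a plain copy)
as a stand-in for BloscLZ, LZ4 and the others. The names codec,
find_codec(), store_compress() and compress_blocks() are illustrative
only. In the real library, the codec is selected with
blosc_set_compressor() (see blosc.h).

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical common codec signature: compress `srclen` bytes into
      `dest` (capacity `destcap`); return bytes written, or 0 if the
      output does not fit. */
   typedef size_t (*compress_fn)(const unsigned char *src, size_t srclen,
                                 unsigned char *dest, size_t destcap);

   struct codec {
     const char *name;
     compress_fn compress;
   };

   /* Trivial "codec" that just stores the input unchanged. */
   size_t store_compress(const unsigned char *src, size_t srclen,
                         unsigned char *dest, size_t destcap)
   {
     if (srclen > destcap) {
       return 0;
     }
     memcpy(dest, src, srclen);
     return srclen;
   }

   static const struct codec codecs[] = {
     {"store", store_compress}
     /* wrappers for real codecs would be registered here */
   };

   /* Look up a codec by name; NULL means it is not available. */
   const struct codec *find_codec(const char *name)
   {
     size_t i;
     for (i = 0; i < sizeof(codecs) / sizeof(codecs[0]); i++) {
       if (strcmp(codecs[i].name, name) == 0) {
         return &codecs[i];
       }
     }
     return NULL;
   }

   /* Compress `src` block by block with whichever codec was chosen.
      The codec itself knows nothing about blocks.  Returns the total
      bytes written, or 0 if some block does not fit. */
   size_t compress_blocks(const struct codec *c, size_t blocksize,
                          const unsigned char *src, size_t nbytes,
                          unsigned char *dest, size_t destcap)
   {
     size_t in = 0, out = 0;

     if (c == NULL || blocksize == 0) {
       return 0;
     }
     while (in < nbytes) {
       size_t len = (nbytes - in < blocksize) ? nbytes - in : blocksize;
       size_t written = c->compress(src + in, len, dest + out, destcap - out);
       if (written == 0) {
         return 0;                   /* this block did not fit */
       }
       in += len;
       out += written;
     }
     return out;
   }

A real container format would also record the compressed size of every
block, so that blocks can be located again and decompressed, possibly
in parallel. That bookkeeping is left out to keep the sketch short.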

Other advantages of Blosc are:

* Meant for binary data: can take advantage of the type size
  meta-information for improved compression ratio (using the
  integrated shuffle and bitshuffle filters).

* Small overhead on non-compressible data: at most
  (16 + 4 * nthreads) additional bytes beyond the source buffer length
  are needed to compress *any kind of input*.

* Maximum destination length: unlike many other compressors, both
  the compression and decompression routines support a maximum size
  for the destination buffer.
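
Here is a small sketch of how the two guarantees above help when sizing
buffers. It assumes the bound quoted in the list (16 + 4 * nthreads
extra bytes). worst_case_dest_size() and store_bounded() are
hypothetical helpers, and the zero-filled 16-byte prefix written by
store_bounded() is only a placeholder for a header. Returning 0 when
the output would not fit mirrors the convention that blosc.h documents
for blosc_compress().

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   #define SKETCH_HEADER_BYTES 16    /* placeholder header size */

   /* Worst-case output size using the bound above:
      nbytes + 16 + 4 * nthreads (nthreads is assumed to be >= 1). */
   size_t worst_case_dest_size(size_t nbytes, int nthreads)
   {
     return nbytes + 16 + 4 * (size_t)nthreads;
   }

   /* Toy routine honouring a maximum destination length: it writes a
      zeroed placeholder header followed by the raw data, but only if
      everything fits in `destsize`.  Returns the bytes written, or 0
      (leaving `dest` untouched) when the output would overflow. */
   size_t store_bounded(const void *src, size_t nbytes,
                        void *dest, size_t destsize)
   {
     unsigned char *out = (unsigned char *)dest;

     if (destsize < SKETCH_HEADER_BYTES ||
         nbytes > destsize - SKETCH_HEADER_BYTES) {
       return 0;
     }
     memset(out, 0, SKETCH_HEADER_BYTES);
     memcpy(out + SKETCH_HEADER_BYTES, src, nbytes);
     return SKETCH_HEADER_BYTES + nbytes;
   }

If you allocate worst_case_dest_size(nbytes, nthreads) bytes for the
destination, there is room even for incompressible input. A smaller
buffer makes the routine fail cleanly instead of overrunning it.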

Taken together, these features set Blosc apart from other similar
solutions.

Compiling your application with a minimalistic Blosc
====================================================

The minimal Blosc consists of the following files (in the `blosc/ directory
<https://github.com/Blosc/c-blosc/tree/master/blosc>`_)::

    blosc.h and blosc.c        -- the main routines
    shuffle*.h and shuffle*.c  -- the shuffle code
    blosclz.h and blosclz.c    -- the blosclz compressor

Just add these files to your project in order to use Blosc. For
information on the compression and decompression routines, see `blosc.h
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_.

To compile using GCC (4.9 or higher recommended) on Unix:

.. code-block:: console

   $ gcc -O3 -mavx2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread

Using Windows and MINGW:

.. code-block:: console

   $ gcc -O3 -mavx2 -o myprog myprog.c -Iblosc blosc\*.c

Using Windows and MSVC (2013 or higher recommended):

.. code-block:: console

   $ cl /Ox /Femyprog.exe /Iblosc myprog.c blosc\*.c

In the `examples/ directory
<https://github.com/Blosc/c-blosc/tree/master/examples>`_, you can find
more hints on how to link your app with Blosc.

I have not yet tried to compile this with compilers other than GCC,
clang, MINGW, Intel ICC or MSVC. Please report your experiences with
your own platforms.

Adding support for other compressors with a minimalistic Blosc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The official CMake files for Blosc (see below) try hard to include
support for LZ4, LZ4HC, Snappy and Zlib inside the Blosc library, so
using them is just a matter of calling the appropriate
`blosc_set_compressor() API call
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_. See
an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

That said, it is also easy to use a minimalistic Blosc and just
define the symbols HAVE_LZ4 (which enables both LZ4 and LZ4HC),
HAVE_SNAPPY and HAVE_ZLIB during compilation, as well as link the
appropriate libraries. For example, to compile a minimalistic Blosc
with added Zlib support:

.. code-block:: console

   $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread -DHAVE_ZLIB -lz
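
The -D flags in a command like the one above apply to every file being
compiled, including myprog.c, so an application can check the same
symbols itself. The sketch below is hypothetical: configured_codecs()
is not part of Blosc, and it only reflects which HAVE_* macros were
defined at build time. At run time, the list of compressors actually
available is reported by blosc_list_compressors() in blosc.h.

.. code-block:: c

   #include <stddef.h>

   /* Fill `names` (capacity `cap`) with the codecs this build was
      configured for, based only on the HAVE_* symbols given with -D.
      BloscLZ is always built in.  Returns the number of names stored. */
   size_t configured_codecs(const char *names[], size_t cap)
   {
     static const char *const all[] = {
       "blosclz",
   #ifdef HAVE_LZ4
       "lz4", "lz4hc",               /* HAVE_LZ4 enables both */
   #endif
   #ifdef HAVE_SNAPPY
       "snappy",
   #endif
   #ifdef HAVE_ZLIB
       "zlib",
   #endif
     };
     size_t n = sizeof(all) / sizeof(all[0]);
     size_t i;

     if (n > cap) {
       n = cap;                      /* never write past the caller's array */
     }
     for (i = 0; i < n; i++) {
       names[i] = all[i];
     }
     return n;
   }

Built with the -DHAVE_ZLIB command above, the list would contain
"blosclz" and "zlib".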

In the `bench/ directory
<https://github.com/Blosc/c-blosc/tree/master/bench>`_, there are a
couple of Makefiles (one for UNIX and another for MinGW) with more
complete build examples, such as switching between external libraries
and internal sources for the compressors.

Supported platforms
~~~~~~~~~~~~~~~~~~~

Blosc is meant to support all platforms where a C89-compliant C
compiler can be found. The most tested ones are Intel (Linux, Mac OSX
and Windows) and ARM (Linux), but exotic ones such as the IBM Blue
Gene/Q embedded "A2" processor have been reported to work too.

Compiling the Blosc library with CMake
======================================

Blosc can also be built, tested and installed using CMake_. Although
this procedure might seem a bit more involved than the one described
above, it is the most general because it allows integrating compressors
other than BloscLZ, either from libraries or from internal
sources. Hence, serious library developers are encouraged to use this
approach.

The following procedure describes an "out-of-source" build.

Create the build directory and move into it:

.. code-block:: console

   $ mkdir build
   $ cd build

Now run the CMake configuration and optionally specify the installation
directory (e.g. '/usr' or '/usr/local'):

.. code-block:: console

   $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake allows you to configure Blosc in many different ways, such as
preferring internal or external sources for compressors, or
enabling/disabling them. Note that configuration can also be performed
using the UI tools provided by CMake_ (ccmake or cmake-gui):

.. code-block:: console

   $ ccmake ..      # run a curses-based interface
   $ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

.. code-block:: console

   $ cmake --build .
   $ ctest
   $ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with the
header files, will be installed into the specified
CMAKE_INSTALL_PREFIX.

.. _CMake: http://www.cmake.org

Once you have compiled your Blosc library, you can easily link your
apps with it as shown in the `examples/ directory
<https://github.com/Blosc/c-blosc/tree/master/examples>`_.

Adding support for other compressors (LZ4, LZ4HC, Snappy, Zlib) with CMake
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The CMake files in Blosc are configured to automatically detect other
compressors like LZ4, LZ4HC, Snappy or Zlib by default. As long as
the libraries and their header files are accessible, they will be
used. See an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

*Note on Zlib*: the library should be easily found on UNIX systems.
On Windows, you can help CMake find it by setting the environment
variable 'ZLIB_ROOT' to the directory containing the zlib 'include' and
'lib' directories. Also, make sure that the Zlib DLL is in your
'\\Windows' directory.

However, the full sources for LZ4, LZ4HC, Snappy and Zlib are also
included in Blosc. So, in general, you do not need to worry about not
having the libraries on your system (or CMake not finding them): in
that case, their sources will be compiled automatically. This means
you can be confident of having complete support for all the supported
compression libraries on all supported platforms.

If you want to force Blosc to use an external library instead of the
included compression sources (LZ4 in this example):

.. code-block:: console

   $ cmake -DPREFER_EXTERNAL_LZ4=ON ..

You can also disable support for some compression libraries (Snappy in
this example):

.. code-block:: console

   $ cmake -DDEACTIVATE_SNAPPY=ON ..

Mac OSX troubleshooting
~~~~~~~~~~~~~~~~~~~~~~~

If you run into compilation problems when using Mac OSX, please make
sure that you have installed the command line developer tools. You
can install them with:

.. code-block:: console

   $ xcode-select --install

Wrapper for Python
==================

Blosc has an official wrapper for Python. See:

https://github.com/Blosc/python-blosc

Command line interface and serialization format for Blosc
=========================================================

Blosc can be used from the command line by using Bloscpack. See:

https://github.com/Blosc/bloscpack

Filter for HDF5
===============

For those who want to use Blosc as a filter in the HDF5 library,
there is a sample implementation in the Blosc/hdf5 project:

https://github.com/Blosc/hdf5

Mailing list
============

There is an official mailing list for Blosc at:

[email protected]

http://groups.google.es/group/blosc

Acknowledgments
===============

See THANKS.rst.


----

**Enjoy data!**