Changeset 981e22c for thirdparty/blosc
- Timestamp:
- 08/26/16 19:35:26 (8 years ago)
- Branches:
- master, pympi
- Children:
- 8ebc79b
- Parents:
- cda87e9
- git-author:
- Hal Finkel <hfinkel@…> (08/26/16 19:35:26)
- git-committer:
- Hal Finkel <hfinkel@…> (08/26/16 19:35:26)
- Location:
- thirdparty/blosc
- Files:
- 11 added
- 13 edited
thirdparty/blosc/ANNOUNCE.rst
r00587dc r981e22c 1 1 =============================================================== 2 Announcing Blosc 1.2.33 A blocking, shuffling and lossless compression library 2 Announcing c-blosc 1.10.0 3 A blocking, shuffling and lossless compression library for C 4 4 =============================================================== 5 5 … … 7 7 ============ 8 8 9 New `blosc_init()` and `blosc_destroy()` functions have been added so10 that the global lock can be initialized safely. These new functions 11 will also allow for other kind of initializations/destructions in the12 future. 9 This release introduces support for the new Zstd codec. Zstd is meant to 10 achieve larger compression ratios than Zlib, but with higher speeds. We 11 are talking about a well-balanced codec that should see a lot of use 12 among Blosc users. There is a blog about what you can expect of it in: 13 13 14 Existing applications using Blosc do not need to start using the new 15 functions right away, as long as they calling `blosc_set_nthreads()` 16 previous to anything else. However, using them is highly recommended. 17 18 Thanks to Oscar Villellas for the init/destroy suggestion, it is a 19 nice idea indeed! 14 http://blosc.org/blog/zstd-has-just-landed-in-blosc.html 20 15 21 16 For more info, please see the release notes in: 22 17 23 https://github.com/FrancescAlted/blosc/wiki/Release-notes 18 https://github.com/Blosc/c-blosc/blob/master/RELEASE_NOTES.rst 19 24 20 25 21 What is it? 26 22 =========== 27 23 28 Blosc (http://www.blosc.org) is a high performance compressor24 Blosc (http://www.blosc.org) is a high performance meta-compressor 29 25 optimized for binary data. It has been designed to transmit data to 30 26 the processor cache faster than the traditional, non-compressed, 31 27 direct memory fetch approach via a memcpy() OS call. 
32 28 33 Blosc is the first compressor (that I'm aware of) that is meant not 34 only to reduce the size of large datasets on-disk or in-memory, but 35 also to accelerate object manipulations that are memory-bound. 29 Blosc has internal support for different compressors like its internal 30 BloscLZ, but also LZ4, LZ4HC, Snappy and Zlib. This way these can 31 automatically leverage the multithreading and pre-filtering 32 (shuffling) capabilities that comes with Blosc. 36 33 37 There is also a handy command line for Blosc called Bloscpack38 (https://github.com/esc/bloscpack) that allows you to compress large39 binary datafiles on-disk. Although the format for Bloscpack has not40 stabilized yet, it allows you to effectively use Blosc from you41 favorite shell.42 34 43 35 Download sources … … 50 42 and proceed from there. The github repository is over here: 51 43 52 https://github.com/ FrancescAlted/blosc44 https://github.com/Blosc 53 45 54 46 Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for 55 47 details. 48 56 49 57 50 Mailing list … … 65 58 66 59 Enjoy Data! 67 68 69 .. Local Variables:70 .. mode: rst71 .. coding: utf-872 .. fill-column: 7073 .. End: -
thirdparty/blosc/LICENSES/BLOSC.txt
r00587dc r981e22c 1 1 Blosc - A blocking, shuffling and lossless compression library 2 2 3 Copyright (C) 2009-2012 Francesc Alted <[email protected]> 4 Copyright (C) 2013 Francesc Alted <[email protected]> 3 Copyright (C) 2009-2016 Francesc Alted <[email protected]> 5 4 6 5 Permission is hereby granted, free of charge, to any person obtaining a copy … … 21 20 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 22 21 THE SOFTWARE. 23 -
thirdparty/blosc/LICENSES/STDINT.txt
r00587dc r981e22c 1 Copyright (c) 2006-2008 Alexander Chemeris 1 ISO C9x compliant stdint.h for Microsoft Visual Studio 2 Based on ISO/IEC 9899:TC2 Committee draft (May 6, 2005) WG14/N1124 3 4 Copyright (c) 2006-2013 Alexander Chemeris 2 5 3 6 Redistribution and use in source and binary forms, with or without … … 11 14 documentation and/or other materials provided with the distribution. 12 15 13 3. The name of the author may be used to endorse or promote products 14 derived from this software without specific prior written permission. 16 3. Neither the name of the product nor the names of its contributors may 17 be used to endorse or promote products derived from this software 18 without specific prior written permission. 15 19 16 20 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED -
thirdparty/blosc/README.rst
r00587dc r981e22c 4 4 5 5 :Author: Francesc Alted 6 :Contact: f [email protected]6 :Contact: f[email protected] 7 7 :URL: http://www.blosc.org 8 :Gitter: |gitter| 9 :Travis CI: |travis| 10 :Appveyor: |appveyor| 11 12 .. |gitter| image:: https://badges.gitter.im/Blosc/c-blosc.svg 13 :alt: Join the chat at https://gitter.im/Blosc/c-blosc 14 :target: https://gitter.im/Blosc/c-blosc?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge 15 16 .. |travis| image:: https://travis-ci.org/Blosc/c-blosc.svg?branch=master 17 :target: https://travis-ci.org/Blosc/c-blosc 18 19 .. |appveyor| image:: https://ci.appveyor.com/api/projects/status/3mlyjc1ak0lbkmte?svg=true 20 :target: https://ci.appveyor.com/project/FrancescAlted/c-blosc/branch/master 21 8 22 9 23 What is it? … … 18 32 19 33 It uses the blocking technique (as described in [2]_) to reduce 20 activity on the memory bus as much as possible. 34 activity on the memory bus as much as possible. In short, this 21 35 technique works by dividing datasets in blocks that are small enough 22 36 to fit in caches of modern processors and perform compression / 23 37 decompression there. It also leverages, if available, SIMD 24 instructions (SSE2) and multi-threading capabilities of CPUs, in order 25 to accelerate the compression / decompression process to a maximum. 26 27 You can see some recent benchmarks about Blosc performance in [3]_ 38 instructions (SSE2, AVX2) and multi-threading capabilities of CPUs, in 39 order to accelerate the compression / decompression process to a 40 maximum. 41 42 Blosc is actually a metacompressor, that meaning that it can use a range 43 of compression libraries for performing the actual 44 compression/decompression. Right now, it comes with integrated support 45 for BloscLZ (the original one), LZ4, LZ4HC, Snappy, Zlib and Zstd. 
Blosc 46 comes with full sources for all compressors, so in case it does not find 47 the libraries installed in your system, it will compile from the 48 included sources and they will be integrated into the Blosc library 49 anyway. That means that you can trust in having all supported 50 compressors integrated in Blosc in all supported platforms. 51 52 You can see some benchmarks about Blosc performance in [3]_ 28 53 29 54 Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for … … 32 57 .. [1] http://www.blosc.org 33 58 .. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf 34 .. [3] http://blosc.org/ trac/wiki/SyntheticBenchmarks59 .. [3] http://blosc.org/synthetic-benchmarks.html 35 60 36 61 Meta-compression and other advantages over existing compressors 37 62 =============================================================== 38 63 39 Blosc is not like other compressors: it should rather be called a64 C-Blosc is not like other compressors: it should rather be called a 40 65 meta-compressor. This is so because it can use different compressors 41 and pre-conditioners (programs that generally improve compression 42 ratio). At any rate, it can also be called a compressor because it 43 happens that it already integrates one compressor and one 44 pre-conditioner, so it can actually work like so. 45 46 Currently it uses BloscLZ, a compressor heavily based on FastLZ 47 (http://fastlz.org/), and a highly optimized (it can use SSE2 48 instructions, if available) Shuffle pre-conditioner. However, 49 different compressors or pre-conditioners may be added in the future. 50 51 Blosc is in charge of coordinating the compressor and pre-conditioners 52 so that they can leverage the blocking technique (described above) as 53 well as multi-threaded execution (if several cores are available) 54 automatically. That makes that every compressor and pre-conditioner 66 and filters (programs that generally improve compression ratio). 
At 67 any rate, it can also be called a compressor because it happens that 68 it already comes with several compressor and filters, so it can 69 actually work like so. 70 71 Currently C-Blosc comes with support of BloscLZ, a compressor heavily 72 based on FastLZ (http://fastlz.org/), LZ4 and LZ4HC 73 (https://github.com/Cyan4973/lz4), Snappy 74 (https://github.com/google/snappy) and Zlib (http://www.zlib.net/), as 75 well as a highly optimized (it can use SSE2 or AVX2 instructions, if 76 available) shuffle and bitshuffle filters (for info on how and why 77 shuffling works, see slide 17 of 78 http://www.slideshare.net/PyData/blosc-py-data-2014). However, 79 different compressors or filters may be added in the future. 80 81 C-Blosc is in charge of coordinating the different compressor and 82 filters so that they can leverage the blocking technique (described 83 above) as well as multi-threaded execution (if several cores are 84 available) automatically. That makes that every compressor and filter 55 85 will work at very high speeds, even if it was not initially designed 56 86 for doing blocking or multi-threading. … … 60 90 * Meant for binary data: can take advantage of the type size 61 91 meta-information for improved compression ratio (using the 62 integrated shuffle pre-conditioner). 63 64 * Small overhead on non-compressible data: only a maximum of 16 65 additional bytes over the source buffer length are needed to 66 compress *every* input. 67 68 * Maximum destination length: contrarily to many other 69 compressors, both compression and decompression routines have 70 support for maximum size lengths for the destination buffer. 71 72 * Replacement for memcpy(): it supports a 0 compression level that 73 does not compress at all and only adds 16 bytes of overhead. In 74 this mode Blosc can copy memory usually faster than a plain 75 memcpy(). 92 integrated shuffle and bitshuffle filters). 
93 94 * Small overhead on non-compressible data: only a maximum of (16 + 4 * 95 nthreads) additional bytes over the source buffer length are needed 96 to compress *any kind of input*. 97 98 * Maximum destination length: contrarily to many other compressors, 99 both compression and decompression routines have support for maximum 100 size lengths for the destination buffer. 76 101 77 102 When taken together, all these features set Blosc apart from other 78 103 similar solutions. 79 104 80 Compiling your application with Blosc 81 ===================================== 82 83 Blosc consists of the next files (in blosc/ directory):: 84 85 blosc.h and blosc.c -- the main routines 86 blosclz.h and blosclz.c -- the actual compressor 87 shuffle.h and shuffle.c -- the shuffle code 105 Compiling your application with a minimalistic Blosc 106 ==================================================== 107 108 The minimal Blosc consists of the next files (in `blosc/ directory 109 <https://github.com/Blosc/c-blosc/tree/master/blosc>`_):: 110 111 blosc.h and blosc.c -- the main routines 112 shuffle*.h and shuffle*.c -- the shuffle code 113 blosclz.h and blosclz.c -- the blosclz compressor 88 114 89 115 Just add these files to your project in order to use Blosc. For 90 information on compression and decompression routines, see blosc.h. 91 92 To compile using GCC (4.4 or higher recommended) on Unix: 93 94 .. code-block:: console 95 96 $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -lpthread 116 information on compression and decompression routines, see `blosc.h 117 <https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_. 118 119 To compile using GCC (4.9 or higher recommended) on Unix: 120 121 .. code-block:: console 122 123 $ gcc -O3 -mavx2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread 97 124 98 125 Using Windows and MINGW: … … 100 127 .. 
code-block:: console 101 128 102 $ gcc -O3 -msse2 -o myprog myprog.c blosc\*.c 103 104 Using Windows and MSVC (2008 or higher recommended): 105 106 .. code-block:: console 107 108 $ cl /Ox /Femyprog.exe myprog.c blosc\*.c 109 110 A simple usage example is the benchmark in the bench/bench.c file. 111 Also, another example for using Blosc as a generic HDF5 filter is in 112 the hdf5/ directory. 113 114 I have not tried to compile this with compilers other than GCC, MINGW, 115 Intel ICC or MSVC yet. Please report your experiences with your own 116 platforms. 117 118 Testing Blosc 119 ============= 120 121 Go to the test/ directory and issue: 122 123 .. code-block:: console 124 125 $ make test 126 127 These tests are very basic, and only valid for platforms where GNU 128 make/gcc tools are available. If you really want to test Blosc the 129 hard way, look at: 130 131 http://blosc.org/trac/wiki/SyntheticBenchmarks 132 133 where instructions on how to intensively test (and benchmark) Blosc 134 are given. If while running these tests you get some error, please 135 report it back! 129 $ gcc -O3 -mavx2 -o myprog myprog.c -Iblosc blosc\*.c 130 131 Using Windows and MSVC (2013 or higher recommended): 132 133 .. code-block:: console 134 135 $ cl /Ox /Femyprog.exe /Iblosc myprog.c blosc\*.c 136 137 In the `examples/ directory 138 <https://github.com/Blosc/c-blosc/tree/master/examples>`_ you can find 139 more hints on how to link your app with Blosc. 140 141 I have not tried to compile this with compilers other than GCC, clang, 142 MINGW, Intel ICC or MSVC yet. Please report your experiences with your 143 own platforms. 
144 145 Adding support for other compressors with a minimalistic Blosc 146 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 147 148 The official cmake files (see below) for Blosc try hard to include 149 support for LZ4, LZ4HC, Snappy, Zlib inside the Blosc library, so 150 using them is just a matter of calling the appropriate 151 `blosc_set_compressor() API call 152 <https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_. See 153 an `example here 154 <https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_. 155 156 Having said this, it is also easy to use a minimalistic Blosc and just 157 add the symbols HAVE_LZ4 (will include both LZ4 and LZ4HC), 158 HAVE_SNAPPY and HAVE_ZLIB during compilation as well as the 159 appropriate libraries. For example, for compiling with minimalistic 160 Blosc but with added Zlib support do: 161 162 .. code-block:: console 163 164 $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread -DHAVE_ZLIB -lz 165 166 In the `bench/ directory 167 <https://github.com/Blosc/c-blosc/tree/master/bench>`_ there a couple 168 of Makefile files (one for UNIX and the other for MinGW) with more 169 complete building examples, like switching between libraries or 170 internal sources for the compressors. 171 172 Supported platforms 173 ~~~~~~~~~~~~~~~~~~~ 174 175 Blosc is meant to support all platforms where a C89 compliant C 176 compiler can be found. The ones that are mostly tested are Intel 177 (Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones as IBM 178 Blue Gene Q embedded "A2" processor are reported to work too. 136 179 137 180 Compiling the Blosc library with CMake 138 181 ====================================== 139 182 140 Blosc can also be built, tested and installed using CMake_. 183 Blosc can also be built, tested and installed using CMake_. 
Although 184 this procedure might seem a bit more involved than the one described 185 above, it is the most general because it allows to integrate other 186 compressors than BloscLZ either from libraries or from internal 187 sources. Hence, serious library developers are encouraged to use this 188 way. 189 141 190 The following procedure describes the "out of source" build. 142 191 … … 148 197 $ cd build 149 198 150 Configure Blosc in release mode (enable optimizations) specifying the 151 installation directory: 152 153 .. code-block:: console 154 155 $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=INSTALL_DIR \ 156 PATH_TO_BLOSC_SOURCE_DIR 157 158 Please note that configuration can also be performed using UI tools 159 provided by CMake_ (ccmake or cmake-gui): 160 161 .. code-block:: console 162 163 $ cmake-gui PATH_TO_BLOSC_SOURCE_DIR 199 Now run CMake configuration and optionally specify the installation 200 directory (e.g. '/usr' or '/usr/local'): 201 202 .. code-block:: console 203 204 $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory .. 205 206 CMake allows to configure Blosc in many different ways, like prefering 207 internal or external sources for compressors or enabling/disabling 208 them. Please note that configuration can also be performed using UI 209 tools provided by CMake_ (ccmake or cmake-gui): 210 211 .. code-block:: console 212 213 $ ccmake .. # run a curses-based interface 214 $ cmake-gui .. # run a graphical interface 164 215 165 216 Build, test and install Blosc: … … 167 218 .. code-block:: console 168 219 169 $ make170 $ maketest171 $ make install220 $ cmake --build . 221 $ ctest 222 $ cmake --build . --target install 172 223 173 224 The static and dynamic version of the Blosc library, together with 174 header files, will be installed into the specified INSTALL_DIR. 225 header files, will be installed into the specified 226 CMAKE_INSTALL_PREFIX. 175 227 176 228 .. 
_CMake: http://www.cmake.org 229 230 Once you have compiled your Blosc library, you can easily link your 231 apps with it as shown in the `example/ directory 232 <https://github.com/Blosc/c-blosc/blob/master/examples>`_. 233 234 Adding support for other compressors (LZ4, LZ4HC, Snappy, Zlib) with CMake 235 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 236 237 The CMake files in Blosc are configured to automatically detect other 238 compressors like LZ4, LZ4HC, Snappy or Zlib by default. So as long as 239 the libraries and the header files for these libraries are accessible, 240 these will be used by default. See an `example here 241 <https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_. 242 243 *Note on Zlib*: the library should be easily found on UNIX systems, 244 although on Windows, you can help CMake to find it by setting the 245 environment variable 'ZLIB_ROOT' to where zlib 'include' and 'lib' 246 directories are. Also, make sure that Zlib DDL library is in your 247 '\Windows' directory. 248 249 However, the full sources for LZ4, LZ4HC, Snappy and Zlib have been 250 included in Blosc too. So, in general, you should not worry about not 251 having (or CMake not finding) the libraries in your system because in 252 this case, their sources will be automatically compiled for you. That 253 effectively means that you can be confident in having a complete 254 support for all the supported compression libraries in all supported 255 platforms. 256 257 If you want to force Blosc to use external libraries instead of 258 the included compression sources: 259 260 .. code-block:: console 261 262 $ cmake -DPREFER_EXTERNAL_LZ4=ON .. 263 264 You can also disable support for some compression libraries: 265 266 .. code-block:: console 267 268 $ cmake -DDEACTIVATE_SNAPPY=ON .. 
269 270 Mac OSX troubleshooting 271 ~~~~~~~~~~~~~~~~~~~~~~~ 272 273 If you run into compilation troubles when using Mac OSX, please make 274 sure that you have installed the command line developer tools. You 275 can always install them with: 276 277 .. code-block:: console 278 279 $ xcode-select --install 177 280 178 281 Wrapper for Python … … 181 284 Blosc has an official wrapper for Python. See: 182 285 183 https://github.com/FrancescAlted/python-blosc 286 https://github.com/Blosc/python-blosc 287 288 Command line interface and serialization format for Blosc 289 ========================================================= 290 291 Blosc can be used from command line by using Bloscpack. See: 292 293 https://github.com/Blosc/bloscpack 184 294 185 295 Filter for HDF5 186 296 =============== 187 297 188 For those that want to use Blosc as a filter in the HDF5 library, 189 there is a sample implementation in the hdf5/ directory. 298 For those who want to use Blosc as a filter in the HDF5 library, 299 there is a sample implementation in the blosc/hdf5 project in: 300 301 https://github.com/Blosc/hdf5 190 302 191 303 Mailing list … … 200 312 =============== 201 313 202 I'd like to thank the PyTables community that have collaborated in the 203 exhaustive testing of Blosc. With an aggregate amount of more than 300 TB of 204 different datasets compressed *and* decompressed successfully, I can say that 205 Blosc is pretty safe now and ready for production purposes. 206 207 Other important contributions: 208 209 * Thibault North contributed a way to call Blosc from different threads in a 210 safe way. 211 212 * The cmake support was a contribution of Thibault North, Antonio Valentino 213 and Mark Wiebe. 214 215 * Valentin Haenel did a terrific work fixing typos and improving docs and the 216 plotting script. 314 See THANKS.rst. 217 315 218 316 -
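Editor's illustration for the README text above, which describes the byte-level shuffle filter only by reference to slides. The following is a minimal scalar sketch of the idea, assuming a buffer of `nelems` fixed-size elements of `typesize` bytes each. The function names are hypothetical and this is not the SSE2/AVX2 code shipped in `shuffle*.c`; it only shows the byte rearrangement that tends to make typed binary data more compressible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of blosc.h): scalar byte-shuffle.
 * Byte k of every element is gathered into the k-th contiguous stream,
 * so bytes of equal significance end up next to each other. */
void shuffle_bytes(size_t typesize, size_t nelems,
                   const unsigned char *src, unsigned char *dest)
{
    size_t i, k;
    assert(typesize > 0);
    for (k = 0; k < typesize; k++) {
        for (i = 0; i < nelems; i++) {
            dest[k * nelems + i] = src[i * typesize + k];
        }
    }
}

/* Hypothetical helper: exact inverse of shuffle_bytes(). */
void unshuffle_bytes(size_t typesize, size_t nelems,
                     const unsigned char *src, unsigned char *dest)
{
    size_t i, k;
    assert(typesize > 0);
    for (i = 0; i < nelems; i++) {
        for (k = 0; k < typesize; k++) {
            dest[i * typesize + k] = src[k * nelems + i];
        }
    }
}
```

For two 4-byte elements `{1,2,3,4}` and `{5,6,7,8}`, `shuffle_bytes(4, 2, src, dest)` yields `{1,5,2,6,3,7,4,8}`. When neighbouring values differ only in their low-order bytes, the high-order streams become long runs of similar bytes, which is what the downstream codec exploits.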
thirdparty/blosc/README_HEADER.rst
r00587dc r981e22c 21 21 (``uint8``) Blosc format version. 22 22 :versionlz: 23 (``uint8``) Blosclz format version (internal Lempel-Ziv algorithm).24 :flags :25 (``bitfield``) The flags of the buffer .23 (``uint8``) Version of the internal compressor used. 24 :flags and compressor enumeration: 25 (``bitfield``) The flags of the buffer 26 26 27 27 :bit 0 (``0x01``): 28 Whether the shuffle filter has been applied or not.28 Whether the byte-shuffle filter has been applied or not. 29 29 :bit 1 (``0x02``): 30 30 Whether the internal buffer is a pure memcpy or not. 31 :bit 2 (``0x04``): 32 Whether the bit-shuffle filter has been applied or not. 33 :bit 3 (``0x08``): 34 Reserved 35 :bit 4 (``0x16``): 36 Reserved 37 :bit 5 (``0x32``): 38 Part of the enumeration for compressors. 39 :bit 6 (``0x64``): 40 Part of the enumeration for compressors. 41 :bit 7 (``0x64``): 42 Part of the enumeration for compressors. 43 44 The last three bits form an enumeration that allows to use alternative 45 compressors. 46 47 :``0``: 48 ``blosclz`` 49 :``1``: 50 ``lz4`` or ``lz4hc`` 51 :``2``: 52 ``snappy`` 53 :``3``: 54 ``zlib`` 55 :``4``: 56 ``zstd`` 31 57 32 58 :typesize: … … 38 64 :ctbytes: 39 65 (``uint32``) Compressed size of the buffer. 40 -
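Editor's illustration for the header description above: a minimal sketch of how the flags byte could be decoded, assuming the layout as documented (bits 0 to 2 are filter/memcpy flags, bits 5 to 7 hold the compressor enumeration). The struct and function names are hypothetical and not part of `blosc.h`. Note that the masks the README lists for bits 4 to 7 (`0x16`, `0x32`, `0x64`, `0x64`) look like decimal values; the sketch relies on bit positions instead, i.e. `0x10`, `0x20`, `0x40` and `0x80`.

```c
#include <assert.h>

/* Hypothetical container for the decoded flags byte (names are illustrative). */
typedef struct {
    int byte_shuffle;  /* bit 0 (0x01): byte-shuffle filter applied */
    int pure_memcpy;   /* bit 1 (0x02): buffer is a plain memcpy */
    int bit_shuffle;   /* bit 2 (0x04): bit-shuffle filter applied */
    unsigned codec;    /* bits 5-7: compressor enumeration (0..7) */
} header_flags;

/* Split the flags byte into its documented fields. */
header_flags decode_flags(unsigned char flags)
{
    header_flags f;
    f.byte_shuffle = (flags & 0x01) != 0;
    f.pure_memcpy = (flags & 0x02) != 0;
    f.bit_shuffle = (flags & 0x04) != 0;
    f.codec = ((unsigned)flags >> 5) & 0x07u;
    return f;
}

/* Map the enumeration value to the codec names listed in README_HEADER.rst. */
const char *codec_name(unsigned codec)
{
    static const char *names[] = {
        "blosclz", "lz4 or lz4hc", "snappy", "zlib", "zstd"
    };
    assert(codec < 8u);
    return codec < 5u ? names[codec] : "unknown";
}
```

For example, a flags byte of `0x81` decodes as byte-shuffle enabled with compressor enumeration 4, which the table above maps to `zstd`.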
thirdparty/blosc/RELEASE_NOTES.rst
r00587dc r981e22c 1 =========================== ====2 Release notes for Blosc 1.2.33 =========================== ====1 =========================== 2 Release notes for C-Blosc 3 =========================== 4 4 5 5 :Author: Francesc Alted 6 :Contact: f [email protected]6 :Contact: f[email protected] 7 7 :URL: http://www.blosc.org 8 9 10 Changes from 1.10.0 to 1.10.1 11 ============================= 12 13 #XXX version-specific blurb XXX# 14 15 16 Changes from 1.9.3 to 1.10.0 17 ============================ 18 19 - Initial support for Zstandard (0.7.4). Zstandard (or Zstd for short) is a new 20 compression library that allows better compression than Zlib, but that works 21 typically faster (and some times much faster), making of it a good match for 22 Blosc. 23 24 Although the Zstd format is considered stable 25 (http://fastcompression.blogspot.com.es/2016_07_03_archive.html), its API is 26 maturing very fast, and despite passing the extreme test suite for C-Blosc, 27 this codec should be considered in beta for C-Blosc usage purposes. Please 28 test it and report back any possible issues you may get. 29 30 31 Changes from 1.9.2 to 1.9.3 32 =========================== 33 34 - Reverted a mistake introduced in 1.7.1. At that time, bit-shuffling 35 was enabled for typesize == 1 (i.e. strings), but the change also 36 included byte-shuffling accidentally. This only affected performance, 37 but in a quite bad way (a copy was needed). This has been fixed and 38 byte-shuffling is not active when typesize == 1 anymore. 39 40 41 Changes from 1.9.1 to 1.9.2 42 =========================== 43 44 - Check whether Blosc is actually initialized before blosc_init(), 45 blosc_destroy() and blosc_free_resources(). This makes the library 46 more resistant to different initialization cycles 47 (e.g. https://github.com/stevengj/Blosc.jl/issues/19). 48 49 50 Changes from 1.9.0 to 1.9.1 51 =========================== 52 53 - The internal copies when clevel=0 are made now via memcpy(). 
At the 54 beginning of C-Blosc development, benchmarks where saying that the 55 internal, multi-threaded copies inside C-Blosc were faster than 56 memcpy(), but 6 years later, memcpy() made greats strides in terms 57 of efficiency. With this, you should expect an slight speed 58 advantage (10% ~ 20%) when C-Blosc is used as a replacement of 59 memcpy() (which should not be the most common scenario out there). 60 61 - Added a new DEACTIVATE_AVX2 cmake option to explicitly disable AVX2 62 at build-time. Thanks to James Bird. 63 64 - The ``make -jN`` for parallel compilation should work now. Thanks 65 to James Bird. 66 67 68 Changes from 1.8.1 to 1.9.0 69 =========================== 70 71 * New blosc_get_nthreads() function to get the number of threads that 72 will be used internally during compression/decompression (set by 73 already existing blosc_set_nthreads()). 74 75 * New blosc_get_compressor() function to get the compressor that will 76 be used internally during compression (set by already existing 77 blosc_set_compressor()). 78 79 * New blosc_get_blocksize() function to get the internal blocksize to 80 be used during compression (set by already existing 81 blosc_set_blocksize()). 82 83 * Now, when the BLOSC_NOLOCK environment variable is set (to any 84 value), the calls to blosc_compress() and blosc_decompress() will 85 call blosc_compress_ctx() and blosc_decompress_ctx() under the hood 86 so as to avoid the internal locks. See blosc.h for details. This 87 allows multi-threaded apps calling the non _ctx() functions to avoid 88 the internal locks in C-Blosc. For the not multi-threaded app 89 though, it is in general slower to call the _ctx() functions so the 90 use of BLOSC_NOLOCK is discouraged. 91 92 * In the same vein, from now on, when the BLOSC_NTHREADS environment 93 variable is set to an integer, every call to blosc_compress() and 94 blosc_decompress() will call blosc_set_nthreads(BLOSC_NTHREADS) 95 before the actuall compression/decompression process. 
See blosc.h 96 for details. 97 98 * Finally, if BLOSC_CLEVEL, BLOSC_SHUFFLE, BLOSC_TYPESIZE and/or 99 BLOSC_COMPRESSOR variables are set in the environment, these will be 100 also honored before calling blosc_compress(). 101 102 * Calling blosc_init() before any other Blosc call, although 103 recommended, is not necessary anymore. The idea is that you can use 104 just the basic blosc_compress() and blosc_decompress() and control 105 other parameters (nthreads, compressor, blocksize) by using 106 environment variables (see above). 107 108 109 Changes from 1.8.0 to 1.8.1 110 =========================== 111 112 * Disable the use of __builtin_cpu_supports() for GCC 5.3.1 113 compatibility. Details in: 114 https://lists.fedoraproject.org/archives/list/[email protected]/thread/ZM2L65WIZEEQHHLFERZYD5FAG7QY2OGB/ 115 116 117 Changes from 1.7.1 to 1.8.0 118 =========================== 119 120 * The code is (again) compatible with VS2008 and VS2010. This is 121 important for compatibility with Python 2.6/2.7/3.3/3.4. 122 123 * Introduced a new global lock during blosc_decompress() operation. 124 As the blosc_compress() was already guarded by a global lock, this 125 means that the compression/decompression is again thread safe. 126 However, when using C-Blosc from multi-threaded environments, it is 127 important to keep using the *_ctx() functions for performance 128 reasons. NOTE: _ctx() functions will be replaced by more powerful 129 ones in C-Blosc 2.0. 130 131 132 Changes from 1.7.0 to 1.7.1 133 =========================== 134 135 * Fixed a bug preventing bitshuffle to work correctly on getitem(). 136 Now, everything with bitshuffle seems to work correctly. 137 138 * Fixed the thread initialization for blosc_decompress_ctx(). Issue 139 #158. Thanks to Chris Webers. 140 141 * Fixed a bug in the blocksize computation introduced in 1.7.0. This 142 could have been creating segfaults. 143 144 * Allow bitshuffle to run on 1-byte typesizes. 
145 146 * New parametrization of the blocksize to be independent of the 147 typesize. This allows a smoother speed throughout all typesizes. 148 149 * lz4 and lz4hc codecs upgraded to 1.7.2 (from 1.7.0). 150 151 * When calling set_nthreads() but not actually changing the number of 152 threads in the internal pool does not teardown and setup it anymore. 153 PR #153. Thanks to Santi Villalba. 154 155 156 Changes from 1.6.1 to 1.7.0 157 =========================== 158 159 * Added a new 'bitshuffle' filter so that the shuffle takes place at a 160 bit level and not just at a byte one, which is what it does the 161 previous 'shuffle' filter. 162 163 For activating this new bit-level filter you only have to pass the 164 symbol BLOSC_BITSHUFFLE to `blosc_compress()`. For the previous 165 byte-level one, pass BLOSC_SHUFFLE. For disabling the shuffle, pass 166 BLOSC_NOSHUFFLE. 167 168 This is a port of the existing filter in 169 https://github.com/kiyo-masui/bitshuffle. Thanks to Kiyo Masui for 170 changing the license and allowing its inclusion here. 171 172 * New acceleration mode for LZ4 and BloscLZ codecs that enters in 173 operation with complevel < 9. This allows for an important boost in 174 speed with minimal compression ratio loss. Francesc Alted. 175 176 * LZ4 codec updated to 1.7.0 (r130). 177 178 * PREFER_EXTERNAL_COMPLIBS cmake option has been removed and replaced 179 by the more fine grained PREFER_EXTERNAL_LZ4, PREFER_EXTERNAL_SNAPPY 180 and PREFER_EXTERNAL_ZLIB. In order to allow the use of the new API 181 introduced in LZ4 1.7.0, PREFER_EXTERNAL_LZ4 has been set to OFF by 182 default, whereas PREFER_EXTERNAL_SNAPPY and PREFER_EXTERNAL_ZLIB 183 continues to be ON. 184 185 * Implemented SSE2 shuffle support for buffers containing a number of 186 elements which is not a multiple of (typesize * vectorsize). Jack 187 Pappas. 188 189 * Added SSE2 shuffle/unshuffle routines for types larger than 16 190 bytes. Jack Pappas. 
191 192 * 'test_basic' suite has been split in components for a much better 193 granularity on what's a possibly failing test. Also, lots of new 194 tests have been added. Jack Pappas. 195 196 * Fixed compilation on non-Intel archs (tested on ARM). Zbyszek 197 Szmek. 198 199 * Modifyied cmake files in order to inform that AVX2 on Visual Studio 200 is supported only in 2013 update 2 and higher. 201 202 * Added a replacement for stdbool.h for Visual Studio < 2013. 203 204 * blosclz codec adds Win64/Intel as a platform supporting unaligned 205 addressing. That leads to a speed-up of 2.2x in decompression. 206 207 * New blosc_get_version_string() function for retrieving the version 208 of the c-blosc library. Useful when linking with dynamic libraries 209 and one want to know its version. 210 211 * New example (win-dynamic-linking.c) that shows how to link a Blosc 212 DLL dynamically in run-time (Windows only). 213 214 * The `context.threads_started` is initialized now when decompressing. 215 This could cause crashes in case you decompressed before compressing 216 (e.g. directly deserializing blosc buffers). @atchouprakov. 217 218 * The HDF5 filter has been removed from c-blosc and moved into its own 219 repo at: https://github.com/Blosc/hdf5 220 221 * The MS Visual Studio 2008 has been tested with c-blosc for ensuring 222 compatibility with extensions for Python 2.6 and up. 223 224 225 Changes from 1.6.0 to 1.6.1 226 =========================== 227 228 * Support for *runtime* detection of AVX2 and SSE2 SIMD instructions. 229 These changes make it possible to compile one single binary that 230 runs on a system that supports SSE2 or AVX2 (or neither), so the 231 redistribution problem is fixed (see #101). Thanks to Julian Taylor 232 and Jack Pappas. 233 234 * Added support for MinGW and TDM-GCC compilers for Windows. Thanks 235 to yasushima-gd. 236 237 * Fixed a bug in blosclz that could potentially overwrite an area 238 beyond the output buffer. See #113. 
239 240 * New computation for blocksize so that larger typesizes (> 8 bytes) 241 benefit from much better compression ratios. Speed is not 242 penalized too much. 243 244 * New parametrization of the hash table for blosclz codec. This 245 allows better compression in many scenarios, while slightly 246 increasing the speed. 247 248 249 Changes from 1.5.4 to 1.6.0 250 =========================== 251 252 * Support for AVX2 is here! The benchmarks with a 4-core Intel 253 Haswell machine show that both compression and decompression are 254 accelerated by around 10%, reaching peaks of 9.6 GB/s during 255 compression and 26 GB/s during decompression (memcpy() speed for 256 this machine is 7.5 GB/s for writes and 11.7 GB/s for reads). Many 257 thanks to @littlezhou for this nice work. 258 259 * Support for HPET (high precision timers) for the `bench` program. 260 This is particularly important for microbenchmarks like the ones bench 261 runs; since they take so little time to run, the granularity of a 262 less-accurate timer may account for a significant portion of the 263 runtime of the benchmark itself, skewing the results. Thanks to 264 Jack Pappas. 265 266 267 Changes from 1.5.3 to 1.5.4 268 =========================== 269 270 * Updated to LZ4 1.6.0 (r128). 271 272 * Fix resource leak in t_blosc. Jack Pappas. 273 274 * Better checks during testing. Jack Pappas. 275 276 * Dynamically loadable HDF5 filter plugin. Kiyo Masui. 277 278 279 Changes from 1.5.2 to 1.5.3 280 =========================== 281 282 * Use llabs function (where available) instead of abs to avoid 283 truncating the result. Jack Pappas. 284 285 * Use C11 aligned_alloc when it's available. Jack Pappas. 286 287 * Use the built-in stdint.h with MSVC when available. Jack Pappas. 288 289 * Only define the __SSE2__ symbol when compiling with MS Visual C++ 290 and targeting x64 or x86 with the correct /arch flag set. This 291 avoids re-defining the symbol which makes other compilers issue 292 warnings. 
Jack Pappas. 293 294 * Reinitializing Blosc during a call to set_nthreads() so as to fix 295 problems with contexts. Francesc Alted. 296 297 298 299 Changes from 1.5.1 to 1.5.2 300 =========================== 301 302 * Using blosc_compress_ctx() / blosc_decompress_ctx() inside the HDF5 303 compressor to allow operation in multiprocess scenarios. See: 304 https://github.com/PyTables/PyTables/issues/412 305 306 The drawback of this quick fix is that the Blosc filter will only be 307 able to use a single thread until another solution can be devised. 308 309 310 Changes from 1.5.0 to 1.5.1 311 =========================== 312 313 * Updated to LZ4 1.5.0. Closes #74. 314 315 * Added the 'const' qualifier to non SSE2 shuffle functions. Closes #75. 316 317 * Explicitly call blosc_init() in HDF5 blosc_filter.c, fixing a 318 segfault. 319 320 * Quite a few improvements in cmake files for HDF5 support. Thanks to 321 Dana Robinson (The HDF Group). 322 323 * Variable 'class' caused problems compiling the HDF5 filter with g++. 324 Thanks to Laurent Chapon. 325 326 * Small improvements on docstrings of c-blosc main functions. 327 328 329 Changes from 1.4.1 to 1.5.0 330 =========================== 331 332 * Added new calls for allowing Blosc to be used *simultaneously* 333 (i.e. lock free) from multi-threaded environments. The new 334 functions are: 335 336 - blosc_compress_ctx(...) 337 - blosc_decompress_ctx(...) 338 339 See the new docstrings in blosc.h for how to use them. The previous 340 API should be completely unaffected. Thanks to Christopher Speller. 341 342 * Optimized copies during BloscLZ decompression. This can make BloscLZ 343 decompress up to 1.5x faster in some situations. 344 345 * LZ4 and LZ4HC compressors updated to version 1.3.1. 346 347 * Added an examples directory on how to link apps with Blosc. 348 349 * stdlib.h moved from blosc.c to blosc.h as suggested by Rob Lathm. 350 351 * Fix a warning for {snappy,lz4}-free compilation. Thanks to Andrew Schaaf. 
352 353 * Several improvements for CMakeLists.txt (cmake). 354 355 * Fixing C99 compatibility warnings. Thanks to Christopher Speller. 356 357 358 Changes from 1.4.0 to 1.4.1 359 =========================== 360 361 * Fixed a bug in blosc_getitem() introduced in 1.4.0. Added a test for 362 blosc_getitem() as well. 363 364 365 Changes from 1.3.6 to 1.4.0 366 =========================== 367 368 * Support for non-Intel and non-SSE2 architectures has been added. In 369 particular, the Raspberry Pi platform (ARM) has been tested and all 370 tests pass here. 371 372 * Architectures requiring strict access alignment are supported as well. 373 Due to this, architectures with a high penalty in accessing unaligned 374 data (e.g. Raspberry Pi, ARMv6) can compress up to 2.5x faster. 375 376 * LZ4 has been updated to r119 (1.2.0) so as to fix a possible security 377 breach. 378 379 380 Changes from 1.3.5 to 1.3.6 381 =========================== 382 383 * Updated to LZ4 r118 due to a (highly unlikely) security hole. For 384 details see: 385 386 http://fastcompression.blogspot.fr/2014/06/debunking-lz4-20-years-old-bug-myth.html 387 388 389 Changes from 1.3.4 to 1.3.5 390 =========================== 391 392 * Removed a 'pointer from integer without a cast' compiler 393 warning due to a bad macro definition. 394 395 396 Changes from 1.3.3 to 1.3.4 397 =========================== 398 399 * Fixed a false buffer overrun condition. This bug made c-blosc 400 fail, even if the failure was not real. 401 402 * Fixed the type of a buffer string. 403 404 405 Changes from 1.3.2 to 1.3.3 406 =========================== 407 408 * Updated to LZ4 1.1.3 (improved speed for 32-bit platforms). 409 410 * Added a new `blosc_cbuffer_complib()` for getting the compression 411 library for a compressed buffer. 412 413 414 Changes from 1.3.1 to 1.3.2 415 =========================== 416 417 * Fix for compiling Snappy sources against MSVC 2008. Thanks to Mark 418 Wiebe! 
419 420 * Version info for the internal LZ4 and Snappy is now supported. When compiled 421 against the external libraries, this info is not available because 422 they do not support the symbols (yet). 423 424 425 Changes from 1.3.0 to 1.3.1 426 =========================== 427 428 * Fixes for a series of issues with the filter for HDF5 and, in 429 particular, a problem in the decompression buffer size that made it 430 impossible to use the blosc_filter in combination with other ones 431 (e.g. fletcher32). See 432 https://github.com/PyTables/PyTables/issues/21. 433 434 Thanks to Antonio Valentino for the fix! 435 436 437 Changes from 1.2.4 to 1.3.0 438 =========================== 439 440 A nice handful of compressors have been added to Blosc: 441 442 * LZ4 (http://code.google.com/p/lz4/): A very fast 443 compressor/decompressor. It can be thought of as a replacement for the 444 original BloscLZ, and it can behave better in some scenarios. 445 446 * LZ4HC (http://code.google.com/p/lz4/): This is a variation of LZ4 447 that achieves much better compression ratio at the cost of being 448 much slower for compressing. Decompression speed is unaffected (and 449 sometimes better than when using LZ4 itself!), so this is very good 450 for read-only datasets. 451 452 * Snappy (http://code.google.com/p/snappy/): A very fast 453 compressor/decompressor. It can be thought of as a replacement for the 454 original BloscLZ, and it can behave better in some scenarios. 455 456 * Zlib (http://www.zlib.net/): This is a classic. It achieves very 457 good compression ratios, at the cost of speed. However, 458 decompression speed is still pretty good, so it is a good candidate 459 for read-only datasets. 460 461 With this, you can select the compression library with the new 462 function:: 463 464 int blosc_set_complib(char* complib); 465 466 where you pass the library that you want to use (currently "blosclz", 467 "lz4", "lz4hc", "snappy" and "zlib", but the list can grow in the 468 future). 
469 470 You can get more info about compressor support in your Blosc build by 471 using these functions:: 472 473 char* blosc_list_compressors(void); 474 int blosc_get_complib_info(char *compressor, char **complib, char **version); 8 475 9 476 … … 245 712 necessary on Mac because 16 bytes alignment is ensured by default. 246 713 Thanks to Ivan Vilata. Fixes #3. 247 248 249 250 251 .. Local Variables:252 .. mode: rst253 .. coding: utf-8254 .. fill-column: 72255 .. End: -
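As a side note to the 1.7.0 entry in the release notes above, the byte-level 'shuffle' filter can be sketched in plain C. This is only a minimal scalar illustration of the idea (gather byte k of every element into one contiguous run, so that similar bytes sit together before compression). It is not the SSE2/AVX2 code shipped in shuffle.c, and the names `byte_shuffle` and `byte_unshuffle` are hypothetical, chosen for this sketch. It assumes the buffer length is an exact multiple of the type size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the Blosc API): byte-level shuffle.
 * Writes byte k of every element into the k-th run of `dest`.
 * Assumes nbytes is a multiple of typesize. */
static void byte_shuffle(size_t typesize, size_t nbytes,
                         const uint8_t *src, uint8_t *dest)
{
    assert(typesize > 0 && nbytes % typesize == 0);
    if (typesize == 1) {
        /* Shuffling single-byte elements changes nothing. */
        memcpy(dest, src, nbytes);
        return;
    }
    size_t nelem = nbytes / typesize;
    for (size_t k = 0; k < typesize; k++) {
        for (size_t i = 0; i < nelem; i++) {
            dest[k * nelem + i] = src[i * typesize + k];
        }
    }
}

/* Hypothetical helper: exact inverse of byte_shuffle(). */
static void byte_unshuffle(size_t typesize, size_t nbytes,
                           const uint8_t *src, uint8_t *dest)
{
    assert(typesize > 0 && nbytes % typesize == 0);
    if (typesize == 1) {
        memcpy(dest, src, nbytes);
        return;
    }
    size_t nelem = nbytes / typesize;
    for (size_t k = 0; k < typesize; k++) {
        for (size_t i = 0; i < nelem; i++) {
            dest[i * typesize + k] = src[k * nelem + i];
        }
    }
}
```

With a type size of 4 and the two elements `01 02 03 04` and `11 12 13 14`, the shuffled buffer would read `01 11 02 12 03 13 04 14`. The 'bitshuffle' filter applies the same kind of transposition at the level of individual bits rather than bytes.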
thirdparty/blosc/RELEASING.rst
r00587dc r981e22c 4 4 5 5 :Author: Francesc Alted 6 :Contact: f [email protected]7 :Date: 201 2-09-166 :Contact: f[email protected] 7 :Date: 2014-01-15 8 8 9 9 … … 16 16 - Check that *VERSION* symbols in blosc/blosc.h contains the correct info. 17 17 18 - Commit the changes:: 19 20 $ git commit -a -m"Getting ready for X.Y.Z release" 21 22 18 23 Testing 19 24 ------- 20 25 21 Go to the test/ directoryand issue::26 Create a new build/ directory, change into it and issue:: 22 27 23 $ make test 28 $ cmake .. 29 $ cmake --build . 30 $ ctest 24 31 25 These tests are very basic, and only valid for platforms where GNU 26 make/gcc tools are available. To actually test Blosc the hard way, 27 look at the end of: 32 To actually test Blosc the hard way, look at the end of: 28 33 29 http://blosc.org/ trac/wiki/SyntheticBenchmarks34 http://blosc.org/synthetic-benchmarks.html 30 35 31 36 where instructions on how to intensively test (and benchmark) Blosc 32 37 are given. 33 34 Packaging35 ---------36 37 - Unpack the archive of the repository in a temporary directory::38 39 $ export VERSION="the version number"40 $ mkdir /tmp/blosc-$VERSION41 # IMPORTANT: make sure that you are at the root of the repo now!42 $ git archive master | tar -x -C /tmp/blosc-$VERSION43 44 - And package the repo::45 46 $ cd /tmp47 $ tar cvfz blosc-$VERSION.tar.gz blosc-$VERSION48 49 Do a quick check that the tarball is sane.50 51 52 Uploading53 ---------54 55 - Go to the downloads section in blosc.org and upload the source56 tarball.57 38 58 39 … … 72 53 ---------- 73 54 74 - Update the release notes in the github wiki: 75 76 https://github.com/FrancescAlted/blosc/wiki/Release-notes 77 78 - Send an announcement to the blosc, pytables, carray and 55 - Send an announcement to the blosc, pytables-dev, bcolz and 79 56 comp.compression lists. Use the ``ANNOUNCE.rst`` file as skeleton 80 57 (possibly as the definitive version). 
58 81 59 82 60 Post-release actions … … 87 65 88 66 - Create new headers for adding new features in ``RELEASE_NOTES.rst`` 89 and empty the release-specific information in ``ANNOUNCE.rst`` and 90 add this place-holder instead: 67 and add this place-holder instead: 91 68 92 69 #XXX version-specific blurb XXX# 70 71 - Commit the changes:: 72 73 $ git commit -a -m"Post X.Y.Z release actions done" 74 $ git push 93 75 94 76 -
thirdparty/blosc/blosc.c
r00587dc r981e22c 1 1 /********************************************************************* 2 Blosc - Blocked S uffling and Compression Library3 4 Author: Francesc Alted <f [email protected]>2 Blosc - Blocked Shuffling and Compression Library 3 4 Author: Francesc Alted <f[email protected]> 5 5 Creation date: 2009-05-20 6 6 … … 9 9 10 10 11 #include <stdio.h> 11 12 #include <stdlib.h> 12 #include < stdio.h>13 #include <errno.h> 13 14 #include <string.h> 14 15 #include <sys/types.h> 15 16 #include <sys/stat.h> 16 17 #include <assert.h> 18 #if defined(USING_CMAKE) 19 #include "config.h" 20 #endif /* USING_CMAKE */ 17 21 #include "blosc.h" 22 #include "shuffle.h" 18 23 #include "blosclz.h" 19 #include "shuffle.h" 24 #if defined(HAVE_LZ4) 25 #include "lz4.h" 26 #include "lz4hc.h" 27 #endif /* HAVE_LZ4 */ 28 #if defined(HAVE_SNAPPY) 29 #include "snappy-c.h" 30 #endif /* HAVE_SNAPPY */ 31 #if defined(HAVE_ZLIB) 32 #include "zlib.h" 33 #endif /* HAVE_ZLIB */ 34 #if defined(HAVE_ZSTD) 35 #include "zstd.h" 36 #endif /* HAVE_ZSTD */ 20 37 21 38 #if defined(_WIN32) && !defined(__MINGW32__) 22 39 #include <windows.h> 23 #include "win32/stdint-windows.h" 40 #include <malloc.h> 41 42 /* stdint.h only available in VS2010 (VC++ 16.0) and newer */ 43 #if defined(_MSC_VER) && _MSC_VER < 1600 44 #include "win32/stdint-windows.h" 45 #else 46 #include <stdint.h> 47 #endif 48 24 49 #include <process.h> 25 50 #define getpid _getpid … … 30 55 #endif /* _WIN32 */ 31 56 32 #if defined(_WIN32) 57 #if defined(_WIN32) && !defined(__GNUC__) 33 58 #include "win32/pthread.h" 34 59 #include "win32/pthread.c" … … 37 62 #endif 38 63 64 /* If C11 is supported, use it's built-in aligned allocation. */ 65 #if __STDC_VERSION__ >= 201112L 66 #include <stdalign.h> 67 #endif 68 39 69 40 70 /* Some useful units */ … … 50 80 /* The size of L1 cache. 32 KB is quite common nowadays. 
*/ 51 81 #define L1 (32*KB) 52 53 /* Wrapped function to adjust the number of threads used by blosc */54 int blosc_set_nthreads_(int);55 56 /* Global variables for main logic */57 static int32_t init_temps_done = 0; /* temp for compr/decompr initialized? */58 static int32_t force_blocksize = 0; /* force the use of a blocksize? */59 static int pid = 0; /* the PID for this process */60 static int init_lib = 0; /* is library initalized? */61 62 /* Global variables for threads */63 static int32_t nthreads = 1; /* number of desired threads in pool */64 static int32_t init_threads_done = 0; /* pool of threads initialized? */65 static int32_t end_threads = 0; /* should exisiting threads end? */66 static int32_t init_sentinels_done = 0; /* sentinels initialized? */67 static int32_t giveup_code; /* error code when give up */68 static int32_t nblock; /* block counter */69 static pthread_t threads[BLOSC_MAX_THREADS]; /* opaque structure for threads */70 static int32_t tids[BLOSC_MAX_THREADS]; /* ID per each thread */71 #if !defined(_WIN32)72 static pthread_attr_t ct_attr; /* creation time attrs for threads */73 #endif74 82 75 83 /* Have problems using posix barriers when symbol value is 200112L */ … … 78 86 #define _POSIX_BARRIERS_MINE 79 87 #endif 80 81 88 /* Synchronization variables */ 82 static pthread_mutex_t count_mutex; 89 90 91 struct blosc_context { 92 int32_t compress; /* 1 if we are doing compression 0 if decompress */ 93 94 const uint8_t* src; 95 uint8_t* dest; /* The current pos in the destination buffer */ 96 uint8_t* header_flags; /* Flags for header. Currently booked: 97 - 0: byte-shuffled? 98 - 1: memcpy'ed? 99 - 2: bit-shuffled? 
*/ 100 int32_t sourcesize; /* Number of bytes in source buffer (or uncompressed bytes in compressed file) */ 101 int32_t nblocks; /* Number of total blocks in buffer */ 102 int32_t leftover; /* Extra bytes at end of buffer */ 103 int32_t blocksize; /* Length of the block in bytes */ 104 int32_t typesize; /* Type size */ 105 int32_t num_output_bytes; /* Counter for the number of output bytes */ 106 int32_t destsize; /* Maximum size for destination buffer */ 107 uint8_t* bstarts; /* Start of the buffer past header info */ 108 int32_t compcode; /* Compressor code to use */ 109 int clevel; /* Compression level (1-9) */ 110 111 /* Threading */ 112 int32_t numthreads; 113 int32_t threads_started; 114 int32_t end_threads; 115 pthread_t threads[BLOSC_MAX_THREADS]; 116 int32_t tids[BLOSC_MAX_THREADS]; 117 pthread_mutex_t count_mutex; 118 #ifdef _POSIX_BARRIERS_MINE 119 pthread_barrier_t barr_init; 120 pthread_barrier_t barr_finish; 121 #else 122 int32_t count_threads; 123 pthread_mutex_t count_threads_mutex; 124 pthread_cond_t count_threads_cv; 125 #endif 126 #if !defined(_WIN32) 127 pthread_attr_t ct_attr; /* creation time attrs for threads */ 128 #endif 129 int32_t thread_giveup_code; /* error code when give up */ 130 int32_t thread_nblock; /* block counter */ 131 }; 132 133 struct thread_context { 134 struct blosc_context* parent_context; 135 int32_t tid; 136 uint8_t* tmp; 137 uint8_t* tmp2; 138 uint8_t* tmp3; 139 int32_t tmpblocksize; /* Used to keep track of how big the temporary buffers are */ 140 }; 141 142 /* Global context for non-contextual API */ 143 static struct blosc_context* g_global_context; 83 144 static pthread_mutex_t global_comp_mutex; 84 #ifdef _POSIX_BARRIERS_MINE 85 static pthread_barrier_t barr_init; 86 static pthread_barrier_t barr_finish; 87 #else 88 static int32_t count_threads; 89 static pthread_mutex_t count_threads_mutex; 90 static pthread_cond_t count_threads_cv; 91 #endif 92 93 94 /* Structure for parameters in (de-)compression threads */ 95 
static struct thread_data { 96 int32_t typesize; 97 int32_t blocksize; 98 int32_t compress; 99 int32_t clevel; 100 int32_t flags; 101 int32_t memcpyed; 102 int32_t ntbytes; 103 int32_t nbytes; 104 int32_t maxbytes; 105 int32_t nblocks; 106 int32_t leftover; 107 int32_t *bstarts; /* start pointers for each block */ 108 uint8_t *src; 109 uint8_t *dest; 110 uint8_t *tmp[BLOSC_MAX_THREADS]; 111 uint8_t *tmp2[BLOSC_MAX_THREADS]; 112 } params; 113 114 115 /* Structure for parameters meant for keeping track of current temporaries */ 116 static struct temp_data { 117 int32_t nthreads; 118 int32_t typesize; 119 int32_t blocksize; 120 } current_temp; 121 145 static int32_t g_compressor = BLOSC_BLOSCLZ; /* the compressor to use by default */ 146 static int32_t g_threads = 1; 147 static int32_t g_force_blocksize = 0; 148 static int32_t g_initlib = 0; 149 150 151 152 /* Wrapped function to adjust the number of threads used by blosc */ 153 int blosc_set_nthreads_(struct blosc_context*); 154 155 /* Releases the global threadpool */ 156 int blosc_release_threadpool(struct blosc_context* context); 122 157 123 158 /* Macros for synchronization */ … … 125 160 /* Wait until all threads are initialized */ 126 161 #ifdef _POSIX_BARRIERS_MINE 127 static int rc; 128 #define WAIT_INIT \ 129 rc = pthread_barrier_wait(&barr_init); \ 162 #define WAIT_INIT(RET_VAL, CONTEXT_PTR) \ 163 rc = pthread_barrier_wait(&CONTEXT_PTR->barr_init); \ 130 164 if (rc != 0 && rc != PTHREAD_BARRIER_SERIAL_THREAD) { \ 131 printf("Could not wait on barrier (init) \n"); \132 return( -1);\165 printf("Could not wait on barrier (init): %d\n", rc); \ 166 return((RET_VAL)); \ 133 167 } 134 168 #else 135 #define WAIT_INIT \136 pthread_mutex_lock(& count_threads_mutex); \137 if ( count_threads < nthreads) { \138 count_threads++;\139 pthread_cond_wait(& count_threads_cv, &count_threads_mutex); \169 #define WAIT_INIT(RET_VAL, CONTEXT_PTR) \ 170 pthread_mutex_lock(&CONTEXT_PTR->count_threads_mutex); \ 171 if 
(CONTEXT_PTR->count_threads < CONTEXT_PTR->numthreads) { \ 172 CONTEXT_PTR->count_threads++; \ 173 pthread_cond_wait(&CONTEXT_PTR->count_threads_cv, &CONTEXT_PTR->count_threads_mutex); \ 140 174 } \ 141 175 else { \ 142 pthread_cond_broadcast(& count_threads_cv); \176 pthread_cond_broadcast(&CONTEXT_PTR->count_threads_cv); \ 143 177 } \ 144 pthread_mutex_unlock(& count_threads_mutex);178 pthread_mutex_unlock(&CONTEXT_PTR->count_threads_mutex); 145 179 #endif 146 180 147 181 /* Wait for all threads to finish */ 148 182 #ifdef _POSIX_BARRIERS_MINE 149 #define WAIT_FINISH \150 rc = pthread_barrier_wait(& barr_finish); \183 #define WAIT_FINISH(RET_VAL, CONTEXT_PTR) \ 184 rc = pthread_barrier_wait(&CONTEXT_PTR->barr_finish); \ 151 185 if (rc != 0 && rc != PTHREAD_BARRIER_SERIAL_THREAD) { \ 152 186 printf("Could not wait on barrier (finish)\n"); \ 153 return( -1);\187 return((RET_VAL)); \ 154 188 } 155 189 #else 156 #define WAIT_FINISH \157 pthread_mutex_lock(& count_threads_mutex); \158 if ( count_threads > 0) { \159 count_threads--; \160 pthread_cond_wait(& count_threads_cv, &count_threads_mutex); \190 #define WAIT_FINISH(RET_VAL, CONTEXT_PTR) \ 191 pthread_mutex_lock(&CONTEXT_PTR->count_threads_mutex); \ 192 if (CONTEXT_PTR->count_threads > 0) { \ 193 CONTEXT_PTR->count_threads--; \ 194 pthread_cond_wait(&CONTEXT_PTR->count_threads_cv, &CONTEXT_PTR->count_threads_mutex); \ 161 195 } \ 162 196 else { \ 163 pthread_cond_broadcast(& count_threads_cv); \197 pthread_cond_broadcast(&CONTEXT_PTR->count_threads_cv); \ 164 198 } \ 165 pthread_mutex_unlock(& count_threads_mutex);199 pthread_mutex_unlock(&CONTEXT_PTR->count_threads_mutex); 166 200 #endif 167 201 … … 173 207 int res = 0; 174 208 175 #if defined(_WIN32) 209 /* Do an alignment to 32 bytes because AVX2 is supported */ 210 #if _ISOC11_SOURCE 211 /* C11 aligned allocation. 'size' must be a multiple of the alignment. 
*/ 212 block = aligned_alloc(32, size); 213 #elif defined(_WIN32) 176 214 /* A (void *) cast needed for avoiding a warning with MINGW :-/ */ 177 block = (void *)_aligned_malloc(size, 16);215 block = (void *)_aligned_malloc(size, 32); 178 216 #elif defined __APPLE__ 179 217 /* Mac OS X guarantees 16-byte alignment in small allocs */ … … 181 219 #elif _POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600 182 220 /* Platform does have an implementation of posix_memalign */ 183 res = posix_memalign(&block, 16, size);221 res = posix_memalign(&block, 32, size); 184 222 #else 185 223 block = malloc(size); … … 206 244 207 245 208 /* If `a` is little-endian, return it as-is. If not, return a copy, 209 with the endianness changed */ 210 static int32_t sw32(int32_t a) 211 { 212 int32_t tmp; 213 char *pa = (char *)&a; 214 char *ptmp = (char *)&tmp; 246 /* Copy 4 bytes from `*pa` to int32_t, changing endianness if necessary. */ 247 static int32_t sw32_(const uint8_t *pa) 248 { 249 int32_t idest; 250 uint8_t *dest = (uint8_t *)&idest; 215 251 int i = 1; /* for big/little endian detection */ 216 252 char *p = (char *)&i; … … 218 254 if (p[0] != 1) { 219 255 /* big endian */ 220 ptmp[0] = pa[3]; 221 ptmp[1] = pa[2]; 222 ptmp[2] = pa[1]; 223 ptmp[3] = pa[0]; 224 return tmp; 256 dest[0] = pa[3]; 257 dest[1] = pa[2]; 258 dest[2] = pa[1]; 259 dest[3] = pa[0]; 225 260 } 226 261 else { 227 262 /* little endian */ 228 return a; 229 } 230 } 231 263 dest[0] = pa[0]; 264 dest[1] = pa[1]; 265 dest[2] = pa[2]; 266 dest[3] = pa[3]; 267 } 268 return idest; 269 } 270 271 272 /* Copy 4 bytes from `*pa` to `*dest`, changing endianness if necessary. 
*/ 273 static void _sw32(uint8_t* dest, int32_t a) 274 { 275 uint8_t *pa = (uint8_t *)&a; 276 int i = 1; /* for big/little endian detection */ 277 char *p = (char *)&i; 278 279 if (p[0] != 1) { 280 /* big endian */ 281 dest[0] = pa[3]; 282 dest[1] = pa[2]; 283 dest[2] = pa[1]; 284 dest[3] = pa[0]; 285 } 286 else { 287 /* little endian */ 288 dest[0] = pa[0]; 289 dest[1] = pa[1]; 290 dest[2] = pa[2]; 291 dest[3] = pa[3]; 292 } 293 } 294 295 296 /* 297 * Conversion routines between compressor and compression libraries 298 */ 299 300 /* Return the library code associated with the compressor name */ 301 static int compname_to_clibcode(const char *compname) 302 { 303 if (strcmp(compname, BLOSC_BLOSCLZ_COMPNAME) == 0) 304 return BLOSC_BLOSCLZ_LIB; 305 if (strcmp(compname, BLOSC_LZ4_COMPNAME) == 0) 306 return BLOSC_LZ4_LIB; 307 if (strcmp(compname, BLOSC_LZ4HC_COMPNAME) == 0) 308 return BLOSC_LZ4_LIB; 309 if (strcmp(compname, BLOSC_SNAPPY_COMPNAME) == 0) 310 return BLOSC_SNAPPY_LIB; 311 if (strcmp(compname, BLOSC_ZLIB_COMPNAME) == 0) 312 return BLOSC_ZLIB_LIB; 313 if (strcmp(compname, BLOSC_ZSTD_COMPNAME) == 0) 314 return BLOSC_ZSTD_LIB; 315 return -1; 316 } 317 318 /* Return the library name associated with the compressor code */ 319 static char *clibcode_to_clibname(int clibcode) 320 { 321 if (clibcode == BLOSC_BLOSCLZ_LIB) return BLOSC_BLOSCLZ_LIBNAME; 322 if (clibcode == BLOSC_LZ4_LIB) return BLOSC_LZ4_LIBNAME; 323 if (clibcode == BLOSC_SNAPPY_LIB) return BLOSC_SNAPPY_LIBNAME; 324 if (clibcode == BLOSC_ZLIB_LIB) return BLOSC_ZLIB_LIBNAME; 325 if (clibcode == BLOSC_ZSTD_LIB) return BLOSC_ZSTD_LIBNAME; 326 return NULL; /* should never happen */ 327 } 328 329 330 /* 331 * Conversion routines between compressor names and compressor codes 332 */ 333 334 /* Get the compressor name associated with the compressor code */ 335 int blosc_compcode_to_compname(int compcode, char **compname) 336 { 337 int code = -1; /* -1 means non-existent compressor code */ 338 char *name = NULL; 
339 340 /* Map the compressor code */ 341 if (compcode == BLOSC_BLOSCLZ) 342 name = BLOSC_BLOSCLZ_COMPNAME; 343 else if (compcode == BLOSC_LZ4) 344 name = BLOSC_LZ4_COMPNAME; 345 else if (compcode == BLOSC_LZ4HC) 346 name = BLOSC_LZ4HC_COMPNAME; 347 else if (compcode == BLOSC_SNAPPY) 348 name = BLOSC_SNAPPY_COMPNAME; 349 else if (compcode == BLOSC_ZLIB) 350 name = BLOSC_ZLIB_COMPNAME; 351 else if (compcode == BLOSC_ZSTD) 352 name = BLOSC_ZSTD_COMPNAME; 353 354 *compname = name; 355 356 /* Guess if there is support for this code */ 357 if (compcode == BLOSC_BLOSCLZ) 358 code = BLOSC_BLOSCLZ; 359 #if defined(HAVE_LZ4) 360 else if (compcode == BLOSC_LZ4) 361 code = BLOSC_LZ4; 362 else if (compcode == BLOSC_LZ4HC) 363 code = BLOSC_LZ4HC; 364 #endif /* HAVE_LZ4 */ 365 #if defined(HAVE_SNAPPY) 366 else if (compcode == BLOSC_SNAPPY) 367 code = BLOSC_SNAPPY; 368 #endif /* HAVE_SNAPPY */ 369 #if defined(HAVE_ZLIB) 370 else if (compcode == BLOSC_ZLIB) 371 code = BLOSC_ZLIB; 372 #endif /* HAVE_ZLIB */ 373 #if defined(HAVE_ZSTD) 374 else if (compcode == BLOSC_ZSTD) 375 code = BLOSC_ZSTD; 376 #endif /* HAVE_ZSTD */ 377 378 return code; 379 } 380 381 /* Get the compressor code for the compressor name. 
-1 if it is not available */ 382 int blosc_compname_to_compcode(const char *compname) 383 { 384 int code = -1; /* -1 means non-existent compressor code */ 385 386 if (strcmp(compname, BLOSC_BLOSCLZ_COMPNAME) == 0) { 387 code = BLOSC_BLOSCLZ; 388 } 389 #if defined(HAVE_LZ4) 390 else if (strcmp(compname, BLOSC_LZ4_COMPNAME) == 0) { 391 code = BLOSC_LZ4; 392 } 393 else if (strcmp(compname, BLOSC_LZ4HC_COMPNAME) == 0) { 394 code = BLOSC_LZ4HC; 395 } 396 #endif /* HAVE_LZ4 */ 397 #if defined(HAVE_SNAPPY) 398 else if (strcmp(compname, BLOSC_SNAPPY_COMPNAME) == 0) { 399 code = BLOSC_SNAPPY; 400 } 401 #endif /* HAVE_SNAPPY */ 402 #if defined(HAVE_ZLIB) 403 else if (strcmp(compname, BLOSC_ZLIB_COMPNAME) == 0) { 404 code = BLOSC_ZLIB; 405 } 406 #endif /* HAVE_ZLIB */ 407 #if defined(HAVE_ZSTD) 408 else if (strcmp(compname, BLOSC_ZSTD_COMPNAME) == 0) { 409 code = BLOSC_ZSTD; 410 } 411 #endif /* HAVE_ZSTD */ 412 413 return code; 414 } 415 416 417 #if defined(HAVE_LZ4) 418 static int lz4_wrap_compress(const char* input, size_t input_length, 419 char* output, size_t maxout, int accel) 420 { 421 int cbytes; 422 cbytes = LZ4_compress_fast(input, output, (int)input_length, (int)maxout, 423 accel); 424 return cbytes; 425 } 426 427 static int lz4hc_wrap_compress(const char* input, size_t input_length, 428 char* output, size_t maxout, int clevel) 429 { 430 int cbytes; 431 if (input_length > (size_t)(2<<30)) 432 return -1; /* input larger than 1 GB is not supported */ 433 /* clevel for lz4hc goes up to 16, at least in LZ4 1.1.3 */ 434 cbytes = LZ4_compressHC2_limitedOutput(input, output, (int)input_length, 435 (int)maxout, clevel*2-1); 436 return cbytes; 437 } 438 439 static int lz4_wrap_decompress(const char* input, size_t compressed_length, 440 char* output, size_t maxout) 441 { 442 size_t cbytes; 443 cbytes = LZ4_decompress_fast(input, output, (int)maxout); 444 if (cbytes != compressed_length) { 445 return 0; 446 } 447 return (int)maxout; 448 } 449 450 #endif /* HAVE_LZ4 */ 451 452 
#if defined(HAVE_SNAPPY) 453 static int snappy_wrap_compress(const char* input, size_t input_length, 454 char* output, size_t maxout) 455 { 456 snappy_status status; 457 size_t cl = maxout; 458 status = snappy_compress(input, input_length, output, &cl); 459 if (status != SNAPPY_OK){ 460 return 0; 461 } 462 return (int)cl; 463 } 464 465 static int snappy_wrap_decompress(const char* input, size_t compressed_length, 466 char* output, size_t maxout) 467 { 468 snappy_status status; 469 size_t ul = maxout; 470 status = snappy_uncompress(input, compressed_length, output, &ul); 471 if (status != SNAPPY_OK){ 472 return 0; 473 } 474 return (int)ul; 475 } 476 #endif /* HAVE_SNAPPY */ 477 478 #if defined(HAVE_ZLIB) 479 /* zlib is not very respectful with sharing name space with others. 480 Fortunately, its names do not collide with those already in blosc. */ 481 static int zlib_wrap_compress(const char* input, size_t input_length, 482 char* output, size_t maxout, int clevel) 483 { 484 int status; 485 uLongf cl = maxout; 486 status = compress2( 487 (Bytef*)output, &cl, (Bytef*)input, (uLong)input_length, clevel); 488 if (status != Z_OK){ 489 return 0; 490 } 491 return (int)cl; 492 } 493 494 static int zlib_wrap_decompress(const char* input, size_t compressed_length, 495 char* output, size_t maxout) 496 { 497 int status; 498 uLongf ul = maxout; 499 status = uncompress( 500 (Bytef*)output, &ul, (Bytef*)input, (uLong)compressed_length); 501 if (status != Z_OK){ 502 return 0; 503 } 504 return (int)ul; 505 } 506 #endif /* HAVE_ZLIB */ 507 508 #if defined(HAVE_ZSTD) 509 static int zstd_wrap_compress(const char* input, size_t input_length, 510 char* output, size_t maxout, int clevel) { 511 size_t code; 512 // clevel = (clevel < 9) ? clevel * 2 - 1 : ZSTD_maxCLevel(); // see zstd#254 513 clevel = (clevel < 9) ? 
clevel * 2 - 1 : 22; 514 code = ZSTD_compress( 515 (void*)output, maxout, (void*)input, input_length, clevel); 516 if (ZSTD_isError(code)) { 517 return 0; 518 } 519 return (int)code; 520 } 521 522 static int zstd_wrap_decompress(const char* input, size_t compressed_length, 523 char* output, size_t maxout) { 524 size_t code; 525 code = ZSTD_decompress( 526 (void*)output, maxout, (void*)input, compressed_length); 527 if (ZSTD_isError(code)) { 528 fprintf(stderr, "error decompressing with Zstd: %s \n", ZSTD_getErrorName(code)); 529 return 0; 530 } 531 return (int)code; 532 } 533 #endif /* HAVE_ZSTD */ 534 535 /* Compute acceleration for blosclz */ 536 static int get_accel(const struct blosc_context* context) { 537 int32_t clevel = context->clevel; 538 int32_t typesize = context->typesize; 539 540 if (clevel == 9) { 541 return 1; 542 } 543 if (context->compcode == BLOSC_BLOSCLZ) { 544 /* Compute the power of 2. See: 545 * http://www.exploringbinary.com/ten-ways-to-check-if-an-integer-is-a-power-of-two-in-c/ 546 */ 547 int32_t tspow2 = ((typesize != 0) && !(typesize & (typesize - 1))); 548 if (tspow2 && typesize < 32) { 549 return 32; 550 } 551 } 552 else if (context->compcode == BLOSC_LZ4) { 553 /* This acceleration setting based on discussions held in: 554 * https://groups.google.com/forum/#!topic/lz4c/zosy90P8MQw 555 */ 556 return (10 - clevel); 557 } 558 return 1; 559 } 232 560 233 561 /* Shuffle & compress a single block */ 234 static int blosc_c(int32_t blocksize, int32_t leftoverblock, 235 int32_t ntbytes, int32_t maxbytes, 236 uint8_t *src, uint8_t *dest, uint8_t *tmp) 562 static int blosc_c(const struct blosc_context* context, int32_t blocksize, 563 int32_t leftoverblock, int32_t ntbytes, int32_t maxbytes, 564 const uint8_t *src, uint8_t *dest, uint8_t *tmp, 565 uint8_t *tmp2) 237 566 { 238 567 int32_t j, neblock, nsplits; … … 240 569 int32_t ctbytes = 0; /* number of compressed bytes in block */ 241 570 int32_t maxout; 242 int32_t typesize = params.typesize; 
243 uint8_t *_tmp; 244 245 if ((params.flags & BLOSC_DOSHUFFLE) && (typesize > 1)) { 246 /* Shuffle this block (this makes sense only if typesize > 1) */ 571 int32_t typesize = context->typesize; 572 const uint8_t *_tmp = src; 573 char *compname; 574 int accel; 575 int bscount; 576 577 if (*(context->header_flags) & BLOSC_DOSHUFFLE & (typesize > 1)) { 578 /* Byte shuffling only makes sense if typesize > 1 */ 247 579 shuffle(typesize, blocksize, src, tmp); 248 580 _tmp = tmp; 249 581 } 250 else { 251 _tmp = src; 252 } 582 /* We don't allow more than 1 filter at the same time (yet) */ 583 else if (*(context->header_flags) & BLOSC_DOBITSHUFFLE) { 584 bscount = bitshuffle(typesize, blocksize, src, tmp, tmp2); 585 if (bscount < 0) 586 return bscount; 587 _tmp = tmp; 588 } 589 590 /* Calculate acceleration for different compressors */ 591 accel = get_accel(context); 253 592 254 593 /* Compress for each shuffled slice split for this block. */ … … 268 607 ctbytes += (int32_t)sizeof(int32_t); 269 608 maxout = neblock; 609 #if defined(HAVE_SNAPPY) 610 if (context->compcode == BLOSC_SNAPPY) { 611 /* TODO perhaps refactor this to keep the value stashed somewhere */ 612 maxout = snappy_max_compressed_length(neblock); 613 } 614 #endif /* HAVE_SNAPPY */ 270 615 if (ntbytes+maxout > maxbytes) { 271 616 maxout = maxbytes - ntbytes; /* avoid buffer overrun */ … … 274 619 } 275 620 } 276 cbytes = blosclz_compress(params.clevel, _tmp+j*neblock, neblock, 277 dest, maxout); 278 if (cbytes >= maxout) { 279 /* Buffer overrun caused by blosclz_compress (should never happen) */ 621 if (context->compcode == BLOSC_BLOSCLZ) { 622 cbytes = blosclz_compress(context->clevel, _tmp+j*neblock, neblock, 623 dest, maxout, accel); 624 } 625 #if defined(HAVE_LZ4) 626 else if (context->compcode == BLOSC_LZ4) { 627 cbytes = lz4_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 628 (char *)dest, (size_t)maxout, accel); 629 } 630 else if (context->compcode == BLOSC_LZ4HC) { 631 cbytes = 
lz4hc_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 632 (char *)dest, (size_t)maxout, 633 context->clevel); 634 } 635 #endif /* HAVE_LZ4 */ 636 #if defined(HAVE_SNAPPY) 637 else if (context->compcode == BLOSC_SNAPPY) { 638 cbytes = snappy_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 639 (char *)dest, (size_t)maxout); 640 } 641 #endif /* HAVE_SNAPPY */ 642 #if defined(HAVE_ZLIB) 643 else if (context->compcode == BLOSC_ZLIB) { 644 cbytes = zlib_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 645 (char *)dest, (size_t)maxout, 646 context->clevel); 647 } 648 #endif /* HAVE_ZLIB */ 649 #if defined(HAVE_ZSTD) 650 else if (context->compcode == BLOSC_ZSTD) { 651 cbytes = zstd_wrap_compress((char*)_tmp + j * neblock, (size_t)neblock, 652 (char*)dest, (size_t)maxout, context->clevel); 653 } 654 #endif /* HAVE_ZSTD */ 655 656 else { 657 blosc_compcode_to_compname(context->compcode, &compname); 658 fprintf(stderr, "Blosc has not been compiled with '%s' ", compname); 659 fprintf(stderr, "compression support. Please use one having it."); 660 return -5; /* signals no compression support */ 661 } 662 663 if (cbytes > maxout) { 664 /* Buffer overrun caused by compression (should never happen) */ 280 665 return -1; 281 666 } … … 284 669 return -2; 285 670 } 286 else if (cbytes == 0 ) {287 /* The compressor has been unable to compress data significantly. */671 else if (cbytes == 0 || cbytes == neblock) { 672 /* The compressor has been unable to compress data at all. */ 288 673 /* Before doing the copy, check that we are not running into a 289 674 buffer overflow. 
*/
… …
294 679 cbytes = neblock;
295 680 }
296 ((int32_t *)(dest))[-1] = sw32(cbytes);
681 _sw32(dest - 4, cbytes);
297 682 dest += cbytes;
298 683 ntbytes += cbytes;
… …
303 688 }
304 689
305
306 690 /* Decompress & unshuffle a single block */
307 static int blosc_d(int32_t blocksize, int32_t leftoverblock,
308 uint8_t *src, uint8_t *dest, uint8_t *tmp, uint8_t *tmp2)
691 static int blosc_d(struct blosc_context* context, int32_t blocksize, int32_t leftoverblock,
692 const uint8_t *src, uint8_t *dest, uint8_t *tmp, uint8_t *tmp2)
309 693 {
310 694 int32_t j, neblock, nsplits;
… …
313 697 int32_t ctbytes = 0; /* number of compressed bytes in block */
314 698 int32_t ntbytes = 0; /* number of uncompressed bytes in block */
315 uint8_t *_tmp;
316 int32_t typesize = params.typesize;
317
318 if ((params.flags & BLOSC_DOSHUFFLE) && (typesize > 1)) {
699 uint8_t *_tmp = dest;
700 int32_t typesize = context->typesize;
701 int32_t compformat;
702 char *compname;
703 int bscount;
704
705 if ((*(context->header_flags) & BLOSC_DOSHUFFLE & (typesize > 1)) || \
706 (*(context->header_flags) & BLOSC_DOBITSHUFFLE)) {
319 707 _tmp = tmp;
320 708 }
321 else {
322 _tmp = dest;
323 }
709
710 compformat = (*(context->header_flags) & 0xe0) >> 5;
324 711
325 712 /* Compress for each shuffled slice split for this block.
*/ … … 333 720 neblock = blocksize / nsplits; 334 721 for (j = 0; j < nsplits; j++) { 335 cbytes = sw32 (((int32_t *)(src))[0]);/* amount of compressed bytes */722 cbytes = sw32_(src); /* amount of compressed bytes */ 336 723 src += sizeof(int32_t); 337 724 ctbytes += (int32_t)sizeof(int32_t); … … 342 729 } 343 730 else { 344 nbytes = blosclz_decompress(src, cbytes, _tmp, neblock); 731 if (compformat == BLOSC_BLOSCLZ_FORMAT) { 732 nbytes = blosclz_decompress(src, cbytes, _tmp, neblock); 733 } 734 #if defined(HAVE_LZ4) 735 else if (compformat == BLOSC_LZ4_FORMAT) { 736 nbytes = lz4_wrap_decompress((char *)src, (size_t)cbytes, 737 (char*)_tmp, (size_t)neblock); 738 } 739 #endif /* HAVE_LZ4 */ 740 #if defined(HAVE_SNAPPY) 741 else if (compformat == BLOSC_SNAPPY_FORMAT) { 742 nbytes = snappy_wrap_decompress((char *)src, (size_t)cbytes, 743 (char*)_tmp, (size_t)neblock); 744 } 745 #endif /* HAVE_SNAPPY */ 746 #if defined(HAVE_ZLIB) 747 else if (compformat == BLOSC_ZLIB_FORMAT) { 748 nbytes = zlib_wrap_decompress((char *)src, (size_t)cbytes, 749 (char*)_tmp, (size_t)neblock); 750 } 751 #endif /* HAVE_ZLIB */ 752 #if defined(HAVE_ZSTD) 753 else if (compformat == BLOSC_ZSTD_FORMAT) { 754 nbytes = zstd_wrap_decompress((char*)src, (size_t)cbytes, 755 (char*)_tmp, (size_t)neblock); 756 } 757 #endif /* HAVE_ZSTD */ 758 else { 759 compname = clibcode_to_clibname(compformat); 760 fprintf(stderr, 761 "Blosc has not been compiled with decompression " 762 "support for '%s' format. ", compname); 763 fprintf(stderr, "Please recompile for adding this support.\n"); 764 return -5; /* signals no decompression support */ 765 } 766 767 /* Check that decompressed bytes number is correct */ 345 768 if (nbytes != neblock) { 346 return -2; 347 } 769 return -2; 770 } 771 348 772 } 349 773 src += cbytes; … … 353 777 } /* Closes j < nsplits */ 354 778 355 if ((params.flags & BLOSC_DOSHUFFLE) && (typesize > 1)) { 356 if ((uintptr_t)dest % 16 == 0) { 357 /* 16-bytes aligned dest. 
SSE2 unshuffle will work. */ 358 unshuffle(typesize, blocksize, tmp, dest); 359 } 360 else { 361 /* dest is not aligned. Use tmp2, which is aligned, and copy. */ 362 unshuffle(typesize, blocksize, tmp, tmp2); 363 if (tmp2 != dest) { 364 /* Copy only when dest is not tmp2 (e.g. not blosc_getitem()) */ 365 memcpy(dest, tmp2, blocksize); 366 } 367 } 779 if (*(context->header_flags) & BLOSC_DOSHUFFLE & (typesize > 1)) { 780 unshuffle(typesize, blocksize, tmp, dest); 781 } 782 else if (*(context->header_flags) & BLOSC_DOBITSHUFFLE) { 783 bscount = bitunshuffle(typesize, blocksize, tmp, dest, tmp2); 784 if (bscount < 0) 785 return bscount; 368 786 } 369 787 … … 374 792 375 793 /* Serial version for compression/decompression */ 376 static int serial_blosc( void)794 static int serial_blosc(struct blosc_context* context) 377 795 { 378 796 int32_t j, bsize, leftoverblock; 379 797 int32_t cbytes; 380 int32_t compress = params.compress; 381 int32_t blocksize = params.blocksize; 382 int32_t ntbytes = params.ntbytes; 383 int32_t flags = params.flags; 384 int32_t maxbytes = params.maxbytes; 385 int32_t nblocks = params.nblocks; 386 int32_t leftover = params.nbytes % params.blocksize; 387 int32_t *bstarts = params.bstarts; 388 uint8_t *src = params.src; 389 uint8_t *dest = params.dest; 390 uint8_t *tmp = params.tmp[0]; /* tmp for thread 0 */ 391 uint8_t *tmp2 = params.tmp2[0]; /* tmp2 for thread 0 */ 392 393 for (j = 0; j < nblocks; j++) { 394 if (compress && !(flags & BLOSC_MEMCPYED)) { 395 bstarts[j] = sw32(ntbytes); 396 } 397 bsize = blocksize; 798 799 int32_t ebsize = context->blocksize + context->typesize * (int32_t)sizeof(int32_t); 800 int32_t ntbytes = context->num_output_bytes; 801 802 uint8_t *tmp = my_malloc(context->blocksize + ebsize); 803 uint8_t *tmp2 = tmp + context->blocksize; 804 805 for (j = 0; j < context->nblocks; j++) { 806 if (context->compress && !(*(context->header_flags) & BLOSC_MEMCPYED)) { 807 _sw32(context->bstarts + j * 4, ntbytes); 808 } 809 bsize = 
context->blocksize;
398 810 leftoverblock = 0;
399 if ((j == nblocks - 1) && (leftover > 0)) {
400 bsize = leftover;
811 if ((j == context->nblocks - 1) && (context->leftover > 0)) {
812 bsize = context->leftover;
401 813 leftoverblock = 1;
402 814 }
403 if (compress) {
404 if (flags & BLOSC_MEMCPYED) {
815 if (context->compress) {
816 if (*(context->header_flags) & BLOSC_MEMCPYED) {
405 817 /* We want to memcpy only */
406 memcpy(dest+BLOSC_MAX_OVERHEAD+j*blocksize, src+j*blocksize, bsize);
818 memcpy(context->dest+BLOSC_MAX_OVERHEAD+j*context->blocksize,
819 context->src+j*context->blocksize,
820 bsize);
407 821 cbytes = bsize;
408 822 }
409 823 else {
410 824 /* Regular compression */
411 cbytes = blosc_c(bsize, leftoverblock, ntbytes, maxbytes,
412 src+j*blocksize, dest+ntbytes, tmp);
825 cbytes = blosc_c(context, bsize, leftoverblock, ntbytes,
826 context->destsize, context->src+j*context->blocksize,
827 context->dest+ntbytes, tmp, tmp2);
413 828 if (cbytes == 0) {
414 829 ntbytes = 0; /* uncompressible data */
… …
418 833 }
419 834 else {
420 if (flags & BLOSC_MEMCPYED) {
835 if (*(context->header_flags) & BLOSC_MEMCPYED) {
421 836 /* We want to memcpy only */
422 memcpy(dest+j*blocksize, src+BLOSC_MAX_OVERHEAD+j*blocksize, bsize);
837 memcpy(context->dest+j*context->blocksize,
838 context->src+BLOSC_MAX_OVERHEAD+j*context->blocksize,
839 bsize);
423 840 cbytes = bsize;
424 841 }
425 842 else {
426 843 /* Regular decompression */
427 cbytes = blosc_d(bsize, leftoverblock,
428 src+sw32(bstarts[j]), dest+j*blocksize, tmp, tmp2);
844 cbytes = blosc_d(context, bsize, leftoverblock,
845 context->src + sw32_(context->bstarts + j * 4),
846 context->dest+j*context->blocksize, tmp, tmp2);
429 847 }
430 848 }
… …
436 854 }
437 855
856 // Free temporaries
857 my_free(tmp);
858
438 859 return ntbytes;
439 860 }
… …
441 862
442 863 /* Threaded version for compression/decompression */
443 static int parallel_blosc(void)
444 {
864 static int parallel_blosc(struct blosc_context*
context)
865 {
866 int rc;
445 867
446 868 /* Check whether we need to restart threads */
447 if (!init_threads_done || pid != getpid()) {
448 blosc_set_nthreads_(nthreads);
449 }
869 blosc_set_nthreads_(context);
870
871 /* Set sentinels */
872 context->thread_giveup_code = 1;
873 context->thread_nblock = -1;
450 874
451 875 /* Synchronization point for all threads (wait for initialization) */
452 WAIT_INIT;
876 WAIT_INIT(-1, context);
877
453 878 /* Synchronization point for all threads (wait for finalization) */
454 WAIT_FINISH;
455
456 if (giveup_code > 0) {
879 WAIT_FINISH(-1, context);
880
881 if (context->thread_giveup_code > 0) {
457 882 /* Return the total bytes (de-)compressed in threads */
458 return params.ntbytes;
883 return context->num_output_bytes;
459 884 }
460 885 else {
461 886 /* Compression/decompression gave up. Return error code. */
462 return giveup_code;
463 }
464 }
465
466
467 /* Convenience functions for creating and releasing temporaries */
468 static int create_temporaries(void)
469 {
470 int32_t tid;
471 int32_t typesize = params.typesize;
472 int32_t blocksize = params.blocksize;
473 /* Extended blocksize for temporary destination. Extended blocksize
474 is only useful for compression in parallel mode, but it doesn't
475 hurt serial mode either.
*/
476 int32_t ebsize = blocksize + typesize*(int32_t)sizeof(int32_t);
477
478 /* Create temporary area for each thread */
479 for (tid = 0; tid < nthreads; tid++) {
480 uint8_t *tmp = my_malloc(blocksize);
481 uint8_t *tmp2;
482 if (tmp == NULL) {
483 return -1;
484 }
485 params.tmp[tid] = tmp;
486 tmp2 = my_malloc(ebsize);
487 if (tmp2 == NULL) {
488 return -1;
489 }
490 params.tmp2[tid] = tmp2;
491 }
492
493 init_temps_done = 1;
494 /* Update params for current temporaries */
495 current_temp.nthreads = nthreads;
496 current_temp.typesize = typesize;
497 current_temp.blocksize = blocksize;
498 return 0;
499 }
500
501
502 static void release_temporaries(void)
503 {
504 int32_t tid;
505
506 /* Release buffers */
507 for (tid = 0; tid < nthreads; tid++) {
508 my_free(params.tmp[tid]);
509 my_free(params.tmp2[tid]);
510 }
511
512 init_temps_done = 0;
887 return context->thread_giveup_code;
888 }
513 889 }
514 890
… …
516 892 /* Do the compression or decompression of the buffer depending on the
517 893 global params.
*/
518 static int do_job(void)
894 static int do_job(struct blosc_context* context)
519 895 {
520 896 int32_t ntbytes;
521
522 /* Initialize/reset temporaries if needed */
523 if (!init_temps_done) {
524 int ret;
525 ret = create_temporaries();
526 if (ret < 0) {
527 return -1;
528 }
529 }
530 else if (current_temp.nthreads != nthreads ||
531 current_temp.typesize != params.typesize ||
532 current_temp.blocksize != params.blocksize) {
533 int ret;
534 release_temporaries();
535 ret = create_temporaries();
536 if (ret < 0) {
537 return -1;
538 }
539 }
540 897
541 898 /* Run the serial version when nthreads is 1 or when the buffers are
542 899 not much larger than blocksize */
543 if (nthreads == 1 || (params.nbytes / params.blocksize) <= 1) {
544 ntbytes = serial_blosc();
900 if (context->numthreads == 1 || (context->sourcesize / context->blocksize) <= 1) {
901 ntbytes = serial_blosc(context);
545 902 }
546 903 else {
547 ntbytes = parallel_blosc();
904 ntbytes = parallel_blosc(context);
548 905 }
549 906
… …
552 909
553 910
554 static int32_t compute_blocksize(int32_t clevel, int32_t typesize,
555 int32_t nbytes)
911 static int32_t compute_blocksize(struct blosc_context* context, int32_t clevel,
912 int32_t typesize, int32_t nbytes,
913 int32_t forced_blocksize)
556 914 {
557 915 int32_t blocksize;
… …
564 922 blocksize = nbytes; /* Start by a whole buffer as blocksize */
565 923
566 if (force_blocksize) {
567 blocksize = force_blocksize;
568 /* Check that forced blocksize is not too small nor too large */
924 if (forced_blocksize) {
925 blocksize = forced_blocksize;
926 /* Check that forced blocksize is not too small */
569 927 if (blocksize < MIN_BUFFERSIZE) {
570 928 blocksize = MIN_BUFFERSIZE;
571 929 }
572 930 }
573 else if (nbytes >= L1*4) {
574 blocksize = L1 * 4;
931 else if (nbytes >= L1) {
932 blocksize = L1;
933
934 /* For LZ4HC, increase the block sizes by a factor of 8 because it
935 is meant for compressing large blocks (it shows a big overhead
936 when compressing small
ones). */
937 if (context->compcode == BLOSC_LZ4HC) {
938 blocksize *= 8;
939 }
940
941 /* For Zlib, increase the block sizes by a factor of 8 because it
942 is meant for compressing large blocks (it shows a big overhead
943 when compressing small ones). */
944 if (context->compcode == BLOSC_ZLIB) {
945 blocksize *= 8;
946 }
947
948 /* For Zstd, increase the block sizes by a factor of 8 because it
949 is meant for compressing large blocks (it shows a big overhead
950 when compressing small ones). */
951 if (context->compcode == BLOSC_ZSTD) {
952 blocksize *= 8;
953 }
954
575 955 if (clevel == 0) {
576 blocksize /= 16;
956 blocksize /= 4;
577 957 }
578 958 else if (clevel <= 3) {
579 blocksize /= 8;
959 blocksize /= 2;
580 960 }
581 961 else if (clevel <= 5) {
582 blocksize /= 4;
962 blocksize *= 1;
583 963 }
584 964 else if (clevel <= 6) {
585 blocksize /= 2;
965 blocksize *= 2;
586 966 }
587 967 else if (clevel < 9) {
588 blocksize *= 1;
968 blocksize *= 4;
589 969 }
590 970 else {
591 blocksize *= 2;
971 blocksize *= 16;
592 972 }
593 973 }
… …
598 978 }
599 979
600 /* blocksize must be a multiple of the typesize */
980 /* blocksize *must absolutely* be a multiple of the typesize */
601 981 if (blocksize > typesize) {
602 982 blocksize = blocksize / typesize * typesize;
603 983 }
604 984
605 /* blocksize must not exceed (64 KB * typesize) in order to allow
606 BloscLZ to achieve better compression ratios (the ultimate reason
607 for this is that hash_log in BloscLZ cannot be larger than 15) */
608 if ((blocksize / typesize) > 64*KB) {
609 blocksize = 64 * KB * typesize;
610 }
611
612 985 return blocksize;
613 986 }
614 987
615
616 /* The public routine for compression. See blosc.h for docstrings. */
617 int blosc_compress(int clevel, int doshuffle, size_t typesize, size_t nbytes,
618 const void *src, void *dest, size_t destsize)
619 {
620 uint8_t *_dest=NULL; /* current pos for destination buffer */
621 uint8_t *flags; /* flags for header. Currently booked:
622 - 0: shuffled?
623 - 1: memcpy'ed? */ 624 int32_t nbytes_; /* number of bytes in source buffer */ 625 int32_t nblocks; /* number of total blocks in buffer */ 626 int32_t leftover; /* extra bytes at end of buffer */ 627 int32_t *bstarts; /* start pointers for each block */ 628 int32_t blocksize; /* length of the block in bytes */ 629 int32_t ntbytes = 0; /* the number of compressed bytes */ 630 int32_t *ntbytes_; /* placeholder for bytes in output buffer */ 631 int32_t maxbytes = (int32_t)destsize; /* maximum size for dest buffer */ 988 static int initialize_context_compression(struct blosc_context* context, 989 int clevel, 990 int doshuffle, 991 size_t typesize, 992 size_t sourcesize, 993 const void* src, 994 void* dest, 995 size_t destsize, 996 int32_t compressor, 997 int32_t blocksize, 998 int32_t numthreads) 999 { 1000 /* Set parameters */ 1001 context->compress = 1; 1002 context->src = (const uint8_t*)src; 1003 context->dest = (uint8_t *)(dest); 1004 context->num_output_bytes = 0; 1005 context->destsize = (int32_t)destsize; 1006 context->sourcesize = sourcesize; 1007 context->typesize = typesize; 1008 context->compcode = compressor; 1009 context->numthreads = numthreads; 1010 context->end_threads = 0; 1011 context->clevel = clevel; 632 1012 633 1013 /* Check buffer size limits */ 634 if ( nbytes> BLOSC_MAX_BUFFERSIZE) {1014 if (sourcesize > BLOSC_MAX_BUFFERSIZE) { 635 1015 /* If buffer is too large, give up. 
*/ 636 1016 fprintf(stderr, "Input buffer size cannot exceed %d bytes\n", … … 638 1018 return -1; 639 1019 } 640 641 /* We can safely do this assignation now */642 nbytes_ = (int32_t)nbytes;643 1020 644 1021 /* Compression level */ … … 650 1027 651 1028 /* Shuffle */ 652 if (doshuffle != 0 && doshuffle != 1 ) {653 fprintf(stderr, "`shuffle` parameter must be either 0 or 1!\n");1029 if (doshuffle != 0 && doshuffle != 1 && doshuffle != 2) { 1030 fprintf(stderr, "`shuffle` parameter must be either 0, 1 or 2!\n"); 654 1031 return -10; 655 1032 } 656 1033 657 1034 /* Check typesize limits */ 658 if ( typesize > BLOSC_MAX_TYPESIZE) {1035 if (context->typesize > BLOSC_MAX_TYPESIZE) { 659 1036 /* If typesize is too large, treat buffer as an 1-byte stream. */ 660 typesize = 1;1037 context->typesize = 1; 661 1038 } 662 1039 663 1040 /* Get the blocksize */ 664 blocksize = compute_blocksize(clevel, (int32_t)typesize, nbytes_);1041 context->blocksize = compute_blocksize(context, clevel, (int32_t)context->typesize, context->sourcesize, blocksize); 665 1042 666 1043 /* Compute number of blocks in buffer */ 667 nblocks = nbytes_ / blocksize; 668 leftover = nbytes_ % blocksize; 669 nblocks = (leftover>0)? 
nblocks+1: nblocks; 670 671 _dest = (uint8_t *)(dest); 672 /* Write header for this block */ 673 _dest[0] = BLOSC_VERSION_FORMAT; /* blosc format version */ 674 _dest[1] = BLOSCLZ_VERSION_FORMAT; /* blosclz format version */ 675 flags = _dest+2; /* flags */ 676 _dest[2] = 0; /* zeroes flags */ 677 _dest[3] = (uint8_t)typesize; /* type size */ 678 _dest += 4; 679 ((int32_t *)_dest)[0] = sw32(nbytes_); /* size of the buffer */ 680 ((int32_t *)_dest)[1] = sw32(blocksize);/* block size */ 681 ntbytes_ = (int32_t *)(_dest+8); /* compressed buffer size */ 682 _dest += sizeof(int32_t)*3; 683 bstarts = (int32_t *)_dest; /* starts for every block */ 684 _dest += sizeof(int32_t)*nblocks; /* space for pointers to blocks */ 685 ntbytes = (int32_t)(_dest - (uint8_t *)dest); 686 687 if (clevel == 0) { 1044 context->nblocks = context->sourcesize / context->blocksize; 1045 context->leftover = context->sourcesize % context->blocksize; 1046 context->nblocks = (context->leftover > 0) ? (context->nblocks + 1) : context->nblocks; 1047 1048 return 1; 1049 } 1050 1051 static int write_compression_header(struct blosc_context* context, int clevel, int doshuffle) 1052 { 1053 int32_t compformat; 1054 1055 /* Write version header for this block */ 1056 context->dest[0] = BLOSC_VERSION_FORMAT; /* blosc format version */ 1057 1058 /* Write compressor format */ 1059 compformat = -1; 1060 switch (context->compcode) 1061 { 1062 case BLOSC_BLOSCLZ: 1063 compformat = BLOSC_BLOSCLZ_FORMAT; 1064 context->dest[1] = BLOSC_BLOSCLZ_VERSION_FORMAT; /* blosclz format version */ 1065 break; 1066 1067 #if defined(HAVE_LZ4) 1068 case BLOSC_LZ4: 1069 compformat = BLOSC_LZ4_FORMAT; 1070 context->dest[1] = BLOSC_LZ4_VERSION_FORMAT; /* lz4 format version */ 1071 break; 1072 case BLOSC_LZ4HC: 1073 compformat = BLOSC_LZ4HC_FORMAT; 1074 context->dest[1] = BLOSC_LZ4HC_VERSION_FORMAT; /* lz4hc is the same as lz4 */ 1075 break; 1076 #endif /* HAVE_LZ4 */ 1077 1078 #if defined(HAVE_SNAPPY) 1079 case BLOSC_SNAPPY: 1080 
compformat = BLOSC_SNAPPY_FORMAT; 1081 context->dest[1] = BLOSC_SNAPPY_VERSION_FORMAT; /* snappy format version */ 1082 break; 1083 #endif /* HAVE_SNAPPY */ 1084 1085 #if defined(HAVE_ZLIB) 1086 case BLOSC_ZLIB: 1087 compformat = BLOSC_ZLIB_FORMAT; 1088 context->dest[1] = BLOSC_ZLIB_VERSION_FORMAT; /* zlib format version */ 1089 break; 1090 #endif /* HAVE_ZLIB */ 1091 1092 #if defined(HAVE_ZSTD) 1093 case BLOSC_ZSTD: 1094 compformat = BLOSC_ZSTD_FORMAT; 1095 context->dest[1] = BLOSC_ZSTD_VERSION_FORMAT; /* zstd format version */ 1096 break; 1097 #endif /* HAVE_ZSTD */ 1098 1099 default: 1100 { 1101 char *compname; 1102 compname = clibcode_to_clibname(compformat); 1103 fprintf(stderr, "Blosc has not been compiled with '%s' ", compname); 1104 fprintf(stderr, "compression support. Please use one having it."); 1105 return -5; /* signals no compression support */ 1106 break; 1107 } 1108 } 1109 1110 context->header_flags = context->dest+2; /* flags */ 1111 context->dest[2] = 0; /* zeroes flags */ 1112 context->dest[3] = (uint8_t)context->typesize; /* type size */ 1113 _sw32(context->dest + 4, context->sourcesize); /* size of the buffer */ 1114 _sw32(context->dest + 8, context->blocksize); /* block size */ 1115 context->bstarts = context->dest + 16; /* starts for every block */ 1116 context->num_output_bytes = 16 + sizeof(int32_t)*context->nblocks; /* space for header and pointers */ 1117 1118 if (context->clevel == 0) { 688 1119 /* Compression level 0 means buffer to be memcpy'ed */ 689 * flags|= BLOSC_MEMCPYED;690 } 691 692 if ( nbytes_< MIN_BUFFERSIZE) {1120 *(context->header_flags) |= BLOSC_MEMCPYED; 1121 } 1122 1123 if (context->sourcesize < MIN_BUFFERSIZE) { 693 1124 /* Buffer is too small. Try memcpy'ing. 
*/ 694 *flags |= BLOSC_MEMCPYED; 695 } 696 697 if (doshuffle == 1) { 698 /* Shuffle is active */ 699 *flags |= BLOSC_DOSHUFFLE; /* bit 0 set to one in flags */ 700 } 701 702 /* Take global lock for the time of compression */ 703 pthread_mutex_lock(&global_comp_mutex); 704 /* Populate parameters for compression routines */ 705 params.compress = 1; 706 params.clevel = clevel; 707 params.flags = (int32_t)*flags; 708 params.typesize = (int32_t)typesize; 709 params.blocksize = blocksize; 710 params.ntbytes = ntbytes; 711 params.nbytes = nbytes_; 712 params.maxbytes = maxbytes; 713 params.nblocks = nblocks; 714 params.leftover = leftover; 715 params.bstarts = bstarts; 716 params.src = (uint8_t *)src; 717 params.dest = (uint8_t *)dest; 718 719 if (!(*flags & BLOSC_MEMCPYED)) { 1125 *(context->header_flags) |= BLOSC_MEMCPYED; 1126 } 1127 1128 if (doshuffle == BLOSC_SHUFFLE) { 1129 /* Byte-shuffle is active */ 1130 *(context->header_flags) |= BLOSC_DOSHUFFLE; /* bit 0 set to one in flags */ 1131 } 1132 1133 if (doshuffle == BLOSC_BITSHUFFLE) { 1134 /* Bit-shuffle is active */ 1135 *(context->header_flags) |= BLOSC_DOBITSHUFFLE; /* bit 2 set to one in flags */ 1136 } 1137 1138 *(context->header_flags) |= compformat << 5; /* compressor format start at bit 5 */ 1139 1140 return 1; 1141 } 1142 1143 int blosc_compress_context(struct blosc_context* context) 1144 { 1145 int32_t ntbytes = 0; 1146 1147 if (!(*(context->header_flags) & BLOSC_MEMCPYED)) { 720 1148 /* Do the actual compression */ 721 ntbytes = do_job( );1149 ntbytes = do_job(context); 722 1150 if (ntbytes < 0) { 723 1151 return -1; 724 1152 } 725 if ((ntbytes == 0) && ( nbytes_+BLOSC_MAX_OVERHEAD <= maxbytes)) {1153 if ((ntbytes == 0) && (context->sourcesize+BLOSC_MAX_OVERHEAD <= context->destsize)) { 726 1154 /* Last chance for fitting `src` buffer in `dest`. Update flags 727 1155 and do a memcpy later on. 
*/ 728 *flags |= BLOSC_MEMCPYED; 729 params.flags |= BLOSC_MEMCPYED; 730 } 731 } 732 733 if (*flags & BLOSC_MEMCPYED) { 734 if (nbytes_+BLOSC_MAX_OVERHEAD > maxbytes) { 1156 *(context->header_flags) |= BLOSC_MEMCPYED; 1157 } 1158 } 1159 1160 if (*(context->header_flags) & BLOSC_MEMCPYED) { 1161 if (context->sourcesize + BLOSC_MAX_OVERHEAD > context->destsize) { 735 1162 /* We are exceeding maximum output size */ 736 1163 ntbytes = 0; 737 1164 } 738 else if (((nbytes_ % L1) == 0) || (nthreads > 1)) {739 /* More effective with large buffers that are multiples of the740 cache size or multi-cores */741 params.ntbytes = BLOSC_MAX_OVERHEAD;742 ntbytes = do_job();743 if (ntbytes < 0) {744 return -1;745 }746 }747 1165 else { 748 memcpy((uint8_t *)dest+BLOSC_MAX_OVERHEAD, src, nbytes_); 749 ntbytes = nbytes_ + BLOSC_MAX_OVERHEAD; 1166 memcpy(context->dest+BLOSC_MAX_OVERHEAD, context->src, 1167 context->sourcesize); 1168 ntbytes = context->sourcesize + BLOSC_MAX_OVERHEAD; 750 1169 } 751 1170 } 752 1171 753 1172 /* Set the number of compressed bytes in header */ 754 *ntbytes_ = sw32(ntbytes); 755 756 /* Release global lock */ 1173 _sw32(context->dest + 12, ntbytes); 1174 1175 assert(ntbytes <= context->destsize); 1176 return ntbytes; 1177 } 1178 1179 /* The public routine for compression with context. 
*/ 1180 int blosc_compress_ctx(int clevel, int doshuffle, size_t typesize, 1181 size_t nbytes, const void* src, void* dest, 1182 size_t destsize, const char* compressor, 1183 size_t blocksize, int numinternalthreads) 1184 { 1185 int error, result; 1186 struct blosc_context context; 1187 1188 context.threads_started = 0; 1189 error = initialize_context_compression(&context, clevel, doshuffle, typesize, 1190 nbytes, src, dest, destsize, 1191 blosc_compname_to_compcode(compressor), 1192 blocksize, numinternalthreads); 1193 if (error < 0) { return error; } 1194 1195 error = write_compression_header(&context, clevel, doshuffle); 1196 if (error < 0) { return error; } 1197 1198 result = blosc_compress_context(&context); 1199 1200 if (numinternalthreads > 1) 1201 { 1202 blosc_release_threadpool(&context); 1203 } 1204 1205 return result; 1206 } 1207 1208 /* The public routine for compression. See blosc.h for docstrings. */ 1209 int blosc_compress(int clevel, int doshuffle, size_t typesize, size_t nbytes, 1210 const void *src, void *dest, size_t destsize) 1211 { 1212 int error; 1213 int result; 1214 char* envvar; 1215 1216 /* Check if should initialize */ 1217 if (!g_initlib) blosc_init(); 1218 1219 /* Check for a BLOSC_CLEVEL environment variable */ 1220 envvar = getenv("BLOSC_CLEVEL"); 1221 if (envvar != NULL) { 1222 long value; 1223 value = strtol(envvar, NULL, 10); 1224 if ((value != EINVAL) && (value >= 0)) { 1225 clevel = (int)value; 1226 } 1227 } 1228 1229 /* Check for a BLOSC_SHUFFLE environment variable */ 1230 envvar = getenv("BLOSC_SHUFFLE"); 1231 if (envvar != NULL) { 1232 if (strcmp(envvar, "NOSHUFFLE") == 0) { 1233 doshuffle = BLOSC_NOSHUFFLE; 1234 } 1235 if (strcmp(envvar, "SHUFFLE") == 0) { 1236 doshuffle = BLOSC_SHUFFLE; 1237 } 1238 if (strcmp(envvar, "BITSHUFFLE") == 0) { 1239 doshuffle = BLOSC_BITSHUFFLE; 1240 } 1241 } 1242 1243 /* Check for a BLOSC_TYPESIZE environment variable */ 1244 envvar = getenv("BLOSC_TYPESIZE"); 1245 if (envvar != NULL) { 1246 
long value; 1247 value = strtol(envvar, NULL, 10); 1248 if ((value != EINVAL) && (value > 0)) { 1249 typesize = (int)value; 1250 } 1251 } 1252 1253 /* Check for a BLOSC_COMPRESSOR environment variable */ 1254 envvar = getenv("BLOSC_COMPRESSOR"); 1255 if (envvar != NULL) { 1256 result = blosc_set_compressor(envvar); 1257 if (result < 0) { return result; } 1258 } 1259 1260 /* Check for a BLOSC_COMPRESSOR environment variable */ 1261 envvar = getenv("BLOSC_BLOCKSIZE"); 1262 if (envvar != NULL) { 1263 long blocksize; 1264 blocksize = strtol(envvar, NULL, 10); 1265 if ((blocksize != EINVAL) && (blocksize > 0)) { 1266 blosc_set_blocksize((size_t)blocksize); 1267 } 1268 } 1269 1270 /* Check for a BLOSC_NTHREADS environment variable */ 1271 envvar = getenv("BLOSC_NTHREADS"); 1272 if (envvar != NULL) { 1273 long nthreads; 1274 nthreads = strtol(envvar, NULL, 10); 1275 if ((nthreads != EINVAL) && (nthreads > 0)) { 1276 result = blosc_set_nthreads((int)nthreads); 1277 if (result < 0) { return result; } 1278 } 1279 } 1280 1281 /* Check for a BLOSC_NOLOCK environment variable. 
It is important 1282 that this should be the last env var so that it can take the 1283 previous ones into account */ 1284 envvar = getenv("BLOSC_NOLOCK"); 1285 if (envvar != NULL) { 1286 char *compname; 1287 blosc_compcode_to_compname(g_compressor, &compname); 1288 result = blosc_compress_ctx(clevel, doshuffle, typesize, 1289 nbytes, src, dest, destsize, 1290 compname, g_force_blocksize, g_threads); 1291 return result; 1292 } 1293 1294 pthread_mutex_lock(&global_comp_mutex); 1295 1296 error = initialize_context_compression(g_global_context, clevel, doshuffle, 1297 typesize, nbytes, src, dest, destsize, 1298 g_compressor, g_force_blocksize, 1299 g_threads); 1300 if (error < 0) { return error; } 1301 1302 error = write_compression_header(g_global_context, clevel, doshuffle); 1303 if (error < 0) { return error; } 1304 1305 result = blosc_compress_context(g_global_context); 1306 757 1307 pthread_mutex_unlock(&global_comp_mutex); 758 759 assert((int32_t)ntbytes <= (int32_t)maxbytes); 760 return ntbytes; 761 } 762 763 764 /* The public routine for decompression. See blosc.h for docstrings. 
*/ 765 int blosc_decompress(const void *src, void *dest, size_t destsize) 766 { 767 uint8_t *_src=NULL; /* current pos for source buffer */ 768 uint8_t version, versionlz; /* versions for compressed header */ 769 uint8_t flags; /* flags for header */ 770 int32_t ntbytes; /* the number of uncompressed bytes */ 771 int32_t nblocks; /* number of total blocks in buffer */ 772 int32_t leftover; /* extra bytes at end of buffer */ 773 int32_t *bstarts; /* start pointers for each block */ 774 int32_t typesize, blocksize, nbytes, ctbytes; 775 776 _src = (uint8_t *)(src); 1308 1309 return result; 1310 } 1311 1312 int blosc_run_decompression_with_context(struct blosc_context* context, 1313 const void* src, 1314 void* dest, 1315 size_t destsize, 1316 int numinternalthreads) 1317 { 1318 uint8_t version; 1319 uint8_t versionlz; 1320 uint32_t ctbytes; 1321 int32_t ntbytes; 1322 1323 context->compress = 0; 1324 context->src = (const uint8_t*)src; 1325 context->dest = (uint8_t*)dest; 1326 context->destsize = destsize; 1327 context->num_output_bytes = 0; 1328 context->numthreads = numinternalthreads; 1329 context->end_threads = 0; 777 1330 778 1331 /* Read the header block */ 779 version = _src[0]; /* blosc format version */ 780 versionlz = _src[1]; /* blosclz format version */ 781 flags = _src[2]; /* flags */ 782 typesize = (int32_t)_src[3]; /* typesize */ 783 _src += 4; 784 nbytes = sw32(((int32_t *)_src)[0]); /* buffer size */ 785 blocksize = sw32(((int32_t *)_src)[1]); /* block size */ 786 ctbytes = sw32(((int32_t *)_src)[2]); /* compressed buffer size */ 787 1332 version = context->src[0]; /* blosc format version */ 1333 versionlz = context->src[1]; /* blosclz format version */ 1334 1335 context->header_flags = (uint8_t*)(context->src + 2); /* flags */ 1336 context->typesize = (int32_t)context->src[3]; /* typesize */ 1337 context->sourcesize = sw32_(context->src + 4); /* buffer size */ 1338 context->blocksize = sw32_(context->src + 8); /* block size */ 1339 ctbytes = 
sw32_(context->src + 12); /* compressed buffer size */ 1340 1341 /* Unused values */ 788 1342 version += 0; /* shut up compiler warning */ 789 1343 versionlz += 0; /* shut up compiler warning */ 790 1344 ctbytes += 0; /* shut up compiler warning */ 791 1345 792 _src += sizeof(int32_t)*3; 793 bstarts = (int32_t *)_src; 1346 context->bstarts = (uint8_t*)(context->src + 16); 1347 /* Compute some params */ 1348 /* Total blocks */ 1349 context->nblocks = context->sourcesize / context->blocksize; 1350 context->leftover = context->sourcesize % context->blocksize; 1351 context->nblocks = (context->leftover>0)? context->nblocks+1: context->nblocks; 1352 1353 /* Check that we have enough space to decompress */ 1354 if (context->sourcesize > (int32_t)destsize) { 1355 return -1; 1356 } 1357 1358 /* Check whether this buffer is memcpy'ed */ 1359 if (*(context->header_flags) & BLOSC_MEMCPYED) { 1360 memcpy(dest, (uint8_t *)src+BLOSC_MAX_OVERHEAD, context->sourcesize); 1361 ntbytes = context->sourcesize; 1362 } 1363 else { 1364 /* Do the actual decompression */ 1365 ntbytes = do_job(context); 1366 if (ntbytes < 0) { 1367 return -1; 1368 } 1369 } 1370 1371 assert(ntbytes <= (int32_t)destsize); 1372 return ntbytes; 1373 } 1374 1375 /* The public routine for decompression with context. */ 1376 int blosc_decompress_ctx(const void *src, void *dest, size_t destsize, 1377 int numinternalthreads) 1378 { 1379 int result; 1380 struct blosc_context context; 1381 1382 context.threads_started = 0; 1383 result = blosc_run_decompression_with_context(&context, src, dest, destsize, numinternalthreads); 1384 1385 if (numinternalthreads > 1) 1386 { 1387 blosc_release_threadpool(&context); 1388 } 1389 1390 return result; 1391 } 1392 1393 1394 /* The public routine for decompression. See blosc.h for docstrings. 
*/ 1395 int blosc_decompress(const void *src, void *dest, size_t destsize) 1396 { 1397 int result; 1398 char* envvar; 1399 long nthreads; 1400 1401 /* Check if should initialize */ 1402 if (!g_initlib) blosc_init(); 1403 1404 /* Check for a BLOSC_NTHREADS environment variable */ 1405 envvar = getenv("BLOSC_NTHREADS"); 1406 if (envvar != NULL) { 1407 nthreads = strtol(envvar, NULL, 10); 1408 if ((nthreads != EINVAL) && (nthreads > 0)) { 1409 result = blosc_set_nthreads((int)nthreads); 1410 if (result < 0) { return result; } 1411 } 1412 } 1413 1414 /* Check for a BLOSC_NOLOCK environment variable. It is important 1415 that this should be the last env var so that it can take the 1416 previous ones into account */ 1417 envvar = getenv("BLOSC_NOLOCK"); 1418 if (envvar != NULL) { 1419 result = blosc_decompress_ctx(src, dest, destsize, g_threads); 1420 return result; 1421 } 1422 1423 pthread_mutex_lock(&global_comp_mutex); 1424 1425 result = blosc_run_decompression_with_context(g_global_context, src, dest, 1426 destsize, g_threads); 1427 1428 pthread_mutex_unlock(&global_comp_mutex); 1429 1430 return result; 1431 } 1432 1433 1434 /* Specific routine optimized for decompression a small number of 1435 items out of a compressed chunk. This does not use threads because 1436 it would affect negatively to performance. 
*/ 1437 int blosc_getitem(const void *src, int start, int nitems, void *dest) 1438 { 1439 uint8_t *_src=NULL; /* current pos for source buffer */ 1440 uint8_t version, versionlz; /* versions for compressed header */ 1441 uint8_t flags; /* flags for header */ 1442 int32_t ntbytes = 0; /* the number of uncompressed bytes */ 1443 int32_t nblocks; /* number of total blocks in buffer */ 1444 int32_t leftover; /* extra bytes at end of buffer */ 1445 uint8_t *bstarts; /* start pointers for each block */ 1446 int tmp_init = 0; 1447 int32_t typesize, blocksize, nbytes, ctbytes; 1448 int32_t j, bsize, bsize2, leftoverblock; 1449 int32_t cbytes, startb, stopb; 1450 int stop = start + nitems; 1451 uint8_t *tmp; 1452 uint8_t *tmp2; 1453 uint8_t *tmp3; 1454 int32_t ebsize; 1455 1456 _src = (uint8_t *)(src); 1457 1458 /* Read the header block */ 1459 version = _src[0]; /* blosc format version */ 1460 versionlz = _src[1]; /* blosclz format version */ 1461 flags = _src[2]; /* flags */ 1462 typesize = (int32_t)_src[3]; /* typesize */ 1463 nbytes = sw32_(_src + 4); /* buffer size */ 1464 blocksize = sw32_(_src + 8); /* block size */ 1465 ctbytes = sw32_(_src + 12); /* compressed buffer size */ 1466 1467 ebsize = blocksize + typesize * (int32_t)sizeof(int32_t); 1468 tmp = my_malloc(blocksize + ebsize + blocksize); 1469 tmp2 = tmp + blocksize; 1470 tmp3 = tmp + blocksize + ebsize; 1471 1472 version += 0; /* shut up compiler warning */ 1473 versionlz += 0; /* shut up compiler warning */ 1474 ctbytes += 0; /* shut up compiler warning */ 1475 1476 _src += 16; 1477 bstarts = _src; 794 1478 /* Compute some params */ 795 1479 /* Total blocks */ … … 799 1483 _src += sizeof(int32_t)*nblocks; 800 1484 801 /* Check that we have enough space to decompress */802 if (nbytes > (int32_t)destsize) {803 return -1;804 }805 806 /* Take global lock for the time of decompression */807 pthread_mutex_lock(&global_comp_mutex);808 809 /* Populate parameters for decompression routines */810 params.compress = 
0;811 params.clevel = 0; /* specific for compression */812 params.flags = (int32_t)flags;813 params.typesize = typesize;814 params.blocksize = blocksize;815 params.ntbytes = 0;816 params.nbytes = nbytes;817 params.nblocks = nblocks;818 params.leftover = leftover;819 params.bstarts = bstarts;820 params.src = (uint8_t *)src;821 params.dest = (uint8_t *)dest;822 823 /* Check whether this buffer is memcpy'ed */824 if (flags & BLOSC_MEMCPYED) {825 if (((nbytes % L1) == 0) || (nthreads > 1)) {826 /* More effective with large buffers that are multiples of the827 cache size or multi-cores */828 ntbytes = do_job();829 if (ntbytes < 0) {830 return -1;831 }832 }833 else {834 memcpy(dest, (uint8_t *)src+BLOSC_MAX_OVERHEAD, nbytes);835 ntbytes = nbytes;836 }837 }838 else {839 /* Do the actual decompression */840 ntbytes = do_job();841 if (ntbytes < 0) {842 return -1;843 }844 }845 /* Release global lock */846 pthread_mutex_unlock(&global_comp_mutex);847 848 assert(ntbytes <= (int32_t)destsize);849 return ntbytes;850 }851 852 853 /* Specific routine optimized for decompression a small number of854 items out of a compressed chunk. This does not use threads because855 it would affect negatively to performance. 
*/856 int blosc_getitem(const void *src, int start, int nitems, void *dest)857 {858 uint8_t *_src=NULL; /* current pos for source buffer */859 uint8_t version, versionlz; /* versions for compressed header */860 uint8_t flags; /* flags for header */861 int32_t ntbytes = 0; /* the number of uncompressed bytes */862 int32_t nblocks; /* number of total blocks in buffer */863 int32_t leftover; /* extra bytes at end of buffer */864 int32_t *bstarts; /* start pointers for each block */865 uint8_t *tmp = params.tmp[0]; /* tmp for thread 0 */866 uint8_t *tmp2 = params.tmp2[0]; /* tmp2 for thread 0 */867 int tmp_init = 0;868 int32_t typesize, blocksize, nbytes, ctbytes;869 int32_t j, bsize, bsize2, leftoverblock;870 int32_t cbytes, startb, stopb;871 int stop = start + nitems;872 873 _src = (uint8_t *)(src);874 875 /* Take global lock */876 pthread_mutex_lock(&global_comp_mutex);877 878 /* Read the header block */879 version = _src[0]; /* blosc format version */880 versionlz = _src[1]; /* blosclz format version */881 flags = _src[2]; /* flags */882 typesize = (int32_t)_src[3]; /* typesize */883 _src += 4;884 nbytes = sw32(((int32_t *)_src)[0]); /* buffer size */885 blocksize = sw32(((int32_t *)_src)[1]); /* block size */886 ctbytes = sw32(((int32_t *)_src)[2]); /* compressed buffer size */887 888 version += 0; /* shut up compiler warning */889 versionlz += 0; /* shut up compiler warning */890 ctbytes += 0; /* shut up compiler warning */891 892 _src += sizeof(int32_t)*3;893 bstarts = (int32_t *)_src;894 /* Compute some params */895 /* Total blocks */896 nblocks = nbytes / blocksize;897 leftover = nbytes % blocksize;898 nblocks = (leftover>0)? 
nblocks+1: nblocks;899 _src += sizeof(int32_t)*nblocks;900 901 1485 /* Check region boundaries */ 902 1486 if ((start < 0) || (start*typesize > nbytes)) { 903 1487 fprintf(stderr, "`start` out of bounds"); 904 return (-1);1488 return -1; 905 1489 } 906 1490 907 1491 if ((stop < 0) || (stop*typesize > nbytes)) { 908 1492 fprintf(stderr, "`start`+`nitems` out of bounds"); 909 return (-1); 910 } 911 912 /* Parameters needed by blosc_d */ 913 params.typesize = typesize; 914 params.flags = flags; 915 916 /* Initialize temporaries if needed */ 917 if (tmp == NULL || tmp2 == NULL || current_temp.blocksize < blocksize) { 918 tmp = my_malloc(blocksize); 919 if (tmp == NULL) { 920 return -1; 921 } 922 tmp2 = my_malloc(blocksize); 923 if (tmp2 == NULL) { 924 return -1; 925 } 926 tmp_init = 1; 1493 return -1; 927 1494 } 928 1495 … … 958 1525 } 959 1526 else { 1527 struct blosc_context context; 1528 /* blosc_d only uses typesize and flags */ 1529 context.typesize = typesize; 1530 context.header_flags = &flags; 1531 960 1532 /* Regular decompression. Put results in tmp2. 
*/ 961 cbytes = blosc_d(bsize, leftoverblock, 962 (uint8_t *)src+sw32(bstarts[j]), tmp2, tmp, tmp2); 1533 cbytes = blosc_d(&context, bsize, leftoverblock, 1534 (uint8_t *)src + sw32_(bstarts + j * 4), 1535 tmp2, tmp, tmp3); 963 1536 if (cbytes < 0) { 964 1537 ntbytes = cbytes; … … 971 1544 ntbytes += cbytes; 972 1545 } 973 974 /* Release global lock */ 975 pthread_mutex_unlock(&global_comp_mutex); 976 977 if (tmp_init) { 978 my_free(tmp); 979 my_free(tmp2); 980 } 1546 1547 my_free(tmp); 981 1548 982 1549 return ntbytes; … … 985 1552 986 1553 /* Decompress & unshuffle several blocks in a single thread */ 987 static int t_blosc(void *tids)988 { 989 int32_t tid = *(int32_t *)tids;1554 static void *t_blosc(void *ctxt) 1555 { 1556 struct thread_context* context = (struct thread_context*)ctxt; 990 1557 int32_t cbytes, ntdest; 991 1558 int32_t tblocks; /* number of blocks per thread */ … … 1003 1570 int32_t nblocks; 1004 1571 int32_t leftover; 1005 int32_t *bstarts;1006 uint8_t *src;1572 uint8_t *bstarts; 1573 const uint8_t *src; 1007 1574 uint8_t *dest; 1008 1575 uint8_t *tmp; 1009 1576 uint8_t *tmp2; 1010 1011 while (1) {1012 1013 init_sentinels_done = 0; /* sentinels have to be initialised yet */1014 1577 uint8_t *tmp3; 1578 int rc; 1579 1580 while(1) 1581 { 1015 1582 /* Synchronization point for all threads (wait for initialization) */ 1016 WAIT_INIT; 1017 1018 /* Check if thread has been asked to return */ 1019 if (end_threads) { 1020 return(0); 1021 } 1022 1023 pthread_mutex_lock(&count_mutex); 1024 if (!init_sentinels_done) { 1025 /* Set sentinels and other global variables */ 1026 giveup_code = 1; /* no error code initially */ 1027 nblock = -1; /* block counter */ 1028 init_sentinels_done = 1; /* sentinels have been initialised */ 1029 } 1030 pthread_mutex_unlock(&count_mutex); 1583 WAIT_INIT(NULL, context->parent_context); 1584 1585 if(context->parent_context->end_threads) 1586 { 1587 break; 1588 } 1031 1589 1032 1590 /* Get parameters for this thread before 
entering the main loop */ 1033 blocksize = params.blocksize; 1034 ebsize = blocksize + params.typesize*(int32_t)sizeof(int32_t); 1035 compress = params.compress; 1036 flags = params.flags; 1037 maxbytes = params.maxbytes; 1038 nblocks = params.nblocks; 1039 leftover = params.leftover; 1040 bstarts = params.bstarts; 1041 src = params.src; 1042 dest = params.dest; 1043 tmp = params.tmp[tid]; 1044 tmp2 = params.tmp2[tid]; 1591 blocksize = context->parent_context->blocksize; 1592 ebsize = blocksize + context->parent_context->typesize * (int32_t)sizeof(int32_t); 1593 compress = context->parent_context->compress; 1594 flags = *(context->parent_context->header_flags); 1595 maxbytes = context->parent_context->destsize; 1596 nblocks = context->parent_context->nblocks; 1597 leftover = context->parent_context->leftover; 1598 bstarts = context->parent_context->bstarts; 1599 src = context->parent_context->src; 1600 dest = context->parent_context->dest; 1601 1602 if (blocksize > context->tmpblocksize) 1603 { 1604 my_free(context->tmp); 1605 context->tmp = my_malloc(blocksize + ebsize + blocksize); 1606 context->tmp2 = context->tmp + blocksize; 1607 context->tmp3 = context->tmp + blocksize + ebsize; 1608 } 1609 1610 tmp = context->tmp; 1611 tmp2 = context->tmp2; 1612 tmp3 = context->tmp3; 1045 1613 1046 1614 ntbytes = 0; /* only useful for decompression */ … … 1048 1616 if (compress && !(flags & BLOSC_MEMCPYED)) { 1049 1617 /* Compression always has to follow the block order */ 1050 pthread_mutex_lock(&co unt_mutex);1051 nblock++;1052 nblock_ = nblock;1053 pthread_mutex_unlock(&co unt_mutex);1618 pthread_mutex_lock(&context->parent_context->count_mutex); 1619 context->parent_context->thread_nblock++; 1620 nblock_ = context->parent_context->thread_nblock; 1621 pthread_mutex_unlock(&context->parent_context->count_mutex); 1054 1622 tblock = nblocks; 1055 1623 } … … 1059 1627 1060 1628 /* Blocks per thread */ 1061 tblocks = nblocks / nthreads;1062 leftover2 = nblocks % nthreads;1629 
tblocks = nblocks / context->parent_context->numthreads; 1630 leftover2 = nblocks % context->parent_context->numthreads; 1063 1631 tblocks = (leftover2>0)? tblocks+1: tblocks; 1064 1632 1065 nblock_ = tid*tblocks;1633 nblock_ = context->tid*tblocks; 1066 1634 tblock = nblock_ + tblocks; 1067 1635 if (tblock > nblocks) { … … 1072 1640 /* Loop over blocks */ 1073 1641 leftoverblock = 0; 1074 while ((nblock_ < tblock) && giveup_code > 0) {1642 while ((nblock_ < tblock) && context->parent_context->thread_giveup_code > 0) { 1075 1643 bsize = blocksize; 1076 1644 if (nblock_ == (nblocks - 1) && (leftover > 0)) { … … 1087 1655 else { 1088 1656 /* Regular compression */ 1089 cbytes = blosc_c( bsize, leftoverblock, 0, ebsize,1090 src+nblock_*blocksize, tmp2, tmp );1657 cbytes = blosc_c(context->parent_context, bsize, leftoverblock, 0, ebsize, 1658 src+nblock_*blocksize, tmp2, tmp, tmp3); 1091 1659 } 1092 1660 } … … 1099 1667 } 1100 1668 else { 1101 cbytes = blosc_d(bsize, leftoverblock, 1102 src+sw32(bstarts[nblock_]), dest+nblock_*blocksize, 1669 cbytes = blosc_d(context->parent_context, bsize, leftoverblock, 1670 src + sw32_(bstarts + nblock_ * 4), 1671 dest+nblock_*blocksize, 1103 1672 tmp, tmp2); 1104 1673 } … … 1106 1675 1107 1676 /* Check whether current thread has to giveup */ 1108 if ( giveup_code <= 0) {1677 if (context->parent_context->thread_giveup_code <= 0) { 1109 1678 break; 1110 1679 } … … 1113 1682 if (cbytes < 0) { /* compr/decompr failure */ 1114 1683 /* Set giveup_code error */ 1115 pthread_mutex_lock(&co unt_mutex);1116 giveup_code = cbytes;1117 pthread_mutex_unlock(&co unt_mutex);1684 pthread_mutex_lock(&context->parent_context->count_mutex); 1685 context->parent_context->thread_giveup_code = cbytes; 1686 pthread_mutex_unlock(&context->parent_context->count_mutex); 1118 1687 break; 1119 1688 } … … 1121 1690 if (compress && !(flags & BLOSC_MEMCPYED)) { 1122 1691 /* Start critical section */ 1123 pthread_mutex_lock(&co unt_mutex);1124 ntdest = 
params.ntbytes;1125 bstarts[nblock_] = sw32(ntdest);/* update block start counter */1126 if ( (cbytes == 0) || (ntdest+cbytes > (int32_t)maxbytes) ) {1127 giveup_code = 0;/* uncompressible buffer */1128 pthread_mutex_unlock(&co unt_mutex);1692 pthread_mutex_lock(&context->parent_context->count_mutex); 1693 ntdest = context->parent_context->num_output_bytes; 1694 _sw32(bstarts + nblock_ * 4, ntdest); /* update block start counter */ 1695 if ( (cbytes == 0) || (ntdest+cbytes > maxbytes) ) { 1696 context->parent_context->thread_giveup_code = 0; /* uncompressible buffer */ 1697 pthread_mutex_unlock(&context->parent_context->count_mutex); 1129 1698 break; 1130 1699 } 1131 nblock++;1132 nblock_ = nblock;1133 params.ntbytes += cbytes; /* update return bytes counter */1134 pthread_mutex_unlock(&co unt_mutex);1700 context->parent_context->thread_nblock++; 1701 nblock_ = context->parent_context->thread_nblock; 1702 context->parent_context->num_output_bytes += cbytes; /* update return bytes counter */ 1703 pthread_mutex_unlock(&context->parent_context->count_mutex); 1135 1704 /* End of critical section */ 1136 1705 … … 1147 1716 1148 1717 /* Sum up all the bytes decompressed */ 1149 if ((!compress || (flags & BLOSC_MEMCPYED)) && giveup_code > 0) {1718 if ((!compress || (flags & BLOSC_MEMCPYED)) && context->parent_context->thread_giveup_code > 0) { 1150 1719 /* Update global counter for all threads (decompression only) */ 1151 pthread_mutex_lock(&co unt_mutex);1152 params.ntbytes += ntbytes;1153 pthread_mutex_unlock(&co unt_mutex);1720 pthread_mutex_lock(&context->parent_context->count_mutex); 1721 context->parent_context->num_output_bytes += ntbytes; 1722 pthread_mutex_unlock(&context->parent_context->count_mutex); 1154 1723 } 1155 1724 1156 1725 /* Meeting point for all threads (wait for finalization) */ 1157 WAIT_FINISH; 1158 1159 } /* closes while(1) */ 1160 1161 /* This should never be reached, but anyway */ 1162 return(0); 1163 } 1164 1165 1166 static int 
init_threads(void) 1726 WAIT_FINISH(NULL, context->parent_context); 1727 } 1728 1729 /* Cleanup our working space and context */ 1730 my_free(context->tmp); 1731 my_free(context); 1732 1733 return(NULL); 1734 } 1735 1736 1737 static int init_threads(struct blosc_context* context) 1167 1738 { 1168 1739 int32_t tid; 1169 1740 int rc2; 1741 int32_t ebsize; 1742 struct thread_context* thread_context; 1170 1743 1171 1744 /* Initialize mutex and condition variable objects */ 1172 pthread_mutex_init(&count_mutex, NULL); 1745 pthread_mutex_init(&context->count_mutex, NULL); 1746 1747 /* Set context thread sentinels */ 1748 context->thread_giveup_code = 1; 1749 context->thread_nblock = -1; 1173 1750 1174 1751 /* Barrier initialization */ 1175 1752 #ifdef _POSIX_BARRIERS_MINE 1176 pthread_barrier_init(& barr_init, NULL, nthreads+1);1177 pthread_barrier_init(& barr_finish, NULL, nthreads+1);1753 pthread_barrier_init(&context->barr_init, NULL, context->numthreads+1); 1754 pthread_barrier_init(&context->barr_finish, NULL, context->numthreads+1); 1178 1755 #else 1179 pthread_mutex_init(&co unt_threads_mutex, NULL);1180 pthread_cond_init(&co unt_threads_cv, NULL);1181 co unt_threads = 0; /* Reset threads counter */1756 pthread_mutex_init(&context->count_threads_mutex, NULL); 1757 pthread_cond_init(&context->count_threads_cv, NULL); 1758 context->count_threads = 0; /* Reset threads counter */ 1182 1759 #endif 1183 1760 1184 1761 #if !defined(_WIN32) 1185 1762 /* Initialize and set thread detached attribute */ 1186 pthread_attr_init(&c t_attr);1187 pthread_attr_setdetachstate(&c t_attr, PTHREAD_CREATE_JOINABLE);1763 pthread_attr_init(&context->ct_attr); 1764 pthread_attr_setdetachstate(&context->ct_attr, PTHREAD_CREATE_JOINABLE); 1188 1765 #endif 1189 1766 1190 1767 /* Finally, create the threads in detached state */ 1191 for (tid = 0; tid < nthreads; tid++) { 1192 tids[tid] = tid; 1768 for (tid = 0; tid < context->numthreads; tid++) { 1769 context->tids[tid] = tid; 1770 1771 /* 
Create a thread context thread owns context (will destroy when finished) */ 1772 thread_context = (struct thread_context*)my_malloc(sizeof(struct thread_context)); 1773 thread_context->parent_context = context; 1774 thread_context->tid = tid; 1775 1776 ebsize = context->blocksize + context->typesize * (int32_t)sizeof(int32_t); 1777 thread_context->tmp = my_malloc(context->blocksize + ebsize + context->blocksize); 1778 thread_context->tmp2 = thread_context->tmp + context->blocksize; 1779 thread_context->tmp3 = thread_context->tmp + context->blocksize + ebsize; 1780 thread_context->tmpblocksize = context->blocksize; 1781 1193 1782 #if !defined(_WIN32) 1194 rc2 = pthread_create(&threads[tid], &ct_attr, (void*)t_blosc, 1195 (void *)&tids[tid]); 1783 rc2 = pthread_create(&context->threads[tid], &context->ct_attr, t_blosc, (void *)thread_context); 1196 1784 #else 1197 rc2 = pthread_create(&threads[tid], NULL, (void*)t_blosc, 1198 (void *)&tids[tid]); 1785 rc2 = pthread_create(&context->threads[tid], NULL, t_blosc, (void *)thread_context); 1199 1786 #endif 1200 1787 if (rc2) { … … 1205 1792 } 1206 1793 1207 init_threads_done = 1; /* Initialization done! 
*/1208 pid = (int)getpid(); /* save the PID for this process */1209 1794 1210 1795 return(0); 1211 1796 } 1212 1797 1213 void blosc_init(void) { 1214 /* Init global lock */ 1215 pthread_mutex_init(&global_comp_mutex, NULL); 1216 init_lib = 1; 1217 } 1218 1219 int blosc_set_nthreads(int nthreads_new) 1220 { 1221 int ret; 1222 1223 /* Check if should initialize (implementing previous 1.2.3 behaviour, 1224 where calling blosc_set_nthreads was enough) */ 1225 if (!init_lib) blosc_init(); 1226 1227 /* Take global lock */ 1228 pthread_mutex_lock(&global_comp_mutex); 1229 1230 ret = blosc_set_nthreads_(nthreads_new); 1231 /* Release global lock */ 1232 pthread_mutex_unlock(&global_comp_mutex); 1233 1798 int blosc_get_nthreads(void) 1799 { 1800 int ret = g_threads; 1801 1234 1802 return ret; 1235 1803 } 1236 1804 1237 int blosc_set_nthreads_(int nthreads_new) 1238 { 1239 int32_t nthreads_old = nthreads; 1240 int32_t t; 1241 int rc2; 1242 void *status; 1243 1244 if (nthreads_new > BLOSC_MAX_THREADS) { 1805 int blosc_set_nthreads(int nthreads_new) 1806 { 1807 int ret = g_threads; 1808 1809 /* Check if should initialize */ 1810 if (!g_initlib) blosc_init(); 1811 1812 if (nthreads_new != ret){ 1813 /* Re-initialize Blosc */ 1814 blosc_destroy(); 1815 blosc_init(); 1816 g_threads = nthreads_new; 1817 } 1818 1819 return ret; 1820 } 1821 1822 int blosc_set_nthreads_(struct blosc_context* context) 1823 { 1824 if (context->numthreads > BLOSC_MAX_THREADS) { 1245 1825 fprintf(stderr, 1246 1826 "Error. nthreads cannot be larger than BLOSC_MAX_THREADS (%d)", … … 1248 1828 return -1; 1249 1829 } 1250 else if ( nthreads_new<= 0) {1830 else if (context->numthreads <= 0) { 1251 1831 fprintf(stderr, "Error. nthreads must be a positive integer"); 1252 1832 return -1; 1253 1833 } 1254 1834 1255 /* Only join threads if they are not initialized or if our PID is 1256 different from that in pid var (probably means that we are a 1257 subprocess, and thus threads are non-existent). 
*/ 1258 if (nthreads > 1 && init_threads_done && pid == getpid()) { 1259 /* Tell all existing threads to finish */ 1260 end_threads = 1; 1261 /* Synchronization point for all threads (wait for initialization) */ 1262 WAIT_INIT; 1263 /* Join exiting threads */ 1264 for (t=0; t<nthreads; t++) { 1265 rc2 = pthread_join(threads[t], &status); 1266 if (rc2) { 1267 fprintf(stderr, "ERROR; return code from pthread_join() is %d\n", rc2); 1268 fprintf(stderr, "\tError detail: %s\n", strerror(rc2)); 1269 return(-1); 1270 } 1271 } 1272 init_threads_done = 0; 1273 end_threads = 0; 1274 } 1275 1276 /* Launch a new pool of threads (if necessary) */ 1277 nthreads = nthreads_new; 1278 if (nthreads > 1 && (!init_threads_done || pid != getpid())) { 1279 init_threads(); 1280 } 1281 1282 return nthreads_old; 1283 } 1284 1285 1286 /* Free possible memory temporaries and thread resources */ 1287 int blosc_free_resources(void) 1835 /* Launch a new pool of threads */ 1836 if (context->numthreads > 1 && context->numthreads != context->threads_started) { 1837 blosc_release_threadpool(context); 1838 init_threads(context); 1839 } 1840 1841 /* We have now started the threads */ 1842 context->threads_started = context->numthreads; 1843 1844 return context->numthreads; 1845 } 1846 1847 char* blosc_get_compressor(void) 1848 { 1849 char* compname; 1850 blosc_compcode_to_compname(g_compressor, &compname); 1851 1852 return compname; 1853 } 1854 1855 int blosc_set_compressor(const char *compname) 1856 { 1857 int code = blosc_compname_to_compcode(compname); 1858 1859 g_compressor = code; 1860 1861 /* Check if should initialize */ 1862 if (!g_initlib) blosc_init(); 1863 1864 return code; 1865 } 1866 1867 char* blosc_list_compressors(void) 1868 { 1869 static int compressors_list_done = 0; 1870 static char ret[256]; 1871 1872 if (compressors_list_done) return ret; 1873 ret[0] = '\0'; 1874 strcat(ret, BLOSC_BLOSCLZ_COMPNAME); 1875 #if defined(HAVE_LZ4) 1876 strcat(ret, ","); strcat(ret, 
BLOSC_LZ4_COMPNAME); 1877 strcat(ret, ","); strcat(ret, BLOSC_LZ4HC_COMPNAME); 1878 #endif /* HAVE_LZ4 */ 1879 #if defined(HAVE_SNAPPY) 1880 strcat(ret, ","); strcat(ret, BLOSC_SNAPPY_COMPNAME); 1881 #endif /* HAVE_SNAPPY */ 1882 #if defined(HAVE_ZLIB) 1883 strcat(ret, ","); strcat(ret, BLOSC_ZLIB_COMPNAME); 1884 #endif /* HAVE_ZLIB */ 1885 #if defined(HAVE_ZSTD) 1886 strcat(ret, ","); strcat(ret, BLOSC_ZSTD_COMPNAME); 1887 #endif /* HAVE_ZSTD */ 1888 compressors_list_done = 1; 1889 return ret; 1890 } 1891 1892 char* blosc_get_version_string(void) 1893 { 1894 static char ret[256]; 1895 strcpy(ret, BLOSC_VERSION_STRING); 1896 return ret; 1897 } 1898 1899 int blosc_get_complib_info(char *compname, char **complib, char **version) 1900 { 1901 int clibcode; 1902 char *clibname; 1903 char *clibversion = "unknown"; 1904 1905 #if (defined(HAVE_LZ4) && defined(LZ4_VERSION_MAJOR)) || (defined(HAVE_SNAPPY) && defined(SNAPPY_VERSION)) || defined(ZSTD_VERSION_MAJOR) 1906 char sbuffer[256]; 1907 #endif 1908 1909 clibcode = compname_to_clibcode(compname); 1910 clibname = clibcode_to_clibname(clibcode); 1911 1912 /* complib version */ 1913 if (clibcode == BLOSC_BLOSCLZ_LIB) { 1914 clibversion = BLOSCLZ_VERSION_STRING; 1915 } 1916 #if defined(HAVE_LZ4) 1917 else if (clibcode == BLOSC_LZ4_LIB) { 1918 #if defined(LZ4_VERSION_MAJOR) 1919 sprintf(sbuffer, "%d.%d.%d", 1920 LZ4_VERSION_MAJOR, LZ4_VERSION_MINOR, LZ4_VERSION_RELEASE); 1921 clibversion = sbuffer; 1922 #endif /* LZ4_VERSION_MAJOR */ 1923 } 1924 #endif /* HAVE_LZ4 */ 1925 #if defined(HAVE_SNAPPY) 1926 else if (clibcode == BLOSC_SNAPPY_LIB) { 1927 #if defined(SNAPPY_VERSION) 1928 sprintf(sbuffer, "%d.%d.%d", SNAPPY_MAJOR, SNAPPY_MINOR, SNAPPY_PATCHLEVEL); 1929 clibversion = sbuffer; 1930 #endif /* SNAPPY_VERSION */ 1931 } 1932 #endif /* HAVE_SNAPPY */ 1933 #if defined(HAVE_ZLIB) 1934 else if (clibcode == BLOSC_ZLIB_LIB) { 1935 clibversion = ZLIB_VERSION; 1936 } 1937 #endif /* HAVE_ZLIB */ 1938 #if defined(HAVE_ZSTD) 1939 else 
if (clibcode == BLOSC_ZSTD_LIB) { 1940 sprintf(sbuffer, "%d.%d.%d", 1941 ZSTD_VERSION_MAJOR, ZSTD_VERSION_MINOR, ZSTD_VERSION_RELEASE); 1942 clibversion = sbuffer; 1943 } 1944 #endif /* HAVE_ZSTD */ 1945 1946 *complib = strdup(clibname); 1947 *version = strdup(clibversion); 1948 return clibcode; 1949 } 1950 1951 /* Return `nbytes`, `cbytes` and `blocksize` from a compressed buffer. */ 1952 void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 1953 size_t *cbytes, size_t *blocksize) 1954 { 1955 uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */ 1956 uint8_t version, versionlz; /* versions for compressed header */ 1957 1958 /* Read the version info (could be useful in the future) */ 1959 version = _src[0]; /* blosc format version */ 1960 versionlz = _src[1]; /* blosclz format version */ 1961 1962 version += 0; /* shut up compiler warning */ 1963 versionlz += 0; /* shut up compiler warning */ 1964 1965 /* Read the interesting values */ 1966 *nbytes = (size_t)sw32_(_src + 4); /* uncompressed buffer size */ 1967 *blocksize = (size_t)sw32_(_src + 8); /* block size */ 1968 *cbytes = (size_t)sw32_(_src + 12); /* compressed buffer size */ 1969 } 1970 1971 1972 /* Return `typesize` and `flags` from a compressed buffer. 
*/ 1973 void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 1974 int *flags) 1975 { 1976 uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */ 1977 uint8_t version, versionlz; /* versions for compressed header */ 1978 1979 /* Read the version info (could be useful in the future) */ 1980 version = _src[0]; /* blosc format version */ 1981 versionlz = _src[1]; /* blosclz format version */ 1982 1983 version += 0; /* shut up compiler warning */ 1984 versionlz += 0; /* shut up compiler warning */ 1985 1986 /* Read the interesting values */ 1987 *flags = (int)_src[2]; /* flags */ 1988 *typesize = (size_t)_src[3]; /* typesize */ 1989 } 1990 1991 1992 /* Return version information from a compressed buffer. */ 1993 void blosc_cbuffer_versions(const void *cbuffer, int *version, 1994 int *versionlz) 1995 { 1996 uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */ 1997 1998 /* Read the version info */ 1999 *version = (int)_src[0]; /* blosc format version */ 2000 *versionlz = (int)_src[1]; /* Lempel-Ziv compressor format version */ 2001 } 2002 2003 2004 /* Return the compressor library/format used in a compressed buffer. */ 2005 char *blosc_cbuffer_complib(const void *cbuffer) 2006 { 2007 uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */ 2008 int clibcode; 2009 char *complib; 2010 2011 /* Read the compressor format/library info */ 2012 clibcode = (_src[2] & 0xe0) >> 5; 2013 complib = clibcode_to_clibname(clibcode); 2014 return complib; 2015 } 2016 2017 /* Get the internal blocksize to be used during compression. 0 means 2018 that an automatic blocksize is computed internally. */ 2019 int blosc_get_blocksize(void) 2020 { 2021 return (int)g_force_blocksize; 2022 } 2023 2024 /* Force the use of a specific blocksize. If 0, an automatic 2025 blocksize will be used (the default). 
*/ 2026 void blosc_set_blocksize(size_t size) 2027 { 2028 g_force_blocksize = (int32_t)size; 2029 } 2030 2031 void blosc_init(void) 2032 { 2033 /* Return if we are already initialized */ 2034 if (g_initlib) return; 2035 2036 pthread_mutex_init(&global_comp_mutex, NULL); 2037 g_global_context = (struct blosc_context*)my_malloc(sizeof(struct blosc_context)); 2038 g_global_context->threads_started = 0; 2039 g_initlib = 1; 2040 } 2041 2042 void blosc_destroy(void) 2043 { 2044 /* Return if Blosc is not initialized */ 2045 if (!g_initlib) return; 2046 2047 g_initlib = 0; 2048 blosc_release_threadpool(g_global_context); 2049 my_free(g_global_context); 2050 pthread_mutex_destroy(&global_comp_mutex); 2051 } 2052 2053 int blosc_release_threadpool(struct blosc_context* context) 1288 2054 { 1289 2055 int32_t t; 2056 void* status; 2057 int rc; 1290 2058 int rc2; 1291 void *status; 1292 1293 /* Take global lock */ 1294 pthread_mutex_lock(&global_comp_mutex); 1295 1296 /* Release temporaries */ 1297 if (init_temps_done) { 1298 release_temporaries(); 1299 } 1300 1301 /* Finish the possible thread pool */ 1302 if (nthreads > 1 && init_threads_done) { 2059 2060 if (context->threads_started > 0) 2061 { 1303 2062 /* Tell all existing threads to finish */ 1304 end_threads = 1; 1305 /* Synchronization point for all threads (wait for initialization) */ 1306 WAIT_INIT; 2063 context->end_threads = 1; 2064 2065 /* Sync threads */ 2066 WAIT_INIT(-1, context); 2067 1307 2068 /* Join exiting threads */ 1308 for (t=0; t< nthreads; t++) {1309 rc2 = pthread_join( threads[t], &status);2069 for (t=0; t<context->threads_started; t++) { 2070 rc2 = pthread_join(context->threads[t], &status); 1310 2071 if (rc2) { 1311 2072 fprintf(stderr, "ERROR; return code from pthread_join() is %d\n", rc2); 1312 2073 fprintf(stderr, "\tError detail: %s\n", strerror(rc2)); 1313 return(-1);1314 2074 } 1315 2075 } 1316 2076 1317 2077 /* Release mutex and condition variable objects */ 1318 pthread_mutex_destroy(&co 
unt_mutex);2078 pthread_mutex_destroy(&context->count_mutex); 1319 2079 1320 2080 /* Barriers */ 1321 #ifdef _POSIX_BARRIERS_MINE 1322 pthread_barrier_destroy(&barr_init); 1323 pthread_barrier_destroy(&barr_finish); 1324 #else 1325 pthread_mutex_destroy(&count_threads_mutex); 1326 pthread_cond_destroy(&count_threads_cv); 1327 #endif 1328 1329 /* Thread attributes */ 1330 #if !defined(_WIN32) 1331 pthread_attr_destroy(&ct_attr); 1332 #endif 1333 1334 init_threads_done = 0; 1335 end_threads = 0; 1336 } 1337 /* Release global lock */ 1338 pthread_mutex_unlock(&global_comp_mutex); 1339 return(0); 1340 1341 } 1342 1343 void blosc_destroy(void) { 1344 /* Free the resources */ 1345 blosc_free_resources(); 1346 /* Destroy global lock */ 1347 pthread_mutex_destroy(&global_comp_mutex); 1348 } 1349 1350 /* Return `nbytes`, `cbytes` and `blocksize` from a compressed buffer. */ 1351 void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 1352 size_t *cbytes, size_t *blocksize) 1353 { 1354 uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */ 1355 uint8_t version, versionlz; /* versions for compressed header */ 1356 1357 /* Read the version info (could be useful in the future) */ 1358 version = _src[0]; /* blosc format version */ 1359 versionlz = _src[1]; /* blosclz format version */ 1360 1361 version += 0; /* shut up compiler warning */ 1362 versionlz += 0; /* shut up compiler warning */ 1363 1364 /* Read the interesting values */ 1365 _src += 4; 1366 *nbytes = (size_t)sw32(((int32_t *)_src)[0]); /* uncompressed buffer size */ 1367 *blocksize = (size_t)sw32(((int32_t *)_src)[1]); /* block size */ 1368 *cbytes = (size_t)sw32(((int32_t *)_src)[2]); /* compressed buffer size */ 1369 } 1370 1371 1372 /* Return `typesize` and `flags` from a compressed buffer. 
*/ 1373 void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 1374 int *flags) 1375 { 1376 uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */ 1377 uint8_t version, versionlz; /* versions for compressed header */ 1378 1379 /* Read the version info (could be useful in the future) */ 1380 version = _src[0]; /* blosc format version */ 1381 versionlz = _src[1]; /* blosclz format version */ 1382 1383 version += 0; /* shut up compiler warning */ 1384 versionlz += 0; /* shut up compiler warning */ 1385 1386 /* Read the interesting values */ 1387 *flags = (int)_src[2]; /* flags */ 1388 *typesize = (size_t)_src[3]; /* typesize */ 1389 } 1390 1391 1392 /* Return version information from a compressed buffer. */ 1393 void blosc_cbuffer_versions(const void *cbuffer, int *version, 1394 int *versionlz) 1395 { 1396 uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */ 1397 1398 /* Read the version info */ 1399 *version = (int)_src[0]; /* blosc format version */ 1400 *versionlz = (int)_src[1]; /* blosclz format version */ 1401 } 1402 1403 1404 /* Force the use of a specific blocksize. If 0, an automatic 1405 blocksize will be used (the default). 
*/ 1406 void blosc_set_blocksize(size_t size) 1407 { 1408 /* Take global lock */ 1409 pthread_mutex_lock(&global_comp_mutex); 1410 1411 force_blocksize = (int32_t)size; 1412 1413 /* Release global lock */ 1414 pthread_mutex_unlock(&global_comp_mutex); 1415 } 2081 #ifdef _POSIX_BARRIERS_MINE 2082 pthread_barrier_destroy(&context->barr_init); 2083 pthread_barrier_destroy(&context->barr_finish); 2084 #else 2085 pthread_mutex_destroy(&context->count_threads_mutex); 2086 pthread_cond_destroy(&context->count_threads_cv); 2087 #endif 2088 2089 /* Thread attributes */ 2090 #if !defined(_WIN32) 2091 pthread_attr_destroy(&context->ct_attr); 2092 #endif 2093 2094 } 2095 2096 context->threads_started = 0; 2097 2098 return 0; 2099 } 2100 2101 int blosc_free_resources(void) 2102 { 2103 /* Return if Blosc is not initialized */ 2104 if (!g_initlib) return -1; 2105 2106 return blosc_release_threadpool(g_global_context); 2107 } -
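The header layout that `blosc_cbuffer_sizes()`, `blosc_cbuffer_metainfo()` and `blosc_cbuffer_versions()` read in the diff above can be summarized in one self-contained sketch. The struct and function names below are hypothetical and not part of the Blosc API. The sketch also assumes the three 32-bit fields are stored little-endian, which is what the `sw32()` calls in the diff suggest but do not show directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the first 16 bytes of a compressed buffer.
   Offsets follow the reads shown in the diff: _src[0..3], then three
   32-bit values starting at _src + 4. */
typedef struct {
  uint8_t  version;    /* byte 0: blosc format version */
  uint8_t  versionlz;  /* byte 1: internal compressor format version */
  uint8_t  flags;      /* byte 2: flags */
  uint8_t  typesize;   /* byte 3: typesize */
  uint32_t nbytes;     /* bytes 4..7: uncompressed buffer size */
  uint32_t blocksize;  /* bytes 8..11: block size */
  uint32_t cbytes;     /* bytes 12..15: compressed buffer size */
} sketch_header;

/* Assemble a 32-bit value from four bytes, least significant first
   (assumption: the header is little-endian on disk). */
uint32_t sketch_read_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header fields; the caller must supply at least 16 bytes. */
sketch_header sketch_parse_header(const uint8_t *cbuffer)
{
  sketch_header h;
  assert(cbuffer != NULL);
  h.version   = cbuffer[0];
  h.versionlz = cbuffer[1];
  h.flags     = cbuffer[2];
  h.typesize  = cbuffer[3];
  h.nbytes    = sketch_read_le32(cbuffer + 4);
  h.blocksize = sketch_read_le32(cbuffer + 8);
  h.cbytes    = sketch_read_le32(cbuffer + 12);
  return h;
}
```

Reading byte by byte avoids the pointer cast to `int32_t *` used in the original code, so the sketch does not depend on the alignment of `cbuffer` or on the host byte order.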
thirdparty/blosc/blosc.h
r00587dc r981e22c 1 1 /********************************************************************* 2 Blosc - Blocked S uffling and Compression Library3 4 Author: Francesc Alted <f [email protected]>2 Blosc - Blocked Shuffling and Compression Library 3 4 Author: Francesc Alted <f[email protected]> 5 5 6 6 See LICENSES/BLOSC.txt for details about copyright and rights to use. 7 7 **********************************************************************/ 8 9 #include <limits.h>10 11 8 #ifndef BLOSC_H 12 9 #define BLOSC_H 13 10 11 #include <limits.h> 12 #include <stdlib.h> 13 #include "blosc-export.h" 14 15 #ifdef __cplusplus 16 extern "C" { 17 #endif 18 14 19 /* Version numbers */ 15 20 #define BLOSC_VERSION_MAJOR 1 /* for major interface/format changes */ 16 #define BLOSC_VERSION_MINOR 2/* for minor interface/format changes */17 #define BLOSC_VERSION_RELEASE 3/* for tweaks, bug-fixes, or development */18 19 #define BLOSC_VERSION_STRING "1. 2.3" /* string version. Sync with above! */21 #define BLOSC_VERSION_MINOR 10 /* for minor interface/format changes */ 22 #define BLOSC_VERSION_RELEASE 1 /* for tweaks, bug-fixes, or development */ 23 24 #define BLOSC_VERSION_STRING "1.10.1.dev" /* string version. Sync with above! 
*/ 20 25 #define BLOSC_VERSION_REVISION "$Rev$" /* revision version */ 21 #define BLOSC_VERSION_DATE "$Date:: 2013-05-17 #$" /* date version */ 22 23 /* The *_VERS_FORMAT should be just 1-byte long */ 26 #define BLOSC_VERSION_DATE "$Date:: 2016-07-20 #$" /* date version */ 27 28 #define BLOSCLZ_VERSION_STRING "1.0.5" /* the internal compressor version */ 29 30 /* The *_FORMAT symbols should be just 1-byte long */ 24 31 #define BLOSC_VERSION_FORMAT 2 /* Blosc format version, starting at 1 */ 25 #define BLOSCLZ_VERSION_FORMAT 1 /* Blosclz format version, starting at 1 */26 27 /* The combined blosc and blosclz formats */28 #define BLOSC_VERSION_CFORMAT (BLOSC_VERSION_FORMAT << 8) & (BLOSCLZ_VERSION_FORMAT)29 32 30 33 /* Minimum header length */ … … 36 39 #define BLOSC_MAX_OVERHEAD BLOSC_MIN_HEADER_LENGTH 37 40 38 /* Maximum buffer size to be compressed */41 /* Maximum source buffer size to be compressed */ 39 42 #define BLOSC_MAX_BUFFERSIZE (INT_MAX - BLOSC_MAX_OVERHEAD) 40 43 41 /* Maximum typesize before considering buffer as a stream of bytes */44 /* Maximum typesize before considering source buffer as a stream of bytes */ 42 45 #define BLOSC_MAX_TYPESIZE 255 /* Cannot be larger than 255 */ 43 46 … … 45 48 #define BLOSC_MAX_THREADS 256 46 49 50 /* Codes for shuffling (see blosc_compress) */ 51 #define BLOSC_NOSHUFFLE 0 /* no shuffle */ 52 #define BLOSC_SHUFFLE 1 /* byte-wise shuffle */ 53 #define BLOSC_BITSHUFFLE 2 /* bit-wise shuffle */ 54 47 55 /* Codes for internal flags (see blosc_cbuffer_metainfo) */ 48 #define BLOSC_DOSHUFFLE 0x1 49 #define BLOSC_MEMCPYED 0x2 50 51 52 53 /** 54 Initialize the Blosc library. You must call this previous to any other 55 Blosc call, and make sure that you call this in a non-threaded environment. 56 Other Blosc calls can be called in a threaded environment, if desired. 57 58 */ 59 60 void blosc_init(void); 61 62 63 /** 64 65 Destroy the Blosc library environment. 
You must call this after to you are 66 done with all the Blosc calls, and make sure that you call this in a 67 non-threaded environment. 68 69 */ 70 71 void blosc_destroy(void); 56 #define BLOSC_DOSHUFFLE 0x1 /* byte-wise shuffle */ 57 #define BLOSC_MEMCPYED 0x2 /* plain copy */ 58 #define BLOSC_DOBITSHUFFLE 0x4 /* bit-wise shuffle */ 59 60 /* Codes for the different compressors shipped with Blosc */ 61 #define BLOSC_BLOSCLZ 0 62 #define BLOSC_LZ4 1 63 #define BLOSC_LZ4HC 2 64 #define BLOSC_SNAPPY 3 65 #define BLOSC_ZLIB 4 66 #define BLOSC_ZSTD 5 67 68 /* Names for the different compressors shipped with Blosc */ 69 #define BLOSC_BLOSCLZ_COMPNAME "blosclz" 70 #define BLOSC_LZ4_COMPNAME "lz4" 71 #define BLOSC_LZ4HC_COMPNAME "lz4hc" 72 #define BLOSC_SNAPPY_COMPNAME "snappy" 73 #define BLOSC_ZLIB_COMPNAME "zlib" 74 #define BLOSC_ZSTD_COMPNAME "zstd" 75 76 /* Codes for compression libraries shipped with Blosc (code must be < 8) */ 77 #define BLOSC_BLOSCLZ_LIB 0 78 #define BLOSC_LZ4_LIB 1 79 #define BLOSC_SNAPPY_LIB 2 80 #define BLOSC_ZLIB_LIB 3 81 #define BLOSC_ZSTD_LIB 4 82 83 /* Names for the different compression libraries shipped with Blosc */ 84 #define BLOSC_BLOSCLZ_LIBNAME "BloscLZ" 85 #define BLOSC_LZ4_LIBNAME "LZ4" 86 #define BLOSC_SNAPPY_LIBNAME "Snappy" 87 #define BLOSC_ZLIB_LIBNAME "Zlib" 88 #define BLOSC_ZSTD_LIBNAME "Zstd" 89 90 /* The codes for compressor formats shipped with Blosc */ 91 #define BLOSC_BLOSCLZ_FORMAT BLOSC_BLOSCLZ_LIB 92 #define BLOSC_LZ4_FORMAT BLOSC_LZ4_LIB 93 #define BLOSC_LZ4HC_FORMAT BLOSC_LZ4_LIB /* LZ4HC and LZ4 share the same format */ 94 #define BLOSC_SNAPPY_FORMAT BLOSC_SNAPPY_LIB 95 #define BLOSC_ZLIB_FORMAT BLOSC_ZLIB_LIB 96 #define BLOSC_ZSTD_FORMAT BLOSC_ZSTD_LIB 97 98 99 /* The version formats for compressors shipped with Blosc */ 100 /* All versions here starts at 1 */ 101 #define BLOSC_BLOSCLZ_VERSION_FORMAT 1 102 #define BLOSC_LZ4_VERSION_FORMAT 1 103 #define BLOSC_LZ4HC_VERSION_FORMAT 1 /* LZ4HC and LZ4 share the same 
format */ 104 #define BLOSC_SNAPPY_VERSION_FORMAT 1 105 #define BLOSC_ZLIB_VERSION_FORMAT 1 106 #define BLOSC_ZSTD_VERSION_FORMAT 1 107 108 109 /** 110 Initialize the Blosc library environment. 111 112 You must call this previous to any other Blosc call, unless you want 113 Blosc to be used simultaneously in a multi-threaded environment, in 114 which case you should *exclusively* use the 115 blosc_compress_ctx()/blosc_decompress_ctx() pair (see below). 116 */ 117 BLOSC_EXPORT void blosc_init(void); 118 119 120 /** 121 Destroy the Blosc library environment. 122 123 You must call this after to you are done with all the Blosc calls, 124 unless you have not used blosc_init() before (see blosc_init() 125 above). 126 */ 127 BLOSC_EXPORT void blosc_destroy(void); 72 128 73 129 74 130 /** 75 131 Compress a block of data in the `src` buffer and returns the size of 76 compressed block. The size of `src` buffer is specified by132 the compressed block. The size of `src` buffer is specified by 77 133 `nbytes`. There is not a minimum for `src` buffer size (`nbytes`). 78 134 … … 81 137 82 138 `doshuffle` specifies whether the shuffle compression preconditioner 83 should be applied or not. 0 means not applying it and 1 means 84 applying it. 139 should be applied or not. BLOSC_NOSHUFFLE means not applying it, 140 BLOSC_SHUFFLE means applying it at a byte level and BLOSC_BITSHUFFLE 141 at a bit level (slower but may achieve better entropy alignment). 85 142 86 143 `typesize` is the number of bytes for the atomic type in binary 87 144 `src` buffer. This is mainly useful for the shuffle preconditioner. 88 Only a typesize > 1 will allow the shuffle to work. 145 For implementation reasons, only a 1 < typesize < 256 will allow the 146 shuffle filter to work. When typesize is not in this range, shuffle 147 will be silently disabled. 89 148 90 149 The `dest` buffer must have at least the size of `destsize`. Blosc … … 93 152 The `src` buffer and the `dest` buffer can not overlap. 
94 153 154 Compression is memory safe and guaranteed not to write the `dest` 155 buffer more than what is specified in `destsize`. 156 95 157 If `src` buffer cannot be compressed into `destsize`, the return 96 158 value is zero and you should discard the contents of the `dest` … … 101 163 together with the buffer data causing this and compression settings. 102 164 103 Compression is memory safe and guaranteed not to write the `dest` 104 buffer more than what is specified in `destsize`. However, it is 105 not re-entrant and not thread-safe (despite the fact that it uses 106 threads internally). 107 */ 108 109 int blosc_compress(int clevel, int doshuffle, size_t typesize, size_t nbytes, 110 const void *src, void *dest, size_t destsize); 111 165 Environment variables 166 --------------------- 167 168 blosc_compress() honors different environment variables to control 169 internal parameters without the need of doing that programatically. 170 Here are the ones supported: 171 172 BLOSC_CLEVEL=(INTEGER): This will overwrite the `clevel` parameter 173 before the compression process starts. 174 175 BLOSC_SHUFFLE=[NOSHUFFLE | SHUFFLE | BITSHUFFLE]: This will 176 overwrite the `doshuffle` parameter before the compression process 177 starts. 178 179 BLOSC_TYPESIZE=(INTEGER): This will overwrite the `typesize` 180 parameter before the compression process starts. 181 182 BLOSC_COMPRESSOR=[BLOSCLZ | LZ4 | LZ4HC | SNAPPY | ZLIB]: This will 183 call blosc_set_compressor(BLOSC_COMPRESSOR) before the compression 184 process starts. 185 186 BLOSC_NTHREADS=(INTEGER): This will call 187 blosc_set_nthreads(BLOSC_NTHREADS) before the compression process 188 starts. 189 190 BLOSC_BLOCKSIZE=(INTEGER): This will call 191 blosc_set_blocksize(BLOSC_BLOCKSIZE) before the compression process 192 starts. *NOTE:* The blocksize is a critical parameter with 193 important restrictions in the allowed values, so use this with care. 
194 195 BLOSC_NOLOCK=(ANY VALUE): This will call blosc_compress_ctx() under 196 the hood, with the `compressor`, `blocksize` and 197 `numinternalthreads` parameters set to the same as the last calls to 198 blosc_set_compressor(), blosc_set_blocksize() and 199 blosc_set_nthreads(). BLOSC_CLEVEL, BLOSC_SHUFFLE, BLOSC_TYPESIZE 200 environment vars will also be honored. 201 */ 202 BLOSC_EXPORT int blosc_compress(int clevel, int doshuffle, size_t typesize, 203 size_t nbytes, const void *src, void *dest, 204 size_t destsize); 205 206 207 /** 208 Context interface to blosc compression. This does not require a call 209 to blosc_init() and can be called from multithreaded applications 210 without the global lock being used, so allowing Blosc be executed 211 simultaneously in those scenarios. 212 213 It uses the same parameters than the blosc_compress() function plus: 214 215 `compressor`: the string representing the type of compressor to use. 216 217 `blocksize`: the requested size of the compressed blocks. If 0, an 218 automatic blocksize will be used. 219 220 `numinternalthreads`: the number of threads to use internally. 221 222 A negative return value means that an internal error happened. This 223 should never happen. If you see this, please report it back 224 together with the buffer data causing this and compression settings. 225 */ 226 BLOSC_EXPORT int blosc_compress_ctx(int clevel, int doshuffle, size_t typesize, 227 size_t nbytes, const void* src, void* dest, 228 size_t destsize, const char* compressor, 229 size_t blocksize, int numinternalthreads); 112 230 113 231 /** 114 232 Decompress a block of compressed data in `src`, put the result in 115 `dest` and returns the size of the decompressed block. If error 116 occurs, e.g. the compressed data is corrupted or the output buffer 117 is not large enough, then 0 (zero) or a negative value will be 118 returned instead. 233 `dest` and returns the size of the decompressed block. 
119 234 120 235 The `src` buffer and the `dest` buffer can not overlap. 121 236 122 237 Decompression is memory safe and guaranteed not to write the `dest` 123 buffer more than what is specified in `destsize`. However, it is 124 not re-entrant and not thread-safe (despite the fact that it uses 125 threads internally). 238 buffer more than what is specified in `destsize`. 239 240 If an error occurs, e.g. the compressed data is corrupted or the 241 output buffer is not large enough, then 0 (zero) or a negative value 242 will be returned instead. 243 244 Environment variables 245 --------------------- 246 247 blosc_decompress() honors different environment variables to control 248 internal parameters without the need of doing that programatically. 249 Here are the ones supported: 250 251 BLOSC_NTHREADS=(INTEGER): This will call 252 blosc_set_nthreads(BLOSC_NTHREADS) before the proper decompression 253 process starts. 254 255 BLOSC_NOLOCK=(ANY VALUE): This will call blosc_decompress_ctx() 256 under the hood, with the `numinternalthreads` parameter set to the 257 same value as the last call to blosc_set_nthreads(). 126 258 */ 127 128 int blosc_decompress(const void *src, void *dest, size_t destsize); 129 259 BLOSC_EXPORT int blosc_decompress(const void *src, void *dest, size_t destsize); 260 261 262 /** 263 Context interface to blosc decompression. This does not require a 264 call to blosc_init() and can be called from multithreaded 265 applications without the global lock being used, so allowing Blosc 266 be executed simultaneously in those scenarios. 267 268 It uses the same parameters than the blosc_decompress() function plus: 269 270 `numinternalthreads`: number of threads to use internally. 271 272 Decompression is memory safe and guaranteed not to write the `dest` 273 buffer more than what is specified in `destsize`. 274 275 If an error occurs, e.g. 
the compressed data is corrupted or the 276 output buffer is not large enough, then 0 (zero) or a negative value 277 will be returned instead. 278 */ 279 BLOSC_EXPORT int blosc_decompress_ctx(const void *src, void *dest, 280 size_t destsize, int numinternalthreads); 130 281 131 282 /** 132 283 Get `nitems` (of typesize size) in `src` buffer starting in `start`. 133 284 The items are returned in `dest` buffer, which has to have enough 134 space for storing all items. Returns the number of bytes copied to 135 `dest` or a negative value if some error happens. 136 */ 137 138 int blosc_getitem(const void *src, int start, int nitems, void *dest); 285 space for storing all items. 286 287 Returns the number of bytes copied to `dest` or a negative value if 288 some error happens. 289 */ 290 BLOSC_EXPORT int blosc_getitem(const void *src, int start, int nitems, void *dest); 291 292 293 /** 294 Returns the current number of threads that are used for 295 compression/decompression. 296 */ 297 BLOSC_EXPORT int blosc_get_nthreads(void); 139 298 140 299 … … 142 301 Initialize a pool of threads for compression/decompression. If 143 302 `nthreads` is 1, then the serial version is chosen and a possible 144 previous existing pool is ended. Returns the previous number of 145 threads. If this is not called, `nthreads` is set to 1 internally. 303 previous existing pool is ended. If this is not called, `nthreads` 304 is set to 1 internally. 305 306 Returns the previous number of threads. 307 */ 308 BLOSC_EXPORT int blosc_set_nthreads(int nthreads); 309 310 311 /** 312 Returns the current compressor that is used for compression. 313 */ 314 BLOSC_EXPORT char* blosc_get_compressor(void); 315 316 317 /** 318 Select the compressor to be used. The supported ones are "blosclz", 319 "lz4", "lz4hc", "snappy", "zlib" and "ztsd". If this function is not 320 called, then "blosclz" will be used. 
321 322 In case the compressor is not recognized, or there is not support 323 for it in this build, it returns a -1. Else it returns the code for 324 the compressor (>=0). 325 */ 326 BLOSC_EXPORT int blosc_set_compressor(const char* compname); 327 328 329 /** 330 Get the `compname` associated with the `compcode`. 331 332 If the compressor code is not recognized, or there is not support 333 for it in this build, -1 is returned. Else, the compressor code is 334 returned. 335 */ 336 BLOSC_EXPORT int blosc_compcode_to_compname(int compcode, char **compname); 337 338 339 /** 340 Return the compressor code associated with the compressor name. 341 342 If the compressor name is not recognized, or there is not support 343 for it in this build, -1 is returned instead. 344 */ 345 BLOSC_EXPORT int blosc_compname_to_compcode(const char *compname); 346 347 348 /** 349 Get a list of compressors supported in the current build. The 350 returned value is a string with a concatenation of "blosclz", "lz4", 351 "lz4hc", "snappy", "zlib" or "zstd "separated by commas, depending 352 on which ones are present in the build. 353 354 This function does not leak, so you should not free() the returned 355 list. 356 357 This function should always succeed. 358 */ 359 BLOSC_EXPORT char* blosc_list_compressors(void); 360 361 /** 362 Return the version of blosc in string format. 363 364 Useful for dynamic libraries. 146 365 */ 147 148 int blosc_set_nthreads(int nthreads); 149 150 151 /** 152 Free possible memory temporaries and thread resources. Use this when you 153 are not going to use Blosc for a long while. In case of problems releasing 154 the resources, it returns a negative number, else it returns 0. 155 */ 156 157 int blosc_free_resources(void); 366 BLOSC_EXPORT char* blosc_get_version_string(void); 367 368 369 /** 370 Get info from compression libraries included in the current build. 371 In `compname` you pass the compressor name that you want info from. 
372 In `complib` and `version` you get the compression library name and 373 version (if available) as output. 374 375 In `complib` and `version` you get a pointer to the compressor 376 library name and the version in string format respectively. After 377 using the name and version, you should free() them so as to avoid 378 leaks. 379 380 If the compressor is supported, it returns the code for the library 381 (>=0). If it is not supported, this function returns -1. 382 */ 383 BLOSC_EXPORT int blosc_get_complib_info(char *compname, char **complib, char **version); 384 385 386 /** 387 Free possible memory temporaries and thread resources. Use this 388 when you are not going to use Blosc for a long while. In case of 389 problems releasing the resources, it returns a negative number, else 390 it returns 0. 391 */ 392 BLOSC_EXPORT int blosc_free_resources(void); 158 393 159 394 … … 168 403 169 404 This function should always succeed. 170 */ 171 172 void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 173 size_t *cbytes, size_t *blocksize); 405 */ 406 BLOSC_EXPORT void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 407 size_t *cbytes, size_t *blocksize); 174 408 175 409 … … 182 416 * bit 1: whether the internal buffer is a pure memcpy or not 183 417 184 You can use the `BLOSC_DOSHUFFLE` and `BLOSC_MEMCPYED` symbols for 185 extracting the interesting bits (e.g. ``flags & BLOSC_DOSHUFFLE`` 186 says whether the buffer is shuffled or not). 418 You can use the `BLOSC_DOSHUFFLE`, `BLOSC_DOBITSHUFFLE` and 419 `BLOSC_MEMCPYED` symbols for extracting the interesting bits 420 (e.g. ``flags & BLOSC_DOSHUFFLE`` says whether the buffer is 421 byte-shuffled or not). 187 422 188 423 This function should always succeed. 
189 */ 190 191 void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 192 int *flags); 424 */ 425 BLOSC_EXPORT void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 426 int *flags); 193 427 194 428 … … 196 430 Return information about a compressed buffer, namely the internal 197 431 Blosc format version (`version`) and the format for the internal 198 Lempel-Ziv algorithm (`versionlz`). This function should always 199 succeed. 200 */ 201 202 void blosc_cbuffer_versions(const void *cbuffer, int *version, 203 int *versionlz); 432 Lempel-Ziv compressor used (`versionlz`). 433 434 This function should always succeed. 435 */ 436 BLOSC_EXPORT void blosc_cbuffer_versions(const void *cbuffer, int *version, 437 int *versionlz); 438 439 440 /** 441 Return the compressor library/format used in a compressed buffer. 442 443 This function should always succeed. 444 */ 445 BLOSC_EXPORT char *blosc_cbuffer_complib(const void *cbuffer); 204 446 205 447 … … 211 453 *********************************************************************/ 212 454 455 /* Get the internal blocksize to be used during compression. 0 means 456 that an automatic blocksize is computed internally. */ 457 BLOSC_EXPORT int blosc_get_blocksize(void); 213 458 214 459 /** 215 460 Force the use of a specific blocksize. If 0, an automatic 216 461 blocksize will be used (the default). 217 */ 218 219 void blosc_set_blocksize(size_t blocksize); 220 221 462 463 The blocksize is a critical parameter with important restrictions in 464 the allowed values, so use this with care. 465 */ 466 BLOSC_EXPORT void blosc_set_blocksize(size_t blocksize); 467 468 #ifdef __cplusplus 469 } 222 470 #endif 471 472 473 #endif -
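The compressor codes and names added to blosc.h above pair up one to one. A minimal sketch of a name-to-code lookup over those pairs follows. It is only an illustration of the mapping, not the library's `blosc_compname_to_compcode()` implementation: the function and table names are hypothetical, and the sketch ignores whether a codec is actually compiled into a given build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name/code pairs copied from the BLOSC_*_COMPNAME and BLOSC_* defines
   in the diff above. The table itself is an illustrative construct. */
struct sketch_codec {
  const char *name;
  int code;
};

static const struct sketch_codec sketch_codecs[] = {
  {"blosclz", 0},
  {"lz4",     1},
  {"lz4hc",   2},
  {"snappy",  3},
  {"zlib",    4},
  {"zstd",    5},
};

/* Return the code for `compname`, or -1 if the name is not in the table. */
int sketch_compname_to_compcode(const char *compname)
{
  size_t i;
  if (compname == NULL) return -1;
  for (i = 0; i < sizeof(sketch_codecs) / sizeof(sketch_codecs[0]); i++) {
    if (strcmp(compname, sketch_codecs[i].name) == 0)
      return sketch_codecs[i].code;
  }
  return -1;
}
```

In this sketch the string "zstd" maps to 5, matching `BLOSC_ZSTD_COMPNAME`. The spelling "ztsd" that appears in the `blosc_set_compressor()` comment is not in the table and yields -1.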
thirdparty/blosc/blosclz.c
r00587dc r981e22c 1 1 /********************************************************************* 2 Blosc - Blocked S uffling and Compression Library3 4 Author: Francesc Alted <f [email protected]>2 Blosc - Blocked Shuffling and Compression Library 3 4 Author: Francesc Alted <f[email protected]> 5 5 Creation date: 2009-05-20 6 6 … … 21 21 #if defined(_WIN32) && !defined(__MINGW32__) 22 22 #include <windows.h> 23 #include "win32/stdint-windows.h" 23 24 /* stdint.h only available in VS2010 (VC++ 16.0) and newer */ 25 #if defined(_MSC_VER) && _MSC_VER < 1600 26 #include "win32/stdint-windows.h" 27 #else 28 #include <stdint.h> 29 #endif 30 /* llabs only available in VS2013 (VC++ 18.0) and newer */ 31 #if defined(_MSC_VER) && _MSC_VER < 1800 32 #define llabs(v) abs(v) 33 #endif 24 34 #else 25 35 #include <stdint.h> … … 36 46 #elif defined(__i486__) || defined(__i586__) || defined(__i686__) /* GNU C */ 37 47 #undef BLOSCLZ_STRICT_ALIGN 38 #elif defined(_M_IX86) /* Intel, MSVC */48 #elif defined(_M_IX86) || defined(_M_X64) /* Intel, MSVC */ 39 49 #undef BLOSCLZ_STRICT_ALIGN 40 50 #elif defined(__386) … … 44 54 #elif defined(__I86__) /* Digital Mars */ 45 55 #undef BLOSCLZ_STRICT_ALIGN 56 /* Seems like unaligned access in ARM (at least ARMv6) is pretty 57 expensive, so we are going to always enforce strict aligment in ARM. 58 If anybody suggest that newer ARMs are better, we can revisit this. */ 59 /* #elif defined(__ARM_FEATURE_UNALIGNED) */ /* ARM, GNU C */ 60 /* #undef BLOSCLZ_STRICT_ALIGN */ 46 61 #endif 47 62 #endif … … 67 82 * Use inlined functions for supported systems. 
68 83 */ 69 #if defined(__GNUC__) || defined(__DMC__) || defined(__POCC__) || defined(__WATCOMC__) || defined(__SUNPRO_C) 70 #define BLOSCLZ_INLINE inline 71 #elif defined(__BORLANDC__) || defined(_MSC_VER) || defined(__LCC__) 72 #define BLOSCLZ_INLINE __inline 73 #else 74 #define BLOSCLZ_INLINE 84 #if defined(_MSC_VER) && !defined(__cplusplus) /* Visual Studio */ 85 #define inline __inline /* Visual C is not C99, but supports some kind of inline */ 75 86 #endif 76 87 77 88 #define MAX_COPY 32 78 #define MAX_LEN 264 /* 256 + 8 */79 89 #define MAX_DISTANCE 8191 80 90 #define MAX_FARDISTANCE (65535+MAX_DISTANCE-1) … … 87 97 88 98 89 static BLOSCLZ_INLINE int32_t hash_function(uint8_t* p, uint8_t hash_log) 90 { 91 int32_t v; 92 93 v = BLOSCLZ_READU16(p); 94 v ^= BLOSCLZ_READU16(p+1)^(v>>(16-hash_log)); 95 v &= (1 << hash_log) - 1; 96 return v; 99 /* 100 * Fast copy macros 101 */ 102 #if defined(_WIN32) 103 #define CPYSIZE 32 104 #else 105 #define CPYSIZE 8 106 #endif 107 #define MCPY(d,s) { memcpy(d, s, CPYSIZE); d+=CPYSIZE; s+=CPYSIZE; } 108 #define FASTCOPY(d,s,e) { do { MCPY(d,s) } while (d<e); } 109 #define SAFECOPY(d,s,e) { while (d<e) { MCPY(d,s) } } 110 111 /* Copy optimized for copying in blocks */ 112 #define BLOCK_COPY(op, ref, len, op_limit) \ 113 { int ilen = len % CPYSIZE; \ 114 uint8_t *cpy = op + len; \ 115 if (cpy + CPYSIZE - ilen <= op_limit) { \ 116 FASTCOPY(op, ref, cpy); \ 117 ref -= (op-cpy); op = cpy; \ 118 } \ 119 else { \ 120 cpy -= ilen; \ 121 SAFECOPY(op, ref, cpy); \ 122 ref -= (op-cpy); op = cpy; \ 123 for(; ilen; --ilen) \ 124 *op++ = *ref++; \ 125 } \ 97 126 } 98 127 128 #define SAFE_COPY(op, ref, len, op_limit) \ 129 if (llabs(op-ref) < CPYSIZE) { \ 130 for(; len; --len) \ 131 *op++ = *ref++; \ 132 } \ 133 else BLOCK_COPY(op, ref, len, op_limit); 134 135 /* Copy optimized for GCC 4.8. Seems like long copy loops are optimal. 
*/ 136 #define GCC_SAFE_COPY(op, ref, len, op_limit) \ 137 if ((len > 32) || (llabs(op-ref) < CPYSIZE)) { \ 138 for(; len; --len) \ 139 *op++ = *ref++; \ 140 } \ 141 else BLOCK_COPY(op, ref, len, op_limit); 142 143 /* Simple, but pretty effective hash function for 3-byte sequence */ 144 #define HASH_FUNCTION(v, p, l) { \ 145 v = BLOSCLZ_READU16(p); \ 146 v ^= BLOSCLZ_READU16(p + 1) ^ ( v >> (16 - l)); \ 147 v &= (1 << l) - 1; \ 148 } 149 150 /* Another version which seems to be a bit more effective than the above, 151 * but a bit slower. Could be interesting for high opt_level. 152 */ 153 #define MINMATCH 3 154 #define HASH_FUNCTION2(v, p, l) { \ 155 v = BLOSCLZ_READU16(p); \ 156 v = (v * 2654435761U) >> ((MINMATCH * 8) - (l + 1)); \ 157 v &= (1 << l) - 1; \ 158 } 159 160 #define LITERAL(ip, op, op_limit, anchor, copy) { \ 161 if (BLOSCLZ_UNEXPECT_CONDITIONAL(op+2 > op_limit)) \ 162 goto out; \ 163 *op++ = *anchor++; \ 164 ip = anchor; \ 165 copy++; \ 166 if(BLOSCLZ_UNEXPECT_CONDITIONAL(copy == MAX_COPY)) { \ 167 copy = 0; \ 168 *op++ = MAX_COPY-1; \ 169 } \ 170 continue; \ 171 } 99 172 100 173 #define IP_BOUNDARY 2 101 174 102 int blosclz_compress(int opt_level, const void* input, 103 int length, void* output, int maxout) 175 176 int blosclz_compress(const int opt_level, const void* input, int length, 177 void* output, int maxout, int accel) 104 178 { 105 179 uint8_t* ip = (uint8_t*) input; … … 110 184 111 185 /* Hash table depends on the opt level. Hash_log cannot be larger than 15. */ 112 uint8_t hash_log_[10] = {-1, 8, 9, 9, 11, 11, 12, 13, 14, 15}; 186 /* The parametrization below is made from playing with the bench suite, like: 187 $ bench/bench blosclz single 4 188 $ bench/bench blosclz single 4 4194280 12 25 189 and taking the minimum times on a i5-3380M @ 2.90GHz. 190 Curiously enough, values >= 14 does not always 191 get maximum compression, even with large blocksizes. 
*/ 192 int8_t hash_log_[10] = {-1, 11, 11, 11, 12, 13, 13, 13, 13, 13}; 113 193 uint8_t hash_log = hash_log_[opt_level]; 114 194 uint16_t hash_size = 1 << hash_log; … … 116 196 uint8_t* op_limit; 117 197 118 int32_t hslot;119 198 int32_t hval; 120 199 uint8_t copy; 121 200 122 double maxlength_[10] = {-1, .1, .15, .2, . 5, .7, .85, .925, .975, 1.0};201 double maxlength_[10] = {-1, .1, .15, .2, .3, .45, .6, .75, .9, 1.0}; 123 202 int32_t maxlength = (int32_t) (length * maxlength_[opt_level]); 124 203 if (maxlength > (int32_t) maxout) { … … 127 206 op_limit = op + maxlength; 128 207 129 /* output buffer cannot be less than 66 bytes or we can get into problems. 130 As output is usually the same length than input, we take input length. */ 131 if (length < 66) { 132 return 0; /* Mark this as uncompressible */ 208 /* output buffer cannot be less than 66 bytes or we can get into trouble */ 209 if (BLOSCLZ_UNEXPECT_CONDITIONAL(maxlength < 66 || length < 4)) { 210 return 0; 133 211 } 134 212 135 htab = (uint16_t *) malloc(hash_size*sizeof(uint16_t)); 136 137 /* sanity check */ 138 if(BLOSCLZ_UNEXPECT_CONDITIONAL(length < 4)) { 139 if(length) { 140 /* create literal copy only */ 141 *op++ = length-1; 142 ip_bound++; 143 while(ip <= ip_bound) 144 *op++ = *ip++; 145 free(htab); 146 return length+1; 147 } 148 else goto out; 149 } 150 151 /* initializes hash table */ 152 for (hslot = 0; hslot < hash_size; hslot++) 153 htab[hslot] = 0; 213 /* prepare the acceleration to be used in condition */ 214 accel = accel < 1 ? 
1 : accel; 215 accel -= 1; 216 217 htab = (uint16_t *) calloc(hash_size, sizeof(uint16_t)); 154 218 155 219 /* we start with literal copy */ … … 175 239 176 240 /* find potential match */ 177 hval = hash_function(ip, hash_log);241 HASH_FUNCTION(hval, ip, hash_log); 178 242 ref = ibase + htab[hval]; 179 /* update hash table */180 htab[hval] = (uint16_t)(anchor - ibase);181 243 182 244 /* calculate distance to the match */ 183 245 distance = (int32_t)(anchor - ref); 246 247 /* update hash table if necessary */ 248 if ((distance & accel) == 0) 249 htab[hval] = (uint16_t)(anchor - ibase); 184 250 185 251 /* is this a match? check the first 3 bytes */ 186 252 if (distance==0 || (distance >= MAX_FARDISTANCE) || 187 253 *ref++ != *ip++ || *ref++!=*ip++ || *ref++!=*ip++) 188 goto literal;254 LITERAL(ip, op, op_limit, anchor, copy); 189 255 190 256 /* far, needs at least 5-byte match */ 191 if ( distance >= MAX_DISTANCE) {257 if (opt_level >= 5 && distance >= MAX_DISTANCE) { 192 258 if (*ip++ != *ref++ || *ip++ != *ref++) 193 goto literal;259 LITERAL(ip, op, op_limit, anchor, copy); 194 260 len += 2; 195 261 } … … 211 277 /* safe because the outer check against ip limit */ 212 278 while (ip < (ip_bound - (sizeof(int64_t) - IP_BOUNDARY))) { 279 #if !defined(BLOSCLZ_STRICT_ALIGN) 213 280 value2 = ((int64_t *)ref)[0]; 281 #else 282 memcpy(&value2, ref, 8); 283 #endif 214 284 if (value != value2) { 215 285 /* Find the byte that starts to differ */ … … 234 304 /* safe because the outer check against ip limit */ 235 305 while (ip < (ip_bound - (sizeof(int64_t) - IP_BOUNDARY))) { 236 if (*ref++ != *ip++) break; 306 #if !defined(BLOSCLZ_STRICT_ALIGN) 237 307 if (((int64_t *)ref)[0] != ((int64_t *)ip)[0]) { 308 #endif 238 309 /* Find the byte that starts to differ */ 239 310 while (ip < ip_bound) { … … 241 312 } 242 313 break; 243 } 244 else { 245 ip += 8; 246 ref += 8; 247 } 314 #if !defined(BLOSCLZ_STRICT_ALIGN) 315 } else { ip += 8; ref += 8; } 316 #endif 248 317 } 249 318 /* 
Last correction before exiting loop */ … … 311 380 312 381 /* update the hash at match boundary */ 313 hval = hash_function(ip, hash_log);382 HASH_FUNCTION(hval, ip, hash_log); 314 383 htab[hval] = (uint16_t)(ip++ - ibase); 315 hval = hash_function(ip, hash_log);384 HASH_FUNCTION(hval, ip, hash_log); 316 385 htab[hval] = (uint16_t)(ip++ - ibase); 317 386 318 387 /* assuming literal copy */ 319 388 *op++ = MAX_COPY-1; 320 321 continue;322 323 literal:324 if (BLOSCLZ_UNEXPECT_CONDITIONAL(op+2 > op_limit)) goto out;325 *op++ = *anchor++;326 ip = anchor;327 copy++;328 if(BLOSCLZ_UNEXPECT_CONDITIONAL(copy == MAX_COPY)) {329 copy = 0;330 *op++ = MAX_COPY-1;331 }332 389 } 333 390 … … 362 419 } 363 420 364 365 421 int blosclz_decompress(const void* input, int length, void* output, int maxout) 366 422 { … … 373 429 374 430 do { 375 constuint8_t* ref = op;431 uint8_t* ref = op; 376 432 int32_t len = ctrl >> 5; 377 433 int32_t ofs = (ctrl & 31) << 8; … … 422 478 ref--; 423 479 len += 3; 424 if (abs((int32_t)(ref-op)) <= (int32_t)len) { 425 /* src and dst do overlap: do a loop */ 426 for(; len; --len) 427 *op++ = *ref++; 428 /* The memmove below does not work well (don't know why) */ 429 /* memmove(op, ref, len); 430 op += len; 431 ref += len; 432 len = 0; */ 433 } 434 else { 435 memcpy(op, ref, len); 436 op += len; 437 ref += len; 438 } 480 #if !defined(_WIN32) && ((defined(__GNUC__) || defined(__INTEL_COMPILER) || !defined(__clang__))) 481 GCC_SAFE_COPY(op, ref, len, op_limit); 482 #else 483 SAFE_COPY(op, ref, len, op_limit); 484 #endif 439 485 } 440 486 } … … 450 496 #endif 451 497 452 memcpy(op, ip, ctrl); 453 ip += ctrl; 454 op += ctrl; 498 BLOCK_COPY(op, ip, ctrl, op_limit); 455 499 456 500 loop = (int32_t)BLOSCLZ_EXPECT_CONDITIONAL(ip < ip_limit); -
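The `HASH_FUNCTION` macro introduced in this file replaces the former inline `hash_function()`. Written as a plain function, the same arithmetic looks roughly like the sketch below. The names are hypothetical, and the byte-wise 16-bit read is an assumption standing in for `BLOSCLZ_READU16`, whose definition is not shown in this part of the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Read 16 bits starting at p, least significant byte first
   (assumed stand-in for BLOSCLZ_READU16). */
uint32_t sketch_read_u16(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* Hash the 3-byte sequence p[0..2] into a slot in [0, 1 << hash_log).
   Mirrors the three statements of HASH_FUNCTION(v, p, l). */
uint32_t sketch_hash3(const uint8_t *p, unsigned hash_log)
{
  uint32_t v;
  assert(hash_log >= 1 && hash_log <= 15); /* diff: hash_log cannot exceed 15 */
  v = sketch_read_u16(p);
  v ^= sketch_read_u16(p + 1) ^ (v >> (16 - hash_log));
  v &= (1u << hash_log) - 1;
  return v;
}
```

The final mask keeps the result inside the hash table, whose size in the diff is `1 << hash_log` entries of `uint16_t`.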
thirdparty/blosc/blosclz.h
r00587dc r981e22c
 1  1  /*********************************************************************
 2     Blosc - Blocked Suffling and Compression Library
    2  Blosc - Blocked Shuffling and Compression Library
 3  3
 4     Author: Francesc Alted <f [email protected]>
    4  Author: Francesc Alted <f[email protected]>
 5  5
 6  6  See LICENSES/BLOSC.txt for details about copyright and rights to use.
 …  …
33 33  output buffer.
34 34
   35  The acceleration parameter is related with the frequency for
   36  updating the internal hash. An acceleration of 1 means that the
   37  internal hash is updated at full rate. A value < 1 is not allowed
   38  and will be silently set to 1.
   39
35 40  The input buffer and the output buffer can not overlap.
36 41  */
37 42
38     int blosclz_compress(int opt_level, const void* input, int length,
39     void* output, int maxout);
   43  int blosclz_compress(const int opt_level, const void* input, int length,
   44  void* output, int maxout, int accel);
40 45
41 46  /**
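The new `accel` parameter documented above is applied in blosclz.c in two steps: the value is clamped to at least 1 and decremented, and the hash table is then updated only when `(distance & accel) == 0`. A small sketch of that logic, with hypothetical function names, could read:

```c
#include <assert.h>
#include <stdint.h>

/* Turn the user-facing acceleration into the mask used in the update
   test: values below 1 are treated as 1, then 1 is subtracted. */
int sketch_accel_mask(int accel)
{
  accel = accel < 1 ? 1 : accel;
  return accel - 1;
}

/* Return 1 if the hash slot would be refreshed for this match distance. */
int sketch_should_update_hash(int32_t distance, int accel)
{
  return (distance & sketch_accel_mask(accel)) == 0;
}
```

With `accel` equal to 1 the mask is 0 and every position updates the hash, which matches the "full rate" wording in the header comment. When `accel` is a power of two, the mask selects distances that are multiples of `accel`; for other values the test is a plain bit mask rather than a strict "every n-th" rule.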
thirdparty/blosc/shuffle.c
r00587dc r981e22c 1 1 /********************************************************************* 2 Blosc - Blocked S uffling and Compression Library3 4 Author: Francesc Alted <f [email protected]>2 Blosc - Blocked Shuffling and Compression Library 3 4 Author: Francesc Alted <f[email protected]> 5 5 Creation date: 2009-05-20 6 6 … … 8 8 **********************************************************************/ 9 9 10 #include "shuffle.h" 11 #include "shuffle-common.h" 12 #include "shuffle-generic.h" 13 #include "bitshuffle-generic.h" 10 14 #include <stdio.h> 11 15 #include <string.h> 12 #include "shuffle.h" 13 14 #if defined(_WIN32) && !defined(__MINGW32__) 15 #include <windows.h> 16 #include "win32/stdint-windows.h" 17 #define __SSE2__ /* Windows does not define this by default */ 18 #else 19 #include <stdint.h> 20 #include <inttypes.h> 21 #endif /* _WIN32 */ 22 23 24 /* The non-SSE2 versions of shuffle and unshuffle */ 25 26 /* Shuffle a block. This can never fail. */ 27 static void _shuffle(size_t bytesoftype, size_t blocksize, 28 uint8_t* _src, uint8_t* _dest) 29 { 30 size_t i, j, neblock, leftover; 31 32 /* Non-optimized shuffle */ 33 neblock = blocksize / bytesoftype; /* Number of elements in a block */ 34 for (j = 0; j < bytesoftype; j++) { 35 for (i = 0; i < neblock; i++) { 36 _dest[j*neblock+i] = _src[i*bytesoftype+j]; 16 17 /* Visual Studio < 2013 does not have stdbool.h so here it is a replacement: */ 18 #if defined __STDC__ && defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L 19 /* have a C99 compiler */ 20 typedef _Bool bool; 21 #else 22 /* do not have a C99 compiler */ 23 typedef unsigned char bool; 24 #endif 25 static const bool false = 0; 26 static const bool true = 1; 27 28 29 #if !defined(__clang__) && defined(__GNUC__) && defined(__GNUC_MINOR__) && \ 30 __GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8) 31 #define HAVE_CPU_FEAT_INTRIN 32 #endif 33 34 /* Include hardware-accelerated shuffle/unshuffle routines based on 35 the target 
architecture. Note that a target architecture may support 36 more than one type of acceleration!*/ 37 #if defined(SHUFFLE_AVX2_ENABLED) 38 #include "shuffle-avx2.h" 39 #include "bitshuffle-avx2.h" 40 #endif /* defined(SHUFFLE_AVX2_ENABLED) */ 41 42 #if defined(SHUFFLE_SSE2_ENABLED) 43 #include "shuffle-sse2.h" 44 #include "bitshuffle-sse2.h" 45 #endif /* defined(SHUFFLE_SSE2_ENABLED) */ 46 47 48 /* Define function pointer types for shuffle/unshuffle routines. */ 49 typedef void(*shuffle_func)(const size_t, const size_t, const uint8_t*, const uint8_t*); 50 typedef void(*unshuffle_func)(const size_t, const size_t, const uint8_t*, const uint8_t*); 51 typedef int64_t(*bitshuffle_func)(void*, void*, const size_t, const size_t, void*); 52 typedef int64_t(*bitunshuffle_func)(void*, void*, const size_t, const size_t, void*); 53 54 /* An implementation of shuffle/unshuffle routines. */ 55 typedef struct shuffle_implementation { 56 /* Name of this implementation. */ 57 const char* name; 58 /* Function pointer to the shuffle routine for this implementation. */ 59 shuffle_func shuffle; 60 /* Function pointer to the unshuffle routine for this implementation. */ 61 unshuffle_func unshuffle; 62 /* Function pointer to the bitshuffle routine for this implementation. */ 63 bitshuffle_func bitshuffle; 64 /* Function pointer to the bitunshuffle routine for this implementation. */ 65 bitunshuffle_func bitunshuffle; 66 } shuffle_implementation_t; 67 68 typedef enum { 69 BLOSC_HAVE_NOTHING = 0, 70 BLOSC_HAVE_SSE2 = 1, 71 BLOSC_HAVE_AVX2 = 2 72 } blosc_cpu_features; 73 74 /* Detect hardware and set function pointers to the best shuffle/unshuffle 75 implementations supported by the host processor. 
*/ 76 #if defined(SHUFFLE_AVX2_ENABLED) || defined(SHUFFLE_SSE2_ENABLED) /* Intel/i686 */ 77 78 /* Disabled the __builtin_cpu_supports() call, as it has issues with 79 new versions of gcc (like 5.3.1 in forthcoming ubuntu/xenial: 80 "undefined symbol: __cpu_model" 81 For a similar report, see: 82 https://lists.fedoraproject.org/archives/list/[email protected]/thread/ZM2L65WIZEEQHHLFERZYD5FAG7QY2OGB/ 83 */ 84 #if defined(HAVE_CPU_FEAT_INTRIN) && 0 85 static blosc_cpu_features blosc_get_cpu_features(void) { 86 blosc_cpu_features cpu_features = BLOSC_HAVE_NOTHING; 87 if (__builtin_cpu_supports("sse2")) { 88 cpu_features |= BLOSC_HAVE_SSE2; 89 } 90 if (__builtin_cpu_supports("avx2")) { 91 cpu_features |= BLOSC_HAVE_AVX2; 92 } 93 return cpu_features; 94 } 95 #else 96 97 #if defined(_MSC_VER) && !defined(__clang__) 98 #include <intrin.h> /* Needed for __cpuid */ 99 100 /* _xgetbv is only supported by VS2010 SP1 and newer versions of VS. */ 101 #if _MSC_FULL_VER >= 160040219 102 #include <immintrin.h> /* Needed for _xgetbv */ 103 #elif defined(_M_IX86) 104 105 /* Implement _xgetbv for VS2008 and VS2010 RTM with 32-bit (x86) targets. */ 106 107 static uint64_t _xgetbv(uint32_t xcr) { 108 uint32_t xcr0, xcr1; 109 __asm { 110 mov ecx, xcr 111 _asm _emit 0x0f _asm _emit 0x01 _asm _emit 0xd0 112 mov xcr0, eax 113 mov xcr1, edx 37 114 } 38 } 39 leftover = blocksize % bytesoftype; 40 memcpy(_dest + neblock*bytesoftype, _src + neblock*bytesoftype, leftover); 41 } 42 43 /* Unshuffle a block. This can never fail. 
*/ 44 static void _unshuffle(size_t bytesoftype, size_t blocksize, 45 uint8_t* _src, uint8_t* _dest) 46 { 47 size_t i, j, neblock, leftover; 48 49 /* Non-optimized unshuffle */ 50 neblock = blocksize / bytesoftype; /* Number of elements in a block */ 51 for (i = 0; i < neblock; i++) { 52 for (j = 0; j < bytesoftype; j++) { 53 _dest[i*bytesoftype+j] = _src[j*neblock+i]; 54 } 55 } 56 leftover = blocksize % bytesoftype; 57 memcpy(_dest+neblock*bytesoftype, _src+neblock*bytesoftype, leftover); 58 } 59 60 61 #ifdef __SSE2__ 62 63 /* The SSE2 versions of shuffle and unshuffle */ 64 65 #include <emmintrin.h> 66 67 /* The next is useful for debugging purposes */ 68 #if 0 69 static void printxmm(__m128i xmm0) 70 { 71 uint8_t buf[16]; 72 73 ((__m128i *)buf)[0] = xmm0; 74 printf("%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x\n", 75 buf[0], buf[1], buf[2], buf[3], 76 buf[4], buf[5], buf[6], buf[7], 77 buf[8], buf[9], buf[10], buf[11], 78 buf[12], buf[13], buf[14], buf[15]); 79 } 80 #endif 81 82 83 /* Routine optimized for shuffling a buffer for a type size of 2 bytes. 
*/ 84 static void 85 shuffle2(uint8_t* dest, uint8_t* src, size_t size) 86 { 87 size_t i, j, k; 88 size_t numof16belem; 89 __m128i xmm0[2], xmm1[2]; 90 91 numof16belem = size / (16*2); 92 for (i = 0, j = 0; i < numof16belem; i++, j += 16*2) { 93 /* Fetch and transpose bytes, words and double words in groups of 94 32 bytes */ 95 for (k = 0; k < 2; k++) { 96 xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 97 xmm0[k] = _mm_shufflelo_epi16(xmm0[k], 0xd8); 98 xmm0[k] = _mm_shufflehi_epi16(xmm0[k], 0xd8); 99 xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 100 xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x4e); 101 xmm0[k] = _mm_unpacklo_epi8(xmm0[k], xmm1[k]); 102 xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 103 xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x4e); 104 xmm0[k] = _mm_unpacklo_epi16(xmm0[k], xmm1[k]); 105 xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 106 } 107 /* Transpose quad words */ 108 for (k = 0; k < 1; k++) { 109 xmm1[k*2] = _mm_unpacklo_epi64(xmm0[k], xmm0[k+1]); 110 xmm1[k*2+1] = _mm_unpackhi_epi64(xmm0[k], xmm0[k+1]); 111 } 112 /* Store the result vectors */ 113 for (k = 0; k < 2; k++) { 114 ((__m128i *)dest)[k*numof16belem+i] = xmm1[k]; 115 } 116 } 117 } 118 119 120 /* Routine optimized for shuffling a buffer for a type size of 4 bytes. 
*/ 121 static void 122 shuffle4(uint8_t* dest, uint8_t* src, size_t size) 123 { 124 size_t i, j, k; 125 size_t numof16belem; 126 __m128i xmm0[4], xmm1[4]; 127 128 numof16belem = size / (16*4); 129 for (i = 0, j = 0; i < numof16belem; i++, j += 16*4) { 130 /* Fetch and transpose bytes and words in groups of 64 bytes */ 131 for (k = 0; k < 4; k++) { 132 xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 133 xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 134 xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0x8d); 135 xmm0[k] = _mm_unpacklo_epi8(xmm1[k], xmm0[k]); 136 xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x04e); 137 xmm0[k] = _mm_unpacklo_epi16(xmm0[k], xmm1[k]); 138 } 139 /* Transpose double words */ 140 for (k = 0; k < 2; k++) { 141 xmm1[k*2] = _mm_unpacklo_epi32(xmm0[k*2], xmm0[k*2+1]); 142 xmm1[k*2+1] = _mm_unpackhi_epi32(xmm0[k*2], xmm0[k*2+1]); 143 } 144 /* Transpose quad words */ 145 for (k = 0; k < 2; k++) { 146 xmm0[k*2] = _mm_unpacklo_epi64(xmm1[k], xmm1[k+2]); 147 xmm0[k*2+1] = _mm_unpackhi_epi64(xmm1[k], xmm1[k+2]); 148 } 149 /* Store the result vectors */ 150 for (k = 0; k < 4; k++) { 151 ((__m128i *)dest)[k*numof16belem+i] = xmm0[k]; 152 } 153 } 154 } 155 156 157 /* Routine optimized for shuffling a buffer for a type size of 8 bytes. 
*/ 158 static void 159 shuffle8(uint8_t* dest, uint8_t* src, size_t size) 160 { 161 size_t i, j, k, l; 162 size_t numof16belem; 163 __m128i xmm0[8], xmm1[8]; 164 165 numof16belem = size / (16*8); 166 for (i = 0, j = 0; i < numof16belem; i++, j += 16*8) { 167 /* Fetch and transpose bytes in groups of 128 bytes */ 168 for (k = 0; k < 8; k++) { 169 xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 170 xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x4e); 171 xmm1[k] = _mm_unpacklo_epi8(xmm0[k], xmm1[k]); 172 } 173 /* Transpose words */ 174 for (k = 0, l = 0; k < 4; k++, l +=2) { 175 xmm0[k*2] = _mm_unpacklo_epi16(xmm1[l], xmm1[l+1]); 176 xmm0[k*2+1] = _mm_unpackhi_epi16(xmm1[l], xmm1[l+1]); 177 } 178 /* Transpose double words */ 179 for (k = 0, l = 0; k < 4; k++, l++) { 180 if (k == 2) l += 2; 181 xmm1[k*2] = _mm_unpacklo_epi32(xmm0[l], xmm0[l+2]); 182 xmm1[k*2+1] = _mm_unpackhi_epi32(xmm0[l], xmm0[l+2]); 183 } 184 /* Transpose quad words */ 185 for (k = 0; k < 4; k++) { 186 xmm0[k*2] = _mm_unpacklo_epi64(xmm1[k], xmm1[k+4]); 187 xmm0[k*2+1] = _mm_unpackhi_epi64(xmm1[k], xmm1[k+4]); 188 } 189 /* Store the result vectors */ 190 for (k = 0; k < 8; k++) { 191 ((__m128i *)dest)[k*numof16belem+i] = xmm0[k]; 192 } 193 } 194 } 195 196 197 /* Routine optimized for shuffling a buffer for a type size of 16 bytes. 
*/ 198 static void 199 shuffle16(uint8_t* dest, uint8_t* src, size_t size) 200 { 201 size_t i, j, k, l; 202 size_t numof16belem; 203 __m128i xmm0[16], xmm1[16]; 204 205 numof16belem = size / (16*16); 206 for (i = 0, j = 0; i < numof16belem; i++, j += 16*16) { 207 /* Fetch elements in groups of 256 bytes */ 208 for (k = 0; k < 16; k++) { 209 xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 210 } 211 /* Transpose bytes */ 212 for (k = 0, l = 0; k < 8; k++, l +=2) { 213 xmm1[k*2] = _mm_unpacklo_epi8(xmm0[l], xmm0[l+1]); 214 xmm1[k*2+1] = _mm_unpackhi_epi8(xmm0[l], xmm0[l+1]); 215 } 216 /* Transpose words */ 217 for (k = 0, l = -2; k < 8; k++, l++) { 218 if ((k%2) == 0) l += 2; 219 xmm0[k*2] = _mm_unpacklo_epi16(xmm1[l], xmm1[l+2]); 220 xmm0[k*2+1] = _mm_unpackhi_epi16(xmm1[l], xmm1[l+2]); 221 } 222 /* Transpose double words */ 223 for (k = 0, l = -4; k < 8; k++, l++) { 224 if ((k%4) == 0) l += 4; 225 xmm1[k*2] = _mm_unpacklo_epi32(xmm0[l], xmm0[l+4]); 226 xmm1[k*2+1] = _mm_unpackhi_epi32(xmm0[l], xmm0[l+4]); 227 } 228 /* Transpose quad words */ 229 for (k = 0; k < 8; k++) { 230 xmm0[k*2] = _mm_unpacklo_epi64(xmm1[k], xmm1[k+8]); 231 xmm0[k*2+1] = _mm_unpackhi_epi64(xmm1[k], xmm1[k+8]); 232 } 233 /* Store the result vectors */ 234 for (k = 0; k < 16; k++) { 235 ((__m128i *)dest)[k*numof16belem+i] = xmm0[k]; 236 } 237 } 238 } 239 240 241 /* Shuffle a block. This can never fail. */ 242 void shuffle(size_t bytesoftype, size_t blocksize, 243 uint8_t* _src, uint8_t* _dest) { 244 int unaligned_dest = (int)((uintptr_t)_dest % 16); 245 int power_of_two = (blocksize & (blocksize - 1)) == 0; 246 int too_small = (blocksize < 256); 247 248 if (unaligned_dest || !power_of_two || too_small) { 249 /* _dest buffer is not aligned, not a power of two or is too 250 small. Call the non-sse2 version. 
*/ 251 _shuffle(bytesoftype, blocksize, _src, _dest); 252 return; 253 } 254 255 /* Optimized shuffle */ 256 /* The buffer must be aligned on a 16 bytes boundary, have a power */ 257 /* of 2 size and be larger or equal than 256 bytes. */ 258 if (bytesoftype == 4) { 259 shuffle4(_dest, _src, blocksize); 260 } 261 else if (bytesoftype == 8) { 262 shuffle8(_dest, _src, blocksize); 263 } 264 else if (bytesoftype == 16) { 265 shuffle16(_dest, _src, blocksize); 266 } 267 else if (bytesoftype == 2) { 268 shuffle2(_dest, _src, blocksize); 269 } 270 else { 271 /* Non-optimized shuffle */ 272 _shuffle(bytesoftype, blocksize, _src, _dest); 273 } 274 } 275 276 277 /* Routine optimized for unshuffling a buffer for a type size of 2 bytes. */ 278 static void 279 unshuffle2(uint8_t* dest, uint8_t* orig, size_t size) 280 { 281 size_t i, k; 282 size_t neblock, numof16belem; 283 __m128i xmm1[2], xmm2[2]; 284 285 neblock = size / 2; 286 numof16belem = neblock / 16; 287 for (i = 0, k = 0; i < numof16belem; i++, k += 2) { 288 /* Load the first 32 bytes in 2 XMM registrers */ 289 xmm1[0] = ((__m128i *)orig)[0*numof16belem+i]; 290 xmm1[1] = ((__m128i *)orig)[1*numof16belem+i]; 291 /* Shuffle bytes */ 292 /* Compute the low 32 bytes */ 293 xmm2[0] = _mm_unpacklo_epi8(xmm1[0], xmm1[1]); 294 /* Compute the hi 32 bytes */ 295 xmm2[1] = _mm_unpackhi_epi8(xmm1[0], xmm1[1]); 296 /* Store the result vectors in proper order */ 297 ((__m128i *)dest)[k+0] = xmm2[0]; 298 ((__m128i *)dest)[k+1] = xmm2[1]; 299 } 300 } 301 302 303 /* Routine optimized for unshuffling a buffer for a type size of 4 bytes. 
*/ 304 static void 305 unshuffle4(uint8_t* dest, uint8_t* orig, size_t size) 306 { 307 size_t i, j, k; 308 size_t neblock, numof16belem; 309 __m128i xmm0[4], xmm1[4]; 310 311 neblock = size / 4; 312 numof16belem = neblock / 16; 313 for (i = 0, k = 0; i < numof16belem; i++, k += 4) { 314 /* Load the first 64 bytes in 4 XMM registrers */ 315 for (j = 0; j < 4; j++) { 316 xmm0[j] = ((__m128i *)orig)[j*numof16belem+i]; 317 } 318 /* Shuffle bytes */ 319 for (j = 0; j < 2; j++) { 320 /* Compute the low 32 bytes */ 321 xmm1[j] = _mm_unpacklo_epi8(xmm0[j*2], xmm0[j*2+1]); 322 /* Compute the hi 32 bytes */ 323 xmm1[2+j] = _mm_unpackhi_epi8(xmm0[j*2], xmm0[j*2+1]); 324 } 325 /* Shuffle 2-byte words */ 326 for (j = 0; j < 2; j++) { 327 /* Compute the low 32 bytes */ 328 xmm0[j] = _mm_unpacklo_epi16(xmm1[j*2], xmm1[j*2+1]); 329 /* Compute the hi 32 bytes */ 330 xmm0[2+j] = _mm_unpackhi_epi16(xmm1[j*2], xmm1[j*2+1]); 331 } 332 /* Store the result vectors in proper order */ 333 ((__m128i *)dest)[k+0] = xmm0[0]; 334 ((__m128i *)dest)[k+1] = xmm0[2]; 335 ((__m128i *)dest)[k+2] = xmm0[1]; 336 ((__m128i *)dest)[k+3] = xmm0[3]; 337 } 338 } 339 340 341 /* Routine optimized for unshuffling a buffer for a type size of 8 bytes. 
*/ 342 static void 343 unshuffle8(uint8_t* dest, uint8_t* orig, size_t size) 344 { 345 size_t i, j, k; 346 size_t neblock, numof16belem; 347 __m128i xmm0[8], xmm1[8]; 348 349 neblock = size / 8; 350 numof16belem = neblock / 16; 351 for (i = 0, k = 0; i < numof16belem; i++, k += 8) { 352 /* Load the first 64 bytes in 8 XMM registrers */ 353 for (j = 0; j < 8; j++) { 354 xmm0[j] = ((__m128i *)orig)[j*numof16belem+i]; 355 } 356 /* Shuffle bytes */ 357 for (j = 0; j < 4; j++) { 358 /* Compute the low 32 bytes */ 359 xmm1[j] = _mm_unpacklo_epi8(xmm0[j*2], xmm0[j*2+1]); 360 /* Compute the hi 32 bytes */ 361 xmm1[4+j] = _mm_unpackhi_epi8(xmm0[j*2], xmm0[j*2+1]); 362 } 363 /* Shuffle 2-byte words */ 364 for (j = 0; j < 4; j++) { 365 /* Compute the low 32 bytes */ 366 xmm0[j] = _mm_unpacklo_epi16(xmm1[j*2], xmm1[j*2+1]); 367 /* Compute the hi 32 bytes */ 368 xmm0[4+j] = _mm_unpackhi_epi16(xmm1[j*2], xmm1[j*2+1]); 369 } 370 /* Shuffle 4-byte dwords */ 371 for (j = 0; j < 4; j++) { 372 /* Compute the low 32 bytes */ 373 xmm1[j] = _mm_unpacklo_epi32(xmm0[j*2], xmm0[j*2+1]); 374 /* Compute the hi 32 bytes */ 375 xmm1[4+j] = _mm_unpackhi_epi32(xmm0[j*2], xmm0[j*2+1]); 376 } 377 /* Store the result vectors in proper order */ 378 ((__m128i *)dest)[k+0] = xmm1[0]; 379 ((__m128i *)dest)[k+1] = xmm1[4]; 380 ((__m128i *)dest)[k+2] = xmm1[2]; 381 ((__m128i *)dest)[k+3] = xmm1[6]; 382 ((__m128i *)dest)[k+4] = xmm1[1]; 383 ((__m128i *)dest)[k+5] = xmm1[5]; 384 ((__m128i *)dest)[k+6] = xmm1[3]; 385 ((__m128i *)dest)[k+7] = xmm1[7]; 386 } 387 } 388 389 390 /* Routine optimized for unshuffling a buffer for a type size of 16 bytes. 
*/ 391 static void 392 unshuffle16(uint8_t* dest, uint8_t* orig, size_t size) 393 { 394 size_t i, j, k; 395 size_t neblock, numof16belem; 396 __m128i xmm1[16], xmm2[16]; 397 398 neblock = size / 16; 399 numof16belem = neblock / 16; 400 for (i = 0, k = 0; i < numof16belem; i++, k += 16) { 401 /* Load the first 128 bytes in 16 XMM registrers */ 402 for (j = 0; j < 16; j++) { 403 xmm1[j] = ((__m128i *)orig)[j*numof16belem+i]; 404 } 405 /* Shuffle bytes */ 406 for (j = 0; j < 8; j++) { 407 /* Compute the low 32 bytes */ 408 xmm2[j] = _mm_unpacklo_epi8(xmm1[j*2], xmm1[j*2+1]); 409 /* Compute the hi 32 bytes */ 410 xmm2[8+j] = _mm_unpackhi_epi8(xmm1[j*2], xmm1[j*2+1]); 411 } 412 /* Shuffle 2-byte words */ 413 for (j = 0; j < 8; j++) { 414 /* Compute the low 32 bytes */ 415 xmm1[j] = _mm_unpacklo_epi16(xmm2[j*2], xmm2[j*2+1]); 416 /* Compute the hi 32 bytes */ 417 xmm1[8+j] = _mm_unpackhi_epi16(xmm2[j*2], xmm2[j*2+1]); 418 } 419 /* Shuffle 4-byte dwords */ 420 for (j = 0; j < 8; j++) { 421 /* Compute the low 32 bytes */ 422 xmm2[j] = _mm_unpacklo_epi32(xmm1[j*2], xmm1[j*2+1]); 423 /* Compute the hi 32 bytes */ 424 xmm2[8+j] = _mm_unpackhi_epi32(xmm1[j*2], xmm1[j*2+1]); 425 } 426 /* Shuffle 8-byte qwords */ 427 for (j = 0; j < 8; j++) { 428 /* Compute the low 32 bytes */ 429 xmm1[j] = _mm_unpacklo_epi64(xmm2[j*2], xmm2[j*2+1]); 430 /* Compute the hi 32 bytes */ 431 xmm1[8+j] = _mm_unpackhi_epi64(xmm2[j*2], xmm2[j*2+1]); 432 } 433 /* Store the result vectors in proper order */ 434 ((__m128i *)dest)[k+0] = xmm1[0]; 435 ((__m128i *)dest)[k+1] = xmm1[8]; 436 ((__m128i *)dest)[k+2] = xmm1[4]; 437 ((__m128i *)dest)[k+3] = xmm1[12]; 438 ((__m128i *)dest)[k+4] = xmm1[2]; 439 ((__m128i *)dest)[k+5] = xmm1[10]; 440 ((__m128i *)dest)[k+6] = xmm1[6]; 441 ((__m128i *)dest)[k+7] = xmm1[14]; 442 ((__m128i *)dest)[k+8] = xmm1[1]; 443 ((__m128i *)dest)[k+9] = xmm1[9]; 444 ((__m128i *)dest)[k+10] = xmm1[5]; 445 ((__m128i *)dest)[k+11] = xmm1[13]; 446 ((__m128i *)dest)[k+12] = xmm1[3]; 447 
((__m128i *)dest)[k+13] = xmm1[11]; 448 ((__m128i *)dest)[k+14] = xmm1[7]; 449 ((__m128i *)dest)[k+15] = xmm1[15]; 450 } 451 } 452 453 454 /* Unshuffle a block. This can never fail. */ 455 void unshuffle(size_t bytesoftype, size_t blocksize, 456 uint8_t* _src, uint8_t* _dest) { 457 int unaligned_src = (int)((uintptr_t)_src % 16); 458 int unaligned_dest = (int)((uintptr_t)_dest % 16); 459 int power_of_two = (blocksize & (blocksize - 1)) == 0; 460 int too_small = (blocksize < 256); 461 462 if (unaligned_src || unaligned_dest || !power_of_two || too_small) { 463 /* _src or _dest buffer is not aligned, not a power of two or is 464 too small. Call the non-sse2 version. */ 465 _unshuffle(bytesoftype, blocksize, _src, _dest); 466 return; 467 } 468 469 /* Optimized unshuffle */ 470 /* The buffers must be aligned on a 16 bytes boundary, have a power */ 471 /* of 2 size and be larger or equal than 256 bytes. */ 472 if (bytesoftype == 4) { 473 unshuffle4(_dest, _src, blocksize); 474 } 475 else if (bytesoftype == 8) { 476 unshuffle8(_dest, _src, blocksize); 477 } 478 else if (bytesoftype == 16) { 479 unshuffle16(_dest, _src, blocksize); 480 } 481 else if (bytesoftype == 2) { 482 unshuffle2(_dest, _src, blocksize); 483 } 484 else { 485 /* Non-optimized unshuffle */ 486 _unshuffle(bytesoftype, blocksize, _src, _dest); 487 } 488 } 489 490 #else /* no __SSE2__ available */ 491 492 void shuffle(size_t bytesoftype, size_t blocksize, 493 uint8_t* _src, uint8_t* _dest) { 494 _shuffle(bytesoftype, blocksize, _src, _dest); 495 } 496 497 void unshuffle(size_t bytesoftype, size_t blocksize, 498 uint8_t* _src, uint8_t* _dest) { 499 _unshuffle(bytesoftype, blocksize, _src, _dest); 500 } 501 502 #endif /* __SSE2__ */ 115 return ((uint64_t)xcr1 << 32) | xcr0; 116 } 117 118 #elif defined(_M_X64) 119 120 /* Implement _xgetbv for VS2008 and VS2010 RTM with 64-bit (x64) targets. 
121 These compilers don't support any of the newer acceleration ISAs 122 (e.g., AVX2) supported by blosc, and all x64 hardware supports SSE2 123 which means we can get away with returning a hard-coded value from 124 this implementation of _xgetbv. */ 125 126 static inline uint64_t 127 _xgetbv(uint32_t xcr) { 128 /* A 64-bit OS must have XMM save support. */ 129 return xcr == 0 ? (1UL << 1) : 0UL; 130 } 131 132 #else 133 134 /* Hardware detection for any other MSVC targets (e.g., ARM) 135 isn't implemented at this time. */ 136 #error This version of c-blosc only supports x86 and x64 targets with MSVC. 137 138 #endif /* _MSC_FULL_VER >= 160040219 */ 139 140 #else 141 142 /* Implement the __cpuid and __cpuidex intrinsics for GCC, Clang, 143 and others using inline assembly. */ 144 __attribute__((always_inline)) 145 static inline void 146 __cpuidex(int32_t cpuInfo[4], int32_t function_id, int32_t subfunction_id) { 147 __asm__ __volatile__ ( 148 # if defined(__i386__) && defined (__PIC__) 149 /* Can't clobber ebx with PIC running under 32-bit, so it needs to be manually restored. 150 https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family 151 */ 152 "movl %%ebx, %%edi\n\t" 153 "cpuid\n\t" 154 "xchgl %%ebx, %%edi": 155 "=D" (cpuInfo[1]), 156 #else 157 "cpuid": 158 "=b" (cpuInfo[1]), 159 #endif /* defined(__i386) && defined(__PIC__) */ 160 "=a" (cpuInfo[0]), 161 "=c" (cpuInfo[2]), 162 "=d" (cpuInfo[3]) : 163 "a" (function_id), "c" (subfunction_id) 164 ); 165 } 166 167 #define __cpuid(cpuInfo, function_id) __cpuidex(cpuInfo, function_id, 0) 168 169 #define _XCR_XFEATURE_ENABLED_MASK 0 170 171 /* Reads the content of an extended control register. 
172 https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family 173 */ 174 static inline uint64_t 175 _xgetbv(uint32_t xcr) { 176 uint32_t eax, edx; 177 __asm__ __volatile__ ( 178 /* "xgetbv" 179 This is specified as raw instruction bytes due to some older compilers 180 having issues with the mnemonic form. 181 */ 182 ".byte 0x0f, 0x01, 0xd0": 183 "=a" (eax), 184 "=d" (edx) : 185 "c" (xcr) 186 ); 187 return ((uint64_t)edx << 32) | eax; 188 } 189 190 #endif /* defined(_MSC_FULL_VER) */ 191 192 #ifndef _XCR_XFEATURE_ENABLED_MASK 193 #define _XCR_XFEATURE_ENABLED_MASK 0x0 194 #endif 195 196 static blosc_cpu_features blosc_get_cpu_features(void) { 197 blosc_cpu_features result = BLOSC_HAVE_NOTHING; 198 int32_t max_basic_function_id; 199 /* Holds the values of eax, ebx, ecx, edx set by the `cpuid` instruction */ 200 int32_t cpu_info[4]; 201 int sse2_available; 202 int sse3_available; 203 int ssse3_available; 204 int sse41_available; 205 int sse42_available; 206 int xsave_available; 207 int xsave_enabled_by_os; 208 int avx2_available = 0; 209 int avx512bw_available = 0; 210 int xmm_state_enabled = 0; 211 int ymm_state_enabled = 0; 212 int zmm_state_enabled = 0; 213 uint64_t xcr0_contents; 214 215 /* Get the number of basic functions available. */ 216 __cpuid(cpu_info, 0); 217 max_basic_function_id = cpu_info[0]; 218 219 /* Check for SSE-based features and required OS support */ 220 __cpuid(cpu_info, 1); 221 sse2_available = (cpu_info[3] & (1 << 26)) != 0; 222 sse3_available = (cpu_info[2] & (1 << 0)) != 0; 223 ssse3_available = (cpu_info[2] & (1 << 9)) != 0; 224 sse41_available = (cpu_info[2] & (1 << 19)) != 0; 225 sse42_available = (cpu_info[2] & (1 << 20)) != 0; 226 227 xsave_available = (cpu_info[2] & (1 << 26)) != 0; 228 xsave_enabled_by_os = (cpu_info[2] & (1 << 27)) != 0; 229 230 /* Check for AVX-based features, if the processor supports extended features. 
*/ 231 if (max_basic_function_id >= 7) { 232 __cpuid(cpu_info, 7); 233 avx2_available = (cpu_info[1] & (1 << 5)) != 0; 234 avx512bw_available = (cpu_info[1] & (1 << 30)) != 0; 235 } 236 237 /* Even if certain features are supported by the CPU, they may not be supported 238 by the OS (in which case using them would crash the process or system). 239 If xsave is available and enabled by the OS, check the contents of the 240 extended control register XCR0 to see if the CPU features are enabled. */ 241 #if defined(_XCR_XFEATURE_ENABLED_MASK) 242 if (xsave_available && xsave_enabled_by_os && ( 243 sse2_available || sse3_available || ssse3_available 244 || sse41_available || sse42_available 245 || avx2_available || avx512bw_available)) { 246 /* Determine which register states can be restored by the OS. */ 247 xcr0_contents = _xgetbv(_XCR_XFEATURE_ENABLED_MASK); 248 249 xmm_state_enabled = (xcr0_contents & (1UL << 1)) != 0; 250 ymm_state_enabled = (xcr0_contents & (1UL << 2)) != 0; 251 252 /* Require support for both the upper 256-bits of zmm0-zmm15 to be 253 restored as well as all of zmm16-zmm31 and the opmask registers. */ 254 zmm_state_enabled = (xcr0_contents & 0x70) == 0x70; 255 } 256 #endif /* defined(_XCR_XFEATURE_ENABLED_MASK) */ 257 258 #if defined(BLOSC_DUMP_CPU_INFO) 259 printf("Shuffle CPU Information:\n"); 260 printf("SSE2 available: %s\n", sse2_available ? "True" : "False"); 261 printf("SSE3 available: %s\n", sse3_available ? "True" : "False"); 262 printf("SSSE3 available: %s\n", ssse3_available ? "True" : "False"); 263 printf("SSE4.1 available: %s\n", sse41_available ? "True" : "False"); 264 printf("SSE4.2 available: %s\n", sse42_available ? "True" : "False"); 265 printf("AVX2 available: %s\n", avx2_available ? "True" : "False"); 266 printf("AVX512BW available: %s\n", avx512bw_available ? "True" : "False"); 267 printf("XSAVE available: %s\n", xsave_available ? "True" : "False"); 268 printf("XSAVE enabled: %s\n", xsave_enabled_by_os ? 
"True" : "False"); 269 printf("XMM state enabled: %s\n", xmm_state_enabled ? "True" : "False"); 270 printf("YMM state enabled: %s\n", ymm_state_enabled ? "True" : "False"); 271 printf("ZMM state enabled: %s\n", zmm_state_enabled ? "True" : "False"); 272 #endif /* defined(BLOSC_DUMP_CPU_INFO) */ 273 274 /* Using the gathered CPU information, determine which implementation to use. */ 275 /* technically could fail on sse2 cpu on os without xmm support, but that 276 * shouldn't exist anymore */ 277 if (sse2_available) { 278 result |= BLOSC_HAVE_SSE2; 279 } 280 if (xmm_state_enabled && ymm_state_enabled && avx2_available) { 281 result |= BLOSC_HAVE_AVX2; 282 } 283 return result; 284 } 285 #endif 286 287 #else /* No hardware acceleration supported for the target architecture. */ 288 #if defined(_MSC_VER) 289 #pragma message("Hardware-acceleration detection not implemented for the target architecture. Only the generic shuffle/unshuffle routines will be available.") 290 #else 291 #warning Hardware-acceleration detection not implemented for the target architecture. Only the generic shuffle/unshuffle routines will be available. 
292 #endif 293 294 static blosc_cpu_features blosc_get_cpu_features(void) { 295 return BLOSC_HAVE_NOTHING; 296 } 297 298 #endif 299 300 static shuffle_implementation_t get_shuffle_implementation() { 301 blosc_cpu_features cpu_features = blosc_get_cpu_features(); 302 shuffle_implementation_t impl_generic; 303 304 #if defined(SHUFFLE_AVX2_ENABLED) 305 if (cpu_features & BLOSC_HAVE_AVX2) { 306 shuffle_implementation_t impl_avx2; 307 impl_avx2.name = "avx2"; 308 impl_avx2.shuffle = (shuffle_func)shuffle_avx2; 309 impl_avx2.unshuffle = (unshuffle_func)unshuffle_avx2; 310 impl_avx2.bitshuffle = (bitshuffle_func)bshuf_trans_bit_elem_avx2; 311 impl_avx2.bitunshuffle = (bitunshuffle_func)bshuf_untrans_bit_elem_avx2; 312 return impl_avx2; 313 } 314 #endif /* defined(SHUFFLE_AVX2_ENABLED) */ 315 316 #if defined(SHUFFLE_SSE2_ENABLED) 317 if (cpu_features & BLOSC_HAVE_SSE2) { 318 shuffle_implementation_t impl_sse2; 319 impl_sse2.name = "sse2"; 320 impl_sse2.shuffle = (shuffle_func)shuffle_sse2; 321 impl_sse2.unshuffle = (unshuffle_func)unshuffle_sse2; 322 impl_sse2.bitshuffle = (bitshuffle_func)bshuf_trans_bit_elem_sse2; 323 impl_sse2.bitunshuffle = (bitunshuffle_func)bshuf_untrans_bit_elem_sse2; 324 return impl_sse2; 325 } 326 #endif /* defined(SHUFFLE_SSE2_ENABLED) */ 327 328 /* Processor doesn't support any of the hardware-accelerated implementations, 329 so use the generic implementation. */ 330 impl_generic.name = "generic"; 331 impl_generic.shuffle = (shuffle_func)shuffle_generic; 332 impl_generic.unshuffle = (unshuffle_func)unshuffle_generic; 333 impl_generic.bitshuffle = (bitshuffle_func)bshuf_trans_bit_elem_scal; 334 impl_generic.bitunshuffle = (bitunshuffle_func)bshuf_untrans_bit_elem_scal; 335 return impl_generic; 336 } 337 338 339 /* Flag indicating whether the implementation has been initialized. 340 Zero means it hasn't been initialized, non-zero means it has. 
*/
+static int32_t implementation_initialized;
+
+/* The dynamically-chosen shuffle/unshuffle implementation.
+   This is only safe to use once `implementation_initialized` is set. */
+static shuffle_implementation_t host_implementation;
+
+/* Initialize the shuffle implementation, if necessary. */
+#if defined(__GNUC__) || defined(__clang__)
+__attribute__((always_inline))
+#endif
+static
+#if defined(_MSC_VER)
+__forceinline
+#else
+inline
+#endif
+void init_shuffle_implementation() {
+  /* Initialization could (in rare cases) take place concurrently on
+     multiple threads, but it shouldn't matter because the
+     initialization should return the same result on each thread (so
+     the implementation will be the same). Since that's the case we
+     can avoid complicated synchronization here and get a small
+     performance benefit because we don't need to perform a volatile
+     load on the initialization variable each time this function is
+     called. */
+#if defined(__GNUC__) || defined(__clang__)
+  if (__builtin_expect(!implementation_initialized, 0)) {
+#else
+  if (!implementation_initialized) {
+#endif
+    /* Initialize the implementation. */
+    host_implementation = get_shuffle_implementation();
+
+    /* Set the flag indicating the implementation has been initialized. */
+    implementation_initialized = 1;
+  }
+}
+
+/* Shuffle a block by dynamically dispatching to the appropriate
+   hardware-accelerated routine at run-time. */
+void
+shuffle(const size_t bytesoftype, const size_t blocksize,
+        const uint8_t* _src, const uint8_t* _dest) {
+  /* Initialize the shuffle implementation if necessary. */
+  init_shuffle_implementation();
+
+  /* The implementation is initialized.
+     Dispatch to it's shuffle routine. */
+  (host_implementation.shuffle)(bytesoftype, blocksize, _src, _dest);
+}
+
+/* Unshuffle a block by dynamically dispatching to the appropriate
+   hardware-accelerated routine at run-time. */
+void
+unshuffle(const size_t bytesoftype, const size_t blocksize,
+          const uint8_t* _src, const uint8_t* _dest) {
+  /* Initialize the shuffle implementation if necessary. */
+  init_shuffle_implementation();
+
+  /* The implementation is initialized.
+     Dispatch to it's unshuffle routine. */
+  (host_implementation.unshuffle)(bytesoftype, blocksize, _src, _dest);
+}
+
+/* Bit-shuffle a block by dynamically dispatching to the appropriate
+   hardware-accelerated routine at run-time. */
+int
+bitshuffle(const size_t bytesoftype, const size_t blocksize,
+           const uint8_t* const _src, const uint8_t* _dest,
+           const uint8_t* _tmp) {
+  int size = blocksize / bytesoftype;
+  /* Initialize the shuffle implementation if necessary. */
+  init_shuffle_implementation();
+
+  if ((size % 8) == 0)
+    /* The number of elems is a multiple of 8 which is supported by
+       bitshuffle. */
+    return (int)(host_implementation.bitshuffle)((void*)_src, (void*)_dest,
+                                                 blocksize / bytesoftype,
+                                                 bytesoftype, (void*)_tmp);
+  else
+    memcpy((void*)_dest, (void*)_src, blocksize);
+  return size;
+}
+
+/* Bit-unshuffle a block by dynamically dispatching to the appropriate
+   hardware-accelerated routine at run-time. */
+int
+bitunshuffle(const size_t bytesoftype, const size_t blocksize,
+             const uint8_t* const _src, const uint8_t* _dest,
+             const uint8_t* _tmp) {
+  int size = blocksize / bytesoftype;
+  /* Initialize the shuffle implementation if necessary. */
+  init_shuffle_implementation();
+
+  if ((size % 8) == 0)
+    /* The number of elems is a multiple of 8 which is supported by
+       bitshuffle. */
+    return (int)(host_implementation.bitunshuffle)((void*)_src, (void*)_dest,
+                                                   blocksize / bytesoftype,
+                                                   bytesoftype, (void*)_tmp);
+  else
+    memcpy((void*)_dest, (void*)_src, blocksize);
+  return size;
+}
-
thirdparty/blosc/shuffle.h
r00587dc r981e22c
 /*********************************************************************
-  Blosc - Blocked S uffling and Compression Library
+  Blosc - Blocked Shuffling and Compression Library
 
-  Author: Francesc Alted <f [email protected]>
+  Author: Francesc Alted <f[email protected]>
 
   See LICENSES/BLOSC.txt for details about copyright and rights to use.
 **********************************************************************/
 
+/* Shuffle/unshuffle routines which dynamically dispatch to hardware-
+   accelerated routines based on the processor's architecture.
+   Consumers should almost always prefer to call these routines instead
+   of directly calling one of the hardware-accelerated routines, since
+   these are cross-platform and future-proof. */
 
-/* Shuffle/unshuffle routines */
+#ifndef SHUFFLE_H
+#define SHUFFLE_H
 
-void shuffle(size_t bytesoftype, size_t blocksize,
-             unsigned char* _src, unsigned char* _dest);
+#include "shuffle-common.h"
 
-void unshuffle(size_t bytesoftype, size_t blocksize,
-               unsigned char* _src, unsigned char* _dest);
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+  Primary shuffle and bitshuffle routines.
+  This function dynamically dispatches to the appropriate hardware-accelerated
+  routine based on the host processor's architecture. If the host processor is
+  not supported by any of the hardware-accelerated routines, the generic
+  (non-accelerated) implementation is used instead.
+  Consumers should almost always prefer to call this routine instead of directly
+  calling the hardware-accelerated routines because this method is both cross-
+  platform and future-proof.
+*/
+BLOSC_NO_EXPORT void
+shuffle(const size_t bytesoftype, const size_t blocksize,
+        const uint8_t* _src, const uint8_t* _dest);
+
+BLOSC_NO_EXPORT int
+bitshuffle(const size_t bytesoftype, const size_t blocksize,
+           const uint8_t* const _src, const uint8_t* _dest,
+           const uint8_t* _tmp);
+
+/**
+  Primary unshuffle and bitunshuffle routine.
+  This function dynamically dispatches to the appropriate hardware-accelerated
+  routine based on the host processor's architecture. If the host processor is
+  not supported by any of the hardware-accelerated routines, the generic
+  (non-accelerated) implementation is used instead.
+  Consumers should almost always prefer to call this routine instead of directly
+  calling the hardware-accelerated routines because this method is both cross-
+  platform and future-proof.
+*/
+BLOSC_NO_EXPORT void
+unshuffle(const size_t bytesoftype, const size_t blocksize,
+          const uint8_t* _src, const uint8_t* _dest);
+
+
+BLOSC_NO_EXPORT int
+bitunshuffle(const size_t bytesoftype, const size_t blocksize,
+             const uint8_t* const _src, const uint8_t* _dest,
+             const uint8_t* _tmp);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* SHUFFLE_H */
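The lazy, unsynchronized dispatch that shuffle() and unshuffle() rely on in this changeset can be illustrated with a minimal, self-contained sketch. Everything prefixed with demo_ or generic_ is hypothetical and exists only for illustration: the real shuffle_implementation_t and get_shuffle_implementation() are defined elsewhere in the changeset and are not shown here, and this sketch only ever selects a plain byte-wise routine rather than probing CPU features. Unlike the declarations above, it also uses a non-const destination pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the library's dispatch table. */
typedef void (*demo_shuffle_fn)(size_t bytesoftype, size_t blocksize,
                                const uint8_t* src, uint8_t* dest);

typedef struct {
  const char* name;
  demo_shuffle_fn shuffle;
  demo_shuffle_fn unshuffle;
} demo_implementation_t;

/* Generic byte shuffle: byte j of every element is gathered into plane j.
   Trailing bytes that do not form a whole element are copied unchanged. */
static void generic_shuffle(size_t bytesoftype, size_t blocksize,
                            const uint8_t* src, uint8_t* dest) {
  const size_t nelems = blocksize / bytesoftype;
  const size_t whole = nelems * bytesoftype;
  size_t i, j;
  for (j = 0; j < bytesoftype; j++)
    for (i = 0; i < nelems; i++)
      dest[j * nelems + i] = src[i * bytesoftype + j];
  memcpy(dest + whole, src + whole, blocksize - whole);
}

/* Inverse of generic_shuffle: scatter each plane back into the elements. */
static void generic_unshuffle(size_t bytesoftype, size_t blocksize,
                              const uint8_t* src, uint8_t* dest) {
  const size_t nelems = blocksize / bytesoftype;
  const size_t whole = nelems * bytesoftype;
  size_t i, j;
  for (i = 0; i < nelems; i++)
    for (j = 0; j < bytesoftype; j++)
      dest[i * bytesoftype + j] = src[j * nelems + i];
  memcpy(dest + whole, src + whole, blocksize - whole);
}

static int demo_initialized;            /* zero until the table is filled */
static demo_implementation_t demo_host; /* valid once demo_initialized != 0 */

/* Hypothetical selector. A real one would inspect the host CPU; this one
   always returns the generic routines. */
static demo_implementation_t demo_get_implementation(void) {
  demo_implementation_t impl = { "generic", generic_shuffle, generic_unshuffle };
  return impl;
}

/* Lazy initialization without locking. Concurrent first calls may each run
   the selector, but they all store the same result. */
static void demo_init(void) {
  if (!demo_initialized) {
    demo_host = demo_get_implementation();
    demo_initialized = 1;
  }
}

void demo_shuffle(size_t bytesoftype, size_t blocksize,
                  const uint8_t* src, uint8_t* dest) {
  assert(bytesoftype > 0);
  demo_init();
  demo_host.shuffle(bytesoftype, blocksize, src, dest);
}

void demo_unshuffle(size_t bytesoftype, size_t blocksize,
                    const uint8_t* src, uint8_t* dest) {
  assert(bytesoftype > 0);
  demo_init();
  demo_host.unshuffle(bytesoftype, blocksize, src, dest);
}
```

Calling demo_shuffle() on a block and then demo_unshuffle() on the result should reproduce the original bytes, whichever routine the selector picked. That is the property that lets callers ignore which implementation was chosen.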