Changeset 981e22c


Timestamp: 08/26/16 19:35:26 (8 years ago)
Author: Hal Finkel <hfinkel@…>
Branches: master, pympi
Children: 8ebc79b
Parents: cda87e9
git-author: Hal Finkel <hfinkel@…> (08/26/16 19:35:26)
git-committer: Hal Finkel <hfinkel@…> (08/26/16 19:35:26)
Message:

Upgrade to latest blosc library

blosc git: e394f327ccc78319d90a06af0b88bce07034b8dd

Files: 11 added, 14 edited

  • GNUmakefile

    r754e14c r981e22c  
    6464        $(CXX) $(FE_CFLAGS) $(FE_CPPFLAGS) -c -o $@ $< 
    6565 
    66 $(FEDIR)/GenericIOPrint: $(FEDIR)/GenericIOPrint.o $(FEDIR)/GenericIO.o  $(FEDIR)/blosc.o $(FEDIR)/blosclz.o $(FEDIR)/shuffle.o 
     66FE_BLOSC_O := $(FEDIR)/blosc.o $(FEDIR)/blosclz.o $(FEDIR)/shuffle.o $(FEDIR)/bitshuffle-generic.o $(FEDIR)/shuffle-generic.o 
     67 
     68$(FEDIR)/GenericIOPrint: $(FEDIR)/GenericIOPrint.o $(FEDIR)/GenericIO.o $(FE_BLOSC_O) 
    6769        $(CXX) $(FE_CFLAGS) -o $@ $^  
    6870 
    69 $(FEDIR)/GenericIOVerify: $(FEDIR)/GenericIOVerify.o $(FEDIR)/GenericIO.o $(FEDIR)/blosc.o $(FEDIR)/blosclz.o $(FEDIR)/shuffle.o 
     71$(FEDIR)/GenericIOVerify: $(FEDIR)/GenericIOVerify.o $(FEDIR)/GenericIO.o $(FE_BLOSC_O) 
    7072        $(CXX) $(FE_CFLAGS) -o $@ $^  
    7173 
     
    99101        $(MPICXX) $(MPI_CFLAGS) $(MPI_CPPFLAGS) -c -o $@ $< 
    100102 
    101 $(MPIDIR)/GenericIOPrint: $(MPIDIR)/GenericIOPrint.o $(MPIDIR)/GenericIO.o $(MPIDIR)/blosc.o $(MPIDIR)/blosclz.o $(MPIDIR)/shuffle.o 
     103MPI_BLOSC_O := $(MPIDIR)/blosc.o $(MPIDIR)/blosclz.o $(MPIDIR)/shuffle.o $(MPIDIR)/bitshuffle-generic.o $(MPIDIR)/shuffle-generic.o 
     104 
     105$(MPIDIR)/GenericIOPrint: $(MPIDIR)/GenericIOPrint.o $(MPIDIR)/GenericIO.o $(MPI_BLOSC_O) 
    102106        $(MPICXX) $(MPI_CFLAGS) -o $@ $^  
    103107 
    104 $(MPIDIR)/GenericIOVerify: $(MPIDIR)/GenericIOVerify.o $(MPIDIR)/GenericIO.o $(MPIDIR)/blosc.o $(MPIDIR)/blosclz.o $(MPIDIR)/shuffle.o 
     108$(MPIDIR)/GenericIOVerify: $(MPIDIR)/GenericIOVerify.o $(MPIDIR)/GenericIO.o $(MPI_BLOSC_O) 
    105109        $(MPICXX) $(MPI_CFLAGS) -o $@ $^  
    106110 
    107 $(MPIDIR)/GenericIOBenchmarkRead: $(MPIDIR)/GenericIOBenchmarkRead.o $(MPIDIR)/GenericIO.o $(MPIDIR)/blosc.o $(MPIDIR)/blosclz.o $(MPIDIR)/shuffle.o 
     111$(MPIDIR)/GenericIOBenchmarkRead: $(MPIDIR)/GenericIOBenchmarkRead.o $(MPIDIR)/GenericIO.o $(MPI_BLOSC_O) 
    108112        $(MPICXX) $(MPI_CFLAGS) -o $@ $^  
    109113 
    110 $(MPIDIR)/GenericIOBenchmarkWrite: $(MPIDIR)/GenericIOBenchmarkWrite.o $(MPIDIR)/GenericIO.o $(MPIDIR)/blosc.o $(MPIDIR)/blosclz.o $(MPIDIR)/shuffle.o 
     114$(MPIDIR)/GenericIOBenchmarkWrite: $(MPIDIR)/GenericIOBenchmarkWrite.o $(MPIDIR)/GenericIO.o $(MPI_BLOSC_O) 
    111115        $(MPICXX) $(MPI_CFLAGS) -o $@ $^  
    112116 
  • thirdparty/blosc/ANNOUNCE.rst

    r00587dc r981e22c  
    11=============================================================== 
    2  Announcing Blosc 1.2.3 
    3  A blocking, shuffling and lossless compression library 
     2 Announcing c-blosc 1.10.0 
     3 A blocking, shuffling and lossless compression library for C 
    44=============================================================== 
    55 
     
    77============ 
    88 
    9 New `blosc_init()` and `blosc_destroy()` functions have been added so 
    10 that the global lock can be initialized safely. These new functions 
    11 will also allow for other kind of initializations/destructions in the 
    12 future. 
     9This release introduces support for the new Zstd codec. Zstd is meant to 
     10achieve larger compression ratios than Zlib, but with higher speeds. We 
     11are talking about a well-balanced codec that should see a lot of use 
     12among Blosc users. There is a blog post about what you can expect from it at: 
    1313 
    14 Existing applications using Blosc do not need to start using the new 
    15 functions right away, as long as they calling `blosc_set_nthreads()` 
    16 previous to anything else.  However, using them is highly recommended. 
    17  
    18 Thanks to Oscar Villellas for the init/destroy suggestion, it is a 
    19 nice idea indeed! 
     14http://blosc.org/blog/zstd-has-just-landed-in-blosc.html 
    2015 
    2116For more info, please see the release notes in: 
    2217 
    23 https://github.com/FrancescAlted/blosc/wiki/Release-notes 
     18https://github.com/Blosc/c-blosc/blob/master/RELEASE_NOTES.rst 
     19 
    2420 
    2521What is it? 
    2622=========== 
    2723 
    28 Blosc (http://www.blosc.org) is a high performance compressor 
     24Blosc (http://www.blosc.org) is a high performance meta-compressor 
    2925optimized for binary data.  It has been designed to transmit data to 
    3026the processor cache faster than the traditional, non-compressed, 
    3127direct memory fetch approach via a memcpy() OS call. 
    3228 
    33 Blosc is the first compressor (that I'm aware of) that is meant not 
    34 only to reduce the size of large datasets on-disk or in-memory, but 
    35 also to accelerate object manipulations that are memory-bound. 
     29Blosc has internal support for different compressors like its internal 
     30BloscLZ, but also LZ4, LZ4HC, Snappy and Zlib.  This way these can 
     31automatically leverage the multithreading and pre-filtering 
     32(shuffling) capabilities that come with Blosc. 
    3633 
    37 There is also a handy command line for Blosc called Bloscpack 
    38 (https://github.com/esc/bloscpack) that allows you to compress large 
    39 binary datafiles on-disk.  Although the format for Bloscpack has not 
    40 stabilized yet, it allows you to effectively use Blosc from you 
    41 favorite shell. 
    4234 
    4335Download sources 
     
    5042and proceed from there.  The github repository is over here: 
    5143 
    52 https://github.com/FrancescAlted/blosc 
     44https://github.com/Blosc 
    5345 
    5446Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for 
    5547details. 
     48 
    5649 
    5750Mailing list 
     
    6558 
    6659Enjoy Data! 
    67  
    68  
    69 .. Local Variables: 
    70 .. mode: rst 
    71 .. coding: utf-8 
    72 .. fill-column: 70 
    73 .. End: 
  • thirdparty/blosc/LICENSES/BLOSC.txt

    r00587dc r981e22c  
    11Blosc - A blocking, shuffling and lossless compression library 
    22 
    3 Copyright (C) 2009-2012 Francesc Alted <[email protected]> 
    4 Copyright (C) 2013      Francesc Alted <[email protected]> 
     3Copyright (C) 2009-2016 Francesc Alted <[email protected]> 
    54 
    65Permission is hereby granted, free of charge, to any person obtaining a copy 
     
    2120OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
    2221THE SOFTWARE. 
    23  
  • thirdparty/blosc/LICENSES/STDINT.txt

    r00587dc r981e22c  
    1 Copyright (c) 2006-2008 Alexander Chemeris 
     1ISO C9x  compliant stdint.h for Microsoft Visual Studio 
     2Based on ISO/IEC 9899:TC2 Committee draft (May 6, 2005) WG14/N1124 
     3 
     4 Copyright (c) 2006-2013 Alexander Chemeris 
    25 
    36Redistribution and use in source and binary forms, with or without 
     
    1114     documentation and/or other materials provided with the distribution. 
    1215 
    13   3. The name of the author may be used to endorse or promote products 
    14      derived from this software without specific prior written permission. 
     16  3. Neither the name of the product nor the names of its contributors may 
     17     be used to endorse or promote products derived from this software 
     18     without specific prior written permission. 
    1519 
    1620THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED 
  • thirdparty/blosc/README.rst

    r00587dc r981e22c  
    44 
    55:Author: Francesc Alted 
    6 :Contact: f[email protected] 
     6:Contact: f[email protected] 
    77:URL: http://www.blosc.org 
     8:Gitter: |gitter| 
     9:Travis CI: |travis| 
     10:Appveyor: |appveyor| 
     11 
     12.. |gitter| image:: https://badges.gitter.im/Blosc/c-blosc.svg 
     13        :alt: Join the chat at https://gitter.im/Blosc/c-blosc 
     14        :target: https://gitter.im/Blosc/c-blosc?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge 
     15 
     16.. |travis| image:: https://travis-ci.org/Blosc/c-blosc.svg?branch=master 
     17        :target: https://travis-ci.org/Blosc/c-blosc 
     18 
     19.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/3mlyjc1ak0lbkmte?svg=true 
     20        :target: https://ci.appveyor.com/project/FrancescAlted/c-blosc/branch/master 
     21 
    822 
    923What is it? 
     
    1832 
    1933It uses the blocking technique (as described in [2]_) to reduce 
    20 activity on the memory bus as much as possible.  In short, this 
     34activity on the memory bus as much as possible. In short, this 
    2135technique works by dividing datasets in blocks that are small enough 
    2236to fit in caches of modern processors and perform compression / 
    2337decompression there.  It also leverages, if available, SIMD 
    24 instructions (SSE2) and multi-threading capabilities of CPUs, in order 
    25 to accelerate the compression / decompression process to a maximum. 
    26  
    27 You can see some recent benchmarks about Blosc performance in [3]_ 
     38instructions (SSE2, AVX2) and multi-threading capabilities of CPUs, in 
     39order to accelerate the compression / decompression process to a 
     40maximum. 
     41 
     42Blosc is actually a metacompressor, meaning that it can use a range 
     43of compression libraries for performing the actual 
     44compression/decompression. Right now, it comes with integrated support 
     45for BloscLZ (the original one), LZ4, LZ4HC, Snappy, Zlib and Zstd. Blosc 
     46comes with full sources for all compressors, so in case it does not find 
     47the libraries installed in your system, it will compile from the 
     48included sources and they will be integrated into the Blosc library 
     49anyway. That means that you can trust in having all supported 
     50compressors integrated in Blosc in all supported platforms. 
     51 
     52You can see some benchmarks about Blosc performance in [3]_ 
    2853 
    2954Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for 
     
    3257.. [1] http://www.blosc.org 
    3358.. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf 
    34 .. [3] http://blosc.org/trac/wiki/SyntheticBenchmarks 
     59.. [3] http://blosc.org/synthetic-benchmarks.html 
    3560 
    3661Meta-compression and other advantages over existing compressors 
    3762=============================================================== 
    3863 
    39 Blosc is not like other compressors: it should rather be called a 
     64C-Blosc is not like other compressors: it should rather be called a 
    4065meta-compressor.  This is so because it can use different compressors 
    41 and pre-conditioners (programs that generally improve compression 
    42 ratio).  At any rate, it can also be called a compressor because it 
    43 happens that it already integrates one compressor and one 
    44 pre-conditioner, so it can actually work like so. 
    45  
    46 Currently it uses BloscLZ, a compressor heavily based on FastLZ 
    47 (http://fastlz.org/), and a highly optimized (it can use SSE2 
    48 instructions, if available) Shuffle pre-conditioner. However, 
    49 different compressors or pre-conditioners may be added in the future. 
    50  
    51 Blosc is in charge of coordinating the compressor and pre-conditioners 
    52 so that they can leverage the blocking technique (described above) as 
    53 well as multi-threaded execution (if several cores are available) 
    54 automatically. That makes that every compressor and pre-conditioner 
     66and filters (programs that generally improve compression ratio).  At 
     67any rate, it can also be called a compressor because it happens that 
     68it already comes with several compressors and filters, so it can 
     69actually work like so. 
     70 
     71Currently C-Blosc comes with support of BloscLZ, a compressor heavily 
     72based on FastLZ (http://fastlz.org/), LZ4 and LZ4HC 
     73(https://github.com/Cyan4973/lz4), Snappy 
     74(https://github.com/google/snappy) and Zlib (http://www.zlib.net/), as 
     75well as highly optimized (they can use SSE2 or AVX2 instructions, if 
     76available) shuffle and bitshuffle filters (for info on how and why 
     77shuffling works, see slide 17 of 
     78http://www.slideshare.net/PyData/blosc-py-data-2014).  However, 
     79different compressors or filters may be added in the future. 
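
To make the shuffle filter mentioned above more concrete, here is a minimal Python sketch of byte shuffling. It assumes the buffer length is an exact multiple of the type size, and the names `byte_shuffle` and `byte_unshuffle` are hypothetical: they are not part of the Blosc API, whose real implementation is the C code in shuffle*.c.

```python
import struct


def byte_shuffle(data: bytes, typesize: int) -> bytes:
    """Group the j-th byte of every element together (illustrative only)."""
    if typesize <= 0 or len(data) % typesize != 0:
        raise ValueError("typesize must be positive and divide len(data)")
    # Stride through the buffer once per byte position within an element.
    return b"".join(data[j::typesize] for j in range(typesize))


def byte_unshuffle(data: bytes, typesize: int) -> bytes:
    """Invert byte_shuffle() for the same typesize."""
    if typesize <= 0 or len(data) % typesize != 0:
        raise ValueError("typesize must be positive and divide len(data)")
    nelems = len(data) // typesize
    out = bytearray(len(data))
    for j in range(typesize):
        # Plane j holds the j-th byte of each element; scatter it back.
        out[j::typesize] = data[j * nelems:(j + 1) * nelems]
    return bytes(out)


# Four little-endian 32-bit integers with small values.
values = struct.pack("<4I", 1, 2, 3, 4)
shuffled = byte_shuffle(values, 4)
```

For small integers like these, shuffling places the low bytes first and leaves the remaining bytes as one long run of zeros, which is the kind of layout a compressor tends to handle well.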
     80 
     81C-Blosc is in charge of coordinating the different compressors and 
     82filters so that they can leverage the blocking technique (described 
     83above) as well as multi-threaded execution (if several cores are 
     84available) automatically. As a result, every compressor and filter 
    5585will work at very high speeds, even if it was not initially designed 
    5686for doing blocking or multi-threading. 
     
    6090* Meant for binary data: can take advantage of the type size 
    6191  meta-information for improved compression ratio (using the 
    62   integrated shuffle pre-conditioner). 
    63  
    64 * Small overhead on non-compressible data: only a maximum of 16 
    65   additional bytes over the source buffer length are needed to 
    66   compress *every* input. 
    67  
    68 * Maximum destination length: contrarily to many other 
    69   compressors, both compression and decompression routines have 
    70   support for maximum size lengths for the destination buffer. 
    71  
    72 * Replacement for memcpy(): it supports a 0 compression level that 
    73   does not compress at all and only adds 16 bytes of overhead. In 
    74   this mode Blosc can copy memory usually faster than a plain 
    75   memcpy(). 
     92  integrated shuffle and bitshuffle filters). 
     93 
     94* Small overhead on non-compressible data: only a maximum of (16 + 4 * 
     95  nthreads) additional bytes over the source buffer length are needed 
     96  to compress *any kind of input*. 
     97 
     98* Maximum destination length: contrary to many other compressors, 
     99  both compression and decompression routines have support for maximum 
     100  size lengths for the destination buffer. 
    76101 
    77102When taken together, all these features set Blosc apart from other 
    78103similar solutions. 
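
As a small worked example of the overhead bound quoted in the list above, the sketch below computes the largest destination size implied by (16 + 4 * nthreads). The helper name `max_compressed_size` is hypothetical, and the formula is taken from the README text as written rather than from blosc.h, so check the header before sizing real buffers.

```python
def max_compressed_size(nbytes: int, nthreads: int) -> int:
    """Upper bound on the output size, per the README's stated overhead."""
    if nbytes < 0 or nthreads < 1:
        raise ValueError("nbytes must be >= 0 and nthreads must be >= 1")
    # 16 bytes of header plus 4 bytes per thread, as stated in the README.
    return nbytes + 16 + 4 * nthreads


# A 1000-byte source buffer compressed with 4 threads.
bound = max_compressed_size(1000, 4)
```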
    79104 
    80 Compiling your application with Blosc 
    81 ===================================== 
    82  
    83 Blosc consists of the next files (in blosc/ directory):: 
    84  
    85     blosc.h and blosc.c      -- the main routines 
    86     blosclz.h and blosclz.c  -- the actual compressor 
    87     shuffle.h and shuffle.c  -- the shuffle code 
     105Compiling your application with a minimalistic Blosc 
     106==================================================== 
     107 
     108The minimal Blosc consists of the following files (in `blosc/ directory 
     109<https://github.com/Blosc/c-blosc/tree/master/blosc>`_):: 
     110 
     111    blosc.h and blosc.c        -- the main routines 
     112    shuffle*.h and shuffle*.c  -- the shuffle code 
     113    blosclz.h and blosclz.c    -- the blosclz compressor 
    88114 
    89115Just add these files to your project in order to use Blosc.  For 
    90 information on compression and decompression routines, see blosc.h. 
    91  
    92 To compile using GCC (4.4 or higher recommended) on Unix: 
    93  
    94 .. code-block:: console 
    95  
    96    $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -lpthread 
     116information on compression and decompression routines, see `blosc.h 
     117<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_. 
     118 
     119To compile using GCC (4.9 or higher recommended) on Unix: 
     120 
     121.. code-block:: console 
     122 
     123   $ gcc -O3 -mavx2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread 
    97124 
    98125Using Windows and MINGW: 
     
    100127.. code-block:: console 
    101128 
    102    $ gcc -O3 -msse2 -o myprog myprog.c blosc\*.c 
    103  
    104 Using Windows and MSVC (2008 or higher recommended): 
    105  
    106 .. code-block:: console 
    107  
    108   $ cl /Ox /Femyprog.exe myprog.c blosc\*.c 
    109  
    110 A simple usage example is the benchmark in the bench/bench.c file. 
    111 Also, another example for using Blosc as a generic HDF5 filter is in 
    112 the hdf5/ directory. 
    113  
    114 I have not tried to compile this with compilers other than GCC, MINGW, 
    115 Intel ICC or MSVC yet. Please report your experiences with your own 
    116 platforms. 
    117  
    118 Testing Blosc 
    119 ============= 
    120  
    121 Go to the test/ directory and issue: 
    122  
    123 .. code-block:: console 
    124  
    125   $ make test 
    126  
    127 These tests are very basic, and only valid for platforms where GNU 
    128 make/gcc tools are available.  If you really want to test Blosc the 
    129 hard way, look at: 
    130  
    131 http://blosc.org/trac/wiki/SyntheticBenchmarks 
    132  
    133 where instructions on how to intensively test (and benchmark) Blosc 
    134 are given.  If while running these tests you get some error, please 
    135 report it back! 
     129   $ gcc -O3 -mavx2 -o myprog myprog.c -Iblosc blosc\*.c 
     130 
     131Using Windows and MSVC (2013 or higher recommended): 
     132 
     133.. code-block:: console 
     134 
     135  $ cl /Ox /Femyprog.exe /Iblosc myprog.c blosc\*.c 
     136 
     137In the `examples/ directory 
     138<https://github.com/Blosc/c-blosc/tree/master/examples>`_ you can find 
     139more hints on how to link your app with Blosc. 
     140 
     141I have not tried to compile this with compilers other than GCC, clang, 
     142MINGW, Intel ICC or MSVC yet. Please report your experiences with your 
     143own platforms. 
     144 
     145Adding support for other compressors with a minimalistic Blosc 
     146~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
     147 
     148The official cmake files (see below) for Blosc try hard to include 
     149support for LZ4, LZ4HC, Snappy, Zlib inside the Blosc library, so 
     150using them is just a matter of calling the appropriate 
     151`blosc_set_compressor() API call 
     152<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_.  See 
     153an `example here 
     154<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_. 
     155 
     156Having said this, it is also easy to use a minimalistic Blosc and just 
     157add the symbols HAVE_LZ4 (will include both LZ4 and LZ4HC), 
     158HAVE_SNAPPY and HAVE_ZLIB during compilation as well as the 
     159appropriate libraries. For example, for compiling with minimalistic 
     160Blosc but with added Zlib support do: 
     161 
     162.. code-block:: console 
     163 
     164   $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread -DHAVE_ZLIB -lz 
     165 
     166In the `bench/ directory 
     167<https://github.com/Blosc/c-blosc/tree/master/bench>`_ there are a couple 
     168of Makefile files (one for UNIX and the other for MinGW) with more 
     169complete building examples, like switching between libraries or 
     170internal sources for the compressors. 
     171 
     172Supported platforms 
     173~~~~~~~~~~~~~~~~~~~ 
     174 
     175Blosc is meant to support all platforms where a C89 compliant C 
     176compiler can be found.  The ones that are mostly tested are Intel 
     177(Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones such as the IBM 
     178Blue Gene Q embedded "A2" processor are reported to work too. 
    136179 
    137180Compiling the Blosc library with CMake 
    138181====================================== 
    139182 
    140 Blosc can also be built, tested and installed using CMake_. 
     183Blosc can also be built, tested and installed using CMake_. Although 
     184this procedure might seem a bit more involved than the one described 
     185above, it is the most general because it allows integrating 
     186compressors other than BloscLZ, either from libraries or from internal 
     187sources. Hence, serious library developers are encouraged to use this 
     188way. 
     189 
    141190The following procedure describes the "out of source" build. 
    142191 
     
    148197  $ cd build 
    149198 
    150 Configure Blosc in release mode (enable optimizations) specifying the 
    151 installation directory: 
    152  
    153 .. code-block:: console 
    154  
    155   $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=INSTALL_DIR \ 
    156       PATH_TO_BLOSC_SOURCE_DIR 
    157  
    158 Please note that configuration can also be performed using UI tools 
    159 provided by CMake_ (ccmake or cmake-gui): 
    160  
    161 .. code-block:: console 
    162  
    163   $ cmake-gui PATH_TO_BLOSC_SOURCE_DIR 
     199Now run CMake configuration and optionally specify the installation 
     200directory (e.g. '/usr' or '/usr/local'): 
     201 
     202.. code-block:: console 
     203 
     204  $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory .. 
     205 
     206CMake allows configuring Blosc in many different ways, like preferring 
     207internal or external sources for compressors or enabling/disabling 
     208them.  Please note that configuration can also be performed using UI 
     209tools provided by CMake_ (ccmake or cmake-gui): 
     210 
     211.. code-block:: console 
     212 
     213  $ ccmake ..      # run a curses-based interface 
     214  $ cmake-gui ..   # run a graphical interface 
    164215 
    165216Build, test and install Blosc: 
     
    167218.. code-block:: console 
    168219 
    169   $ make 
    170   $ make test 
    171   $ make install  
     220  $ cmake --build . 
     221  $ ctest 
     222  $ cmake --build . --target install 
    172223 
    173224The static and dynamic version of the Blosc library, together with 
    174 header files, will be installed into the specified INSTALL_DIR. 
     225header files, will be installed into the specified 
     226CMAKE_INSTALL_PREFIX. 
    175227 
    176228.. _CMake: http://www.cmake.org 
     229 
     230Once you have compiled your Blosc library, you can easily link your 
     231apps with it as shown in the `examples/ directory 
     232<https://github.com/Blosc/c-blosc/blob/master/examples>`_. 
     233 
     234Adding support for other compressors (LZ4, LZ4HC, Snappy, Zlib) with CMake 
     235~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
     236 
     237The CMake files in Blosc are configured to automatically detect other 
     238compressors like LZ4, LZ4HC, Snappy or Zlib by default.  So as long as 
     239the libraries and the header files for these libraries are accessible, 
     240these will be used by default.  See an `example here 
     241<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_. 
     242 
     243*Note on Zlib*: the library should be easily found on UNIX systems, 
     244although on Windows, you can help CMake to find it by setting the 
     245environment variable 'ZLIB_ROOT' to where zlib 'include' and 'lib' 
     246directories are. Also, make sure that the Zlib DLL library is in your 
     247'\Windows' directory. 
     248 
     249However, the full sources for LZ4, LZ4HC, Snappy and Zlib have been 
     250included in Blosc too. So, in general, you should not worry about not 
     251having (or CMake not finding) the libraries in your system because in 
     252this case, their sources will be automatically compiled for you. That 
     253effectively means that you can be confident in having complete 
     254support for all the supported compression libraries in all supported 
     255platforms. 
     256 
     257If you want to force Blosc to use external libraries instead of 
     258the included compression sources: 
     259 
     260.. code-block:: console 
     261 
     262  $ cmake -DPREFER_EXTERNAL_LZ4=ON .. 
     263 
     264You can also disable support for some compression libraries: 
     265 
     266.. code-block:: console 
     267 
     268  $ cmake -DDEACTIVATE_SNAPPY=ON .. 
     269 
     270Mac OSX troubleshooting 
     271~~~~~~~~~~~~~~~~~~~~~~~ 
     272 
     273If you run into compilation troubles when using Mac OSX, please make 
     274sure that you have installed the command line developer tools.  You 
     275can always install them with: 
     276 
     277.. code-block:: console 
     278 
     279  $ xcode-select --install 
    177280 
    178281Wrapper for Python 
     
    181284Blosc has an official wrapper for Python.  See: 
    182285 
    183 https://github.com/FrancescAlted/python-blosc 
     286https://github.com/Blosc/python-blosc 
     287 
     288Command line interface and serialization format for Blosc 
     289========================================================= 
     290 
     291Blosc can be used from command line by using Bloscpack.  See: 
     292 
     293https://github.com/Blosc/bloscpack 
    184294 
    185295Filter for HDF5 
    186296=============== 
    187297 
    188 For those that want to use Blosc as a filter in the HDF5 library, 
    189 there is a sample implementation in the hdf5/ directory. 
     298For those who want to use Blosc as a filter in the HDF5 library, 
     299there is a sample implementation in the blosc/hdf5 project in: 
     300 
     301https://github.com/Blosc/hdf5 
    190302 
    191303Mailing list 
     
    200312=============== 
    201313 
    202 I'd like to thank the PyTables community that have collaborated in the 
    203 exhaustive testing of Blosc.  With an aggregate amount of more than 300 TB of 
    204 different datasets compressed *and* decompressed successfully, I can say that 
    205 Blosc is pretty safe now and ready for production purposes. 
    206  
    207 Other important contributions: 
    208  
    209 * Thibault North contributed a way to call Blosc from different threads in a 
    210   safe way. 
    211  
    212 * The cmake support was a contribution of Thibault North, Antonio Valentino 
    213   and Mark Wiebe. 
    214  
    215 * Valentin Haenel did a terrific work fixing typos and improving docs and the 
    216   plotting script. 
     314See THANKS.rst. 
    217315 
    218316 
  • thirdparty/blosc/README_HEADER.rst

    r00587dc r981e22c  
    2121    (``uint8``) Blosc format version. 
    2222:versionlz: 
    23     (``uint8``) Blosclz format  version (internal Lempel-Ziv algorithm). 
    24 :flags: 
    25     (``bitfield``) The flags of the buffer. 
     23    (``uint8``) Version of the internal compressor used. 
     24:flags and compressor enumeration: 
     25    (``bitfield``) The flags of the buffer 
    2626 
    2727    :bit 0 (``0x01``): 
    28         Whether the shuffle filter has been applied or not. 
     28        Whether the byte-shuffle filter has been applied or not. 
    2929    :bit 1 (``0x02``): 
    3030        Whether the internal buffer is a pure memcpy or not. 
     31    :bit 2 (``0x04``): 
     32        Whether the bit-shuffle filter has been applied or not. 
     33    :bit 3 (``0x08``): 
     34        Reserved 
     35    :bit 4 (``0x10``): 
     36        Reserved 
     37    :bit 5 (``0x20``): 
     38        Part of the enumeration for compressors. 
     39    :bit 6 (``0x40``): 
     40        Part of the enumeration for compressors. 
     41    :bit 7 (``0x80``): 
     42        Part of the enumeration for compressors. 
     43 
     44    The last three bits form an enumeration that allows using alternative 
     45    compressors. 
     46 
     47    :``0``: 
     48        ``blosclz`` 
     49    :``1``: 
     50        ``lz4`` or ``lz4hc`` 
     51    :``2``: 
     52        ``snappy`` 
     53    :``3``: 
     54        ``zlib`` 
     55    :``4``: 
     56        ``zstd`` 
    3157 
    3258:typesize: 
     
    3864:ctbytes: 
    3965    (``uint32``) Compressed size of the buffer. 
    40  
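
The flags byte described in this header file can be decoded with a few bit operations. The Python sketch below is illustrative only: the function and table names are hypothetical, and it assumes that bit 5 is the least significant bit of the three-bit compressor enumeration held in bits 5 to 7.

```python
# Compressor codes as listed in the header description (names are illustrative).
COMPRESSORS = {
    0: "blosclz",
    1: "lz4 or lz4hc",
    2: "snappy",
    3: "zlib",
    4: "zstd",
}


def decode_flags(flags: int) -> dict:
    """Split a Blosc header flags byte into its documented fields."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one byte")
    code = flags >> 5  # bits 5-7: compressor enumeration (assumed LSB at bit 5)
    return {
        "byte_shuffle": bool(flags & 0x01),  # bit 0
        "memcpy": bool(flags & 0x02),        # bit 1
        "bit_shuffle": bool(flags & 0x04),   # bit 2
        "compressor": COMPRESSORS.get(code, "unknown (%d)" % code),
    }


# Byte-shuffled buffer compressed with the codec whose code is 4.
info = decode_flags(0x01 | (4 << 5))
```

Bits 3 and 4 are reserved in the description, so the sketch ignores them instead of rejecting buffers that happen to set them.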
  • thirdparty/blosc/RELEASE_NOTES.rst

    r00587dc r981e22c  
    1 =============================== 
    2  Release notes for Blosc 1.2.3 
    3 =============================== 
     1=========================== 
     2 Release notes for C-Blosc 
     3=========================== 
    44 
    55:Author: Francesc Alted 
    6 :Contact: f[email protected] 
     6:Contact: f[email protected] 
    77:URL: http://www.blosc.org 
     8 
     9 
     10Changes from 1.10.0 to 1.10.1 
     11============================= 
     12 
     13 #XXX version-specific blurb XXX# 
     14 
     15 
     16Changes from 1.9.3 to 1.10.0 
     17============================ 
     18 
     19- Initial support for Zstandard (0.7.4). Zstandard (or Zstd for short) is a new 
     20  compression library that allows better compression than Zlib, but that works 
     21  typically faster (and sometimes much faster), making it a good match for 
     22  Blosc. 
     23 
     24  Although the Zstd format is considered stable 
     25  (http://fastcompression.blogspot.com.es/2016_07_03_archive.html), its API is 
     26  maturing very fast, and despite passing the extreme test suite for C-Blosc, 
     27  this codec should be considered in beta for C-Blosc usage purposes. Please 
     28  test it and report back any possible issues you may get. 
     29 
     30 
     31Changes from 1.9.2 to 1.9.3 
     32=========================== 
     33 
     34- Reverted a mistake introduced in 1.7.1.  At that time, bit-shuffling 
     35  was enabled for typesize == 1 (i.e. strings), but the change also 
     36  included byte-shuffling accidentally.  This only affected performance, 
     37  but in a quite bad way (a copy was needed).  This has been fixed and 
     38  byte-shuffling is not active when typesize == 1 anymore. 
     39 
     40 
     41Changes from 1.9.1 to 1.9.2 
     42=========================== 
     43 
     44- Check whether Blosc is actually initialized before blosc_init(), 
     45  blosc_destroy() and blosc_free_resources().  This makes the library 
     46  more resistant to different initialization cycles 
     47  (e.g. https://github.com/stevengj/Blosc.jl/issues/19). 
     48 
     49 
     50Changes from 1.9.0 to 1.9.1 
     51=========================== 
     52 
     53- The internal copies when clevel=0 are made now via memcpy().  At the 
     54  beginning of C-Blosc development, benchmarks were saying that the 
     55  internal, multi-threaded copies inside C-Blosc were faster than 
     56  memcpy(), but 6 years later, memcpy() has made great strides in terms 
     57  of efficiency.  With this, you should expect a slight speed 
     58  advantage (10% ~ 20%) when C-Blosc is used as a replacement of 
     59  memcpy() (which should not be the most common scenario out there). 
     60 
     61- Added a new DEACTIVATE_AVX2 cmake option to explicitly disable AVX2 
     62  at build-time.  Thanks to James Bird. 
     63 
     64- The ``make -jN`` for parallel compilation should work now.  Thanks 
     65  to James Bird. 
     66 
     67 
     68Changes from 1.8.1 to 1.9.0 
     69=========================== 
     70 
     71* New blosc_get_nthreads() function to get the number of threads that 
     72  will be used internally during compression/decompression (set by 
     73  already existing blosc_set_nthreads()). 
     74 
     75* New blosc_get_compressor() function to get the compressor that will 
     76  be used internally during compression (set by already existing 
     77  blosc_set_compressor()). 
     78 
     79* New blosc_get_blocksize() function to get the internal blocksize to 
     80  be used during compression (set by already existing 
     81  blosc_set_blocksize()). 
     82 
     83* Now, when the BLOSC_NOLOCK environment variable is set (to any 
     84  value), the calls to blosc_compress() and blosc_decompress() will 
     85  call blosc_compress_ctx() and blosc_decompress_ctx() under the hood 
     86  so as to avoid the internal locks.  See blosc.h for details.  This 
     87  allows multi-threaded apps calling the non _ctx() functions to avoid 
     88  the internal locks in C-Blosc.  For apps that are not multi-threaded, 
     89  though, it is in general slower to call the _ctx() functions, so the 
     90  use of BLOSC_NOLOCK is discouraged. 
     91 
     92* In the same vein, from now on, when the BLOSC_NTHREADS environment 
     93  variable is set to an integer, every call to blosc_compress() and 
     94  blosc_decompress() will call blosc_set_nthreads(BLOSC_NTHREADS) 
     95  before the actual compression/decompression process.  See blosc.h 
     96  for details. 
     97 
     98* Finally, if BLOSC_CLEVEL, BLOSC_SHUFFLE, BLOSC_TYPESIZE and/or 
     99  BLOSC_COMPRESSOR variables are set in the environment, these will 
     100  also be honored before calling blosc_compress(). 
     101 
     102* Calling blosc_init() before any other Blosc call, although 
     103  recommended, is not necessary anymore.  The idea is that you can use 
     104  just the basic blosc_compress() and blosc_decompress() and control 
     105  other parameters (nthreads, compressor, blocksize) by using 
     106  environment variables (see above). 
     107 
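The environment-variable control described in the 1.9.0 notes above (for example BLOSC_NTHREADS being "set to an integer") implies some defensive parsing of the variable's value. The following is only a minimal sketch of how such a value could be read and validated; the helper names `parse_positive_int` and `env_positive_int` and the upper bound of 1024 are assumptions made for illustration and are not part of the C-Blosc API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not part of C-Blosc): parse a positive integer from
   an environment-style string.  Returns `fallback` when the string is
   missing, empty, malformed, or outside the range 1..1024 (the upper bound
   is an arbitrary choice for this sketch). */
static int parse_positive_int(const char *text, int fallback)
{
  char *end = NULL;
  long value;

  assert(fallback > 0);
  if (text == NULL || *text == '\0') {
    return fallback;
  }
  errno = 0;
  value = strtol(text, &end, 10);
  /* Reject overflow, trailing garbage, and non-positive values. */
  if (errno != 0 || *end != '\0' || value <= 0 || value > 1024) {
    return fallback;
  }
  return (int)value;
}

/* Hypothetical helper: look up `name` in the environment and parse it. */
static int env_positive_int(const char *name, int fallback)
{
  return parse_positive_int(getenv(name), fallback);
}
```

With helpers like these, a caller could obtain a thread count via `env_positive_int("BLOSC_NTHREADS", 1)` and fall back to a single thread whenever the variable is unset or invalid.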
     108 
     109Changes from 1.8.0 to 1.8.1 
     110=========================== 
     111 
     112* Disable the use of __builtin_cpu_supports() for GCC 5.3.1 
     113  compatibility.  Details in: 
     114  https://lists.fedoraproject.org/archives/list/[email protected]/thread/ZM2L65WIZEEQHHLFERZYD5FAG7QY2OGB/ 
     115 
     116 
     117Changes from 1.7.1 to 1.8.0 
     118=========================== 
     119 
     120* The code is (again) compatible with VS2008 and VS2010.  This is 
     121  important for compatibility with Python 2.6/2.7/3.3/3.4. 
     122 
     123* Introduced a new global lock during blosc_decompress() operation. 
     124  As the blosc_compress() was already guarded by a global lock, this 
     125  means that the compression/decompression is again thread safe. 
     126  However, when using C-Blosc from multi-threaded environments, it is 
     127  important to keep using the *_ctx() functions for performance 
     128  reasons.  NOTE: _ctx() functions will be replaced by more powerful 
     129  ones in C-Blosc 2.0. 
     130 
     131 
     132Changes from 1.7.0 to 1.7.1 
     133=========================== 
     134 
     135* Fixed a bug preventing bitshuffle from working correctly on getitem(). 
     136  Now, everything with bitshuffle seems to work correctly. 
     137 
     138* Fixed the thread initialization for blosc_decompress_ctx().  Issue 
     139  #158.  Thanks to Chris Webers. 
     140 
     141* Fixed a bug in the blocksize computation introduced in 1.7.0.  This 
     142  could have been creating segfaults. 
     143 
     144* Allow bitshuffle to run on 1-byte typesizes. 
     145 
     146* New parametrization of the blocksize to be independent of the 
     147  typesize.  This allows a smoother speed throughout all typesizes. 
     148 
     149* lz4 and lz4hc codecs upgraded to 1.7.2 (from 1.7.0). 
     150 
     151* Calling set_nthreads() without actually changing the number of 
     152  threads no longer tears down and sets up the internal pool again. 
     153  PR #153.  Thanks to Santi Villalba. 
     154 
     155 
     156Changes from 1.6.1 to 1.7.0 
     157=========================== 
     158 
     159* Added a new 'bitshuffle' filter so that the shuffle takes place at a 
     160  bit level and not just at a byte one, which is what the previous 
     161  'shuffle' filter does. 
     162 
     163  For activating this new bit-level filter you only have to pass the 
     164  symbol BLOSC_BITSHUFFLE to `blosc_compress()`.  For the previous 
     165  byte-level one, pass BLOSC_SHUFFLE.  For disabling the shuffle, pass 
     166  BLOSC_NOSHUFFLE. 
     167 
     168  This is a port of the existing filter in 
     169  https://github.com/kiyo-masui/bitshuffle.  Thanks to Kiyo Masui for 
     170  changing the license and allowing its inclusion here. 
     171 
     172* New acceleration mode for LZ4 and BloscLZ codecs that comes into 
     173  operation with complevel < 9.  This allows for an important boost in 
     174  speed with minimal compression ratio loss.  Francesc Alted. 
     175 
     176* LZ4 codec updated to 1.7.0 (r130). 
     177 
     178* PREFER_EXTERNAL_COMPLIBS cmake option has been removed and replaced 
     179  by the more fine grained PREFER_EXTERNAL_LZ4, PREFER_EXTERNAL_SNAPPY 
     180  and PREFER_EXTERNAL_ZLIB.  In order to allow the use of the new API 
     181  introduced in LZ4 1.7.0, PREFER_EXTERNAL_LZ4 has been set to OFF by 
     182  default, whereas PREFER_EXTERNAL_SNAPPY and PREFER_EXTERNAL_ZLIB 
     183  continue to be ON. 
     184 
     185* Implemented SSE2 shuffle support for buffers containing a number of 
     186  elements which is not a multiple of (typesize * vectorsize).  Jack 
     187  Pappas. 
     188 
     189* Added SSE2 shuffle/unshuffle routines for types larger than 16 
     190  bytes.  Jack Pappas. 
     191 
     192* 'test_basic' suite has been split in components for a much better 
     193  granularity on what's a possibly failing test.  Also, lots of new 
     194  tests have been added.  Jack Pappas. 
     195 
     196* Fixed compilation on non-Intel archs (tested on ARM).  Zbyszek 
     197  Szmek. 
     198 
     199* Modified cmake files in order to inform that AVX2 on Visual Studio 
     200  is supported only in 2013 update 2 and higher. 
     201 
     202* Added a replacement for stdbool.h for Visual Studio < 2013. 
     203 
     204* blosclz codec adds Win64/Intel as a platform supporting unaligned 
     205  addressing.  That leads to a speed-up of 2.2x in decompression. 
     206 
     207* New blosc_get_version_string() function for retrieving the version 
     208  of the c-blosc library.  Useful when linking with dynamic libraries 
     209  and one wants to know its version. 
     210 
     211* New example (win-dynamic-linking.c) that shows how to link a Blosc 
     212  DLL dynamically in run-time (Windows only). 
     213 
     214* The `context.threads_started` is initialized now when decompressing. 
     215  This could cause crashes in case you decompressed before compressing 
     216  (e.g. directly deserializing blosc buffers).  @atchouprakov. 
     217 
     218* The HDF5 filter has been removed from c-blosc and moved into its own 
     219  repo at: https://github.com/Blosc/hdf5 
     220 
     221* The MS Visual Studio 2008 has been tested with c-blosc for ensuring 
     222  compatibility with extensions for Python 2.6 and up. 
     223 
     224 
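The 1.7.0 notes above contrast the byte-level 'shuffle' filter with the new bit-level 'bitshuffle'. As a rough illustration of the byte-level idea only, the sketch below gathers byte j of every element into the j-th contiguous stream, which tends to place similar bytes next to each other before compression. The function names are hypothetical, and this scalar loop is not the optimized SSE2/AVX2 code that C-Blosc ships.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative byte-level shuffle (hypothetical name): for `nelems`
   elements of `typesize` bytes each, byte j of element i moves to
   position j * nelems + i in `dest`. */
static void example_byte_shuffle(size_t typesize, size_t nelems,
                                 const uint8_t *src, uint8_t *dest)
{
  size_t i, j;

  assert(src != NULL && dest != NULL && src != dest);
  for (i = 0; i < nelems; i++) {
    for (j = 0; j < typesize; j++) {
      dest[j * nelems + i] = src[i * typesize + j];
    }
  }
}

/* Inverse of example_byte_shuffle(): restores the original element layout. */
static void example_byte_unshuffle(size_t typesize, size_t nelems,
                                   const uint8_t *src, uint8_t *dest)
{
  size_t i, j;

  assert(src != NULL && dest != NULL && src != dest);
  for (i = 0; i < nelems; i++) {
    for (j = 0; j < typesize; j++) {
      dest[i * typesize + j] = src[j * nelems + i];
    }
  }
}
```

For two 4-byte elements with bytes 1..4 and 5..8, the shuffled buffer interleaves them as 1, 5, 2, 6, 3, 7, 4, 8. A bit-level shuffle applies the same transposition idea to individual bits rather than bytes.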
     225Changes from 1.6.0 to 1.6.1 
     226=========================== 
     227 
     228* Support for *runtime* detection of AVX2 and SSE2 SIMD instructions. 
     229  These changes make it possible to compile one single binary that 
     230  runs on a system that supports SSE2 or AVX2 (or neither), so the 
     231  redistribution problem is fixed (see #101).  Thanks to Julian Taylor 
     232  and Jack Pappas. 
     233 
     234* Added support for MinGW and TDM-GCC compilers for Windows.  Thanks 
     235  to yasushima-gd. 
     236 
     237* Fixed a bug in blosclz that could potentially overwrite an area 
     238  beyond the output buffer.  See #113. 
     239 
     240* New computation for blocksize so that larger typesizes (> 8 bytes) 
     241  would benefit from much better compression ratios.  Speed is not 
     242  penalized too much. 
     243 
     244* New parametrization of the hash table for blosclz codec.  This 
     245  allows better compression in many scenarios, while slightly 
     246  increasing the speed. 
     247 
     248 
     249Changes from 1.5.4 to 1.6.0 
     250=========================== 
     251 
     252* Support for AVX2 is here!  The benchmarks with a 4-core Intel 
     253  Haswell machine show that both compression and decompression are 
     254  accelerated by around 10%, reaching peaks of 9.6 GB/s during 
     255  compression and 26 GB/s during decompression (memcpy() speed for 
     256  this machine is 7.5 GB/s for writes and 11.7 GB/s for reads).  Many 
     257  thanks to @littlezhou for this nice work. 
     258 
     259* Support for HPET (high precision timers) for the `bench` program. 
     260  This is particularly important for microbenchmarks like bench is 
     261  doing; since they take so little time to run, the granularity of a 
     262  less-accurate timer may account for a significant portion of the 
     263  runtime of the benchmark itself, skewing the results.  Thanks to 
     264  Jack Pappas. 
     265 
     266 
     267Changes from 1.5.3 to 1.5.4 
     268=========================== 
     269 
     270* Updated to LZ4 1.6.0 (r128). 
     271 
     272* Fix resource leak in t_blosc.  Jack Pappas. 
     273 
     274* Better checks during testing.  Jack Pappas. 
     275 
     276* Dynamically loadable HDF5 filter plugin. Kiyo Masui. 
     277 
     278 
     279Changes from 1.5.2 to 1.5.3 
     280=========================== 
     281 
     282* Use llabs function (where available) instead of abs to avoid 
     283  truncating the result.  Jack Pappas. 
     284 
     285* Use C11 aligned_alloc when it's available.  Jack Pappas. 
     286 
     287* Use the built-in stdint.h with MSVC when available.  Jack Pappas. 
     288 
     289* Only define the __SSE2__ symbol when compiling with MS Visual C++ 
     290  and targeting x64 or x86 with the correct /arch flag set. This 
     291  avoids re-defining the symbol which makes other compilers issue 
     292  warnings.  Jack Pappas. 
     293 
     294* Reinitializing Blosc during a call to set_nthreads() so as to fix 
     295  problems with contexts.  Francesc Alted. 
     296 
     297 
     298 
     299Changes from 1.5.1 to 1.5.2 
     300=========================== 
     301 
     302* Using blosc_compress_ctx() / blosc_decompress_ctx() inside the HDF5 
     303  compressor for allowing operation in multiprocess scenarios.  See: 
     304  https://github.com/PyTables/PyTables/issues/412 
     305 
     306  The drawback of this quick fix is that the Blosc filter will be only 
     307  able to use a single thread until another solution can be devised. 
     308 
     309 
     310Changes from 1.5.0 to 1.5.1 
     311=========================== 
     312 
     313* Updated to LZ4 1.5.0.  Closes #74. 
     314 
     315* Added the 'const' qualifier to non SSE2 shuffle functions. Closes #75. 
     316 
     317* Explicitly call blosc_init() in HDF5 blosc_filter.c, fixing a 
     318  segfault. 
     319 
     320* Quite a few improvements in cmake files for HDF5 support.  Thanks to 
     321  Dana Robinson (The HDF Group). 
     322 
     323* Variable 'class' caused problems compiling the HDF5 filter with g++. 
     324  Thanks to Laurent Chapon. 
     325 
     326* Small improvements on docstrings of c-blosc main functions. 
     327 
     328 
     329Changes from 1.4.1 to 1.5.0 
     330=========================== 
     331 
     332* Added new calls for allowing Blosc to be used *simultaneously* 
     333  (i.e. lock free) from multi-threaded environments.  The new 
     334  functions are: 
     335 
     336  - blosc_compress_ctx(...) 
     337  - blosc_decompress_ctx(...) 
     338 
     339  See the new docstrings in blosc.h for how to use them.  The previous 
     340  API should be completely unaffected.  Thanks to Christopher Speller. 
     341 
     342* Optimized copies during BloscLZ decompression.  This can make BloscLZ 
     343  decompress up to 1.5x faster in some situations. 
     344 
     345* LZ4 and LZ4HC compressors updated to version 1.3.1. 
     346 
     347* Added an examples directory on how to link apps with Blosc. 
     348 
     349* stdlib.h moved from blosc.c to blosc.h as suggested by Rob Lathm. 
     350 
     351* Fix a warning for {snappy,lz4}-free compilation.  Thanks to Andrew Schaaf. 
     352 
     353* Several improvements for CMakeLists.txt (cmake). 
     354 
     355* Fixing C99 compatibility warnings.  Thanks to Christopher Speller. 
     356 
     357 
     358Changes from 1.4.0 to 1.4.1 
     359=========================== 
     360 
     361* Fixed a bug in blosc_getitem() introduced in 1.4.0.  Added a test for 
     362  blosc_getitem() as well. 
     363 
     364 
     365Changes from 1.3.6 to 1.4.0 
     366=========================== 
     367 
     368* Support for non-Intel and non-SSE2 architectures has been added.  In 
     369  particular, the Raspberry Pi platform (ARM) has been tested and all 
     370  tests pass here. 
     371 
     372* Architectures requiring strict access alignment are supported as well. 
     373  Due to this, architectures with a high penalty in accessing unaligned 
     374  data (e.g. Raspberry Pi, ARMv6) can compress up to 2.5x faster. 
     375 
     376* LZ4 has been updated to r119 (1.2.0) so as to fix a possible security 
     377  breach. 
     378 
     379 
     380Changes from 1.3.5 to 1.3.6 
     381=========================== 
     382 
     383* Updated to LZ4 r118 due to a (highly unlikely) security hole.  For 
     384  details see: 
     385 
     386  http://fastcompression.blogspot.fr/2014/06/debunking-lz4-20-years-old-bug-myth.html 
     387 
     388 
     389Changes from 1.3.4 to 1.3.5 
     390=========================== 
     391 
     392* Removed a 'pointer from integer without a cast' compiler 
     393  warning due to a bad macro definition. 
     394 
     395 
     396Changes from 1.3.3 to 1.3.4 
     397=========================== 
     398 
     399* Fixed a false buffer overrun condition.  This bug made c-blosc 
     400  fail, even if the failure was not real. 
     401 
     402* Fixed the type of a buffer string. 
     403 
     404 
     405Changes from 1.3.2 to 1.3.3 
     406=========================== 
     407 
     408* Updated to LZ4 1.1.3 (improved speed for 32-bit platforms). 
     409 
     410* Added a new `blosc_cbuffer_complib()` for getting the compression 
     411  library for a compressed buffer. 
     412 
     413 
     414Changes from 1.3.1 to 1.3.2 
     415=========================== 
     416 
     417* Fix for compiling Snappy sources against MSVC 2008.  Thanks to Mark 
     418  Wiebe! 
     419 
     420* Version for internal LZ4 and Snappy are now supported.  When compiled 
     421  against the external libraries, this info is not available because 
     422  they do not support the symbols (yet). 
     423 
     424 
     425Changes from 1.3.0 to 1.3.1 
     426=========================== 
     427 
     428* Fixes for a series of issues with the filter for HDF5 and, in 
     429  particular, a problem in the decompression buffer size that made it 
     430  impossible to use the blosc_filter in combination with other ones 
     431  (e.g. fletcher32).  See 
     432  https://github.com/PyTables/PyTables/issues/21. 
     433 
     434  Thanks to Antonio Valentino for the fix! 
     435 
     436 
     437Changes from 1.2.4 to 1.3.0 
     438=========================== 
     439 
     440A nice handful of compressors have been added to Blosc: 
     441 
     442* LZ4 (http://code.google.com/p/lz4/): A very fast 
     443  compressor/decompressor.  Could be thought of as a replacement for the 
     444  original BloscLZ, but it can behave better in some scenarios. 
     445 
     446* LZ4HC (http://code.google.com/p/lz4/): This is a variation of LZ4 
     447  that achieves much better compression ratio at the cost of being 
     448  much slower for compressing.  Decompression speed is unaffected (and 
     449  sometimes better than when using LZ4 itself!), so this is very good 
     450  for read-only datasets. 
     451 
     452* Snappy (http://code.google.com/p/snappy/): A very fast 
     453  compressor/decompressor.  Could be thought of as a replacement for the 
     454  original BloscLZ, but it can behave better in some scenarios. 
     455 
     456* Zlib (http://www.zlib.net/): This is a classic.  It achieves very 
     457  good compression ratios, at the cost of speed.  However, 
     458  decompression speed is still pretty good, so it is a good candidate 
     459  for read-only datasets. 
     460 
     461With this, you can select the compression library with the new 
     462function:: 
     463 
     464  int blosc_set_complib(char* complib); 
     465 
     466where you pass the library that you want to use (currently "blosclz", 
     467"lz4", "lz4hc", "snappy" and "zlib", but the list can grow in the 
     468future). 
     469 
     470You can get more info about compressors support in your Blosc build by 
     471using these functions:: 
     472 
     473  char* blosc_list_compressors(void); 
     474  int blosc_get_complib_info(char *compressor, char **complib, char **version); 
    8475 
    9476 
     
    245712  necessary on Mac because 16 bytes alignment is ensured by default. 
    246713  Thanks to Ivan Vilata.  Fixes #3. 
    247  
    248  
    249  
    250  
    251 .. Local Variables: 
    252 .. mode: rst 
    253 .. coding: utf-8 
    254 .. fill-column: 72 
    255 .. End: 
  • thirdparty/blosc/RELEASING.rst

    r00587dc r981e22c  
    44 
    55:Author: Francesc Alted 
    6 :Contact: f[email protected] 
    7 :Date: 2012-09-16 
     6:Contact: f[email protected] 
     7:Date: 2014-01-15 
    88 
    99 
     
     1616- Check that *VERSION* symbols in blosc/blosc.h contain the correct info. 
    1717 
     18- Commit the changes:: 
     19 
     20    $ git commit -a -m"Getting ready for X.Y.Z release" 
     21 
     22 
    1823Testing 
    1924------- 
    2025 
    21 Go to the test/ directory and issue:: 
     26Create a new build/ directory, change into it and issue:: 
    2227 
    23   $ make test 
     28  $ cmake .. 
     29  $ cmake --build . 
     30  $ ctest 
    2431 
    25 These tests are very basic, and only valid for platforms where GNU 
    26 make/gcc tools are available.  To actually test Blosc the hard way, 
    27 look at the end of: 
     32To actually test Blosc the hard way, look at the end of: 
    2833 
    29 http://blosc.org/trac/wiki/SyntheticBenchmarks 
     34http://blosc.org/synthetic-benchmarks.html 
    3035 
    3136where instructions on how to intensively test (and benchmark) Blosc 
    3237are given. 
    33  
    34 Packaging 
    35 --------- 
    36  
    37 - Unpack the archive of the repository in a temporary directory:: 
    38  
    39   $ export VERSION="the version number" 
    40   $ mkdir /tmp/blosc-$VERSION 
    41   # IMPORTANT: make sure that you are at the root of the repo now! 
    42   $ git archive master | tar -x -C /tmp/blosc-$VERSION 
    43  
    44 - And package the repo:: 
    45  
    46   $ cd /tmp 
    47   $ tar cvfz blosc-$VERSION.tar.gz blosc-$VERSION 
    48  
    49 Do a quick check that the tarball is sane. 
    50  
    51  
    52 Uploading 
    53 --------- 
    54  
    55 - Go to the downloads section in blosc.org and upload the source 
    56   tarball. 
    5738 
    5839 
     
    7253---------- 
    7354 
    74 - Update the release notes in the github wiki: 
    75  
    76 https://github.com/FrancescAlted/blosc/wiki/Release-notes 
    77  
    78 - Send an announcement to the blosc, pytables, carray and 
     55- Send an announcement to the blosc, pytables-dev, bcolz and 
    7956  comp.compression lists.  Use the ``ANNOUNCE.rst`` file as skeleton 
    8057  (possibly as the definitive version). 
     58 
    8159 
    8260Post-release actions 
     
    8765 
    8866- Create new headers for adding new features in ``RELEASE_NOTES.rst`` 
    89   and empty the release-specific information in ``ANNOUNCE.rst`` and 
    90   add this place-holder instead: 
     67  and add this place-holder instead: 
    9168 
    9269  #XXX version-specific blurb XXX# 
     70 
     71- Commit the changes:: 
     72 
     73    $ git commit -a -m"Post X.Y.Z release actions done" 
     74    $ git push 
    9375 
    9476 
  • thirdparty/blosc/blosc.c

    r00587dc r981e22c  
    11/********************************************************************* 
    2   Blosc - Blocked Suffling and Compression Library 
    3  
    4   Author: Francesc Alted <f[email protected]> 
     2  Blosc - Blocked Shuffling and Compression Library 
     3 
     4  Author: Francesc Alted <f[email protected]> 
    55  Creation date: 2009-05-20 
    66 
     
    99 
    1010 
     11#include <stdio.h> 
    1112#include <stdlib.h> 
    12 #include <stdio.h> 
     13#include <errno.h> 
    1314#include <string.h> 
    1415#include <sys/types.h> 
    1516#include <sys/stat.h> 
    1617#include <assert.h> 
     18#if defined(USING_CMAKE) 
     19  #include "config.h" 
     20#endif /*  USING_CMAKE */ 
    1721#include "blosc.h" 
     22#include "shuffle.h" 
    1823#include "blosclz.h" 
    19 #include "shuffle.h" 
     24#if defined(HAVE_LZ4) 
     25  #include "lz4.h" 
     26  #include "lz4hc.h" 
     27#endif /*  HAVE_LZ4 */ 
     28#if defined(HAVE_SNAPPY) 
     29  #include "snappy-c.h" 
     30#endif /*  HAVE_SNAPPY */ 
     31#if defined(HAVE_ZLIB) 
     32  #include "zlib.h" 
     33#endif /*  HAVE_ZLIB */ 
     34#if defined(HAVE_ZSTD) 
     35  #include "zstd.h" 
     36#endif /*  HAVE_ZSTD */ 
    2037 
    2138#if defined(_WIN32) && !defined(__MINGW32__) 
    2239  #include <windows.h> 
    23   #include "win32/stdint-windows.h" 
     40  #include <malloc.h> 
     41 
     42  /* stdint.h only available in VS2010 (VC++ 16.0) and newer */ 
     43  #if defined(_MSC_VER) && _MSC_VER < 1600 
     44    #include "win32/stdint-windows.h" 
     45  #else 
     46    #include <stdint.h> 
     47  #endif 
     48 
    2449  #include <process.h> 
    2550  #define getpid _getpid 
     
    3055#endif  /* _WIN32 */ 
    3156 
    32 #if defined(_WIN32) 
     57#if defined(_WIN32) && !defined(__GNUC__) 
    3358  #include "win32/pthread.h" 
    3459  #include "win32/pthread.c" 
     
    3762#endif 
    3863 
      64/* If C11 is supported, use its built-in aligned allocation. */ 
     65#if __STDC_VERSION__ >= 201112L 
     66  #include <stdalign.h> 
     67#endif 
     68 
    3969 
    4070/* Some useful units */ 
     
    5080/* The size of L1 cache.  32 KB is quite common nowadays. */ 
    5181#define L1 (32*KB) 
    52  
    53 /* Wrapped function to adjust the number of threads used by blosc */ 
    54 int blosc_set_nthreads_(int); 
    55  
    56 /* Global variables for main logic */ 
    57 static int32_t init_temps_done = 0;    /* temp for compr/decompr initialized? */ 
    58 static int32_t force_blocksize = 0;    /* force the use of a blocksize? */ 
    59 static int pid = 0;                    /* the PID for this process */ 
    60 static int init_lib = 0;               /* is library initalized? */ 
    61  
    62 /* Global variables for threads */ 
    63 static int32_t nthreads = 1;            /* number of desired threads in pool */ 
    64 static int32_t init_threads_done = 0;   /* pool of threads initialized? */ 
    65 static int32_t end_threads = 0;         /* should exisiting threads end? */ 
    66 static int32_t init_sentinels_done = 0; /* sentinels initialized? */ 
    67 static int32_t giveup_code;             /* error code when give up */ 
    68 static int32_t nblock;                  /* block counter */ 
    69 static pthread_t threads[BLOSC_MAX_THREADS];  /* opaque structure for threads */ 
    70 static int32_t tids[BLOSC_MAX_THREADS];       /* ID per each thread */ 
    71 #if !defined(_WIN32) 
    72 static pthread_attr_t ct_attr;          /* creation time attrs for threads */ 
    73 #endif 
    7482 
    7583/* Have problems using posix barriers when symbol value is 200112L */ 
     
    7886#define _POSIX_BARRIERS_MINE 
    7987#endif 
    80  
    8188/* Synchronization variables */ 
    82 static pthread_mutex_t count_mutex; 
     89 
     90 
     91struct blosc_context { 
     92  int32_t compress;               /* 1 if we are doing compression 0 if decompress */ 
     93 
     94  const uint8_t* src; 
     95  uint8_t* dest;                  /* The current pos in the destination buffer */ 
     96  uint8_t* header_flags;          /* Flags for header.  Currently booked: 
     97                                    - 0: byte-shuffled? 
     98                                    - 1: memcpy'ed? 
     99                                    - 2: bit-shuffled? */ 
     100  int32_t sourcesize;             /* Number of bytes in source buffer (or uncompressed bytes in compressed file) */ 
     101  int32_t nblocks;                /* Number of total blocks in buffer */ 
     102  int32_t leftover;               /* Extra bytes at end of buffer */ 
     103  int32_t blocksize;              /* Length of the block in bytes */ 
     104  int32_t typesize;               /* Type size */ 
     105  int32_t num_output_bytes;       /* Counter for the number of output bytes */ 
     106  int32_t destsize;               /* Maximum size for destination buffer */ 
     107  uint8_t* bstarts;               /* Start of the buffer past header info */ 
     108  int32_t compcode;               /* Compressor code to use */ 
     109  int clevel;                     /* Compression level (1-9) */ 
     110 
     111  /* Threading */ 
     112  int32_t numthreads; 
     113  int32_t threads_started; 
     114  int32_t end_threads; 
     115  pthread_t threads[BLOSC_MAX_THREADS]; 
     116  int32_t tids[BLOSC_MAX_THREADS]; 
     117  pthread_mutex_t count_mutex; 
     118  #ifdef _POSIX_BARRIERS_MINE 
     119  pthread_barrier_t barr_init; 
     120  pthread_barrier_t barr_finish; 
     121  #else 
     122  int32_t count_threads; 
     123  pthread_mutex_t count_threads_mutex; 
     124  pthread_cond_t count_threads_cv; 
     125  #endif 
     126  #if !defined(_WIN32) 
     127  pthread_attr_t ct_attr;            /* creation time attrs for threads */ 
     128  #endif 
     129  int32_t thread_giveup_code;               /* error code when give up */ 
     130  int32_t thread_nblock;                    /* block counter */ 
     131}; 
     132 
     133struct thread_context { 
     134  struct blosc_context* parent_context; 
     135  int32_t tid; 
     136  uint8_t* tmp; 
     137  uint8_t* tmp2; 
     138  uint8_t* tmp3; 
     139  int32_t tmpblocksize; /* Used to keep track of how big the temporary buffers are */ 
     140}; 
     141 
     142/* Global context for non-contextual API */ 
     143static struct blosc_context* g_global_context; 
    83144static pthread_mutex_t global_comp_mutex; 
    84 #ifdef _POSIX_BARRIERS_MINE 
    85 static pthread_barrier_t barr_init; 
    86 static pthread_barrier_t barr_finish; 
    87 #else 
    88 static int32_t count_threads; 
    89 static pthread_mutex_t count_threads_mutex; 
    90 static pthread_cond_t count_threads_cv; 
    91 #endif 
    92  
    93  
    94 /* Structure for parameters in (de-)compression threads */ 
    95 static struct thread_data { 
    96   int32_t typesize; 
    97   int32_t blocksize; 
    98   int32_t compress; 
    99   int32_t clevel; 
    100   int32_t flags; 
    101   int32_t memcpyed; 
    102   int32_t ntbytes; 
    103   int32_t nbytes; 
    104   int32_t maxbytes; 
    105   int32_t nblocks; 
    106   int32_t leftover; 
    107   int32_t *bstarts;             /* start pointers for each block */ 
    108   uint8_t *src; 
    109   uint8_t *dest; 
    110   uint8_t *tmp[BLOSC_MAX_THREADS]; 
    111   uint8_t *tmp2[BLOSC_MAX_THREADS]; 
    112 } params; 
    113  
    114  
    115 /* Structure for parameters meant for keeping track of current temporaries */ 
    116 static struct temp_data { 
    117   int32_t nthreads; 
    118   int32_t typesize; 
    119   int32_t blocksize; 
    120 } current_temp; 
    121  
     145static int32_t g_compressor = BLOSC_BLOSCLZ;  /* the compressor to use by default */ 
     146static int32_t g_threads = 1; 
     147static int32_t g_force_blocksize = 0; 
     148static int32_t g_initlib = 0; 
     149 
     150 
     151 
     152/* Wrapped function to adjust the number of threads used by blosc */ 
     153int blosc_set_nthreads_(struct blosc_context*); 
     154 
     155/* Releases the global threadpool */ 
     156int blosc_release_threadpool(struct blosc_context* context); 
    122157 
    123158/* Macros for synchronization */ 
     
    125160/* Wait until all threads are initialized */ 
    126161#ifdef _POSIX_BARRIERS_MINE 
    127 static int rc; 
    128 #define WAIT_INIT \ 
    129   rc = pthread_barrier_wait(&barr_init); \ 
     162#define WAIT_INIT(RET_VAL, CONTEXT_PTR)  \ 
     163  rc = pthread_barrier_wait(&CONTEXT_PTR->barr_init); \ 
    130164  if (rc != 0 && rc != PTHREAD_BARRIER_SERIAL_THREAD) { \ 
    131     printf("Could not wait on barrier (init)\n"); \ 
    132     return(-1); \ 
     165    printf("Could not wait on barrier (init): %d\n", rc); \ 
     166    return((RET_VAL));                            \ 
    133167  } 
    134168#else 
    135 #define WAIT_INIT \ 
    136   pthread_mutex_lock(&count_threads_mutex); \ 
    137   if (count_threads < nthreads) { \ 
    138     count_threads++; \ 
    139     pthread_cond_wait(&count_threads_cv, &count_threads_mutex); \ 
     169#define WAIT_INIT(RET_VAL, CONTEXT_PTR)  \ 
     170  pthread_mutex_lock(&CONTEXT_PTR->count_threads_mutex); \ 
     171  if (CONTEXT_PTR->count_threads < CONTEXT_PTR->numthreads) { \ 
     172    CONTEXT_PTR->count_threads++; \ 
     173    pthread_cond_wait(&CONTEXT_PTR->count_threads_cv, &CONTEXT_PTR->count_threads_mutex); \ 
    140174  } \ 
    141175  else { \ 
    142     pthread_cond_broadcast(&count_threads_cv); \ 
     176    pthread_cond_broadcast(&CONTEXT_PTR->count_threads_cv); \ 
    143177  } \ 
    144   pthread_mutex_unlock(&count_threads_mutex); 
     178  pthread_mutex_unlock(&CONTEXT_PTR->count_threads_mutex); 
    145179#endif 
    146180 
    147181/* Wait for all threads to finish */ 
    148182#ifdef _POSIX_BARRIERS_MINE 
    149 #define WAIT_FINISH \ 
    150   rc = pthread_barrier_wait(&barr_finish); \ 
     183#define WAIT_FINISH(RET_VAL, CONTEXT_PTR)  \ 
     184  rc = pthread_barrier_wait(&CONTEXT_PTR->barr_finish); \ 
    151185  if (rc != 0 && rc != PTHREAD_BARRIER_SERIAL_THREAD) { \ 
    152186    printf("Could not wait on barrier (finish)\n"); \ 
    153     return(-1);                                       \ 
     187    return((RET_VAL));                              \ 
    154188  } 
    155189#else 
    156 #define WAIT_FINISH \ 
    157   pthread_mutex_lock(&count_threads_mutex); \ 
    158   if (count_threads > 0) { \ 
    159     count_threads--; \ 
    160     pthread_cond_wait(&count_threads_cv, &count_threads_mutex); \ 
     190#define WAIT_FINISH(RET_VAL, CONTEXT_PTR)                          \ 
     191  pthread_mutex_lock(&CONTEXT_PTR->count_threads_mutex); \ 
     192  if (CONTEXT_PTR->count_threads > 0) { \ 
     193    CONTEXT_PTR->count_threads--; \ 
     194    pthread_cond_wait(&CONTEXT_PTR->count_threads_cv, &CONTEXT_PTR->count_threads_mutex); \ 
    161195  } \ 
    162196  else { \ 
    163     pthread_cond_broadcast(&count_threads_cv); \ 
     197    pthread_cond_broadcast(&CONTEXT_PTR->count_threads_cv); \ 
    164198  } \ 
    165   pthread_mutex_unlock(&count_threads_mutex); 
     199  pthread_mutex_unlock(&CONTEXT_PTR->count_threads_mutex); 
    166200#endif 
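
Note on the non-barrier fallback above: it guards pthread_cond_wait with a plain `if`, and POSIX allows condition waits to wake spuriously. A common alternative is to loop on a predicate. The following is a minimal sketch of that pattern, not the code in this changeset; `sketch_barrier` and its fields are hypothetical names introduced only for illustration.

```c
#include <pthread.h>

/* Hypothetical reusable barrier built from a mutex and a condition variable. */
struct sketch_barrier {
  pthread_mutex_t mutex;
  pthread_cond_t cv;
  int nthreads;         /* participants per round */
  int waiting;          /* threads that have arrived in the current round */
  unsigned generation;  /* incremented each time a round completes */
};

/* Returns 0 on success, -1 on failure. */
static int sketch_barrier_init(struct sketch_barrier *b, int nthreads)
{
  if (nthreads <= 0)
    return -1;
  if (pthread_mutex_init(&b->mutex, NULL) != 0)
    return -1;
  if (pthread_cond_init(&b->cv, NULL) != 0) {
    pthread_mutex_destroy(&b->mutex);
    return -1;
  }
  b->nthreads = nthreads;
  b->waiting = 0;
  b->generation = 0;
  return 0;
}

/* Blocks until `nthreads` callers have arrived.
   Returns 1 in the last thread to arrive and 0 in all others. */
static int sketch_barrier_wait(struct sketch_barrier *b)
{
  int last = 0;
  pthread_mutex_lock(&b->mutex);
  if (++b->waiting == b->nthreads) {
    /* Last arrival: start a new round and release everyone. */
    b->waiting = 0;
    b->generation++;
    pthread_cond_broadcast(&b->cv);
    last = 1;
  }
  else {
    /* Loop on the generation so a spurious wakeup cannot release us early. */
    unsigned gen = b->generation;
    while (gen == b->generation)
      pthread_cond_wait(&b->cv, &b->mutex);
  }
  pthread_mutex_unlock(&b->mutex);
  return last;
}
```

The generation counter is what lets the same barrier be reused for both the "init" and "finish" synchronization points without a late waker from one round being confused with the next.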
    167201 
     
    173207  int res = 0; 
    174208 
    175 #if defined(_WIN32) 
     209/* Do an alignment to 32 bytes because AVX2 is supported */ 
     210#if _ISOC11_SOURCE 
     211  /* C11 aligned allocation. 'size' must be a multiple of the alignment. */ 
     212  block = aligned_alloc(32, size); 
     213#elif defined(_WIN32) 
    176214  /* A (void *) cast needed for avoiding a warning with MINGW :-/ */ 
    177   block = (void *)_aligned_malloc(size, 16); 
     215  block = (void *)_aligned_malloc(size, 32); 
    178216#elif defined __APPLE__ 
    179217  /* Mac OS X guarantees 16-byte alignment in small allocs */ 
     
    181219#elif _POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600 
    182220  /* Platform does have an implementation of posix_memalign */ 
    183   res = posix_memalign(&block, 16, size); 
     221  res = posix_memalign(&block, 32, size); 
    184222#else 
    185223  block = malloc(size); 
     
    206244 
    207245 
    208 /* If `a` is little-endian, return it as-is.  If not, return a copy, 
    209    with the endianness changed */ 
    210 static int32_t sw32(int32_t a) 
    211 { 
    212   int32_t tmp; 
    213   char *pa = (char *)&a; 
    214   char *ptmp = (char *)&tmp; 
     246/* Copy 4 bytes from `*pa` to int32_t, changing endianness if necessary. */ 
     247static int32_t sw32_(const uint8_t *pa) 
     248{ 
     249  int32_t idest; 
     250  uint8_t *dest = (uint8_t *)&idest; 
    215251  int i = 1;                    /* for big/little endian detection */ 
    216252  char *p = (char *)&i; 
     
    218254  if (p[0] != 1) { 
    219255    /* big endian */ 
    220     ptmp[0] = pa[3]; 
    221     ptmp[1] = pa[2]; 
    222     ptmp[2] = pa[1]; 
    223     ptmp[3] = pa[0]; 
    224     return tmp; 
     256    dest[0] = pa[3]; 
     257    dest[1] = pa[2]; 
     258    dest[2] = pa[1]; 
     259    dest[3] = pa[0]; 
    225260  } 
    226261  else { 
    227262    /* little endian */ 
    228     return a; 
    229   } 
    230 } 
    231  
     263    dest[0] = pa[0]; 
     264    dest[1] = pa[1]; 
     265    dest[2] = pa[2]; 
     266    dest[3] = pa[3]; 
     267  } 
     268  return idest; 
     269} 
     270 
     271 
     272/* Copy the 4 bytes of `a` to `*dest`, changing endianness if necessary. */ 
     273static void _sw32(uint8_t* dest, int32_t a) 
     274{ 
     275  uint8_t *pa = (uint8_t *)&a; 
     276  int i = 1;                    /* for big/little endian detection */ 
     277  char *p = (char *)&i; 
     278 
     279  if (p[0] != 1) { 
     280    /* big endian */ 
     281    dest[0] = pa[3]; 
     282    dest[1] = pa[2]; 
     283    dest[2] = pa[1]; 
     284    dest[3] = pa[0]; 
     285  } 
     286  else { 
     287    /* little endian */ 
     288    dest[0] = pa[0]; 
     289    dest[1] = pa[1]; 
     290    dest[2] = pa[2]; 
     291    dest[3] = pa[3]; 
     292  } 
     293} 
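
For comparison, the same little-endian store/load can be written with shifts, which avoids the runtime endianness probe. This is only a sketch of an alternative technique, not what the changeset does; `store_le32` and `load_le32` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: write `v` to dest[0..3] in little-endian order,
   regardless of the host byte order. */
static void store_le32(uint8_t *dest, int32_t v)
{
  uint32_t u = (uint32_t)v;
  dest[0] = (uint8_t)(u & 0xff);
  dest[1] = (uint8_t)((u >> 8) & 0xff);
  dest[2] = (uint8_t)((u >> 16) & 0xff);
  dest[3] = (uint8_t)((u >> 24) & 0xff);
}

/* Hypothetical helper: read a little-endian 32-bit value from src[0..3].
   The final conversion to int32_t assumes the usual two's complement
   behaviour for values above INT32_MAX. */
static int32_t load_le32(const uint8_t *src)
{
  uint32_t u = (uint32_t)src[0]
             | ((uint32_t)src[1] << 8)
             | ((uint32_t)src[2] << 16)
             | ((uint32_t)src[3] << 24);
  return (int32_t)u;
}
```

Like `_sw32` and `sw32_`, these take byte pointers, so they also work on unaligned addresses such as `dest - 4` or `bstarts + j * 4`.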
     294 
     295 
     296/* 
     297 * Conversion routines between compressor and compression libraries 
     298 */ 
     299 
     300/* Return the library code associated with the compressor name */ 
     301static int compname_to_clibcode(const char *compname) 
     302{ 
     303  if (strcmp(compname, BLOSC_BLOSCLZ_COMPNAME) == 0) 
     304    return BLOSC_BLOSCLZ_LIB; 
     305  if (strcmp(compname, BLOSC_LZ4_COMPNAME) == 0) 
     306    return BLOSC_LZ4_LIB; 
     307  if (strcmp(compname, BLOSC_LZ4HC_COMPNAME) == 0) 
     308    return BLOSC_LZ4_LIB; 
     309  if (strcmp(compname, BLOSC_SNAPPY_COMPNAME) == 0) 
     310    return BLOSC_SNAPPY_LIB; 
     311  if (strcmp(compname, BLOSC_ZLIB_COMPNAME) == 0) 
     312    return BLOSC_ZLIB_LIB; 
     313  if (strcmp(compname, BLOSC_ZSTD_COMPNAME) == 0) 
     314    return BLOSC_ZSTD_LIB; 
     315  return -1; 
     316} 
     317 
     318/* Return the library name associated with the compressor code */ 
     319static char *clibcode_to_clibname(int clibcode) 
     320{ 
     321  if (clibcode == BLOSC_BLOSCLZ_LIB) return BLOSC_BLOSCLZ_LIBNAME; 
     322  if (clibcode == BLOSC_LZ4_LIB) return BLOSC_LZ4_LIBNAME; 
     323  if (clibcode == BLOSC_SNAPPY_LIB) return BLOSC_SNAPPY_LIBNAME; 
     324  if (clibcode == BLOSC_ZLIB_LIB) return BLOSC_ZLIB_LIBNAME; 
     325  if (clibcode == BLOSC_ZSTD_LIB) return BLOSC_ZSTD_LIBNAME; 
     326  return NULL;                  /* should never happen */ 
     327} 
     328 
     329 
     330/* 
     331 * Conversion routines between compressor names and compressor codes 
     332 */ 
     333 
     334/* Get the compressor name associated with the compressor code */ 
     335int blosc_compcode_to_compname(int compcode, char **compname) 
     336{ 
     337  int code = -1;    /* -1 means non-existent compressor code */ 
     338  char *name = NULL; 
     339 
     340  /* Map the compressor code */ 
     341  if (compcode == BLOSC_BLOSCLZ) 
     342    name = BLOSC_BLOSCLZ_COMPNAME; 
     343  else if (compcode == BLOSC_LZ4) 
     344    name = BLOSC_LZ4_COMPNAME; 
     345  else if (compcode == BLOSC_LZ4HC) 
     346    name = BLOSC_LZ4HC_COMPNAME; 
     347  else if (compcode == BLOSC_SNAPPY) 
     348    name = BLOSC_SNAPPY_COMPNAME; 
     349  else if (compcode == BLOSC_ZLIB) 
     350    name = BLOSC_ZLIB_COMPNAME; 
     351  else if (compcode == BLOSC_ZSTD) 
     352    name = BLOSC_ZSTD_COMPNAME; 
     353 
     354  *compname = name; 
     355 
     356  /* Guess if there is support for this code */ 
     357  if (compcode == BLOSC_BLOSCLZ) 
     358    code = BLOSC_BLOSCLZ; 
     359#if defined(HAVE_LZ4) 
     360  else if (compcode == BLOSC_LZ4) 
     361    code = BLOSC_LZ4; 
     362  else if (compcode == BLOSC_LZ4HC) 
     363    code = BLOSC_LZ4HC; 
     364#endif /*  HAVE_LZ4 */ 
     365#if defined(HAVE_SNAPPY) 
     366  else if (compcode == BLOSC_SNAPPY) 
     367    code = BLOSC_SNAPPY; 
     368#endif /*  HAVE_SNAPPY */ 
     369#if defined(HAVE_ZLIB) 
     370  else if (compcode == BLOSC_ZLIB) 
     371    code = BLOSC_ZLIB; 
     372#endif /*  HAVE_ZLIB */ 
     373#if defined(HAVE_ZSTD) 
     374  else if (compcode == BLOSC_ZSTD) 
     375    code = BLOSC_ZSTD; 
     376#endif /*  HAVE_ZSTD */ 
     377 
     378  return code; 
     379} 
     380 
     381/* Get the compressor code for the compressor name. -1 if it is not available */ 
     382int blosc_compname_to_compcode(const char *compname) 
     383{ 
     384  int code = -1;  /* -1 means non-existent compressor code */ 
     385 
     386  if (strcmp(compname, BLOSC_BLOSCLZ_COMPNAME) == 0) { 
     387    code = BLOSC_BLOSCLZ; 
     388  } 
     389#if defined(HAVE_LZ4) 
     390  else if (strcmp(compname, BLOSC_LZ4_COMPNAME) == 0) { 
     391    code = BLOSC_LZ4; 
     392  } 
     393  else if (strcmp(compname, BLOSC_LZ4HC_COMPNAME) == 0) { 
     394    code = BLOSC_LZ4HC; 
     395  } 
     396#endif /*  HAVE_LZ4 */ 
     397#if defined(HAVE_SNAPPY) 
     398  else if (strcmp(compname, BLOSC_SNAPPY_COMPNAME) == 0) { 
     399    code = BLOSC_SNAPPY; 
     400  } 
     401#endif /*  HAVE_SNAPPY */ 
     402#if defined(HAVE_ZLIB) 
     403  else if (strcmp(compname, BLOSC_ZLIB_COMPNAME) == 0) { 
     404    code = BLOSC_ZLIB; 
     405  } 
     406#endif /*  HAVE_ZLIB */ 
     407#if defined(HAVE_ZSTD) 
     408  else if (strcmp(compname, BLOSC_ZSTD_COMPNAME) == 0) { 
     409    code = BLOSC_ZSTD; 
     410  } 
     411#endif /*  HAVE_ZSTD */ 
     412 
     413  return code; 
     414} 
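
The two conversion routines above repeat the same if/else chain. A table-driven variant is sketched below under stated assumptions: the codes, names and `available` flags are hypothetical stand-ins for the BLOSC_* constants and HAVE_* build switches, and this is not how the changeset implements it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compressor codes, for illustration only. */
enum { SKETCH_BLOSCLZ = 0, SKETCH_LZ4 = 1, SKETCH_ZLIB = 3 };

struct sketch_comp {
  const char *name;
  int code;
  int available;   /* stands in for a compile-time HAVE_* check */
};

static const struct sketch_comp sketch_table[] = {
  { "blosclz", SKETCH_BLOSCLZ, 1 },
  { "lz4",     SKETCH_LZ4,     0 },   /* pretend LZ4 was not compiled in */
  { "zlib",    SKETCH_ZLIB,    1 },
};

#define SKETCH_NCOMP (sizeof(sketch_table) / sizeof(sketch_table[0]))

/* Return the code for `name`, or -1 if it is unknown or not available. */
static int sketch_name_to_code(const char *name)
{
  size_t i;
  for (i = 0; i < SKETCH_NCOMP; i++) {
    if (strcmp(name, sketch_table[i].name) == 0)
      return sketch_table[i].available ? sketch_table[i].code : -1;
  }
  return -1;
}

/* Set *name for any known code (NULL otherwise); return the code only
   when support is available, else -1. */
static int sketch_code_to_name(int code, const char **name)
{
  size_t i;
  *name = NULL;
  for (i = 0; i < SKETCH_NCOMP; i++) {
    if (sketch_table[i].code == code) {
      *name = sketch_table[i].name;
      return sketch_table[i].available ? code : -1;
    }
  }
  return -1;
}
```

As in `blosc_compcode_to_compname`, a known but unsupported code still yields its name, so the caller can print a useful "not compiled with ..." message.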
     415 
     416 
     417#if defined(HAVE_LZ4) 
     418static int lz4_wrap_compress(const char* input, size_t input_length, 
     419                             char* output, size_t maxout, int accel) 
     420{ 
     421  int cbytes; 
     422  cbytes = LZ4_compress_fast(input, output, (int)input_length, (int)maxout, 
     423                             accel); 
     424  return cbytes; 
     425} 
     426 
     427static int lz4hc_wrap_compress(const char* input, size_t input_length, 
     428                               char* output, size_t maxout, int clevel) 
     429{ 
     430  int cbytes; 
     431  if (input_length > ((size_t)2 << 30)) 
     432    return -1;   /* input larger than 2 GB is not supported */ 
     433  /* clevel for lz4hc goes up to 16, at least in LZ4 1.1.3 */ 
     434  cbytes = LZ4_compressHC2_limitedOutput(input, output, (int)input_length, 
     435                                         (int)maxout, clevel*2-1); 
     436  return cbytes; 
     437} 
     438 
     439static int lz4_wrap_decompress(const char* input, size_t compressed_length, 
     440                               char* output, size_t maxout) 
     441{ 
     442  size_t cbytes; 
     443  cbytes = LZ4_decompress_fast(input, output, (int)maxout); 
     444  if (cbytes != compressed_length) { 
     445    return 0; 
     446  } 
     447  return (int)maxout; 
     448} 
     449 
     450#endif /* HAVE_LZ4 */ 
     451 
     452#if defined(HAVE_SNAPPY) 
     453static int snappy_wrap_compress(const char* input, size_t input_length, 
     454                                char* output, size_t maxout) 
     455{ 
     456  snappy_status status; 
     457  size_t cl = maxout; 
     458  status = snappy_compress(input, input_length, output, &cl); 
     459  if (status != SNAPPY_OK){ 
     460    return 0; 
     461  } 
     462  return (int)cl; 
     463} 
     464 
     465static int snappy_wrap_decompress(const char* input, size_t compressed_length, 
     466                                  char* output, size_t maxout) 
     467{ 
     468  snappy_status status; 
     469  size_t ul = maxout; 
     470  status = snappy_uncompress(input, compressed_length, output, &ul); 
     471  if (status != SNAPPY_OK){ 
     472    return 0; 
     473  } 
     474  return (int)ul; 
     475} 
     476#endif /* HAVE_SNAPPY */ 
     477 
     478#if defined(HAVE_ZLIB) 
     479/* zlib is not very respectful with sharing name space with others. 
     480 Fortunately, its names do not collide with those already in blosc. */ 
     481static int zlib_wrap_compress(const char* input, size_t input_length, 
     482                              char* output, size_t maxout, int clevel) 
     483{ 
     484  int status; 
     485  uLongf cl = maxout; 
     486  status = compress2( 
     487             (Bytef*)output, &cl, (Bytef*)input, (uLong)input_length, clevel); 
     488  if (status != Z_OK){ 
     489    return 0; 
     490  } 
     491  return (int)cl; 
     492} 
     493 
     494static int zlib_wrap_decompress(const char* input, size_t compressed_length, 
     495                                char* output, size_t maxout) 
     496{ 
     497  int status; 
     498  uLongf ul = maxout; 
     499  status = uncompress( 
     500             (Bytef*)output, &ul, (Bytef*)input, (uLong)compressed_length); 
     501  if (status != Z_OK){ 
     502    return 0; 
     503  } 
     504  return (int)ul; 
     505} 
     506#endif /*  HAVE_ZLIB */ 
     507 
     508#if defined(HAVE_ZSTD) 
     509static int zstd_wrap_compress(const char* input, size_t input_length, 
     510                              char* output, size_t maxout, int clevel) { 
     511  size_t code; 
     512  // clevel = (clevel < 9) ? clevel * 2 - 1 : ZSTD_maxCLevel();  // see zstd#254 
     513  clevel = (clevel < 9) ? clevel * 2 - 1 : 22; 
     514  code = ZSTD_compress( 
     515      (void*)output, maxout, (void*)input, input_length, clevel); 
     516  if (ZSTD_isError(code)) { 
     517    return 0; 
     518  } 
     519  return (int)code; 
     520} 
     521 
     522static int zstd_wrap_decompress(const char* input, size_t compressed_length, 
     523                                char* output, size_t maxout) { 
     524  size_t code; 
     525  code = ZSTD_decompress( 
     526      (void*)output, maxout, (void*)input, compressed_length); 
     527  if (ZSTD_isError(code)) { 
     528    fprintf(stderr, "error decompressing with Zstd: %s \n", ZSTD_getErrorName(code)); 
     529    return 0; 
     530  } 
     531  return (int)code; 
     532} 
     533#endif /*  HAVE_ZSTD */ 
     534 
     535/* Compute acceleration for blosclz */ 
     536static int get_accel(const struct blosc_context* context) { 
     537  int32_t clevel = context->clevel; 
     538  int32_t typesize = context->typesize; 
     539 
     540  if (clevel == 9) { 
     541    return 1; 
     542  } 
     543  if (context->compcode == BLOSC_BLOSCLZ) { 
     544    /* Compute the power of 2. See: 
     545     * http://www.exploringbinary.com/ten-ways-to-check-if-an-integer-is-a-power-of-two-in-c/ 
     546     */ 
     547    int32_t tspow2 = ((typesize != 0) && !(typesize & (typesize - 1))); 
     548    if (tspow2 && typesize < 32) { 
     549      return 32; 
     550    } 
     551  } 
     552  else if (context->compcode == BLOSC_LZ4) { 
     553    /* This acceleration setting based on discussions held in: 
     554     * https://groups.google.com/forum/#!topic/lz4c/zosy90P8MQw 
     555     */ 
     556    return (10 - clevel); 
     557  } 
     558  return 1; 
     559} 
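
The acceleration rule in get_accel() can be restated as a standalone function for experimentation. This is a sketch only: `sketch_codec` and `sketch_accel` are hypothetical names, and the power-of-two test here additionally rejects negative sizes, which the original expression does not.

```c
#include <stdint.h>

/* Hypothetical codec identifiers standing in for BLOSC_BLOSCLZ, BLOSC_LZ4
   and "anything else". */
enum sketch_codec {
  SKETCH_CODEC_BLOSCLZ,
  SKETCH_CODEC_LZ4,
  SKETCH_CODEC_OTHER
};

/* A positive integer is a power of two when it has exactly one bit set. */
static int is_pow2(int32_t n)
{
  return n > 0 && (n & (n - 1)) == 0;
}

/* Mirror of the decision order in get_accel():
   clevel 9 always disables acceleration (returns 1). */
static int sketch_accel(enum sketch_codec codec, int32_t clevel,
                        int32_t typesize)
{
  if (clevel == 9)
    return 1;
  switch (codec) {
  case SKETCH_CODEC_BLOSCLZ:
    /* Small power-of-two type sizes get a large acceleration factor. */
    return (is_pow2(typesize) && typesize < 32) ? 32 : 1;
  case SKETCH_CODEC_LZ4:
    /* Lower compression levels trade ratio for speed. */
    return 10 - clevel;
  default:
    return 1;
  }
}
```

For instance, under these assumptions a 4-byte type at clevel 5 would get factor 32 with the blosclz stand-in and factor 5 with the LZ4 stand-in.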
    232560 
    233561/* Shuffle & compress a single block */ 
    234 static int blosc_c(int32_t blocksize, int32_t leftoverblock, 
    235                    int32_t ntbytes, int32_t maxbytes, 
    236                    uint8_t *src, uint8_t *dest, uint8_t *tmp) 
     562static int blosc_c(const struct blosc_context* context, int32_t blocksize, 
     563                   int32_t leftoverblock, int32_t ntbytes, int32_t maxbytes, 
     564                   const uint8_t *src, uint8_t *dest, uint8_t *tmp, 
     565                   uint8_t *tmp2) 
    237566{ 
    238567  int32_t j, neblock, nsplits; 
     
    240569  int32_t ctbytes = 0;              /* number of compressed bytes in block */ 
    241570  int32_t maxout; 
    242   int32_t typesize = params.typesize; 
    243   uint8_t *_tmp; 
    244  
    245   if ((params.flags & BLOSC_DOSHUFFLE) && (typesize > 1)) { 
    246     /* Shuffle this block (this makes sense only if typesize > 1) */ 
     571  int32_t typesize = context->typesize; 
     572  const uint8_t *_tmp = src; 
     573  char *compname; 
     574  int accel; 
     575  int bscount; 
     576 
     577  if (*(context->header_flags) & BLOSC_DOSHUFFLE & (typesize > 1)) { 
     578    /* Byte shuffling only makes sense if typesize > 1 */ 
    247579    shuffle(typesize, blocksize, src, tmp); 
    248580    _tmp = tmp; 
    249581  } 
    250   else { 
    251     _tmp = src; 
    252   } 
     582  /* We don't allow more than 1 filter at the same time (yet) */ 
     583  else if (*(context->header_flags) & BLOSC_DOBITSHUFFLE) { 
     584    bscount = bitshuffle(typesize, blocksize, src, tmp, tmp2); 
     585    if (bscount < 0) 
     586      return bscount; 
     587    _tmp = tmp; 
     588  } 
     589 
     590  /* Calculate acceleration for different compressors */ 
     591  accel = get_accel(context); 
    253592 
    254593  /* Compress for each shuffled slice split for this block. */ 
     
    268607    ctbytes += (int32_t)sizeof(int32_t); 
    269608    maxout = neblock; 
     609    #if defined(HAVE_SNAPPY) 
     610    if (context->compcode == BLOSC_SNAPPY) { 
     611      /* TODO perhaps refactor this to keep the value stashed somewhere */ 
     612      maxout = snappy_max_compressed_length(neblock); 
     613    } 
     614    #endif /*  HAVE_SNAPPY */ 
    270615    if (ntbytes+maxout > maxbytes) { 
    271616      maxout = maxbytes - ntbytes;   /* avoid buffer overrun */ 
     
    274619      } 
    275620    } 
    276     cbytes = blosclz_compress(params.clevel, _tmp+j*neblock, neblock, 
    277                               dest, maxout); 
    278     if (cbytes >= maxout) { 
    279       /* Buffer overrun caused by blosclz_compress (should never happen) */ 
     621    if (context->compcode == BLOSC_BLOSCLZ) { 
     622      cbytes = blosclz_compress(context->clevel, _tmp+j*neblock, neblock, 
     623                                dest, maxout, accel); 
     624    } 
     625    #if defined(HAVE_LZ4) 
     626    else if (context->compcode == BLOSC_LZ4) { 
     627      cbytes = lz4_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 
     628                                 (char *)dest, (size_t)maxout, accel); 
     629    } 
     630    else if (context->compcode == BLOSC_LZ4HC) { 
     631      cbytes = lz4hc_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 
     632                                   (char *)dest, (size_t)maxout, 
     633                                   context->clevel); 
     634    } 
     635    #endif /* HAVE_LZ4 */ 
     636    #if defined(HAVE_SNAPPY) 
     637    else if (context->compcode == BLOSC_SNAPPY) { 
     638      cbytes = snappy_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 
     639                                    (char *)dest, (size_t)maxout); 
     640    } 
     641    #endif /* HAVE_SNAPPY */ 
     642    #if defined(HAVE_ZLIB) 
     643    else if (context->compcode == BLOSC_ZLIB) { 
     644      cbytes = zlib_wrap_compress((char *)_tmp+j*neblock, (size_t)neblock, 
     645                                  (char *)dest, (size_t)maxout, 
     646                                  context->clevel); 
     647    } 
     648    #endif /* HAVE_ZLIB */ 
     649    #if defined(HAVE_ZSTD) 
     650    else if (context->compcode == BLOSC_ZSTD) { 
     651      cbytes = zstd_wrap_compress((char*)_tmp + j * neblock, (size_t)neblock, 
     652                                  (char*)dest, (size_t)maxout, context->clevel); 
     653    } 
     654    #endif /* HAVE_ZSTD */ 
     655 
     656    else { 
     657      blosc_compcode_to_compname(context->compcode, &compname); 
     658      fprintf(stderr, "Blosc has not been compiled with '%s' ", compname); 
     659      fprintf(stderr, "compression support.  Please use one having it."); 
     660      return -5;    /* signals no compression support */ 
     661    } 
     662 
     663    if (cbytes > maxout) { 
     664      /* Buffer overrun caused by compression (should never happen) */ 
    280665      return -1; 
    281666    } 
     
    284669      return -2; 
    285670    } 
    286     else if (cbytes == 0) { 
    287       /* The compressor has been unable to compress data significantly. */ 
     671    else if (cbytes == 0 || cbytes == neblock) { 
     672      /* The compressor has been unable to compress data at all. */ 
    288673      /* Before doing the copy, check that we are not running into a 
    289674         buffer overflow. */ 
     
    294679      cbytes = neblock; 
    295680    } 
    296     ((int32_t *)(dest))[-1] = sw32(cbytes); 
     681    _sw32(dest - 4, cbytes); 
    297682    dest += cbytes; 
    298683    ntbytes += cbytes; 
     
    303688} 
    304689 
    305  
    306690/* Decompress & unshuffle a single block */ 
    307 static int blosc_d(int32_t blocksize, int32_t leftoverblock, 
    308                    uint8_t *src, uint8_t *dest, uint8_t *tmp, uint8_t *tmp2) 
     691static int blosc_d(struct blosc_context* context, int32_t blocksize, int32_t leftoverblock, 
     692                   const uint8_t *src, uint8_t *dest, uint8_t *tmp, uint8_t *tmp2) 
    309693{ 
    310694  int32_t j, neblock, nsplits; 
     
    313697  int32_t ctbytes = 0;           /* number of compressed bytes in block */ 
    314698  int32_t ntbytes = 0;           /* number of uncompressed bytes in block */ 
    315   uint8_t *_tmp; 
    316   int32_t typesize = params.typesize; 
    317  
    318   if ((params.flags & BLOSC_DOSHUFFLE) && (typesize > 1)) { 
     699  uint8_t *_tmp = dest; 
     700  int32_t typesize = context->typesize; 
     701  int32_t compformat; 
     702  char *compname; 
     703  int bscount; 
     704 
     705  if ((*(context->header_flags) & BLOSC_DOSHUFFLE & (typesize > 1)) ||  \ 
     706      (*(context->header_flags) & BLOSC_DOBITSHUFFLE)) { 
    319707    _tmp = tmp; 
    320708  } 
    321   else { 
    322     _tmp = dest; 
    323   } 
     709 
     710  compformat = (*(context->header_flags) & 0xe0) >> 5; 
    324711 
    325712  /* Decompress each shuffled slice split for this block. */ 
     
    333720  neblock = blocksize / nsplits; 
    334721  for (j = 0; j < nsplits; j++) { 
    335     cbytes = sw32(((int32_t *)(src))[0]);   /* amount of compressed bytes */ 
     722    cbytes = sw32_(src);      /* amount of compressed bytes */ 
    336723    src += sizeof(int32_t); 
    337724    ctbytes += (int32_t)sizeof(int32_t); 
     
    342729    } 
    343730    else { 
    344       nbytes = blosclz_decompress(src, cbytes, _tmp, neblock); 
     731      if (compformat == BLOSC_BLOSCLZ_FORMAT) { 
     732        nbytes = blosclz_decompress(src, cbytes, _tmp, neblock); 
     733      } 
     734      #if defined(HAVE_LZ4) 
     735      else if (compformat == BLOSC_LZ4_FORMAT) { 
     736        nbytes = lz4_wrap_decompress((char *)src, (size_t)cbytes, 
     737                                     (char*)_tmp, (size_t)neblock); 
     738      } 
     739      #endif /*  HAVE_LZ4 */ 
     740      #if defined(HAVE_SNAPPY) 
     741      else if (compformat == BLOSC_SNAPPY_FORMAT) { 
     742        nbytes = snappy_wrap_decompress((char *)src, (size_t)cbytes, 
     743                                        (char*)_tmp, (size_t)neblock); 
     744      } 
     745      #endif /*  HAVE_SNAPPY */ 
     746      #if defined(HAVE_ZLIB) 
     747      else if (compformat == BLOSC_ZLIB_FORMAT) { 
     748        nbytes = zlib_wrap_decompress((char *)src, (size_t)cbytes, 
     749                                      (char*)_tmp, (size_t)neblock); 
     750      } 
     751      #endif /*  HAVE_ZLIB */ 
     752      #if defined(HAVE_ZSTD) 
     753      else if (compformat == BLOSC_ZSTD_FORMAT) { 
     754        nbytes = zstd_wrap_decompress((char*)src, (size_t)cbytes, 
     755                                      (char*)_tmp, (size_t)neblock); 
     756      } 
     757      #endif /*  HAVE_ZSTD */ 
     758      else { 
     759        compname = clibcode_to_clibname(compformat); 
     760        fprintf(stderr, 
     761                "Blosc has not been compiled with decompression " 
     762                "support for '%s' format. ", compname); 
     763        fprintf(stderr, "Please recompile for adding this support.\n"); 
     764        return -5;    /* signals no decompression support */ 
     765      } 
     766 
     767      /* Check that decompressed bytes number is correct */ 
    345768      if (nbytes != neblock) { 
    346         return -2; 
    347       } 
     769          return -2; 
     770      } 
     771 
    348772    } 
    349773    src += cbytes; 
     
    353777  } /* Closes j < nsplits */ 
    354778 
    355   if ((params.flags & BLOSC_DOSHUFFLE) && (typesize > 1)) { 
    356     if ((uintptr_t)dest % 16 == 0) { 
    357       /* 16-bytes aligned dest.  SSE2 unshuffle will work. */ 
    358       unshuffle(typesize, blocksize, tmp, dest); 
    359     } 
    360     else { 
    361       /* dest is not aligned.  Use tmp2, which is aligned, and copy. */ 
    362       unshuffle(typesize, blocksize, tmp, tmp2); 
    363       if (tmp2 != dest) { 
    364         /* Copy only when dest is not tmp2 (e.g. not blosc_getitem())  */ 
    365         memcpy(dest, tmp2, blocksize); 
    366       } 
    367     } 
     779  if (*(context->header_flags) & BLOSC_DOSHUFFLE & (typesize > 1)) { 
     780    unshuffle(typesize, blocksize, tmp, dest); 
     781  } 
     782  else if (*(context->header_flags) & BLOSC_DOBITSHUFFLE) { 
     783    bscount = bitunshuffle(typesize, blocksize, tmp, dest, tmp2); 
     784    if (bscount < 0) 
     785      return bscount; 
    368786  } 
    369787 
     
    374792 
    375793/* Serial version for compression/decompression */ 
    376 static int serial_blosc(void) 
     794static int serial_blosc(struct blosc_context* context) 
    377795{ 
    378796  int32_t j, bsize, leftoverblock; 
    379797  int32_t cbytes; 
    380   int32_t compress = params.compress; 
    381   int32_t blocksize = params.blocksize; 
    382   int32_t ntbytes = params.ntbytes; 
    383   int32_t flags = params.flags; 
    384   int32_t maxbytes = params.maxbytes; 
    385   int32_t nblocks = params.nblocks; 
    386   int32_t leftover = params.nbytes % params.blocksize; 
    387   int32_t *bstarts = params.bstarts; 
    388   uint8_t *src = params.src; 
    389   uint8_t *dest = params.dest; 
    390   uint8_t *tmp = params.tmp[0];     /* tmp for thread 0 */ 
    391   uint8_t *tmp2 = params.tmp2[0];   /* tmp2 for thread 0 */ 
    392  
    393   for (j = 0; j < nblocks; j++) { 
    394     if (compress && !(flags & BLOSC_MEMCPYED)) { 
    395       bstarts[j] = sw32(ntbytes); 
    396     } 
    397     bsize = blocksize; 
     798 
     799  int32_t ebsize = context->blocksize + context->typesize * (int32_t)sizeof(int32_t); 
     800  int32_t ntbytes = context->num_output_bytes; 
     801 
     802  uint8_t *tmp = my_malloc(context->blocksize + ebsize); 
     803  uint8_t *tmp2 = tmp + context->blocksize; 
     804 
     805  for (j = 0; j < context->nblocks; j++) { 
     806    if (context->compress && !(*(context->header_flags) & BLOSC_MEMCPYED)) { 
     807      _sw32(context->bstarts + j * 4, ntbytes); 
     808    } 
     809    bsize = context->blocksize; 
    398810    leftoverblock = 0; 
    399     if ((j == nblocks - 1) && (leftover > 0)) { 
    400       bsize = leftover; 
     811    if ((j == context->nblocks - 1) && (context->leftover > 0)) { 
     812      bsize = context->leftover; 
    401813      leftoverblock = 1; 
    402814    } 
    403     if (compress) { 
    404       if (flags & BLOSC_MEMCPYED) { 
     815    if (context->compress) { 
     816      if (*(context->header_flags) & BLOSC_MEMCPYED) { 
    405817        /* We want to memcpy only */ 
    406         memcpy(dest+BLOSC_MAX_OVERHEAD+j*blocksize, src+j*blocksize, bsize); 
     818        memcpy(context->dest+BLOSC_MAX_OVERHEAD+j*context->blocksize, 
     819                context->src+j*context->blocksize, 
     820                bsize); 
    407821        cbytes = bsize; 
    408822      } 
    409823      else { 
    410824        /* Regular compression */ 
    411         cbytes = blosc_c(bsize, leftoverblock, ntbytes, maxbytes, 
    412                          src+j*blocksize, dest+ntbytes, tmp); 
     825        cbytes = blosc_c(context, bsize, leftoverblock, ntbytes, 
     826                         context->destsize, context->src+j*context->blocksize, 
     827                         context->dest+ntbytes, tmp, tmp2); 
    413828        if (cbytes == 0) { 
    414829          ntbytes = 0;              /* uncompressible data */ 
     
    418833    } 
    419834    else { 
    420       if (flags & BLOSC_MEMCPYED) { 
     835      if (*(context->header_flags) & BLOSC_MEMCPYED) { 
    421836        /* We want to memcpy only */ 
    422         memcpy(dest+j*blocksize, src+BLOSC_MAX_OVERHEAD+j*blocksize, bsize); 
     837        memcpy(context->dest+j*context->blocksize, 
     838                context->src+BLOSC_MAX_OVERHEAD+j*context->blocksize, 
     839                bsize); 
    423840        cbytes = bsize; 
    424841      } 
    425842      else { 
    426843        /* Regular decompression */ 
    427         cbytes = blosc_d(bsize, leftoverblock, 
    428                          src+sw32(bstarts[j]), dest+j*blocksize, tmp, tmp2); 
     844        cbytes = blosc_d(context, bsize, leftoverblock, 
     845                          context->src + sw32_(context->bstarts + j * 4), 
     846                          context->dest+j*context->blocksize, tmp, tmp2); 
    429847      } 
    430848    } 
     
    436854  } 
    437855 
     856  // Free temporaries 
     857  my_free(tmp); 
     858 
    438859  return ntbytes; 
    439860} 
     
    441862 
    442863/* Threaded version for compression/decompression */ 
    443 static int parallel_blosc(void) 
    444 { 
     864static int parallel_blosc(struct blosc_context* context) 
     865{ 
     866  int rc; 
    445867 
    446868  /* Check whether we need to restart threads */ 
    447   if (!init_threads_done || pid != getpid()) { 
    448     blosc_set_nthreads_(nthreads); 
    449   } 
     869  blosc_set_nthreads_(context); 
     870 
     871  /* Set sentinels */ 
     872  context->thread_giveup_code = 1; 
     873  context->thread_nblock = -1; 
    450874 
    451875  /* Synchronization point for all threads (wait for initialization) */ 
    452   WAIT_INIT; 
     876  WAIT_INIT(-1, context); 
     877 
    453878  /* Synchronization point for all threads (wait for finalization) */ 
    454   WAIT_FINISH; 
    455  
    456   if (giveup_code > 0) { 
     879  WAIT_FINISH(-1, context); 
     880 
     881  if (context->thread_giveup_code > 0) { 
    457882    /* Return the total bytes (de-)compressed in threads */ 
    458     return params.ntbytes; 
     883    return context->num_output_bytes; 
    459884  } 
    460885  else { 
    461886    /* Compression/decompression gave up.  Return error code. */ 
    462     return giveup_code; 
    463   } 
    464 } 
    465  
    466  
    467 /* Convenience functions for creating and releasing temporaries */ 
    468 static int create_temporaries(void) 
    469 { 
    470   int32_t tid; 
    471   int32_t typesize = params.typesize; 
    472   int32_t blocksize = params.blocksize; 
    473   /* Extended blocksize for temporary destination.  Extended blocksize 
    474    is only useful for compression in parallel mode, but it doesn't 
    475    hurt serial mode either. */ 
    476   int32_t ebsize = blocksize + typesize*(int32_t)sizeof(int32_t); 
    477  
    478   /* Create temporary area for each thread */ 
    479   for (tid = 0; tid < nthreads; tid++) { 
    480     uint8_t *tmp = my_malloc(blocksize); 
    481     uint8_t *tmp2; 
    482     if (tmp == NULL) { 
    483       return -1; 
    484     } 
    485     params.tmp[tid] = tmp; 
    486     tmp2 = my_malloc(ebsize); 
    487     if (tmp2 == NULL) { 
    488       return -1; 
    489     } 
    490     params.tmp2[tid] = tmp2; 
    491   } 
    492  
    493   init_temps_done = 1; 
    494   /* Update params for current temporaries */ 
    495   current_temp.nthreads = nthreads; 
    496   current_temp.typesize = typesize; 
    497   current_temp.blocksize = blocksize; 
    498   return 0; 
    499 } 
    500  
    501  
    502 static void release_temporaries(void) 
    503 { 
    504   int32_t tid; 
    505  
    506   /* Release buffers */ 
    507   for (tid = 0; tid < nthreads; tid++) { 
    508     my_free(params.tmp[tid]); 
    509     my_free(params.tmp2[tid]); 
    510   } 
    511  
    512   init_temps_done = 0; 
     887    return context->thread_giveup_code; 
     888  } 
    513889} 
    514890 
     
    516892/* Do the compression or decompression of the buffer depending on the 
    517893   global params. */ 
    518 static int do_job(void) 
     894static int do_job(struct blosc_context* context) 
    519895{ 
    520896  int32_t ntbytes; 
    521  
    522   /* Initialize/reset temporaries if needed */ 
    523   if (!init_temps_done) { 
    524     int ret; 
    525     ret = create_temporaries(); 
    526     if (ret < 0) { 
    527       return -1; 
    528     } 
    529   } 
    530   else if (current_temp.nthreads != nthreads || 
    531            current_temp.typesize != params.typesize || 
    532            current_temp.blocksize != params.blocksize) { 
    533     int ret; 
    534     release_temporaries(); 
    535     ret = create_temporaries(); 
    536     if (ret < 0) { 
    537       return -1; 
    538     } 
    539   } 
    540897 
    541898  /* Run the serial version when nthreads is 1 or when the buffers are 
    542899     not much larger than blocksize */ 
    543   if (nthreads == 1 || (params.nbytes / params.blocksize) <= 1) { 
    544     ntbytes = serial_blosc(); 
     900  if (context->numthreads == 1 || (context->sourcesize / context->blocksize) <= 1) { 
     901    ntbytes = serial_blosc(context); 
    545902  } 
    546903  else { 
    547     ntbytes = parallel_blosc(); 
     904    ntbytes = parallel_blosc(context); 
    548905  } 
    549906 
     
    552909 
    553910 
    554 static int32_t compute_blocksize(int32_t clevel, int32_t typesize, 
    555                                  int32_t nbytes) 
     911static int32_t compute_blocksize(struct blosc_context* context, int32_t clevel, 
     912                                 int32_t typesize, int32_t nbytes, 
     913                                 int32_t forced_blocksize) 
    556914{ 
    557915  int32_t blocksize; 
     
    564922  blocksize = nbytes;           /* Start by a whole buffer as blocksize */ 
    565923 
    566   if (force_blocksize) { 
    567     blocksize = force_blocksize; 
    568     /* Check that forced blocksize is not too small nor too large */ 
     924  if (forced_blocksize) { 
     925    blocksize = forced_blocksize; 
     926    /* Check that forced blocksize is not too small */ 
    569927    if (blocksize < MIN_BUFFERSIZE) { 
    570928      blocksize = MIN_BUFFERSIZE; 
    571929    } 
    572930  } 
    573   else if (nbytes >= L1*4) { 
    574     blocksize = L1 * 4; 
     931  else if (nbytes >= L1) { 
     932    blocksize = L1; 
     933 
     934    /* For LZ4HC, increase the block sizes by a factor of 8 because it 
     935       is meant for compressing large blocks (it shows a big overhead 
     936       when compressing small ones). */ 
     937    if (context->compcode == BLOSC_LZ4HC) { 
     938      blocksize *= 8; 
     939    } 
     940 
     941    /* For Zlib, increase the block sizes by a factor of 8 because it 
     942       is meant for compressing large blocks (it shows a big overhead 
     943       when compressing small ones). */ 
     944    if (context->compcode == BLOSC_ZLIB) { 
     945      blocksize *= 8; 
     946    } 
     947 
     948    /* For Zstd, increase the block sizes by a factor of 8 because it 
     949       is meant for compressing large blocks (it shows a big overhead 
     950       when compressing small ones). */ 
     951    if (context->compcode == BLOSC_ZSTD) { 
     952      blocksize *= 8; 
     953    } 
     954 
    575955    if (clevel == 0) { 
    576       blocksize /= 16; 
     956      blocksize /= 4; 
    577957    } 
    578958    else if (clevel <= 3) { 
    579       blocksize /= 8; 
     959      blocksize /= 2; 
    580960    } 
    581961    else if (clevel <= 5) { 
    582       blocksize /= 4; 
     962      blocksize *= 1; 
    583963    } 
    584964    else if (clevel <= 6) { 
    585       blocksize /= 2; 
     965      blocksize *= 2; 
    586966    } 
    587967    else if (clevel < 9) { 
    588       blocksize *= 1; 
     968      blocksize *= 4; 
    589969    } 
    590970    else { 
    591       blocksize *= 2; 
     971      blocksize *= 16; 
    592972    } 
    593973  } 
     
    598978  } 
    599979 
    600   /* blocksize must be a multiple of the typesize */ 
     980  /* blocksize *must absolutely* be a multiple of the typesize */ 
    601981  if (blocksize > typesize) { 
    602982    blocksize = blocksize / typesize * typesize; 
    603983  } 
    604984 
    605   /* blocksize must not exceed (64 KB * typesize) in order to allow 
    606      BloscLZ to achieve better compression ratios (the ultimate reason 
    607      for this is that hash_log in BloscLZ cannot be larger than 15) */ 
    608   if ((blocksize / typesize) > 64*KB) { 
    609     blocksize = 64 * KB * typesize; 
    610   } 
    611  
    612985  return blocksize; 
    613986} 
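
Reviewer note: the new block-size heuristic above can be tried in isolation with the sketch below. It is a minimal sketch, not the library's code. `SKETCH_L1` and `SKETCH_MIN_BUFFERSIZE` are assumed values, because the real `L1` and `MIN_BUFFERSIZE` definitions are not visible in this hunk. The `wants_large_blocks` flag is a hypothetical stand-in for the LZ4HC, Zlib and Zstd checks. The lines elided from the hunk are deliberately not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real L1 and MIN_BUFFERSIZE are defined elsewhere. */
#define SKETCH_L1             (32 * 1024)
#define SKETCH_MIN_BUFFERSIZE 128

/* wants_large_blocks is a hypothetical stand-in for the
   compcode == BLOSC_LZ4HC / BLOSC_ZLIB / BLOSC_ZSTD checks. */
static int32_t sketch_blocksize(int wants_large_blocks, int32_t clevel,
                                int32_t typesize, int32_t nbytes,
                                int32_t forced_blocksize)
{
  int32_t blocksize = nbytes;   /* start with the whole buffer */

  assert(typesize > 0);

  if (forced_blocksize) {
    /* A forced block size is only checked against the lower bound */
    blocksize = forced_blocksize;
    if (blocksize < SKETCH_MIN_BUFFERSIZE) {
      blocksize = SKETCH_MIN_BUFFERSIZE;
    }
  }
  else if (nbytes >= SKETCH_L1) {
    blocksize = SKETCH_L1;
    if (wants_large_blocks) {
      blocksize *= 8;
    }

    /* Scale by compression level, as in the new side of the diff */
    if (clevel == 0)      { blocksize /= 4; }
    else if (clevel <= 3) { blocksize /= 2; }
    else if (clevel <= 5) { blocksize *= 1; }
    else if (clevel <= 6) { blocksize *= 2; }
    else if (clevel < 9)  { blocksize *= 4; }
    else                  { blocksize *= 16; }
  }

  /* The lines elided from the hunk are not reproduced here. */

  /* Round down to a multiple of the typesize */
  if (blocksize > typesize) {
    blocksize = blocksize / typesize * typesize;
  }
  return blocksize;
}
```

Under these assumptions, a 1 MiB buffer of 4-byte items at `clevel` 5 gets a 32 KiB block. The old code started from `L1*4` and divided by 4 at that level, so the two versions agree at `clevel` 5 and diverge at the other levels.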
    614987 
    615  
    616 /* The public routine for compression.  See blosc.h for docstrings. */ 
    617 int blosc_compress(int clevel, int doshuffle, size_t typesize, size_t nbytes, 
    618       const void *src, void *dest, size_t destsize) 
    619 { 
    620   uint8_t *_dest=NULL;         /* current pos for destination buffer */ 
    621   uint8_t *flags;              /* flags for header.  Currently booked: 
    622                                   - 0: shuffled? 
    623                                   - 1: memcpy'ed? */ 
    624   int32_t nbytes_;            /* number of bytes in source buffer */ 
    625   int32_t nblocks;            /* number of total blocks in buffer */ 
    626   int32_t leftover;           /* extra bytes at end of buffer */ 
    627   int32_t *bstarts;           /* start pointers for each block */ 
    628   int32_t blocksize;          /* length of the block in bytes */ 
    629   int32_t ntbytes = 0;        /* the number of compressed bytes */ 
    630   int32_t *ntbytes_;          /* placeholder for bytes in output buffer */ 
    631   int32_t maxbytes = (int32_t)destsize;  /* maximum size for dest buffer */ 
     988static int initialize_context_compression(struct blosc_context* context, 
     989                          int clevel, 
     990                          int doshuffle, 
     991                          size_t typesize, 
     992                          size_t sourcesize, 
     993                          const void* src, 
     994                          void* dest, 
     995                          size_t destsize, 
     996                          int32_t compressor, 
     997                          int32_t blocksize, 
     998                          int32_t numthreads) 
     999{ 
     1000  /* Set parameters */ 
     1001  context->compress = 1; 
     1002  context->src = (const uint8_t*)src; 
     1003  context->dest = (uint8_t *)(dest); 
     1004  context->num_output_bytes = 0; 
     1005  context->destsize = (int32_t)destsize; 
     1006  context->sourcesize = sourcesize; 
     1007  context->typesize = typesize; 
     1008  context->compcode = compressor; 
     1009  context->numthreads = numthreads; 
     1010  context->end_threads = 0; 
     1011  context->clevel = clevel; 
    6321012 
    6331013  /* Check buffer size limits */ 
    634   if (nbytes > BLOSC_MAX_BUFFERSIZE) { 
     1014  if (sourcesize > BLOSC_MAX_BUFFERSIZE) { 
    6351015    /* If buffer is too large, give up. */ 
    6361016    fprintf(stderr, "Input buffer size cannot exceed %d bytes\n", 
     
    6381018    return -1; 
    6391019  } 
    640  
    641   /* We can safely do this assignation now */ 
    642   nbytes_ = (int32_t)nbytes; 
    6431020 
    6441021  /* Compression level */ 
     
    6501027 
    6511028  /* Shuffle */ 
    652   if (doshuffle != 0 && doshuffle != 1) { 
    653     fprintf(stderr, "`shuffle` parameter must be either 0 or 1!\n"); 
     1029  if (doshuffle != 0 && doshuffle != 1 && doshuffle != 2) { 
     1030    fprintf(stderr, "`shuffle` parameter must be either 0, 1 or 2!\n"); 
    6541031    return -10; 
    6551032  } 
    6561033 
    6571034  /* Check typesize limits */ 
    658   if (typesize > BLOSC_MAX_TYPESIZE) { 
     1035  if (context->typesize > BLOSC_MAX_TYPESIZE) { 
    6591036    /* If typesize is too large, treat buffer as a 1-byte stream. */ 
    660     typesize = 1; 
     1037    context->typesize = 1; 
    6611038  } 
    6621039 
    6631040  /* Get the blocksize */ 
    664   blocksize = compute_blocksize(clevel, (int32_t)typesize, nbytes_); 
     1041  context->blocksize = compute_blocksize(context, clevel, (int32_t)context->typesize, context->sourcesize, blocksize); 
    6651042 
    6661043  /* Compute number of blocks in buffer */ 
    667   nblocks = nbytes_ / blocksize; 
    668   leftover = nbytes_ % blocksize; 
    669   nblocks = (leftover>0)? nblocks+1: nblocks; 
    670  
    671   _dest = (uint8_t *)(dest); 
    672   /* Write header for this block */ 
    673   _dest[0] = BLOSC_VERSION_FORMAT;         /* blosc format version */ 
    674   _dest[1] = BLOSCLZ_VERSION_FORMAT;       /* blosclz format version */ 
    675   flags = _dest+2;                         /* flags */ 
    676   _dest[2] = 0;                            /* zeroes flags */ 
    677   _dest[3] = (uint8_t)typesize;            /* type size */ 
    678   _dest += 4; 
    679   ((int32_t *)_dest)[0] = sw32(nbytes_);  /* size of the buffer */ 
    680   ((int32_t *)_dest)[1] = sw32(blocksize);/* block size */ 
    681   ntbytes_ = (int32_t *)(_dest+8);        /* compressed buffer size */ 
    682   _dest += sizeof(int32_t)*3; 
    683   bstarts = (int32_t *)_dest;             /* starts for every block */ 
    684   _dest += sizeof(int32_t)*nblocks;        /* space for pointers to blocks */ 
    685   ntbytes = (int32_t)(_dest - (uint8_t *)dest); 
    686  
    687   if (clevel == 0) { 
     1044  context->nblocks = context->sourcesize / context->blocksize; 
     1045  context->leftover = context->sourcesize % context->blocksize; 
     1046  context->nblocks = (context->leftover > 0) ? (context->nblocks + 1) : context->nblocks; 
     1047 
     1048  return 1; 
     1049} 
     1050 
     1051static int write_compression_header(struct blosc_context* context, int clevel, int doshuffle) 
     1052{ 
     1053  int32_t compformat; 
     1054 
     1055  /* Write version header for this block */ 
     1056  context->dest[0] = BLOSC_VERSION_FORMAT;              /* blosc format version */ 
     1057 
     1058  /* Write compressor format */ 
     1059  compformat = -1; 
     1060  switch (context->compcode) 
     1061  { 
     1062  case BLOSC_BLOSCLZ: 
     1063    compformat = BLOSC_BLOSCLZ_FORMAT; 
     1064    context->dest[1] = BLOSC_BLOSCLZ_VERSION_FORMAT; /* blosclz format version */ 
     1065    break; 
     1066 
     1067#if defined(HAVE_LZ4) 
     1068  case BLOSC_LZ4: 
     1069    compformat = BLOSC_LZ4_FORMAT; 
     1070    context->dest[1] = BLOSC_LZ4_VERSION_FORMAT;  /* lz4 format version */ 
     1071    break; 
     1072  case BLOSC_LZ4HC: 
     1073    compformat = BLOSC_LZ4HC_FORMAT; 
     1074    context->dest[1] = BLOSC_LZ4HC_VERSION_FORMAT; /* lz4hc is the same as lz4 */ 
     1075    break; 
     1076#endif /* HAVE_LZ4 */ 
     1077 
     1078#if defined(HAVE_SNAPPY) 
     1079  case BLOSC_SNAPPY: 
     1080    compformat = BLOSC_SNAPPY_FORMAT; 
     1081    context->dest[1] = BLOSC_SNAPPY_VERSION_FORMAT;    /* snappy format version */ 
     1082    break; 
     1083#endif /* HAVE_SNAPPY */ 
     1084 
     1085#if defined(HAVE_ZLIB) 
     1086  case BLOSC_ZLIB: 
     1087    compformat = BLOSC_ZLIB_FORMAT; 
     1088    context->dest[1] = BLOSC_ZLIB_VERSION_FORMAT;      /* zlib format version */ 
     1089    break; 
     1090#endif /* HAVE_ZLIB */ 
     1091 
     1092#if defined(HAVE_ZSTD) 
     1093  case BLOSC_ZSTD: 
     1094    compformat = BLOSC_ZSTD_FORMAT; 
     1095    context->dest[1] = BLOSC_ZSTD_VERSION_FORMAT;      /* zstd format version */ 
     1096    break; 
     1097#endif /* HAVE_ZSTD */ 
     1098 
     1099  default: 
     1100  { 
     1101    char *compname; 
     1102    compname = clibcode_to_clibname(compformat); 
     1103    fprintf(stderr, "Blosc has not been compiled with '%s' ", compname); 
     1104    fprintf(stderr, "compression support.  Please use one having it."); 
     1105    return -5;    /* signals no compression support */ 
     1106    break; 
     1107  } 
     1108  } 
     1109 
     1110  context->header_flags = context->dest+2;  /* flags */ 
     1111  context->dest[2] = 0;  /* zeroes flags */ 
     1112  context->dest[3] = (uint8_t)context->typesize;  /* type size */ 
     1113  _sw32(context->dest + 4, context->sourcesize);  /* size of the buffer */ 
     1114  _sw32(context->dest + 8, context->blocksize);  /* block size */ 
     1115  context->bstarts = context->dest + 16;  /* starts for every block */ 
     1116  context->num_output_bytes = 16 + sizeof(int32_t)*context->nblocks;  /* space for header and pointers */ 
     1117 
     1118  if (context->clevel == 0) { 
    6881119    /* Compression level 0 means buffer to be memcpy'ed */ 
    689     *flags |= BLOSC_MEMCPYED; 
    690   } 
    691  
    692   if (nbytes_ < MIN_BUFFERSIZE) { 
     1120    *(context->header_flags) |= BLOSC_MEMCPYED; 
     1121  } 
     1122 
     1123  if (context->sourcesize < MIN_BUFFERSIZE) { 
    6931124    /* Buffer is too small.  Try memcpy'ing. */ 
    694     *flags |= BLOSC_MEMCPYED; 
    695   } 
    696  
    697   if (doshuffle == 1) { 
    698     /* Shuffle is active */ 
    699     *flags |= BLOSC_DOSHUFFLE;              /* bit 0 set to one in flags */ 
    700   } 
    701  
    702   /* Take global lock for the time of compression */ 
    703   pthread_mutex_lock(&global_comp_mutex); 
    704   /* Populate parameters for compression routines */ 
    705   params.compress = 1; 
    706   params.clevel = clevel; 
    707   params.flags = (int32_t)*flags; 
    708   params.typesize = (int32_t)typesize; 
    709   params.blocksize = blocksize; 
    710   params.ntbytes = ntbytes; 
    711   params.nbytes = nbytes_; 
    712   params.maxbytes = maxbytes; 
    713   params.nblocks = nblocks; 
    714   params.leftover = leftover; 
    715   params.bstarts = bstarts; 
    716   params.src = (uint8_t *)src; 
    717   params.dest = (uint8_t *)dest; 
    718  
    719   if (!(*flags & BLOSC_MEMCPYED)) { 
     1125    *(context->header_flags) |= BLOSC_MEMCPYED; 
     1126  } 
     1127 
     1128  if (doshuffle == BLOSC_SHUFFLE) { 
     1129    /* Byte-shuffle is active */ 
     1130    *(context->header_flags) |= BLOSC_DOSHUFFLE;     /* bit 0 set to one in flags */ 
     1131  } 
     1132 
     1133  if (doshuffle == BLOSC_BITSHUFFLE) { 
     1134    /* Bit-shuffle is active */ 
     1135    *(context->header_flags) |= BLOSC_DOBITSHUFFLE;  /* bit 2 set to one in flags */ 
     1136  } 
     1137 
     1138  *(context->header_flags) |= compformat << 5;      /* compressor format start at bit 5 */ 
     1139 
     1140  return 1; 
     1141} 
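
Reviewer note: `write_compression_header` above fills a 16-byte header, which the sketch below lays out explicitly. All `sk_` names are hypothetical. The little-endian byte order of the 32-bit fields is an assumption about what `_sw32` stores, since that helper is not shown in this hunk. The flag bit positions follow the comments in the diff (bit 0 byte-shuffle, bit 1 memcpy'ed, bit 2 bit-shuffle, compressor format from bit 5).

```c
#include <stdint.h>

/* Flag bits as described in the diff comments (names are hypothetical) */
#define SK_DOSHUFFLE    0x1   /* bit 0: byte-shuffle */
#define SK_MEMCPYED     0x2   /* bit 1: plain memcpy */
#define SK_DOBITSHUFFLE 0x4   /* bit 2: bit-shuffle */

/* Store a 32-bit value in little-endian order (assumed on-disk order) */
static void sk_store32(uint8_t *p, int32_t v)
{
  uint32_t u = (uint32_t)v;
  p[0] = (uint8_t)(u & 0xff);
  p[1] = (uint8_t)((u >> 8) & 0xff);
  p[2] = (uint8_t)((u >> 16) & 0xff);
  p[3] = (uint8_t)((u >> 24) & 0xff);
}

/* Load a 32-bit little-endian value */
static int32_t sk_load32(const uint8_t *p)
{
  uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
  return (int32_t)u;
}

/* Fill the 16-byte header:
     byte  0      format version
     byte  1      compressor format version
     byte  2      flags (low bits) and compressor format (from bit 5)
     byte  3      typesize
     bytes 4-7    uncompressed size
     bytes 8-11   block size
     bytes 12-15  compressed size (written after compression) */
static void sk_write_header(uint8_t *dest, uint8_t version, uint8_t versionlz,
                            uint8_t flags, int compformat, uint8_t typesize,
                            int32_t nbytes, int32_t blocksize, int32_t cbytes)
{
  dest[0] = version;
  dest[1] = versionlz;
  dest[2] = (uint8_t)(flags | (compformat << 5));
  dest[3] = typesize;
  sk_store32(dest + 4, nbytes);
  sk_store32(dest + 8, blocksize);
  sk_store32(dest + 12, cbytes);
}
```

The per-block start offsets follow from byte 16 onward, one 32-bit entry per block, which is why `num_output_bytes` starts at `16 + sizeof(int32_t)*nblocks`.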
     1142 
     1143int blosc_compress_context(struct blosc_context* context) 
     1144{ 
     1145  int32_t ntbytes = 0; 
     1146 
     1147  if (!(*(context->header_flags) & BLOSC_MEMCPYED)) { 
    7201148    /* Do the actual compression */ 
    721     ntbytes = do_job(); 
     1149    ntbytes = do_job(context); 
    7221150    if (ntbytes < 0) { 
    7231151      return -1; 
    7241152    } 
    725     if ((ntbytes == 0) && (nbytes_+BLOSC_MAX_OVERHEAD <= maxbytes)) { 
     1153    if ((ntbytes == 0) && (context->sourcesize+BLOSC_MAX_OVERHEAD <= context->destsize)) { 
    7261154      /* Last chance for fitting `src` buffer in `dest`.  Update flags 
    7271155       and do a memcpy later on. */ 
    728       *flags |= BLOSC_MEMCPYED; 
    729       params.flags |= BLOSC_MEMCPYED; 
    730     } 
    731   } 
    732  
    733   if (*flags & BLOSC_MEMCPYED) { 
    734     if (nbytes_+BLOSC_MAX_OVERHEAD > maxbytes) { 
     1156      *(context->header_flags) |= BLOSC_MEMCPYED; 
     1157    } 
     1158  } 
     1159 
     1160  if (*(context->header_flags) & BLOSC_MEMCPYED) { 
     1161    if (context->sourcesize + BLOSC_MAX_OVERHEAD > context->destsize) { 
    7351162      /* We are exceeding maximum output size */ 
    7361163      ntbytes = 0; 
    7371164    } 
    738     else if (((nbytes_ % L1) == 0) || (nthreads > 1)) { 
    739       /* More effective with large buffers that are multiples of the 
    740        cache size or multi-cores */ 
    741       params.ntbytes = BLOSC_MAX_OVERHEAD; 
    742       ntbytes = do_job(); 
    743       if (ntbytes < 0) { 
    744         return -1; 
    745       } 
    746     } 
    7471165    else { 
    748       memcpy((uint8_t *)dest+BLOSC_MAX_OVERHEAD, src, nbytes_); 
    749       ntbytes = nbytes_ + BLOSC_MAX_OVERHEAD; 
     1166      memcpy(context->dest+BLOSC_MAX_OVERHEAD, context->src, 
     1167             context->sourcesize); 
     1168      ntbytes = context->sourcesize + BLOSC_MAX_OVERHEAD; 
    7501169    } 
    7511170  } 
    7521171 
    7531172  /* Set the number of compressed bytes in header */ 
    754   *ntbytes_ = sw32(ntbytes); 
    755  
    756   /* Release global lock */ 
     1173  _sw32(context->dest + 12, ntbytes); 
     1174 
     1175  assert(ntbytes <= context->destsize); 
     1176  return ntbytes; 
     1177} 
     1178 
     1179/* The public routine for compression with context. */ 
     1180int blosc_compress_ctx(int clevel, int doshuffle, size_t typesize, 
     1181                       size_t nbytes, const void* src, void* dest, 
     1182                       size_t destsize, const char* compressor, 
     1183                       size_t blocksize, int numinternalthreads) 
     1184{ 
     1185  int error, result; 
     1186  struct blosc_context context; 
     1187 
     1188  context.threads_started = 0; 
     1189  error = initialize_context_compression(&context, clevel, doshuffle, typesize, 
     1190                                         nbytes, src, dest, destsize, 
     1191                                         blosc_compname_to_compcode(compressor), 
     1192                                         blocksize, numinternalthreads); 
     1193  if (error < 0) { return error; } 
     1194 
     1195  error = write_compression_header(&context, clevel, doshuffle); 
     1196  if (error < 0) { return error; } 
     1197 
     1198  result = blosc_compress_context(&context); 
     1199 
     1200  if (numinternalthreads > 1) 
     1201  { 
     1202    blosc_release_threadpool(&context); 
     1203  } 
     1204 
     1205  return result; 
     1206} 
     1207 
     1208/* The public routine for compression.  See blosc.h for docstrings. */ 
     1209int blosc_compress(int clevel, int doshuffle, size_t typesize, size_t nbytes, 
     1210                   const void *src, void *dest, size_t destsize) 
     1211{ 
     1212  int error; 
     1213  int result; 
     1214  char* envvar; 
     1215 
     1216  /* Check if should initialize */ 
     1217  if (!g_initlib) blosc_init(); 
     1218 
     1219  /* Check for a BLOSC_CLEVEL environment variable */ 
     1220  envvar = getenv("BLOSC_CLEVEL"); 
     1221  if (envvar != NULL) { 
     1222    long value; 
     1223    value = strtol(envvar, NULL, 10); 
     1224    if ((value != EINVAL) && (value >= 0)) { 
     1225      clevel = (int)value; 
     1226    } 
     1227  } 
     1228 
     1229  /* Check for a BLOSC_SHUFFLE environment variable */ 
     1230  envvar = getenv("BLOSC_SHUFFLE"); 
     1231  if (envvar != NULL) { 
     1232    if (strcmp(envvar, "NOSHUFFLE") == 0) { 
     1233      doshuffle = BLOSC_NOSHUFFLE; 
     1234    } 
     1235    if (strcmp(envvar, "SHUFFLE") == 0) { 
     1236      doshuffle = BLOSC_SHUFFLE; 
     1237    } 
     1238    if (strcmp(envvar, "BITSHUFFLE") == 0) { 
     1239      doshuffle = BLOSC_BITSHUFFLE; 
     1240    } 
     1241  } 
     1242 
     1243  /* Check for a BLOSC_TYPESIZE environment variable */ 
     1244  envvar = getenv("BLOSC_TYPESIZE"); 
     1245  if (envvar != NULL) { 
     1246    long value; 
     1247    value = strtol(envvar, NULL, 10); 
     1248    if ((value != EINVAL) && (value > 0)) { 
     1249      typesize = (int)value; 
     1250    } 
     1251  } 
     1252 
     1253  /* Check for a BLOSC_COMPRESSOR environment variable */ 
     1254  envvar = getenv("BLOSC_COMPRESSOR"); 
     1255  if (envvar != NULL) { 
     1256    result = blosc_set_compressor(envvar); 
     1257    if (result < 0) { return result; } 
     1258  } 
     1259 
     1260  /* Check for a BLOSC_BLOCKSIZE environment variable */ 
     1261  envvar = getenv("BLOSC_BLOCKSIZE"); 
     1262  if (envvar != NULL) { 
     1263    long blocksize; 
     1264    blocksize = strtol(envvar, NULL, 10); 
     1265    if ((blocksize != EINVAL) && (blocksize > 0)) { 
     1266      blosc_set_blocksize((size_t)blocksize); 
     1267    } 
     1268  } 
     1269 
     1270  /* Check for a BLOSC_NTHREADS environment variable */ 
     1271  envvar = getenv("BLOSC_NTHREADS"); 
     1272  if (envvar != NULL) { 
     1273    long nthreads; 
     1274    nthreads = strtol(envvar, NULL, 10); 
     1275    if ((nthreads != EINVAL) && (nthreads > 0)) { 
     1276      result = blosc_set_nthreads((int)nthreads); 
     1277      if (result < 0) { return result; } 
     1278    } 
     1279  } 
     1280 
     1281  /* Check for a BLOSC_NOLOCK environment variable.  It is important 
     1282     that this should be the last env var so that it can take the 
     1283     previous ones into account */ 
     1284  envvar = getenv("BLOSC_NOLOCK"); 
     1285  if (envvar != NULL) { 
     1286    char *compname; 
     1287    blosc_compcode_to_compname(g_compressor, &compname); 
     1288    result = blosc_compress_ctx(clevel, doshuffle, typesize, 
     1289                                nbytes, src, dest, destsize, 
     1290                                compname, g_force_blocksize, g_threads); 
     1291    return result; 
     1292  } 
     1293 
     1294  pthread_mutex_lock(&global_comp_mutex); 
     1295 
     1296  error = initialize_context_compression(g_global_context, clevel, doshuffle, 
     1297                                         typesize, nbytes, src, dest, destsize, 
     1298                                         g_compressor, g_force_blocksize, 
     1299                                         g_threads); 
     1300  if (error < 0) { return error; } 
     1301 
     1302  error = write_compression_header(g_global_context, clevel, doshuffle); 
     1303  if (error < 0) { return error; } 
     1304 
     1305  result = blosc_compress_context(g_global_context); 
     1306 
    7571307  pthread_mutex_unlock(&global_comp_mutex); 
    758    
    759   assert((int32_t)ntbytes <= (int32_t)maxbytes); 
    760   return ntbytes; 
    761 } 
    762  
    763  
    764 /* The public routine for decompression.  See blosc.h for docstrings. */ 
    765 int blosc_decompress(const void *src, void *dest, size_t destsize) 
    766 { 
    767   uint8_t *_src=NULL;            /* current pos for source buffer */ 
    768   uint8_t version, versionlz;    /* versions for compressed header */ 
    769   uint8_t flags;                 /* flags for header */ 
    770   int32_t ntbytes;               /* the number of uncompressed bytes */ 
    771   int32_t nblocks;              /* number of total blocks in buffer */ 
    772   int32_t leftover;             /* extra bytes at end of buffer */ 
    773   int32_t *bstarts;             /* start pointers for each block */ 
    774   int32_t typesize, blocksize, nbytes, ctbytes; 
    775  
    776   _src = (uint8_t *)(src); 
     1308 
     1309  return result; 
     1310} 
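
Reviewer note: as displayed, the two `if (error < 0) { return error; }` statements at new lines 1300 and 1303 leave `blosc_compress` while `global_comp_mutex` is still held. One possible arrangement that releases the lock on every path is sketched below. The `sk_` names and the integer step codes are hypothetical stand-ins for the three calls made under the lock. This is not the upstream fix.

```c
#include <pthread.h>

static pthread_mutex_t sk_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for one step run under the lock; it simply
   returns the code it is given (negative means failure). */
static int sk_step(int code)
{
  return code;
}

/* Run init, header and compression steps under the lock, skipping the
   remaining steps after a failure, and unlock exactly once. */
static int sk_locked_compress(int init_code, int header_code, int result)
{
  int ret;

  pthread_mutex_lock(&sk_mutex);

  ret = sk_step(init_code);
  if (ret >= 0) {
    ret = sk_step(header_code);
  }
  if (ret >= 0) {
    ret = sk_step(result);
  }

  pthread_mutex_unlock(&sk_mutex);
  return ret;
}
```

With a single exit point, a failed call cannot block the next caller that takes the same mutex.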
     1311 
     1312int blosc_run_decompression_with_context(struct blosc_context* context, 
     1313                                         const void* src, 
     1314                                         void* dest, 
     1315                                         size_t destsize, 
     1316                                         int numinternalthreads) 
     1317{ 
     1318  uint8_t version; 
     1319  uint8_t versionlz; 
     1320  uint32_t ctbytes; 
     1321  int32_t ntbytes; 
     1322 
     1323  context->compress = 0; 
     1324  context->src = (const uint8_t*)src; 
     1325  context->dest = (uint8_t*)dest; 
     1326  context->destsize = destsize; 
     1327  context->num_output_bytes = 0; 
     1328  context->numthreads = numinternalthreads; 
     1329  context->end_threads = 0; 
    7771330 
    7781331  /* Read the header block */ 
    779   version = _src[0];                         /* blosc format version */ 
    780   versionlz = _src[1];                       /* blosclz format version */ 
    781   flags = _src[2];                           /* flags */ 
    782   typesize = (int32_t)_src[3];              /* typesize */ 
    783   _src += 4; 
    784   nbytes = sw32(((int32_t *)_src)[0]);      /* buffer size */ 
    785   blocksize = sw32(((int32_t *)_src)[1]);   /* block size */ 
    786   ctbytes = sw32(((int32_t *)_src)[2]);     /* compressed buffer size */ 
    787  
     1332  version = context->src[0];                        /* blosc format version */ 
     1333  versionlz = context->src[1];                      /* blosclz format version */ 
     1334 
     1335  context->header_flags = (uint8_t*)(context->src + 2);           /* flags */ 
     1336  context->typesize = (int32_t)context->src[3];      /* typesize */ 
     1337  context->sourcesize = sw32_(context->src + 4);     /* buffer size */ 
     1338  context->blocksize = sw32_(context->src + 8);      /* block size */ 
     1339  ctbytes = sw32_(context->src + 12);               /* compressed buffer size */ 
     1340 
     1341  /* Unused values */ 
    7881342  version += 0;                             /* shut up compiler warning */ 
    7891343  versionlz += 0;                           /* shut up compiler warning */ 
    7901344  ctbytes += 0;                             /* shut up compiler warning */ 
    7911345 
    792   _src += sizeof(int32_t)*3; 
    793   bstarts = (int32_t *)_src; 
     1346  context->bstarts = (uint8_t*)(context->src + 16); 
     1347  /* Compute some params */ 
     1348  /* Total blocks */ 
     1349  context->nblocks = context->sourcesize / context->blocksize; 
     1350  context->leftover = context->sourcesize % context->blocksize; 
     1351  context->nblocks = (context->leftover>0)? context->nblocks+1: context->nblocks; 
     1352 
     1353  /* Check that we have enough space to decompress */ 
     1354  if (context->sourcesize > (int32_t)destsize) { 
     1355    return -1; 
     1356  } 
     1357 
     1358  /* Check whether this buffer is memcpy'ed */ 
     1359  if (*(context->header_flags) & BLOSC_MEMCPYED) { 
     1360      memcpy(dest, (uint8_t *)src+BLOSC_MAX_OVERHEAD, context->sourcesize); 
     1361      ntbytes = context->sourcesize; 
     1362  } 
     1363  else { 
     1364    /* Do the actual decompression */ 
     1365    ntbytes = do_job(context); 
     1366    if (ntbytes < 0) { 
     1367      return -1; 
     1368    } 
     1369  } 
     1370 
     1371  assert(ntbytes <= (int32_t)destsize); 
     1372  return ntbytes; 
     1373} 
     1374 
     1375/* The public routine for decompression with context. */ 
     1376int blosc_decompress_ctx(const void *src, void *dest, size_t destsize, 
     1377                         int numinternalthreads) 
     1378{ 
     1379  int result; 
     1380  struct blosc_context context; 
     1381 
     1382  context.threads_started = 0; 
     1383  result = blosc_run_decompression_with_context(&context, src, dest, destsize, numinternalthreads); 
     1384 
     1385  if (numinternalthreads > 1) 
     1386  { 
     1387    blosc_release_threadpool(&context); 
     1388  } 
     1389 
     1390  return result; 
     1391} 
     1392 
     1393 
     1394/* The public routine for decompression.  See blosc.h for docstrings. */ 
     1395int blosc_decompress(const void *src, void *dest, size_t destsize) 
     1396{ 
     1397  int result; 
     1398  char* envvar; 
     1399  long nthreads; 
     1400 
     1401  /* Check if should initialize */ 
     1402  if (!g_initlib) blosc_init(); 
     1403 
     1404  /* Check for a BLOSC_NTHREADS environment variable */ 
     1405  envvar = getenv("BLOSC_NTHREADS"); 
     1406  if (envvar != NULL) { 
     1407    nthreads = strtol(envvar, NULL, 10); 
     1408    if ((nthreads != EINVAL) && (nthreads > 0)) { 
     1409      result = blosc_set_nthreads((int)nthreads); 
     1410      if (result < 0) { return result; } 
     1411    } 
     1412  } 
     1413 
     1414  /* Check for a BLOSC_NOLOCK environment variable.  It is important 
     1415     that this should be the last env var so that it can take the 
     1416     previous ones into account */ 
     1417  envvar = getenv("BLOSC_NOLOCK"); 
     1418  if (envvar != NULL) { 
     1419    result = blosc_decompress_ctx(src, dest, destsize, g_threads); 
     1420    return result; 
     1421  } 
     1422 
     1423  pthread_mutex_lock(&global_comp_mutex); 
     1424 
     1425  result = blosc_run_decompression_with_context(g_global_context, src, dest, 
     1426                                                destsize, g_threads); 
     1427 
     1428  pthread_mutex_unlock(&global_comp_mutex); 
     1429 
     1430  return result; 
     1431} 
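
Reviewer note: the environment-variable checks above compare the result of `strtol` with `EINVAL`. `strtol` reports problems through `errno` and its end pointer, not through its return value, so that comparison does not detect bad input. On platforms where `EINVAL` is 22, it also rejects a legitimate setting of 22. A stricter parse could look like the sketch below; `sk_parse_positive_int` is a hypothetical helper, not part of the library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a strictly positive int from text.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
static int sk_parse_positive_int(const char *text, int *out)
{
  char *end;
  long value;

  if (text == NULL || *text == '\0') {
    return 0;                       /* unset or empty */
  }

  errno = 0;
  value = strtol(text, &end, 10);
  if (errno != 0) {
    return 0;                       /* out of range or other failure */
  }
  if (end == text || *end != '\0') {
    return 0;                       /* no digits, or trailing characters */
  }
  if (value <= 0 || value > INT_MAX) {
    return 0;                       /* not a positive int */
  }

  *out = (int)value;
  return 1;
}
```

A caller would pass `getenv("BLOSC_NTHREADS")` directly and call `blosc_set_nthreads` only when the helper returns 1.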
     1432 
     1433 
     1434/* Specific routine optimized for decompressing a small number of 
     1435   items out of a compressed chunk.  This does not use threads because 
     1436   they would hurt performance. */ 
     1437int blosc_getitem(const void *src, int start, int nitems, void *dest) 
     1438{ 
     1439  uint8_t *_src=NULL;               /* current pos for source buffer */ 
     1440  uint8_t version, versionlz;       /* versions for compressed header */ 
     1441  uint8_t flags;                    /* flags for header */ 
     1442  int32_t ntbytes = 0;              /* the number of uncompressed bytes */ 
     1443  int32_t nblocks;                  /* number of total blocks in buffer */ 
     1444  int32_t leftover;                 /* extra bytes at end of buffer */ 
     1445  uint8_t *bstarts;                 /* start pointers for each block */ 
     1446  int tmp_init = 0; 
     1447  int32_t typesize, blocksize, nbytes, ctbytes; 
     1448  int32_t j, bsize, bsize2, leftoverblock; 
     1449  int32_t cbytes, startb, stopb; 
     1450  int stop = start + nitems; 
     1451  uint8_t *tmp; 
     1452  uint8_t *tmp2; 
     1453  uint8_t *tmp3; 
     1454  int32_t ebsize; 
     1455 
     1456  _src = (uint8_t *)(src); 
     1457 
     1458  /* Read the header block */ 
     1459  version = _src[0];                        /* blosc format version */ 
     1460  versionlz = _src[1];                      /* blosclz format version */ 
     1461  flags = _src[2];                          /* flags */ 
     1462  typesize = (int32_t)_src[3];              /* typesize */ 
     1463  nbytes = sw32_(_src + 4);                 /* buffer size */ 
     1464  blocksize = sw32_(_src + 8);              /* block size */ 
     1465  ctbytes = sw32_(_src + 12);               /* compressed buffer size */ 
     1466 
     1467  ebsize = blocksize + typesize * (int32_t)sizeof(int32_t); 
     1468  tmp = my_malloc(blocksize + ebsize + blocksize); 
     1469  tmp2 = tmp + blocksize; 
     1470  tmp3 = tmp + blocksize + ebsize; 
     1471 
     1472  version += 0;                             /* shut up compiler warning */ 
     1473  versionlz += 0;                           /* shut up compiler warning */ 
     1474  ctbytes += 0;                             /* shut up compiler warning */ 
     1475 
     1476  _src += 16; 
     1477  bstarts = _src; 
    7941478  /* Compute some params */ 
    7951479  /* Total blocks */ 
     
    7991483  _src += sizeof(int32_t)*nblocks; 
    8001484 
    801   /* Check that we have enough space to decompress */ 
    802   if (nbytes > (int32_t)destsize) { 
    803     return -1; 
    804   } 
    805  
    806   /* Take global lock for the time of decompression */ 
    807   pthread_mutex_lock(&global_comp_mutex); 
    808    
    809   /* Populate parameters for decompression routines */ 
    810   params.compress = 0; 
    811   params.clevel = 0;            /* specific for compression */ 
    812   params.flags = (int32_t)flags; 
    813   params.typesize = typesize; 
    814   params.blocksize = blocksize; 
    815   params.ntbytes = 0; 
    816   params.nbytes = nbytes; 
    817   params.nblocks = nblocks; 
    818   params.leftover = leftover; 
    819   params.bstarts = bstarts; 
    820   params.src = (uint8_t *)src; 
    821   params.dest = (uint8_t *)dest; 
    822  
    823   /* Check whether this buffer is memcpy'ed */ 
    824   if (flags & BLOSC_MEMCPYED) { 
    825     if (((nbytes % L1) == 0) || (nthreads > 1)) { 
    826       /* More effective with large buffers that are multiples of the 
    827        cache size or multi-cores */ 
    828       ntbytes = do_job(); 
    829       if (ntbytes < 0) { 
    830         return -1; 
    831       } 
    832     } 
    833     else { 
    834       memcpy(dest, (uint8_t *)src+BLOSC_MAX_OVERHEAD, nbytes); 
    835       ntbytes = nbytes; 
    836     } 
    837   } 
    838   else { 
    839     /* Do the actual decompression */ 
    840     ntbytes = do_job(); 
    841     if (ntbytes < 0) { 
    842       return -1; 
    843     } 
    844   } 
    845   /* Release global lock */ 
    846   pthread_mutex_unlock(&global_comp_mutex); 
    847    
    848   assert(ntbytes <= (int32_t)destsize); 
    849   return ntbytes; 
    850 } 
    851  
    852  
    853 /* Specific routine optimized for decompressing a small number of 
    854    items out of a compressed chunk.  This does not use threads because 
    855    it would negatively affect performance. */ 
    856 int blosc_getitem(const void *src, int start, int nitems, void *dest) 
    857 { 
    858   uint8_t *_src=NULL;               /* current pos for source buffer */ 
    859   uint8_t version, versionlz;       /* versions for compressed header */ 
    860   uint8_t flags;                    /* flags for header */ 
    861   int32_t ntbytes = 0;              /* the number of uncompressed bytes */ 
    862   int32_t nblocks;                 /* number of total blocks in buffer */ 
    863   int32_t leftover;                /* extra bytes at end of buffer */ 
    864   int32_t *bstarts;                /* start pointers for each block */ 
    865   uint8_t *tmp = params.tmp[0];     /* tmp for thread 0 */ 
    866   uint8_t *tmp2 = params.tmp2[0];   /* tmp2 for thread 0 */ 
    867   int tmp_init = 0; 
    868   int32_t typesize, blocksize, nbytes, ctbytes; 
    869   int32_t j, bsize, bsize2, leftoverblock; 
    870   int32_t cbytes, startb, stopb; 
    871   int stop = start + nitems; 
    872  
    873   _src = (uint8_t *)(src); 
    874  
    875   /* Take global lock  */ 
    876   pthread_mutex_lock(&global_comp_mutex); 
    877    
    878   /* Read the header block */ 
    879   version = _src[0];                         /* blosc format version */ 
    880   versionlz = _src[1];                       /* blosclz format version */ 
    881   flags = _src[2];                           /* flags */ 
    882   typesize = (int32_t)_src[3];              /* typesize */ 
    883   _src += 4; 
    884   nbytes = sw32(((int32_t *)_src)[0]);      /* buffer size */ 
    885   blocksize = sw32(((int32_t *)_src)[1]);   /* block size */ 
    886   ctbytes = sw32(((int32_t *)_src)[2]);     /* compressed buffer size */ 
    887  
    888   version += 0;                             /* shut up compiler warning */ 
    889   versionlz += 0;                           /* shut up compiler warning */ 
    890   ctbytes += 0;                             /* shut up compiler warning */ 
    891  
    892   _src += sizeof(int32_t)*3; 
    893   bstarts = (int32_t *)_src; 
    894   /* Compute some params */ 
    895   /* Total blocks */ 
    896   nblocks = nbytes / blocksize; 
    897   leftover = nbytes % blocksize; 
    898   nblocks = (leftover>0)? nblocks+1: nblocks; 
    899   _src += sizeof(int32_t)*nblocks; 
    900  
    9011485  /* Check region boundaries */ 
    9021486  if ((start < 0) || (start*typesize > nbytes)) { 
    9031487    fprintf(stderr, "`start` out of bounds"); 
    904     return (-1); 
     1488    return -1; 
    9051489  } 
    9061490 
    9071491  if ((stop < 0) || (stop*typesize > nbytes)) { 
    9081492    fprintf(stderr, "`start`+`nitems` out of bounds"); 
    909     return (-1); 
    910   } 
    911  
    912   /* Parameters needed by blosc_d */ 
    913   params.typesize = typesize; 
    914   params.flags = flags; 
    915  
    916   /* Initialize temporaries if needed */ 
    917   if (tmp == NULL || tmp2 == NULL || current_temp.blocksize < blocksize) { 
    918     tmp = my_malloc(blocksize); 
    919     if (tmp == NULL) { 
    920       return -1; 
    921     } 
    922     tmp2 = my_malloc(blocksize); 
    923     if (tmp2 == NULL) { 
    924       return -1; 
    925     } 
    926     tmp_init = 1; 
     1493    return -1; 
    9271494  } 
    9281495 
     
    9581525    } 
    9591526    else { 
     1527      struct blosc_context context; 
     1528      /* blosc_d only uses typesize and flags */ 
     1529      context.typesize = typesize; 
     1530      context.header_flags = &flags; 
     1531 
    9601532      /* Regular decompression.  Put results in tmp2. */ 
    961       cbytes = blosc_d(bsize, leftoverblock, 
    962                        (uint8_t *)src+sw32(bstarts[j]), tmp2, tmp, tmp2); 
     1533      cbytes = blosc_d(&context, bsize, leftoverblock, 
     1534                       (uint8_t *)src + sw32_(bstarts + j * 4), 
     1535                       tmp2, tmp, tmp3); 
    9631536      if (cbytes < 0) { 
    9641537        ntbytes = cbytes; 
     
    9711544    ntbytes += cbytes; 
    9721545  } 
    973    
    974   /* Release global lock */ 
    975   pthread_mutex_unlock(&global_comp_mutex); 
    976  
    977   if (tmp_init) { 
    978     my_free(tmp); 
    979     my_free(tmp2); 
    980   } 
     1546 
     1547  my_free(tmp); 
    9811548 
    9821549  return ntbytes; 
     
    9851552 
    9861553/* Decompress & unshuffle several blocks in a single thread */ 
    987 static int t_blosc(void *tids) 
    988 { 
    989   int32_t tid = *(int32_t *)tids; 
     1554static void *t_blosc(void *ctxt) 
     1555{ 
     1556  struct thread_context* context = (struct thread_context*)ctxt; 
    9901557  int32_t cbytes, ntdest; 
    9911558  int32_t tblocks;              /* number of blocks per thread */ 
     
    10031570  int32_t nblocks; 
    10041571  int32_t leftover; 
    1005   int32_t *bstarts; 
    1006   uint8_t *src; 
     1572  uint8_t *bstarts; 
     1573  const uint8_t *src; 
    10071574  uint8_t *dest; 
    10081575  uint8_t *tmp; 
    10091576  uint8_t *tmp2; 
    1010  
    1011   while (1) { 
    1012  
    1013     init_sentinels_done = 0;     /* sentinels have to be initialised yet */ 
    1014  
     1577  uint8_t *tmp3; 
     1578  int rc; 
     1579 
     1580  while(1) 
     1581  { 
    10151582    /* Synchronization point for all threads (wait for initialization) */ 
    1016     WAIT_INIT; 
    1017  
    1018     /* Check if thread has been asked to return */ 
    1019     if (end_threads) { 
    1020       return(0); 
    1021     } 
    1022  
    1023     pthread_mutex_lock(&count_mutex); 
    1024     if (!init_sentinels_done) { 
    1025       /* Set sentinels and other global variables */ 
    1026       giveup_code = 1;            /* no error code initially */ 
    1027       nblock = -1;                /* block counter */ 
    1028       init_sentinels_done = 1;    /* sentinels have been initialised */ 
    1029     } 
    1030     pthread_mutex_unlock(&count_mutex); 
     1583    WAIT_INIT(NULL, context->parent_context); 
     1584 
     1585    if(context->parent_context->end_threads) 
     1586    { 
     1587      break; 
     1588    } 
    10311589 
    10321590    /* Get parameters for this thread before entering the main loop */ 
    1033     blocksize = params.blocksize; 
    1034     ebsize = blocksize + params.typesize*(int32_t)sizeof(int32_t); 
    1035     compress = params.compress; 
    1036     flags = params.flags; 
    1037     maxbytes = params.maxbytes; 
    1038     nblocks = params.nblocks; 
    1039     leftover = params.leftover; 
    1040     bstarts = params.bstarts; 
    1041     src = params.src; 
    1042     dest = params.dest; 
    1043     tmp = params.tmp[tid]; 
    1044     tmp2 = params.tmp2[tid]; 
     1591    blocksize = context->parent_context->blocksize; 
     1592    ebsize = blocksize + context->parent_context->typesize * (int32_t)sizeof(int32_t); 
     1593    compress = context->parent_context->compress; 
     1594    flags = *(context->parent_context->header_flags); 
     1595    maxbytes = context->parent_context->destsize; 
     1596    nblocks = context->parent_context->nblocks; 
     1597    leftover = context->parent_context->leftover; 
     1598    bstarts = context->parent_context->bstarts; 
     1599    src = context->parent_context->src; 
     1600    dest = context->parent_context->dest; 
     1601 
     1602    if (blocksize > context->tmpblocksize) 
     1603    { 
     1604      my_free(context->tmp); 
     1605      context->tmp = my_malloc(blocksize + ebsize + blocksize); 
     1606      context->tmp2 = context->tmp + blocksize; 
     1607      context->tmp3 = context->tmp + blocksize + ebsize; 
     1608    } 
     1609 
     1610    tmp = context->tmp; 
     1611    tmp2 = context->tmp2; 
     1612    tmp3 = context->tmp3; 
    10451613 
    10461614    ntbytes = 0;                /* only useful for decompression */ 
     
    10481616    if (compress && !(flags & BLOSC_MEMCPYED)) { 
    10491617      /* Compression always has to follow the block order */ 
    1050       pthread_mutex_lock(&count_mutex); 
    1051       nblock++; 
    1052       nblock_ = nblock; 
    1053       pthread_mutex_unlock(&count_mutex); 
     1618      pthread_mutex_lock(&context->parent_context->count_mutex); 
     1619      context->parent_context->thread_nblock++; 
     1620      nblock_ = context->parent_context->thread_nblock; 
     1621      pthread_mutex_unlock(&context->parent_context->count_mutex); 
    10541622      tblock = nblocks; 
    10551623    } 
     
    10591627 
    10601628      /* Blocks per thread */ 
    1061       tblocks = nblocks / nthreads; 
    1062       leftover2 = nblocks % nthreads; 
     1629      tblocks = nblocks / context->parent_context->numthreads; 
     1630      leftover2 = nblocks % context->parent_context->numthreads; 
    10631631      tblocks = (leftover2>0)? tblocks+1: tblocks; 
    10641632 
    1065       nblock_ = tid*tblocks; 
     1633      nblock_ = context->tid*tblocks; 
    10661634      tblock = nblock_ + tblocks; 
    10671635      if (tblock > nblocks) { 
     
    10721640    /* Loop over blocks */ 
    10731641    leftoverblock = 0; 
    1074     while ((nblock_ < tblock) && giveup_code > 0) { 
     1642    while ((nblock_ < tblock) && context->parent_context->thread_giveup_code > 0) { 
    10751643      bsize = blocksize; 
    10761644      if (nblock_ == (nblocks - 1) && (leftover > 0)) { 
     
    10871655        else { 
    10881656          /* Regular compression */ 
    1089           cbytes = blosc_c(bsize, leftoverblock, 0, ebsize, 
    1090                            src+nblock_*blocksize, tmp2, tmp); 
     1657          cbytes = blosc_c(context->parent_context, bsize, leftoverblock, 0, ebsize, 
     1658                           src+nblock_*blocksize, tmp2, tmp, tmp3); 
    10911659        } 
    10921660      } 
     
    10991667        } 
    11001668        else { 
    1101           cbytes = blosc_d(bsize, leftoverblock, 
    1102                            src+sw32(bstarts[nblock_]), dest+nblock_*blocksize, 
     1669          cbytes = blosc_d(context->parent_context, bsize, leftoverblock, 
     1670                           src + sw32_(bstarts + nblock_ * 4), 
     1671                           dest+nblock_*blocksize, 
    11031672                           tmp, tmp2); 
    11041673        } 
     
    11061675 
    11071676      /* Check whether current thread has to giveup */ 
    1108       if (giveup_code <= 0) { 
     1677      if (context->parent_context->thread_giveup_code <= 0) { 
    11091678        break; 
    11101679      } 
     
    11131682      if (cbytes < 0) {            /* compr/decompr failure */ 
    11141683        /* Set giveup_code error */ 
    1115         pthread_mutex_lock(&count_mutex); 
    1116         giveup_code = cbytes; 
    1117         pthread_mutex_unlock(&count_mutex); 
     1684        pthread_mutex_lock(&context->parent_context->count_mutex); 
     1685        context->parent_context->thread_giveup_code = cbytes; 
     1686        pthread_mutex_unlock(&context->parent_context->count_mutex); 
    11181687        break; 
    11191688      } 
     
    11211690      if (compress && !(flags & BLOSC_MEMCPYED)) { 
    11221691        /* Start critical section */ 
    1123         pthread_mutex_lock(&count_mutex); 
    1124         ntdest = params.ntbytes; 
    1125         bstarts[nblock_] = sw32(ntdest);    /* update block start counter */ 
    1126         if ( (cbytes == 0) || (ntdest+cbytes > (int32_t)maxbytes) ) { 
    1127           giveup_code = 0;                  /* uncompressible buffer */ 
    1128           pthread_mutex_unlock(&count_mutex); 
     1692        pthread_mutex_lock(&context->parent_context->count_mutex); 
     1693        ntdest = context->parent_context->num_output_bytes; 
     1694        _sw32(bstarts + nblock_ * 4, ntdest); /* update block start counter */ 
     1695        if ( (cbytes == 0) || (ntdest+cbytes > maxbytes) ) { 
     1696          context->parent_context->thread_giveup_code = 0;  /* uncompressible buffer */ 
     1697          pthread_mutex_unlock(&context->parent_context->count_mutex); 
    11291698          break; 
    11301699        } 
    1131         nblock++; 
    1132         nblock_ = nblock; 
    1133         params.ntbytes += cbytes;           /* update return bytes counter */ 
    1134         pthread_mutex_unlock(&count_mutex); 
     1700        context->parent_context->thread_nblock++; 
     1701        nblock_ = context->parent_context->thread_nblock; 
     1702        context->parent_context->num_output_bytes += cbytes;           /* update return bytes counter */ 
     1703        pthread_mutex_unlock(&context->parent_context->count_mutex); 
    11351704        /* End of critical section */ 
    11361705 
     
    11471716 
    11481717    /* Sum up all the bytes decompressed */ 
    1149     if ((!compress || (flags & BLOSC_MEMCPYED)) && giveup_code > 0) { 
     1718    if ((!compress || (flags & BLOSC_MEMCPYED)) && context->parent_context->thread_giveup_code > 0) { 
    11501719      /* Update global counter for all threads (decompression only) */ 
    1151       pthread_mutex_lock(&count_mutex); 
    1152       params.ntbytes += ntbytes; 
    1153       pthread_mutex_unlock(&count_mutex); 
     1720      pthread_mutex_lock(&context->parent_context->count_mutex); 
     1721      context->parent_context->num_output_bytes += ntbytes; 
     1722      pthread_mutex_unlock(&context->parent_context->count_mutex); 
    11541723    } 
    11551724 
    11561725    /* Meeting point for all threads (wait for finalization) */ 
    1157     WAIT_FINISH; 
    1158  
    1159   }  /* closes while(1) */ 
    1160  
    1161   /* This should never be reached, but anyway */ 
    1162   return(0); 
    1163 } 
    1164  
    1165  
    1166 static int init_threads(void) 
     1726    WAIT_FINISH(NULL, context->parent_context); 
     1727  } 
     1728 
     1729  /* Cleanup our working space and context */ 
     1730  my_free(context->tmp); 
     1731  my_free(context); 
     1732 
     1733  return(NULL); 
     1734} 
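
The static split of blocks across threads in `t_blosc` (the path taken when not compressing, or when the buffer is memcpy'ed) can be hard to follow inside the diff. Below is a minimal standalone sketch of that arithmetic. The names `block_range` and `thread_block_range` are hypothetical and not part of Blosc, and the final clamp to `nblocks` is an assumption, since the diff elides the body of the `if (tblock > nblocks)` branch.

```c
#include <stdint.h>

/* Hypothetical helper type: half-open range [first, end) of block indices. */
struct block_range {
  int32_t first;
  int32_t end;
};

/* Mirrors the "blocks per thread" arithmetic shown in t_blosc:
   each thread gets ceil(nblocks / numthreads) consecutive blocks. */
static struct block_range thread_block_range(int32_t tid, int32_t nblocks,
                                             int32_t numthreads)
{
  struct block_range r;
  int32_t tblocks = nblocks / numthreads;

  if (nblocks % numthreads > 0) {
    tblocks++;                     /* round up so every block is covered */
  }

  r.first = tid * tblocks;
  r.end = r.first + tblocks;
  if (r.end > nblocks) {
    r.end = nblocks;               /* assumed clamp; body elided in the diff */
  }
  return r;
}
```

With 10 blocks and 4 threads, each thread takes 3 blocks and the last thread is clamped to a single block. When there are few blocks, a trailing thread can end up with `first >= end`; in that case a loop of the form `while (nblock_ < tblock)` does no work for that thread.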
     1735 
     1736 
     1737static int init_threads(struct blosc_context* context) 
    11671738{ 
    11681739  int32_t tid; 
    11691740  int rc2; 
     1741  int32_t ebsize; 
     1742  struct thread_context* thread_context; 
    11701743 
    11711744  /* Initialize mutex and condition variable objects */ 
    1172   pthread_mutex_init(&count_mutex, NULL); 
     1745  pthread_mutex_init(&context->count_mutex, NULL); 
     1746 
     1747  /* Set context thread sentinels */ 
     1748  context->thread_giveup_code = 1; 
     1749  context->thread_nblock = -1; 
    11731750 
    11741751  /* Barrier initialization */ 
    11751752#ifdef _POSIX_BARRIERS_MINE 
    1176   pthread_barrier_init(&barr_init, NULL, nthreads+1); 
    1177   pthread_barrier_init(&barr_finish, NULL, nthreads+1); 
     1753  pthread_barrier_init(&context->barr_init, NULL, context->numthreads+1); 
     1754  pthread_barrier_init(&context->barr_finish, NULL, context->numthreads+1); 
    11781755#else 
    1179   pthread_mutex_init(&count_threads_mutex, NULL); 
    1180   pthread_cond_init(&count_threads_cv, NULL); 
    1181   count_threads = 0;      /* Reset threads counter */ 
     1756  pthread_mutex_init(&context->count_threads_mutex, NULL); 
     1757  pthread_cond_init(&context->count_threads_cv, NULL); 
     1758  context->count_threads = 0;      /* Reset threads counter */ 
    11821759#endif 
    11831760 
    11841761#if !defined(_WIN32) 
    11851762  /* Initialize and set thread detached attribute */ 
    1186   pthread_attr_init(&ct_attr); 
    1187   pthread_attr_setdetachstate(&ct_attr, PTHREAD_CREATE_JOINABLE); 
     1763  pthread_attr_init(&context->ct_attr); 
     1764  pthread_attr_setdetachstate(&context->ct_attr, PTHREAD_CREATE_JOINABLE); 
    11881765#endif 
    11891766 
    11901767  /* Finally, create the threads in joinable state */ 
    1191   for (tid = 0; tid < nthreads; tid++) { 
    1192     tids[tid] = tid; 
     1768  for (tid = 0; tid < context->numthreads; tid++) { 
     1769    context->tids[tid] = tid; 
     1770 
     1771    /* Create a thread context; the thread owns it and will destroy it when finished */ 
     1772    thread_context = (struct thread_context*)my_malloc(sizeof(struct thread_context)); 
     1773    thread_context->parent_context = context; 
     1774    thread_context->tid = tid; 
     1775 
     1776    ebsize = context->blocksize + context->typesize * (int32_t)sizeof(int32_t); 
     1777    thread_context->tmp = my_malloc(context->blocksize + ebsize + context->blocksize); 
     1778    thread_context->tmp2 = thread_context->tmp + context->blocksize; 
     1779    thread_context->tmp3 = thread_context->tmp + context->blocksize + ebsize; 
     1780    thread_context->tmpblocksize = context->blocksize; 
     1781 
    11931782#if !defined(_WIN32) 
    1194     rc2 = pthread_create(&threads[tid], &ct_attr, (void*)t_blosc, 
    1195                         (void *)&tids[tid]); 
     1783    rc2 = pthread_create(&context->threads[tid], &context->ct_attr, t_blosc, (void *)thread_context); 
    11961784#else 
    1197     rc2 = pthread_create(&threads[tid], NULL, (void*)t_blosc, 
    1198                         (void *)&tids[tid]); 
     1785    rc2 = pthread_create(&context->threads[tid], NULL, t_blosc, (void *)thread_context); 
    11991786#endif 
    12001787    if (rc2) { 
     
    12051792  } 
    12061793 
    1207   init_threads_done = 1;                 /* Initialization done! */ 
    1208   pid = (int)getpid();                   /* save the PID for this process */ 
    12091794 
    12101795  return(0); 
    12111796} 
    12121797 
    1213 void blosc_init(void) { 
    1214   /* Init global lock  */ 
    1215   pthread_mutex_init(&global_comp_mutex, NULL); 
    1216   init_lib = 1; 
    1217 } 
    1218  
    1219 int blosc_set_nthreads(int nthreads_new)  
    1220 { 
    1221   int ret; 
    1222  
    1223   /* Check if should initialize (implementing previous 1.2.3 behaviour, 
    1224      where calling blosc_set_nthreads was enough) */ 
    1225   if (!init_lib) blosc_init(); 
    1226  
    1227   /* Take global lock  */ 
    1228   pthread_mutex_lock(&global_comp_mutex); 
    1229    
    1230   ret = blosc_set_nthreads_(nthreads_new); 
    1231   /* Release global lock  */ 
    1232   pthread_mutex_unlock(&global_comp_mutex); 
    1233    
     1798int blosc_get_nthreads(void) 
     1799{ 
     1800  int ret = g_threads; 
     1801 
    12341802  return ret; 
    12351803} 
    12361804 
    1237 int blosc_set_nthreads_(int nthreads_new) 
    1238 { 
    1239   int32_t nthreads_old = nthreads; 
    1240   int32_t t; 
    1241   int rc2; 
    1242   void *status; 
    1243  
    1244   if (nthreads_new > BLOSC_MAX_THREADS) { 
     1805int blosc_set_nthreads(int nthreads_new) 
     1806{ 
     1807  int ret = g_threads; 
     1808 
     1809  /* Check if should initialize */ 
     1810  if (!g_initlib) blosc_init(); 
     1811 
     1812  if (nthreads_new != ret){ 
     1813    /* Re-initialize Blosc */ 
     1814    blosc_destroy(); 
     1815    blosc_init(); 
     1816    g_threads = nthreads_new; 
     1817  } 
     1818 
     1819  return ret; 
     1820} 
     1821 
     1822int blosc_set_nthreads_(struct blosc_context* context) 
     1823{ 
     1824  if (context->numthreads > BLOSC_MAX_THREADS) { 
    12451825    fprintf(stderr, 
    12461826            "Error.  nthreads cannot be larger than BLOSC_MAX_THREADS (%d)", 
     
    12481828    return -1; 
    12491829  } 
    1250   else if (nthreads_new <= 0) { 
     1830  else if (context->numthreads <= 0) { 
    12511831    fprintf(stderr, "Error.  nthreads must be a positive integer"); 
    12521832    return -1; 
    12531833  } 
    12541834 
    1255   /* Only join threads if they are not initialized or if our PID is 
    1256      different from that in pid var (probably means that we are a 
    1257      subprocess, and thus threads are non-existent). */ 
    1258   if (nthreads > 1 && init_threads_done && pid == getpid()) { 
    1259       /* Tell all existing threads to finish */ 
    1260       end_threads = 1; 
    1261       /* Synchronization point for all threads (wait for initialization) */ 
    1262       WAIT_INIT; 
    1263       /* Join exiting threads */ 
    1264       for (t=0; t<nthreads; t++) { 
    1265         rc2 = pthread_join(threads[t], &status); 
    1266         if (rc2) { 
    1267           fprintf(stderr, "ERROR; return code from pthread_join() is %d\n", rc2); 
    1268           fprintf(stderr, "\tError detail: %s\n", strerror(rc2)); 
    1269           return(-1); 
    1270         } 
    1271       } 
    1272       init_threads_done = 0; 
    1273       end_threads = 0; 
    1274     } 
    1275  
    1276   /* Launch a new pool of threads (if necessary) */ 
    1277   nthreads = nthreads_new; 
    1278   if (nthreads > 1 && (!init_threads_done || pid != getpid())) { 
    1279     init_threads(); 
    1280   } 
    1281  
    1282   return nthreads_old; 
    1283 } 
    1284  
    1285  
    1286 /* Free possible memory temporaries and thread resources */ 
    1287 int blosc_free_resources(void) 
     1835  /* Launch a new pool of threads */ 
     1836  if (context->numthreads > 1 && context->numthreads != context->threads_started) { 
     1837    blosc_release_threadpool(context); 
     1838    init_threads(context); 
     1839  } 
     1840 
     1841  /* We have now started the threads */ 
     1842  context->threads_started = context->numthreads; 
     1843 
     1844  return context->numthreads; 
     1845} 
     1846 
     1847char* blosc_get_compressor(void) 
     1848{ 
     1849  char* compname; 
     1850  blosc_compcode_to_compname(g_compressor, &compname); 
     1851 
     1852  return compname; 
     1853} 
     1854 
     1855int blosc_set_compressor(const char *compname) 
     1856{ 
     1857  int code = blosc_compname_to_compcode(compname); 
     1858 
     1859  g_compressor = code; 
     1860 
     1861  /* Check if should initialize */ 
     1862  if (!g_initlib) blosc_init(); 
     1863 
     1864  return code; 
     1865} 
     1866 
     1867char* blosc_list_compressors(void) 
     1868{ 
     1869  static int compressors_list_done = 0; 
     1870  static char ret[256]; 
     1871 
     1872  if (compressors_list_done) return ret; 
     1873  ret[0] = '\0'; 
     1874  strcat(ret, BLOSC_BLOSCLZ_COMPNAME); 
     1875#if defined(HAVE_LZ4) 
     1876  strcat(ret, ","); strcat(ret, BLOSC_LZ4_COMPNAME); 
     1877  strcat(ret, ","); strcat(ret, BLOSC_LZ4HC_COMPNAME); 
     1878#endif /* HAVE_LZ4 */ 
     1879#if defined(HAVE_SNAPPY) 
     1880  strcat(ret, ","); strcat(ret, BLOSC_SNAPPY_COMPNAME); 
     1881#endif /* HAVE_SNAPPY */ 
     1882#if defined(HAVE_ZLIB) 
     1883  strcat(ret, ","); strcat(ret, BLOSC_ZLIB_COMPNAME); 
     1884#endif /* HAVE_ZLIB */ 
     1885#if defined(HAVE_ZSTD) 
     1886  strcat(ret, ","); strcat(ret, BLOSC_ZSTD_COMPNAME); 
     1887#endif /* HAVE_ZSTD */ 
     1888  compressors_list_done = 1; 
     1889  return ret; 
     1890} 
     1891 
     1892char* blosc_get_version_string(void) 
     1893{ 
     1894  static char ret[256]; 
     1895  strcpy(ret, BLOSC_VERSION_STRING); 
     1896  return ret; 
     1897} 
     1898 
     1899int blosc_get_complib_info(char *compname, char **complib, char **version) 
     1900{ 
     1901  int clibcode; 
     1902  char *clibname; 
     1903  char *clibversion = "unknown"; 
     1904 
     1905#if (defined(HAVE_LZ4) && defined(LZ4_VERSION_MAJOR)) || (defined(HAVE_SNAPPY) && defined(SNAPPY_VERSION)) || defined(ZSTD_VERSION_MAJOR) 
     1906  char sbuffer[256]; 
     1907#endif 
     1908 
     1909  clibcode = compname_to_clibcode(compname); 
     1910  clibname = clibcode_to_clibname(clibcode); 
     1911 
     1912  /* complib version */ 
     1913  if (clibcode == BLOSC_BLOSCLZ_LIB) { 
     1914    clibversion = BLOSCLZ_VERSION_STRING; 
     1915  } 
     1916#if defined(HAVE_LZ4) 
     1917  else if (clibcode == BLOSC_LZ4_LIB) { 
     1918#if defined(LZ4_VERSION_MAJOR) 
     1919    sprintf(sbuffer, "%d.%d.%d", 
     1920            LZ4_VERSION_MAJOR, LZ4_VERSION_MINOR, LZ4_VERSION_RELEASE); 
     1921    clibversion = sbuffer; 
     1922#endif /* LZ4_VERSION_MAJOR */ 
     1923  } 
     1924#endif /* HAVE_LZ4 */ 
     1925#if defined(HAVE_SNAPPY) 
     1926  else if (clibcode == BLOSC_SNAPPY_LIB) { 
     1927#if defined(SNAPPY_VERSION) 
     1928    sprintf(sbuffer, "%d.%d.%d", SNAPPY_MAJOR, SNAPPY_MINOR, SNAPPY_PATCHLEVEL); 
     1929    clibversion = sbuffer; 
     1930#endif /* SNAPPY_VERSION */ 
     1931  } 
     1932#endif /* HAVE_SNAPPY */ 
     1933#if defined(HAVE_ZLIB) 
     1934  else if (clibcode == BLOSC_ZLIB_LIB) { 
     1935    clibversion = ZLIB_VERSION; 
     1936  } 
     1937#endif /* HAVE_ZLIB */ 
     1938#if defined(HAVE_ZSTD) 
     1939  else if (clibcode == BLOSC_ZSTD_LIB) { 
     1940    sprintf(sbuffer, "%d.%d.%d", 
     1941            ZSTD_VERSION_MAJOR, ZSTD_VERSION_MINOR, ZSTD_VERSION_RELEASE); 
     1942    clibversion = sbuffer; 
     1943  } 
     1944#endif /* HAVE_ZSTD */ 
     1945 
     1946  *complib = strdup(clibname); 
     1947  *version = strdup(clibversion); 
     1948  return clibcode; 
     1949} 
     1950 
     1951/* Return `nbytes`, `cbytes` and `blocksize` from a compressed buffer. */ 
     1952void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 
     1953                         size_t *cbytes, size_t *blocksize) 
     1954{ 
     1955  uint8_t *_src = (uint8_t *)(cbuffer);    /* current pos for source buffer */ 
     1956  uint8_t version, versionlz;              /* versions for compressed header */ 
     1957 
     1958  /* Read the version info (could be useful in the future) */ 
     1959  version = _src[0];                       /* blosc format version */ 
     1960  versionlz = _src[1];                     /* blosclz format version */ 
     1961 
     1962  version += 0;                            /* shut up compiler warning */ 
     1963  versionlz += 0;                          /* shut up compiler warning */ 
     1964 
     1965  /* Read the interesting values */ 
     1966  *nbytes = (size_t)sw32_(_src + 4);       /* uncompressed buffer size */ 
     1967  *blocksize = (size_t)sw32_(_src + 8);    /* block size */ 
     1968  *cbytes = (size_t)sw32_(_src + 12);      /* compressed buffer size */ 
     1969} 
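
`blosc_cbuffer_sizes` reads three 32-bit fields at byte offsets 4, 8 and 12 of the compressed buffer's header. The sketch below shows that parsing in isolation. It assumes the fields are stored little-endian, which is what the `sw32_` helper appears to handle; `read_le32`, `header_sizes` and `parse_header_sizes` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sw32_: assemble a 32-bit little-endian value
   byte by byte, so the result does not depend on host endianness or on
   the alignment of p. */
static int32_t read_le32(const uint8_t *p)
{
  uint32_t v = (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
  return (int32_t)v;
}

/* Hypothetical container for the three size fields. */
struct header_sizes {
  size_t nbytes;     /* uncompressed buffer size */
  size_t blocksize;  /* block size */
  size_t cbytes;     /* compressed buffer size */
};

static struct header_sizes parse_header_sizes(const uint8_t *hdr)
{
  struct header_sizes s;

  assert(hdr != NULL);
  s.nbytes = (size_t)read_le32(hdr + 4);
  s.blocksize = (size_t)read_le32(hdr + 8);
  s.cbytes = (size_t)read_le32(hdr + 12);
  return s;
}
```

Reading the bytes individually, rather than casting to `int32_t *` as the removed `sw32(((int32_t *)_src)[0])` lines did, avoids unaligned loads. That is presumably one reason the newer code takes a byte pointer.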
     1970 
     1971 
     1972/* Return `typesize` and `flags` from a compressed buffer. */ 
     1973void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 
     1974                            int *flags) 
     1975{ 
     1976  uint8_t *_src = (uint8_t *)(cbuffer);  /* current pos for source buffer */ 
     1977  uint8_t version, versionlz;            /* versions for compressed header */ 
     1978 
     1979  /* Read the version info (could be useful in the future) */ 
     1980  version = _src[0];                     /* blosc format version */ 
     1981  versionlz = _src[1];                   /* blosclz format version */ 
     1982 
     1983  version += 0;                             /* shut up compiler warning */ 
     1984  versionlz += 0;                           /* shut up compiler warning */ 
     1985 
     1986  /* Read the interesting values */ 
     1987  *flags = (int)_src[2];                 /* flags */ 
     1988  *typesize = (size_t)_src[3];           /* typesize */ 
     1989} 
     1990 
     1991 
     1992/* Return version information from a compressed buffer. */ 
     1993void blosc_cbuffer_versions(const void *cbuffer, int *version, 
     1994                            int *versionlz) 
     1995{ 
     1996  uint8_t *_src = (uint8_t *)(cbuffer);  /* current pos for source buffer */ 
     1997 
     1998  /* Read the version info */ 
     1999  *version = (int)_src[0];         /* blosc format version */ 
     2000  *versionlz = (int)_src[1];       /* Lempel-Ziv compressor format version */ 
     2001} 
     2002 
     2003 
     2004/* Return the compressor library/format used in a compressed buffer. */ 
     2005char *blosc_cbuffer_complib(const void *cbuffer) 
     2006{ 
     2007  uint8_t *_src = (uint8_t *)(cbuffer);  /* current pos for source buffer */ 
     2008  int clibcode; 
     2009  char *complib; 
     2010 
     2011  /* Read the compressor format/library info */ 
     2012  clibcode = (_src[2] & 0xe0) >> 5; 
     2013  complib = clibcode_to_clibname(clibcode); 
     2014  return complib; 
     2015} 
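
The expression `(_src[2] & 0xe0) >> 5` in `blosc_cbuffer_complib` extracts the top three bits of the header's flags byte. A small sketch of that bit manipulation follows; `flags_to_clibcode` is a hypothetical name, and how the resulting code maps to a library name is left to `clibcode_to_clibname`.

```c
#include <stdint.h>

/* Keep bits 5..7 of the flags byte and shift them down to 0..7. */
static int flags_to_clibcode(uint8_t flags)
{
  return (flags & 0xe0) >> 5;
}
```

The low five bits, such as the one tested by `BLOSC_MEMCPYED`, are masked off, so they do not affect the result.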
     2016 
     2017/* Get the internal blocksize to be used during compression.  0 means 
     2018   that an automatic blocksize is computed internally. */ 
     2019int blosc_get_blocksize(void) 
     2020{ 
     2021  return (int)g_force_blocksize; 
     2022} 
     2023 
     2024/* Force the use of a specific blocksize.  If 0, an automatic 
     2025   blocksize will be used (the default). */ 
     2026void blosc_set_blocksize(size_t size) 
     2027{ 
     2028  g_force_blocksize = (int32_t)size; 
     2029} 
     2030 
     2031void blosc_init(void) 
     2032{ 
     2033  /* Return if we are already initialized */ 
     2034  if (g_initlib) return; 
     2035 
     2036  pthread_mutex_init(&global_comp_mutex, NULL); 
     2037  g_global_context = (struct blosc_context*)my_malloc(sizeof(struct blosc_context)); 
     2038  g_global_context->threads_started = 0; 
     2039  g_initlib = 1; 
     2040} 
     2041 
     2042void blosc_destroy(void) 
     2043{ 
     2044  /* Return if Blosc is not initialized */ 
     2045  if (!g_initlib) return; 
     2046 
     2047  g_initlib = 0; 
     2048  blosc_release_threadpool(g_global_context); 
     2049  my_free(g_global_context); 
     2050  pthread_mutex_destroy(&global_comp_mutex); 
     2051} 
     2052 
     2053int blosc_release_threadpool(struct blosc_context* context) 
    12882054{ 
    12892055  int32_t t; 
     2056  void* status; 
     2057  int rc; 
    12902058  int rc2; 
    1291   void *status; 
    1292   
    1293    /* Take global lock  */ 
    1294   pthread_mutex_lock(&global_comp_mutex); 
    1295  
    1296   /* Release temporaries */ 
    1297   if (init_temps_done) { 
    1298     release_temporaries(); 
    1299   } 
    1300  
    1301   /* Finish the possible thread pool */ 
    1302   if (nthreads > 1 && init_threads_done) { 
     2059 
     2060  if (context->threads_started > 0) 
     2061  { 
    13032062    /* Tell all existing threads to finish */ 
    1304     end_threads = 1; 
    1305     /* Synchronization point for all threads (wait for initialization) */ 
    1306     WAIT_INIT; 
     2063    context->end_threads = 1; 
     2064 
     2065    /* Sync threads */ 
     2066    WAIT_INIT(-1, context); 
     2067 
    13072068    /* Join exiting threads */ 
    1308     for (t=0; t<nthreads; t++) { 
    1309       rc2 = pthread_join(threads[t], &status); 
     2069    for (t=0; t<context->threads_started; t++) { 
     2070      rc2 = pthread_join(context->threads[t], &status); 
    13102071      if (rc2) { 
    13112072        fprintf(stderr, "ERROR; return code from pthread_join() is %d\n", rc2); 
    13122073        fprintf(stderr, "\tError detail: %s\n", strerror(rc2)); 
    1313         return(-1); 
    13142074      } 
    13152075    } 
    13162076 
    13172077    /* Release mutex and condition variable objects */ 
    1318     pthread_mutex_destroy(&count_mutex); 
     2078    pthread_mutex_destroy(&context->count_mutex); 
    13192079 
    13202080    /* Barriers */ 
    1321 #ifdef _POSIX_BARRIERS_MINE 
    1322     pthread_barrier_destroy(&barr_init); 
    1323     pthread_barrier_destroy(&barr_finish); 
    1324 #else 
    1325     pthread_mutex_destroy(&count_threads_mutex); 
    1326     pthread_cond_destroy(&count_threads_cv); 
    1327 #endif 
    1328  
    1329     /* Thread attributes */ 
    1330 #if !defined(_WIN32) 
    1331     pthread_attr_destroy(&ct_attr); 
    1332 #endif 
    1333  
    1334     init_threads_done = 0; 
    1335     end_threads = 0; 
    1336   } 
    1337    /* Release global lock  */ 
    1338   pthread_mutex_unlock(&global_comp_mutex); 
    1339   return(0); 
    1340  
    1341 } 
    1342  
    1343 void blosc_destroy(void) { 
    1344   /* Free the resources */ 
    1345   blosc_free_resources(); 
    1346   /* Destroy global lock */ 
    1347   pthread_mutex_destroy(&global_comp_mutex); 
    1348 } 
    1349  
    1350 /* Return `nbytes`, `cbytes` and `blocksize` from a compressed buffer. */ 
    1351 void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 
    1352                          size_t *cbytes, size_t *blocksize) 
    1353 { 
    1354   uint8_t *_src = (uint8_t *)(cbuffer);    /* current pos for source buffer */ 
    1355   uint8_t version, versionlz;              /* versions for compressed header */ 
    1356  
    1357   /* Read the version info (could be useful in the future) */ 
    1358   version = _src[0];                         /* blosc format version */ 
    1359   versionlz = _src[1];                       /* blosclz format version */ 
    1360  
    1361   version += 0;                             /* shut up compiler warning */ 
    1362   versionlz += 0;                           /* shut up compiler warning */ 
    1363  
    1364   /* Read the interesting values */ 
    1365   _src += 4; 
    1366   *nbytes = (size_t)sw32(((int32_t *)_src)[0]);  /* uncompressed buffer size */ 
    1367   *blocksize = (size_t)sw32(((int32_t *)_src)[1]);   /* block size */ 
    1368   *cbytes = (size_t)sw32(((int32_t *)_src)[2]);  /* compressed buffer size */ 
    1369 } 
    1370  
    1371  
    1372 /* Return `typesize` and `flags` from a compressed buffer. */ 
    1373 void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 
    1374                             int *flags) 
    1375 { 
    1376   uint8_t *_src = (uint8_t *)(cbuffer);  /* current pos for source buffer */ 
    1377   uint8_t version, versionlz;            /* versions for compressed header */ 
    1378  
    1379   /* Read the version info (could be useful in the future) */ 
    1380   version = _src[0];                     /* blosc format version */ 
    1381   versionlz = _src[1];                   /* blosclz format version */ 
    1382  
    1383   version += 0;                             /* shut up compiler warning */ 
    1384   versionlz += 0;                           /* shut up compiler warning */ 
    1385  
    1386   /* Read the interesting values */ 
    1387   *flags = (int)_src[2];                 /* flags */ 
    1388   *typesize = (size_t)_src[3];           /* typesize */ 
    1389 } 
    1390  
    1391  
    1392 /* Return version information from a compressed buffer. */ 
    1393 void blosc_cbuffer_versions(const void *cbuffer, int *version, 
    1394                             int *versionlz) 
    1395 { 
    1396   uint8_t *_src = (uint8_t *)(cbuffer);  /* current pos for source buffer */ 
    1397  
    1398   /* Read the version info */ 
    1399   *version = (int)_src[0];             /* blosc format version */ 
    1400   *versionlz = (int)_src[1];           /* blosclz format version */ 
    1401 } 
    1402  
    1403  
    1404 /* Force the use of a specific blocksize.  If 0, an automatic 
    1405    blocksize will be used (the default). */ 
    1406 void blosc_set_blocksize(size_t size) 
    1407 { 
    1408   /* Take global lock  */ 
    1409   pthread_mutex_lock(&global_comp_mutex); 
    1410    
    1411   force_blocksize = (int32_t)size; 
    1412    
    1413    /* Release global lock  */ 
    1414   pthread_mutex_unlock(&global_comp_mutex); 
    1415 } 
     2081  #ifdef _POSIX_BARRIERS_MINE 
     2082      pthread_barrier_destroy(&context->barr_init); 
     2083      pthread_barrier_destroy(&context->barr_finish); 
     2084  #else 
     2085      pthread_mutex_destroy(&context->count_threads_mutex); 
     2086      pthread_cond_destroy(&context->count_threads_cv); 
     2087  #endif 
     2088 
     2089      /* Thread attributes */ 
     2090  #if !defined(_WIN32) 
     2091      pthread_attr_destroy(&context->ct_attr); 
     2092  #endif 
     2093 
     2094  } 
     2095 
     2096  context->threads_started = 0; 
     2097 
     2098  return 0; 
     2099} 
     2100 
     2101int blosc_free_resources(void) 
     2102{ 
     2103  /* Return if Blosc is not initialized */ 
     2104  if (!g_initlib) return -1; 
     2105 
     2106  return blosc_release_threadpool(g_global_context); 
     2107} 
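
Note on the lifecycle functions above: in this revision `blosc_init()`, `blosc_destroy()` and `blosc_free_resources()` all guard on the `g_initlib` flag, so repeated calls are harmless and `blosc_free_resources()` reports -1 when the library is not initialized. The following is a minimal, self-contained sketch of that guard pattern only. All `demo_*` names and the counters are hypothetical and introduced for illustration; this is not the Blosc implementation, it omits the mutex and thread pool, and like the original it does not make the initialization itself thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct blosc_context. */
struct demo_context {
  int threads_started;
};

static struct demo_context *demo_ctx = NULL;
static int demo_initlib = 0;        /* plays the role of g_initlib */
static int demo_init_count = 0;     /* number of real initializations */
static int demo_destroy_count = 0;  /* number of real teardowns */

void demo_init(void)
{
  /* Return if we are already initialized */
  if (demo_initlib) return;

  demo_ctx = malloc(sizeof(*demo_ctx));
  assert(demo_ctx != NULL);
  demo_ctx->threads_started = 0;
  demo_initlib = 1;
  demo_init_count++;
}

int demo_free_resources(void)
{
  /* Report an error if the library is not initialized */
  if (!demo_initlib) return -1;

  /* A real implementation would join and release worker threads here. */
  demo_ctx->threads_started = 0;
  return 0;
}

void demo_destroy(void)
{
  /* Return if the library is not initialized */
  if (!demo_initlib) return;

  demo_initlib = 0;
  free(demo_ctx);
  demo_ctx = NULL;
  demo_destroy_count++;
}
```

Calling `demo_init()` twice allocates the context once, and a second `demo_destroy()` is a no-op, which is the behaviour the `g_initlib` checks are meant to provide.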
  • thirdparty/blosc/blosc.h

    r00587dc r981e22c  
    11/********************************************************************* 
    2   Blosc - Blocked Suffling and Compression Library 
    3  
    4   Author: Francesc Alted <f[email protected]> 
     2  Blosc - Blocked Shuffling and Compression Library 
     3 
     4  Author: Francesc Alted <f[email protected]> 
    55 
    66  See LICENSES/BLOSC.txt for details about copyright and rights to use. 
    77**********************************************************************/ 
    8  
    9 #include <limits.h> 
    10  
    118#ifndef BLOSC_H 
    129#define BLOSC_H 
    1310 
     11#include <limits.h> 
     12#include <stdlib.h> 
     13#include "blosc-export.h" 
     14 
     15#ifdef __cplusplus 
     16extern "C" { 
     17#endif 
     18 
    1419/* Version numbers */ 
    1520#define BLOSC_VERSION_MAJOR    1    /* for major interface/format changes  */ 
    16 #define BLOSC_VERSION_MINOR    2    /* for minor interface/format changes  */ 
    17 #define BLOSC_VERSION_RELEASE  3    /* for tweaks, bug-fixes, or development */ 
    18  
    19 #define BLOSC_VERSION_STRING   "1.2.3"  /* string version.  Sync with above! */ 
     21#define BLOSC_VERSION_MINOR    10   /* for minor interface/format changes  */ 
     22#define BLOSC_VERSION_RELEASE  1    /* for tweaks, bug-fixes, or development */ 
     23 
     24#define BLOSC_VERSION_STRING   "1.10.1.dev"  /* string version.  Sync with above! */ 
    2025#define BLOSC_VERSION_REVISION "$Rev$"   /* revision version */ 
    21 #define BLOSC_VERSION_DATE     "$Date:: 2013-05-17 #$"    /* date version */ 
    22  
    23 /* The *_VERS_FORMAT should be just 1-byte long */ 
     26#define BLOSC_VERSION_DATE     "$Date:: 2016-07-20 #$"    /* date version */ 
     27 
     28#define BLOSCLZ_VERSION_STRING "1.0.5"   /* the internal compressor version */ 
     29 
     30/* The *_FORMAT symbols should be just 1-byte long */ 
    2431#define BLOSC_VERSION_FORMAT    2   /* Blosc format version, starting at 1 */ 
    25 #define BLOSCLZ_VERSION_FORMAT  1   /* Blosclz format version, starting at 1 */ 
    26  
    27 /* The combined blosc and blosclz formats */ 
    28 #define BLOSC_VERSION_CFORMAT (BLOSC_VERSION_FORMAT << 8) & (BLOSCLZ_VERSION_FORMAT) 
    2932 
    3033/* Minimum header length */ 
     
    3639#define BLOSC_MAX_OVERHEAD BLOSC_MIN_HEADER_LENGTH 
    3740 
    38 /* Maximum buffer size to be compressed */ 
     41/* Maximum source buffer size to be compressed */ 
    3942#define BLOSC_MAX_BUFFERSIZE (INT_MAX - BLOSC_MAX_OVERHEAD) 
    4043 
    41 /* Maximum typesize before considering buffer as a stream of bytes */ 
     44/* Maximum typesize before considering source buffer as a stream of bytes */ 
    4245#define BLOSC_MAX_TYPESIZE 255         /* Cannot be larger than 255 */ 
    4346 
     
    4548#define BLOSC_MAX_THREADS 256 
    4649 
     50/* Codes for shuffling (see blosc_compress) */ 
     51#define BLOSC_NOSHUFFLE   0  /* no shuffle */ 
     52#define BLOSC_SHUFFLE     1  /* byte-wise shuffle */ 
     53#define BLOSC_BITSHUFFLE  2  /* bit-wise shuffle */ 
     54 
    4755/* Codes for internal flags (see blosc_cbuffer_metainfo) */ 
    48 #define BLOSC_DOSHUFFLE 0x1 
    49 #define BLOSC_MEMCPYED  0x2 
    50  
    51  
    52  
    53 /** 
    54   Initialize the Blosc library. You must call this previous to any other 
    55   Blosc call, and make sure that you call this in a non-threaded environment. 
    56   Other Blosc calls can be called in a threaded environment, if desired. 
    57  
    58  */ 
    59  
    60 void blosc_init(void); 
    61  
    62  
    63 /** 
    64  
    65   Destroy the Blosc library environment. You must call this after to you are 
    66   done with all the Blosc calls, and make sure that you call this in a 
    67   non-threaded environment. 
    68  
    69  */ 
    70  
    71 void blosc_destroy(void); 
     56#define BLOSC_DOSHUFFLE    0x1  /* byte-wise shuffle */ 
     57#define BLOSC_MEMCPYED     0x2  /* plain copy */ 
     58#define BLOSC_DOBITSHUFFLE 0x4  /* bit-wise shuffle */ 
     59 
     60/* Codes for the different compressors shipped with Blosc */ 
     61#define BLOSC_BLOSCLZ   0 
     62#define BLOSC_LZ4       1 
     63#define BLOSC_LZ4HC     2 
     64#define BLOSC_SNAPPY    3 
     65#define BLOSC_ZLIB      4 
     66#define BLOSC_ZSTD      5 
     67 
     68/* Names for the different compressors shipped with Blosc */ 
     69#define BLOSC_BLOSCLZ_COMPNAME   "blosclz" 
     70#define BLOSC_LZ4_COMPNAME       "lz4" 
     71#define BLOSC_LZ4HC_COMPNAME     "lz4hc" 
     72#define BLOSC_SNAPPY_COMPNAME    "snappy" 
     73#define BLOSC_ZLIB_COMPNAME      "zlib" 
     74#define BLOSC_ZSTD_COMPNAME      "zstd" 
     75 
     76/* Codes for compression libraries shipped with Blosc (code must be < 8) */ 
     77#define BLOSC_BLOSCLZ_LIB   0 
     78#define BLOSC_LZ4_LIB       1 
     79#define BLOSC_SNAPPY_LIB    2 
     80#define BLOSC_ZLIB_LIB      3 
     81#define BLOSC_ZSTD_LIB      4 
     82 
     83/* Names for the different compression libraries shipped with Blosc */ 
     84#define BLOSC_BLOSCLZ_LIBNAME   "BloscLZ" 
     85#define BLOSC_LZ4_LIBNAME       "LZ4" 
     86#define BLOSC_SNAPPY_LIBNAME    "Snappy" 
     87#define BLOSC_ZLIB_LIBNAME      "Zlib" 
     88#define BLOSC_ZSTD_LIBNAME      "Zstd" 
     89 
     90/* The codes for compressor formats shipped with Blosc */ 
     91#define BLOSC_BLOSCLZ_FORMAT  BLOSC_BLOSCLZ_LIB 
     92#define BLOSC_LZ4_FORMAT      BLOSC_LZ4_LIB 
     93#define BLOSC_LZ4HC_FORMAT    BLOSC_LZ4_LIB /* LZ4HC and LZ4 share the same format */ 
     94#define BLOSC_SNAPPY_FORMAT   BLOSC_SNAPPY_LIB 
     95#define BLOSC_ZLIB_FORMAT     BLOSC_ZLIB_LIB 
     96#define BLOSC_ZSTD_FORMAT     BLOSC_ZSTD_LIB 
     97 
     98 
     99/* The version formats for compressors shipped with Blosc */ 
     100/* All versions here start at 1 */
     101#define BLOSC_BLOSCLZ_VERSION_FORMAT  1 
     102#define BLOSC_LZ4_VERSION_FORMAT      1 
     103#define BLOSC_LZ4HC_VERSION_FORMAT    1  /* LZ4HC and LZ4 share the same format */ 
     104#define BLOSC_SNAPPY_VERSION_FORMAT   1 
     105#define BLOSC_ZLIB_VERSION_FORMAT     1 
     106#define BLOSC_ZSTD_VERSION_FORMAT     1 
     107 
     108 
     109/** 
     110  Initialize the Blosc library environment. 
     111 
     112  You must call this before any other Blosc call, unless you want
     113  Blosc to be used simultaneously in a multi-threaded environment, in 
     114  which case you should *exclusively* use the 
     115  blosc_compress_ctx()/blosc_decompress_ctx() pair (see below). 
     116  */ 
     117BLOSC_EXPORT void blosc_init(void); 
     118 
     119 
     120/** 
     121  Destroy the Blosc library environment. 
     122 
     123  You must call this after you are done with all the Blosc calls,
     124  unless you have not used blosc_init() before (see blosc_init() 
     125  above). 
     126  */ 
     127BLOSC_EXPORT void blosc_destroy(void); 
    72128 
    73129 
    74130/** 
    75131  Compress a block of data in the `src` buffer and returns the size of 
    76   compressed block.  The size of `src` buffer is specified by 
     132  the compressed block.  The size of `src` buffer is specified by 
    77133  `nbytes`.  There is not a minimum for `src` buffer size (`nbytes`). 
    78134 
     
    81137 
    82138  `doshuffle` specifies whether the shuffle compression preconditioner 
    83   should be applied or not.  0 means not applying it and 1 means 
    84   applying it. 
     139  should be applied or not.  BLOSC_NOSHUFFLE means not applying it, 
     140  BLOSC_SHUFFLE means applying it at a byte level and BLOSC_BITSHUFFLE 
     141  at a bit level (slower but may achieve better entropy alignment). 
    85142 
    86143  `typesize` is the number of bytes for the atomic type in binary 
    87144  `src` buffer.  This is mainly useful for the shuffle preconditioner. 
    88   Only a typesize > 1 will allow the shuffle to work. 
     145  For implementation reasons, only a 1 < typesize < 256 will allow the 
     146  shuffle filter to work.  When typesize is not in this range, shuffle 
     147  will be silently disabled. 
    89148 
    90149  The `dest` buffer must have at least the size of `destsize`.  Blosc 
     
    93152  The `src` buffer and the `dest` buffer can not overlap. 
    94153 
     154  Compression is memory safe and guaranteed not to write the `dest` 
     155  buffer more than what is specified in `destsize`. 
     156 
    95157  If `src` buffer cannot be compressed into `destsize`, the return 
    96158  value is zero and you should discard the contents of the `dest` 
     
    101163  together with the buffer data causing this and compression settings. 
    102164 
    103   Compression is memory safe and guaranteed not to write the `dest` 
    104   buffer more than what is specified in `destsize`.  However, it is 
    105   not re-entrant and not thread-safe (despite the fact that it uses 
    106   threads internally). 
    107  */ 
    108  
    109 int blosc_compress(int clevel, int doshuffle, size_t typesize, size_t nbytes, 
    110                    const void *src, void *dest, size_t destsize); 
    111  
     165  Environment variables 
     166  --------------------- 
     167 
     168  blosc_compress() honors different environment variables to control 
     169  internal parameters without the need to do that programmatically.
     170  Here are the ones supported: 
     171 
     172  BLOSC_CLEVEL=(INTEGER): This will overwrite the `clevel` parameter 
     173  before the compression process starts. 
     174 
     175  BLOSC_SHUFFLE=[NOSHUFFLE | SHUFFLE | BITSHUFFLE]: This will 
     176  overwrite the `doshuffle` parameter before the compression process 
     177  starts. 
     178 
     179  BLOSC_TYPESIZE=(INTEGER): This will overwrite the `typesize` 
     180  parameter before the compression process starts. 
     181 
     182  BLOSC_COMPRESSOR=[BLOSCLZ | LZ4 | LZ4HC | SNAPPY | ZLIB]: This will 
     183  call blosc_set_compressor(BLOSC_COMPRESSOR) before the compression 
     184  process starts. 
     185 
     186  BLOSC_NTHREADS=(INTEGER): This will call 
     187  blosc_set_nthreads(BLOSC_NTHREADS) before the compression process 
     188  starts. 
     189 
     190  BLOSC_BLOCKSIZE=(INTEGER): This will call 
     191  blosc_set_blocksize(BLOSC_BLOCKSIZE) before the compression process 
     192  starts.  *NOTE:* The blocksize is a critical parameter with 
     193  important restrictions in the allowed values, so use this with care. 
     194 
     195  BLOSC_NOLOCK=(ANY VALUE): This will call blosc_compress_ctx() under 
     196  the hood, with the `compressor`, `blocksize` and 
     197  `numinternalthreads` parameters set to the same as the last calls to 
     198  blosc_set_compressor(), blosc_set_blocksize() and 
     199  blosc_set_nthreads().  BLOSC_CLEVEL, BLOSC_SHUFFLE, BLOSC_TYPESIZE 
     200  environment vars will also be honored. 
     201  */ 
     202BLOSC_EXPORT int blosc_compress(int clevel, int doshuffle, size_t typesize, 
     203                                size_t nbytes, const void *src, void *dest, 
     204                                size_t destsize); 
     205 
     206 
     207/** 
     208  Context interface to blosc compression. This does not require a call 
     209  to blosc_init() and can be called from multithreaded applications 
     210  without the global lock being used, so allowing Blosc be executed 
     211  simultaneously in those scenarios. 
     212 
     213  It uses the same parameters as the blosc_compress() function plus:
     214 
     215  `compressor`: the string representing the type of compressor to use. 
     216 
     217  `blocksize`: the requested size of the compressed blocks.  If 0, an 
     218   automatic blocksize will be used. 
     219 
     220  `numinternalthreads`: the number of threads to use internally. 
     221 
     222  A negative return value means that an internal error happened.  This 
     223  should never happen.  If you see this, please report it back 
     224  together with the buffer data causing this and compression settings. 
     225*/ 
     226BLOSC_EXPORT int blosc_compress_ctx(int clevel, int doshuffle, size_t typesize, 
     227                                    size_t nbytes, const void* src, void* dest, 
     228                                    size_t destsize, const char* compressor, 
     229                                    size_t blocksize, int numinternalthreads); 
    112230 
    113231/** 
    114232  Decompress a block of compressed data in `src`, put the result in 
    115   `dest` and returns the size of the decompressed block. If error 
    116   occurs, e.g. the compressed data is corrupted or the output buffer 
    117   is not large enough, then 0 (zero) or a negative value will be 
    118   returned instead. 
     233  `dest` and returns the size of the decompressed block. 
    119234 
    120235  The `src` buffer and the `dest` buffer can not overlap. 
    121236 
    122237  Decompression is memory safe and guaranteed not to write the `dest` 
    123   buffer more than what is specified in `destsize`.  However, it is 
    124   not re-entrant and not thread-safe (despite the fact that it uses 
    125   threads internally). 
     238  buffer more than what is specified in `destsize`. 
     239 
     240  If an error occurs, e.g. the compressed data is corrupted or the 
     241  output buffer is not large enough, then 0 (zero) or a negative value 
     242  will be returned instead. 
     243 
     244  Environment variables 
     245  --------------------- 
     246 
     247  blosc_decompress() honors different environment variables to control 
     248  internal parameters without the need to do that programmatically.
     249  Here are the ones supported: 
     250 
     251  BLOSC_NTHREADS=(INTEGER): This will call 
     252  blosc_set_nthreads(BLOSC_NTHREADS) before the proper decompression 
     253  process starts. 
     254 
     255  BLOSC_NOLOCK=(ANY VALUE): This will call blosc_decompress_ctx() 
     256  under the hood, with the `numinternalthreads` parameter set to the 
     257  same value as the last call to blosc_set_nthreads(). 
    126258*/ 
    127  
    128 int blosc_decompress(const void *src, void *dest, size_t destsize); 
    129  
     259BLOSC_EXPORT int blosc_decompress(const void *src, void *dest, size_t destsize); 
     260 
     261 
     262/** 
     263  Context interface to blosc decompression. This does not require a 
     264  call to blosc_init() and can be called from multithreaded 
     265  applications without the global lock being used, allowing Blosc
     266  to be executed simultaneously in those scenarios.
     267 
     268  It uses the same parameters as the blosc_decompress() function plus:
     269 
     270  `numinternalthreads`: number of threads to use internally. 
     271 
     272  Decompression is memory safe and guaranteed not to write the `dest` 
     273  buffer more than what is specified in `destsize`. 
     274 
     275  If an error occurs, e.g. the compressed data is corrupted or the 
     276  output buffer is not large enough, then 0 (zero) or a negative value 
     277  will be returned instead. 
     278*/ 
     279BLOSC_EXPORT int blosc_decompress_ctx(const void *src, void *dest, 
     280                                      size_t destsize, int numinternalthreads); 
    130281 
    131282/** 
    132283  Get `nitems` (of typesize size) in `src` buffer starting in `start`. 
    133284  The items are returned in `dest` buffer, which has to have enough 
    134   space for storing all items.  Returns the number of bytes copied to 
    135   `dest` or a negative value if some error happens. 
    136  */ 
    137  
    138 int blosc_getitem(const void *src, int start, int nitems, void *dest); 
     285  space for storing all items. 
     286 
     287  Returns the number of bytes copied to `dest` or a negative value if 
     288  some error happens. 
     289  */ 
     290BLOSC_EXPORT int blosc_getitem(const void *src, int start, int nitems, void *dest); 
     291 
     292 
     293/** 
     294  Returns the current number of threads that are used for 
     295  compression/decompression. 
     296  */ 
     297BLOSC_EXPORT int blosc_get_nthreads(void); 
    139298 
    140299 
     
    142301  Initialize a pool of threads for compression/decompression.  If 
    143302  `nthreads` is 1, then the serial version is chosen and a possible 
    144   previous existing pool is ended.  Returns the previous number of 
    145   threads.  If this is not called, `nthreads` is set to 1 internally. 
     303  previous existing pool is ended.  If this is not called, `nthreads` 
     304  is set to 1 internally. 
     305 
     306  Returns the previous number of threads. 
     307  */ 
     308BLOSC_EXPORT int blosc_set_nthreads(int nthreads); 
     309 
     310 
     311/** 
     312  Returns the current compressor that is used for compression. 
     313  */ 
     314BLOSC_EXPORT char* blosc_get_compressor(void); 
     315 
     316 
     317/** 
     318  Select the compressor to be used.  The supported ones are "blosclz", 
     319  "lz4", "lz4hc", "snappy", "zlib" and "zstd".  If this function is not
     320  called, then "blosclz" will be used. 
     321 
     322  In case the compressor is not recognized, or there is not support 
     323  for it in this build, it returns a -1.  Else it returns the code for 
     324  the compressor (>=0). 
     325  */ 
     326BLOSC_EXPORT int blosc_set_compressor(const char* compname); 
     327 
     328 
     329/** 
     330  Get the `compname` associated with the `compcode`. 
     331 
     332  If the compressor code is not recognized, or there is not support 
     333  for it in this build, -1 is returned.  Else, the compressor code is 
     334  returned. 
     335 */ 
     336BLOSC_EXPORT int blosc_compcode_to_compname(int compcode, char **compname); 
     337 
     338 
     339/** 
     340  Return the compressor code associated with the compressor name. 
     341 
     342  If the compressor name is not recognized, or there is not support 
     343  for it in this build, -1 is returned instead. 
     344 */ 
     345BLOSC_EXPORT int blosc_compname_to_compcode(const char *compname); 
     346 
     347 
     348/** 
     349  Get a list of compressors supported in the current build.  The 
     350  returned value is a string with a concatenation of "blosclz", "lz4", 
     351  "lz4hc", "snappy", "zlib" or "zstd" separated by commas, depending
     352  on which ones are present in the build. 
     353 
     354  This function does not leak, so you should not free() the returned 
     355  list. 
     356 
     357  This function should always succeed. 
     358  */ 
     359BLOSC_EXPORT char* blosc_list_compressors(void); 
     360 
     361/** 
     362  Return the version of blosc in string format. 
     363 
     364  Useful for dynamic libraries. 
    146365*/ 
    147  
    148 int blosc_set_nthreads(int nthreads); 
    149  
    150  
    151 /** 
    152   Free possible memory temporaries and thread resources.  Use this when you 
    153   are not going to use Blosc for a long while.  In case of problems releasing 
    154   the resources, it returns a negative number, else it returns 0. 
    155 */ 
    156  
    157 int blosc_free_resources(void); 
     366BLOSC_EXPORT char* blosc_get_version_string(void); 
     367 
     368 
     369/** 
     370  Get info from compression libraries included in the current build. 
     371  In `compname` you pass the compressor name that you want info from. 
     372  In `complib` and `version` you get the compression library name and 
     373  version (if available) as output. 
     374 
     375  In `complib` and `version` you get a pointer to the compressor 
     376  library name and the version in string format respectively.  After 
     377  using the name and version, you should free() them so as to avoid 
     378  leaks. 
     379 
     380  If the compressor is supported, it returns the code for the library 
     381  (>=0).  If it is not supported, this function returns -1. 
     382  */ 
     383BLOSC_EXPORT int blosc_get_complib_info(char *compname, char **complib, char **version); 
     384 
     385 
     386/** 
     387  Free possible memory temporaries and thread resources.  Use this 
     388  when you are not going to use Blosc for a long while.  In case of 
     389  problems releasing the resources, it returns a negative number, else 
     390  it returns 0. 
     391  */ 
     392BLOSC_EXPORT int blosc_free_resources(void); 
    158393 
    159394 
     
    168403 
    169404  This function should always succeed. 
    170 */ 
    171  
    172 void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 
    173                          size_t *cbytes, size_t *blocksize); 
     405  */ 
     406BLOSC_EXPORT void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes, 
     407                                      size_t *cbytes, size_t *blocksize); 
    174408 
    175409 
     
    182416    * bit 1: whether the internal buffer is a pure memcpy or not 
    183417 
    184   You can use the `BLOSC_DOSHUFFLE` and `BLOSC_MEMCPYED` symbols for 
    185   extracting the interesting bits (e.g. ``flags & BLOSC_DOSHUFFLE`` 
    186   says whether the buffer is shuffled or not). 
     418  You can use the `BLOSC_DOSHUFFLE`, `BLOSC_DOBITSHUFFLE` and 
     419  `BLOSC_MEMCPYED` symbols for extracting the interesting bits 
     420  (e.g. ``flags & BLOSC_DOSHUFFLE`` says whether the buffer is 
     421  byte-shuffled or not). 
    187422 
    188423  This function should always succeed. 
    189 */ 
    190  
    191 void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 
    192                             int *flags); 
     424  */ 
     425BLOSC_EXPORT void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize, 
     426                                         int *flags); 
    193427 
    194428 
     
    196430  Return information about a compressed buffer, namely the internal 
    197431  Blosc format version (`version`) and the format for the internal 
    198   Lempel-Ziv algorithm (`versionlz`).  This function should always 
    199   succeed. 
    200 */ 
    201  
    202 void blosc_cbuffer_versions(const void *cbuffer, int *version, 
    203                             int *versionlz); 
     432  Lempel-Ziv compressor used (`versionlz`). 
     433 
     434  This function should always succeed. 
     435  */ 
     436BLOSC_EXPORT void blosc_cbuffer_versions(const void *cbuffer, int *version, 
     437                                             int *versionlz); 
     438 
     439 
     440/** 
     441  Return the compressor library/format used in a compressed buffer. 
     442 
     443  This function should always succeed. 
     444  */ 
     445BLOSC_EXPORT char *blosc_cbuffer_complib(const void *cbuffer); 
    204446 
    205447 
     
    211453*********************************************************************/ 
    212454 
     455/* Get the internal blocksize to be used during compression.  0 means 
     456   that an automatic blocksize is computed internally. */ 
     457BLOSC_EXPORT int blosc_get_blocksize(void); 
    213458 
    214459/** 
    215460  Force the use of a specific blocksize.  If 0, an automatic 
    216461  blocksize will be used (the default). 
    217 */ 
    218  
    219 void blosc_set_blocksize(size_t blocksize); 
    220  
    221  
     462 
     463  The blocksize is a critical parameter with important restrictions in 
     464  the allowed values, so use this with care. 
     465  */ 
     466BLOSC_EXPORT void blosc_set_blocksize(size_t blocksize); 
     467 
     468#ifdef __cplusplus 
     469} 
    222470#endif 
     471 
     472 
     473#endif 
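
Note on the `blosc_cbuffer_*` functions declared above: they read fixed fields from the start of a compressed buffer. Going by the accesses visible in the removed `blosc.c` code in this changeset, byte 0 is the Blosc format version, byte 1 the compressor format version, byte 2 the flags, byte 3 the typesize, followed by three 32-bit values (`nbytes`, `blocksize`, `cbytes`). The sketch below decodes those fields from a raw buffer without linking against Blosc. The `demo_*` and `DEMO_*` names are hypothetical, and it assumes the 32-bit fields are stored little-endian, which is what the `sw32()` swap in the old code suggests; treat it as an illustration rather than the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits with the same values as BLOSC_DOSHUFFLE, BLOSC_MEMCPYED and
   BLOSC_DOBITSHUFFLE in blosc.h (hypothetical local copies). */
#define DEMO_DOSHUFFLE     0x1
#define DEMO_MEMCPYED      0x2
#define DEMO_DOBITSHUFFLE  0x4

/* Hypothetical container for the decoded header fields. */
struct demo_header {
  int version;       /* Blosc format version */
  int versionlz;     /* internal compressor format version */
  int flags;         /* shuffle / memcpy flags */
  size_t typesize;   /* size of the atomic type in bytes */
  size_t nbytes;     /* uncompressed buffer size */
  size_t blocksize;  /* block size */
  size_t cbytes;     /* compressed buffer size */
};

/* Read a 32-bit value byte by byte (assumed little-endian), which also
   avoids unaligned or aliasing-unsafe pointer casts. */
static uint32_t demo_read_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first 16 bytes of a compressed buffer into `h`. */
void demo_parse_header(const void *cbuffer, struct demo_header *h)
{
  const uint8_t *src = (const uint8_t *)cbuffer;

  assert(src != NULL && h != NULL);
  h->version   = (int)src[0];
  h->versionlz = (int)src[1];
  h->flags     = (int)src[2];
  h->typesize  = (size_t)src[3];
  h->nbytes    = (size_t)demo_read_le32(src + 4);
  h->blocksize = (size_t)demo_read_le32(src + 8);
  h->cbytes    = (size_t)demo_read_le32(src + 12);
}
```

After decoding, `h.flags & DEMO_DOSHUFFLE` tells whether the buffer was byte-shuffled and `h.flags & DEMO_DOBITSHUFFLE` whether it was bit-shuffled, mirroring the usage described for `blosc_cbuffer_metainfo()`.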
  • thirdparty/blosc/blosclz.c

    r00587dc r981e22c  
    11/********************************************************************* 
    2   Blosc - Blocked Suffling and Compression Library 
    3  
    4   Author: Francesc Alted <f[email protected]> 
     2  Blosc - Blocked Shuffling and Compression Library 
     3 
     4  Author: Francesc Alted <f[email protected]> 
    55  Creation date: 2009-05-20 
    66 
     
    2121#if defined(_WIN32) && !defined(__MINGW32__) 
    2222  #include <windows.h> 
    23   #include "win32/stdint-windows.h" 
     23 
     24  /* stdint.h only available in VS2010 (VC++ 16.0) and newer */ 
     25  #if defined(_MSC_VER) && _MSC_VER < 1600 
     26    #include "win32/stdint-windows.h" 
     27  #else 
     28    #include <stdint.h> 
     29  #endif 
     30  /* llabs only available in VS2013 (VC++ 18.0) and newer */ 
     31  #if defined(_MSC_VER) && _MSC_VER < 1800 
     32    #define llabs(v) abs(v) 
     33  #endif 
    2434#else 
    2535  #include <stdint.h> 
     
    3646#elif defined(__i486__) || defined(__i586__) || defined(__i686__)  /* GNU C */ 
    3747#undef BLOSCLZ_STRICT_ALIGN 
    38 #elif defined(_M_IX86) /* Intel, MSVC */ 
     48#elif defined(_M_IX86) || defined(_M_X64)   /* Intel, MSVC */ 
    3949#undef BLOSCLZ_STRICT_ALIGN 
    4050#elif defined(__386) 
     
    4454#elif defined(__I86__) /* Digital Mars */ 
    4555#undef BLOSCLZ_STRICT_ALIGN 
     56/* Seems like unaligned access in ARM (at least ARMv6) is pretty 
      57   expensive, so we are going to always enforce strict alignment in ARM.
      58   If anybody suggests that newer ARMs are better, we can revisit this. */
     59/* #elif defined(__ARM_FEATURE_UNALIGNED) */  /* ARM, GNU C */ 
     60/* #undef BLOSCLZ_STRICT_ALIGN */ 
    4661#endif 
    4762#endif 
     
    6782 * Use inlined functions for supported systems. 
    6883 */ 
    69 #if defined(__GNUC__) || defined(__DMC__) || defined(__POCC__) || defined(__WATCOMC__) || defined(__SUNPRO_C) 
    70 #define BLOSCLZ_INLINE inline 
    71 #elif defined(__BORLANDC__) || defined(_MSC_VER) || defined(__LCC__) 
    72 #define BLOSCLZ_INLINE __inline 
    73 #else 
    74 #define BLOSCLZ_INLINE 
     84#if defined(_MSC_VER) && !defined(__cplusplus)   /* Visual Studio */ 
     85#define inline __inline  /* Visual C is not C99, but supports some kind of inline */ 
    7586#endif 
    7687 
    7788#define MAX_COPY       32 
    78 #define MAX_LEN       264  /* 256 + 8 */ 
    7989#define MAX_DISTANCE 8191 
    8090#define MAX_FARDISTANCE (65535+MAX_DISTANCE-1) 
     
    8797 
    8898 
    89 static BLOSCLZ_INLINE int32_t hash_function(uint8_t* p, uint8_t hash_log) 
    90 { 
    91   int32_t v; 
    92  
    93   v = BLOSCLZ_READU16(p); 
    94   v ^= BLOSCLZ_READU16(p+1)^(v>>(16-hash_log)); 
    95   v &= (1 << hash_log) - 1; 
    96   return v; 
     99/* 
     100 * Fast copy macros 
     101 */ 
     102#if defined(_WIN32) 
     103  #define CPYSIZE              32 
     104#else 
     105  #define CPYSIZE              8 
     106#endif 
     107#define MCPY(d,s)            { memcpy(d, s, CPYSIZE); d+=CPYSIZE; s+=CPYSIZE; } 
     108#define FASTCOPY(d,s,e)      { do { MCPY(d,s) } while (d<e); } 
     109#define SAFECOPY(d,s,e)      { while (d<e) { MCPY(d,s) } } 
     110 
     111/* Copy optimized for copying in blocks */ 
     112#define BLOCK_COPY(op, ref, len, op_limit)    \ 
     113{ int ilen = len % CPYSIZE;                   \ 
     114  uint8_t *cpy = op + len;                    \ 
     115  if (cpy + CPYSIZE - ilen <= op_limit) {     \ 
     116    FASTCOPY(op, ref, cpy);                   \ 
     117    ref -= (op-cpy); op = cpy;                \ 
     118  }                                           \ 
     119  else {                                      \ 
     120    cpy -= ilen;                              \ 
     121    SAFECOPY(op, ref, cpy);                   \ 
     122    ref -= (op-cpy); op = cpy;                \ 
     123    for(; ilen; --ilen)                       \ 
     124        *op++ = *ref++;                       \ 
     125  }                                           \ 
    97126} 
    98127 
     128#define SAFE_COPY(op, ref, len, op_limit)     \ 
     129if (llabs(op-ref) < CPYSIZE) {                \ 
     130  for(; len; --len)                           \ 
     131    *op++ = *ref++;                           \ 
     132}                                             \ 
     133else BLOCK_COPY(op, ref, len, op_limit); 
     134 
     135/* Copy optimized for GCC 4.8.  Seems like long copy loops are optimal. */ 
     136#define GCC_SAFE_COPY(op, ref, len, op_limit) \ 
     137if ((len > 32) || (llabs(op-ref) < CPYSIZE)) { \ 
     138  for(; len; --len)                           \ 
     139    *op++ = *ref++;                           \ 
     140}                                             \ 
     141else BLOCK_COPY(op, ref, len, op_limit); 
     142 
     143/* Simple, but pretty effective hash function for 3-byte sequence */ 
     144#define HASH_FUNCTION(v, p, l) {                       \ 
     145    v = BLOSCLZ_READU16(p);                            \ 
     146    v ^= BLOSCLZ_READU16(p + 1) ^ ( v >> (16 - l));    \ 
     147    v &= (1 << l) - 1;                                 \ 
     148} 
     149 
     150/* Another version which seems to be a bit more effective than the above, 
     151 * but a bit slower.  Could be interesting for high opt_level. 
     152 */ 
     153#define MINMATCH 3 
     154#define HASH_FUNCTION2(v, p, l) {                       \ 
     155  v = BLOSCLZ_READU16(p);                               \ 
     156  v = (v * 2654435761U) >> ((MINMATCH * 8) - (l + 1));  \ 
     157  v &= (1 << l) - 1;                                    \ 
     158} 
     159 
     160#define LITERAL(ip, op, op_limit, anchor, copy) {        \ 
     161  if (BLOSCLZ_UNEXPECT_CONDITIONAL(op+2 > op_limit))     \ 
     162    goto out;                                            \ 
     163  *op++ = *anchor++;                                     \ 
     164  ip = anchor;                                           \ 
     165  copy++;                                                \ 
     166  if(BLOSCLZ_UNEXPECT_CONDITIONAL(copy == MAX_COPY)) {   \ 
     167    copy = 0;                                            \ 
     168    *op++ = MAX_COPY-1;                                  \ 
     169  }                                                      \ 
     170  continue;                                              \ 
     171} 
    99172 
    100173#define IP_BOUNDARY 2 
    101174 
    102 int blosclz_compress(int opt_level, const void* input, 
    103                      int length, void* output, int maxout) 
     175 
     176int blosclz_compress(const int opt_level, const void* input, int length, 
     177                     void* output, int maxout, int accel) 
    104178{ 
    105179  uint8_t* ip = (uint8_t*) input; 
     
    110184 
    111185  /* Hash table depends on the opt level.  Hash_log cannot be larger than 15. */ 
    112   uint8_t hash_log_[10] = {-1, 8, 9, 9, 11, 11, 12, 13, 14, 15}; 
     186  /* The parametrization below is made from playing with the bench suite, like: 
     187     $ bench/bench blosclz single 4 
     188     $ bench/bench blosclz single 4 4194280 12 25 
     189     and taking the minimum times on a i5-3380M @ 2.90GHz. 
     190     Curiously enough, values >= 14 do not always 
     191     get maximum compression, even with large blocksizes. */ 
     192  int8_t hash_log_[10] = {-1, 11, 11, 11, 12, 13, 13, 13, 13, 13}; 
    113193  uint8_t hash_log = hash_log_[opt_level]; 
    114194  uint16_t hash_size = 1 << hash_log; 
     
    116196  uint8_t* op_limit; 
    117197 
    118   int32_t hslot; 
    119198  int32_t hval; 
    120199  uint8_t copy; 
    121200 
    122   double maxlength_[10] = {-1, .1, .15, .2, .5, .7, .85, .925, .975, 1.0}; 
     201  double maxlength_[10] = {-1, .1, .15, .2, .3, .45, .6, .75, .9, 1.0}; 
    123202  int32_t maxlength = (int32_t) (length * maxlength_[opt_level]); 
    124203  if (maxlength > (int32_t) maxout) { 
     
    127206  op_limit = op + maxlength; 
    128207 
    129   /* output buffer cannot be less than 66 bytes or we can get into problems. 
    130      As output is usually the same length than input, we take input length. */ 
    131   if (length < 66) { 
    132     return 0;                   /* Mark this as uncompressible */ 
     208  /* output buffer cannot be less than 66 bytes or we can get into trouble */ 
     209  if (BLOSCLZ_UNEXPECT_CONDITIONAL(maxlength < 66 || length < 4)) { 
     210    return 0; 
    133211  } 
    134212 
    135   htab = (uint16_t *) malloc(hash_size*sizeof(uint16_t)); 
    136  
    137   /* sanity check */ 
    138   if(BLOSCLZ_UNEXPECT_CONDITIONAL(length < 4)) { 
    139     if(length) { 
    140       /* create literal copy only */ 
    141       *op++ = length-1; 
    142       ip_bound++; 
    143       while(ip <= ip_bound) 
    144         *op++ = *ip++; 
    145       free(htab); 
    146       return length+1; 
    147     } 
    148     else goto out; 
    149   } 
    150  
    151   /* initializes hash table */ 
    152   for (hslot = 0; hslot < hash_size; hslot++) 
    153     htab[hslot] = 0; 
     213  /* prepare the acceleration to be used in condition */ 
     214  accel = accel < 1 ? 1 : accel; 
     215  accel -= 1; 
     216 
     217  htab = (uint16_t *) calloc(hash_size, sizeof(uint16_t)); 
    154218 
    155219  /* we start with literal copy */ 
     
    175239 
    176240    /* find potential match */ 
    177     hval = hash_function(ip, hash_log); 
     241    HASH_FUNCTION(hval, ip, hash_log); 
    178242    ref = ibase + htab[hval]; 
    179     /* update hash table */ 
    180     htab[hval] = (uint16_t)(anchor - ibase); 
    181243 
    182244    /* calculate distance to the match */ 
    183245    distance = (int32_t)(anchor - ref); 
     246 
     247    /* update hash table if necessary */ 
     248    if ((distance & accel) == 0) 
     249      htab[hval] = (uint16_t)(anchor - ibase); 
    184250 
    185251    /* is this a match? check the first 3 bytes */ 
    186252    if (distance==0 || (distance >= MAX_FARDISTANCE) || 
    187253        *ref++ != *ip++ || *ref++!=*ip++ || *ref++!=*ip++) 
    188       goto literal; 
     254      LITERAL(ip, op, op_limit, anchor, copy); 
    189255 
    190256    /* far, needs at least 5-byte match */ 
    191     if (distance >= MAX_DISTANCE) { 
     257    if (opt_level >= 5 && distance >= MAX_DISTANCE) { 
    192258      if (*ip++ != *ref++ || *ip++ != *ref++) 
    193         goto literal; 
     259        LITERAL(ip, op, op_limit, anchor, copy); 
    194260      len += 2; 
    195261    } 
     
    211277      /* safe because the outer check against ip limit */ 
    212278      while (ip < (ip_bound - (sizeof(int64_t) - IP_BOUNDARY))) { 
     279#if !defined(BLOSCLZ_STRICT_ALIGN) 
    213280        value2 = ((int64_t *)ref)[0]; 
     281#else 
     282        memcpy(&value2, ref, 8); 
     283#endif 
    214284        if (value != value2) { 
    215285          /* Find the byte that starts to differ */ 
     
    234304        /* safe because the outer check against ip limit */ 
    235305        while (ip < (ip_bound - (sizeof(int64_t) - IP_BOUNDARY))) { 
    236           if (*ref++ != *ip++) break; 
     306#if !defined(BLOSCLZ_STRICT_ALIGN) 
    237307          if (((int64_t *)ref)[0] != ((int64_t *)ip)[0]) { 
     308#endif 
    238309            /* Find the byte that starts to differ */ 
    239310            while (ip < ip_bound) { 
     
    241312            } 
    242313            break; 
    243           } 
    244           else { 
    245             ip += 8; 
    246             ref += 8; 
    247           } 
     314#if !defined(BLOSCLZ_STRICT_ALIGN) 
     315          } else { ip += 8; ref += 8; } 
     316#endif 
    248317        } 
    249318        /* Last correction before exiting loop */ 
     
    311380 
    312381    /* update the hash at match boundary */ 
    313     hval = hash_function(ip, hash_log); 
     382    HASH_FUNCTION(hval, ip, hash_log); 
    314383    htab[hval] = (uint16_t)(ip++ - ibase); 
    315     hval = hash_function(ip, hash_log); 
     384    HASH_FUNCTION(hval, ip, hash_log); 
    316385    htab[hval] = (uint16_t)(ip++ - ibase); 
    317386 
    318387    /* assuming literal copy */ 
    319388    *op++ = MAX_COPY-1; 
    320  
    321     continue; 
    322  
    323   literal: 
    324     if (BLOSCLZ_UNEXPECT_CONDITIONAL(op+2 > op_limit)) goto out; 
    325     *op++ = *anchor++; 
    326     ip = anchor; 
    327     copy++; 
    328     if(BLOSCLZ_UNEXPECT_CONDITIONAL(copy == MAX_COPY)) { 
    329       copy = 0; 
    330       *op++ = MAX_COPY-1; 
    331     } 
    332389  } 
    333390 
     
    362419} 
    363420 
    364  
    365421int blosclz_decompress(const void* input, int length, void* output, int maxout) 
    366422{ 
     
    373429 
    374430  do { 
    375     const uint8_t* ref = op; 
     431    uint8_t* ref = op; 
    376432    int32_t len = ctrl >> 5; 
    377433    int32_t ofs = (ctrl & 31) << 8; 
     
    422478        ref--; 
    423479        len += 3; 
    424         if (abs((int32_t)(ref-op)) <= (int32_t)len) { 
    425           /* src and dst do overlap: do a loop */ 
    426           for(; len; --len) 
    427             *op++ = *ref++; 
    428           /* The memmove below does not work well (don't know why) */ 
    429           /* memmove(op, ref, len); 
    430              op += len; 
    431              ref += len; 
    432              len = 0; */ 
    433         } 
    434         else { 
    435           memcpy(op, ref, len); 
    436           op += len; 
    437           ref += len; 
    438         } 
     480#if !defined(_WIN32) && ((defined(__GNUC__) || defined(__INTEL_COMPILER) || !defined(__clang__))) 
     481        GCC_SAFE_COPY(op, ref, len, op_limit); 
     482#else 
     483        SAFE_COPY(op, ref, len, op_limit); 
     484#endif 
    439485      } 
    440486    } 
     
    450496#endif 
    451497 
    452       memcpy(op, ip, ctrl); 
    453       ip += ctrl; 
    454       op += ctrl; 
     498      BLOCK_COPY(op, ip, ctrl, op_limit); 
    455499 
    456500      loop = (int32_t)BLOSCLZ_EXPECT_CONDITIONAL(ip < ip_limit); 
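
The decompressor hunk above replaces the explicit overlap test with the SAFE_COPY / GCC_SAFE_COPY macros, which fall back to a byte-by-byte loop whenever op and ref are closer than CPYSIZE. The following is a minimal sketch of why that fallback matters, not the library's code: `lz_match_copy` and `CHUNK` are hypothetical names, and unlike BLOCK_COPY this version never writes past `len`. It assumes `ref` points earlier in the same buffer as `op`, as is the case for an LZ77-style back-reference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8  /* hypothetical stand-in for CPYSIZE on non-Windows builds */

/* Hypothetical helper: copy len bytes of a back-reference from ref to op,
   where ref < op inside the same output buffer.  Returns the new op. */
static uint8_t *lz_match_copy(uint8_t *op, const uint8_t *ref, size_t len)
{
  if ((size_t)(op - ref) < CHUNK) {
    /* Source and destination are closer than one chunk: copy one byte
       at a time so that bytes written a moment ago feed later reads.
       This is what replicates a short pattern such as "ab" -> "ababab". */
    while (len--)
      *op++ = *ref++;
    return op;
  }

  /* At least CHUNK bytes apart: each memcpy reads a range that does not
     overlap the range it writes, so chunked copying is well defined. */
  while (len >= CHUNK) {
    memcpy(op, ref, CHUNK);
    op += CHUNK;
    ref += CHUNK;
    len -= CHUNK;
  }
  /* Copy the tail that is shorter than a chunk. */
  while (len--)
    *op++ = *ref++;
  return op;
}
```

For instance, with a buffer that starts with "ab", calling `lz_match_copy(buf + 2, buf, 6)` extends it to "abababab"; a chunked memcpy at distance 2 would instead read bytes that have not been produced yet.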
  • thirdparty/blosc/blosclz.h

    r00587dc r981e22c  
    11/********************************************************************* 
    2   Blosc - Blocked Suffling and Compression Library 
     2  Blosc - Blocked Shuffling and Compression Library 
    33 
    4   Author: Francesc Alted <f[email protected]> 
     4  Author: Francesc Alted <f[email protected]> 
    55 
    66  See LICENSES/BLOSC.txt for details about copyright and rights to use. 
     
    3333  output buffer. 
    3434 
     35  The acceleration parameter is related to the frequency for 
     36  updating the internal hash.  An acceleration of 1 means that the 
     37  internal hash is updated at full rate.  A value < 1 is not allowed 
     38  and will be silently set to 1. 
     39 
    3540  The input buffer and the output buffer can not overlap. 
    3641*/ 
    3742 
    38 int blosclz_compress(int opt_level, const void* input, int length, 
    39                      void* output, int maxout); 
     43int blosclz_compress(const int opt_level, const void* input, int length, 
     44                     void* output, int maxout, int accel); 
    4045 
    4146/** 
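
As a rough illustration of the new accel argument documented above, the sketch below mirrors the two steps visible in the blosclz.c hunk: values below 1 are clamped to 1, the result is decremented, and the hash slot is refreshed only when `(distance & accel) == 0`. The helper names `accel_mask` and `hash_is_updated` are hypothetical and exist only for this example; they are not part of the blosclz API.

```c
/* Hypothetical helper: the normalisation applied to accel at the top of
   blosclz_compress() in this changeset (clamp to >= 1, then subtract 1). */
static int accel_mask(int accel)
{
  accel = accel < 1 ? 1 : accel;
  return accel - 1;
}

/* Hypothetical helper: nonzero when the hash table entry would be
   refreshed for a candidate match at the given distance. */
static int hash_is_updated(int distance, int accel)
{
  return (distance & accel_mask(accel)) == 0;
}
```

With accel equal to 0 or 1 the mask is 0, so every position updates the hash (the "full rate" case in the header comment). With accel equal to 4 the mask is 3, so only distances that are multiples of 4 trigger an update. The mask acts as a clean divider only when accel is a power of two; for other values the set of updating distances is less regular.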
  • thirdparty/blosc/shuffle.c

    r00587dc r981e22c  
    11/********************************************************************* 
    2   Blosc - Blocked Suffling and Compression Library 
    3  
    4   Author: Francesc Alted <f[email protected]> 
     2  Blosc - Blocked Shuffling and Compression Library 
     3 
     4  Author: Francesc Alted <f[email protected]> 
    55  Creation date: 2009-05-20 
    66 
     
    88**********************************************************************/ 
    99 
     10#include "shuffle.h" 
     11#include "shuffle-common.h" 
     12#include "shuffle-generic.h" 
     13#include "bitshuffle-generic.h" 
    1014#include <stdio.h> 
    1115#include <string.h> 
    12 #include "shuffle.h" 
    13  
    14 #if defined(_WIN32) && !defined(__MINGW32__) 
    15   #include <windows.h> 
    16   #include "win32/stdint-windows.h" 
    17   #define __SSE2__          /* Windows does not define this by default */ 
    18 #else 
    19   #include <stdint.h> 
    20   #include <inttypes.h> 
    21 #endif  /* _WIN32 */ 
    22  
    23  
    24 /* The non-SSE2 versions of shuffle and unshuffle */ 
    25  
    26 /* Shuffle a block.  This can never fail. */ 
    27 static void _shuffle(size_t bytesoftype, size_t blocksize, 
    28                          uint8_t* _src, uint8_t* _dest) 
    29 { 
    30   size_t i, j, neblock, leftover; 
    31  
    32   /* Non-optimized shuffle */ 
    33   neblock = blocksize / bytesoftype;  /* Number of elements in a block */ 
    34   for (j = 0; j < bytesoftype; j++) { 
    35     for (i = 0; i < neblock; i++) { 
    36       _dest[j*neblock+i] = _src[i*bytesoftype+j]; 
     16 
     17/* Visual Studio < 2013 does not have stdbool.h, so here is a replacement: */ 
     18#if defined __STDC__ && defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L 
     19/* have a C99 compiler */ 
     20typedef _Bool bool; 
     21#else 
     22/* do not have a C99 compiler */ 
     23typedef unsigned char bool; 
     24#endif 
     25static const bool false = 0; 
     26static const bool true = 1; 
     27 
     28 
     29#if !defined(__clang__) && defined(__GNUC__) && defined(__GNUC_MINOR__) && \ 
     30    __GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8) 
     31#define HAVE_CPU_FEAT_INTRIN 
     32#endif 
     33 
     34/*  Include hardware-accelerated shuffle/unshuffle routines based on 
     35    the target architecture. Note that a target architecture may support 
     36    more than one type of acceleration!*/ 
     37#if defined(SHUFFLE_AVX2_ENABLED) 
     38  #include "shuffle-avx2.h" 
     39  #include "bitshuffle-avx2.h" 
     40#endif  /* defined(SHUFFLE_AVX2_ENABLED) */ 
     41 
     42#if defined(SHUFFLE_SSE2_ENABLED) 
     43  #include "shuffle-sse2.h" 
     44  #include "bitshuffle-sse2.h" 
     45#endif  /* defined(SHUFFLE_SSE2_ENABLED) */ 
     46 
     47 
     48/*  Define function pointer types for shuffle/unshuffle routines. */ 
     49typedef void(*shuffle_func)(const size_t, const size_t, const uint8_t*, const uint8_t*); 
     50typedef void(*unshuffle_func)(const size_t, const size_t, const uint8_t*, const uint8_t*); 
     51typedef int64_t(*bitshuffle_func)(void*, void*, const size_t, const size_t, void*); 
     52typedef int64_t(*bitunshuffle_func)(void*, void*, const size_t, const size_t, void*); 
     53 
     54/* An implementation of shuffle/unshuffle routines. */ 
     55typedef struct shuffle_implementation { 
     56  /* Name of this implementation. */ 
     57  const char* name; 
     58  /* Function pointer to the shuffle routine for this implementation. */ 
     59  shuffle_func shuffle; 
     60  /* Function pointer to the unshuffle routine for this implementation. */ 
     61  unshuffle_func unshuffle; 
     62  /* Function pointer to the bitshuffle routine for this implementation. */ 
     63  bitshuffle_func bitshuffle; 
     64  /* Function pointer to the bitunshuffle routine for this implementation. */ 
     65  bitunshuffle_func bitunshuffle; 
     66} shuffle_implementation_t; 
     67 
     68typedef enum { 
     69  BLOSC_HAVE_NOTHING = 0, 
     70  BLOSC_HAVE_SSE2 = 1, 
     71  BLOSC_HAVE_AVX2 = 2 
     72} blosc_cpu_features; 
     73 
     74/*  Detect hardware and set function pointers to the best shuffle/unshuffle 
     75    implementations supported by the host processor. */ 
     76#if defined(SHUFFLE_AVX2_ENABLED) || defined(SHUFFLE_SSE2_ENABLED)    /* Intel/i686 */ 
     77 
     78/*  Disabled the __builtin_cpu_supports() call, as it has issues with 
     79    new versions of gcc (like 5.3.1 in forthcoming ubuntu/xenial): 
     80      "undefined symbol: __cpu_model" 
     81    For a similar report, see: 
     82    https://lists.fedoraproject.org/archives/list/[email protected]/thread/ZM2L65WIZEEQHHLFERZYD5FAG7QY2OGB/ 
     83*/ 
     84#if defined(HAVE_CPU_FEAT_INTRIN) && 0 
     85static blosc_cpu_features blosc_get_cpu_features(void) { 
     86  blosc_cpu_features cpu_features = BLOSC_HAVE_NOTHING; 
     87  if (__builtin_cpu_supports("sse2")) { 
     88    cpu_features |= BLOSC_HAVE_SSE2; 
     89  } 
     90  if (__builtin_cpu_supports("avx2")) { 
     91    cpu_features |= BLOSC_HAVE_AVX2; 
     92  } 
     93  return cpu_features; 
     94} 
     95#else 
     96 
     97#if defined(_MSC_VER) && !defined(__clang__) 
     98  #include <intrin.h>     /* Needed for __cpuid */ 
     99 
     100/*  _xgetbv is only supported by VS2010 SP1 and newer versions of VS. */ 
     101#if _MSC_FULL_VER >= 160040219 
     102  #include <immintrin.h>  /* Needed for _xgetbv */ 
     103#elif defined(_M_IX86) 
     104 
     105/*  Implement _xgetbv for VS2008 and VS2010 RTM with 32-bit (x86) targets. */ 
     106 
     107static uint64_t _xgetbv(uint32_t xcr) { 
     108    uint32_t xcr0, xcr1; 
     109    __asm { 
     110        mov        ecx, xcr 
     111        _asm _emit 0x0f _asm _emit 0x01 _asm _emit 0xd0 
     112        mov        xcr0, eax 
     113        mov        xcr1, edx 
    37114    } 
    38   } 
    39   leftover = blocksize % bytesoftype; 
    40   memcpy(_dest + neblock*bytesoftype, _src + neblock*bytesoftype, leftover); 
    41 } 
    42  
    43 /* Unshuffle a block.  This can never fail. */ 
    44 static void _unshuffle(size_t bytesoftype, size_t blocksize, 
    45                        uint8_t* _src, uint8_t* _dest) 
    46 { 
    47   size_t i, j, neblock, leftover; 
    48  
    49   /* Non-optimized unshuffle */ 
    50   neblock = blocksize / bytesoftype;  /* Number of elements in a block */ 
    51   for (i = 0; i < neblock; i++) { 
    52     for (j = 0; j < bytesoftype; j++) { 
    53       _dest[i*bytesoftype+j] = _src[j*neblock+i]; 
    54     } 
    55   } 
    56   leftover = blocksize % bytesoftype; 
    57   memcpy(_dest+neblock*bytesoftype, _src+neblock*bytesoftype, leftover); 
    58 } 
    59  
    60  
    61 #ifdef __SSE2__ 
    62  
    63 /* The SSE2 versions of shuffle and unshuffle */ 
    64  
    65 #include <emmintrin.h> 
    66  
    67 /* The next is useful for debugging purposes */ 
    68 #if 0 
    69 static void printxmm(__m128i xmm0) 
    70 { 
    71   uint8_t buf[16]; 
    72  
    73   ((__m128i *)buf)[0] = xmm0; 
    74   printf("%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x,%x\n", 
    75           buf[0], buf[1], buf[2], buf[3], 
    76           buf[4], buf[5], buf[6], buf[7], 
    77           buf[8], buf[9], buf[10], buf[11], 
    78           buf[12], buf[13], buf[14], buf[15]); 
    79 } 
    80 #endif 
    81  
    82  
    83 /* Routine optimized for shuffling a buffer for a type size of 2 bytes. */ 
    84 static void 
    85 shuffle2(uint8_t* dest, uint8_t* src, size_t size) 
    86 { 
    87   size_t i, j, k; 
    88   size_t numof16belem; 
    89   __m128i xmm0[2], xmm1[2]; 
    90  
    91   numof16belem = size / (16*2); 
    92   for (i = 0, j = 0; i < numof16belem; i++, j += 16*2) { 
    93     /* Fetch and transpose bytes, words and double words in groups of 
    94        32 bytes */ 
    95     for (k = 0; k < 2; k++) { 
    96       xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 
    97       xmm0[k] = _mm_shufflelo_epi16(xmm0[k], 0xd8); 
    98       xmm0[k] = _mm_shufflehi_epi16(xmm0[k], 0xd8); 
    99       xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 
    100       xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x4e); 
    101       xmm0[k] = _mm_unpacklo_epi8(xmm0[k], xmm1[k]); 
    102       xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 
    103       xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x4e); 
    104       xmm0[k] = _mm_unpacklo_epi16(xmm0[k], xmm1[k]); 
    105       xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 
    106     } 
    107     /* Transpose quad words */ 
    108     for (k = 0; k < 1; k++) { 
    109       xmm1[k*2] = _mm_unpacklo_epi64(xmm0[k], xmm0[k+1]); 
    110       xmm1[k*2+1] = _mm_unpackhi_epi64(xmm0[k], xmm0[k+1]); 
    111     } 
    112     /* Store the result vectors */ 
    113     for (k = 0; k < 2; k++) { 
    114       ((__m128i *)dest)[k*numof16belem+i] = xmm1[k]; 
    115     } 
    116   } 
    117 } 
    118  
    119  
    120 /* Routine optimized for shuffling a buffer for a type size of 4 bytes. */ 
    121 static void 
    122 shuffle4(uint8_t* dest, uint8_t* src, size_t size) 
    123 { 
    124   size_t i, j, k; 
    125   size_t numof16belem; 
    126   __m128i xmm0[4], xmm1[4]; 
    127  
    128   numof16belem = size / (16*4); 
    129   for (i = 0, j = 0; i < numof16belem; i++, j += 16*4) { 
    130     /* Fetch and transpose bytes and words in groups of 64 bytes */ 
    131     for (k = 0; k < 4; k++) { 
    132       xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 
    133       xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0xd8); 
    134       xmm0[k] = _mm_shuffle_epi32(xmm0[k], 0x8d); 
    135       xmm0[k] = _mm_unpacklo_epi8(xmm1[k], xmm0[k]); 
    136       xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x04e); 
    137       xmm0[k] = _mm_unpacklo_epi16(xmm0[k], xmm1[k]); 
    138     } 
    139     /* Transpose double words */ 
    140     for (k = 0; k < 2; k++) { 
    141       xmm1[k*2] = _mm_unpacklo_epi32(xmm0[k*2], xmm0[k*2+1]); 
    142       xmm1[k*2+1] = _mm_unpackhi_epi32(xmm0[k*2], xmm0[k*2+1]); 
    143     } 
    144     /* Transpose quad words */ 
    145     for (k = 0; k < 2; k++) { 
    146       xmm0[k*2] = _mm_unpacklo_epi64(xmm1[k], xmm1[k+2]); 
    147       xmm0[k*2+1] = _mm_unpackhi_epi64(xmm1[k], xmm1[k+2]); 
    148     } 
    149     /* Store the result vectors */ 
    150     for (k = 0; k < 4; k++) { 
    151       ((__m128i *)dest)[k*numof16belem+i] = xmm0[k]; 
    152     } 
    153   } 
    154 } 
    155  
    156  
    157 /* Routine optimized for shuffling a buffer for a type size of 8 bytes. */ 
    158 static void 
    159 shuffle8(uint8_t* dest, uint8_t* src, size_t size) 
    160 { 
    161   size_t i, j, k, l; 
    162   size_t numof16belem; 
    163   __m128i xmm0[8], xmm1[8]; 
    164  
    165   numof16belem = size / (16*8); 
    166   for (i = 0, j = 0; i < numof16belem; i++, j += 16*8) { 
    167     /* Fetch and transpose bytes in groups of 128 bytes */ 
    168     for (k = 0; k < 8; k++) { 
    169       xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 
    170       xmm1[k] = _mm_shuffle_epi32(xmm0[k], 0x4e); 
    171       xmm1[k] = _mm_unpacklo_epi8(xmm0[k], xmm1[k]); 
    172     } 
    173     /* Transpose words */ 
    174     for (k = 0, l = 0; k < 4; k++, l +=2) { 
    175       xmm0[k*2] = _mm_unpacklo_epi16(xmm1[l], xmm1[l+1]); 
    176       xmm0[k*2+1] = _mm_unpackhi_epi16(xmm1[l], xmm1[l+1]); 
    177     } 
    178     /* Transpose double words */ 
    179     for (k = 0, l = 0; k < 4; k++, l++) { 
    180       if (k == 2) l += 2; 
    181       xmm1[k*2] = _mm_unpacklo_epi32(xmm0[l], xmm0[l+2]); 
    182       xmm1[k*2+1] = _mm_unpackhi_epi32(xmm0[l], xmm0[l+2]); 
    183     } 
    184     /* Transpose quad words */ 
    185     for (k = 0; k < 4; k++) { 
    186       xmm0[k*2] = _mm_unpacklo_epi64(xmm1[k], xmm1[k+4]); 
    187       xmm0[k*2+1] = _mm_unpackhi_epi64(xmm1[k], xmm1[k+4]); 
    188     } 
    189     /* Store the result vectors */ 
    190     for (k = 0; k < 8; k++) { 
    191       ((__m128i *)dest)[k*numof16belem+i] = xmm0[k]; 
    192     } 
    193   } 
    194 } 
    195  
    196  
    197 /* Routine optimized for shuffling a buffer for a type size of 16 bytes. */ 
    198 static void 
    199 shuffle16(uint8_t* dest, uint8_t* src, size_t size) 
    200 { 
    201   size_t i, j, k, l; 
    202   size_t numof16belem; 
    203   __m128i xmm0[16], xmm1[16]; 
    204  
    205   numof16belem = size / (16*16); 
    206   for (i = 0, j = 0; i < numof16belem; i++, j += 16*16) { 
    207     /* Fetch elements in groups of 256 bytes */ 
    208     for (k = 0; k < 16; k++) { 
    209       xmm0[k] = _mm_loadu_si128((__m128i*)(src+j+k*16)); 
    210     } 
    211     /* Transpose bytes */ 
    212     for (k = 0, l = 0; k < 8; k++, l +=2) { 
    213       xmm1[k*2] = _mm_unpacklo_epi8(xmm0[l], xmm0[l+1]); 
    214       xmm1[k*2+1] = _mm_unpackhi_epi8(xmm0[l], xmm0[l+1]); 
    215     } 
    216     /* Transpose words */ 
    217     for (k = 0, l = -2; k < 8; k++, l++) { 
    218       if ((k%2) == 0) l += 2; 
    219       xmm0[k*2] = _mm_unpacklo_epi16(xmm1[l], xmm1[l+2]); 
    220       xmm0[k*2+1] = _mm_unpackhi_epi16(xmm1[l], xmm1[l+2]); 
    221     } 
    222     /* Transpose double words */ 
    223     for (k = 0, l = -4; k < 8; k++, l++) { 
    224       if ((k%4) == 0) l += 4; 
    225       xmm1[k*2] = _mm_unpacklo_epi32(xmm0[l], xmm0[l+4]); 
    226       xmm1[k*2+1] = _mm_unpackhi_epi32(xmm0[l], xmm0[l+4]); 
    227     } 
    228     /* Transpose quad words */ 
    229     for (k = 0; k < 8; k++) { 
    230       xmm0[k*2] = _mm_unpacklo_epi64(xmm1[k], xmm1[k+8]); 
    231       xmm0[k*2+1] = _mm_unpackhi_epi64(xmm1[k], xmm1[k+8]); 
    232     } 
    233     /* Store the result vectors */ 
    234     for (k = 0; k < 16; k++) { 
    235       ((__m128i *)dest)[k*numof16belem+i] = xmm0[k]; 
    236     } 
    237   } 
    238 } 
    239  
    240  
    241 /* Shuffle a block.  This can never fail. */ 
    242 void shuffle(size_t bytesoftype, size_t blocksize, 
    243              uint8_t* _src, uint8_t* _dest) { 
    244   int unaligned_dest = (int)((uintptr_t)_dest % 16); 
    245   int power_of_two = (blocksize & (blocksize - 1)) == 0; 
    246   int too_small = (blocksize < 256); 
    247  
    248   if (unaligned_dest || !power_of_two || too_small) { 
    249     /* _dest buffer is not aligned, not a power of two or is too 
    250        small.  Call the non-sse2 version. */ 
    251     _shuffle(bytesoftype, blocksize, _src, _dest); 
    252     return; 
    253   } 
    254  
    255   /* Optimized shuffle */ 
    256   /* The buffer must be aligned on a 16 bytes boundary, have a power */ 
    257   /* of 2 size and be larger or equal than 256 bytes. */ 
    258   if (bytesoftype == 4) { 
    259     shuffle4(_dest, _src, blocksize); 
    260   } 
    261   else if (bytesoftype == 8) { 
    262     shuffle8(_dest, _src, blocksize); 
    263   } 
    264   else if (bytesoftype == 16) { 
    265     shuffle16(_dest, _src, blocksize); 
    266   } 
    267   else if (bytesoftype == 2) { 
    268     shuffle2(_dest, _src, blocksize); 
    269   } 
    270   else { 
    271     /* Non-optimized shuffle */ 
    272     _shuffle(bytesoftype, blocksize, _src, _dest); 
    273   } 
    274 } 
    275  
    276  
    277 /* Routine optimized for unshuffling a buffer for a type size of 2 bytes. */ 
    278 static void 
    279 unshuffle2(uint8_t* dest, uint8_t* orig, size_t size) 
    280 { 
    281   size_t i, k; 
    282   size_t neblock, numof16belem; 
    283   __m128i xmm1[2], xmm2[2]; 
    284  
    285   neblock = size / 2; 
    286   numof16belem = neblock / 16; 
    287   for (i = 0, k = 0; i < numof16belem; i++, k += 2) { 
    288     /* Load the first 32 bytes in 2 XMM registrers */ 
    289     xmm1[0] = ((__m128i *)orig)[0*numof16belem+i]; 
    290     xmm1[1] = ((__m128i *)orig)[1*numof16belem+i]; 
    291     /* Shuffle bytes */ 
    292     /* Compute the low 32 bytes */ 
    293     xmm2[0] = _mm_unpacklo_epi8(xmm1[0], xmm1[1]); 
    294     /* Compute the hi 32 bytes */ 
    295     xmm2[1] = _mm_unpackhi_epi8(xmm1[0], xmm1[1]); 
    296     /* Store the result vectors in proper order */ 
    297     ((__m128i *)dest)[k+0] = xmm2[0]; 
    298     ((__m128i *)dest)[k+1] = xmm2[1]; 
    299   } 
    300 } 
    301  
    302  
    303 /* Routine optimized for unshuffling a buffer for a type size of 4 bytes. */ 
    304 static void 
    305 unshuffle4(uint8_t* dest, uint8_t* orig, size_t size) 
    306 { 
    307   size_t i, j, k; 
    308   size_t neblock, numof16belem; 
    309   __m128i xmm0[4], xmm1[4]; 
    310  
    311   neblock = size / 4; 
    312   numof16belem = neblock / 16; 
    313   for (i = 0, k = 0; i < numof16belem; i++, k += 4) { 
    314     /* Load the first 64 bytes in 4 XMM registrers */ 
    315     for (j = 0; j < 4; j++) { 
    316       xmm0[j] = ((__m128i *)orig)[j*numof16belem+i]; 
    317     } 
    318     /* Shuffle bytes */ 
    319     for (j = 0; j < 2; j++) { 
    320       /* Compute the low 32 bytes */ 
    321       xmm1[j] = _mm_unpacklo_epi8(xmm0[j*2], xmm0[j*2+1]); 
    322       /* Compute the hi 32 bytes */ 
    323       xmm1[2+j] = _mm_unpackhi_epi8(xmm0[j*2], xmm0[j*2+1]); 
    324     } 
    325     /* Shuffle 2-byte words */ 
    326     for (j = 0; j < 2; j++) { 
    327       /* Compute the low 32 bytes */ 
    328       xmm0[j] = _mm_unpacklo_epi16(xmm1[j*2], xmm1[j*2+1]); 
    329       /* Compute the hi 32 bytes */ 
    330       xmm0[2+j] = _mm_unpackhi_epi16(xmm1[j*2], xmm1[j*2+1]); 
    331     } 
    332     /* Store the result vectors in proper order */ 
    333     ((__m128i *)dest)[k+0] = xmm0[0]; 
    334     ((__m128i *)dest)[k+1] = xmm0[2]; 
    335     ((__m128i *)dest)[k+2] = xmm0[1]; 
    336     ((__m128i *)dest)[k+3] = xmm0[3]; 
    337   } 
    338 } 
    339  
    340  
    341 /* Routine optimized for unshuffling a buffer for a type size of 8 bytes. */ 
    342 static void 
    343 unshuffle8(uint8_t* dest, uint8_t* orig, size_t size) 
    344 { 
    345   size_t i, j, k; 
    346   size_t neblock, numof16belem; 
    347   __m128i xmm0[8], xmm1[8]; 
    348  
    349   neblock = size / 8; 
    350   numof16belem = neblock / 16; 
    351   for (i = 0, k = 0; i < numof16belem; i++, k += 8) { 
    352     /* Load the first 128 bytes in 8 XMM registers */
    353     for (j = 0; j < 8; j++) { 
    354       xmm0[j] = ((__m128i *)orig)[j*numof16belem+i]; 
    355     } 
    356     /* Shuffle bytes */ 
    357     for (j = 0; j < 4; j++) { 
    358       /* Compute the low 32 bytes */ 
    359       xmm1[j] = _mm_unpacklo_epi8(xmm0[j*2], xmm0[j*2+1]); 
    360       /* Compute the hi 32 bytes */ 
    361       xmm1[4+j] = _mm_unpackhi_epi8(xmm0[j*2], xmm0[j*2+1]); 
    362     } 
    363     /* Shuffle 2-byte words */ 
    364     for (j = 0; j < 4; j++) { 
    365       /* Compute the low 32 bytes */ 
    366       xmm0[j] = _mm_unpacklo_epi16(xmm1[j*2], xmm1[j*2+1]); 
    367       /* Compute the hi 32 bytes */ 
    368       xmm0[4+j] = _mm_unpackhi_epi16(xmm1[j*2], xmm1[j*2+1]); 
    369     } 
    370     /* Shuffle 4-byte dwords */ 
    371     for (j = 0; j < 4; j++) { 
    372       /* Compute the low 32 bytes */ 
    373       xmm1[j] = _mm_unpacklo_epi32(xmm0[j*2], xmm0[j*2+1]); 
    374       /* Compute the hi 32 bytes */ 
    375       xmm1[4+j] = _mm_unpackhi_epi32(xmm0[j*2], xmm0[j*2+1]); 
    376     } 
    377     /* Store the result vectors in proper order */ 
    378     ((__m128i *)dest)[k+0] = xmm1[0]; 
    379     ((__m128i *)dest)[k+1] = xmm1[4]; 
    380     ((__m128i *)dest)[k+2] = xmm1[2]; 
    381     ((__m128i *)dest)[k+3] = xmm1[6]; 
    382     ((__m128i *)dest)[k+4] = xmm1[1]; 
    383     ((__m128i *)dest)[k+5] = xmm1[5]; 
    384     ((__m128i *)dest)[k+6] = xmm1[3]; 
    385     ((__m128i *)dest)[k+7] = xmm1[7]; 
    386   } 
    387 } 
    388  
    389  
    390 /* Routine optimized for unshuffling a buffer for a type size of 16 bytes. */ 
    391 static void 
    392 unshuffle16(uint8_t* dest, uint8_t* orig, size_t size) 
    393 { 
    394   size_t i, j, k; 
    395   size_t neblock, numof16belem; 
    396   __m128i xmm1[16], xmm2[16]; 
    397  
    398   neblock = size / 16; 
    399   numof16belem = neblock / 16; 
    400   for (i = 0, k = 0; i < numof16belem; i++, k += 16) { 
    401     /* Load the first 256 bytes in 16 XMM registers */
    402     for (j = 0; j < 16; j++) { 
    403       xmm1[j] = ((__m128i *)orig)[j*numof16belem+i]; 
    404     } 
    405     /* Shuffle bytes */ 
    406     for (j = 0; j < 8; j++) { 
    407       /* Compute the low 32 bytes */ 
    408       xmm2[j] = _mm_unpacklo_epi8(xmm1[j*2], xmm1[j*2+1]); 
    409       /* Compute the hi 32 bytes */ 
    410       xmm2[8+j] = _mm_unpackhi_epi8(xmm1[j*2], xmm1[j*2+1]); 
    411     } 
    412     /* Shuffle 2-byte words */ 
    413     for (j = 0; j < 8; j++) { 
    414       /* Compute the low 32 bytes */ 
    415       xmm1[j] = _mm_unpacklo_epi16(xmm2[j*2], xmm2[j*2+1]); 
    416       /* Compute the hi 32 bytes */ 
    417       xmm1[8+j] = _mm_unpackhi_epi16(xmm2[j*2], xmm2[j*2+1]); 
    418     } 
    419     /* Shuffle 4-byte dwords */ 
    420     for (j = 0; j < 8; j++) { 
    421       /* Compute the low 32 bytes */ 
    422       xmm2[j] = _mm_unpacklo_epi32(xmm1[j*2], xmm1[j*2+1]); 
    423       /* Compute the hi 32 bytes */ 
    424       xmm2[8+j] = _mm_unpackhi_epi32(xmm1[j*2], xmm1[j*2+1]); 
    425     } 
    426     /* Shuffle 8-byte qwords */ 
    427     for (j = 0; j < 8; j++) { 
    428       /* Compute the low 32 bytes */ 
    429       xmm1[j] = _mm_unpacklo_epi64(xmm2[j*2], xmm2[j*2+1]); 
    430       /* Compute the hi 32 bytes */ 
    431       xmm1[8+j] = _mm_unpackhi_epi64(xmm2[j*2], xmm2[j*2+1]); 
    432     } 
    433     /* Store the result vectors in proper order */ 
    434     ((__m128i *)dest)[k+0] = xmm1[0]; 
    435     ((__m128i *)dest)[k+1] = xmm1[8]; 
    436     ((__m128i *)dest)[k+2] = xmm1[4]; 
    437     ((__m128i *)dest)[k+3] = xmm1[12]; 
    438     ((__m128i *)dest)[k+4] = xmm1[2]; 
    439     ((__m128i *)dest)[k+5] = xmm1[10]; 
    440     ((__m128i *)dest)[k+6] = xmm1[6]; 
    441     ((__m128i *)dest)[k+7] = xmm1[14]; 
    442     ((__m128i *)dest)[k+8] = xmm1[1]; 
    443     ((__m128i *)dest)[k+9] = xmm1[9]; 
    444     ((__m128i *)dest)[k+10] = xmm1[5]; 
    445     ((__m128i *)dest)[k+11] = xmm1[13]; 
    446     ((__m128i *)dest)[k+12] = xmm1[3]; 
    447     ((__m128i *)dest)[k+13] = xmm1[11]; 
    448     ((__m128i *)dest)[k+14] = xmm1[7]; 
    449     ((__m128i *)dest)[k+15] = xmm1[15]; 
    450   } 
    451 } 
    452  
    453  
    454 /* Unshuffle a block.  This can never fail. */ 
    455 void unshuffle(size_t bytesoftype, size_t blocksize, 
    456                uint8_t* _src, uint8_t* _dest) { 
    457   int unaligned_src = (int)((uintptr_t)_src % 16); 
    458   int unaligned_dest = (int)((uintptr_t)_dest % 16); 
    459   int power_of_two = (blocksize & (blocksize - 1)) == 0; 
    460   int too_small = (blocksize < 256); 
    461  
    462   if (unaligned_src || unaligned_dest || !power_of_two || too_small) { 
    463     /* _src or _dest buffer is not aligned, or blocksize is not a power
    464        of two or is too small.  Call the non-sse2 version. */
    465     _unshuffle(bytesoftype, blocksize, _src, _dest); 
    466     return; 
    467   } 
    468  
    469   /* Optimized unshuffle */ 
    470   /* The buffers must be aligned on a 16-byte boundary, and blocksize */
    471   /* must be a power of 2 and at least 256 bytes. */
    472   if (bytesoftype == 4) { 
    473     unshuffle4(_dest, _src, blocksize); 
    474   } 
    475   else if (bytesoftype == 8) { 
    476     unshuffle8(_dest, _src, blocksize); 
    477   } 
    478   else if (bytesoftype == 16) { 
    479     unshuffle16(_dest, _src, blocksize); 
    480   } 
    481   else if (bytesoftype == 2) { 
    482     unshuffle2(_dest, _src, blocksize); 
    483   } 
    484   else { 
    485     /* Non-optimized unshuffle */ 
    486     _unshuffle(bytesoftype, blocksize, _src, _dest); 
    487   } 
    488 } 
    489  
    490 #else   /* no __SSE2__ available */ 
    491  
    492 void shuffle(size_t bytesoftype, size_t blocksize, 
    493              uint8_t* _src, uint8_t* _dest) { 
    494   _shuffle(bytesoftype, blocksize, _src, _dest); 
    495 } 
    496  
    497 void unshuffle(size_t bytesoftype, size_t blocksize, 
    498                uint8_t* _src, uint8_t* _dest) { 
    499   _unshuffle(bytesoftype, blocksize, _src, _dest); 
    500 } 
    501  
    502 #endif  /* __SSE2__ */ 
     115    return ((uint64_t)xcr1 << 32) | xcr0; 
     116} 
     117 
     118#elif defined(_M_X64) 
     119 
     120/*  Implement _xgetbv for VS2008 and VS2010 RTM with 64-bit (x64) targets. 
     121    These compilers don't support any of the newer acceleration ISAs 
     122    (e.g., AVX2) supported by blosc, and all x64 hardware supports SSE2 
     123    which means we can get away with returning a hard-coded value from 
     124    this implementation of _xgetbv. */ 
     125 
     126static inline uint64_t 
     127_xgetbv(uint32_t xcr) { 
     128    /* A 64-bit OS must have XMM save support. */ 
     129    return xcr == 0 ? (1UL << 1) : 0UL; 
     130} 
     131 
     132#else 
     133 
     134/* Hardware detection for any other MSVC targets (e.g., ARM) 
     135   isn't implemented at this time. */ 
     136#error This version of c-blosc only supports x86 and x64 targets with MSVC. 
     137 
     138#endif /* _MSC_FULL_VER >= 160040219 */ 
     139   
     140#else 
     141 
     142/*  Implement the __cpuid and __cpuidex intrinsics for GCC, Clang, 
     143    and others using inline assembly. */ 
     144__attribute__((always_inline)) 
     145static inline void 
     146__cpuidex(int32_t cpuInfo[4], int32_t function_id, int32_t subfunction_id) { 
     147  __asm__ __volatile__ ( 
     148# if defined(__i386__) && defined (__PIC__) 
     149  /*  Can't clobber ebx with PIC running under 32-bit, so it needs to be manually restored. 
     150      https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family 
     151  */ 
     152    "movl %%ebx, %%edi\n\t" 
     153    "cpuid\n\t" 
     154    "xchgl %%ebx, %%edi": 
     155    "=D" (cpuInfo[1]), 
     156#else 
     157    "cpuid": 
     158    "=b" (cpuInfo[1]), 
     159#endif  /* defined(__i386__) && defined(__PIC__) */
     160    "=a" (cpuInfo[0]), 
     161    "=c" (cpuInfo[2]), 
     162    "=d" (cpuInfo[3]) : 
     163    "a" (function_id), "c" (subfunction_id) 
     164    ); 
     165} 
     166 
     167#define __cpuid(cpuInfo, function_id) __cpuidex(cpuInfo, function_id, 0) 
     168 
     169#define _XCR_XFEATURE_ENABLED_MASK 0 
     170 
     171/* Reads the content of an extended control register. 
     172   https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family 
     173*/ 
     174static inline uint64_t 
     175_xgetbv(uint32_t xcr) { 
     176  uint32_t eax, edx; 
     177  __asm__ __volatile__ ( 
     178    /* "xgetbv" 
     179       This is specified as raw instruction bytes due to some older compilers 
     180       having issues with the mnemonic form. 
     181    */ 
     182    ".byte 0x0f, 0x01, 0xd0": 
     183    "=a" (eax), 
     184    "=d" (edx) : 
     185    "c" (xcr) 
     186    ); 
     187  return ((uint64_t)edx << 32) | eax; 
     188} 
     189 
     190#endif /* defined(_MSC_FULL_VER) */ 
     191 
     192#ifndef _XCR_XFEATURE_ENABLED_MASK 
     193#define _XCR_XFEATURE_ENABLED_MASK 0x0 
     194#endif 
     195 
     196static blosc_cpu_features blosc_get_cpu_features(void) { 
     197  blosc_cpu_features result = BLOSC_HAVE_NOTHING; 
     198  int32_t max_basic_function_id; 
     199  /* Holds the values of eax, ebx, ecx, edx set by the `cpuid` instruction */ 
     200  int32_t cpu_info[4]; 
     201  int sse2_available; 
     202  int sse3_available; 
     203  int ssse3_available; 
     204  int sse41_available; 
     205  int sse42_available; 
     206  int xsave_available; 
     207  int xsave_enabled_by_os; 
     208  int avx2_available = 0; 
     209  int avx512bw_available = 0; 
     210  int xmm_state_enabled = 0; 
     211  int ymm_state_enabled = 0; 
     212  int zmm_state_enabled = 0; 
     213  uint64_t xcr0_contents; 
     214 
     215  /* Get the number of basic functions available. */ 
     216  __cpuid(cpu_info, 0); 
     217  max_basic_function_id = cpu_info[0]; 
     218 
     219  /* Check for SSE-based features and required OS support */ 
     220  __cpuid(cpu_info, 1); 
     221  sse2_available = (cpu_info[3] & (1 << 26)) != 0; 
     222  sse3_available = (cpu_info[2] & (1 << 0)) != 0; 
     223  ssse3_available = (cpu_info[2] & (1 << 9)) != 0; 
     224  sse41_available = (cpu_info[2] & (1 << 19)) != 0; 
     225  sse42_available = (cpu_info[2] & (1 << 20)) != 0; 
     226 
     227  xsave_available = (cpu_info[2] & (1 << 26)) != 0; 
     228  xsave_enabled_by_os = (cpu_info[2] & (1 << 27)) != 0; 
     229 
     230  /* Check for AVX-based features, if the processor supports extended features. */ 
     231  if (max_basic_function_id >= 7) { 
     232    __cpuid(cpu_info, 7); 
     233    avx2_available = (cpu_info[1] & (1 << 5)) != 0; 
     234    avx512bw_available = (cpu_info[1] & (1 << 30)) != 0; 
     235  } 
     236 
     237  /*  Even if certain features are supported by the CPU, they may not be supported 
     238      by the OS (in which case using them would crash the process or system). 
     239      If xsave is available and enabled by the OS, check the contents of the 
     240      extended control register XCR0 to see if the CPU features are enabled. */ 
     241#if defined(_XCR_XFEATURE_ENABLED_MASK) 
     242  if (xsave_available && xsave_enabled_by_os && ( 
     243      sse2_available || sse3_available || ssse3_available 
     244      || sse41_available || sse42_available 
     245      || avx2_available || avx512bw_available)) { 
     246    /* Determine which register states can be restored by the OS. */ 
     247    xcr0_contents = _xgetbv(_XCR_XFEATURE_ENABLED_MASK); 
     248 
     249    xmm_state_enabled = (xcr0_contents & (1UL << 1)) != 0; 
     250    ymm_state_enabled = (xcr0_contents & (1UL << 2)) != 0; 
     251 
     252    /*  Require support for both the upper 256-bits of zmm0-zmm15 to be 
     253        restored as well as all of zmm16-zmm31 and the opmask registers. */ 
     254    zmm_state_enabled = (xcr0_contents & 0x70) == 0x70; 
     255  } 
     256#endif /* defined(_XCR_XFEATURE_ENABLED_MASK) */ 
     257 
     258#if defined(BLOSC_DUMP_CPU_INFO) 
     259  printf("Shuffle CPU Information:\n"); 
     260  printf("SSE2 available: %s\n", sse2_available ? "True" : "False"); 
     261  printf("SSE3 available: %s\n", sse3_available ? "True" : "False"); 
     262  printf("SSSE3 available: %s\n", ssse3_available ? "True" : "False"); 
     263  printf("SSE4.1 available: %s\n", sse41_available ? "True" : "False"); 
     264  printf("SSE4.2 available: %s\n", sse42_available ? "True" : "False"); 
     265  printf("AVX2 available: %s\n", avx2_available ? "True" : "False"); 
     266  printf("AVX512BW available: %s\n", avx512bw_available ? "True" : "False"); 
     267  printf("XSAVE available: %s\n", xsave_available ? "True" : "False"); 
     268  printf("XSAVE enabled: %s\n", xsave_enabled_by_os ? "True" : "False"); 
     269  printf("XMM state enabled: %s\n", xmm_state_enabled ? "True" : "False"); 
     270  printf("YMM state enabled: %s\n", ymm_state_enabled ? "True" : "False"); 
     271  printf("ZMM state enabled: %s\n", zmm_state_enabled ? "True" : "False"); 
     272#endif /* defined(BLOSC_DUMP_CPU_INFO) */ 
     273 
     274  /* Using the gathered CPU information, determine which implementation to use. */ 
     275  /* Technically this could fail on an SSE2 CPU running an OS without XMM
     276   * support, but such systems shouldn't exist anymore. */
     277  if (sse2_available) { 
     278    result |= BLOSC_HAVE_SSE2; 
     279  } 
     280  if (xmm_state_enabled && ymm_state_enabled && avx2_available) { 
     281    result |= BLOSC_HAVE_AVX2; 
     282  } 
     283  return result; 
     284} 
     285#endif 
     286 
     287#else   /* No hardware acceleration supported for the target architecture. */ 
     288  #if defined(_MSC_VER) 
     289  #pragma message("Hardware-acceleration detection not implemented for the target architecture. Only the generic shuffle/unshuffle routines will be available.") 
     290  #else 
     291  #warning Hardware-acceleration detection not implemented for the target architecture. Only the generic shuffle/unshuffle routines will be available. 
     292  #endif 
     293 
     294static blosc_cpu_features blosc_get_cpu_features(void) { 
     295  return BLOSC_HAVE_NOTHING; 
     296} 
     297 
     298#endif 
     299 
     300static shuffle_implementation_t get_shuffle_implementation() { 
     301  blosc_cpu_features cpu_features = blosc_get_cpu_features(); 
     302  shuffle_implementation_t impl_generic; 
     303 
     304#if defined(SHUFFLE_AVX2_ENABLED) 
     305  if (cpu_features & BLOSC_HAVE_AVX2) { 
     306    shuffle_implementation_t impl_avx2; 
     307    impl_avx2.name = "avx2"; 
     308    impl_avx2.shuffle = (shuffle_func)shuffle_avx2; 
     309    impl_avx2.unshuffle = (unshuffle_func)unshuffle_avx2; 
     310    impl_avx2.bitshuffle = (bitshuffle_func)bshuf_trans_bit_elem_avx2; 
     311    impl_avx2.bitunshuffle = (bitunshuffle_func)bshuf_untrans_bit_elem_avx2; 
     312    return impl_avx2; 
     313  } 
     314#endif  /* defined(SHUFFLE_AVX2_ENABLED) */ 
     315 
     316#if defined(SHUFFLE_SSE2_ENABLED) 
     317  if (cpu_features & BLOSC_HAVE_SSE2) { 
     318    shuffle_implementation_t impl_sse2; 
     319    impl_sse2.name = "sse2"; 
     320    impl_sse2.shuffle = (shuffle_func)shuffle_sse2; 
     321    impl_sse2.unshuffle = (unshuffle_func)unshuffle_sse2; 
     322    impl_sse2.bitshuffle = (bitshuffle_func)bshuf_trans_bit_elem_sse2; 
     323    impl_sse2.bitunshuffle = (bitunshuffle_func)bshuf_untrans_bit_elem_sse2; 
     324    return impl_sse2; 
     325  } 
     326#endif  /* defined(SHUFFLE_SSE2_ENABLED) */ 
     327 
     328  /*  Processor doesn't support any of the hardware-accelerated implementations, 
     329      so use the generic implementation. */ 
     330  impl_generic.name = "generic"; 
     331  impl_generic.shuffle = (shuffle_func)shuffle_generic; 
     332  impl_generic.unshuffle = (unshuffle_func)unshuffle_generic; 
     333  impl_generic.bitshuffle = (bitshuffle_func)bshuf_trans_bit_elem_scal; 
     334  impl_generic.bitunshuffle = (bitunshuffle_func)bshuf_untrans_bit_elem_scal; 
     335  return impl_generic; 
     336} 
     337 
     338 
     339/*  Flag indicating whether the implementation has been initialized. 
     340    Zero means it hasn't been initialized, non-zero means it has. */ 
     341static int32_t implementation_initialized; 
     342 
     343/*  The dynamically-chosen shuffle/unshuffle implementation. 
     344    This is only safe to use once `implementation_initialized` is set. */ 
     345static shuffle_implementation_t host_implementation; 
     346 
     347/*  Initialize the shuffle implementation, if necessary. */ 
     348#if defined(__GNUC__) || defined(__clang__) 
     349__attribute__((always_inline)) 
     350#endif 
     351static 
     352#if defined(_MSC_VER) 
     353__forceinline 
     354#else 
     355inline 
     356#endif 
     357void init_shuffle_implementation() { 
     358  /* Initialization could (in rare cases) take place concurrently on 
     359     multiple threads, but it shouldn't matter because the 
     360     initialization should return the same result on each thread (so 
     361     the implementation will be the same). Since that's the case we 
     362     can avoid complicated synchronization here and get a small 
     363     performance benefit because we don't need to perform a volatile 
     364     load on the initialization variable each time this function is 
     365     called. */ 
     366#if defined(__GNUC__) || defined(__clang__) 
     367  if (__builtin_expect(!implementation_initialized, 0)) { 
     368#else 
     369  if (!implementation_initialized) { 
     370#endif 
     371    /* Initialize the implementation. */ 
     372    host_implementation = get_shuffle_implementation(); 
     373 
     374    /*  Set the flag indicating the implementation has been initialized. */ 
     375    implementation_initialized = 1; 
     376  } 
     377} 
     378 
     379/*  Shuffle a block by dynamically dispatching to the appropriate 
     380    hardware-accelerated routine at run-time. */ 
     381void 
     382shuffle(const size_t bytesoftype, const size_t blocksize, 
     383        const uint8_t* _src, const uint8_t* _dest) { 
     384  /* Initialize the shuffle implementation if necessary. */ 
     385  init_shuffle_implementation(); 
     386 
     387  /*  The implementation is initialized. 
     388      Dispatch to its shuffle routine. */
     389  (host_implementation.shuffle)(bytesoftype, blocksize, _src, _dest); 
     390} 
     391 
     392/*  Unshuffle a block by dynamically dispatching to the appropriate 
     393    hardware-accelerated routine at run-time. */ 
     394void 
     395unshuffle(const size_t bytesoftype, const size_t blocksize, 
     396          const uint8_t* _src, const uint8_t* _dest) { 
     397  /* Initialize the shuffle implementation if necessary. */ 
     398  init_shuffle_implementation(); 
     399 
     400  /*  The implementation is initialized. 
     401      Dispatch to its unshuffle routine. */
     402  (host_implementation.unshuffle)(bytesoftype, blocksize, _src, _dest); 
     403} 
     404 
     405/*  Bit-shuffle a block by dynamically dispatching to the appropriate 
     406    hardware-accelerated routine at run-time. */ 
     407int 
     408bitshuffle(const size_t bytesoftype, const size_t blocksize, 
     409           const uint8_t* const _src, const uint8_t* _dest, 
     410           const uint8_t* _tmp) { 
     411  int size = blocksize / bytesoftype; 
     412  /* Initialize the shuffle implementation if necessary. */ 
     413  init_shuffle_implementation(); 
     414 
     415  if ((size % 8) == 0) 
     416    /* The number of elems is a multiple of 8 which is supported by 
     417       bitshuffle. */ 
     418    return (int)(host_implementation.bitshuffle)((void*)_src, (void*)_dest, 
     419                                                 blocksize / bytesoftype, 
     420                                                 bytesoftype, (void*)_tmp); 
     421  else 
     422    memcpy((void*)_dest, (void*)_src, blocksize); 
     423  return size; 
     424} 
     425 
     426/*  Bit-unshuffle a block by dynamically dispatching to the appropriate 
     427    hardware-accelerated routine at run-time. */ 
     428int 
     429bitunshuffle(const size_t bytesoftype, const size_t blocksize, 
     430             const uint8_t* const _src, const uint8_t* _dest, 
     431             const uint8_t* _tmp) { 
     432  int size = blocksize / bytesoftype; 
     433  /* Initialize the shuffle implementation if necessary. */ 
     434  init_shuffle_implementation(); 
     435 
     436  if ((size % 8) == 0) 
     437    /* The number of elems is a multiple of 8 which is supported by 
     438       bitshuffle. */ 
     439    return (int)(host_implementation.bitunshuffle)((void*)_src, (void*)_dest, 
     440                                                   blocksize / bytesoftype, 
     441                                                   bytesoftype, (void*)_tmp); 
     442  else 
     443    memcpy((void*)_dest, (void*)_src, blocksize); 
     444  return size; 
     445} 
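The feature selection in `blosc_get_cpu_features` above comes down to a few bit tests on CPUID output and on XCR0. A minimal sketch of that decision follows, written as a pure function so it can be exercised without executing `cpuid` or `xgetbv`. The names `sketch_decode_features` and `SKETCH_HAVE_*` are hypothetical and not part of blosc. The caller is assumed to pass 0 for `leaf7_ebx` when CPUID leaf 7 is unavailable, and 0 for `xcr0` when OSXSAVE is clear (since `xgetbv` must not be executed in that case).

```c
#include <stdint.h>

/* Hypothetical flag names for this sketch only. */
enum {
  SKETCH_HAVE_NOTHING = 0,
  SKETCH_HAVE_SSE2 = 1,
  SKETCH_HAVE_AVX2 = 2
};

/* Decode raw register values into feature flags.
   leaf1_ecx / leaf1_edx: ECX and EDX from CPUID leaf 1.
   leaf7_ebx: EBX from CPUID leaf 7, subleaf 0 (0 if leaf 7 is unsupported).
   xcr0: contents of XCR0 (0 if OSXSAVE is not set). */
static int sketch_decode_features(uint32_t leaf1_ecx, uint32_t leaf1_edx,
                                  uint32_t leaf7_ebx, uint64_t xcr0) {
  int result = SKETCH_HAVE_NOTHING;
  int sse2 = (leaf1_edx >> 26) & 1;    /* CPUID.1:EDX bit 26 */
  int xsave = (leaf1_ecx >> 26) & 1;   /* CPUID.1:ECX bit 26 */
  int osxsave = (leaf1_ecx >> 27) & 1; /* CPUID.1:ECX bit 27 */
  int avx2 = (leaf7_ebx >> 5) & 1;     /* CPUID.7.0:EBX bit 5 */

  /* The OS must save/restore both XMM (bit 1) and YMM (bit 2) state
     before AVX2 code is safe to run. */
  int xmm_ymm_enabled = xsave && osxsave && ((xcr0 & 0x6) == 0x6);

  if (sse2) {
    result |= SKETCH_HAVE_SSE2;
  }
  if (avx2 && xmm_ymm_enabled) {
    result |= SKETCH_HAVE_AVX2;
  }
  return result;
}
```

Separating the register reads from the decoding keeps the bit logic testable on any host, including non-x86 build machines.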
  • thirdparty/blosc/shuffle.h

    r00587dc r981e22c  
    11/********************************************************************* 
    2   Blosc - Blocked Suffling and Compression Library 
     2  Blosc - Blocked Shuffling and Compression Library 
    33 
    4   Author: Francesc Alted <f[email protected]> 
     4  Author: Francesc Alted <f[email protected]> 
    55 
    66  See LICENSES/BLOSC.txt for details about copyright and rights to use. 
    77**********************************************************************/ 
    88 
     9/*  Shuffle/unshuffle routines which dynamically dispatch to hardware- 
     10    accelerated routines based on the processor's architecture. 
     11    Consumers should almost always prefer to call these routines instead 
     12    of directly calling one of the hardware-accelerated routines, since 
     13    these are cross-platform and future-proof. */ 
    914 
    10 /* Shuffle/unshuffle routines */ 
     15#ifndef SHUFFLE_H 
     16#define SHUFFLE_H 
    1117 
    12 void shuffle(size_t bytesoftype, size_t blocksize, 
    13              unsigned char* _src, unsigned char* _dest); 
     18#include "shuffle-common.h" 
    1419 
    15 void unshuffle(size_t bytesoftype, size_t blocksize, 
    16                unsigned char* _src, unsigned char* _dest); 
     20#ifdef __cplusplus 
     21extern "C" { 
     22#endif 
     23 
     24/** 
     25  Primary shuffle and bitshuffle routines. 
     26  This function dynamically dispatches to the appropriate hardware-accelerated 
     27  routine based on the host processor's architecture. If the host processor is 
     28  not supported by any of the hardware-accelerated routines, the generic 
     29  (non-accelerated) implementation is used instead. 
     30  Consumers should almost always prefer to call this routine instead of directly 
     31  calling the hardware-accelerated routines because this method is both cross- 
     32  platform and future-proof. 
     33*/ 
     34BLOSC_NO_EXPORT void 
     35shuffle(const size_t bytesoftype, const size_t blocksize, 
     36        const uint8_t* _src, const uint8_t* _dest); 
     37 
     38BLOSC_NO_EXPORT int 
     39bitshuffle(const size_t bytesoftype, const size_t blocksize, 
     40           const uint8_t* const _src, const uint8_t* _dest, 
     41           const uint8_t* _tmp); 
     42 
     43/** 
     44  Primary unshuffle and bitunshuffle routine. 
     45  This function dynamically dispatches to the appropriate hardware-accelerated 
     46  routine based on the host processor's architecture. If the host processor is 
     47  not supported by any of the hardware-accelerated routines, the generic 
     48  (non-accelerated) implementation is used instead. 
     49  Consumers should almost always prefer to call this routine instead of directly 
     50  calling the hardware-accelerated routines because this method is both cross- 
     51  platform and future-proof. 
     52*/ 
     53BLOSC_NO_EXPORT void 
     54unshuffle(const size_t bytesoftype, const size_t blocksize, 
     55          const uint8_t* _src, const uint8_t* _dest); 
     56 
     57 
     58BLOSC_NO_EXPORT int 
     59bitunshuffle(const size_t bytesoftype, const size_t blocksize, 
     60             const uint8_t* const _src, const uint8_t* _dest, 
     61             const uint8_t* _tmp); 
     62 
     63#ifdef __cplusplus 
     64} 
     65#endif 
     66 
     67#endif /* SHUFFLE_H */ 
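For orientation, the byte shuffle behind these entry points is a transposition: byte `j` of every element is gathered into the `j`-th contiguous stream of the output, and unshuffle reverses it. The scalar sketch below illustrates that layout. `sketch_shuffle` and `sketch_unshuffle` are hypothetical names, not the library's `shuffle_generic`/`unshuffle_generic`, and the handling of leftover bytes (copied verbatim when `blocksize` is not a multiple of `typesize`) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather byte j of each element into the j-th stream of dest. */
static void sketch_shuffle(size_t typesize, size_t blocksize,
                           const uint8_t *src, uint8_t *dest) {
  assert(typesize > 0);
  size_t nelem = blocksize / typesize;
  size_t tail = blocksize % typesize; /* bytes not forming a whole element */
  for (size_t j = 0; j < typesize; j++) {
    for (size_t i = 0; i < nelem; i++) {
      dest[j * nelem + i] = src[i * typesize + j];
    }
  }
  /* Assumption: leftover bytes are copied unchanged. */
  memcpy(dest + (blocksize - tail), src + (blocksize - tail), tail);
}

/* Inverse of sketch_shuffle: scatter each stream back into elements. */
static void sketch_unshuffle(size_t typesize, size_t blocksize,
                             const uint8_t *src, uint8_t *dest) {
  assert(typesize > 0);
  size_t nelem = blocksize / typesize;
  size_t tail = blocksize % typesize;
  for (size_t i = 0; i < nelem; i++) {
    for (size_t j = 0; j < typesize; j++) {
      dest[i * typesize + j] = src[j * nelem + i];
    }
  }
  memcpy(dest + (blocksize - tail), src + (blocksize - tail), tail);
}
```

Shuffling two 4-byte elements `{0,1,2,3}` and `{4,5,6,7}` with this sketch yields `{0,4,1,5,2,6,3,7}`; grouping bytes of equal significance this way tends to produce longer runs of similar values for the compressor.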