Blosc supports threading
========================

Threads are the most efficient way to program parallel code for
multi-core processors, but also the most difficult to program well.
They also have a non-negligible start-up time that does not fit well
with the high-performance compressor that Blosc tries to be.

In order to reduce the overhead of threads as much as possible, I've
decided to implement a pool of threads (the workers) that are waiting
for the main process (the master) to send them jobs (basically,
compressing and decompressing small blocks of the initial buffer).

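The master/worker arrangement described above can be sketched as
follows. This is a minimal illustration in Python, not Blosc's actual
C implementation: the `WorkerPool` class, its method names and the
64 KB block size are made up for the example, and zlib stands in for
the Blosc codec. The point is only that the workers are started once
and then sleep on a queue until the master hands them blocks.

```python
import queue
import threading
import zlib


class WorkerPool:
    """Hypothetical pool: workers start once, then wait for jobs."""

    def __init__(self, nthreads):
        self.jobs = queue.Queue()
        self.workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(nthreads)
        ]
        for worker in self.workers:
            worker.start()          # thread start-up cost is paid only here

    def _work(self):
        while True:
            job = self.jobs.get()   # sleep until the master sends a job
            if job is None:         # sentinel: shut this worker down
                self.jobs.task_done()
                return
            func, block, results, index = job
            try:
                results[index] = func(block)
            finally:
                self.jobs.task_done()

    def map_blocks(self, func, buffer, blocksize):
        """Master side: split the buffer, hand out blocks, wait for all."""
        blocks = [buffer[i:i + blocksize]
                  for i in range(0, len(buffer), blocksize)]
        results = [None] * len(blocks)
        for index, block in enumerate(blocks):
            self.jobs.put((func, block, results, index))
        self.jobs.join()            # returns once every block is processed
        return results

    def close(self):
        for _ in self.workers:
            self.jobs.put(None)
        for worker in self.workers:
            worker.join()


pool = WorkerPool(nthreads=4)
data = b"blosc " * 100000
# zlib is a stand-in compressor; 64 KB is an assumed block size.
compressed = pool.map_blocks(zlib.compress, data, 64 * 1024)
pool.close()
```

Because the pool outlives a single call, repeated calls to
`map_blocks` reuse the same threads instead of creating new ones.
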
Despite this and many other internal optimizations in the threaded
code, it does not run faster than the serial version for buffer sizes
of around 64/128 KB or less. This is on an Intel Quad Core2 (Q8400 @
2.66 GHz) running Linux (openSUSE 11.2, 64 bit), but your mileage may
vary (and will vary!) for other processors / operating systems.

In contrast, for buffers larger than 64/128 KB, the threaded version
starts to perform significantly better, with the sweet spot at 1 MB
(again, this is with my setup). For buffer sizes larger than 1 MB,
the threaded code slows down again, but this is probably due to a
cache size issue and, besides, it is still considerably faster than
the serial code.

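Since the crossover point depends on the machine, one way to locate it
on your own setup is to time both code paths over a range of buffer
sizes. The sketch below is only an assumption-laden illustration in
Python: zlib replaces the Blosc codec, the 16 KB block size and the
helper names (`split`, `best_time`, `compare`) are invented for the
example, and the timings it prints say nothing about Blosc itself.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCKSIZE = 16 * 1024  # assumed block size, not Blosc's actual value


def split(buffer):
    """Cut the buffer into fixed-size blocks (the last may be shorter)."""
    return [buffer[i:i + BLOCKSIZE]
            for i in range(0, len(buffer), BLOCKSIZE)]


def best_time(func, repeats=5):
    """Return the fastest wall-clock time of `repeats` calls, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def compare(sizes, nthreads=4):
    """Time serial vs. pooled compression for each buffer size."""
    rows = []
    # The pool is created once so thread start-up is not timed.
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        for size in sizes:
            blocks = split(bytes(range(256)) * (size // 256))
            serial = best_time(
                lambda: [zlib.compress(b) for b in blocks])
            threaded = best_time(
                lambda: list(pool.map(zlib.compress, blocks)))
            rows.append((size, serial, threaded))
    return rows


for size, serial, threaded in compare([32 * 1024, 128 * 1024, 1024 * 1024]):
    print(f"{size // 1024:5d} KB  serial {serial * 1e3:8.3f} ms"
          f"  threaded {threaded * 1e3:8.3f} ms")
```

The buffer size at which the threaded column starts to win is the
crossover for that machine and that stand-in codec.
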
This is why Blosc falls back to the serial version for such 'small'
buffers. So, you don't have to worry too much about deciding whether
you should set the number of threads to 1 (serial) or more
(parallel). Just set it to the number of cores in your processor and
you are done!

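The fallback can be pictured with a small dispatch function. Again
this is a hedged Python sketch rather than Blosc's real code: the
128 KB cutoff is simply borrowed from the figures quoted above, the
names `effective_nthreads` and `compress` are hypothetical, and zlib
stands in for the Blosc codec.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

SERIAL_THRESHOLD = 128 * 1024  # assumed cutoff for 'small' buffers
BLOCKSIZE = 16 * 1024          # assumed block size


def effective_nthreads(buffer_size, requested):
    """Return 1 for 'small' buffers, else the requested thread count."""
    if requested <= 1 or buffer_size <= SERIAL_THRESHOLD:
        return 1
    return requested


def compress(buffer, pool, nthreads):
    """Compress block by block, serially or through the pool."""
    blocks = [buffer[i:i + BLOCKSIZE]
              for i in range(0, len(buffer), BLOCKSIZE)]
    if effective_nthreads(len(buffer), nthreads) == 1:
        return [zlib.compress(b) for b in blocks]    # serial path
    return list(pool.map(zlib.compress, blocks))     # threaded path


# os.cpu_count() reports logical CPUs, which may exceed physical cores.
nthreads = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=nthreads) as pool:
    small = compress(b"x" * (32 * 1024), pool, nthreads)      # serial
    large = compress(b"x" * (1024 * 1024), pool, nthreads)    # threaded if nthreads > 1
```

The caller always passes the same thread count; the size check decides
which path is taken.
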
Francesc Alted