Blosc supports threading
========================

Threads are the most efficient way to program parallel code for
multi-core processors, but also the most difficult to program well.
They also have a non-negligible start-up time that does not fit well
with a high-performance compressor such as Blosc aims to be.

In order to reduce the overhead of threads as much as possible, I've
decided to implement a pool of threads (the workers) that wait
for the main thread (the master) to send them jobs (basically,
compressing and decompressing small blocks of the initial buffer).

Despite this and many other internal optimizations in the threaded
code, it is not faster than the serial version for buffer sizes
of around 64/128 KB or less. This is on an Intel Core 2 Quad (Q8400 @
2.66 GHz) running Linux (openSUSE 11.2, 64-bit), but your mileage may
vary (and will vary!) on other processors and operating systems.

In contrast, for buffers larger than 64/128 KB, the threaded version
starts to perform significantly better, with the sweet spot at 1 MB
(again, this is with my setup). For buffers larger than 1 MB, the
threaded code slows down again, probably because of a cache size
issue, but it is still considerably faster than the serial code.

This is why Blosc falls back to the serial version for such
'small' buffers. So, you don't have to worry too much about deciding
whether you should set the number of threads to 1 (serial) or more
(parallel). Just set it to the number of cores in your processor and
you are done!

Francesc Alted