Arch is the architect of Threading Building Blocks. He was the lead developer for KAI C++. At Shell he worked on seismic imaging on a 256-node nCUBE. He has a Ph.D. in computer science from the University of Illinois.
The TBB class task was designed for high-performance implementations of the TBB templates. Its efficiency, particularly its emphasis on continuation-passing style, comes at some price in convenience. Rick Molloy of Microsoft has posted a description of a task_group interface that Microsoft is considering. It's more convenient than the TBB interface, particularly when your compiler supports C++ [...]
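To show what "more convenient" can mean, here is a minimal sketch of a task_group-style interface built on std::async. The names SimpleTaskGroup, run, wait, serial_fib and parallel_fib are our own hypothetical choices. This is neither Microsoft's proposed interface nor a TBB class. It also starts one thread per task instead of using a work-stealing scheduler, so treat it as an illustration of the calling style only.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical task_group-style wrapper; not TBB's or Microsoft's actual API.
// run() launches a callable asynchronously; wait() blocks until all finish.
class SimpleTaskGroup {
public:
    template <typename F>
    void run(F&& f) {
        // std::launch::async starts a new thread per task. A real scheduler
        // would instead queue the task for a pool of worker threads.
        futures_.push_back(std::async(std::launch::async, std::forward<F>(f)));
    }

    // Waits for every task; get() rethrows any exception a task threw.
    void wait() {
        for (auto& fut : futures_) fut.get();
        futures_.clear();
    }

private:
    std::vector<std::future<void>> futures_;
};

long serial_fib(int n) { return n < 2 ? n : serial_fib(n - 1) + serial_fib(n - 2); }

// Usage: recursive Fibonacci with no task subclass and no explicit continuation.
long parallel_fib(int n) {
    if (n < 16) return serial_fib(n);  // cutoff keeps the thread count small
    long x = 0, y = 0;
    SimpleTaskGroup g;
    g.run([&] { x = parallel_fib(n - 1); });
    g.run([&] { y = parallel_fib(n - 2); });
    g.wait();
    return x + y;
}
```

With class task, the same computation needs a task subclass with an execute() method and explicit continuation handling. Here, plain lambdas and a blocking wait() do the job.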
TBB started out as a task-based framework for parallel programming. TBB 2.1 adds threads. This note explains the new threading interface, when to use it, and when to use tasks instead. TBB tasks rely on non-preemptive cooperative scheduling based on work stealing, similar to Cilk. Once the TBB scheduler starts a task on a software thread, [...]
[Disclaimer: I'm sketching possibilities here. There is no commitment from the TBB group to implement any of this.] Threading packages often have some notion of a thread id or thread local storage. The two are equivalent in the sense that, given one, you can easily build the other. For example, thread local storage can be implemented [...]
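One direction of that equivalence can be sketched in standard C++: thread-local storage built from nothing but std::this_thread::get_id() and a mutex-protected map. This is our own illustration, not the approach the post goes on to describe. The class name ThreadLocal and its local/combine methods are hypothetical. The single global lock makes it much slower than compiler-supported thread_local.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: per-thread storage keyed by thread id.
// Note: a finished thread's id may be reused by a later thread, so two
// threads that do not overlap in time can end up sharing a slot.
template <typename T>
class ThreadLocal {
public:
    // Returns this thread's private slot, value-initializing it on first use.
    // std::map insertions do not invalidate references to existing elements.
    T& local() {
        std::lock_guard<std::mutex> lock(mutex_);
        return slots_[std::this_thread::get_id()];
    }

    // Folds every slot into one value; call after the worker threads finish.
    template <typename Op>
    T combine(T init, Op op) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& kv : slots_) init = op(init, kv.second);
        return init;
    }

private:
    std::mutex mutex_;
    std::map<std::thread::id, T> slots_;
};
```

Each thread increments only its own slot without further locking. Only the lookup in local() and the final combine() touch the shared map under the lock.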
I've been asked several times why TBB does not have a concurrent list class; i.e., a list that supports concurrent access. The answer is that we'd add one if (1) we could figure out semantics that are useful for parallel programming, and (2) we could implement it reasonably efficiently on current hardware. I usually try to avoid linked [...]
There is a widespread notion that the keyword volatile is good for multi-threaded programming. I've seen interfaces with volatile qualifiers justified as "it might be used for multi-threaded programming". I thought it was useful until the last few weeks, when it finally dawned on me (or if you prefer, got through my thick head) that volatile [...]
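The excerpt stops before the argument, so this only illustrates the commonly cited alternative and is not the post's own reasoning. In C++11 and later, std::atomic provides the atomicity and inter-thread ordering that volatile does not. The sketch below publishes data from one thread to another through a ready flag. The function name publish_and_read is hypothetical. Declaring the flag as volatile bool instead would make this a data race with undefined behavior.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes a value from a producer thread to the calling thread.
// The release store / acquire load pair on `ready` guarantees that once the
// reader sees ready == true, it also sees the producer's write to payload.
int publish_and_read(int value) {
    int payload = 0;                 // plain data, published via the flag
    std::atomic<bool> ready{false};  // a volatile bool here would be a data race

    std::thread producer([&] {
        payload = value;                               // ordinary write
        ready.store(true, std::memory_order_release);  // publish it
    });

    // Spin until published; acquire pairs with the producer's release.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;  // ordered after the producer's write
    producer.join();
    return seen;
}
```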
I'm back from Supercomputing '07. Gone is the heyday of wacky hyper-dimensional topologies and strange new architectures like the Connection Machine. Clusters built from commodity parts have become the ubiquitous denizens of the High Performance Computing (HPC) ecological niche. The Cambrian explosion seems to be over. The environment directs evolution. It was [...]
"Give people a fish, and you feed them for a day. Teach people how to fish, and you feed them forever." This blog does both. A recurring question from TBB users is how to break from a parallel loop. This blog shows one way to do it, by writing a new kind of range type. It's not a perfect solution, but [...]
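The post's own range type is not reproduced in the excerpt, so what follows is only one plausible shape for the idea, under our own assumptions. The range's copies share a std::atomic<bool> stop flag, and every copy reports itself empty once the flag is set, so no further subranges get processed. StoppableRange, split() and the serial driver run_range are hypothetical stand-ins. Real TBB ranges use a splitting constructor rather than a split() method, and a real parallel_for would process subranges concurrently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical range whose copies share a stop flag. Once any body calls
// request_stop(), every copy reports empty() and stops being divisible.
class StoppableRange {
public:
    StoppableRange(std::size_t begin, std::size_t end, std::atomic<bool>* stop,
                   std::size_t grainsize = 1)
        : begin_(begin), end_(end), stop_(stop), grainsize_(grainsize) {
        assert(begin <= end && grainsize > 0);
    }

    bool empty() const { return begin_ >= end_ || stop_->load(); }
    bool is_divisible() const { return !empty() && end_ - begin_ > grainsize_; }

    // *this keeps [begin, mid); the returned range takes [mid, end).
    StoppableRange split() {
        std::size_t mid = begin_ + (end_ - begin_) / 2;
        StoppableRange right(mid, end_, stop_, grainsize_);
        end_ = mid;
        return right;
    }

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }
    void request_stop() const { stop_->store(true); }

private:
    std::size_t begin_, end_;
    std::atomic<bool>* stop_;
    std::size_t grainsize_;
};

// Serial stand-in for a parallel loop driver: splits recursively and hands
// indivisible pieces to the body, skipping anything once the flag is set.
template <typename Body>
void run_range(StoppableRange r, const Body& body) {
    if (r.empty()) return;
    if (r.is_divisible()) {
        StoppableRange right = r.split();
        run_range(r, body);
        run_range(right, body);
    } else {
        body(r);
    }
}
```

In a search, the body calls request_stop() when it finds its target. Subranges that have not started yet then come up empty and are skipped. A subrange already in progress finishes unless its body also polls the flag.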
If you are trick-or-treating tonight, and visiting the house of a parallel programming researcher, just dress up as one of the bogeymen listed here and you'll give them a good scare! Seriously, in discussions of new languages for parallel programming, pundits trot out various bogeymen, declare them evil, and imply that by removing them, [...]
My previous blog discussed exceptions as alternative control flow versus exceptions as alternative data values. Here I'll take that notion further and sketch how I think the TBB task scheduler should deal with exceptions. First, a quick review of the TBB task scheduler. It's the low-level engine that drives the high-level templates like parallel_for. The high-level [...]
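The excerpt does not say what policy the post proposes, so the following sketch assumes one common policy rather than taking it from the post. The first exception thrown by any task is captured, a cancellation flag tells sibling tasks to stop early, and the exception is rethrown in the thread that waits. It uses plain std::thread and C++11's std::exception_ptr, not the TBB scheduler. The names Job and run_all are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// A job receives a cancellation flag it may poll to stop early.
using Job = std::function<void(const std::atomic<bool>&)>;

// Hypothetical sketch: runs each job on its own thread, keeps the first
// exception any job throws, raises the cancellation flag so the other jobs
// can bail out, and rethrows that exception in the waiting thread.
void run_all(const std::vector<Job>& jobs) {
    std::atomic<bool> cancelled{false};
    std::exception_ptr first;  // first captured exception, if any
    std::mutex m;              // guards `first`

    std::vector<std::thread> threads;
    for (const Job& job : jobs) {
        threads.emplace_back([&cancelled, &first, &m, &job] {
            try {
                job(cancelled);
            } catch (...) {
                std::lock_guard<std::mutex> lock(m);
                if (!first) first = std::current_exception();
                cancelled.store(true);
            }
        });
    }
    for (std::thread& t : threads) t.join();
    if (first) std::rethrow_exception(first);
}
```

The exception travels as a value (std::exception_ptr) between threads and becomes control flow again only at the join point. This matches the value-versus-control-flow distinction the excerpt mentions.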
Exception handling is one of the big improvements of C++ over C. C code that checks for erroneous or unusual conditions is littered with tests for those conditions, making programs harder to read. Worse yet, the programmer might forget to check one of those conditions. Exceptions eliminated most of those problems, albeit [...]
Last week I showed how cache affinity might be supported by the high-level algorithm templates. Here's what the low-level task interface might look like. There would be a new subclass of class task for tasks with affinity, called task_with_affinity. As much as I would prefer to avoid subclassing here, we have to because introducing new [...]
I'm currently working on improving cache reuse (cache affinity) for loops in TBB. This note describes why it is being improved, and the direction of the high-level interface for cache affinity in TBB. The TBB task scheduler is based upon task stealing à la Cilk. If a processor runs out of work to do, it randomly [...]
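The deque discipline behind Cilk-style task stealing can be sketched as follows. This is a simplified, lock-based illustration, not TBB's implementation; production schedulers typically avoid taking a lock on the owner's fast path. The class name WorkStealingDeque is hypothetical. The owner works at one end in LIFO order, which tends to reuse cache-warm data. Thieves take from the other end, where in divide-and-conquer code the oldest and usually largest pieces of work sit.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical lock-based work-stealing deque. The owning thread pushes and
// pops at the back (LIFO); other threads steal from the front (oldest task).
template <typename T>
class WorkStealingDeque {
public:
    void push(T item) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push_back(std::move(item));
    }

    // Owner side: take the newest task. Returns false if the deque is empty.
    bool pop(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = std::move(items_.back());
        items_.pop_back();
        return true;
    }

    // Thief side: take the oldest task. Returns false if the deque is empty.
    bool steal(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::deque<T> items_;
};
```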
There seems to be a widespread notion that in order to do parallel programming, you have to write an inherently parallel program. By inherently parallel, I mean one that must have more than one thread to run correctly. A simple example is a producer-consumer program with a bounded buffer. A single thread cannot execute such [...]
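A minimal C++11 sketch of the bounded-buffer example makes "inherently parallel" concrete. The class BoundedBuffer, the capacity of 2 and the helper produce_consume are our own illustrative choices. With capacity 2, a single thread that tried to put all n items before getting any would block forever on the third put(). The program only completes because a second thread consumes concurrently.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// put() blocks while the buffer is full; get() blocks while it is empty.
template <typename T>
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);  // capacity 0 would deadlock every put()
    }

    void put(T item) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(item));
        not_empty_.notify_one();
    }

    T get() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    std::size_t capacity_;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::queue<T> q_;
};

// Producer (calling thread) sends 1..n through a buffer of capacity 2;
// a consumer thread sums them. Needs both threads to finish.
long produce_consume(int n) {
    BoundedBuffer<int> buf(2);
    long sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += buf.get();
    });
    for (int i = 1; i <= n; ++i) buf.put(i);
    consumer.join();
    return sum;
}
```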
Clay's blog http://softwarecommunity.intel.com/ISN/Community/en-us/blogs/multi-core-thredmonkey/archive/2006/12/18/30228042.aspx asks if Intel® Threading Building Blocks [Intel® TBB] is a solution looking for a problem. OpenMP is great if you have Fortran code, or C code that looks like Fortran, or C++ that looks like Fortran. In other words, flat do-loop centric parallelism. With TBB, we're trying to go beyond that and [...]
I recently updated my video game Frequon Invaders. It's a free download from http://home.comcast.net/~arch.robison/frequon.html , which is strictly my own product, not Intel's. In doing the update, I optimized it for the Intel® Core™2 Duo processor, and ran into a tale of dependence breaking that I'll tell here. Frequon Invaders has a very unusual display for a [...]
The parallel loop templates in Intel® TBB require a grainsize parameter. Ideally, we'd have some sort of profile-guided optimization. But that's tough to do within TBB's goal of working with standard-issue compilers. Grainsize is really not that difficult to understand and set. I had this analogy in a draft of the Tutorial, but it ended up on [...]
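What grainsize controls can be sketched without TBB. A range is split in half only while its size exceeds the grainsize, which is modeled on the splitting rule for TBB's blocked_range, and the leaves become the chunks handed to threads. The function split_by_grainsize is a hypothetical, serial illustration, not a TBB API. A smaller grainsize yields more, smaller chunks and more scheduling overhead. A larger one yields fewer chunks and less opportunity to balance load.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: recursively halves [begin, end) while its size exceeds
// grainsize, appending the resulting leaf chunks to `chunks` in order.
void split_by_grainsize(std::size_t begin, std::size_t end, std::size_t grainsize,
                        std::vector<std::pair<std::size_t, std::size_t>>& chunks) {
    assert(grainsize > 0);  // grainsize 0 would recurse forever
    if (end - begin > grainsize) {
        std::size_t mid = begin + (end - begin) / 2;
        split_by_grainsize(begin, mid, grainsize, chunks);
        split_by_grainsize(mid, end, grainsize, chunks);
    } else {
        chunks.emplace_back(begin, end);
    }
}
```

Under this rule, every leaf chunk has size at most grainsize and, roughly, at least half of it. For 1000 iterations and a grainsize of 100, the range splits into 16 chunks of 62 or 63 iterations each.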
Beginning programmers often expect linear speedup in parallel programs, thinking that two cores should be twice as fast as one. Of course, this is usually not the case. Limitations such as serial portions, bus bandwidth, cache, and communication start to kick in. When Amdahl's law kicks in, the answer is the Gustafson-Barsis Law http://en.wikipedia.org/wiki/Gustafson's_Law - more cores [...]
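The two laws can be compared numerically with their standard textbook forms. In Amdahl's law, p is the parallelizable fraction of a fixed workload's serial run time. In Gustafson-Barsis, s is the serial fraction of the time measured on the n-core run, with the workload growing with n. The function names are our own.

```cpp
#include <cassert>

// Amdahl: fixed problem size. p = parallelizable fraction, n = cores.
// Speedup = 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p).
double amdahl_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}

// Gustafson-Barsis: problem size scales with n. s = serial fraction of the
// n-core run time. Scaled speedup = s + (1 - s) * n = n - s * (n - 1).
double gustafson_speedup(double s, double n) {
    assert(s >= 0.0 && s <= 1.0 && n >= 1.0);
    return s + (1.0 - s) * n;
}
```

With a 10% serial part on 8 cores, Amdahl gives about 4.7x, and no number of cores can push it past 10x. Gustafson-Barsis gives 7.3x, because the parallel part of the work grows with the machine.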
I suspect blogs are like poetry - more are written than read. The system has lost this specific posting twice now, so I'll need a total of three readers to break even. I'm the lead developer for Intel® Threading Building Blocks (Intel® TBB). I've been pondering whether TBB should have more direct support for [...]
Well, my first blog post, many lines long, was completely lost. I should have applied two key lessons of software engineering: Run unit tests, i.e., try a small post like this first. Keep backups, i.e., copy and paste the blog content somewhere else. - Arch