<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Intel® Software Network Blogs &#187; Threading Building Blocks</title>
	<link>http://softwareblogs.intel.com</link>
	<description></description>
	<pubDate>Sat, 17 May 2008 04:17:38 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
	<language>en</language>
			<item>
		<title>Reaching out to the Academic Community</title>
		<link>http://softwareblogs.intel.com/2008/05/09/reaching-out-to-the-academic-community/</link>
		<comments>http://softwareblogs.intel.com/2008/05/09/reaching-out-to-the-academic-community/#comments</comments>
		<pubDate>Fri, 09 May 2008 20:38:20 +0000</pubDate>
		<dc:creator>Paul Steinberg (Intel)</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<category><![CDATA[Graphics]]></category>

		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Social Media &amp; Virtual Worlds]]></category>

		<category><![CDATA[Software Engineering]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<category><![CDATA[University Curriculum]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/05/09/reaching-out-to-the-academic-community/</guid>
		<description><![CDATA[I am happy to launch my first blog post as a member of Intel's Software College and Academic Community.  Much of this first post is introductory, and I am looking for your feedback.
 I am working with some of the brightest folks here at Intel, our subject matter experts and architects, such as Clay Breshears, Michael Wrinn, Bob [...]]]></description>
			<content:encoded><![CDATA[<p>I am happy to launch my first blog post as a member of Intel's Software College and Academic Community.  Much of this first post is introductory, and I am looking for your feedback.</p>
<p>I am working with some of the brightest folks here at Intel, our subject matter experts and architects, such as <a href="http://softwareblogs.intel.com/author/clay-breshears/">Clay Breshears</a>, <a href="http://softwareblogs.intel.com/author/michael-wrinn/">Michael Wrinn</a>, <a href="http://softwareblogs.intel.com/author/robert-chesebrough/">Bob Chesebrough</a> and <a href="http://softwareblogs.intel.com/2007/06/28/tim-mattson-on-parallel-computing-at-the-researchintel-blog/">Tim Mattson</a> (amongst others).  I will also be working closely with the indomitable <a href="http://softwareblogs.intel.com/2008/04/28/the-academic-community-has-a-new-face-to-support-you/">Wolfgang Rosenberg</a>, manager of the <a href="http://softwarecollege.intel.com/academic/">Intel Academic Community</a>.</p>
<p>My job is to reach out to educators and researchers around the world, to connect them with Intel experts and to help foster development of a curriculum to educate the next generation of programmers and engineers on the newest compute platforms.</p>
<p><strong>Hopefully, this blog will go a long way toward opening up channels of communication.</strong></p>
<p>We have a number of events and initiatives planned for this year. </p>
<p>We have already started our monthly <a href="http://softwarecommunity.intel.com/articles/eng/3760.htm"><strong>Academic Community Curriculum Webinar Series.</strong></a>  During these webinars, we discuss the newest curriculum topics.  It is a great way to speak directly with our course architects.  I moderate the series and I very much look forward to speaking with you there soon.</p>
<p><strong>The next in the series is on May 15 on multi-core design patterns.  Please register below.</strong></p>
<p><a href="http://w.on24.com/r.htm?e=106752&amp;s=1&amp;k=C24BFCF31A05EC4A82F51D6234DA4D71&amp;partnerref=MyBlog"><img border="0" width="312" src="http://softwarecommunity.intel.com/UserFiles/en-us/Image/Webinar.jpg" height="200" /></a></p>
<p> <a href="http://w.on24.com/r.htm?e=106752&amp;s=1&amp;k=C24BFCF31A05EC4A82F51D6234DA4D71&amp;partnerref=MyBlog">Register or view past event here</a>.</p>
<p>------------------------------------------</p>
<p>We are creating quite a few short <a href="http://softwarecommunity.intel.com/videos/home.aspx?fn=1484&amp;Category=MultiCore"><strong>videos</strong></a> supporting our academic efforts.</p>
<p><img border="0" width="393" src="http://softwarecommunity.intel.com/UserFiles/en-us/Image/vids.jpg" height="167" /></p>
<p> I'm in the process now of filming a series on threading topics with an emphasis on game development and visual computing.  So far only the first title on <a href="http://softwarecommunity.intel.com/videos/home.aspx?fn=1485">Optimizing for DirectX</a> is posted, but the rest will be available soon.</p>
<p><strong>Is this type of content useful?  Are there better ways to scale out our knowledge and build conversation?  I'd like to hear that from you.</strong></p>
<p>I've asked around internally as to how folks like to consume information.  As you might imagine, there was a wide range of responses.  Tim Mattson just rolled his eyes when I started to talk about videos and webinars.  While he is a great presenter, his own preference is to just download the PowerPoint or code and have done with it.</p>
<p>Others, myself included, prefer a richer content set.  For me, nothing beats the immediacy of a live event.  That is one reason we have our monthly webinars.  I am also quite interested in convening smaller conversations, perhaps using something like Communicator or Live Meeting, to discuss specific topics or curriculum ideas.  Let me know by responding to this blog.</p>
<p> ---------------------------------------------</p>
<p>Finally, I've become very interested in different forms of new media.  I'm often available on <a href="http://www.twitter.com">Twitter</a>, where you can find me as @psteinb.</p>
<p>I am the owner of the <a href="http://tinyurl.com/34chl9">Intel Software Second Life Island</a>.</p>
<p><a href="http://tinyurl.com/34chl9"> <img border="0" width="159" src="http://softwarecommunity.intel.com/UserFiles/en-us/Image/psteinb/PeretzVerySmall.JPG" height="119" /></a></p>
<p>IM me on Second Life as Peretz Stine.</p>
<p>Check out our <a href="http://www.youtube.com/watch?v=iWfIJWaCzrA">launch video.</a></p>
<p><a href="http://www.youtube.com/watch?v=iWfIJWaCzrA"><img border="0" width="256" src="http://softwarecommunity.intel.com/UserFiles/en-us/Image/psteinb/launchSM.jpg" height="210" /></a></p>
<p>Over the last year, we ran an event series on our Second Life island dedicated to engaging engineers and professionals around the world in conversation in this unique environment.  That program, sadly, has ended, but you can still view much of it <a href="http://softwarecommunity.intel.com/articles/eng/3712.htm">here</a>.</p>
<p><a href="http://softwarecommunity.intel.com/articles/eng/3712.htm"><img border="0" width="401" src="http://softwarecommunity.intel.com/UserFiles/en-us/Image/psteinb/IntelMetaverse2.jpg" height="338" /></a></p>
<p>Are you interested in meeting on Second Life or other virtual worlds?  It can be arranged.</p>
<p>Well, that's enough for now. You have your orders: tell me how best to foster dialogue.  I'll be working as hard as I can, but you are the whole point.  Let's start the conversation.</p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/05/09/reaching-out-to-the-academic-community/feed/</wfw:commentRss>
		</item>
		<item>
		<title>TBB on Sun Solaris*</title>
		<link>http://softwareblogs.intel.com/2008/05/09/tbb-on-sun-solaris/</link>
		<comments>http://softwareblogs.intel.com/2008/05/09/tbb-on-sun-solaris/#comments</comments>
		<pubDate>Fri, 09 May 2008 14:47:45 +0000</pubDate>
		<dc:creator>David Sekowski (Intel)</dc:creator>
		
		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Open Source]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/05/09/tbb-on-sun-solaris/</guid>
		<description><![CDATA[Hello, my name is Dave Sekowski. I am a program manager at Intel working on the Threading Building Blocks (TBB) project. This week I had an opportunity to talk with Chris Huson, one of the TBB developers who worked with Sun Microsystems to port TBB to Sun Solaris, about the collaborative effort to enable Solaris [...]]]></description>
			<content:encoded><![CDATA[<p>Hello, my name is Dave Sekowski. I am a program manager at Intel working on the Threading Building Blocks (TBB) project. This week I had an opportunity to talk with Chris Huson, one of the TBB developers who worked with Sun Microsystems to port TBB to Sun Solaris, about the collaborative effort to enable Solaris developers with TBB.</p>
<p><strong><em>Dave Sekowski: </em></strong>We recently announced with Sun Microsystems that we have made TBB available on Sun Solaris* using Sun Studio* compilers. What exactly went into the patches we made to TBB?<br />
<strong><em>Chris Huson: </em></strong>One set of changes was the addition of using statements in the test system, to accommodate slight differences in the header files; most of these were incorporated as-is after some discussion. Sun also disabled the "warning is error" switch in the build. We use it to be pedantic about the code, but the Sun compiler found different things to warn about. A change disabling the switch for Sun was put into mainline. The other major change was to add SunOS to the preprocessor statements which were for Linux.</p>
<p><strong><em>Dave Sekowski: </em></strong>In porting TBB to Sun Solaris with Sun Studio what needed to be changed and why?<br />
<strong><em>Chris Huson: </em></strong>The change needing review on the Sun side concerned Sun's stricter handling of standard library functions. Some of these functions are in the global namespace on some platforms, and in the std:: namespace on others (including Sun). Vladimir Polin incorporated Sun's modifications in a way that also supported older platforms, and Sun reviewed and approved those changes.</p>
<p><strong><em>Dave Sekowski: </em></strong>Can you tell me a little about building it with Sun Studio compilers and working it into our regular build, test and release flow?<br />
<strong><em>Chris Huson: </em></strong>The process was virtually identical to the Linux build, especially after we started using the Sun Studio Express compiler. The system is now being incorporated into our nightly build and test system, with no major problems so far.</p>
<p><strong><em>Dave Sekowski:</em></strong> Do you have any additional comments on how it went?<br />
<strong><em>Chris Huson:</em></strong> The changes were small, and incorporating them into the current version of TBB was pretty painless. </p>
<p><em><strong>Dave Sekowski:</strong></em> Finally, what did you think about the collaboration with Sun to make this port happen?<br />
<em><strong>Chris Huson:</strong></em> The Sun developers did a great job in porting TBB to their platform. Because they adhered to the spirit of the design of TBB, incorporating those changes was an easy job.</p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/05/09/tbb-on-sun-solaris/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Porting OpenMP SPEC benchmarks to TBB.</title>
		<link>http://softwareblogs.intel.com/2008/05/08/porting-openmp-spec-benchmarks-to-tbb/</link>
		<comments>http://softwareblogs.intel.com/2008/05/08/porting-openmp-spec-benchmarks-to-tbb/#comments</comments>
		<pubDate>Thu, 08 May 2008 15:49:29 +0000</pubDate>
		<dc:creator>Alexey Murashov (Intel)</dc:creator>
		
		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/05/08/porting-openmp-spec-benchmarks-to-tbb/</guid>
		<description><![CDATA[Greetings, everyone! I would like to share my experience in porting OpenMP applications to the Threading Building Blocks library. Last year I managed to port some of the SPEC OMP 2001 benchmarks to TBB. It's a well-known benchmark suite in the multi-threading world, which is why it was chosen. There are three benchmarks written in C: [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings, everyone! I would like to share my experience in porting OpenMP applications to the Threading Building Blocks library. Last year I managed to port some of the SPEC OMP 2001 benchmarks to TBB. It's a well-known benchmark suite in the multi-threading world, which is why it was chosen. There are three benchmarks written in C: 320.equake, 330.art and 332.ammp. TBB is a C++ library, so the other Fortran-based applications in the SPEC OMP suite are not so easily ported.</p>
<p><strong>From OpenMP to TBB - some general tips</strong></p>
<p>Basically, OpenMP applications contain pragma directives to specify parallelism. They are required to parallelize loops, provide locks and protect data. To port to TBB, we needed to match OpenMP pragmas to TBB classes and templates. Let's consider this very simple loop from the 320.equake test:</p>
<p>#pragma omp parallel for private(i)<br />
    for (i = 0; i &lt; nodes; i++) {<br />
      w2[j][i] = 0;<br />
    }</p>
<p>OpenMP executes the loop in parallel; each thread has its own instance of variable "i". The analogous TBB code is shown below - I encapsulated the loop body in a function object that is passed to the template function tbb::parallel_for. The "i" variable is used as the loop variable in the internal loop in operator().</p>
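<p>// Function object holding the loop body; j is captured as a data member<br />
class SVMPLoop1Class<br />
{<br />
    int m_j;<br />
public:<br />
    SVMPLoop1Class (int j):m_j(j) {}<br />
    // Called by TBB for each sub-range of the iteration space<br />
    void operator () ( const tbb::blocked_range&lt;int&gt;&amp; range) const{<br />
        for ( int i = range.begin(); i != range.end(); ++i )<br />
            w2[m_j][i] = 0;<br />
    }<br />
};</p>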
<p>SVMPLoop1Class svmp1loop(j);<br />
tbb::parallel_for(tbb::blocked_range&lt;int&gt;(0, nodes), svmp1loop, tbb::auto_partitioner());</p>
<p>TBB adds one more class to the code, so it looks more complicated than the original.  However, the tbb::parallel_for call itself looks similar to the original loop. Note that I used tbb::auto_partitioner here to allow TBB to choose the grain size automatically.<br />
The same scheme applies to most #pragma omp parallel for directives. If you have several variables in the private clause, just move them to local variables inside the loop function. Shared variables can be copied into data members, like the j variable in the example above. The SPEC applications make heavy use of global variables; I found that replacing them with local ones works fine in many cases.<br />
Explicit OpenMP locks can be replaced by TBB mutexes, for example tbb::spin_mutex, as in this example from 332.ammp:</p>
<p><strong>OpenMP code:</strong></p>
<p>      omp_set_lock(&amp;(a1-&gt;lock));<br />
      ...<br />
      omp_unset_lock(&amp;(a1-&gt;lock));</p>
<p><strong>TBB code:</strong></p>
<p>{<br />
      // scoped_lock acquires lock on creation<br />
      // and releases it automatically after leaving the scope<br />
      tbb::spin_mutex::scoped_lock lock (a1-&gt;spin_mutex);<br />
      ...<br />
}</p>
<p>This construction is quite simple too.</p>
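<p>To see why the scoped form is harder to get wrong than a set/unset pair, here is a minimal sketch in plain C++ that does not use TBB at all. ToyLock, ScopedGuard and clampedDouble are hypothetical names made up for this illustration, and ToyLock only records whether it is held; it provides no real mutual exclusion. The point is only that the release happens in the destructor, so an early return cannot skip it.</p>

```cpp
// Hypothetical toy lock, only to make acquire/release visible.
// It is NOT thread-safe and is not a substitute for tbb::spin_mutex.
struct ToyLock {
    bool held;
    ToyLock() : held(false) {}
    void acquire() { held = true; }
    void release() { held = false; }
};

// Guard in the style of a scoped lock: acquire in the constructor,
// release in the destructor.
class ScopedGuard {
    ToyLock& lock_;
    ScopedGuard(const ScopedGuard&);             // non-copyable
    ScopedGuard& operator=(const ScopedGuard&);  // non-assignable
public:
    explicit ScopedGuard(ToyLock& l) : lock_(l) { lock_.acquire(); }
    ~ScopedGuard() { lock_.release(); }
};

// Returns 0 for negative input, otherwise twice the value.
// Both exit paths release the lock because the guard goes out of scope.
int clampedDouble(ToyLock& lock, int value) {
    ScopedGuard guard(lock);
    if (value < 0) {
        return 0;  // early exit: the guard's destructor still runs
    }
    return 2 * value;
}
```

<p>With an explicit set/unset pair, the early return above would need its own unset call; with the guard, leaving the scope is enough.</p>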
<p><strong>Resolving thread ID dependency - tbb::parallel_reduce instead of tbb::parallel_for.</strong></p>
<p>While I was able to replace most omp parallel for directives using the approach above, I found the loop in 320.equake that finds the hypocenter and epicenter to be more complicated. It separates the data between threads, finds the minimum value for each thread and after that finds the global minimum. The thread ID obtained from omp_get_thread_num() is used to protect the data from races. Using thread IDs doesn't fit well into the higher-level task-based programming model used by TBB. I found that tbb::parallel_reduce fits better here. An algorithm that finds a global minimum from per-thread minima is a good example of a reduction. Here is the example from the 320.equake benchmark (I skipped some code just to simplify things):</p>
<p><strong>OpenMP code:</strong></p>
<p>#pragma omp parallel private(my_cpu_id,d1,d2,c0)<br />
{<br />
   my_cpu_id=omp_get_thread_num();// thread ID<br />
   bigdist1[my_cpu_id]=1000000.0; // minimum per thread<br />
   temp1[my_cpu_id]=-1; // node for which the minimum was found<br />
#pragma omp for<br />
     for (i = 0; i &lt; ARCHnodes; i++) {<br />
        d1 = distance(...);<br />
        if (d1 &lt; bigdist1[my_cpu_id]) {<br />
           bigdist1[my_cpu_id] = d1;<br />
           temp1[my_cpu_id] = i;<br />
        }<br />
    }<br />
}</p>
<p>// finding out the global minimum<br />
numthreads = omp_get_max_threads();<br />
d1=bigdist1[0];<br />
Src.sourcenode=temp1[0];<br />
for (i=0;i&lt;numthreads;i++) // minimum per thread<br />
{<br />
   if (bigdist1[i] &lt; d1) {<br />
      d1=bigdist1[i];<br />
      Src.sourcenode = temp1[i];<br />
   }<br />
}</p>
<p><strong>TBB code:</strong></p>
<p>struct SearchEpicenterAndHypocenter<br />
{<br />
    double m_bigdist1;<br />
    struct source *m_Src;<br />
    int m_temp1;</p>
<p>    // Initializers are listed in member declaration order<br />
    SearchEpicenterAndHypocenter(struct source * Src): m_bigdist1(1000000.0), m_Src (Src), m_temp1(-1) {}</p>
<p>    SearchEpicenterAndHypocenter(SearchEpicenterAndHypocenter &amp;search, tbb::split)<br />
    {<br />
        m_Src = search.m_Src;<br />
        m_bigdist1 =1000000.0;<br />
        m_temp1 = -1;<br />
    }</p>
<p>    void operator () ( const tbb::blocked_range&lt;int&gt;&amp; range)<br />
    {<br />
        for (int i = range.begin(); i != range.end(); ++i)<br />
        {<br />
            // The function finds out the local minimum<br />
            SearchCycle (i, m_temp1, m_bigdist1, m_Src);<br />
        }<br />
    }</p>
<p>    void join (SearchEpicenterAndHypocenter &amp;search)<br />
    {<br />
        if (search.m_bigdist1 &lt; m_bigdist1) {<br />
            m_bigdist1 = search.m_bigdist1;<br />
            m_temp1 = search.m_temp1;<br />
        }<br />
    }<br />
};</p>
<p>SearchEpicenterAndHypocenter searchcenters (&amp;Src);<br />
tbb::parallel_reduce(tbb::blocked_range&lt;int&gt;(0, ARCHnodes), searchcenters, tbb::auto_partitioner());<br />
Src.sourcenode = searchcenters.m_temp1;</p>
<p>As usual, the TBB version creates an additional function object that is passed to the template function  tbb::parallel_reduce. Due to the ability to use any user-defined code for reduction, parallel_reduce does not need the thread ID based trick used with OpenMP. So the thread ID dependency problem was naturally resolved.</p>
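<p>The way the splitting constructor, operator() and join fit together can be sketched without TBB. The following is a serial stand-in, not the TBB scheduler: SplitTag, MinBody and reduceRange are hypothetical names for this illustration, and the range is always halved down to a fixed grain, whereas TBB decides when to split at run time. It only shows that per-body minima combined through join give the global minimum with no thread IDs involved.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::split.
struct SplitTag {};

// Body with the three pieces a reduction body needs:
// a splitting constructor, operator() over a sub-range, and join().
struct MinBody {
    const std::vector<double>* data;  // shared, read-only input
    double best;                      // smallest value seen so far
    int bestIndex;                    // index of that value, -1 if none

    explicit MinBody(const std::vector<double>* d)
        : data(d), best(1000000.0), bestIndex(-1) {}

    // Splitting constructor: share the input, reset the running minimum.
    MinBody(MinBody& other, SplitTag)
        : data(other.data), best(1000000.0), bestIndex(-1) {}

    // Accumulate over the half-open range [begin, end).
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i != end; ++i) {
            if ((*data)[i] < best) {
                best = (*data)[i];
                bestIndex = static_cast<int>(i);
            }
        }
    }

    // Fold the result of another body into this one.
    void join(const MinBody& other) {
        if (other.best < best) {
            best = other.best;
            bestIndex = other.bestIndex;
        }
    }
};

// Serial emulation: halve the range until it is at most `grain` long
// (grain must be at least 1), run each half on its own body, then join.
void reduceRange(MinBody& body, std::size_t begin, std::size_t end,
                 std::size_t grain) {
    if (end - begin <= grain) {
        body(begin, end);
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    MinBody right(body, SplitTag());  // fresh body for the right half
    reduceRange(body, begin, mid, grain);
    reduceRange(right, mid, end, grain);
    body.join(right);
}
```

<p>To use it, construct a MinBody over a vector, call reduceRange(body, 0, v.size(), 2), and read body.bestIndex afterwards. Because both comparisons are strict, ties are resolved in favor of the lowest index.</p>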
<p><strong>Resolving thread ID dependency - placing a thread ID dependent buffer inside the tbb::parallel_for and tbb::parallel_reduce </strong></p>
<p>The above approach won't resolve all thread ID dependencies. I found it doesn't work when I ported other SPEC benchmarks: 330.art and 332.ammp. Parallel loops in these benchmarks use some global buffers, which are protected from data races using thread IDs. Here is an example of such a buffer from the 330.art test:</p>
<p>f1_neuron **f1_layer; // The buffer protected by thread ID<br />
int numthreads = omp_get_max_threads();<br />
f1_layer = (f1_neuron **)malloc(numthreads * sizeof (f1_neuron*));</p>
<p>#pragma omp parallel for private (i)<br />
   for (i=0;i&lt;numthreads;i++)<br />
       f1_layer[i] = (f1_neuron*) malloc(numf1s * sizeof(f1_neuron));</p>
<p>// After that each thread works with its own part of f1_layer buffer.<br />
o = omp_get_thread_num();</p>
<p>#pragma omp for private (k,m,n, gPassFlag) schedule(dynamic)<br />
    for (ij = 0; ij &lt; ijmx; ij++)<br />
    { <br />
       j = ((ij/inum) * gStride) + gStartY;<br />
       i = ((ij%inum) * gStride) +gStartX;<br />
       k=0;<br />
       for (m=j;m&lt;(gLheight+j);m++)<br />
         for (n=i;n&lt;(gLwidth+i);n++)<br />
           f1_layer[o][k++].I[0] = cimage[m][n];<br />
       ...<br />
    }</p>
<p>This approach is very simple but tricky, and doesn't work with TBB because we have no access to thread IDs. So I had to implement another approach: such buffers can be moved inside the parallel_for and parallel_reduce function objects as members. Each task will operate with the buffer from its copy of the body object, and so thread safety won't be violated. Here is an implementation example from the 332.ammp benchmark (I skipped some code to shorten the example):</p>
<p>//this class is for reduce version of the main loop<br />
class MMFVUpdateLoopReduceClass<br />
{<br />
  ATOM **m_atomall; //This buffer was threadID dependent in OpenMP code<br />
  public:<br />
  MMFVUpdateLoopReduceClass(...)<br />
    {<br />
        //Allocating memory for the buffer in constructor<br />
        m_atomall = (ATOM**) malloc( m_natoms * sizeof(ATOM *) );<br />
        ...<br />
    }</p>
<p>    //This method is empty; no actual reduction is performed<br />
    void join (MMFVUpdateLoopReduceClass &amp;reduceloop) {}</p>
<p>    // Splitting constructor<br />
    MMFVUpdateLoopReduceClass(MMFVUpdateLoopReduceClass &amp;reduceloop, tbb::split)<br />
    {<br />
        //The buffer is allocated in splitting constructor too<br />
        m_atomall = (ATOM**) malloc( m_natoms * sizeof(ATOM *) );<br />
    }</p>
<p>    void operator () ( const tbb::blocked_range&lt;int&gt;&amp; range) const{<br />
        for ( int ii = range.begin(); ii != range.end(); ++ii )<br />
        {<br />
            // do some work using m_atomall<br />
        }<br />
    }</p>
<p>    ~MMFVUpdateLoopReduceClass()<br />
    {<br />
        free(m_atomall);<br />
    }<br />
};</p>
<p>// The parallel loop<br />
  MMFVUpdateLoopReduceClass loop(...);<br />
  tbb::parallel_reduce(tbb::blocked_range&lt;int&gt;(0, jj), loop, tbb::auto_partitioner());</p>
<p>You probably noticed that tbb::parallel_reduce doesn't do any reduction here. Why not just use parallel_for? The reason is performance. Note that we placed memory allocation in the constructor. It is a heavy operation, so if the constructor is called frequently, performance degrades. I found that TBB makes fewer constructor calls in tbb::parallel_reduce than in tbb::parallel_for, so parallel_reduce with an empty "join" method is more efficient here.</p>
<p>I was able to get a slight performance improvement with TBB over OpenMP - even though that was not the purpose of this exercise.  Finally, I hope my experience will help you to add TBB support to your applications :-).</p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/05/08/porting-openmp-spec-benchmarks-to-tbb/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Under the hood: Learning more about task scheduling</title>
		<link>http://softwareblogs.intel.com/2008/05/06/under-the-hood-learning-more-about-task-scheduling/</link>
		<comments>http://softwareblogs.intel.com/2008/05/06/under-the-hood-learning-more-about-task-scheduling/#comments</comments>
		<pubDate>Wed, 07 May 2008 00:24:25 +0000</pubDate>
		<dc:creator>Robert Reed (Intel)</dc:creator>
		
		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/05/06/under-the-hood-learning-more-about-task-scheduling/</guid>
		<description><![CDATA[I’m back with another challenge, encountered during my support work for Intel® Threading Building Blocks.  I’ve been working with several TBB users who appreciate the general philosophy of Cilk task scheduling embodied in TBB but have run into some practical challenges applying it to their applications.  Often the issue revolves around the need to block [...]]]></description>
			<content:encoded><![CDATA[<p>I’m back with another challenge, encountered during my support work for Intel® Threading Building Blocks.  I’ve been working with several TBB users who appreciate the general philosophy of <a href="http://supertech.csail.mit.edu/cilk/">Cilk</a> task scheduling embodied in TBB but have run into some practical challenges applying it to their applications.  Often the issue revolves around the need to block some computations until other computations complete.  It may be that they need to handle either inter-object or intra-object threading—their application may at different times encounter a bunch of objects to run in parallel or just one big object that could use several threads to chew on simultaneously.  Or they may have objects they need to compute upon which other objects are dependent—here it would be great to suspend dependent object processing while the weight of the associated threads is thrown to computing the shared object.</p>
<p>These are tough problems and I don’t have answers for them yet.  But I have a few ideas and I hope in the next few posts to share them and get community feedback that might inspire some solutions.  So bear with me as I stumble about and maybe we can all learn something.</p>
<p>Anyone who has had much exposure to presentations about Intel TBB has probably heard in one form or another the motto, “process locally, steal globally.”  The Cilk philosophy embodied in the TBB task scheduler is to constrain active threads to their own local region in memory to exploit any memory that may already reside in the caches of the processing element running that thread; meanwhile, idle threads should try to steal work from memory regions as yet untouched by the active thread(s) to avoid interrupting those running threads.</p>
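<p>As a rough mental model of that motto, here is a toy, single-threaded sketch of one worker's task pool. TaskPool and its method names are hypothetical, and the real TBB scheduler is concurrent and considerably more involved; the sketch only illustrates the usual work-stealing convention that the owner takes its most recently spawned task while a thief takes the oldest one.</p>

```cpp
#include <deque>
#include <string>

// Toy model of a single worker's pool of pending tasks.
// Not thread-safe; callers must check empty() before taking a task.
class TaskPool {
    std::deque<std::string> tasks_;
public:
    // The owner spawns new tasks at the back of the pool.
    void spawn(const std::string& name) { tasks_.push_back(name); }

    bool empty() const { return tasks_.empty(); }

    // "Process locally": the owner takes the newest task, whose data is
    // the most likely to still be in its cache.
    std::string takeLocal() {
        std::string t = tasks_.back();
        tasks_.pop_back();
        return t;
    }

    // "Steal globally": an idle worker takes the oldest task, which is
    // the one furthest from what the owner is currently touching.
    std::string steal() {
        std::string t = tasks_.front();
        tasks_.pop_front();
        return t;
    }
};
```

<p>After spawning A, B and C in that order, the owner would run C next, while a thief would walk away with A.</p>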
<p>When one of my customers presented an example using a nested pair of parallel_for statements, I realized that all my knowledge on the subject of scheduling was theoretical: I hadn’t dug down into the guts of this code.  Now might be the time.</p>
<p>First, here’s a sketch of the test code.  You’ll note an outer and inner loop with a lock on the outer and some spinning work on the inner.  This is intended to represent the structure of a program operating on a set of objects that block because of some postulated resource contention, and the inner loop represents the work done to compute that shared resource.  You’ll see that the lock is currently commented out.  When running this code on an 8-core machine, it will sometimes lock up.  Some ideas have been bounced around about why, but I’m curious whether I can demonstrate the workings of the problem rather than just speculating about it.  There are also explicit references to the auto_partitioner, but with the advent of the affinity_partitioner, use of the auto_partitioner has been deprecated.  Still, we’ll start with this one and then see if we can tell the difference when looking under the hood.<br />
<a href="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050501.JPG" title="b08050501.JPG"><img src="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050501.JPG" alt="b08050501.JPG" /></a></p>
<p>So what’s going on here?  As is my wont, I turned to Intel® Thread Profiler for a first look at the processing:</p>
<p><a href="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050502.JPG" title="b08050502.JPG"><img src="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050502.JPG" alt="b08050502.JPG" /></a></p>
<p>Hrumph!  I can see eight threads cranking at the work, which is good because I’m running on an 8-core processor.  32% of the run time is spent at concurrency level 8.  Most of the time is spent in that serial tail that must represent startup processing, but the whole run is under 0.2 seconds so I suspect that’s a fixed overhead that will be negligible compared to the overall time of real work.  But that’s about all.  Zooming in on the high concurrency zone:<br />
<a href="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050503.JPG" title="b08050503.JPG"><img src="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050503.JPG" alt="b08050503.JPG" /></a></p>
<p>Where the work is done, the test is keeping 8 threads busy 94% of the time. But still there’s no clue about how the task scheduler is dividing the work, though it looks like it gets interesting towards the end of the run. Guess I’ll have to revert to an old standard, inserting print statements to expose what is happening in the code:<br />
<a href="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050504.JPG" title="b08050504.JPG"><img src="http://softwareblogs.intel.com/wordpress/wp-content/uploads/2008/05/b08050504.JPG" alt="b08050504.JPG" /></a></p>
<p>Unfortunately, while I can print out the loop bounds that tell me where I am in the nested executions of the array, I can’t print which thread is handling which range.  The TBB task object that might contain that information lies outside the scope of the outer_loop class.  Stuck.</p>
<p>But not for long.  In the evolving library that is Intel Threading Building Blocks, there is a new feature, available in at least the latest open source release (upgraded to Stable as <a href="http://threadingbuildingblocks.org/ver.php?fid=104">tbb20_20080408oss</a>), called <em>task_scheduler_observer</em>.  I’ve implemented an observer that lets me identify which thread executes which range.  In my next post I will describe it and start exploring the behavior of the scheduler.  Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/05/06/under-the-hood-learning-more-about-task-scheduling/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Go Parallel - Intel Software Conferences in India</title>
		<link>http://softwareblogs.intel.com/2008/04/21/go-parallel-intel-software-conferences-in-india/</link>
		<comments>http://softwareblogs.intel.com/2008/04/21/go-parallel-intel-software-conferences-in-india/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 03:28:06 +0000</pubDate>
		<dc:creator>Preethi Raj (Intel)</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/04/21/go-parallel-intel-software-conferences-in-india/</guid>
		<description><![CDATA[We hope the Intel Software Conference - Go Parallel has been useful to you and will aid you in your pursuit to "think parallel".
Share your experiences and feedback with us. Blog and tell us what you think.
Be it your opinion on the need for threading applications, or the benefits of parallel programming or your feedback [...]]]></description>
			<content:encoded><![CDATA[<p>We hope the <em>Intel Software Conference</em> - <em>Go Parallel</em> has been useful to you and will aid you in your pursuit to "think parallel".</p>
<p>Share your experiences and feedback with us. Blog and tell us what you think.</p>
<p>Be it your opinion on the need for threading applications, or the benefits of parallel programming or your feedback on <a href="http://softwareprojects.intel.com/avx/">Intel® AVX</a> (Intel® Advanced Vector Extensions) or the contests on Intel® Software Network… Are you ready for the <a href="http://softwarecontests.intel.com/threadingchallenge/">Threading Challenge</a>? Or is the <a href="http://softwarecontests.intel.com/gamedemo/">Game Demo Contest</a> your calling?</p>
<p><strong>Blog and tell us all about it now!</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/04/21/go-parallel-intel-software-conferences-in-india/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Poll Result: More Developers Are Learning about TBB</title>
		<link>http://softwareblogs.intel.com/2008/03/31/poll-result-more-developers-are-learning-about-tbb/</link>
		<comments>http://softwareblogs.intel.com/2008/03/31/poll-result-more-developers-are-learning-about-tbb/#comments</comments>
		<pubDate>Tue, 01 Apr 2008 00:22:43 +0000</pubDate>
		<dc:creator>Kevin Farnham</dc:creator>
		
		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Open Source]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/03/31/poll-result-more-developers-are-learning-about-tbb/</guid>
		<description><![CDATA[
The March Threading Building Blocks poll suggests that the developer community is learning about TBB, but not that many developers are actively applying TBB in actual projects. The poll asked:



At what project level are you currently applying TBB?



81 people participated in the poll, making the following selections:



75.3% (61 votes) - Just getting started (learning about [...]]]></description>
			<content:encoded><![CDATA[<p>
The March <a href="http://www.ThreadingBuildingBlocks.org">Threading Building Blocks</a> poll suggests that the developer community is learning about TBB, but not that many developers are actively applying TBB in actual projects. The poll asked:
</p>
<blockquote>
<p>
At what project level are you currently applying TBB?
</p>
</blockquote>
<p>
81 people participated in the poll, making the following selections:
</p>
<blockquote>
<ul>
<li>75.3% (61 votes) - Just getting started (learning about TBB)</li>
<li>6.2% (5 votes) - Developing new software that applies TBB</li>
<li>6.2% (5 votes) - Modifying existing software to use TBB</li>
<li>4.9% (4 votes) - Designing new software that will apply TBB</li>
<li>4.9% (4 votes) - Working at multiple levels on multiple TBB projects</li>
<li>2.5% (2 votes) - Maintaining software that uses TBB</li>
</ul>
</blockquote>
<p>
While it's clear that most people who participated are investigating TBB, it's also interesting to note the breakdown for the developers who are actively using (or planning to use) TBB in actual projects. Over 11% of respondents reported that they are designing or developing new software that applies Threading Building Blocks. Almost 9% of respondents are either modifying existing software to apply TBB or maintaining software that already applies Threading Building Blocks. And another 5% of the responding developers are working at multiple levels on multiple TBB projects.
</p>
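<p>
The grouped percentages above follow directly from the raw vote counts. As a quick check, here is a small Python sketch that recomputes them. The dictionary keys and the <code>share</code> helper are names made up for this illustration; only the vote counts come from the poll.
</p>

```python
# Vote counts as reported in the March poll (81 votes in total).
# The short keys are hypothetical labels for the six poll options.
votes = {
    "learning": 61,
    "developing_new": 5,
    "modifying_existing": 5,
    "designing_new": 4,
    "multiple_levels": 4,
    "maintaining": 2,
}
total = sum(votes.values())


def share(*options):
    """Return the combined percentage of the given options, to one decimal."""
    return round(100 * sum(votes[name] for name in options) / total, 1)


print(total)                                       # 81
print(share("designing_new", "developing_new"))    # 11.1
print(share("modifying_existing", "maintaining"))  # 8.6
print(share("multiple_levels"))                    # 4.9
```

<p>
So "over 11%" is 9 of 81 votes, "almost 9%" is 7 of 81, and "another 5%" is 4 of 81.
</p>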
<p>
The poll results show that, eight months after TBB was launched as an open source project, there is a group of developers who are deploying TBB in new and existing applications. Meanwhile, there is a much larger group of developers who are interested in the Threading Building Blocks technology. When these projects that are applying TBB are completed to a degree that they can be made publicly available, they will provide a template that can be studied by other developers, as they design and develop their own projects that apply TBB for multithreading and scaling. It will be interesting to repeat this poll after some of these projects are completed and made public.
</p>
<p>
<strong>New poll: your OS for TBB development</strong>
</p>
<p>
The April <a href="http://www.threadingbuildingblocks.org">Threading Building Blocks poll</a> has been posted. This poll asks:
</p>
<blockquote>
<p>
On what Operating System(s) do you develop your TBB applications?
</p>
</blockquote>
<p>
The response options are:
</p>
<blockquote>
<ul>
<li>FreeBSD</li>
<li>Linux</li>
<li>MacOS</li>
<li>Microsoft Windows</li>
<li>Unix</li>
<li>Unix on Windows (Cygwin, MinGW, UWIN)</li>
<li>Other</li>
<li>More than one OS</li>
</ul>
</blockquote>
<p>
Even if you're not working on an actual TBB-related project yet, feel free to vote. Just select the operating system you're using for experimenting with TBB as you learn about it.
</p>
<p>
To vote, go to the <a href="http://www.threadingbuildingblocks.org">TBB home page</a> and scroll down a bit; you'll see the poll on the right side of the page.
</p>
<p><strong>Kevin Farnham, O'Reilly Media</strong> <a href="http://www.ThreadingBuildingBlocks.org">TBB Open Source Community</a>, Freenode IRC #tbb, <a href="http://sourceforge.net/mail/?group_id=200923">TBB Mailing Lists</a>
</p>
<p>
<a href="http://threadingbuildingblocks.org/download.php">Download TBB</a></p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/03/31/poll-result-more-developers-are-learning-about-tbb/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Threading Building Blocks on Wikipedia</title>
		<link>http://softwareblogs.intel.com/2008/03/28/threading-building-blocks-on-wikipedia/</link>
		<comments>http://softwareblogs.intel.com/2008/03/28/threading-building-blocks-on-wikipedia/#comments</comments>
		<pubDate>Sat, 29 Mar 2008 05:24:30 +0000</pubDate>
		<dc:creator>Kevin Farnham</dc:creator>
		
		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Open Source]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/03/28/threading-building-blocks-on-wikipedia/</guid>
		<description><![CDATA[ I just finished adding new information to the Threading Building Blocks entry on Wikipedia (http://en.wikipedia.org/wiki/Threading_Building_Blocks). I added information on what's happened since TBB became an open source project, and I also added two new sections:

Open Source Operating Systems that Offer TBB Packages
Open Source Projects that Apply TBB

I consider Wikipedia one of the greatest new [...]]]></description>
			<content:encoded><![CDATA[<p> I just finished adding new information to the <a href="http://www.ThreadingBuildingBlocks.org">Threading Building Blocks</a> entry on Wikipedia (<a href="http://en.wikipedia.org/wiki/Threading_Building_Blocks">http://en.wikipedia.org/wiki/Threading_Building_Blocks</a>). I added information on what's happened since TBB became an open source project, and I also added two new sections:</p>
<ul>
<li>Open Source Operating Systems that Offer TBB Packages</li>
<li>Open Source Projects that Apply TBB</li>
</ul>
<p>I consider Wikipedia one of the greatest new informational resources the Web has produced. It provides information to those who want to find out about something, but it also provides an open venue for anyone who wants to contribute information to the world in their area of expertise.</p>
<p>I encourage the Threading Building Blocks community to extend the <a href="http://en.wikipedia.org/wiki/Threading_Building_Blocks">TBB Wikipedia page</a> by supplementing the information about TBB itself, or adding new information about your own TBB-related projects, and about Linux and other operating systems that are making TBB available through their package manager systems.</p>
<p><strong>Kevin Farnham, O'Reilly Media</strong> <a href="http://www.ThreadingBuildingBlocks.org">TBB Open Source Community</a>, Freenode IRC #tbb, <a href="http://sourceforge.net/mail/?group_id=200923">TBB Mailing Lists</a></p>
<p><a href="http://threadingbuildingblocks.org/download.php">Download TBB</a></p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/03/28/threading-building-blocks-on-wikipedia/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Threading Building Blocks and Linux Distributions, Part 2</title>
		<link>http://softwareblogs.intel.com/2008/03/26/threading-building-blocks-and-linux-distributions-part-2/</link>
		<comments>http://softwareblogs.intel.com/2008/03/26/threading-building-blocks-and-linux-distributions-part-2/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 17:29:00 +0000</pubDate>
		<dc:creator>Kevin Farnham</dc:creator>
		
		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Open Source]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/03/26/threading-building-blocks-and-linux-distributions-part-2/</guid>
		<description><![CDATA[ In my last post I talked about the availability of Threading Building Blocks packages in Debian Linux, Ubuntu Linux, and the Fedora Project. In this post, I'll investigate TBB's availability in other Linux distributions and also in FreeBSD.
Commercial TBB supported Linux distros
The Commercial TBB site includes a System Requirements section that identifies several Linux [...]]]></description>
			<content:encoded><![CDATA[<p> In my <a href="http://softwareblogs.intel.com/2008/03/24/threading-building-blocks-and-linux-distributions-part-1/">last post</a> I talked about the availability of <a href="http://www.ThreadingBuildingBlocks.org">Threading Building Blocks</a> packages in <a href="http://www.debian.org">Debian Linux</a>, <a href="http://www.ubuntu.com/">Ubuntu Linux</a>, and the <a href="http://fedoraproject.org/">Fedora Project</a>. In this post, I'll investigate TBB's availability in other Linux distributions and also in FreeBSD.</p>
<p><strong>Commercial TBB supported Linux distros</strong></p>
<p>The <a href="http://www.threadingbuildingblocks.com">Commercial TBB</a> site includes a <a href="http://www.intel.com/cd/software/products/asmo-na/eng/threading/294797.htm#sysreq">System Requirements</a> section that identifies several Linux distributions on which TBB has been tested. These distributions are:</p>
<ul>
<li>Red Hat Enterprise Linux* 3, 4, or 5</li>
<li>Red Hat Fedora Core* 4, 5, or 6 (not supported on Itanium-based systems)</li>
<li>Asianux* 2.0</li>
<li>Red Flag DC Server* 5.0</li>
<li>Haansoft Linux* Server 2006</li>
<li>Miracle Linux v4.0</li>
<li>SuSE Linux Enterprise Server* 9 or 10</li>
<li>SGI Propack* 4.0 (supported on Itanium-based systems only)</li>
<li>SGI Propack 5.0 (not with IA-32 architecture processors)</li>
<li>Mandriva/Mandrake Linux 10.1.06 (not with Intel Itanium processors)</li>
<li>Turbolinux GreatTurbo* Enterprise Server 10 SP1 (not with Intel Itanium processors)</li>
</ul>
<p>This list tells us that commercial TBB has been installed and tested on these distributions, but it doesn't tell us which of these distributions offers or plans to offer TBB open source packages. In July 2007 (when TBB Open Source was announced), people associated with several of these distributions commented on TBB open source (see the <a href="http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30238853.aspx">"TBB's status with Operating System Vendors (OSVs)"</a> TBB forum post), suggesting that they planned to make TBB "more easily assessible" on their system (Novell/OpenSUSE), or that they are bundling TBB with their distribution (Asianux and Turbolinux). So, I did some searching to see if I could find out the current status of TBB and these distributions.</p>
<p><strong>Red Hat Enterprise Linux:</strong> no open source TBB package information found...</p>
<p><strong>Fedora 8:</strong> As I reported in my last post, the <a href="http://threadingbuildingblocks.org/ver.php?fid=84">tbb20_20070927oss</a> stable release has apparently been packaged into Fedora 8 and is available as an <a href="http://download.fedora.redhat.com/pub/fedora/linux/updates/8/SRPMS/tbb-2.0-4.20070927.fc8.src.rpm">RPM file</a> on the Fedora Download Server.</p>
<p><strong>Asianux, Red Flag, Haansoft Linux, Miracle Linux:</strong> the <a href="http://www.asianux.com/aboutAX.do">Asianux About</a> page suggests that these are all essentially the same Linux distribution, just distributed by different vendors. I wasn't able to find any references to TBB in the Asianux bug list. I created an account so I could access the technical support site, but there is a new account approval procedure that blocked my access. Searching on the web, I found a description of a purchasable <a href="http://translate.google.com/translate?hl=en&amp;sl=ko&amp;u=http://www.haansoftlinux.com/product/server/server_value.php&amp;sa=X&amp;oi=translate&amp;resnum=9&amp;ct=result&amp;prev=/search%3Fq%3Dasianux%2B%2522Threading%2BBuilding%2BBlocks%2522%26start%3D10%26hl%3Den%26client%3Diceweasel-a%26rls%3Dorg.debian:en-US:unofficial%26sa%3DN">Value Pack</a> of Intel software for Haansoft Linux. So, perhaps the way you get TBB onto Asianux and the related distributions is through purchasing and installing a commercial set of Intel software. I found no information about open source TBB packages for Asianux.</p>
<p><strong>SuSE:</strong> no open source TBB package information found...</p>
<p><strong>SGI Propack:</strong> this is a purchasable software package designed for SGI Altix computers.</p>
<p><strong>Mandriva/Mandrake:</strong> For Mandriva, I searched the forums and mailing list archives, and did general web searches; no open source TBB package information turned up.</p>
<p><strong>TBB in FreeBSD</strong></p>
<p><a href="http://www.freebsd.org">FreeBSD</a> embraced open source TBB soon after the open source project began. The latest update to the <a href="http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/tbb/">FreeBSD TBB port</a> was made on February 7, 2008. I believe this means that the <a href="http://softwareblogs.intel.com/2008/02/13/threading-building-blocks-early-2008-development-releases/">tbb20_20080207oss</a> development release of TBB is available in FreeBSD. That release is no longer listed on the <a href="http://threadingbuildingblocks.org/download.php">TBB Downloads</a> site. Because of this, I'm not sure what will happen if you try to install TBB using the FreeBSD package manager. It depends on whether the FreeBSD package relies on a link to the original tbb20_20080207oss download, and on whether that download link still exists (even though we can no longer navigate to it using a browser).</p>
<p><strong>Conclusion</strong></p>
<p>Threading Building Blocks is used by many developers on Linux and FreeBSD systems. Packaging of TBB is actively under way for <a href="http://www.debian.org">Debian Linux</a>, <a href="http://www.ubuntu.com/">Ubuntu Linux</a>, and the <a href="http://fedoraproject.org/">Fedora Project</a>; TBB is also available through the <a href="http://www.freebsd.org">FreeBSD</a> package manager.</p>
<p>It's likely that work is being done to package Threading Building Blocks for other Linux distributions; but, at the moment, there does not appear to be publicly available information about these efforts.</p>
<p>Fortunately, it's relatively easy to install TBB onto any Linux system. You can download the source and build it, or you can download one of the prebuilt Linux binary packages that are delivered with the TBB <a href="http://threadingbuildingblocks.org/file.php?fid=78">Commercial Aligned</a> releases. Once you set your environment variables (see the <em>tbbvars.*</em> files), you'll be set to go!</p>
<p><strong>Kevin Farnham, O'Reilly Media</strong> <a href="http://www.ThreadingBuildingBlocks.org">TBB Open Source Community</a>, Freenode IRC #tbb, <a href="http://sourceforge.net/mail/?group_id=200923">TBB Mailing Lists</a></p>
<p><a href="http://threadingbuildingblocks.org/download.php">Download TBB</a></p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/03/26/threading-building-blocks-and-linux-distributions-part-2/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Threading Building Blocks and Linux Distributions, Part 1</title>
		<link>http://softwareblogs.intel.com/2008/03/24/threading-building-blocks-and-linux-distributions-part-1/</link>
		<comments>http://softwareblogs.intel.com/2008/03/24/threading-building-blocks-and-linux-distributions-part-1/#comments</comments>
		<pubDate>Mon, 24 Mar 2008 18:52:29 +0000</pubDate>
		<dc:creator>Kevin Farnham</dc:creator>
		
		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Open Source]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/03/24/threading-building-blocks-and-linux-distributions-part-1/</guid>
		<description><![CDATA[ Most of the people I've conversed with who are developing applications using Threading Building Blocks are working on Linux platforms. On the #tbb IRC channel, I've talked with people who are working with TBB on Gentoo, Debian, and Ubuntu Linux, and I'm sure several other distributions are represented as well.
Threading Building Blocks can be [...]]]></description>
			<content:encoded><![CDATA[<p> Most of the people I've conversed with who are developing applications using <a href="http://www.ThreadingBuildingBlocks.org">Threading Building Blocks</a> are working on Linux platforms. On the #tbb IRC channel, I've talked with people who are working with TBB on <a href="http://www.gentoo.org/">Gentoo</a>, <a href="http://www.debian.org">Debian</a>, and <a href="http://www.ubuntu.com/">Ubuntu</a> Linux, and I'm sure several other distributions are represented as well.</p>
<p>Threading Building Blocks can be installed on any Linux system by downloading the source and directly building TBB. I did it this way for my Gentoo system (see my <a href="http://softwareblogs.intel.com/2007/08/02/threading-building-blocks-amd-athlon-64-x2-and-gentoo-linux/">"Threading Building Blocks, AMD Athlon 64 X2, and Gentoo Linux"</a> post). But in the eight months since TBB became an open source project, a significant amount of work has also been done by developers in packaging TBB for specific Linux distributions, enabling installation of TBB using each distribution's package manager.</p>
<p>In this series of posts I'll provide an overview of the status of TBB on various Linux distributions. As far as I know, TBB will be included in several upcoming Linux releases, but is not yet available as a package in current stable Linux releases. Intel thoroughly tested TBB on several Linux distributions before TBB became an open source project, so these distributions would seem to be excellent candidates for packaging open source TBB. I'll investigate all of these as well, over the course of these blogs.</p>
<p><strong>Ubuntu and Debian</strong></p>
<p>As I reported in January, Threading Building Blocks is being <a href="http://softwareblogs.intel.com/2008/01/11/threading-building-blocks-packaged-into-ubuntu-hardy-heron/">packaged into Ubuntu Hardy Heron</a>, which is expected to be released in April (the beta can be downloaded now). This came about through a <a href="https://bugs.launchpad.net/ubuntu/+bug/181137">request</a> submitted to the Ubuntu bug list by Sadiq Jaffer, a regular visitor to the #tbb IRC channel. Though the period for syncing new packages from Debian had expired, the Ubuntu team made an exception for TBB. So, TBB will be <a href="http://packages.ubuntu.com/hardy/libdevel/libtbb-dev">available to Ubuntu Hardy Heron users</a> using the Ubuntu package manager.</p>
<p>TBB's inclusion in Ubuntu Hardy Heron was, of course, possible only because of the efforts of people at <a href="http://www.athenacr.com/">Athena Capital Research</a>, who packaged TBB into <a href="http://packages.debian.org/source/sid/tbb">Debian Sid</a>. For Debian users who are working with the current stable Etch version, Athena's Roberto Sanchez has created an unofficial set of <a href="http://people.connexer.com/~roberto/debian/">Etch TBB packages</a>. I installed TBB onto my Debian Etch machine using Debian's APT package manager and Roberto's Etch TBB packages, and described that process in my <a href="http://softwareblogs.intel.com/2008/01/04/threading-building-blocks-debian-linux-packages/">"Threading Building Blocks Debian Linux Packages"</a> post.</p>
<p><strong>Fedora</strong></p>
<p>The status of Threading Building Blocks on Fedora is a bit confusing. The Fedora Package Database site indicates that TBB has been <a href="https://admin.fedoraproject.org/pkgdb/packages/name/tbb">approved for inclusion in Fedora 8</a>, but the package has not yet been implemented. The <a href="http://blog.fedoramd.org/">FedoraMD.org blog</a> documented <a href="http://blog.fedoramd.org/2008/02/13/tbb-20-420070927fc8src/">TBB's approval</a> for Fedora 8 on February 13, 2008.</p>
<p>But further searching located a downloadable <a href="http://download.fedora.redhat.com/pub/fedora/linux/updates/8/SRPMS/tbb-2.0-4.20070927.fc8.src.rpm">tbb-2.0-4.20070927.fc8.src.rpm</a> file on the Fedora Download Server. This package is based on the <a href="http://threadingbuildingblocks.org/ver.php?fid=84">tbb20_20070927oss</a> release, which is now classified as a stable release. You can see an overview of the features that were new in this release in my <a href="http://softwareblogs.intel.com/2007/12/07/threading-building-blocks-open-source-release-versions-matrix/">"Threading Building Blocks Open Source Release Versions Matrix"</a> post. The 20070927 release also includes changes that were initially released in the earlier 20070815 and 20070719 TBB releases.</p>
<p><strong>Conclusion</strong></p>
<p>Debian, Ubuntu, and Fedora are three of the most widely used Linux distributions. In all three cases, TBB is (or will soon be) available using the respective distribution package managers.</p>
<p>In my next post I'll look into TBB availability on some other popular Linux distributions, and I'll also look at TBB on FreeBSD.</p>
<p><strong>Kevin Farnham, O'Reilly Media</strong> <a href="http://www.ThreadingBuildingBlocks.org">TBB Open Source Community</a>, Freenode IRC #tbb, <a href="http://sourceforge.net/mail/?group_id=200923">TBB Mailing Lists</a></p>
<p><a href="http://threadingbuildingblocks.org/download.php">Download TBB</a></p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/03/24/threading-building-blocks-and-linux-distributions-part-1/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Superlinearity and Algorithmic Complexity; or, My Interesting Conversation with Herb Sutter</title>
		<link>http://softwareblogs.intel.com/2008/03/19/superlinearity-and-algorithmic-complexity-or-my-interesting-conversation-with-herb-sutter/</link>
		<comments>http://softwareblogs.intel.com/2008/03/19/superlinearity-and-algorithmic-complexity-or-my-interesting-conversation-with-herb-sutter/#comments</comments>
		<pubDate>Wed, 19 Mar 2008 16:55:51 +0000</pubDate>
		<dc:creator>Kevin Farnham</dc:creator>
		
		<category><![CDATA[Multicore]]></category>

		<category><![CDATA[Open Source]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://softwareblogs.intel.com/2008/03/19/superlinearity-and-algorithmic-complexity-or-my-interesting-conversation-with-herb-sutter/</guid>
		<description><![CDATA[ In my recent "Superlinearity Is Impossible; We Just Don't Always Think Correctly" I argued that algorithmic processing superlinearity is impossible. It might appear that a parallel application was achieving superlinearity, but that appearance was due to factors other than the algorithm itself engendering a more efficient computation. For example, memory access might be more [...]]]></description>
			<content:encoded><![CDATA[<p> In my recent <a href="http://softwareblogs.intel.com/2008/03/17/superlinearity-is-impossible-we-just-dont-always-think-correctly/">"Superlinearity Is Impossible; We Just Don't Always Think Correctly"</a> I argued that algorithmic processing superlinearity is impossible. It might appear that a parallel application was achieving superlinearity, but that appearance was due to factors other than the algorithm itself engendering a more efficient computation. For example, memory access might be more efficient in the parallel algorithm. But this would not mean the limits defined by <a href="http://en.wikipedia.org/wiki/Amdahl's_law">Amdahl's Law</a> had been surpassed. I agreed with the statement Herb Sutter makes in his <a href="http://www.ddj.com/cpp/206100542">"Going Superlinear"</a> article in the February <a href="http://www.ddj.com">Dr. Dobb's Journal</a>:</p>
<blockquote><p> "But wait," someone could complain, "your example so far is unfair because you've stacked the deck. The truth is that, when you find a superlinear speedup, what you've really found is an inefficiency in the sequential algorithm."</p></blockquote>
<p><strong>When parallelism is simpler</strong></p>
<p>I had the good fortune to be able to speak with Herb Sutter today about these issues. We found ourselves in agreement in most areas. We agree, for example, that memory issues can make an enormous performance difference when you compare an algorithm run in parallel with a serial algorithm that retraces the data processing of the parallel algorithm in an element-by-element manner.</p>
<p>But even when I asked Herb to ignore memory issues (which is where I believe the greatest advantages will lie for some parallel algorithms), he remained quite adamant that superlinearity is possible. How so? Well, Herb mentioned things like algorithm complexity, and the known or unknown characteristics of the data. As I pondered this, Herb asked that I take another look at the last section of his "Going Superlinear" article. Here we find (among other things) a comparison between a simple parallel algorithm and the serial algorithm that retraces the parallel algorithm's data processing element-by-element. The serial algorithm, in this case, is:</p>
<blockquote><p> more complex. It has to do more bookkeeping than a simple linear traversal. This additional work can be a small additional source of performance overhead ...</p></blockquote>
<p>This is indeed correct. The "bread-slicing" example in my <a href="http://softwareblogs.intel.com/2008/03/17/superlinearity-is-impossible-we-just-dont-always-think-correctly/">"Superlinearity Is Impossible"</a> post would involve slightly more complicated programming than the simple incremented loop that could be easily parallelized using something like <a href="http://www.ThreadingBuildingBlocks.org">TBB</a>.</p>
<p>Here's Herb's conclusion about the serial algorithm that mimics the parallel processing and the parallel algorithm:</p>
<blockquote><p> when we're comparing the proposed algorithm with simple parallel search, we're not really comparing apples with apples. We are comparing:</p>
<ul>
<li>A complex sequential algorithm that has been designed to optimize for certain expected data distributions, and</li>
<li>A simple parallel algorithm that doesn't make assumptions about distributions, works well for a wide range of distributions, and naturally takes advantage of the special ones the optimized one is trying to exploit ...</li>
</ul>
</blockquote>
<p><strong>So, is superlinearity possible?</strong></p>
<p>Have I been proven wrong? I don't think so. When we talk about "superlinear" speedups, I think it all comes down to a matter of definition: what do we each mean by "superlinear performance"? The comments posted to my "Superlinearity Is Impossible" post revealed that different people think about this quite differently.</p>
<p>My background is physics, and mathematical modeling and simulation. I'm accustomed to thinking in terms of an ideal world that doesn't really exist when I think of equations (a world full of infinitely-long wires, which can be approximated as having an infinitesimal thickness, etc.). I view <a href="http://en.wikipedia.org/wiki/Amdahl's_law">Amdahl's Law</a> as existing in this type of realm. Hence, I asked Herb to pretend memory access is instantaneous as we discussed whether or not superlinearity is possible.</p>
<p>In this "purist" theoretical way of looking at superlinearity and Amdahl's Law, I still consider superlinearity impossible. I view Amdahl's Law as akin to the "law" of gravity. Just because I see an airplane go up into the air doesn't mean the law of gravity has been broken. Other factors were involved.</p>
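<p>For reference, Amdahl's Law in its usual textbook form says that if a fraction <em>p</em> of a program's work can be spread evenly over <em>n</em> workers, the best possible speedup is 1 / ((1 - <em>p</em>) + <em>p</em> / <em>n</em>). Here is a minimal Python sketch of that formula; the function and parameter names are chosen for illustration, and the 90% figure is just an example value.</p>

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup predicted by Amdahl's Law.

    parallel_fraction: share of the work (0.0 to 1.0) that can run in parallel.
    workers: number of processors the parallel share is divided among.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# A program whose work is 90% parallelizable, on 2, 4, and 8 cores.
for workers in (2, 4, 8):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
# 2 1.82
# 4 3.08
# 8 4.71

# Even a perfectly parallel program only reaches the core count.
print(amdahl_speedup(1.0, 4))  # 4.0
```

<p>In this idealized model the speedup on <em>n</em> workers never exceeds <em>n</em>, which is the sense in which I call superlinearity impossible.</p>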
<p>The fact that duplication of the parallel algorithm's element-by-element processing can require a more complex serial algorithm means you really aren't comparing apples with apples, as Herb says. Nor can you; this situation seems inescapable.</p>
<p>Indeed, I ran several actual tests tonight on my quad-core system, writing some very simple programs where I eliminated advantages due to memory cache as much as possible. I found many cases where the more complex serial algorithm was slower than the simple algorithm that would have been parallelized; but surprisingly, sometimes my compiler's optimization actually made the more complex serial algorithm run faster than its simpler cousin! If I turned the optimizer off, then the more complex algorithm always took more time to complete.</p>
<p>So, this is a complex realm. Looking at it all through a practical (i.e., non-purist, not rigidly theoretical) lens, I certainly agree that there are software engineering techniques which, when applied to software that is intended for a parallel environment, will produce programs that will complete their tasks in an amount of time that implies superlinear performance when compared with an equivalent serial algorithm. I do not, however, believe that this means that the computational limits represented by Amdahl's Law are being violated.</p>
<p><strong>The implications for programming education</strong></p>
<p>All this complexity implies that parallel programming might be best considered as being its own unique realm when it comes to performance optimization techniques. Even discounting the standard problems involving thread-safety, race conditions, deadlocks, etc., optimized parallel programming is a very different creature from optimized serial programming. To consider parallel programming as being a "next natural stage" that is somehow in sync with the serial programming most developers already understand is quite possibly a mistake, in varying ways. Software performance optimization within the parallel realm can be <em>very</em> different, as Herb Sutter and others are showing.</p>
<p>So, once again: is superlinear performance through parallelization possible? Can Amdahl's Law be broken? You tell me...</p>
<p><strong>Kevin Farnham, O'Reilly Media</strong> <a href="http://www.ThreadingBuildingBlocks.org">TBB Open Source Community</a>, Freenode IRC #tbb, <a href="http://sourceforge.net/mail/?group_id=200923">TBB Mailing Lists</a></p>
<p><a href="http://threadingbuildingblocks.org/download.php">Download TBB</a></p>
]]></content:encoded>
			<wfw:commentRss>http://softwareblogs.intel.com/2008/03/19/superlinearity-and-algorithmic-complexity-or-my-interesting-conversation-with-herb-sutter/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
