Parallel FX & Threading for newbies (like me.)

By Kevin Pirkl (Intel) (19 posts) on January 9, 2008 at 2:04 pm

This is my first real foray into writing threaded applications (I often use worker threads in ASP.Net applications for small tasks), and a peek at how much extra effort threading adds for managed code (.Net) apps. First let me throw out that I consider new technologies kind of a pain in the butt and prefer spoon feeding. I like what works where code is concerned and accept change as a hard truth. I hope the details, links and notes help you in your own endeavors, even if this post taken as a whole is only a BRICK. Like I said, I am not an expert, only a newbie with threading, so enjoy it or toss it. :)

Here are some quick thoughts: "YATTL (Yet Another Technology To Learn)". Clay Breshears' post "This is like deja vu all over again" and Rick Strahl's "What's ailing ASP.Net Web Forms" sum up quite well my two thoughts on new technologies, and what I would gripe about myself if they hadn't already mentioned it. Many of us would like to start writing code that makes use of threading and multi-core technologies, but we also want it to be as easy as using an iPod menu system (jk). Laura Hess wrote an article titled "Slaves To Technology" which offers simple insights into why technology becomes oppressive instead of unburdening. To me this means that having to learn a new framework, developer tool version, etc. every year or so turns the average developer into the beast of burden, as opposed to the other way around. Technology should make life easier, not harder or more complicated, and that's all I have to say on the subject.

Learning new technologies and tools is not an easy process and typically requires a small paradigm shift with each new iteration. Modern times have given us two very big shifts for developers: the first being OOP, which I read about at age 13 (approximately 1978), and the second being Parallel Programming. Parallel Programming takes some serious brain work to wrap your head around, but once you get it the going gets much better.

The Parallel Programming paradigm shift is happening today and won't be ignored. Some good reading: Matt Gillespie's "Transitioning Software to Future Generations of Multi-Core" and Herb Sutter's Dr. Dobb's Journal article "The Free Lunch Is Over - A Fundamental Turn Toward Concurrency in Software", which talks all about the switch from faster processors to multi-core architectures. Parallel Programming represents a problem for the generalist software developer in terms of language availability. Many popular languages already have the basic underpinnings for wiring into multi-core, but many do not, and may never expose them in a way that is visible to the developer.

So why the headache, and why am I complaining so much when I plan to use the technology anyway? Well, let's work our way through an application and see what we find.

Here is some sample code that I converted to make use of TPL, and the blowback from the changes. I found an old managed code WinForm Migration Utility application (MU for short) that we used to convert XML documents into HTML data for a new content delivery system we created last year. MU is a C# WinForm application that I will convert to use a few features of the Task Parallel Library (TPL). For this conversion I used the Community Technology Preview (CTP) for .Net Framework 3.5, Visual C# 2008 Express Edition, and the sample code in the MSDN article "Optimize Managed Code For Multi-Core Machines" as a guide. Note that it took me three install attempts on different computers to get everything working. The code segments I'll show here parse XML files in a disk folder and download all the images to disk. Note that I did not write this code; I only modified it to make use of TPL and other asynchronous approaches to better see TPL in action. This was perhaps not the best example, and I think something like the Quick Sort problem on the Intel Threading Challenge page would have made a better example for comparison shoppers. In fact, with TPL you might even consider taking on one of the next challenges and winning $100 (see contest rules for details).  <--  Shameless plug *kp

My base computer for testing is a Dual Core MacBook Pro running Windows XP at boot.

Here is the MU application: what it did, the changes I will make, and my thoughts along the way.

So take some 2027 files in a folder

File List

And process each one in a sequential fashion

File Processing Loop
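As a rough illustration of the sequential loop described above, here is a minimal sketch. It is not the MU source: `ProcessFile`, `ProcessFolder` and the `*.xml` search pattern are assumptions made for the example, and the per-file work is just a stand-in.

```csharp
using System;
using System.IO;

public static class SequentialLoopSketch
{
    // Hypothetical stand-in for MU's per-file work (parse, then fetch images).
    public static int ProcessFile(string path)
    {
        return File.ReadAllText(path).Length;
    }

    // Visit every matching file one after another on the calling thread.
    public static int ProcessFolder(string folder)
    {
        int processed = 0;
        foreach (string path in Directory.GetFiles(folder, "*.xml"))
        {
            ProcessFile(path);
            processed++;
        }
        return processed;
    }

    public static void Main(string[] args)
    {
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        Console.WriteLine("Processed {0} files", ProcessFolder(folder));
    }
}
```

Because everything runs on the calling thread, a loop like this started from a button click keeps the UI thread busy until the last file is done.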

First parsing the HTML contents

Content Processing
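The parsing step pulls image addresses out of the markup. A minimal sketch of one way to do that, assuming a regular expression is good enough for the content at hand (it is only an approximation of real HTML parsing, and `ExtractImageUrls` is a hypothetical name, not something from MU):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageSourceSketch
{
    // Captures the value of a quoted src attribute inside an <img ...> tag.
    private static readonly Regex ImgSrc = new Regex(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the src values in the order they appear in the markup.
    public static List<string> ExtractImageUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in ImgSrc.Matches(html))
        {
            urls.Add(match.Groups[1].Value);
        }
        return urls;
    }

    public static void Main()
    {
        string html = "<p><img src=\"http://example.com/a.png\" alt=\"a\">"
                    + "<IMG alt='b' src='b.gif'></p>";
        foreach (string url in ExtractImageUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```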

And then downloading the images extracted from the HTML source attributes using a synchronous download.

Image Processing

Now let's start changing this application to make use of TPL.

Parallel FX TPL Loop
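A parallel version of the file loop can be sketched as follows. This is an assumption-laden example rather than the MU code: it uses `Parallel.ForEach` from the `System.Threading.Tasks` namespace as found in later .NET releases (the CTP used for this post may expose the type under a different namespace), and `ProcessFile` is again a hypothetical stand-in.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLoopSketch
{
    // Hypothetical stand-in for the per-file work.
    public static int ProcessFile(string path)
    {
        return File.ReadAllText(path).Length;
    }

    public static int ProcessFolder(string folder)
    {
        int processed = 0;
        string[] files = Directory.GetFiles(folder, "*.xml");

        // The library decides how to spread the iterations across threads.
        Parallel.ForEach(files, path =>
        {
            ProcessFile(path);
            // Several iterations may finish at once, so count atomically.
            Interlocked.Increment(ref processed);
        });

        return processed;
    }

    public static void Main(string[] args)
    {
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        Console.WriteLine("Processed {0} files", ProcessFolder(folder));
    }
}
```

The loop body is a lambda expression, and the call still blocks until every iteration has finished, so on its own it does not make the UI responsive.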

One item that becomes very apparent on running this application is that the System.Net.WebClient call to DownloadFile(...) is synchronous, which is easily changed. Before that change the main processing loop took 10 minutes before the UI became responsive again; with the change it comes back in under 10 seconds. The problem now is that with a responsive UI I need to add code to disable buttons and an indicator to show how many downloads remain (I'll skip that for now). A download-complete callback handler will suit our purposes for the time being.

WebClient.DownloadFileAsync() Downloading

So we have to wrap the call in a callback handler for the completion event and also implement a counter to track all the calls, so we know when the application has finished downloading files. I still needed a polling mechanism to check when downloading was complete.
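The handler-plus-counter idea can be sketched like this. It is a minimal example under stated assumptions, not the MU code: `DownloadTracker`, `NoteStarted` and `NoteFinished` are hypothetical names, while `WebClient.DownloadFileAsync` and the `DownloadFileCompleted` event are the real framework members.

```csharp
using System;
using System.ComponentModel;
using System.Net;
using System.Threading;

public class DownloadTracker
{
    private int pending;   // downloads started but not yet completed

    // Atomic read of the counter.
    public int Pending
    {
        get { return Interlocked.CompareExchange(ref pending, 0, 0); }
    }

    public void NoteStarted() { Interlocked.Increment(ref pending); }
    public void NoteFinished() { Interlocked.Decrement(ref pending); }

    public void Start(Uri address, string fileName)
    {
        // One client per download: a WebClient runs one operation at a time.
        var client = new WebClient();
        client.DownloadFileCompleted += OnDownloadCompleted;
        NoteStarted();
        try
        {
            client.DownloadFileAsync(address, fileName);
        }
        catch
        {
            // The download never started, so undo the count.
            NoteFinished();
            client.Dispose();
            throw;
        }
    }

    private void OnDownloadCompleted(object sender, AsyncCompletedEventArgs e)
    {
        if (e.Error != null)
        {
            Console.WriteLine("Download failed: " + e.Error.Message);
        }
        ((WebClient)sender).Dispose();
        NoteFinished();
    }

    public static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: <image-url> <target-file>");
            return;
        }
        var tracker = new DownloadTracker();
        tracker.Start(new Uri(args[0]), args[1]);
        while (tracker.Pending > 0) Thread.Sleep(200);   // simple polling
        Console.WriteLine("Downloads complete");
    }
}
```

The counter is updated with `Interlocked` because completion handlers can run on different threads from the code that starts the downloads.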

Regarding the asynchronous downloading, I tried to get better performance by setting some connection configuration options in app.config, but the enhancement was negligible: the images came from hosts all over the place, so call blocking was not a real problem. Setting up, say, four connections per IP address instead of the normal two gave small gains, but 8 or 16 did not make much of a difference.
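The connection limit can also be set from code instead of app.config. A minimal sketch, assuming the classic .NET Framework networking stack that WebClient sits on; `Apply` is a hypothetical helper, the value 4 mirrors the experiment above, and newer runtimes may deprecate or ignore this setting:

```csharp
using System;
using System.Net;

public static class ConnectionLimitSketch
{
    // Sets the default number of simultaneous connections allowed per host.
    public static void Apply(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
    }

    public static void Main()
    {
        Console.WriteLine("Limit before: {0}", ServicePointManager.DefaultConnectionLimit);
        Apply(4);
        Console.WriteLine("Limit after: {0}", ServicePointManager.DefaultConnectionLimit);
    }
}
```

The setting needs to be in place before the first request to a host is made, since connections that already exist keep the limit they were created with.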

One other thing to note is the responsiveness of the UI. With the linear processing the UI just locks up, and control does not come back until the application has finished processing all the files in the directory. I read this little article by jfoscoding called "Keeping your UI Responsive and the Dangers of Application.DoEvents" (http://blogs.msdn.com/jfoscoding/archive/2005/08/06/448560.aspx) and attempted an Application.DoEvents loop, but looking at the CPU utilization it pushed up to a constant burn rate of over 50%.

Don't do this... (except perhaps in a WinForm-bound timer event).

Application.DoEvents() Loop

I tried a better approach using a thread timer callback that checked a counter and informed the user when the process was complete. Later I changed the timer to check back only every couple of seconds (initially I had set it to every tenth of a second, which still pushed the CPU utilization up).

Threaded Wait Timer
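A timer that checks a counter every couple of seconds could look roughly like this. It is a sketch only: `WaitUntilZero` is a hypothetical helper, the work is simulated with thread pool items, and in a WinForm the callback would notify the user rather than release a waiting thread.

```csharp
using System;
using System.Threading;

public static class CompletionPollSketch
{
    // Polls readCounter on a System.Threading.Timer until it returns zero.
    public static void WaitUntilZero(Func<int> readCounter, TimeSpan interval)
    {
        using (var done = new ManualResetEvent(false))
        {
            Timer poll = null;
            poll = new Timer(_ =>
            {
                if (readCounter() == 0)
                {
                    done.Set();
                }
                else
                {
                    // One-shot timer re-armed by hand, so callbacks never overlap.
                    poll.Change(interval, Timeout.InfiniteTimeSpan);
                }
            });
            poll.Change(interval, Timeout.InfiniteTimeSpan);
            done.WaitOne();
            poll.Dispose();
        }
    }

    public static void Main()
    {
        int remaining = 3;   // pretend three downloads are outstanding
        for (int i = 0; i < 3; i++)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Thread.Sleep(500);
                Interlocked.Decrement(ref remaining);
            });
        }

        WaitUntilZero(
            () => Interlocked.CompareExchange(ref remaining, 0, 0),
            TimeSpan.FromSeconds(2));
        Console.WriteLine("All work complete");
    }
}
```

A longer interval means fewer wake-ups and less CPU use, at the cost of noticing completion a little later.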

While debugging I came across an error message caused by multiple threads attempting to update the WinForm's list object. There is an overriding setting to ignore these errors, but then your nice linear listing gets out of order. Some suggestions are to buffer the text and write it all at once when the file processing is completed. In the end I just commented out all writes to the list object, but this line of code took care of some annoying non-thread-safe write error messages:

  CheckForIllegalCrossThreadCalls = false; 
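The buffering suggestion can be sketched without touching the UI from worker threads at all. In this minimal example (hypothetical names, not the MU code; `Parallel.For` is taken from `System.Threading.Tasks` as in later .NET releases) each iteration writes only to its own slot of an array, so the messages come out in file order no matter which thread finishes first:

```csharp
using System;
using System.Threading.Tasks;

public static class OrderedLogSketch
{
    // Builds one log line per file; slot i belongs to iteration i only.
    public static string[] ProcessAll(string[] fileNames)
    {
        var lines = new string[fileNames.Length];
        Parallel.For(0, fileNames.Length, i =>
        {
            lines[i] = "Processed " + fileNames[i];
        });
        return lines;
    }

    public static void Main()
    {
        string[] lines = ProcessAll(new[] { "a.xml", "b.xml", "c.xml" });

        // Back on the calling thread: write the whole buffer in one go.
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

In a WinForm the final loop would be replaced by a single update of the list control on the UI thread, which avoids the cross-thread error without switching the check off.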

I decided to cut out the image processing code just to see some loop performance numbers. Taking a peek at the Task Manager statistics, it appears that WebClient's asynchronous callbacks are spawned onto multiple threads anyway, so I disabled the image downloads (now asynchronous) to better check performance.

The average serial run, processing all the files in the folder without downloading images:

Serial Loop timing

Using TPL, the average run processing all the files in the folder without downloading images:

TPL Loop Average

Conclusion:

Sometimes the application crashes on exit, so you would need to add some checks for object existence before trying to use those objects, which means more potential error handling.

Well, digging through the net and the articles above, I've read that multiple cores do not mean double the speed, though in some cases you get close to it. It perhaps kind of stinks that I did not do more, but then again my only purpose was to learn a little more about TPL, and I think I've managed that.

The biggest gain here is the responsiveness of the UI, which came just from using WebClient with asynchronous callbacks; TPL in this case only made things a couple of seconds faster. Perhaps it was a bad choice of application for this test, but then again a good learning experience.

Some of my worries and lack of understanding about this new way surround thread safety. Using static variables (or not) for counters and thread timers kind of scares me, as I do not understand the underpinnings in .Net of how these work, or what's really safe and what's not when writing threaded code. What I did like was the ease of use of the TPL For loops and the new lambda expressions (which you can read about in Scott Guthrie's blog entry "New Orcas Language Feature: Lambda Expressions").
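On the counter worry specifically, a small experiment helps show what is and is not safe. This is a sketch with hypothetical names, using `Parallel.For` from `System.Threading.Tasks` as found in later .NET releases: a plain `count++` from many threads is a read-modify-write that can lose updates, while `Interlocked.Increment` performs the same step atomically.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterSafetySketch
{
    // Not thread safe: two threads can read the same old value and
    // both write back old + 1, losing one of the increments.
    public static int CountUnsafely(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, i => { count++; });
        return count;
    }

    // Thread safe: each increment is a single atomic operation.
    public static int CountSafely(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref count); });
        return count;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        Console.WriteLine("Plain ++    : {0}", CountUnsafely(iterations));
        Console.WriteLine("Interlocked : {0}", CountSafely(iterations));
    }
}
```

The interlocked total always equals the iteration count; the plain total may come up short on a multi-core machine, and by a different amount on each run.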

Even with the changes I made, the application still crashes from time to time because some screen objects go out of scope when exiting via the close box. All in all it was not worth the effort to make this particular application "Parallel", and it was a sort of failure, but a good experience.

I like to think that the lie of "Ontogeny recapitulates phylogeny", as applied to technology, might yield a better organization and core usage for Parallel Programming and a common set of base constructs found across programming languages. Even though it's a lie, it's one I would still like to believe in: that the TBB/TPL/etc. planners could get together and organize to provide multi-core aware object constructs that act like communities, making things simple once again.

Supporting notes:

Loosely taken from the thoughts of David Charbonneau's article "Hectic lifestyle catching up with Canadians health"
Clay Breshears "Eight Simple Rules for Designing Threaded Applications" provides some basic thinking guidance on good approaches when designing or writing threaded applications.

Communities like Intel's "Multi-Core Developer Community" and Microsoft's "Parallel Computing Developer Center" both provide a good start for developers who want to begin writing code for Parallel Computing, in both managed and unmanaged environments.

Other resources:

Scott Hanselman's - Hanselminutes Podcast 84 - Concurrency Programming with .NET Parallel Framework Extensions

Intel's Software College - "Introduction to Parallel Programming" Courses

Intel Software Network Home

Parallel Extensions to the .NET FX CTP

Welcome to Intel Threading Building Blocks.org

Microsoft's Parallel FX Blog

Categories: Multicore, Software Engineering

Comments (2)

By Intel® Software Network Blogs » Top 5 ISN Blogs in January on January 29th, 2008 at 1:21 pm
[...] "Parallel FX and Threading for newbies (like me)" [...]

By Rick (Vectorpedia) on February 1st, 2008 at 1:03 pm
Thanks for an informative article written for Newbies like me

