Making access to memory faster in OpenSolaris (and Core2)

By David Stewart (Intel) (82 posts) on April 4, 2008 at 3:55 pm

We have been working so hard over the last year or more on implementing new Intel technologies in OpenSolaris and supporting new processors and platforms that it's good to stop for a moment and consider how we are speeding up the product you have in your hands today (or really *ought* to have your hands on). Here I am referring to the current Core2 processor architecture, which you can get in Centrino and Centrino Pro processor-based laptops, Xeon servers, vPro and ViiV desktops, etc.

Our colleagues at Intel who work with customers to optimize their applications often suggest that you can get a good speedup of C or C++ code very easily if you replace the standard libc functions with versions which are optimized for newer hardware. But why not just improve the standard libc functions so everyone benefits?

So we set about doing just that. We started with memcpy(), memmove() and memset(), routines which are so basic that programmers will use them without thinking about it, assuming that they will be simple and fast.

Here I have to give credit where credit is due: We started with code from Intel engineer Pat Fay, who has been working on improving these functions using Intel's Streaming SIMD Extensions (SSE). Then Bob Kasten, an engineer on the Intel OpenSolaris team, started extending and optimizing this code extensively for Core2 and OpenSolaris.

And I am pleased to tell you, the results are very impressive. Here is an example of improvements to memcpy() for larger data sets which are not aligned on a 16-byte boundary.

[Graph: memcpy() performance for larger data sets not aligned on a 16-byte boundary]

As you can see, this test case compares SSE2 instructions with SSE3 instructions, and in this case we get a nice boost from SSE3, which is available in Core 2. [1]

Of course, all of this improvement would be immaterial if it broke existing code or hurt compatibility. For this, we're grateful to our OpenSolaris sponsor, Bill Holler, who worked through all of the issues and ensured that the code got integrated into OpenSolaris. Thanks, Bill!

So the cool thing is, without making any changes to your code, or without even recompiling or relinking, you can get an automatic performance boost, just by running the latest version of OpenSolaris! That's because libc is a shared object which gets loaded at run time, so existing binaries which call the libc versions of memset(), memmove() and memcpy() will take advantage of this speedup. For example, we saw some benchmark video compression code speed up over 10% with these new functions. That's on top of what you already get from running the newest Intel Core 2 processor.

I'm delighted to tell you that this new code should appear in Nevada b87, which comes out in a few weeks. [2] Good work, Pat, Bob and Bill!

We have more of these low-level optimizations in the works. I'll say more when they get closer to reality. And of course we're looking at even more new instructions which are coming out in our next-generation processor family, and optimizing OpenSolaris to run faster on them as well.

[1] This particular case was a result from an early version of memcpy() using LibMicro 0.4.0; Baseline: OpenSolaris Nevada Build 70; Processor: Intel® Core™ 2 Duo CPU E6850 @ 3 GHz.
[2] Since this is of course an open source contribution, you can see exactly how the code works, and even improve on it.

Categories: Open Source

Comments (4)

By Planet OpenSolaris on April 4th, 2008 at 5:48 pm
Links from Technorati: Making access to memory faster in OpenSolaris (and Core2)

By artem on April 4th, 2008 at 8:19 pm
Especially nice to see contributions from multiple people come together like this. And libc? You can't get more fundamental than that :) Keep it coming!

By UX-admin on April 5th, 2008 at 2:22 am
This is great work you guys are doing on boosting OpenSolaris performance on Intel processors!
It makes me glad I bought an Intel-based system, and gives me comfort to know that Solaris is able to get an extra performance boost because Intel is supporting me, the end user, as well as reaffirming my platform choice, with this work.

By blueslugs.com on April 5th, 2008 at 3:22 am
Links from Technorati: Intel® Software Network Blogs » Making access to memory faster in OpenSolaris (and Core2)

