Thursday, March 26, 2009

Breaking News - Intel vs AMD

Intel overtakes its rival
I just got my hands on a very... let's say surprising benchmark of the latest Xeon X7460 Nehalem processor. This CPU is in the same line as the Core i7 but sits on the server side, so we mean business here, and big business. I will summarize my analysis as follows, even if it might offend AMD fans (just face it, it is called reality):

Pick yours : Less than half the price or more than five times faster.

Indeed! Intel regains first place in the PassMark benchmark with an outstanding score of 25 881 (as of 24 March 2009), which is 1.9 times higher than the fastest AMD system available on the market, which scores only 13 600. What is really special here is that where AMD needs 8 quad-core processors to achieve this score, Intel only needs 3. Long way to go, AMD.

Update (25 March 2009)
A bug seems to have appeared in the benchmarks: the score for the Intel X7460 was replaced by an AMD Sempron 1100 LE, which should be in the Mid-End list.

Update (26 March 2009)
The mark has been silently removed from the web site.

Update (1 April 2009)
Apparently, the web site was updated: all systems with 2+ processors were removed from the list and moved to a separate list named "multi-cpu system". The X7460 is still not back on the list, but the X7350 (in a quad-processor system) scores 16 715, which is at least 3 000 better than the fastest AMD with half the number of CPUs.

Sunday, March 22, 2009

Intel Core i7, Part 3 : Power management

Despite what the title of this post might suggest, I will not talk about the electrical power required to use the processor, nor about its power management features, since there is not much new there apart from the Turbo Boost technology. Instead, I'll talk about calculation power: what makes those CPUs as fast as they are. You know... processing power management.

Super powers
Yes, those new processors are really fast, but what about their super powers? You didn't know, did you? Intel made a deal with some superheroes to provide the new Core i7 with super powers like:
  • Turbo Boost Technology - A real-time, dynamic, per-core overclocking technology.
  • Wide Dynamic Execution - A technology that can dispatch up to 4 instructions per clock cycle per processing unit.
  • Hyper-Threading - Another technology that lets the system feed up to twice as many instructions into the processor.
  • Integrated memory controller - Helps push data really fast into the processor.
  • Advanced Smart Cache - Cuts to a quarter the time it takes for just about every memory access.

Power at work
Those technologies give a big speed boost and make the processor way more powerful than before. Let's start with the first one, Turbo Boost. It analyzes, in real time, which cores are idle and which are overloaded. It then cuts power to those that don't need it and reroutes it to the ones that are struggling, overclocking them by up to 30% of their original speed. This technology gives a big performance boost to single-threaded applications that do not benefit from the presence of additional cores.

The second one, Wide Dynamic Execution, is a technology that peeks at your processor's activity using techniques like data flow analysis, speculative execution, out-of-order execution and superscalar execution to run up to 33% more instructions per clock cycle on each core than before. On older Intel architectures and on the AMD Phenom architecture, the processor could only handle 3 instructions in the same clock cycle. This technology pushes that to four and helps get more data processed at a faster rate in the CPU.

Finally, Hyper-Threading, which was on the older Pentium 4 HT processors, is coming back with some enhancements. It can now double the amount of data to process per clock cycle in one core. For those who don't know or remember how it works, we can say that it tries to fill the blank, and thus unused, quarters of a CPU clock cycle with work from another thread.
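For the curious, here is a small back-of-the-envelope sketch of where the 33% and 30% figures come from. It only looks at theoretical peaks, and the 2.66 GHz base clock is just an illustrative value I picked for the example; real-world gains depend on the workload.

```python
# Rough peak-throughput arithmetic for the figures quoted above.
# BASE_CLOCK_GHZ is an assumed example value, not a measured one.

def peak_instructions_per_second(issue_width: int, clock_ghz: float) -> float:
    """Theoretical peak: instructions per cycle times cycles per second."""
    return issue_width * clock_ghz * 1e9


def relative_gain(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100.0


BASE_CLOCK_GHZ = 2.66   # assumption: example base clock
TURBO_HEADROOM = 0.30   # "up to 30%" from the text

# Going from 3 to 4 instructions per cycle at the same clock.
old_peak = peak_instructions_per_second(3, BASE_CLOCK_GHZ)
new_peak = peak_instructions_per_second(4, BASE_CLOCK_GHZ)
width_gain = relative_gain(new_peak, old_peak)

# Best-case clock of a single boosted core.
turbo_clock_ghz = BASE_CLOCK_GHZ * (1 + TURBO_HEADROOM)

print(f"Issue width 3 -> 4: +{width_gain:.1f}% peak instructions per cycle")
print(f"Boosted core: up to {turbo_clock_ghz:.2f} GHz")
```

Both numbers are upper bounds: a core only reaches them when the code has enough independent instructions to fill every slot and the chip has enough thermal headroom to boost.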

Power to fill
While all those technologies bring super processing power to the CPU, they also make it harder to keep fed with data. That's where the new integrated memory controller and the Advanced Smart Cache technology come into play. The first one helps fetch data into the processor up to three times faster than on previous processors by using a faster connection to memory and three channels instead of two. The second one brings data closer to every core of every processor, nearly dividing by four the latency needed to check whether a core is working on the same data as another. Those little things contribute to the 45% performance improvement over the older Penryn generation.
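To get a feel for what the third memory channel buys on its own, here is a minimal sketch of the peak bandwidth arithmetic. It assumes the standard 64-bit (8-byte) DDR3 channel width and uses DDR3-1066 as the example speed; the function name is mine, and the "up to three times" figure above also includes the faster connection, which this sketch does not model.

```python
# Peak theoretical memory bandwidth for a given number of channels.
# Assumption: each DDR3 channel is 64 bits (8 bytes) wide.

BYTES_PER_TRANSFER = 8


def memory_bandwidth_gb_s(mega_transfers_per_s: float, channels: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


dual_channel = memory_bandwidth_gb_s(1066, 2)
triple_channel = memory_bandwidth_gb_s(1066, 3)

print(f"Dual channel DDR3-1066:   {dual_channel:.1f} GB/s")
print(f"Triple channel DDR3-1066: {triple_channel:.1f} GB/s")
print(f"Gain from the third channel alone: x{triple_channel / dual_channel:.1f}")
```

These are peak numbers; sustained bandwidth in a real machine is lower.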

Sunday, March 15, 2009

Intel Core i7, Part 2 : QPI

Northbridge, Southbridge, Socket, FSB and ... QPI?
In the late 90's, AMD was making top notch CPUs and, in some cases, they were up to two times faster than the Pentium II and Pentium III processors. As the years went by, AMD went down when Intel announced the Pentium 4 with breakthrough technologies like Hyper-Threading and SSE2 support. We are now in 2003. Intel's business is going very well and they are dominating the market. AMD was falling, so they needed a solution or they were going to die. That's when they introduced two new technologies to consumers: the 64-bit architecture and HyperTransport. The first one should have been enough to put AMD back in the race but, even today, only 1/8 of our computers use a 64-bit operating system, so that wasn't the solution. HyperTransport, on the other hand, did a very good job of reviving them. This technology aimed at replacing the FSB that had been in our computers since the Intel 8088, released in 1979. The motive is simple: "A 25-year-old technology does not have its place in a 2003 computer."

HyperTransport is an ultra-fast link between the CPU and the rest of the computer. AMD's solution was also to move the memory controller from the northbridge into the processor. While they didn't completely remove the FSB, this initiative was bold 6 years ago. When the Athlon 64 3200+ got onto the market, they had managed to produce a CPU that was on par with the Intel 3.2 GHz Pentium 4 but was half the price and used a 2 GHz clock instead of a 3.2 GHz one. Those little details made computers more power efficient and reduced the amount of emitted heat.

A huge breakthrough indeed, and AMD went even further in 2005 when they released the first dual-core desktop CPU on the market. While Intel users needed to pay for a dual-socket machine, which cost a lot, AMD users could get better performance for half or even a quarter of the price.

Now Intel was in trouble. They were in the same position that AMD had been in two years before. They rushed to provide a 64-bit architecture, dual-core and even quad-core processors, but something... yes... something was missing. Even if Intel's quad-core processors were better than any of AMD's in 2006, they wanted to show their superiority. They broke through every technology wall by creating 45 nm processors, but still... something was missing.

In the last 4 months, Intel started to produce the Core i7, the fastest processors on the planet, and what makes them so fast is the QPI: Intel's five-years-late response to AMD's HyperTransport. Instead of simply moving the memory controller into the CPU, Intel completely removed the now 30-year-old FSB and replaced it with a brand new technology: the QuickPath Interconnect.

Not just an FSB replacement
What makes the QPI so special is that it's not just a very effective FSB replacement. It is a very fast (Quick) way (Path) to connect internal computer components (Interconnect). Who said anything about CPU, northbridge or memory here? It can connect anything that needs a lot of bandwidth. For now, the baseline Core i7 only has one QPI link, used to make a direct connection between the CPU and the X58 chipset, but high-end Xeon processors have two of them.

Those processors sit on dual-socket motherboards, and the northbridge was always a massive bottleneck in this kind of architecture. They needed snoop-filter caches and a very complicated routing system to ensure that multi-threaded applications worked correctly. Worse, those problems even existed in quad-core systems, since they used the same routing as dual-socket systems.

With the QPI, there is no need to go through the FSB to the chipset and back over the FSB to the second processor just to know whether it is working on the data the first one needs. The processors now have a direct inter-processor link for those operations, which lowers cost through simplified routing and makes dual-socket systems as fast as multi-core systems.

Specs time!
AMD's current implementation of HyperTransport uses revision 3.0, at up to 2.6 GHz, with a 16-bit link in every processor, thus providing 20.8 GB/s of bandwidth. Intel, on the other side, is using its first revision of the QPI, with 16-bit links at 3.2 GHz. This little difference is enough to make a big jump in performance and provides 25.6 GB/s (23% more bandwidth). It basically doubles the bandwidth of a traditional 1600 MHz FSB like the one used in the previous Intel X48 chipset.
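If you want to check those figures yourself, here is a rough sketch of the arithmetic. It assumes both links transfer data on both clock edges (two transfers per clock) and that the quoted totals add up both directions of the link; the FSB is modelled as a 64-bit bus at 1600 MT/s. The function name is mine.

```python
# Peak link bandwidth for HyperTransport 3.0, QPI and a 1600 MT/s FSB.
# Assumptions: double data rate links, totals counted over both directions,
# and a 64-bit (8-byte) wide front-side bus.

def link_bandwidth_gb_s(clock_ghz: float, width_bits: int,
                        transfers_per_clock: int = 2,
                        directions: int = 2) -> float:
    """Peak bandwidth in GB/s for a point-to-point link."""
    bytes_per_transfer = width_bits / 8
    return clock_ghz * transfers_per_clock * bytes_per_transfer * directions


ht3 = link_bandwidth_gb_s(2.6, 16)   # HyperTransport 3.0 at 2.6 GHz
qpi = link_bandwidth_gb_s(3.2, 16)   # QPI at 3.2 GHz (6.4 GT/s)
fsb = 1600e6 * 8 / 1e9               # 1600 MT/s on a 64-bit shared bus

print(f"HyperTransport 3.0: {ht3:.1f} GB/s")
print(f"QPI:                {qpi:.1f} GB/s")
print(f"FSB 1600:           {fsb:.1f} GB/s")
print(f"QPI vs HT 3.0: +{(qpi / ht3 - 1) * 100:.0f}%")
print(f"QPI vs FSB:    x{qpi / fsb:.1f}")
```

The same function with a 2.4 GHz clock gives the figure for the 4.8 GT/s QPI links found on the non-Extreme chips.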

Part 2, Conclusion
With the arrival of this new point-to-point link technology, Intel is pushing our computers to extreme performance for a very good price and enabling a new era of HPC, server, workstation and desktop class computers. In the end, by using the new QuickPath technology, we are not only using faster buses but also using more of them and using them more wisely. It definitely marks the end of a 30-year-old era.

Tuesday, March 10, 2009

Intel Core i7, Part 1 : First look

After a lot of thinking about what might be the best subject for the first real review, I decided to take suggestions from people I know, and they asked: "Talk about the new i7!". But this is a hell of a subject, with lots of things and data to put on the blackboard. So I decided that the best way to cover it is to write a series of blog posts. Fortunately, you don't have to overclock your computer to read those reviews, even if they describe, in every little detail, the spectacular and brand new Intel Core i7 processor series.

Overview
Intel made a strange choice this time by releasing low-end processors before the high-end ones. Or did they? Those CPUs are, in reality, the high-end ones, and they are impressively cheap! The thing is that this time, Intel put the emphasis on desktop chips instead of the higher-end and pricier Xeons. You have to take a look at the socket to realize that those are monsters, and not small ones. They use the same socket as their Xeon counterparts and they are nearly identical to the Bloomfield batch of Xeon processors. Even more: they have the same price! Only the processor model and release date are different. Officially, there are only 5 desktop class chips and 13 server class ones. Some of them are not even on the market yet and some of them are just the same. This post will concentrate on the desktop class processors since they might be of interest to more people. I will give more details about the Xeons and how they compare to the Core i7 in a later part.

The series
The Core i7 brand consists of 3 available CPUs and 2 that will not hit the stores before Q2 2009. The 920, 940 and 950¹ are the standard processors that just about every OEM will put in their PCs. The 965 and 975¹ are part of the Extreme series and will be reserved for high-end computers. They all use the new LGA-1366 socket, which provides extended functionality and more bandwidth than its older brother, the LGA-775.

Some numbers
General specs :
  • Quad-Core design
  • 64-bit, 45 nm architecture
  • 8 MiB L3 shared cache
  • A TDP of 130 watts
  • Support for Intel Hyper-Threading Technology
  • Support for SSE 4.2 instruction set
  • Support for Intel Turbo Boost Technology
  • Support for Intel Virtualization Technology
Intel Core i7 920 :
  • 2.66 GHz
  • 1x 4.8 GT/s QPI
  • 3-way DDR3 1066 MHz memory support
Intel Core i7 940 :
  • 2.93 GHz
  • 1x 4.8 GT/s QPI
  • 3-way DDR3 1066 MHz memory support
Intel Core i7 965 :
  • 3.2 GHz
  • 1x 6.4 GT/s QPI
  • 3-way DDR3 1600 MHz memory support

Nice! Now, what does that mean?
Basically, it means that the only difference between the processors in the series is clock speed, except for the Extreme one, which has a faster QPI and a better memory bus to push more data into the processor. They all have the exact same core with the same stepping, caches and technologies.

What differentiates them is quality. When Intel designs a processor, they design the high end first. After some tests on those, they can find a way to produce lower quality chips for a lower price. Here, lower quality doesn't mean that they will break sooner. It means that the 920 didn't pass the heat, stability and power usage tests to run at 2.93 GHz like the 940. That's the primary reason why overclocking is a bad idea with low-end CPUs. They will get very hot faster and might not be as stable as a higher-end processor.

Benchmarks
All the benchmarks that I post on hardware that I do not own are publicly available on the internet, and I will try to give links to them, but most of the time they come from PassMark Software.
CPU Mark (11th March 2009 average)
Model no.   Absolute Mark   Relative Mark   Absolute Price²   Relative Price
i7-920      5 410           + 0%            $ 284             + 0%
i7-940      6 071           + 12.22%        $ 562             + 97.89%
i7-965      6 562           + 21.29%        $ 999             + 251.76%
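If you want to redo the math at home, here is a small sketch that recomputes the relative figures from the absolute marks and release prices, taking the i7-920 as the baseline. The marks-per-dollar figure is my own addition for illustration and is not a PassMark metric.

```python
# Relative mark, relative price and marks per dollar versus the i7-920.
# "Marks per dollar" is an illustrative value-for-money figure, not an official one.

chips = {
    "i7-920": (5410, 284),   # (CPU Mark, release price in USD)
    "i7-940": (6071, 562),
    "i7-965": (6562, 999),
}

base_mark, base_price = chips["i7-920"]

rows = {}
for model, (mark, price) in chips.items():
    mark_gain = (mark / base_mark - 1) * 100     # % more performance
    price_gain = (price / base_price - 1) * 100  # % more expensive
    marks_per_dollar = mark / price
    rows[model] = (mark_gain, price_gain, marks_per_dollar)
    print(f"{model}: +{mark_gain:.2f}% mark, +{price_gain:.2f}% price, "
          f"{marks_per_dollar:.1f} marks per dollar")
```

Measured this way, every dollar spent on the i7-920 buys roughly three times as much benchmark score as a dollar spent on the i7-965.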

For those of you who have a good computer and who use benchmarking software like 3DMark or PCMark, those scores might seem very low. Don't worry, those are in fact very good. For those of you whose laziness is taking over and who don't want to check the original web site, here are the scores of some processors for comparison (11th of March 2009 average):
  1. [Quad CPU] Dual-Core AMD Opteron 8218 - 5 082
  2. [Dual CPU] Intel Xeon E5335 - 5 103
  3. [Dual CPU] Intel Xeon X5272 - 5 879
  4. [Dual CPU] Intel Xeon X5355 - 6 315
  5. [Dual CPU] Quad-Core AMD Opteron 2380 - 6 531
As you might have noticed, there are no AMD Phenom II X4 or Intel Core 2 Quad processors in this list. Those are all server class, multi-socket, quad-core and dual-core processors.

Part 1 : Conclusion
Those are not little toys that you can carry around like your credit card, and you might need more than one credit card if you choose to get a system with the i7-965. They are monsters and deserve to be respected as such. The Core i7 line is a very powerful and impressive series of CPUs which will blow away any kind of computer you might have in the same category ($2 000 and less).

On the other side, don't expect that much from the i7-940 or i7-965 if price is a prime concern for you. They cost up to 3.5 times as much as the i7-920 but offer only 21.3% more performance. And since the other components needed to handle those beasts cost a LOT, you will destroy your budget in a flash. Instead, you should consider getting an older and cheaper workstation class computer. I never thought I'd say that, but a computer like the 2008 Mac Pro, which has two Penryn Xeon processors, is, of course, faster but costs less than a single-socket i7-965 machine.

In the end, the i7-920 is without a doubt the best processor you can get today in the $250-350 price range. Do not let yourself be deceived by the fact that it is the slowest of its line. This one will still deliver awesome performance in everyday computing, games, video and photo editing, 3D modeling and digitally assisted music creation.

¹ Release date: Q2 2009
² Official release price