With the recent Opteron updates, the third-generation Bulldozer core "Steamroller", and the Kaveri APUs that pair it with a GCN GPU, AMD has released a lot of new hardware. But what about theoretical performance? AnandTech recently published peak floating-point figures for Intel's Haswell and Ivy Bridge processors and for AMD's Kaveri, Llano, and Trinity. The comparison covers floating-point throughput per clock cycle for the different instruction-set flavors, as well as peak floating-point computing power, with the aim of seeing which chip offers more compute.

One caveat: exact frequencies are hard to pin down, because today's CPUs and GPUs boost their clocks dynamically.
The GPU frequency is boosted too, and the clock actually reached depends on the workload, including multi-threaded load. In addition, the FP64 (double-precision) rate of the new AMD Kaveri APU is 1/16 of its FP32 (single-precision) rate, as is usual for mainstream GCN graphics.
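
To make the clock dependence and the 1/16 ratio concrete, here is a minimal sketch of how a theoretical GPU peak is usually estimated. The function name, the ALU count, and the clock values are illustrative assumptions rather than figures from the AnandTech data, and the sketch assumes each ALU retires one fused multiply-add (2 FLOPs) per cycle.

```python
def gpu_peak_gflops(alus: int, clock_ghz: float, fp64_ratio: float = 1 / 16):
    """Return (fp32, fp64) theoretical peak in GFLOPS.

    Assumes every ALU completes one fused multiply-add (2 FLOPs) per cycle.
    fp64_ratio is the FP64:FP32 rate, 1/16 for the mainstream GCN case above.
    """
    fp32 = alus * 2 * clock_ghz
    fp64 = fp32 * fp64_ratio
    return fp32, fp64


# Hypothetical inputs: 512 ALUs at two possible boost clocks.
for clock in (0.60, 0.72):
    fp32, fp64 = gpu_peak_gflops(alus=512, clock_ghz=clock)
    print(f"{clock:.2f} GHz: FP32 {fp32:.1f} GFLOPS, FP64 {fp64:.1f} GFLOPS")
```

Because the peak scales linearly with the clock, any uncertainty in the boost frequency carries straight through to the quoted GFLOPS.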

CPU peak floating-point performance depends on which SIMD instruction set the code is compiled for. Three are considered here: SSE, AVX, and AVX with FMA (FMA3/FMA4).
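
As a rough sketch of why the instruction set matters, the per-core peak can be modeled as SIMD lanes times vector pipes times 2 when FMA is available. The helper names and the two-pipe, four-core, 3.0 GHz configuration below are hypothetical; real cores differ in how many vector pipes they have and in whether those pipes are shared between cores.

```python
def flops_per_cycle(vector_bits: int, element_bits: int = 32,
                    pipes: int = 1, fma: bool = False) -> int:
    """Peak FLOPs one core can retire per clock cycle.

    vector_bits: SIMD register width (128 for SSE, 256 for AVX).
    pipes:       vector operations issued per cycle (an assumption per chip).
    fma:         a fused multiply-add counts as 2 FLOPs per lane.
    """
    lanes = vector_bits // element_bits
    return lanes * pipes * (2 if fma else 1)


def cpu_peak_gflops(cores: int, clock_ghz: float, **simd) -> float:
    """Whole-chip theoretical peak in GFLOPS."""
    return cores * clock_ghz * flops_per_cycle(**simd)


# Hypothetical 4-core, 3.0 GHz chip with two vector pipes per core.
configs = {
    "SSE": dict(vector_bits=128, pipes=2),
    "AVX": dict(vector_bits=256, pipes=2),
    "AVX+FMA": dict(vector_bits=256, pipes=2, fma=True),
}
for name, simd in configs.items():
    print(f"{name}: {cpu_peak_gflops(4, 3.0, **simd):.0f} GFLOPS FP32")
```

Under these assumptions each step from SSE to AVX to AVX with FMA doubles the theoretical peak, which is why code compiled only for SSE leaves most of a newer core's throughput unused.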

Intel and AMD CPU Peak Floating-Point Performance

Thanks to its architecture, Intel leads without any doubt; Haswell does best with code optimized for the AVX and FMA instruction sets. AMD comes second: Trinity and Kaveri, both built on the Bulldozer architecture, share their floating-point units between cores, so their SSE performance is actually worse than that of Llano with its outdated K10 architecture.

Next comes GPU peak floating-point performance. Haswell GT2 and GT3e are two different integrated GPUs; the GT3e adds 128 MB of embedded DRAM, which also acts as an L4 cache.

Intel and AMD GPU Peak Floating-Point Performance

Kaveri looks good overall, although in Direct3D FP64 it does not lead the way Haswell does. The idea behind Kaveri's unique HSA architecture is that applications use the CPU and GPU as one unified compute resource. Its raw performance is not at the top, but the GPU contributes roughly 110 GFLOPS, so general-purpose computation can be accelerated as well. In heterogeneous FP32 applications in particular it can go beyond Haswell GT2 and Ivy Bridge, though that is still not enough for high-performance applications.