Under Bulldozer's hood

AMD's new microarchitecture is designed to provide the perfect balance between performance, toll and ability consumption for multithreaded applications. Information technology focuses on high frequencies and resource sharing to achieve optimal throughput. Every bit mentioned previously, the AMD FX processors offer up to viii power-efficient cores. These represent the first generation of a new execution-core family (15h) from AMD.

The Bulldozer concept is based on a 2-core design that shares latency-tolerant functionality, smoothes bursty/inefficient usage and provides dynamic resource allocation between threads. Each core has its own 16KB L1 cache with a 1MB L2 cache, while the L3 cache is shared. The other units are at present effectively shared between 2 cores and include: Fetch, Decode, Floating-point pipelines, and the L2 cache.

This design allows ii Cores to use a larger, higher-performance part unit (ex: floating-signal unit) as they need it with less total dice area than having separate, smaller function units for each Core. It also ways that there shouldn't exist Bulldozer-based CPUs with an uneven number of cores like the Phenom X3 series.

The Zambezi Bulldozer-based processors have a dice size of 315mm², which is smaller than the Phenom Ii x6'south 346mm² die, while it'south bigger than the Phenom II X4'south 258mm² die. The 6-core "Gulftown" Intel Core i7 processors are also smaller at 240mm2, and the complex Sandy Bridge chips such every bit the i7-2600K are 216 mm².

A large 32nm die means a lot of resistors and AMD tells us that the Zambezi compages has roughly ii billion of them. That's pretty incredible given the Intel Cadre i7-990X Gulftown (32nm) features 1.17 billion while the Core i7-2600K has just 995 million. The older Phenom Ii X6 processors have 904 1000000 and the Phenom II X4 fries just 758 million. Those numbers help convey simply how complex these Bulldozer CPUs really are.

The floating-indicate unit has too undergone a complete redesign. Information technology has been improved to support many new instructions and it at present allows resource sharing between cores. In that location are two 128-bit FMACs shared per module, allowing for two 128-bit instructions per core or 1 256-chip instruction per dual-core module.

AMD has also designed a shared front end-end which is responsible for driving the processing pipeline and will ensure that the cores are constantly fed with information. It has been designed to work with each dual-core unit and classify threads to individual cores themselves. AMD has made heavy changes that include decoupled predict and fetch pipelines equally well equally prediction-directed instruction prefetchers.

A Prediction Queue can manage direct and indirect branches that are now fed with a L1 and L2 Branch Target Buffer, which stores destination addresses. The Bulldozer modules can decode up to four instructions per bicycle, which is one more than the Phenom Two processors. The prediction pipeline produces a sequence of fetch addresses. The Fetch pipeline performs a look upwardly in the instruction cache and pulls 32 bytes per bicycle into the fetch queue to feed the decoders.

AMD has also built new instructions into the Bulldozer architecture. While AMD and Intel share SSE3, SSE4.ane/four.two, AES, and AVX, at that place are ii new didactics sets chosen FMA4 and XOP that are at present unique to AMD. The former is designed for HPC applications while the latter is used for numeric and multimedia applications too as algorithms used for sound and radio.

Unlike Sandy Bridge, which features an on-die GPU with the System Agent (aka northbridge), AMD has taken a more traditional approach with the Bulldozer architecture. The visitor is avoiding an IGP (Integrated Graphics Platform) all together with AM3+, leaving that functionality for its 32nm Llano processors, which feature a speedy Radeon core.

The northbridge is too divide from the processor. Fifty-fifty though AMD claims to include an integrated northbridge, it's really just a retentiveness controller. In fact, AMD pioneered this technology dorsum in the Athlon64 days. Bulldozer's northbridge features 2 72-flake wide DDR3 memory channels and iv 16-fleck receive/16-fleck transmit HyperTransport links.