Microprocessor Report Archive

Here's an index of Tom's articles in Microprocessor Report. All articles are online in HTML and PDF formats for paid subscribers. (A few articles have free links.) Microprocessor Report articles are also available in print issues. For more information, visit the

Startup's Cell Design Promises Greater Density, Lower Power

You'd think a market dominated for decades by two entrenched leaders would discourage newcomers, but programmable logic continues to intrigue inventors and investors alike.


The latest attempt to breach the FPGA market ruled by Xilinx and Intel is coming from Efinix (pronounced F-N-X), a Silicon Valley startup that recently announced its first samples. The company has patented a new field-programmable logic cell that's up to 4x denser than those in conventional FPGAs, in turn reducing power consumption and cost. Founded in 2012, Efinix has raised $16 million from several investors, including Xilinx, a potential acquirer. Other investors include Samsung and several Chinese investment funds.

In early October, the startup received the first samples from Chinese foundry SMIC, which fabricated the chips in its 40nm low-leakage process. Efinix says the initial chips are functional and will begin general sampling next quarter. Volume production could start as early as 3Q18 if a customer places a large order. [December 18, 2017] • Figure 1: XLR cells versus conventional FPGA logic cells. • Figure 2: Dual-function XLR cells.

• Figure 3: XLR direct-drive routing.

New BG5CT Media Processor Recently Acquired From Marvell

Synaptics has unveiled a new SoC for the rapidly evolving set-top-box (STB) market. With its VideoSmart BG5CT, the company is repositioning itself as a chip vendor for STBs that support open platforms, such as Android TV and the open-source Reference Design Kit (RDK). These platforms cater to viewers of streaming-video services as well as traditional cable customers. Specifically, the BG5CT aims at service-operator STBs that integrate over-the-top (OTT) video with traditional pay-TV service. Because it supports transport-stream processing, the chip also targets hybrid boxes that combine OTT video with conventional broadcast TV.

Lacking a cable modem, the BG5CT isn't ideal for traditional cable-TV boxes that require DOCSIS compatibility. But a growing number of 'cable cutters' are canceling their traditional cable service in favor of OTT alternatives such as Netflix, Amazon Prime Video, and Hulu.

Almost 50% of these viewers employ smart TVs to access these services. The rest use STBs; the leading products are Roku, Google Chromecast, Amazon Fire TV, and Apple TV. Cable and satellite operators are supporting these services in their new STBs as well. [December 4, 2017] • Figure 1: Synaptics VideoSmart BG5CT block diagram.


• Table 1: Comparison of three SoCs for streaming-video set-top boxes: Synaptics VideoSmart BG5CT, NXP i.MX8M Quad, and Broadcom BCM2837.

Qualcomm's New Server Processors Challenge the x86 Establishment

Qualcomm has its head in the clouds, but in a good way. Early benchmarking indicates its new Centriq server processors deliver excellent scale-out performance for cloud applications and data centers. Although the company's ARMv8-compatible CPUs can't match the per-core throughput of the best x86 CPUs, they rank high in throughput per thread, per watt, per dollar, and per square millimeter of silicon. These metrics translate into bargain prices for competitive performance and power consumption, and a strong debut for a newcomer to the nearly impregnable server-processor market. The Centriq 2400 family (code-named Amberwing) initially comprises three models based on the same die: the 48-core 2460, the 46-core 2452, and the 40-core 2434. Clock frequencies hover in a narrow range around 2.2GHz (base), and the parts are all about 120W TDP.

List pricing, however, varies from $1,995 to a surprisingly low $888—posing a credible challenge to Intel and AMD in view of Centriq's competitive SPEC scores. [November 20, 2017] • Figure 1: SPEC CPU2006 per gigahertz, Centriq versus x86. • Figure 2: Normalized comparison of midrange server processors. • Table 1: Qualcomm Centriq 2400 family. • Table 2: Comparison of high-end server processors. • Table 3: Comparison of midrange server processors.
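To put the Centriq list prices in per-core terms, here is a minimal sketch. It assumes the $1,995 price belongs to the 48-core Centriq 2460 and the $888 price to the 40-core 2434; the abstract states only the price range, so that pairing is an assumption, and the function and dictionary names are illustrative.

```python
# Core counts come from the abstract above. Pairing each price with a
# model is an ASSUMPTION: the text gives only the range $1,995 to $888.
CENTRIQ_ENDPOINTS = {
    "Centriq 2460": {"cores": 48, "list_price_usd": 1995},
    "Centriq 2434": {"cores": 40, "list_price_usd": 888},
}


def price_per_core(cores: int, list_price_usd: float) -> float:
    """Return the list price divided by the core count, in US dollars."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return list_price_usd / cores


if __name__ == "__main__":
    for model, spec in CENTRIQ_ENDPOINTS.items():
        usd = price_per_core(spec["cores"], spec["list_price_usd"])
        print(f"{model}: ${usd:.2f} per core")
```

Under that assumed pairing, the lower-priced part costs roughly half as much per core as the top model, which is one way to read the "per dollar" ranking mentioned in the abstract.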

New 16-CPU Processor Has 100GbE and Integrated Ethernet Switch

NXP is chasing high-end networking with its newest QorIQ processor, the LX2160A. Sporting 16 ARM Cortex-A72 cores, 100 Gigabit Ethernet, a 16-port Layer 2 switch, and faster acceleration for cryptography and data compression, it will be the company's largest and fastest multicore embedded processor when it begins production (in mid-2019, by our estimate). Announced at the recent Linley Processor Conference, the LX2160A is the most ambitious QorIQ design since the 12-core T4240, which began production five years ago.

Although the LX2160A surpasses the T4240's core count and performance, the older product still offers more threads thanks to its dual-threaded Power e6500 CPUs. Nevertheless, the LX2160A has twice as many ARM CPUs as any existing QorIQ and ranks among the largest ARM-based embedded chips announced to date. It's also NXP's first chip built with FinFETs. [October 16, 2017] • Figure 1: Block diagram of NXP QorIQ LX2160A. • Figure 2: Two common examples of LX2160A switching. • Table 1: NXP's QorIQ LX2160A and derivatives. • Table 2: Comparison of three embedded processors for networking: NXP's QorIQ LX2160A, Cavium's Octeon TX CN8360, and Intel's Xeon D-1548.

ARMv8 Server Chip Has 12MB L2, 60MB L3, PCIe, SATA, Ethernet

Qualcomm disclosed more details about its Centriq 2400 server processor at the recent Linley Processor Conference, confirming it's one of the most powerful ARMv8 designs yet. The 48-core chip, which has been sampling for nearly a year, resembles Cavium's future 54-core ThunderX2 in many respects but falls short of the best x86 server chips. Even so, it's an impressive initial effort by a new server-processor vendor to target cloud-service providers. The Centriq 2400 has 48 of the new 64-bit Falkor CPUs arranged in pairs that share an L2 cache.

At the conference, Qualcomm revealed that each L2 cache is 512KB, for a total of 12MB. In addition, the L3 cache comprises twelve 5MB partitions distributed around the internal ring network that connects all the CPUs, caches, memory controllers, I/O interfaces, and other elements. Effectively, the L3 is 60MB, so the total L2/L3 is 72MB—26% more than Intel's top-end Xeon Scalable server chip. [October 9, 2017] • Figure 1: Block diagram of Qualcomm Centriq 2400.

• Table 1: Comparison of server processors: Centriq 2400, AMD Epyc, Cavium ThunderX2, and Intel Xeon Scalable.

System- and Software-Analysis Tools Exploit Multilevel Parallelism

German startup Silexica is pursuing two difficult goals: optimizing sequential code for parallel execution and finding the optimal hardware to run the software. Either pursuit alone would be challenging enough for most companies, but Silexica views them as inextricably linked. Parallelism has limited value if either the hardware or the software can't fully exploit it. Consequently, the company's SLX technology enables high-level systemwide analysis of both domains.

SLX tools are most effective at the dawn of a design project, when both the hardware and software are malleable. Tweaking the hardware design for better parallelism can yield big gains in software performance, and vice versa.

When the hardware design is already frozen or even deployed in the field, the software must adapt to it, but significant gains are still possible. [October 2, 2017] • Figure 1: System analysis using Silexica SLX tools. • Figure 2: SLX call graph. • Figure 3: SLX code highlighting. • Figure 4: SLX software-design options.

New Coolidge Processor Has Fewer Cores but Higher Performance

Even in the semiconductor industry, sometimes less is more. While other processor vendors keep striving for higher core counts, Kalray is trying to increase efficiency by moving in the opposite direction with its newest embedded designs. But then, the French company's first product was a massively parallel 256-core chip, so there's room to cut back. Kalray's third-generation processor, the MPPA3 Coolidge, will debut as two models that have 'only' 80 or 160 cores. Their proprietary 64-bit CPUs will run at higher clock speeds than those in the company's existing processors, however. The 80-core chip is targeting 1.2GHz; to better manage power, the 160-core chip slows to 900MHz.

Even that speed is faster than the fastest Kalray processors today, which operate at 600MHz. Thanks to these and other improvements, Kalray says Coolidge will deliver up to 4.6x more floating-point throughput and 9.2x more fixed-point throughput. The new products are scheduled to sample in 3Q18, and we expect production will start in mid-2019. [September 25, 2017] • Figure 1: Coolidge block diagram. • Figure 2: Coolidge's estimated performance on the GoogleNet CNN. • Table 1: Key parameters for Kalray MPPA processors.

• Table 2: Comparison of three embedded processors with 100GbE interfaces: Kalray Coolidge-80, Broadcom BCM58808H, and Mellanox BlueField.

ARMv8-Compatible CPU Boldly Discards 32-Bit Compatibility

Stretching for the semiconductor industry's highest-hanging fruit, Qualcomm's new ARMv8 Centriq processor is targeting Intel's 99% dominance of the server market. Arriving later this year, Centriq will shake Intel's tree in the hope that some of the high-margin fruit will fall into its waiting ARMs.

The new Falkor CPU is a core part of this strategy. Falkor resembles the ARM-compatible CPUs that Qualcomm formerly designed for its Snapdragon smartphone processors but adds some higher-performance features. In one major departure, it ditches 32-bit compatibility altogether in favor of software written only for the AArch64 instruction set. Centriq is designed mainly for cloud-service providers (CSPs) that need bushels of power-efficient parallelism to run numerous virtual machines for their remote clients. Sampling for almost a year, the 48-core chips are scheduled to begin production in 4Q17.

[September 18, 2017] • Figure 1: Qualcomm Falkor block diagram. • Figure 2: Falkor pipeline diagram. • Figure 3: Centriq 2400 block diagram. • Figure 4: Memory-bandwidth compression. • Figure 5: Falkor's quality-of-service optimization.

New ARMv8 Processor Targets 200Gbps Networking and NVMe-oF

High-speed network adapters and distributed flash-storage arrays are about to get a boost. Mellanox is testing the first silicon of its new 16-core BlueField processor and plans to begin general sampling in October.

Barring any last-minute problems, volume production should start in 1H18. The company has doubled the chip's original packet-throughput target to 200Gbps. BlueField combines intellectual property from three recently merged companies: Mellanox, EZchip, and Tilera. Mellanox's main contribution is the ConnectX-5 Ethernet adapter, which becomes a fully integrated subsystem in the new SoC. From EZchip, the processor inherits vital packet acceleration.

And from Tilera, it gains cryptography acceleration, a previously unreleased ARMv8 design, and experience building manycore processors using meshed tiles of programmable CPUs. [August 21, 2017] • Figure 1: Mellanox BlueField block diagram. • Figure 2: BlueField flash-array controller.

• Table 1: Key parameters for Mellanox BlueField processors. • Table 2: Comparison of 100GbE networking processors: Mellanox BlueField and Broadcom NetXtreme BCM58808H.

Intel Unveils New Xeon Processors and South-Bridge Accelerators

Bronze, Silver, Gold, Platinum: four color-coded product tiers familiar to anyone who has purchased an Obamacare health-insurance plan. They're also Intel's new tiers for Xeon Scalable processors. These chips supersede the Xeon E5v4 embedded processors that use the Broadwell-EP core. Like the insurance policies, the lower tiers (Bronze and Silver) cost less but offer fewer benefits than the higher tiers (Gold and Platinum).

The 16 new Xeon embedded processors derive from the new Skylake-SP server processors but have extended availability. Had Intel kept its usual branding, they would be Xeon E5v5 products. These chips have an improved CPU microarchitecture that executes about 5% more instructions per clock cycle than Broadwell. In addition, they exceed their Xeon E5v4 predecessors in core count, clock frequency, memory bandwidth, PCI Express lanes, multisocket connectivity, power consumption, and list price. [July 24, 2017] • Figure 1: Block diagram of Intel's Purley platform. • Figure 2: Intel's new key-protection technology.

• Figure 3: Intel's new product nomenclature for Xeon Scalable processors. • Table 1: Intel's new C62x-series south bridge (Lewisburg).

• Table 2: Intel Xeon Scalable embedded processors. • Table 3: Comparison of high-end embedded processors from Intel and Cavium.

New Proximity Detectors Need No Holes in Screen Bezels

Smartphone makers are scrounging for new ways to differentiate their products and to design phones that resemble a solid slab of smooth glass. One obsession is removing all blemishes from the front surface, including the tiny holes, or 'apertures,' in the screen's top bezel for the speaker, front camera, and sensors. The biggest aperture by far is the elongated speaker slot, but it's necessary until the last few holdouts finally stop using their phones to make phone calls. The next-largest aperture is for the selfie camera's lens, but it's required until narcissism becomes unfashionable.

So, by process of elimination, the apertures for the front-facing sensors are the best candidates for elimination. That's why AMS, the industry's largest supplier of light sensors, has invented new modules that can hide behind an inked screen bezel of any color without sacrificing performance. [May 22, 2017] • Figure 1: The evolution of apertures in smartphone bezels.

• Figure 2: Proximity-sensor design tradeoffs. • Figure 3: A two-chip no-aperture proximity sensor. • Figure 4: An AMS 3-in-1 color-sensor module. • Figure 5: 3D vision using structured light.

IoTMark-BLE Debuts; More Wireless Protocols to Follow

Debates over whose microcontroller and Bluetooth radio module are more power efficient for IoT applications will become easier to settle with EEMBC's newest benchmarks. The industry consortium has introduced its first IoT-Communications suite, which measures the power consumption of a typical IoT client that transfers data using Bluetooth Low Energy (BLE).

By far the most complex suite EEMBC has developed in its 20-year history, IoTMark-BLE is available for order now by members and nonmembers. It's scheduled to ship in June. Instead of measuring power, IoTMark-BLE actually measures energy consumption—power over time. The distinction is important because battery life depends on the total current a microcontroller draws to perform a particular task, regardless of its throughput performance. Thus, IoTMark-BLE measures total energy consumption during several sleep-wake-sleep cycles for a typical task.
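The difference between power and energy can be shown with a short sketch. It is not EEMBC's method or code: the `Phase` class, the function name, and every number below are hypothetical, chosen only to show that a device with higher peak power can still use less energy over one sleep-wake-sleep cycle if it finishes its task sooner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One segment of a sleep-wake-sleep cycle (illustrative values only)."""
    name: str
    power_mw: float    # average power during the phase, in milliwatts
    duration_s: float  # length of the phase, in seconds


def energy_mj(phases):
    """Total energy in millijoules: the sum of power x time per phase."""
    return sum(p.power_mw * p.duration_s for p in phases)


# Hypothetical device A: lower peak power, but awake for 100 ms per cycle.
device_a = [
    Phase("sleep", 0.005, 0.45),
    Phase("wake", 12.0, 0.10),
    Phase("sleep", 0.005, 0.45),
]

# Hypothetical device B: higher peak power, but awake for only 40 ms.
device_b = [
    Phase("sleep", 0.005, 0.48),
    Phase("wake", 20.0, 0.04),
    Phase("sleep", 0.005, 0.48),
]

if __name__ == "__main__":
    for label, cycle in (("A", device_a), ("B", device_b)):
        peak = max(p.power_mw for p in cycle)
        print(f"device {label}: peak {peak} mW, "
              f"energy per cycle {energy_mj(cycle):.4f} mJ")
```

With these made-up figures, device B draws more power while awake yet consumes less energy per cycle, which is why a benchmark that integrates over the whole cycle can rank devices differently from one that reports peak or average active power.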

[May 15, 2017] • Figure 1: Conceptual diagrams of an IoT client and energy profile. • Figure 2: EEMBC's IoTMark-BLE framework. • Figure 3: IoTMark-BLE test timeline.

New QorIQ LA1575 Processor Has Programmable Baseband Engines

NXP's first QorIQ LA-series chip has programmable baseband engines that can perform Layer 1 and Layer 2 network processing in software, so it's adaptable to multiple communications standards. Along with its enhanced packet acceleration, the new LA1575 has enough horsepower to serve in multiple roles, including next-generation Wi-Fi routers, 5G cellular radios, and mixed wired/wireless applications, such as fixed-wireless nodes in neighborhood fiber-optic networks. 'LA' stands for Layerscape Access, a nod to the Layerscape chip architecture that is the foundation of all ARM-based QorIQ processors.

Designed primarily for residential and small-business Internet gateways, enterprise access points, and fixed-wireless applications, the LA1575 will be available in dual- and quad-core variants. Both models have ARM Cortex-A53 CPUs, and their target clock frequency is 1.4–1.6GHz. Although NXP is withholding many specifications for now, the most important new features are a programmable vector engine for Layer 1 processing, enhanced accelerators for Layer 2 processing, and an integrated RFIC interface with analog-to-digital and digital-to-analog converters (ADCs/DACs). [April 3, 2017] • Figure 1: Block diagram of NXP's QorIQ LA1575. • Figure 2: NXP's LA1575 Wi-Fi software stack. • Table 1: Comparison of NXP's QorIQ LA1575, Broadcom's StrataGX BCM58713, and Cavium's Octeon TX CN8030.

[Brief Item] Bargain hunters will appreciate two additions to Texas Instruments' ARM-based Sitara processor family: the AM5706 and AM5708.

Sampling now, they extend the AM57x-series into embedded applications that require lower power, lower cost, and less board space. Like other AM57x processors, they integrate an ARM Cortex-A15 CPU with two Cortex-M4 controller cores, a C66x DSP, and TI's own programmable controller cores. Several economy measures suit the AM5706 and AM5708 to low-end applications, such as drones, remote sensors, and motors. Compared with the smallest previous AM57x processors, they reduce the maximum clock speed of the Cortex-A15 CPU by 33% to 1.0GHz. The C66x DSP core runs at 750MHz as usual, but some models have a 500MHz CPU and DSP. No competing products can match their features for signal processing, floating-point throughput, and real-time control. [March 13, 2017]

Audio, Video, and Graphics Enliven NXP's 64-Bit ARM Processors

NXP is offering new 64-bit media processors that can bring the latest digital video and audio to TVs and other embedded systems.

Four chips in the new i.MX8M-series have ARMv8 CPUs and enough additional processing power for most digital-media applications. The superset design is the i.MX8M Quad, which has four 64-bit Cortex-A53 CPUs, one 32-bit Cortex-M4F coprocessor, a VeriSilicon GPU, a high-performance video engine, a multichannel audio engine, and dual camera interfaces. The i.MX8M QuadLite, Dual, and Solo models are subsets of this design. All four chips are scheduled to sample in 1Q17 and reach volume production in 4Q17. The new i.MX8M processors comprise the media-centric branch of the i.MX8 family, which has now more than doubled in size.

The first three chips in this family—the i.MX8 QuadMax, QuadPlus, and Quad—are even more powerful. The i.MX8 superset design is the QuadMax, which more than doubles the processing power by adding two Cortex-A72 CPUs to the i.MX8M configuration. NXP expects to sample the i.MX8-series by 2Q17, with production sometime in 2H18.

[January 23, 2017] • Figure 1: Block diagram of i.MX8M Quad. • Table 1: NXP's i.MX8/8M processor family. • Table 2: Comparison of NXP's i.MX8M Quad, Marvell's Armada 1500U, NXP's i.MX8 QuadMax, and MediaTek's MT8693.

Consolidation Creates New Giants, but Some Products Suffer

Addendum to Moore's Law: semiconductor-industry mergers are doubling in frequency every 24 months. At least that's how it seemed in 2016. Companies continued to devour one another, further consolidating an embedded-processor market dominated by a dwindling number of major players.

Although the acquisitions are creating larger companies with more resources, some products and roadmaps are falling victim to cost cutting. Looking forward, we expect 2017 to be a transitional year as the companies involved in the biggest mergers digest their large bites and the fabless vendors begin their move to next-generation process technology. ARM will gain momentum without slowing Intel's. Qualcomm will be the rising star after absorbing NXP, and Broadcom and Cavium will battle for third place.

[December 26, 2016] • Figure 1: Worldwide revenue market share of the leading embedded-processor vendors. • Figure 2: Cavium's new Octeon TX family. • Figure 3: Forecast of mobile subscriptions by radio technology.

Goldmont CPU Debuts in Atom, Celeron, and Pentium Processors

Intel is introducing several new processors that offer the enhanced Goldmont CPU core and stronger security features. Code-named Apollo Lake, these 14nm chips include six Celeron and Pentium products for entry-level PCs, three Atom E3900 embedded processors, and additional A3900 embedded models for automotive.

Their integrated GPUs support 4K-resolution graphics and up to three displays. The embedded models target IoT gateways, industrial automation, vehicle infotainment systems, automotive instrument panels, driver-assistance systems, retail kiosks, and other high-end applications. Apollo Lake supersedes the three-year-old Bay Trail.

The PC versions are shipping now, and the embedded models are scheduled for volume production next quarter. [November 14, 2016] • Figure 1: Apollo Lake block diagram. • Table 1: Apollo Lake PC processors. • Table 2: Intel's Atom E3900 embedded processors. • Table 3: Comparison of Intel's Atom x7-E3950, AMD's GX-412HC, and Texas Instruments' Sitara AM5728.

[Brief Item] ARM announced higher-performance versions of its CoreLink on-chip interconnect and DRAM controller at the recent Linley Processor Conference. The new CoreLink CMN-600 can join up to 128 CPUs in a memory-coherent mesh network while boosting throughput by up to 5x over ARM's existing interconnects, making a play for server processors.

And the new CoreLink DMC-620 is an enterprise-class memory controller that slashes the DRAM latency by up to 50% compared with the company's previous controller. Preliminary RTL for both products is available now as licensable intellectual property (IP), and we expect production RTL to arrive later this year. [October 10, 2016]

ARM Unveils Its First ARMv8-R Core for Vital Control Systems

ARM is introducing its most advanced CPU core for safety-critical controllers in automotive, industrial, and medical systems. The new Cortex-R52 is a 32-bit ARMv8-R design that supports hypervisors by adding another privilege level and a second memory-protection unit. It can simultaneously run multiple real-time operating systems in virtual sandboxes, isolating critical tasks from others. It also boosts performance relative to the existing Cortex-R5, offering superior throughput, optional Neon SIMD extensions, faster context switching, and faster interrupt handling.

Long awaited, Cortex-R52 is the first implementation of the ARMv8-R instruction-set architecture (ISA) announced three years ago. The new core omits the 64-bit features of ARMv8 and implements only a subset of the cryptographic instructions. But it's backward compatible with the 32-bit ARMv7-R architecture, including the compressed Thumb instructions.

[October 3, 2016] • Table 1: Comparison of ARM's new Cortex-R52 and six-year-old Cortex-R5.

Chinese Company Already in Production With Smaller Processors

Chinese chip vendor Phytium has demonstrated working samples of the world's largest ARMv8-compatible server processor and expects to start production this year. The 64-core FT-2000/64, previously known as Mars, targets a maximum CPU frequency of 2.0GHz and will initially sell to domestic customers. The company has also disclosed the first details of two smaller processors: the 16-core FT-1500A/16 and 4-core FT-1500A/4. These 1.5GHz chips employ a slightly less muscular ARMv8-compatible CPU core that's more power efficient but is otherwise similar to the bigger processor's core. Phytium says both chips have been in production since last year. The 16-core model is designed for web servers, cloud computing, transaction processing, and network switching.

The quad-core model is designed for small servers, desktops, laptops, and embedded systems. Phytium showed the FT-2000/64 in a 2U server at the recent Hot Chips conference in Silicon Valley. It was running cloud-computing software from Tianjin Kylin, an incorporated spinoff from China's National University of Defense Technology, which built the world's fastest supercomputers, Tianhe-1 and Tianhe-2, in 2010 and 2013. [September 26, 2016] • Table 1: Phytium's ARMv8-compatible processors.

[Brief Item] Responding to pleas for stronger security in IoT devices and embedded systems, Synopsys is introducing two new products in its DesignWare ARC EM family of licensable CPU cores.

Compared with earlier ARC products, the new ARC SEM110 and SEM120D add several security features, including an improved trusted execution environment with secure privilege levels, a special interface for true-random-number generators, a secure debug interface, and countermeasures against side-channel attacks. The company is pitching these 32-bit synthesizable cores for low-power processors that must protect monetary transactions or other important data. Example applications include mobile devices that enable NFC payments, embedded SIM cards, smart meters, and IoT clients that store sensitive information. [September 26, 2016]

On-Chip Hardware Optimized for Databases and Analytics

At the recent Hot Chips conference, Oracle revealed new details about the database-acceleration capabilities in its Sparc M7 and S7 processors, which it unveiled at the last two annual conferences.

Now shipping in Oracle systems and servers, the 32-core Sparc M7 is the flagship product, and the 8-core Sparc S7 (code-named Sonoma) is the economy model. Both processors integrate the same acceleration. Oracle says they can speed up some database operations by 10–23x and retrieve results up to 30% faster than the same software running without the new hardware.

Data decompression is 8–11x faster, and a new compression algorithm can shrink memory-resident data by 2–5x. In a footrace with a previous-generation Sparc T5 system, a Sparc M7 server handled 9x more database queries per hour, delivered 11x more performance per watt, and reduced CPU utilization by 3x. Another feature is stronger security. Oracle says its enhanced memory protection effectively repels buffer-overflow attacks, such as the Heartbleed exploit that afflicted the OpenSSL cryptography library in 2014.

[September 19, 2016] • Figure 1: Oracle's data-analytics accelerator (DAX). • Figure 2: DAX function calls. • Figure 3: Sparc CPU utilization with and without DAX acceleration. • Table 1: Oracle DB compression levels.

Four Different Versions Target Scale-Up and Scale-Out Servers

Much as software engineers fork an existing code base to derive a new program, IBM is forking its Power8 processor to derive a quartet of new Power9 designs. These future chips will have 12 or 24 quad- or octa-threaded CPU cores and different memory subsystems for either scale-up or scale-out servers.

The goal is to offer a few processors optimized for the most important server-market segments without duplicating Intel's vast Xeon catalog. Power9 will be IBM's first product manufactured in 14nm FinFET technology—and the first outsourced to a foundry. The company did not disclose a schedule for its new processor; we believe the initial devices are already in silicon, and the first systems will begin shipping in 2017. IBM says the new CPU cores deliver about 1.5–2.5x more throughput than Power8 cores running at the same clock frequency. In addition, the four new chip designs address various shortcomings that are limiting Power8's adoption by third-party system vendors. [September 5, 2016] • Figure 1: Four initial IBM Power9 designs. • Figure 2: Power9 SMT8 and SMT4 cores.

• Figure 3: Power8 versus Power9 pipelines. • Figure 4: Power9 versus Power8 performance. • Figure 5: Power9 main-memory subsystems.

• Table 1: SMT4 execution resources. • Table 2: Parameters of IBM's four initial Power9 designs.

• Table 3: IBM Power9 versus two Intel Xeon server processors.

[Brief Item] The Embedded Microprocessor Benchmark Consortium (EEMBC) has upgraded its automotive suite to work with the multithreaded and multicore processors that are becoming more common in vehicle-control and entertainment systems. Additional improvements enable testers to combine multiple components of the suite and to use larger data sets when benchmarking processors that have big caches. The new AutoBench 2.0 suite is available now to EEMBC members and nonmember licensees. [August 22, 2016]

Intel Silicon and Software Improvements Challenge RISC SoCs

Intel is quickening its march into networking with new acceleration features in the latest Broadwell Xeon chips. These features speed up common tasks such as cryptography, packet I/O, forwarding, and virtualization.

As usual, though, the company prefers to execute most networking tasks in software running on its powerful x86 CPUs instead of using specialized hardware engines. Other processor vendors prefer the latter approach. Fortunately for OEMs and their customers, these differences are partly mitigated by using the Data Plane Development Kit—originally an Intel invention that the industry has adopted as a BSD-licensed open-source standard through the DPDK.org community. All the major vendors of networking-oriented RISC SoCs have embraced the DPDK as well. The latest release enables four quality-of-service techniques that Intel collectively calls Resource Director Technology (RDT). Although several Haswell Xeon chips implemented some RDT features, new Broadwell Xeons implement all of them.

[July 4, 2016] • Figure 1: Improvements in basic Layer 3 forwarding performance. • Figure 2: Effects of Intel's cache-allocation technology.

[Brief Item] Mellanox has announced a new SoC family that integrates its ConnectX Ethernet adapter designs with the Tilera processor technology acquired with EZchip. Code-named BlueField, the new SoCs will have up to 16 ARM Cortex-A72 cores and are scheduled to sample in 1Q17.

BlueField is the first fruit of Mellanox's February acquisition of EZchip, which in turn had acquired Tilera in 2014. One apparent casualty of the consolidation, however, is the 100-core Tile-Mx processor that EZchip announced before the Mellanox deal. Instead, Mellanox is greatly reducing the core count and upgrading the CPUs. By our estimate, a 16-core BlueField chip will have about the same CPU horsepower as a Cortex-A53-based 40-core chip. Thanks to the ConnectX acceleration, however, BlueField still targets 100-gigabit networking, as the Tile-Mx100 did. [June 13, 2016]

Four New QorIQ LS2 Chips Use ARM's High-End CPU

NXP's embedded processors keep multiplying like rabbits.

The latest litter includes four members of the QorIQ LS2 family that use ARM's most powerful CPU, the 64-bit Cortex-A72. All four are quad- or octa-core designs boasting maximum clock speeds of 2.0GHz, and they have the company's second-generation packet-processing hardware. Two of them also integrate 10G Ethernet switches. The new products are the octa-core LS2088A and its quad-core near twin, the LS2048A, plus the octa-core LS2084A and its quad-core near twin, the LS2044A.

All are scheduled to sample this quarter and qualify for volume production in the fourth quarter. All are designed primarily for networking and communications. The new Cortex-A72 chips expand the QorIQ family's reach into higher-end systems—particularly those that will implement network functions virtualization (NFV) and software-defined networking (SDN). [May 16, 2016] • Figure 1: Block diagram of NXP's QorIQ LS2088A embedded processor. • Table 1: NXP's four new QorIQ LS2 processors. • Table 2: Comparison of NXP's QorIQ LS2088A with Cavium's Octeon TX CN8240 and CN8360.

Derivative Cortex-A57 QorIQ Processors Trim Power and Cost Shortly after announcing three new LS1-series chips in March and April, NXP plans to sample two lower-cost embedded processors in the QorIQ LS2 family this quarter. The new products are the octa-core LS2080A and quad-core LS2040A, which use the 64-bit ARM Cortex-A57. They are similar to the existing LS2085A and LS2045A but trim a few features to reduce power consumption and enable lower prices.

Although networking is the main target, LS2 chips are widely used in industrial and other embedded applications. The LS2080A and LS2040A are designed for enterprise routers, line-card controllers, security appliances, virtual customer premises equipment (vCPE), and service-provider gateways.

They are so closely related to the existing LS2085A and LS2045A that we believe they are based on the same die. [May 9, 2016] • Figure 1: Block diagram of NXP's QorIQ LS2080A processor. • Table 1: Comparison of NXP's QorIQ LS204xA and LS208xA processors.

New Dual- and Quad-Core LS1 Chips Boost CPU Performance NXP is on the verge of sampling the industry's first embedded processors that use ARM's Cortex-A72. The quad-core LS1046A and dual-core LS1026A will bring greater CPU performance to the QorIQ family's LS1 series, which currently comprises 32- and 64-bit chips based on Cortex-A7, Cortex-A9, and Cortex-A53. The new A72 processors are scheduled to sample this quarter and begin production later this year.

As usual, networking is the main target, but the chips are also useful for industrial and general embedded applications. With their dual 10G Ethernet (10GbE) controllers, four GbE controllers, and one 2.5GbE controller, the new processors are well equipped for enterprise routers, switches, line-card controllers, security appliances, virtual customer premises equipment (vCPE), service-provider gateways, and network-attached storage (NAS).
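As a rough worked example of what that port mix adds up to, the sketch below sums the nominal line rates of the Ethernet controllers listed above. It assumes all ports could run at full rate simultaneously, which the abstract does not state; the names `PORTS` and `aggregate_gbps` are illustrative only.

```python
# Nominal Ethernet line rates for the controllers named in the text.
# Each entry is (number of controllers, rate per controller in Gbps).
PORTS = {
    "10GbE": (2, 10.0),
    "GbE": (4, 1.0),
    "2.5GbE": (1, 2.5),
}


def aggregate_gbps(ports):
    """Sum count * rate over all port types (an upper bound, not a measurement)."""
    return sum(count * rate for count, rate in ports.values())


total = aggregate_gbps(PORTS)
print(f"Nominal aggregate Ethernet bandwidth: {total} Gbps")
```

Under that assumption the nominal total is 26.5Gbps; real designs may share SerDes lanes among ports, so the usable figure could be lower.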

[April 25, 2016] • Figure 1: Block diagram of NXP's QorIQ LS1046A processor. • Table 1: Comparison of NXP's QorIQ LS1046A, AMD's Opteron A1120, and AppliedMicro's Helix 2 APM887104-H2 processors.

Three Modem Chips Span 3GPP Generations From 2G to NB-IoT

Intel has revealed more details about three cellular modem chips it announced at the recent Mobile World Congress. All three are designed for use with embedded processors that need an external modem for wireless connectivity, and they reinforce the company's push into the cellular IoT market. The XMM 7120M is an LTE Category 1 modem that stacks a baseband processor, flash memory, DRAM, and power-management unit (PMU) in one package.

It's intended primarily for machine-to-machine (M2M) applications, such as factory equipment, smart meters, medical devices, security cameras, and point-of-sale (PoS) terminals. Another modem, the XMM 7115, is intended for cellular IoT clients that will use the 3GPP Release 13 protocol for Narrowband IoT (NB-IoT—unofficially known as LTE Category M2, or LTE-M2). For IoT and M2M systems that don't need LTE connectivity, the XMM 6255M is a dual-band 2G/3G modem. Like the XMM 7120M, it stacks flash memory, DRAM, and a PMU on the baseband die, but its package is about 30% smaller.

[April 4, 2016] • Table 1: Intel's new cellular modem chips for embedded applications. • Table 2: 3GPP protocols for low-cost systems. Two Integrated-Baseband Processors Embrace Emerging Standards Intel is making a stronger play for low-power embedded systems that need wireless communications by introducing two new processors with integrated LTE modems.

Although it announced these products at the recent Mobile World Congress, the technical details are only now trickling out, and we don't expect volume production to begin until later this year or next. One new processor is a quad-core Atom, and the other uses a Quark CPU. The former is the x3-M7272, which has four Airmont CPUs operating at a maximum clock frequency of 1.2GHz. The latter is the XMM 7315, which uses a Lakemont core as the application CPU. Intel's new SoCs are vital to its strategy of pushing the x86 architecture deeper into embedded markets.

[March 28, 2016] • Table 1: Intel's new processors for automotive telematics and IoT. • Table 2: LTE protocols for low-power systems. [Brief Item] NXP has announced its first QorIQ product after absorbing Freescale—a low-power 64-bit ARM processor for IoT gateways, low-end routers, networked storage, printers, and factory automation. The new LS1012A slashes typical power consumption to about 1W by operating a single Cortex-A53 CPU at an 800MHz maximum clock speed.

NXP bills it as the world's smallest and lowest-power 64-bit processor. In fact, it beats even the existing 32-bit LS1 chips. For networking, it has two Gigabit Ethernet (GbE) controllers that can also handle 2.5GbE links. For cryptography, it has the company's SEC 5.5 security engine, which boosts IPSec throughput to 1.1Gbps. In addition, the chip includes enough packet-acceleration hardware for line-rate networking, although it doesn't implement the second-generation Data Path Acceleration Architecture (DPAA2) that's coming in higher-end LS1 and LS2 chips.

[March 14, 2016] Wi-Fi Router Processor Uses Customized Cortex-A53 CPUs Broadcom is sampling its first announced 64-bit ARM chip, but it's not the much-anticipated Vulcan server processor. It's the BCM4908, an embedded processor for home Wi-Fi routers and enterprise access points. It integrates four Cortex-'B53' cores—licensed Cortex-A53 CPUs that Broadcom has customized. The BCM4908 is designed for 802.11ac Wave 2 routers that will implement such advanced features as 4x4 MIMO, multiuser MIMO (MU-MIMO), and high-bandwidth 160MHz channels.

[February 15, 2016] • Figure 1: Broadcom's BCM4908 in an 802.11ac Wave 2 router. • Figure 2: Broadcom BCM4908 block diagram. • Table 1: Broadcom's BCM4908 versus NXP's QorIQ LS1043A and Qualcomm's IPQ8064.

[Brief Item] Broadcom's first StrataGX embedded processors built in 28nm technology bring new security features to the product line while slashing power consumption. The ARM-based BCM583xx chips, which are shipping now, typically consume about 1.5W.

The previous StrataGX BCM585xx and 586xx built in 40nm CMOS typically consume 2.5-3.0W. The process shrink alone wasn't enough to achieve those power savings, however, so Broadcom also pared back some features and performance. Nevertheless, some of the new chips have additional security hardware and are particularly useful for point-of-sale (PoS) terminals, credit-card kiosks, and other secure systems. [January 18, 2016] New Intel Microcontrollers Lag Competitors in Cost and Power Intel's first Quark MCUs have finally arrived, and they are struggling to compete with similarly priced ARM-based MCUs in CPU performance, memory capacity, and standby power.

For the most part, they stand out in only one respect: they are x86 compatible. If that difference matters, Quark has an unmatched advantage. The company began shipping the initial Quark D1000 ('Silver Butte') in November. An additional model, the Quark D2000 ('Mint Valley'), is beginning production now, and a third model, the Quark SE ('Atlas Peak'), is coming in 2Q16. All three chips are similar and offer typical 32-bit MCU features. The SE differs from its siblings by integrating a DSP sensor hub and a pattern-matching 'neural' engine that Intel says is capable of rudimentary machine learning.

All these MCUs clock rather slowly at 32MHz and use the Quark CPU core, which mates a 32-bit Pentium instruction set from 1993 with a 486-class microarchitecture from 1989. [January 11, 2016] • Figure 1: Block diagram of Intel's Quark D1000 microcontroller. • Table 1: Intel's first Quark microcontrollers.

• Table 2: Intel's Quark D1000 versus selected ARM Cortex-M4 MCUs. New ARMv8-M, TrustZone, and Amba 5 Protect Small Systems Heeding the call for better security in embedded systems and the Internet of (Insecure) Things, ARM has introduced a new subset of its ARMv8 architecture and a new Amba bus for future Cortex-M cores. The new ARMv8-M architecture brings the company's TrustZone security technology to even the smallest microcontrollers and deeply embedded systems. It's optimized for 32-bit chips in devices as tiny as sensors, smartwatches, and IoT end points. The improved TrustZone is a crucial part of ARMv8-M.

New hardware will enforce greater separation between secure and nonsecure code and data while easing software development in some respects. And the new Amba 5 AHB5 on-chip bus can extend TrustZone beyond the CPU to protect other SoC components, including integrated peripherals, SRAM, and flash memory. All together, they constitute ARM's most extensive security upgrade since the original TrustZone made its debut 12 years ago.

[November 16, 2015] • Figure 1: ARMv8-M memory protection. • Figure 2: TrustZone interrupt-mask banking. • Figure 3: TrustZone interrupt isolation. • Figure 4: TrustZone secure function calls.

• Figure 5: ARMv8-M memory maps. • Figure 6: Amba 5 Advanced High-Performance Bus (AHB). • Table 1: ARMv8-M Baseline enhancements.

Embedded R-Series SoCs Integrate South Bridge and Excavator CPUs AMD's new Embedded R-Series processors are the company's most highly integrated SoCs to date. They include the latest Excavator x86 CPUs, the south-bridge logic, dual DDR4 controllers, and ARM security coprocessors. Three models also have Radeon GPUs and 4K video decoders.

Code-named Merlin Falcon, the new R-Series comprises five distinct models, not counting the extended-temperature versions. They improve on the 'Bald Eagle' Embedded R-Series chips introduced last year, mainly by replacing the Steamroller CPUs with Excavator and by integrating the south bridge. In fact, the new chips are almost identical to the Carrizo processors introduced last February for low-cost desktop PCs, notebooks, and tablets. One difference: they don't support adaptive voltage and frequency scaling (AVFS).

Instead, they have 'configurable TDPs,' meaning they can stay within a desired thermal design power by operating at a clock frequency and voltage in their nominal range. [November 2, 2015] • Figure 1: AMD Embedded R-Series versus Intel ULV Core processors (CoreMark and 3DMark11). • Table 1: AMD Embedded R-Series processors. New AM57x Processors Integrate C66x DSPs With Cortex-A15 Sooner or later, it seems, Texas Instruments always reverts to its roots by integrating DSP cores in its embedded processors. Now, the Sitara family is getting its first DSPs, following a trail blazed by TI's OMAP, DaVinci, Integra, and C6000 chip families. The new Sitara AM57x series is currently sampling and is scheduled for volume production early next year. The new AM5716, AM5718, AM5726, and AM5728 have one or two ARM Cortex-A15 CPUs operating at 500MHz or 1.5GHz, plus one or two TI C66x DSP cores operating at 500MHz or 750MHz.

The AM5718 and AM5728 add one or two PowerVR SGX544 GPU cores from Imagination Technologies and a GC320 graphics core from Vivante. Thanks to these and other features, the Sitara AM57x line supersedes TI's older DaVinci media processors in almost every respect. [October 19, 2015] • Figure 1: Texas Instruments Sitara AM5728 block diagram.

• Table 1: Texas Instruments Sitara AM57x series. • Table 2: Comparison of TI's Sitara AM5728, DaVinci DM8168, and KeyStone II 66AK2E02. HVX Image-Processing Instructions Debut in Snapdragon 820 At the recent Hot Chips conference in Silicon Valley, Qualcomm introduced 1,024-bit SIMD extensions that turn its new Hexagon 680 DSP into a power-efficient image-processing engine. Although these Hexagon Vector Extensions (HVX) won't replace the phone's dedicated image signal processor (ISP), they can offload some tasks from the ISP, the GPU, and the application-processor CPUs, which are ARM-compatible cores with Neon SIMD extensions.
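To make the vector-width comparison concrete, the sketch below counts how many elements fit in one 1,024-bit HVX vector versus one 128-bit Neon vector at a few element sizes. It compares lanes per vector only and ignores how many vector instructions each core can issue per cycle, so treat it as back-of-the-envelope arithmetic, not a performance model. The helper name `lanes` is illustrative.

```python
HVX_BITS = 1024   # HVX vector width given in the text
NEON_BITS = 128   # Neon vector width used in the article's comparison


def lanes(vector_bits, element_bits):
    """Number of elements of the given size that fit in one vector."""
    return vector_bits // element_bits


for element_bits in (8, 16, 32):
    hvx = lanes(HVX_BITS, element_bits)
    neon = lanes(NEON_BITS, element_bits)
    print(f"{element_bits}-bit elements: HVX {hvx} lanes, "
          f"Neon {neon} lanes, ratio {hvx // neon}x")
```

For 8-bit pixel data, one HVX vector holds 128 elements against Neon's 16, and the 8x ratio is the same at every element size.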

The company says the new 1,024-bit vectors can perform eight times as many operations per clock cycle as the 128-bit Neon vectors while using only 6-25% as much energy per operation. The first chip to include the Hexagon 680 DSP core with HVX is the forthcoming Snapdragon 820, which we expect to appear in phones in 1Q16. [September 14, 2015] • Figure 1: Snapdragon 820 block diagram. • Figure 2: HVX vector processing. • Figure 3: Hexagon 680 block diagram. • Figure 4: HVX programmer's view.

• Figure 5: Preprocessing digital images with HVX. • Figure 6: HVX versus Krait plus Neon. Octa-Core Sonoma Processor Aims for Scale-Out Servers After cramming a record-setting 10 billion transistors into the 32-core Sparc M7 server processor last year, Oracle is introducing a smaller version with one-fourth as many CPUs. Code-named Sonoma, the new octa-core chip is actually better integrated in some ways—it's the first server processor to include an InfiniBand host channel adapter for clustering and remote direct memory access (RDMA).

Whereas the powerful Sparc M7 is designed for scale-up computing, Sonoma is designed for scale-out applications. Yet it retains the bigger processor's unique features, such as hardware accelerators for the company's database software and application-data integrity checking. Other features for reliability, availability, and serviceability (RAS) made the cut, too. At the recent Hot Chips conference, Oracle presented Sonoma as a junior version of the Sparc M7 that costs less money, consumes less power, and requires less board space. [September 7, 2015] • Figure 1: Die plot of Oracle's Sparc Sonoma processor. • Figure 2: Sonoma's InfiniBand virtualization.

• Figure 3: Sonoma versus Sparc T5-2. • Table 1: Sparc Sonoma versus Intel's Xeon D1540 and Xeon E5-2630Lv3. EOS-S3 Integrates ARM, DSP, FPGA, and Voice Triggering Paying attention to boring chitchat can be draining, but today's smartphones and other voice-enabled devices must constantly listen to our conversations to detect keyword commands and passphrases.

QuickLogic's new EOS-S3 sensor hub makes that tedium more power efficient than ever. It includes an always-on sound detector that can listen and respond to predefined voice triggers while drawing a mere 350 microamps.

This highly integrated SoC also has an ARM Cortex-M4, a micro-DSP core, and a programmable-logic fabric. Capable of monitoring up to 20 sensors, the EOS-S3 is designed for smartphones, tablets, Internet of Things (IoT) devices, and wearables. The EOS-S3 implements the Low Power Sound Detector and TrulyHandsFree technology from Sensory, which claims 95% command-recognition accuracy even in noisy environments.

[August 17, 2015] • Figure 1: QuickLogic EOS-S3 block diagram. • Figure 2: QuickLogic's EOS-S3 in a typical IoT or wearable design. • Table 1: EOS-S3 power consumption. DPAA2 Streamlines Packet Processing for Future QorIQ Chips Freescale is extensively overhauling its Data Path Acceleration Architecture (DPAA), a blanket term for the specialized packet-processing logic in QorIQ chips. DPAA2 is a major revision of the data plane that is more powerful, more flexible, and more programmable than the company's previous designs.

It was inspired by Nokia's Open Event Machine, a model for nonblocking data-plane processing that supersedes conventional thread-based models in multicore processors. The industrywide OpenDataPlane initiative is loosely based on Open Event Machine and is promoted by Linaro, a consortium that develops open-source Linux software for the ARM architecture. OpenDataPlane also supports the Power, MIPS, and x86 architectures, and Freescale is implementing DPAA2 in all of its QorIQ processors, not just the ARM chips.

DPAA2 will debut in the QorIQ LS2085A and LS2045A, a pair of ARM-based communications processors that began sampling in 1Q15. [July 20, 2015] • Figure 1: DPAA2 architecture. • Figure 2: Advanced I/O Processor block diagram. • Figure 3: DPAA2's queue/buffer manager. • Figure 4: Layer 2 Ethernet switch.
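The nonblocking, event-driven style that the abstract contrasts with thread-based models can be pictured with a generic toy dispatcher. The sketch below is only an illustration of run-to-completion event handling. It is not the DPAA2, Open Event Machine, or OpenDataPlane API, and every name in it (`run_event_loop`, `on_packet`, `on_forward`) is hypothetical.

```python
from collections import deque


def run_event_loop(events, handlers):
    """Dispatch queued events one at a time; handlers never block."""
    queue = deque(events)
    results = []
    while queue:
        kind, payload = queue.popleft()
        # Each handler runs to completion and returns follow-up events
        # instead of waiting on a later stage.
        follow_ups = handlers[kind](payload, results)
        queue.extend(follow_ups)
    return results


def on_packet(payload, results):
    # Hypothetical first stage: transform the payload, then hand it off.
    return [("forward", payload.upper())]


def on_forward(payload, results):
    # Hypothetical final stage: record the processed payload.
    results.append(payload)
    return []


handlers = {"packet": on_packet, "forward": on_forward}
out = run_event_loop([("packet", "a"), ("packet", "b")], handlers)
print(out)
```

Because a handler returns new events rather than blocking, one loop can interleave work from many flows without dedicating a thread to each.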

Freescale's New LS1 Processors Have Up to Eight 64-Bit CPUs Freescale is expanding its QorIQ family by adding the most powerful members of the LS1 series announced to date. The new LS1048A and LS1088A have four or eight ARM Cortex-A53 cores operating at 1.5GHz—plus the company's much improved packet-acceleration hardware. Although they aren't the first LS1-series chips to use Cortex-A53, the LS1088A is the first to have more than four cores.

These new chips are designed mainly for intelligent network interface cards (NICs) and edge routers, and they are also useful for industrial and aerospace applications. They have dual 10 Gigabit Ethernet (10GbE) ports, eight GbE ports, cryptography engines, and Freescale's second-generation packet-acceleration hardware (Data Path Acceleration Architecture, or DPAA2). Freescale also made two important roadmap announcements: future LS2-series processors will use the more muscular Cortex-A72, and the Power Architecture branch of the QorIQ family will advance to 16nm FinFET technology.

Some of those 16nm PowerPC chips will be shrinks of existing 28nm T-series designs; others will be fresh designs. [July 13, 2015] • Figure 1: Freescale QorIQ LS1088A block diagram. • Table 1: Feature comparison of the QorIQ LS1088A and LS1048A. CN72xx- and CN73xx-Series Processors Integrate 4 to 16 CPU Cores If Goldilocks thought Cavium's initial Octeon III processors were too small and the later ones too big, she would find the newest chips to be just right. The CN72xx and CN73xx midrange products are scheduled to begin sampling in July and start volume production in 4Q15, the company says.

The fast ramp from sampling to production is possible because Cavium has already delivered four other series of lower- and higher-end Octeon III chips using the same GlobalFoundries 28nm process. The CN7230, CN7240, CN7340, CN7350, and CN7360 fill the midrange of the Octeon III family by integrating 4 to 16 of Cavium's MIPS64-compatible CPU cores. Below them are the CN70xx and CN71xx series, which have one to four CPUs and which began production in 4Q14. Above them are the CN77xx and CN78xx series; these chips have 16 to 48 CPUs and began production in 2Q15.

[June 29, 2015] • Figure 1: Cavium's Octeon III family. • Table 1: Cavium's Octeon III CN72xx and CN73xx series.

Ice-Grain: Industry’s First Licensable Power Manager for SoCs After pioneering licensable on-chip interconnects since the 1990s, Sonics is branching out with the industry's first licensable on-chip power manager. Implemented as synthesizable intellectual property (IP), the new Ice-Grain subsystem will work with any interconnect and can bring sophisticated energy-saving technology to any SoC design. Similar technology is proprietary and appears only in some advanced SoCs designed by top-tier chip vendors.

Ice-Grain (not to be confused with 'in-circuit emulation') is a hierarchical control subsystem that manages power, clock, and voltage domains. It enables chip architects to divide their designs into many more individually controllable domains than are practical using conventional techniques. By having more domains, the chip can power only those circuits it needs at any given moment, thereby reducing both active power and static leakage.
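A toy calculation shows why finer-grained domains help. In the sketch below, a domain must stay powered whenever any block inside it is active. The block names and milliwatt figures are hypothetical placeholders, not Ice-Grain data, and `powered_mw` is an illustrative helper.

```python
# Hypothetical per-block power in milliwatts; not measured values.
BLOCK_POWER_MW = {"cpu": 300, "gpu": 400, "modem": 200, "audio": 50}


def powered_mw(domains, active_blocks):
    """Total power of every domain that contains at least one active block."""
    total = 0
    for domain in domains:
        if any(block in active_blocks for block in domain):
            total += sum(BLOCK_POWER_MW[block] for block in domain)
    return total


coarse = [("cpu", "gpu"), ("modem", "audio")]      # two large domains
fine = [("cpu",), ("gpu",), ("modem",), ("audio",)]  # one domain per block
active = {"audio"}

print("coarse:", powered_mw(coarse, active), "mW")
print("fine:", powered_mw(fine, active), "mW")
```

With only the audio block active, the coarse partition keeps 250mW of circuitry powered, whereas the fine partition powers just the 50mW block.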

[June 15, 2015] • Figure 1: Ice-Grain power-state switching. • Figure 2: Ice-Grain's central controller.

• Figure 3: Ice-Grain integration with SonicsGN. • Table 1: Power-saving techniques ranked by transition latency. Licensable Network-on-a-Chip Eases Timing Closure Hoping to reduce the number of chip designers furloughed to funny farms, Arteris has introduced a new version of its licensable network-on-a-chip (NoC) that tackles one of the industry's most maddening problems: timing closure. By adding some physical awareness and layout automation to the early phases of the design process, FlexNoC Physical ensures that signals can traverse the chip's interconnects within the design's timing parameters. As a leading vendor of NoC intellectual property (IP) with more than 60 licensees, Arteris has industrywide visibility into the problem. FlexNoC Physical is the company's response to customer demand for a timing solution that precedes logic synthesis and physical layout.

[May 22, 2015] • Figure 1: Arteris FlexNoC block diagram. • Figure 2: Critical-path pipelining.

• Figure 3: NoC pipeline placement. • Figure 4: FlexNoC versus a conventional crossbar. [Brief Item] Texas Instruments is sampling a new KeyStone II embedded processor for high-speed signal processing in avionics, defense, medical, and test-and-measurement applications.

The 66AK2L06 has two ARM Cortex-A15 cores and four TMS320C66x DSPs, all running at 1.0GHz or 1.2GHz, depending on the model. The chip is sampling now in 28nm technology and scheduled for volume production in 3Q15. TI derived the 66AK2L06 from the KeyStone II TCI6630AK2L wireless-base-station processor. Omitting some cellular-specific features reduces the chip's cost and power consumption. We suspect the 66AK2L06 is actually the same die, which would enable TI to salvage some base-station chips whose wireless hardware fails to pass muster.

[May 11, 2015] New S32V234 Vision Processor Enables Computer-Assisted Driving Freescale is marking another milepost on the long road to the driverless horseless carriage. In June, the company plans to sample a new processor family designed for advanced driver-assistance systems (ADAS). The first chip is the S32V234, which combines real-time computer vision with intelligent image analysis, enabling such functions as autonomous emergency braking, lane-departure correction, road-sign recognition, and adaptive cruise control. It's also capable of sensor fusion—for example, integrating the 360-degree view of multiple cameras and sensors. The S32V234 has four 64-bit ARM Cortex-A53 CPUs running at 1.0GHz. A 32-bit Cortex-M4 CPU offloads I/O control. Two Cognivue Apex-642 cores (each clocking at 500MHz) handle the computer-vision processing, aided by an image signal processor.

For 3D graphics and video, Freescale licensed Vivante's GC3000 GPU and an H.264 video encoder/decoder. A Freescale cryptography engine enables secure communications with other system components and the outside world.

[April 27, 2015] • Figure 1: Automotive-vision market growth. • Figure 2: Freescale S32V234 block diagram. • Table 1: Cognivue Apex-642 block diagram. New Ceva-XM4 DSP Core Adds FPUs and 32/64-Bit Vectors As more machines gain the gift of sight, engineers are rediscovering a principle long known to biologists: vision is equally a sensory perception and a cerebral function.

The eyes see, but the brain interprets and reacts. Thus, processing power is as vital to computer vision as image capture.

To augment those back-end functions, Ceva has introduced a new licensable DSP core optimized for vision processing. The Ceva-XM4 is a fourth-generation design that has numerous improvements over the previous Ceva-MM3101.

It quadruples the number of multiply-accumulate (MAC) units, quadruples the width of VLIW operations, adds 32-bit floating-point units and vector operations, and doubles the number of scalar units. It also boosts the I/O bandwidth by 100% and memory bandwidth by 33%.

[April 27, 2015] • Figure 1: Ceva-XM4 block diagram. • Figure 2: Ceva-XM4 performance versus Ceva-MM3101.

• Figure 3: Scatter-gather memory operations. • Table 1: Ceva-XM4 features versus Ceva-MM3101. Designers Ditch Proprietary CPUs in 100-Core ARMv8 Processor Boasting more ARMs than a Hindu goddess, EZchip's new Tile-Mx100 is by far the largest 64-bit ARM processor yet announced. It weaves 100 Cortex-A53 cores together in a cache-coherent mesh network that also includes packet accelerators, cryptography engines, memory controllers, and high-speed I/O interfaces.

Intended for 100Gbps data-plane networking and network-function virtualization, the Tile-Mx100 significantly raises the bar for manycore ARM designs. It's also the first fruit reaped from EZchip's $130 million acquisition of Tilera last year.

The Tile-Mx100 follows the Tile-Gx family, whose largest member is the 72-core Tile-Gx8072. [March 2, 2015] • Figure 1: EZchip Tile-Mx100 block diagram. • Figure 2: A SkyMesh quad-core tile. • Table 1: Comparison of manycore processors for networking: EZchip's Tile-Mx100 and Tile-Gx72, Broadcom's XLP980, and Cavium's Octeon III CN7890. Carrizo and Carrizo-L Processors Target Low-Cost Notebook PCs Like black holes, AMD's Carrizo processors are packing more stuff into the same space while radiating less heat. And AMD hopes Carrizo's gravitational attraction will be so irresistible that customers will never achieve escape velocity for a return trip to Planet Intel.

Carrizo succeeds the Kaveri processors that appeared last year. A related family, Carrizo-L, cuts costs and power further by omitting several features; it succeeds the Beema processors also introduced last year. Both new processors will appear mainly in low-cost notebook PCs, small desktops, and convertible tablet notebooks. Remarkably, Carrizo crams 29% more transistors onto a 250mm² die that's barely larger than Kaveri's—without a process shrink. AMD is using the larger transistor budget to introduce its new Excavator CPU core, enhance the integrated GPU and video accelerator, debut its first implementation of adaptive voltage/frequency scaling, fully support the Heterogeneous System Architecture, and make Carrizo a true SoC by integrating the south-bridge system controller. [February 23, 2015] • Figure 1: AMD Carrizo die photo. • Figure 2: AMD's adaptive voltage/frequency scaling.

• Table 1: Feature comparison of AMD's Carrizo, Carrizo-L, and A10-7300 (Kaveri) processors. Recognizing the Best Chips and Technology of the Past Year By The Linley Group To recognize the top semiconductor offerings of the year, The Linley Group presents its 2014 Analysts' Choice Awards. These awards span several categories: embedded processors, mobile processors, PC and server processors, processor-IP (intellectual property) cores, and related technology. We have presented them in Microprocessor Report for many years.

This year, we are adding two new categories to recognize chips that are not processors: mobile chips and networking chips. The new categories reflect our expanded coverage of these areas in our sister publications Mobile Chip Report and Networking Report. To choose each winner, The Linley Group's team of technology analysts gathered to discuss the merits of the leading products that entered production (or, in the case of IP, production RTL) in 2014. This guideline eliminates 'paper' products and allows us to evaluate delivered capabilities, not promises. We also considered only merchant offerings (e.g., chips that sell to system vendors) and not ASIC or in-house designs. Our analyst team is deeply familiar with all the leading products, having written about them over the course of the past year. We selected the winners on the basis of their performance, power, features, and cost for their target applications.

[January 19, 2015] • Best PC or Server Processor: Intel Xeon E7v2 family • Best Embedded Processor: Broadcom XLP980 • Best Mobile Processor: Nvidia Tegra K1-64 • Best Processor IP: ARM Cortex-M7 • Best Mobile Chip: STMicroelectronics STM32F411 sensor hub • Best Networking Chip: Marvell Prestera DX4251 Carrier Ethernet switch • Best Technology: Samsung 3D-NAND flash memory

Memory Bandwidth Helps IBM Server Processor Ace Big Benchmarks

IBM is making good on its plan to sell Power8 processors to third parties, with Tyan already offering rack-mount development systems. Newly disclosed scores show Power8 beating Intel's most powerful server processor, the 18-core Xeon E5-2699v3 (Haswell-EP), on important benchmark tests.

Both processors deliver outstanding performance on the SPEC CPU benchmarks, but IBM's huge advantages in multithreading and memory bandwidth favor Power8 when running larger test suites that more closely reflect real-world enterprise applications. Overall, the results show that IBM offers a viable high-end alternative to Intel's market-leading products. Equally important to Big Blue, Power8's performance is energizing the OpenPower Foundation, an IBM-led alliance that rallies other companies to create a larger hardware and software ecosystem around the processor. IBM is offering Power8 chips to system builders in the merchant semiconductor market and is even licensing the architecture to other processor vendors. [December 29, 2014] • Table 1: IBM Power8 processors for the merchant market.

• Table 2: Power8 versus Haswell-EP. Startup's Network-on-a-Chip Technology Promises to Ease SoC Design NetSpeed Systems, a three-year-old network-on-a-chip (NoC) vendor, received a vote of confidence in November by raising a second round of funding from Intel Capital and Walden-Riverwood Ventures. Although the dollar amount was undisclosed, it will strengthen the startup's position versus established rivals like Sonics and Arteris. NetSpeed also faces growing competition from ARM, whose licensable cache-coherent interconnects are becoming more sophisticated and are encroaching on some territory the NoC vendors have staked out. The growing complexity of SoC designs is creating more opportunities for licensable NoCs. These configurable fabrics are one more piece of intellectual property (IP) that's often better obtained ready-made than designed from scratch—just like the CPUs, GPUs, and peripheral cores that NoCs weave together on a chip. [December 1, 2014] • Figure 1: NetSpeed's Gemini network-on-a-chip.

• Figure 2: NetSpeed's NocStudio configuration tool. • Figure 3: Orion versus Amba AXI. • Figure 4: Optimizing Orion. • Figure 5: Two Gemini NoC implementations.

New Helix Family Inherits DNA From X-Gene Server Processors

AppliedMicro's X-Gene server processors are spawning a new family of ARM-compatible embedded processors intended mainly for communications. The first two members of the new Helix family use existing X-Gene die built in 40nm CMOS technology, but future products include new designs built in a 28nm high-k metal-gate (HKMG) process. All are compatible with the 64-bit ARMv8 architecture.

The Helix family will eventually supersede AppliedMicro's PacketPro APM86xxx embedded processors, which have 32-bit PowerPC 460 CPUs and are manufactured in 40nm technology. Those single- and dual-core chips are highly optimized for packet processing and communications. By contrast, Helix chips have two, four, or eight 64-bit CPUs, and we believe they have much of the same packet acceleration as the PacketPro Mamba and Diamondback processors.

[October 27, 2014] • Figure 1: Block diagram of AppliedMicro's Helix 2 embedded processor. • Table 1: Feature comparison of AppliedMicro's Helix embedded processors. • Table 2: AppliedMicro's Helix 2 versus Cavium's Octeon III CN72xx and Freescale's QorIQ LS2085A. New QorIQ LS1043A and LS1023A Processors Use Cortex-A53 Once an exclusive feature of servers, workstations, and supercomputers, 64-bit CPUs are now spreading even to some low-end embedded processors.

Freescale's new QorIQ LS1043A and LS1023A are the first 64-bit chips in the entry-level LS1 series. They integrate up to four ARM Cortex-A53 CPUs with a cryptography engine, packet acceleration, 10-Gigabit Ethernet, and DDR4 memory control. Despite their maximum target clock frequency of 1.5GHz, they consume only 8W or less—cool enough for fanless systems.

Applications include integrated-services branch routers, security appliances, industrial controllers, and edge devices that implement software-defined networking (SDN) and network-function virtualization (NFV). [October 27, 2014] • Figure 1: Block diagram of Freescale's QorIQ LS1043A communications processor. • Table 1: Feature comparison of Freescale's QorIQ LS1043A and LS1023A processors. • Table 2: Freescale's QorIQ LS1043A compared with Freescale's QorIQ T1042, Broadcom's XLP II XLP208, and Cavium's Octeon III CN7130. New Synopsys CPU Offers an MMU, SMP, and Optional L2 Cache Synopsys is licensing a new DesignWare ARC CPU core that aims for higher-end embedded applications. It is the most powerful implementation of the ARCv2 instruction-set architecture.

Targets include Wi-Fi routers, Internet gateways, digital TVs, smart appliances, and advanced driver-assistance systems. To muscle up, the new ARC HS38 core adds a memory-management unit (MMU), a translation lookaside buffer (TLB), an optional L2 cache, and extended memory addressing. Consequently, it can run a virtual-memory operating system, such as full versions of Linux. The 32-bit synthesizable CPU also supports dual- and quad-core clusters with cache-coherent symmetric multiprocessing (SMP). Yet it retains the user configurability, low power consumption, and small size of its ARC predecessors. Simulations indicate the HS38 will deliver a maximum worst-case clock frequency of 1.6GHz in a 28nm high-performance-mobile CMOS process, such as TSMC's 28nm HPM.

The typical clock frequency in that process is 2.2GHz, offering plenty of performance headroom. [October 20, 2014] • Figure 1: Block diagram of the Synopsys DesignWare ARC HS38 core. • Figure 2: Block diagram of an ARC HS38x2 dual-core cluster. • Table 1: Feature comparison of the ARC HS38, HS36, and HS34. • Table 2: ARC HS38 versus ARM's Cortex-A7 and Imagination Technologies' MIPS32 interAptiv CPU cores.

[Brief Item] Texas Instruments is sampling four KeyStone II processors it originally announced two years ago and is extending their temperature range for industrial and military-aerospace applications.

These chips combine up to four Cortex-A15 CPUs with an integrated Ethernet switch. Although they are intended mainly for networked industrial applications, their switched Ethernet ports and optional DSP also suit them to enterprise gateways. TI announced the 66AK2Exx and AM5K2Exx along with the 66AK2Hxx back in 2012; the 66AK2H products sampled in December 2012, but engineering samples of the other two families didn't appear until February 2014. General sampling began in September, and production is scheduled to start by the end of this year. [October 13, 2014]

Hybrid Memory Cubes Accelerate Fujitsu's Supercomputer Processor

In the never-ending race to build the world's fastest supercomputer, using conventional technology is like running a 100-meter sprint in rubber boots.

So Fujitsu's newest Sparc64 processor laces up some wing-footed running shoes—such as Micron's Hybrid Memory Cubes, new 256-bit vector instructions, and a pair of 'assistant cores' that offload system software from the other 32 CPU cores. In fact, this is the first processor we've seen that uses Micron's stacked-memory cubes.

Unveiled at the recent Hot Chips symposium, the Sparc64 XIfx is the latest in Fujitsu's line of SPARC-compatible processors for high-performance computing (HPC). These devices have particularly strong FPUs and single-instruction, multiple-data (SIMD) extensions, which Fujitsu continues to improve. And the core counts are doubling with each generation. The new Sparc64 XIfx has 34 CPUs, including the two assistant cores. The previous Sparc64 IXfx (introduced in 2012) had 16 cores, and the Sparc64 VIIIfx (2011) had 8. [September 22, 2014] • Figure 1: Sparc64 XIfx die photo with overlay. • Figure 2: Block diagram of Sparc64 XIfx core groups. • Table 1: Key parameters for Fujitsu's Sparc64 XIfx, IXfx, and VIIIfx supercomputer processors.

Oracle's Newest 32-Core Server Processor Powers Bigger Iron

It's a good thing Oracle doesn't sell millions of Sparc server processors, or the world might run out of sand. The next-generation Sparc M7 weighs in with more than 10 billion transistors on a die we estimate at about 700mm². Each of its 32 CPU cores can simultaneously execute eight threads, and the chip has more than 70MB of cache. The biggest Sparc M7 system can encompass 64 sockets, which would total 2,048 CPUs, 4.4GB of cache, 16,384 threads, and up to 128 terabytes (TB) of physical memory. What to do with this monster?

Run Oracle's database software, of course. Since Oracle acquired Sun Microsystems in 2010 and took over SPARC development, it has executed a surprisingly aggressive roadmap that has new processors coming out every year. Because Oracle is virtually the only customer for these processors, the architects can tune them for the company's famous enterprise software.

Consequently, Sparc processors are gradually evolving into Oracle ASICs, without sacrificing general-purpose programmability. [September 8, 2014] • Figure 1: Oracle Sparc M7 die photo. • Figure 2: Sparc M7 performance versus Sparc M6. • Table 1: Key parameters for Oracle's Sparc M7, M6, and M5 server processors.

New BCM617x5 Processors Add CFR, Better Carrier Aggregation

Broadcom is sampling three new small-cell base-station processors that improve on its previous chips by adding crest-factor reduction to their digital front ends and by enabling better LTE carrier aggregation. Other new features of the BCM617x5 series include dual-sector 2x2 MIMO in the high-end product, more-powerful CPU and DSP cores, dual-band Wi-Fi hosting, and support for China Mobile's Zuc stream cipher. The new processors are the BCM61765, BCM61755, and BCM61735.

They are pin compatible with the BCM61760, BCM61750, and BCM61730 that Broadcom announced and shipped last year. All BCM617xx processors support LTE (FDD or TDD) and LTE-Advanced (LTE-A), plus multiple 3G standards (WCDMA and TD-SCDMA).

All are capable of simultaneous dual-mode (3G/4G) operation, and they also support network sniffing for various 2G standards in self-organizing networks (SONs). The new chips began sampling in 1Q14 and are scheduled for production in 4Q14. [August 11, 2014] • Figure 1: Block diagram of Broadcom's BCM61765. • Table 1: Key parameters for Broadcom's BCM617x5 base-station processors. • Table 2: Broadcom's BCM61765 versus Qualcomm's FSM9900 and Texas Instruments' KeyStone II TCI6620K2L.

AM437x Processors Cut Some Features to Reach One Watt

Texas Instruments is sampling four new Sitara embedded processors that use much less power than their predecessors while upgrading the CPU core.

Whereas the previous Sitara AM38xx chips use Cortex-A8 and consume about 5W, the new Sitara AM437x chips use Cortex-A9 and consume only about 1W (maximum). Because TI manufactures both series in the same 45nm CMOS process, some compromises were inevitable: a lower CPU clock frequency, slower integrated graphics, fewer high-speed I/O interfaces, less memory bandwidth, and no pin compatibility with previous Sitara chips. Like their forebears, the new processors target a broad range of embedded applications, but they focus on real-time industrial communications, test-and-measurement instruments, barcode scanners, portable data terminals, medical devices, and GPS navigation.

They are particularly well suited for digital signage and other designs that take advantage of the PowerVR SGX530 GPU in two of the models. [July 21, 2014] • Figure 1: Block diagram of Texas Instruments Sitara AM4379. • Table 1: TI's new Sitara AM437x series: the AM4379, AM4378, AM4377, and AM4376.

Better Power Efficiency, Faster Graphics, and TrustZone Security

The Eagles have landed: AMD's Steppe Eagle and Crowned Eagle, that is. Those are the code-names for six new Embedded G-Series SoCs that will go talon to talon with Intel's Atom and other high-performance embedded processors. Although AMD is optimistically pitching the new dual- and quad-core chips for data-center switches and network-security appliances, their main markets are PC-like embedded systems: kiosks, point-of-sale terminals, thin clients, gambling machines, and medical equipment.

Such systems commonly employ x86 embedded processors derived from PC processors and usually run Windows or Linux. Unlike previous Embedded G-Series (Kabini) processors, the new chips use the improved Puma CPU core (sometimes called Puma+ or Jaguar+). The new chips are in production now, and their clock speeds of 1.0–2.4GHz and TDPs of 6–25W will serve a wide range of embedded applications. [June 30, 2014] • Figure 1: Block diagram of AMD's Embedded G-Series GX-424CC. • Table 1: AMD's new Embedded G-Series SoCs: the GX-424CC, GX-412HC, GX-212HC, GX-210JC, GX-420MC, and GX-412TC.

Armada 385 and 380 Begin Maiden Voyage in HKMG Technology

Marvell has quietly launched its first Armada-family embedded processors fabricated in 28nm technology. Leading the flotilla is the Armada 385, a dual-core chip powered by ARM's Cortex-A9. Its escort is the Armada 380, a single-core model based on the same die. Both processors target small-business, enterprise, and carrier-class communications equipment, such as 802.11ac Wi-Fi access points and network-attached storage (NAS). The most visible result of moving to 28nm high-k metal-gate (HKMG) technology is a big jump in clock speed: at its maximum target clock frequency of 1.6GHz, the Armada 385 is 60% faster than the Armada 375. And the new chip has four times more L2 cache than its predecessor, plus upgraded interfaces for DRAM, Ethernet, PCI Express (PCIe), and Serial ATA (SATA).

[June 2, 2014] • Figure 1: Marvell Armada 385 block diagram. • Figure 2: Example Wi-Fi access point using the Armada 385 and other Marvell chips. • Table 1: Comparison of Marvell's Armada 385, 380, 375, and 370. • Table 2: Comparison of Marvell's Armada 385, Broadcom's StrataGX BCM58623, Cavium's Octeon III CN7020, and Freescale's QorIQ T1020.

[Brief Item] Freescale is strengthening its home-networking lineup by acquiring the last fragment of Mindspeed: the Comcerto communications processors for broadband gateways and network-attached storage (NAS). If the deal closes this quarter as expected, Freescale will merge the ARM-based Comcerto line with its QorIQ family of Power Architecture and ARM processors and will continue their development. Mindspeed, which originated as a Conexant spinoff in 2004, has now been chopped into three pieces.

In December, Macom paid $272 million for most of the company, and Intel paid $12 million for the Transcede base-station processors and related intellectual property. Macom is retaining Mindspeed's extensive analog portfolio and Comcerto voice-over-IP (VoIP) processors but is selling the Comcerto gateway processors to Freescale for an undisclosed sum.

[May 5, 2014]

New T4080 and T1-Series Processors Extend 28nm Power Lineup

After a yearlong drought of QorIQ T-series announcements, Freescale unveiled five new processors in that Power Architecture family at its U.S. Technology forum this month.

The T4080 is a quad-core eight-thread processor optimized for midrange communications infrastructure, and the others are single- and dual-core processors optimized for low-end communications and general embedded applications. These eagerly awaited 28nm chips fill several gaps in the QorIQ T-series, finally superseding some popular but aging P-series processors manufactured in 45nm technology. [April 21, 2014] • Figure 1: Summary of Freescale's QorIQ T-series processors. • Figure 2: QorIQ T4080 block diagram. • Figure 3: QorIQ T1024 block diagram. • Table 1: Comparison of the new QorIQ T4080 with the existing T2080 and T1040. • Table 2: Freescale's QorIQ T4080 versus Broadcom's XLP516 (XLP II) and Cavium's Octeon III CN7130. • Table 3: Comparison of the new QorIQ T1013, T1023, T1014, and T1024. • Table 4: Freescale's new QorIQ T1023 versus the ARM-based QorIQ LS1020A, Broadcom's StrataGX BCM58525, and Cavium's Octeon III CN7020.

Newest XLP II Processors Offer 4, 6, or 8 CPUs and 40G Ethernet

With low- and high-end XLP II chips approaching volume production, Broadcom is now sampling the midrange members of this MIPS-compatible embedded-processor family. The new XLP500 series comprises three basic models with four, six, or eight CPU cores and two package options, for a total of six distinct products. They will consume about the same power as the previous-generation XLP300 chips but deliver more than twice the performance, mainly by quadrupling the speed of the network interfaces, doubling the number of CPUs, and boosting the clock frequency.

The faster network interfaces launch these midrange products into the same stratosphere as previous-generation high-end processors. The XLP500 line supports two 40 Gigabit Ethernet ports, or up to eight 10GbE or nine GbE ports. Previously, only high-end communications processors supported 40GbE interfaces. Doubling the number of CPU cores and threads will help these muscular devices handle the faster packet flows.

Assisting the CPUs are hardware accelerators for bulk cryptography, RSA cryptography, RAID storage, data compression, and deep packet inspection. [March 31, 2014] • Figure 1: Block diagram of Broadcom XLP532. • Figure 2: An XLP516 wireless base station. • Table 1: Broadcom XLP516, XLP524, and XLP532 processors. • Table 2: Broadcom's XLP532 versus Freescale's QorIQ T4160.

Small-Cell Base-Station Processor Adds DFE and JESD Interfaces

Freescale's newest QorIQ Qonverge wireless base-station processor is the company's most integrated small-cell chip.

By adding a digital front end (DFE), the B3421 eliminates the external FPGA or ASIC usually required to handle digital up/down-conversion and related functions. New JESD204 and JESD207 antenna interfaces enable glueless connections to the base station's analog section.

Yet the processor retains the usual CPRI interfaces, giving customers the flexibility to use their own DFE and a remote radio head. Another new addition is a Serial ATA (SATA) interface.

SATA enables local content caching on disk drives inside the base station, a feature particularly suited to the metrocells and microcells for which this processor is designed. [March 10, 2014] • Figure 1: Freescale QorIQ Qonverge B3421 block diagram. • Figure 2: Small-cell LTE base station. • Figure 3: Freescale's QorIQ Qonverge family. • Table 1: Key parameters for Qonverge B3421, B4420, and B4860 processors. • Table 2: Freescale's Qonverge B3421 versus two competitors: Qualcomm's FSM9900 and Texas Instruments' KeyStone II TCI6630K2L.

Microchip's PIC32MZ Microcontrollers Set CoreMark Record

Microchip's newest MIPS-based 32-bit microcontrollers not only match the features of their Cortex-M4 competitors but also achieve higher EEMBC CoreMark scores. The new PIC32MZ-EC family is powered by a MIPS microAptiv CPU core running at 200MHz, a speed demon by MCU standards.

These MCUs have more memory than comparable chips (up to 2MB of flash and 512KB of SRAM) plus Ethernet, Hi-Speed USB2.0, an LCD interface, and a cryptography accelerator. An early sample scored 654 CoreMarks—the highest EEMBC-certified score for any 32-bit MCU executing from internal flash memory. Microchip designed the PIC32MZ family for high-end controller applications, such as vehicle dashboard systems, building environmental controls, and consumer-appliance control modules.

[February 17, 2014] • Figure 1: Microchip PIC32MZ-EC block diagram. • Table 1: Microchip's PIC32MZ-EC family. • Table 2: Microchip's PIC32MZ versus three competitors: Freescale's Kinetis K70, NXP's LPC43x, and Texas Instruments' TM4C129x.

Recognizing the Best Processors of the Past Year

By The Linley Group

To recognize the top processor offerings of the year, The Linley Group presents its 2013 Analysts' Choice Awards. To choose each winner, The Linley Group's team of technology analysts gathered to discuss the merits of the leading products that entered production (or, in the case of intellectual property, production RTL) in 2013. This guideline eliminates 'paper' products and allows us to evaluate delivered capabilities, not promises. Our analyst team is deeply familiar with all the leading processor products, having written about them for Microprocessor Report over the course of the past year.

We selected the winners on the basis of their performance, power, features, and cost for their target applications.