From WikiChip
cnMIPS - Microarchitectures - Cavium
Revision as of 01:39, 7 December 2016 by ChipIt (talk | contribs)

cnMIPS, also written cnMIPS64, is the first MIPS64 microarchitecture designed by Cavium, used in its Octeon family of processors. The "cn" stands for "Cavium Networks" or "content networking".

History

cnMIPS was Cavium's first microarchitecture developed completely in-house from the ground up. It is also the first implementation of the MIPS64 Release 2 ISA. The design was done by a group of 35 engineers based in Marlboro, Massachusetts, who had previously worked on DEC's EV7. The group was led by Cavium's CTO Richard Kessler, formerly the chief architect of the EV7. The final, fully custom design proved to be around three to five times faster than the synthesized MIPS64 core.

Codenames

Family | Description
Octeon | Enterprise network services processor (NSP)

Process Technology

Chips were fabricated by TSMC on their 0.13-micron CMOS process.

Architecture

cnMIPS is the first implementation of the MIPS64 Release 2 ISA. Cavium targeted a clock speed of around 600 MHz. Because the core was designed primarily for network applications, where complex floating-point operations are uncommon, the FPU was omitted.

cnMIPS includes a number of additional functional units:

  • SPI-4.2 interface
  • Gigabit Ethernet MAC
  • PLL
  • 64-bit 133 MHz PCI-X host/slave controller

No HyperTransport interface is implemented. A PCI Express interface was planned for cnMIPS but was eventually left out due to delays. Cavium offered a number of bridge chips as a stopgap solution where one was needed.

Special Functional Units

Designed for network-heavy operations, cnMIPS implements an array of security and compression units.

Memory Hierarchy

  • Cache
    • L1I Cache:
      • 32 KiB 64-way set associative
      • per core
    • L1D Cache:
      • 8 KiB 8-way set associative
      • Write-through policy
      • per core
    • L2 Cache:
      • 1 MiB 8-way set associative
      • 128 B line locking and partitioning
      • shared by all cores
  • TLB
    • 32-entry
  • MMU
    • 2 KiB write buffer
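
The cache figures above fix the number of sets once a line size is known, since sets = capacity / (ways × line size). The sketch below works this out for the L2 only. It assumes that the "128 B" in the L2 entry is the cache line size, which the list does not state outright. The L1 line sizes are not given here, so the L1 caches are left out. The helper name `cache_sets` is made up for this illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Return the number of sets: capacity / (associativity * line size)."""
    bytes_per_set = ways * line_bytes
    if size_bytes % bytes_per_set != 0:
        raise ValueError("capacity is not a whole number of sets")
    return size_bytes // bytes_per_set


# Assumption: the 128 B figure in the L2 entry is the L2 line size.
L2_LINE_BYTES = 128

l2_sets = cache_sets(1 * MIB, ways=8, line_bytes=L2_LINE_BYTES)
l2_index_bits = l2_sets.bit_length() - 1         # log2 of a power of two
l2_offset_bits = L2_LINE_BYTES.bit_length() - 1  # byte offset within a line

print(l2_sets, l2_index_bits, l2_offset_bits)  # 1024 10 7
```

Under that assumption the 1 MiB, 8-way L2 has 1024 sets, so an address would use 7 offset bits and 10 index bits.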

The L2 cache is shared by all cores over a coherent bus that runs at the core's native clock frequency of 600 MHz, for a theoretical bandwidth of 230 Gb/s. The latency between the L1 and L2 caches is 12 cycles. The L2 is connected directly to an on-chip SDRAM controller that supports up to 16 GiB of single-channel DDR1/DDR2 memory at up to 400 MHz. The memory interface is 64 bits wide, or 128 bits wide on the 8- and 16-core models, for a peak of 5.96 GiB/s (6.4 GB/s) and 11.92 GiB/s (12.8 GB/s) respectively.
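
The memory bandwidth figures can be reproduced with a short calculation. The sketch below assumes that the 400 MHz figure is the memory I/O clock and that data moves on both clock edges (two transfers per clock), which matches the quoted 6.4 GB/s and 12.8 GB/s. The function name is made up for this illustration.

```python
GB = 10**9   # decimal gigabyte
GIB = 2**30  # binary gibibyte


def ddr_peak_bytes_per_s(clock_hz: int, bus_bits: int) -> int:
    """Peak DDR bandwidth in bytes/s, assuming two transfers per clock."""
    transfers_per_s = 2 * clock_hz
    return transfers_per_s * bus_bits // 8


CLOCK_HZ = 400_000_000  # assumed to be the I/O clock, not the transfer rate

for bus_bits in (64, 128):
    bw = ddr_peak_bytes_per_s(CLOCK_HZ, bus_bits)
    print(f"{bus_bits}-bit: {bw / GB:.1f} GB/s = {bw / GIB:.2f} GiB/s")
# 64-bit: 6.4 GB/s = 5.96 GiB/s
# 128-bit: 12.8 GB/s = 11.92 GiB/s
```

By the same arithmetic, 230 Gb/s at 600 MHz works out to roughly 383 bits moved per cycle on the coherent bus. That is a back-calculation from the two quoted numbers, not a bus width given in the text.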