From WikiChip
Difference between revisions of "movidius/microarchitectures/shave v2.0"
< movidius

(Core: sparse)
(Sparse Data Acceleration)
Line 93: Line 93:
  
 
=== Sparse Data Acceleration ===
 
=== Sparse Data Acceleration ===
 +
[[File:movidius shave v2.0 sparse data example.png|right|400px]]
 
The SHAVE cores support sparse data operations with the [[load-store unit]] using eight 4-bit fields to generate the address.
 
The SHAVE cores support sparse data operations with the [[load-store unit]] using eight 4-bit fields to generate the address.
  

Revision as of 22:45, 11 March 2018

Edit Values
SHAVE v2.0 µarch
General Info
Arch TypeAccelerator
DesignerMovidius
ManufacturerTSMC
Introduction2011
Pipeline
TypeVLIW

Streaming Hybrid Architecture Vector Engine v2.0 (SHAVE v2.0) is an accelerator microarchitecture designed by Movidius for their vision processors. SHAVE-based products are branded as the Myriad family of vision processors.

History

The original SHAVE architecture was designed primarily for the acceleration of game physics. Low demand for expensive physics acceleration in smartphones has forced to re-focused on image and vision processing. Their architecture was versatile enough that it allowed for fairly simple modification to target machine vision processing.

Process Technology

Main article: 65 nm lithography process

This microarchitecture was designed for TSMC's 65 nm process.

Architecture

Instruction Set

SHAVE supports a mixture of many different types of instructions belonging to a number of different classes of architectures.

  • RISC style
    • Instruction predication
    • Large set of integer operations
  • VLIW style
    • Parallel functional units controlled by VLIW instructions
    • 8/16/32-bit x 1-4 SIMD int
  • DSP style
    • Zero overhead looping
    • Modulo addressing
    • Transparent DMA modes
    • FFT, Viterbi, etc..
    • Parallel comparisons
  • GPU style
    • Streaming operations
    • 16/32-bit FP operations
    • Texture management unit

Block Diagram

Entire SoC

shave v2 soc block.svg

Individual Core

shave v2 block diagram.svg

Overview

shave v2 overview.svg

Architecturally, SHAVE is organized similar to IBM's CELL architecture. There are independent SHAVE cores, with up to eight in this generation may be chained together. Cores benefit from zero penalty from their two neighbors closest, an intrinsic property of architecture that inherently benefits most code. The chip features an L2 cache that is shared by all the cores as well as an integrated DDR2 memory controller that is connected to an on-package KGD stacked die ranging from 8 to 64 MiB of SDRAM.

A large set of peripherals are attached to the parameter of the chip which communicate with the cores via the AXI bus. Those peripherals include support for two high-resolution cameras (up to 12 megapixel) at high rate and high-resolution LCD controllers. The various peripherals can be software-multiplexed via the limited number of I/O pins in the package. The overall management controller core is a synthesizable SPARC V8 LEON3.

Core

Bandwidth

Each of the registers files in the core incorporate a large number of ports. That, along with wide buses allow for very high sustainable throughput to be achieved at very low clock frequencies (180 MHz).


shave v2 block diagram bandwidth.svg


Bandwidth
Parameter VRF SRF IRF LSU IDC L1 ISB L2 SDRAM
Clk 180 MHz
Bytes 16 4 4 8 16 8 16 16 4
Ports 12 12 17 2 1 1 2 1 2
Bandwidth 34.56 8.64 12.24 2.88 2.88 1.44 5.76 2.88 1.44
SHAVE cores 8  
Total BW 276.48 69.12 97.92 23.04 23.04 11.52 46.08
Total 547.2 2.88 1.44

Sparse Data Acceleration

movidius shave v2.0 sparse data example.png

The SHAVE cores support sparse data operations with the load-store unit using eight 4-bit fields to generate the address.

movidius shave v2.0 sparse data support.png

Performance claims

Movidius reported very high performance numbers for their chip. Fabricated on a 65 nm process and operating at 180 MHz and consuming 300 milliwatt, the full chip is capable of doing 300 GOPS or just over 1 TOPS per watt for 8-bit arithmetic for their integer arithmetic operations and 60 GOPS for 8-bit vector operations.

shave v2 performance claims.png

Package

myriad 1 bga.png

Movidius packaged those chips in an 8x8 mm BGA package with 225 balls. The die is then bumpped on top of a custom FR-4 substrate. The SDRAM is then wire bond on top of the Myriad die.

myriad package diagram.svg

Floorplan

The full chip consists of just two macros - the CMX block and the SHAVE core block. The whole die was place and routed with those two marcros with the rest routed flat at the end.

shave v2 floorplan.svg

Myriad Die


myriad 1 (shave v2.0) die shot.png


myriad 1 (shave v2.0) die shot (annotated).png
codenameSHAVE v2.0 +
designerMovidius +
first launched2011 +
full page namemovidius/microarchitectures/shave v2.0 +
instance ofmicroarchitecture +
manufacturerTSMC +
nameSHAVE v2.0 +