Revision as of 19:45, 11 March 2018
SHAVE v2.0 µarch | |
General Info | |
Arch Type | Accelerator |
Designer | Movidius |
Manufacturer | TSMC |
Introduction | 2011 |
Pipeline | |
Type | VLIW |
Streaming Hybrid Architecture Vector Engine v2.0 (SHAVE v2.0) is an accelerator microarchitecture designed by Movidius for their vision processors. SHAVE-based products are branded as the Myriad family of vision processors.
History
The original SHAVE architecture was designed primarily for the acceleration of game physics. Low demand for expensive physics acceleration in smartphones forced Movidius to refocus on image and vision processing. The architecture was versatile enough that fairly simple modifications allowed it to target machine vision processing.
Process Technology
- Main article: 65 nm lithography process
This microarchitecture was designed for TSMC's 65 nm process.
Architecture
- Hybrid RISC-DSP-GPU VLIW architecture
- 20 GFLOPS computational power
  - 180 MHz
  - At 300 mW
- Predicated execution
- Branch delay slots
- Tailored to streaming workloads
- 128-bit vector arithmetic
  - 8/16/32-bit integer
  - 16/32-bit floating point
- Full support for sparse data structures (matrix/array, random access)
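As a rough illustration of the 128-bit vector arithmetic listed above, the following Python sketch computes how many elements of each listed width would fit in one 128-bit vector. The helper name `lanes_per_vector` and the assumption that a vector is fully packed with equal-width elements are illustrative only; the source does not describe how SHAVE actually lays out its vector registers.

```python
VECTOR_BITS = 128  # vector width stated in the feature list

# Element widths stated in the feature list.
INTEGER_WIDTHS = (8, 16, 32)
FLOAT_WIDTHS = (16, 32)


def lanes_per_vector(element_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """Return how many equal-width elements fit in one vector.

    Hypothetical helper: assumes the vector is fully packed, which the
    source does not confirm for SHAVE.
    """
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


# Map each element width to its lane count under the packing assumption.
INTEGER_LANES = {width: lanes_per_vector(width) for width in INTEGER_WIDTHS}
FLOAT_LANES = {width: lanes_per_vector(width) for width in FLOAT_WIDTHS}

if __name__ == "__main__":
    for width, lanes in INTEGER_LANES.items():
        print(f"{width}-bit integer: {lanes} lanes")
    for width, lanes in FLOAT_LANES.items():
        print(f"{width}-bit floating point: {lanes} lanes")
```

Under that packing assumption, narrower elements give proportionally more lanes per operation, which is one reason 8-bit throughput figures tend to be quoted separately from floating-point ones.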
Instruction Set
SHAVE supports a mixture of instruction types drawn from several different classes of architectures.
- RISC style
  - Instruction predication
  - Large set of integer operations
- VLIW style
  - Parallel functional units controlled by VLIW instructions
  - 8/16/32-bit x 1-4 SIMD int
- DSP style
  - Zero overhead looping
  - Modulo addressing
  - Transparent DMA modes
  - FFT, Viterbi, etc.
  - Parallel comparisons
- GPU style
  - Streaming operations
  - 16/32-bit FP operations
  - Texture management unit
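To make the DSP-style modulo addressing listed above more concrete, here is a minimal Python sketch of the general technique: an index that wraps around a fixed-length buffer so that a streaming loop never needs an explicit bounds check. This models the idea in software only; the function names are hypothetical and nothing here reflects SHAVE's actual instruction encoding or addressing hardware.

```python
from typing import List, Sequence


def modulo_address(base: int, offset: int, length: int) -> int:
    """Wrap `offset` into a circular buffer of `length` slots starting at `base`.

    Hypothetical helper illustrating modulo addressing in general.
    """
    if length <= 0:
        raise ValueError("buffer length must be positive")
    return base + (offset % length)


def moving_sum(samples: Sequence[int], window: int) -> List[int]:
    """Running sum over the last `window` samples using a circular buffer."""
    history = [0] * window  # circular buffer holding the most recent samples
    total = 0
    sums = []
    for i, sample in enumerate(samples):
        slot = modulo_address(0, i, window)  # index wraps automatically
        total += sample - history[slot]      # drop the oldest sample, add the newest
        history[slot] = sample
        sums.append(total)
    return sums


if __name__ == "__main__":
    print(moving_sum([1, 2, 3, 4, 5], window=3))
```

On a DSP, the wrap-around is typically performed by the address generator rather than by an explicit modulo operation, which is what makes it attractive for filters and other streaming kernels.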
Block Diagram
Entire SoC
Individual Core
Overview
Architecturally, SHAVE is organized similarly to IBM's CELL architecture. It is built from independent SHAVE cores, up to eight of which may be chained together in this generation. Each core incurs zero penalty when working with its two closest neighbors, an intrinsic property of the architecture that benefits most code. The chip features an L2 cache that is shared by all the cores as well as an integrated DDR2 memory controller that is connected to an on-package KGD stacked die providing 8 to 64 MiB of SDRAM. A large set of peripherals is attached to the perimeter of the chip and communicates with the cores via the AXI bus. Those peripherals include support for two high-resolution cameras (up to 12 megapixels) at a high rate and high-resolution LCD controllers. The various peripherals can be software-multiplexed onto the limited number of I/O pins in the package. The overall management controller core is a synthesizable SPARC V8 LEON3.
Performance claims
Movidius reported very high performance numbers for their chip. Fabricated on a 65 nm process, operating at 180 MHz, and consuming 300 mW, the full chip is claimed to deliver 300 GOPS for 8-bit integer arithmetic operations (about 1 TOPS per watt) and 60 GOPS for 8-bit vector operations.
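The efficiency figures follow directly from the quoted numbers. The short Python sketch below redoes that arithmetic using only the values stated on this page (180 MHz, 300 mW, 300 GOPS, 60 GOPS, 20 GFLOPS); the helper names are hypothetical, and the results are derived from vendor claims rather than measurements.

```python
CLOCK_HZ = 180e6    # stated operating frequency
POWER_W = 0.300     # stated power consumption (300 mW)
INT8_GOPS = 300.0   # claimed 8-bit integer throughput
VEC8_GOPS = 60.0    # claimed 8-bit vector throughput
FP_GFLOPS = 20.0    # claimed floating-point throughput


def per_watt(giga_ops: float, watts: float = POWER_W) -> float:
    """Giga-operations per second per watt (hypothetical helper)."""
    return giga_ops / watts


def ops_per_cycle(giga_ops: float, clock_hz: float = CLOCK_HZ) -> float:
    """Operations the whole chip would need to retire per clock cycle."""
    return giga_ops * 1e9 / clock_hz


if __name__ == "__main__":
    print(f"8-bit integer: {per_watt(INT8_GOPS) / 1000:.2f} TOPS/W")
    print(f"8-bit vector:  {per_watt(VEC8_GOPS):.0f} GOPS/W")
    print(f"Floating pt:   {per_watt(FP_GFLOPS):.1f} GFLOPS/W")
    print(f"8-bit integer ops per cycle (full chip): {ops_per_cycle(INT8_GOPS):.0f}")
```

Dividing 300 GOPS by 0.3 W gives 1000 GOPS per watt, which is the roughly 1 TOPS per watt figure quoted above.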
Package
Movidius packaged these chips in an 8 mm × 8 mm BGA package.