From WikiChip
Editing movidius/microarchitectures/shave v2.0

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.

This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.

Latest revision Your text
Line 6: Line 6:
 
|manufacturer=TSMC
 
|manufacturer=TSMC
 
|introduction=2011
 
|introduction=2011
|phase-out=2014
+
|type=VLIW
|process=65 nm
 
|type=VLLIW
 
|isa=SHAVE
 
|isa 2=SPARC v8
 
|l1=1 KiB
 
|l1 per=core
 
|l2=128 KiB
 
|l2 per=chip
 
|l2 desc=2-way set associative
 
|side cache=8-64 MiB SDRAM
 
|side cache per=chip
 
|predecessor=SHAVE v1.0
 
|predecessor link=movidius/microarchitectures/shave_v1.0
 
|successor=SHAVE v3.0
 
|successor link=movidius/microarchitectures/shave_v3.0
 
 
}}
 
}}
 
'''Streaming Hybrid Architecture Vector Engine v2.0''' ('''SHAVE v2.0''') is an accelerator microarchitecture designed by [[Movidius]] for their vision processors. SHAVE-based products are branded as the {{movidius|Myriad}} family of vision processors.
 
'''Streaming Hybrid Architecture Vector Engine v2.0''' ('''SHAVE v2.0''') is an accelerator microarchitecture designed by [[Movidius]] for their vision processors. SHAVE-based products are branded as the {{movidius|Myriad}} family of vision processors.
  
 
== History ==
 
== History ==
The original SHAVE architecture was designed primarily for the [[hardware acceleration|acceleration]] of game physics. Low demand for expensive physics acceleration in smartphones has forced a re-focus on image and vision processing. Their architecture was versatile enough that it allowed for fairly simple modification to target machine vision processing.
+
The original SHAVE architecture was designed primarily for the [[hardware acceleration|acceleration]] of game physics. Low demand for expensive physics acceleration in smartphones has forced to re-focused on image and vision processing. Their architecture was versatile enough that it allowed for fairly simple modification to target machine vision processing.
  
 
== Process Technology ==
 
== Process Technology ==
Line 78: Line 63:
  
 
== Core ==
 
== Core ==
The underlying concept behind the SHAVE core is extracting significant [[data-level parallelism]]. Each SHAVE core is a [[variable-length very long instruction word]] (VLLIW) processor that can operates on native 32-bit [[integer]] and 128-bit [[vector]] values. Each core incorporates a number of different types of [[register files]] as well as a large set of execution units that make use of the multiple register files with a large number of ports in order to operate on a relatively big set of values at once.
 
 
Every SHAVE core resides on an independent power island. Although we were not able to verify this, presumably you can enable and disable cores based on your power requirements through software.
 
 
=== Register Files ===
 
There are three register files: Integer Register File (IRF), Scalar Register File (SRF), Vector Register File (VRF).
 
 
The SHAVE's integer register file (IRF) has 17 ports in order enable a large set of parallel data operations each cycle. The IRF has 32 entries of 32-bit integers which are predominantly operated on by the IAU and the SAU. In addition to the IRF is the scalar register file (SRF) which has another set of 32 entries of 32-bit integers. The reason for the second register is to allow the SHAVE to do various three-operand operations and generate predicates for the execution units.
 
 
The SHAVE core also has a dedicated vector register file (VRF) which contains 32 entries, each 128-bit wide.
 
 
=== Execution Units ===
 
[[File:shave v2 eus.png|right|400px]]
 
The three major arithmetic execution units are the vector arithmetic unit (VAU), the scalar arithmetic unit (SAU), and the integer arithmetic unit (IAU). Additionally, every SHAVE core also incorporates a its own [[texture mapping unit]] (TMU), allowing for powerful bitmap transformations.
 
 
The '''integer arithmetic unit''' ('''IAU''') performs all arithmetic instructions that operate on 32-bit integer numbers and access the IRF. The '''scalar arithmetic unit''' ('''SAU''') is far more versatile and can perform all [[integer]] (8-32 bit) and [[floating point]] (HP/FP) operations. The '''vector arithmetic unit''' ('''VAU''') supports 128-bit vector operations of all the integer (8-32 bit) and floating point (HP/FP) types. Since those operations can be done at in the same cycle, it's possible to perform combination of the two in order to do various DSP-like operations such as matrix switching.
 
 
In addition to those units, there is also a '''compare-move unit''' ('''CMU''') which is used to generate [[predicates]]. This unit can do things such as three comparison operations per 16-bit/32-bit/8bit entry in the vector register file in parallel with the vector operations which can generate predicates for predicated execution. The CMU effectively interfaces with all the register files and is capable of moving data between them.
 
 
The '''branch and repeat unit''' ('''BRU''') is another execution unit with the ability to do continuous iterations with zero overhead. This applies to single or multiple VLIW words and can be used for things usch as parameterizable [[finite impulse response]] (FIR) filters, allowing for significant instruction fetch bandwidth reduction.
 
 
Both the CMU and the BRU can work in conjunction with the '''predicated execution unit''' ('''PEU''') and they work with the VAU all in the same cycle, meaning applications such as pixel decision making can be done incredibly fast in the same cycle.
 
 
=== Memory Subsystem ===
 
[[File:shave v2 mem subsys.svg|right|300px]]
 
Each SHAVE core has two [[load-store units]], each is capable of doing a single 64-bit operation per cycle from the SRAM tile.
 
 
Each SHAVE core has a local 128 KiB slice of SRAM the LSU operates on [[Movidius]] calls the  Connection MatriX (CMX) block. The cache is split between [[instruction cache|instruction]] and [[data cache|data]]. The exact amount is software configurable with a granularity of 8 KiB intervals. Each local cache tile is directly linked to two of its closest neighbors (presumably to the cores located to the west and to the east), allowing for zero-penalty accesses from those memory banks as well. For cache tiles located further on do have a slight latency penalty. Movidius noted that most of the software they've tested does almost all of its communication with its neighboring cores, allowing them to take advantage of this.
 
 
The entire chip also has a share 128 KiB of L2 cache and a integrated [[DDR2]] [[integrated memory controller|memory controller]] which is connected to an on-package stacked 8-64 MiB of [[SDRAM]].
 
  
 
=== Bandwidth ===
 
=== Bandwidth ===
Line 142: Line 97:
  
 
:[[File:movidius shave v2.0 sparse data support.png|700px]]
 
:[[File:movidius shave v2.0 sparse data support.png|700px]]
 
== Inter-core communication ==
 
This architecture does not have support for hardware synchronization primitives, however it does have a light communication system.
 
 
Each of the SHAVE cores has a small arbitration messaging [[queue]] consisting of four entries of 64-bit words which is only accessibly (read-wise) from that core. Each SHAVE core can send a message to any other core. When this happens the message is pushed onto that core's queue. If that core's queue is full, the sender will stall. This allows for very efficient SHAVE-SHAVE synchronization and communication without using any shared memory techniques.
 
 
It's worth noting that the control/management [[LEON3]] [[SPARC]] core does not have access to this communication system.
 
  
 
== Performance claims ==
 
== Performance claims ==
Line 157: Line 105:
 
== Package ==
 
== Package ==
 
[[File:myriad 1 bga.png|right|200px]]
 
[[File:myriad 1 bga.png|right|200px]]
Movidius packaged those chips in an 8x8 mm [[BGA]] package with 225 [[solder ball|balls]]. The die is then bumped on top of a custom [[FR-4 substrate]]. The SDRAM is then [[wire bond|wire bond]] on top of the Myriad die.
+
Movidius packaged those chips in an 8x8 mm [[BGA]] package with 225 [[solder ball|balls]]. The die is then bumpped on top of a custom [[FR-4 substrate]]. The SDRAM is then [[wire bond|wire bond]] on top of the Myriad die.
  
 
:[[File:myriad package diagram.svg|600px]]
 
:[[File:myriad package diagram.svg|600px]]
Line 171: Line 119:
  
  
:[[File:myriad 1 (shave v2.0) die shot.png|class=wikichip_ogimage|600px]]
+
:[[File:myriad 1 (shave v2.0) die shot.png|600px]]
  
  
 
:[[File:myriad 1 (shave v2.0) die shot (annotated).png|600px]]
 
:[[File:myriad 1 (shave v2.0) die shot (annotated).png|600px]]
 
== All SHAVE v2.0 Chips ==
 
<!-- NOTE:
 
          This table is generated automatically from the data in the actual articles.
 
          If a microprocessor is missing from the list, an appropriate article for it needs to be
 
          created and tagged accordingly.
 
 
          Missing a chip? please dump its name here: https://en.wikichip.org/wiki/WikiChip:wanted_chips
 
-->
 
{{comp table start}}
 
<table class="comptable sortable tc4">
 
{{comp table header|main|5:List of SHAVE v2.0-based Processors}}
 
{{comp table header|main|5:Main processor}}
 
{{comp table header|cols|Price|Launched|Cores|%Frequency}}
 
{{#ask: [[Category:microprocessor models by movidius]] [[microarchitecture::SHAVE v2.0]]
 
|?full page name
 
|?model number
 
|?release price
 
|?first launched
 
|?core count
 
|?base frequency#MHz
 
|format=template
 
|template=proc table 3
 
|userparam=6
 
|mainlabel=-
 
}}
 
{{comp table count|ask=[[Category:microprocessor models by movidius]] [[microarchitecture::SHAVE v2.0]]}}
 
</table>
 
{{comp table end}}
 
 
== References ==
 
* Some information was obtained directly from Movidius
 
* HotChips 23 (HC23), 2011
 

Please note that all contributions to WikiChip may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see WikiChip:Copyrights for details). Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)
codenameSHAVE v2.0 +
designerMovidius +
first launched2011 +
full page namemovidius/microarchitectures/shave v2.0 +
instance ofmicroarchitecture +
instruction set architectureSHAVE + and SPARC v8 +
manufacturerTSMC +
nameSHAVE v2.0 +
phase-out2014 +
process65 nm (0.065 μm, 6.5e-5 mm) +