Latest revision as of 09:03, 19 April 2019
Pixel Visual Core | |
General Info | |
Designer | Google |
Manufacturer | TSMC |
Part Number | X726C502 |
S-Spec | SR3HX |
Market | Mobile, Embedded |
Introduction | October 17, 2017 (announced) October 17, 2017 (launched) |
General Specs | |
Frequency | 800 MHz |
Microarchitecture | |
ISA | vISA, pISA |
Process | 28 nm |
Technology | CMOS |
Electrical | |
TDP | 8 W |
Pixel Visual Core (PVC) is an image processing unit (IPU) custom-designed by Google and introduced in late 2017 for its Pixel 2 smartphone and future IoT applications. Fabricated by TSMC on its 28HPM process, the IPU is a fully programmable domain-specific processor designed from the ground up to deliver high performance at low power.
Overview
The Pixel Visual Core is designed as a co-processor for various consumer products. Although it is currently used only in the Pixel 2 and Pixel 3 smartphones, Google plans to use it in other IoT products in the future. The chip incorporates a dedicated ARM Cortex-A53 core, which handles application-level resource requests and configures the chip for the specific workload. For example, if an application requests an image capture using HDR+, the management core reconfigures the processing units so that the image captured by the camera is processed and transformed into HDR+ format. The PVC achieves high performance by racing to sleep: it runs within a power budget of 6-8 W for short bursts of around 10-20 seconds, then drops back down to milliwatts when idle. The chip relies on both hardware and software to achieve its performance and efficiency, using TensorFlow for machine learning and Halide for image processing.
Architecture
The chip incorporates eight custom image processing unit (IPU) cores. Each core comprises 512 arithmetic logic units (ALUs) organized as 256 processing elements (PEs) in a 16 x 16 two-dimensional array. The cores execute a custom VLIW ISA designed to expose maximum instruction-level and data parallelism. Although the chip supports 32-bit integers, native operations use much simpler logic that operates on 8-bit and 16-bit integers, so larger data sizes run at half throughput. The basic primitive of the stencil operations is the multiply-accumulate (MAC), which multiplies 16-bit values and accumulates into 32 bits.
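As a rough illustration of the multiply-accumulate primitive described above, the following Python sketch applies a 3 x 3 stencil to a 16 x 16 tile using only MAC steps. The signed two's-complement wrap-around behaviour, the function names, and the box-filter kernel are assumptions made for the example; the source does not specify how the hardware handles overflow or signedness.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Wrap an integer into the two's-complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc: int, a: int, b: int) -> int:
    """One MAC step: 16-bit multiply, 32-bit accumulate (assumed wrap-around)."""
    product = wrap_signed(a, 16) * wrap_signed(b, 16)
    return wrap_signed(acc + product, 32)


def stencil_3x3(image, kernel):
    """Apply a 3x3 stencil to the interior pixels; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc = mac(acc, image[y + ky - 1][x + kx - 1], kernel[ky][kx])
            out[y][x] = acc
    return out


# A 16 x 16 tile, matching the PE array dimensions, and a simple box kernel.
tile = [[(x + y) % 256 for x in range(16)] for y in range(16)]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = stencil_3x3(tile, box)
```

On the real hardware each processing element would compute its own output pixel in parallel; the nested loops here only model the arithmetic, not the parallel execution.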
There are two 16-bit ALUs per processing element, and they can operate in three distinct modes: independent, joined, and fused. In the most common case, independent mode, the two ALUs operate separately on two different pairs of operands (i.e., A1 op B1 and A2 op B2). In joined mode, the two ALUs act as a single larger ALU producing 32-bit values. In fused mode, the two ALUs are chained to form a fused 16-bit operation (i.e., A op [B op C]).
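The three modes can be modelled with a small Python sketch. This is only an illustration of the data flow: the function names are hypothetical, and the unsigned wrap-around at 16 or 32 bits is an assumption, since the source does not describe overflow handling.

```python
import operator

MASK16 = 0xFFFF      # result width of a single ALU
MASK32 = 0xFFFFFFFF  # result width of the two ALUs joined together


def independent(op, a1, b1, a2, b2):
    """Two separate 16-bit results: (A1 op B1, A2 op B2)."""
    return op(a1, b1) & MASK16, op(a2, b2) & MASK16


def joined(op, a, b):
    """Both ALUs act as one wide ALU producing a 32-bit result."""
    return op(a, b) & MASK32


def fused(op_outer, op_inner, a, b, c):
    """Chained 16-bit operation: A op [B op C]."""
    inner = op_inner(b, c) & MASK16
    return op_outer(a, inner) & MASK16


pair = independent(operator.add, 1, 2, 0xFFFF, 1)   # second lane wraps at 16 bits
wide = joined(operator.add, 0xFFFF, 1)              # carry is kept in 32 bits
chain = fused(operator.add, operator.mul, 2, 3, 4)  # 2 + (3 * 4)
```

The example passes the same operator to both lanes in independent mode, mirroring the "A1 op B1 and A2 op B2" notation; whether the lanes may run different operations is not stated in the source.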
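Because the MACs are not pipelined, they determine the clock cycle time. At 800 MHz, the chip is capable of 4,096 operations per cycle (2 x 16 x 16 x 8), or roughly 3.28 trillion operations per second of raw compute power. These are integer operations, since the ALUs operate on integers.
ISA
The exposed vISA (virtual ISA) is deployed to the individual cores as pISA (physical ISA). A pISA instruction is a 119-bit VLIW word made up of the following fields.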
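The peak-throughput figure above is a straightforward product of the numbers given in this section, as the short Python calculation below shows. The constant names are chosen for the example only.

```python
ALUS_PER_PE = 2            # two 16-bit ALUs per processing element
PE_ROWS, PE_COLS = 16, 16  # 16 x 16 PE array per core
CORES = 8                  # eight IPU cores
CLOCK_HZ = 800_000_000     # 800 MHz

ops_per_cycle = ALUS_PER_PE * PE_ROWS * PE_COLS * CORES
ops_per_second = ops_per_cycle * CLOCK_HZ

print(f"{ops_per_cycle} operations per cycle")
print(f"{ops_per_second / 1e12:.2f} trillion operations per second")
```

This is a peak number: it assumes every ALU in every core issues one operation on every cycle.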
Field | Scalar | Math | Memory | Imm | MemImm |
---|---|---|---|---|---|
Bits | 43 | 38 | 12 | 16 | 10 |
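The field widths in the table add up to the 119-bit instruction word. The Python sketch below checks that sum and shows one way such a word could be packed and unpacked. The bit ordering (fields laid out in table order, first field in the most significant bits) and the field identifiers are assumptions made for illustration; the source gives only the widths.

```python
# Field widths from the table above; ordering within the word is assumed.
FIELDS = [
    ("scalar", 43),
    ("math", 38),
    ("memory", 12),
    ("imm", 16),
    ("mem_imm", 10),
]
WORD_BITS = sum(width for _, width in FIELDS)


def pack(values: dict) -> int:
    """Pack field values into a single integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split a packed word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


example = {"scalar": 5, "math": 7, "memory": 3, "imm": 0xBEEF, "mem_imm": 1}
packed = pack(example)
```

Calling `unpack(packed)` returns the original field values, and a word with every field at its maximum value occupies exactly `WORD_BITS` bits.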
Die
Floorplan
[Image: Pixel Visual Core floorplan]
Die
- TSMC 28 nm 28HPM process
[Image: Pixel Visual Core die]
References
- Some information was obtained directly from Google
- IEEE ISSCC 2018
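- Ofer Shacham, "Pixel Visual Core: image processing and machine learning on Pixel 2", Oct 17, 2017. https://blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/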
- Matt Cockrell, "Use of RISC-V on Pixel Visual Core", RISC-V Workshop Barcelona, May 8, 2018
- John L. Hennessy, David A. Patterson, "Computer Architecture: A Quantitative Approach"