From WikiChip
Editing tesla (car company)/fsd chip


Latest revision Your text
Line 24: Line 24:
 
|core count=12
 
|core count=12
 
|thread count=12
 
|thread count=12
|max memory=8 GiB
+
|tdp=72 W
|tdp=36 W
 
 
|package name 1=tesla_car,fcbga_2116
 
|package name 1=tesla_car,fcbga_2116
 
}}
 
}}
'''Full Self-Driving Chip''' ('''FSD Chip''', previously '''Autopilot Hardware 3.0''') is an [[autonomous driving chip]] designed by [[tesla car|Tesla]] and introduced in early [[2019]] for their own cars. Tesla claims the chip is aimed at [[SAE levels|autonomous levels 4 and 5]]. Fabricated on [[Samsung]]'s [[14 nm process technology]], the FSD Chip incorporates 3 quad-core {{armh|Cortex-A72|l=arch}} clusters for a total of 12 CPUs operating at 2.2 GHz, a Mali G71 MP12 GPU operating at 1 GHz, 2 [[neural processing units]] operating at 2 GHz, and various other hardware accelerators. The FSD supports up to 128-bit LPDDR4-4266 memory.
+
'''Full Self-Driving Chip''' ('''FSD Chip''', previously '''Autopilot Hardware 3.0''') is an [[autonomous machine chip]] designed by [[tesla car|Tesla]] and introduced in early [[2019]] for their own cars. Fabricated on [[Samsung]]'s [[14 nm process technology]], the FSD Chip incorporates 3 quad-core {{armh|Cortex-A72|l=arch}} clusters for a total of 12 CPUs operating at 2.2 GHz, a GPU operating at 1 GHz, 2 [[neural processing units]] operating at 2 GHz, and various other hardware accelerators. The FSD supports up to 128-bit LPDDR4-4266 memory.
  
 
== History ==
 
== History ==
Line 37: Line 36:
 
The full self-driving chip, or FSD chip for short, is [[tesla car|Tesla's]] home-grown, custom-designed [[autonomous driving chip]]. The chip has been in development since [[2016]] and entered mass production in early [[2019]]. Designed as a drop-in upgrade for Tesla's existing cars, the FSD chip inherits most of the power and thermal requirements of prior solutions, including keeping the maximum power consumption of 100 W. Since the chip is designed specifically for Tesla's own cars and their own requirements, much of the general-purpose capability of alternative [[neural processors]] has been stripped away from the FSD chip, leaving the design with only the hardware they need.
 
The full self-driving chip, or FSD chip for short, is [[tesla car|Tesla's]] home-grown, custom-designed [[autonomous driving chip]]. The chip has been in development since [[2016]] and entered mass production in early [[2019]]. Designed as a drop-in upgrade for Tesla's existing cars, the FSD chip inherits most of the power and thermal requirements of prior solutions, including keeping the maximum power consumption of 100 W. Since the chip is designed specifically for Tesla's own cars and their own requirements, much of the general-purpose capability of alternative [[neural processors]] has been stripped away from the FSD chip, leaving the design with only the hardware they need.
  
At a high level, the chip is a full [[system-on-a-chip]] capable of booting a standard operating system. It is manufactured on Samsung's [[14-nanometer process]] at their Austin, Texas fab, packing roughly six billion transistors on a 260 mm² silicon die. The FSD chip meets AEC-Q100 Grade-2 automotive quality standards. The choice to go with a mature [[14 nm]] node instead of a more [[technology node|leading-edge node]] boiled down to cost and IP readiness. There are twelve {{arch|64}} [[ARM]] cores organized as three clusters of quad-core {{armh|Cortex-A72|l=arch}} cores operating at 2.2 GHz, which are used for general-purpose processing. There is also a relatively light GPU designed primarily for lightweight post-processing. It operates at 1 GHz and is capable of up to 600 GFLOPS, supporting both [[single-precision]] and [[double-precision]] floating point operations.
+
At a high level, the chip is a full system-on-a-chip capable of booting a standard operating system. It is manufactured on Samsung's [[14-nanometer process]] at their Austin, Texas fab, packing roughly six billion transistors on a 260 mm² silicon die. The FSD chip meets AEC-Q100 automotive quality standards. The choice to go with a mature [[14 nm]] node instead of a more [[technology node|leading-edge node]] boiled down to cost and IP readiness. There are twelve {{arch|64}} [[ARM]] cores organized as three clusters of quad-core {{armh|Cortex-A72|l=arch}} cores operating at 2.2 GHz, which are used for general-purpose processing. There is also a relatively light GPU designed primarily for lightweight post-processing. It operates at 1 GHz and is capable of up to 600 GFLOPS, supporting both [[single-precision]] and [[double-precision]] floating point operations.
  
 
The chip features a relatively low-cost, conventional memory subsystem with a 128-bit LPDDR4 interface operating at 2133 MHz.
 
The chip features a relatively low-cost, conventional memory subsystem with a 128-bit LPDDR4 interface operating at 2133 MHz.
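
As a rough sanity check on the memory subsystem described above, the peak theoretical bandwidth can be estimated from the bus width and clock. The following is a minimal sketch, assuming double-data-rate signalling (two transfers per clock, consistent with the LPDDR4-4266 designation); the function and constant names are illustrative only.

```python
# Peak theoretical bandwidth of a 128-bit LPDDR4 interface at 2133 MHz.
# Assumption: double data rate, i.e. two transfers per clock cycle.

BUS_WIDTH_BITS = 128       # from the text
CLOCK_MHZ = 2133           # from the text
TRANSFERS_PER_CLOCK = 2    # assumption: DDR signalling


def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: int,
                        transfers_per_clock: int = TRANSFERS_PER_CLOCK) -> int:
    """Return the peak bandwidth in decimal megabytes per second."""
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * transfers_per_clock * bytes_per_transfer


bandwidth = peak_bandwidth_mb_s(BUS_WIDTH_BITS, CLOCK_MHZ)
print(f"{bandwidth / 1000:.2f} GB/s")  # roughly 68 GB/s
```

This is an upper bound only; sustained bandwidth in practice depends on access patterns and controller efficiency.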
Line 70: Line 69:
  
 
=== Neural processing unit ===
 
=== Neural processing unit ===
The FSD chip integrates two custom-designed [[neural processing units]]. Each NPU packs 32 MiB of SRAM designed for storing temporary network results, reducing data movements to main memory. The overall design is pretty straightforward. Each cycle, 256 bytes of activation data and an additional 128 bytes of weight data are read from the SRAM into the MAC array, where they are combined. Each NPU has a 96x96 [[multiply-accumulate]] array for a total of 9,216 MACs and 18,432 operations per cycle. For the FSD chip, Tesla uses an 8-bit by 8-bit integer multiply and a 32-bit integer addition. The choice of both data types is largely driven by the effort to reduce power consumption (e.g., a 32-bit FP addition consumes roughly 9 times as much as a 32-bit integer addition). Operating at 2 GHz, each NPU has a peak performance of 36.86 [[trillion operations per second]] (TOPS). With two NPUs on each chip, the FSD chip is capable of up to 73.7 trillion operations per second of combined peak performance. Following the dot product operation, data is shifted to the [[activation function|activation hardware]], the pooling hardware, and finally into the write buffer, which aggregates the results. The FSD supports a number of activation functions, including a [[rectified linear unit]] (ReLU), [[Sigmoid Linear Unit]] (SiLU), and TanH. Each cycle, 128 bytes of result data are written back to the SRAM. All the operations are done simultaneously and continuously, repeating until the full network is done.
+
The FSD chip integrates two custom-designed [[neural processing units]]. Each NPU packs 32 MiB of SRAM designed for storing temporary network results, reducing data movements to main memory. The overall design is pretty straightforward. Each cycle, 256 bytes of activation data and an additional 128 bytes of weight data are read from the SRAM into the MAC array, where they are combined. Each NPU has a 96x96 [[multiplay-accumulate]] array for a total of 9,216 MACs and 18,432 operations per cycle. For the FSD chip, Tesla uses an 8-bit by 8-bit integer multiply and a 32-bit integer addition. The choice of both data types is largely driven by the effort to reduce power consumption (e.g., a 32-bit FP addition consumes roughly 9 times as much as a 32-bit integer addition). Operating at 2 GHz, each NPU has a peak performance of 36.86 [[trillion operations per second]] (TOPS). With two NPUs on each chip, the FSD chip is capable of up to 73.7 trillion operations per second of combined peak performance. Following the dot product operation, data is shifted to the [[activation function|activation hardware]], the pooling hardware, and finally into the write buffer, which aggregates the results. The FSD supports a number of activation functions, including a [[rectified linear unit]] (ReLU), [[Sigmoid Linear Unit]] (SiLU), and TanH. Each cycle, 128 bytes of result data are written back to the SRAM. All the operations are done simultaneously and continuously, repeating until the full network is done.
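
The peak-throughput figures quoted in the NPU paragraph follow directly from the array size and clock. A minimal sketch of that arithmetic is given here, assuming each multiply-accumulate counts as two operations (one multiply and one add), which is what the 9,216 MACs and 18,432 operations figures imply; the names are illustrative and not taken from any Tesla material.

```python
# Reproduce the NPU peak-throughput arithmetic from the figures in the text.

ARRAY_ROWS = 96
ARRAY_COLS = 96
OPS_PER_MAC = 2            # assumption: one multiply plus one add per MAC
CLOCK_HZ = 2_000_000_000   # 2 GHz, from the text
NPUS_PER_CHIP = 2          # from the text


def npu_peak_tops(rows: int, cols: int, clock_hz: int,
                  ops_per_mac: int = OPS_PER_MAC) -> float:
    """Return peak trillions of operations per second for one NPU."""
    macs = rows * cols
    ops_per_cycle = macs * ops_per_mac
    return ops_per_cycle * clock_hz / 1e12


per_npu = npu_peak_tops(ARRAY_ROWS, ARRAY_COLS, CLOCK_HZ)
per_chip = per_npu * NPUS_PER_CHIP

print(f"{ARRAY_ROWS * ARRAY_COLS} MACs per NPU")
print(f"{per_npu:.2f} TOPS per NPU, {per_chip:.1f} TOPS per chip")
```

These are peak numbers that assume every MAC is busy on every cycle; real network throughput depends on how well each layer maps onto the 96x96 array.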
  
 
<div>
 
<div>
Line 86: Line 85:
  
 
* 2 x DMA operations for reading and writing to main memory
 
* 2 x DMA operations for reading and writing to main memory
* 3x dot-product instructions (convolution, deconvolution, and inner-product)
+
* 3x dot-product instructions (convolution, deconvolution, and innner-product)
 
* scale (1-input, 1-output)
 
* scale (1-input, 1-output)
 
* eltwise (2-input, 1-output)
 
* eltwise (2-input, 1-output)
Line 97: Line 96:
 
|type=LPDDR4-4266
 
|type=LPDDR4-4266
 
|ecc=Yes
 
|ecc=Yes
|max mem=8 GiB
 
 
|controllers=1
 
|controllers=1
 
|width=128 bits
 
|width=128 bits
Line 104: Line 102:
  
 
== Full self-driving computer (FSD Computer) ==
 
== Full self-driving computer (FSD Computer) ==
The FSD computer is designed to be retrofitted into existing Tesla models and is therefore largely the same in terms of form factor and I/O. The computer itself fits just behind the glove compartment of the car. The FSD Computer can be installed by a technician in the same slot as the prior Autopilot Hardware 2.5 board. The board itself incorporates two fully independent FSD chips along with their own power subsystem, DRAM, and flash memory for full redundancy. Each chip boots up from its own storage memory and runs its own independent operating system. On the right of the board (shown below) are the eight camera connectors. The power supply and controls are on the left side of the board. The board sits on two independent power supplies - one for one of the FSD chips and one for the other. Additionally, half of the cameras sit on one power supply and the other half sit on the second power supply (note that the video input itself is received by both chips). The redundancy is designed to ensure that in the case of a component such as a camera stream or power supply or some other IC on the board going bad, the full system can continue to operate normally.
+
The FSD computer is designed to be retrofitted into existing Tesla models and is therefore largely the same in terms of form factor and I/O. The computer itself fits just behind the glove compartment of the car. The FSD Computer can be installed by a technician in the same slot as the prior Autopilot Hardware 2.5 board. The board itself incorporates two fully independent FSD chips along with their own power subsystem, DRAM, and flash memory for full redundancy. Each chip boots up from its own storage memory and runs its own independent operating system. On the right of the board (shown below) are the eight camera connectors. The power supply and controls are on the left side of the board. The board sits on two independent power supplies - one for one of the FSD chips and one for the other. Additionally, half of the cameras sit on one power supply and the other half sit on the second power supply (note that the video input itself is received by both chips).  
  
  
Line 110: Line 108:
  
 
=== Operation ===
 
=== Operation ===
When powered on and engaged, sensory input is fed to the board from a variety of sources. Those include current car readings such as inertial measurement unit (IMU), radar, GPS, ultrasonic sensors, wheel ticks, steering angle, and maps data. There are 8 external vision cameras (and 1 internal camera on some vehicles) and 12 ultrasonic sensors. Data is fed to both FSD chips simultaneously for processing. The two chips independently form a future plan for the car - a detailed plan of what the car should do next. The two independently derived plans are then sent to the safety system, which compares them to ensure an agreement was reached. Once the two plans agree, the car can proceed and act on that plan (i.e., operate the actuators). The drive commands are then validated, and sensory information is used as feedback to ensure the commands executed the desired operations. The full operation loop operates continuously at a high frame rate.
+
When powered on and engaged, sensory input is fed to the board from a veriaty of sources. Those include current car readings such as acceleration and speed, radar, GPS, IMU, ultrasonic sensors, wheel ticks, steering angle, and maps data. Data is fed to both FSD chips simultaneously for processing. The two chips independently form a future plan for the car - a detailed plan of what the car should do next. The two independently derived plans are then sent to the safety system, which compares them to ensure an agreement was reached. Once the two plans agree, the car can proceed and act on that plan (i.e., operate the actuators). The drive commands are then validated, and sensory information is used as feedback to ensure the commands executed the desired operations.
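
The agreement check described in the operation paragraph can be illustrated with a small sketch. Everything here is hypothetical: the text does not say how a plan is represented, how agreement is defined, or what happens on disagreement, so the `Plan` fields, the tolerance, and the choice to return `None` on a mismatch are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for the plan each chip derives independently."""
    steering_angle_deg: float
    acceleration_m_s2: float


def plans_agree(a: Plan, b: Plan, tolerance: float = 1e-3) -> bool:
    """Assumed agreement rule: every field matches within a tolerance."""
    return (abs(a.steering_angle_deg - b.steering_angle_deg) <= tolerance
            and abs(a.acceleration_m_s2 - b.acceleration_m_s2) <= tolerance)


def safety_check(plan_a: Plan, plan_b: Plan) -> Optional[Plan]:
    """Return the plan to act on, or None when the two chips disagree."""
    return plan_a if plans_agree(plan_a, plan_b) else None


# Both chips see the same inputs and should normally reach the same plan.
agreed = safety_check(Plan(1.0, 0.5), Plan(1.0, 0.5))
rejected = safety_check(Plan(1.0, 0.5), Plan(5.0, 0.5))
print(agreed, rejected)
```

The point of the sketch is only the structure: two independently computed results are compared before anything reaches the actuators.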
  
 
== Power ==
 
== Power ==
Running the full software stack, the FSD computer dissipates 72 Watts. This is roughly 25% more than the 57 Watts the prior solution, HW2.5, dissipated. Of the 72 W, 15 W is dissipated by the NPUs. Compared to the HW2.5, running the same software stack and sensors, Tesla reported a 21x improvement in frames per second.
+
Running the full software stack, the FSD chip dissipates 72 Watts. This roughly 25% more than the 57 Watts the prior solution, HW2.5, dissipated. Of the 72 W, 15 W is dissipated by the NPUs. Compared to the HW2.5, running the same software stack and sensors, Tesla reported a 21x improvement in frame per second compared to HW2.5
 
 
{| class="wikitable"
 
|-
 
! FSD Computer !! FSD Chip
 
|-
 
| 72 W || 2x 36 W
 
|}
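
The power figures in this section can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted above (72 W for the computer, 57 W for HW2.5, 15 W for the NPUs, two chips per board); the variable names are illustrative.

```python
# Cross-check the power figures quoted in the Power section.

FSD_COMPUTER_W = 72    # full board, from the text
HW25_W = 57            # prior solution, from the text
NPU_W = 15             # portion attributed to the NPUs, from the text
CHIPS_PER_BOARD = 2    # from the FSD Computer section

per_chip_w = FSD_COMPUTER_W / CHIPS_PER_BOARD
increase_pct = (FSD_COMPUTER_W - HW25_W) / HW25_W * 100
npu_share_pct = NPU_W / FSD_COMPUTER_W * 100

print(f"Per chip: {per_chip_w:.0f} W")
print(f"Increase over HW2.5: {increase_pct:.1f}%")
print(f"NPU share of board power: {npu_share_pct:.1f}%")
```

The computed increase of about 26% is consistent with the "roughly 25%" stated in the text, and the per-chip value matches the 2x 36 W entry in the table.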
 
  
 
== Die ==
 
== Die ==
Line 174: Line 165:
  
 
:[[File:tesla fsd die bumps.png|800px]]
 
:[[File:tesla fsd die bumps.png|800px]]
 
== Additional images ==
 
Additional images taken by WikiChip.
 
 
<gallery widths=350px heights=300px>
 
File:fsd comp 1.jpg
 
File:fsd comp 2.jpg
 
File:fsd comp 3.jpg
 
File:fsd comp 4.jpg
 
File:fsd comp 5.jpg
 
File:fsd comp 6.jpg
 
</gallery>
 
  
 
== See also ==
 
== See also ==
Line 192: Line 171:
  
 
== Bibliography ==
 
== Bibliography ==
 +
* ''Tesla Autonomy Day'', April 22, 2019
 
* Tesla, ''personal communication'', April 25, 2019
 
* Tesla, ''personal communication'', April 25, 2019
* ''Tesla Autonomy Day'', April 22, 2019
 
* {{bib|hc|31|Tesla}}
 

Facts about "FSD Chip - Tesla"
base frequency: 2,200 MHz (2.2 GHz, 2,200,000 kHz)
core count: 12
core name: Cortex-A72
core stepping: B0
designer: Tesla (car company)
die area: 260 mm² (0.403 in², 2.6 cm², 260,000,000 µm²)
die length: 20 mm (2 cm, 0.787 in, 20,000 µm)
die width: 13 mm (1.3 cm, 0.512 in, 13,000 µm)
first announced: April 22, 2019
first launched: March 10, 2019
full page name: tesla (car company)/fsd chip
has ecc memory support: true
instance of: microprocessor
isa: ARMv8.0-A
isa family: ARM
ldate: March 10, 2019
main image: File:tesla fsd chip (front).png
manufacturer: Samsung
market segment: Automotive
max memory: 8,192 MiB (8,388,608 KiB, 8,589,934,592 B, 8 GiB, 0.00781 TiB)
max memory bandwidth: 63.58 GiB/s (65,105.92 MiB/s, 68.269 GB/s, 68,268.505 MB/s, 0.0621 TiB/s, 0.0683 TB/s)
microarchitecture: Cortex-A72
name: FSD Chip
package: FCBGA-2116
process: 14 nm (0.014 μm, 1.4e-5 mm)
supported memory type: LPDDR4-4266
tdp: 36 W (36,000 mW, 0.0483 hp, 0.036 kW)
technology: CMOS
thread count: 12
transistor count: 6,000,000,000
word size: 64 bit (8 octets, 16 nibbles)