Revision as of 21:44, 8 May 2018
Falkor µarch | |
General Info | |
Arch Type | CPU |
Designer | Qualcomm |
Manufacturer | Samsung |
Introduction | November 8, 2017 |
Process | 10 nm |
Core Configs | 40, 46, 48 |
Pipeline | |
Type | Superscalar, Superpipeline |
OoOE | Yes |
Speculative | Yes |
Reg Renaming | Yes |
Stages | 10-15 |
Decode | 4-way |
Instructions | |
ISA | ARMv8 |
Extensions | Hypervisor (EL2), TrustZone (EL3), NEON, CRC32, Crypto, FP, RDM |
Cache | |
L1I Cache | 64 KiB/core 8-way set associative |
L1D Cache | 32 KiB/core 8-way set associative |
L2 Cache | 512 KiB/duplex 8-way set associative |
L3 Cache | 5 MiB/block 20-way set associative |
Succession | |
Falkor is an ARM microarchitecture designed by Qualcomm for the server market. Falkor-based microprocessors are manufactured on a 10 nm process and sold under the Centriq brand.
Codenames
Core | Platform | Family |
---|---|---|
Falkor | Amberwing | Centriq |
Process Technology
- Further information: 10 nm lithography process
Falkor-based chips are manufactured on Samsung's 10 nm 10LPE process.
Architecture
Falkor is a new architecture designed by Qualcomm from the ground up for the server market. While parts of the core architecture resemble Qualcomm's mobile cores, the overall system architecture is considerably different from anything Qualcomm has previously designed.
Overview
- Core
- 1 V nominal operating voltage
- 64-bit ARM
- AArch64 only
- Fully ARMv8-compliant
- Supports EL3 (TrustZone)
- Supports EL2 (hypervisor)
- Supports AES, SHA1, SHA2-256 optional cryptography instructions
- Out-of-order Pipeline
- 4-way Decode
- 3 instructions + 1 direct branch per cycle
- 8-way Scheduler
- 256-entry ReOrder Buffer
- 76-entry Scheduler
- 4 instructions/cycle retirement
- 16B load + 16B store per cycle
- Core Duplex
- 2 cores in a duplex
- Shared L2 per duplex
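The per-core figures above can be turned into rough peak numbers. The following is a minimal sketch, assuming the listed 16 B load and 16 B store widths are sustained every cycle and using the base and turbo frequencies of the 48-core part from the processor list on this page; the function names are illustrative and not from Qualcomm's material.

```python
# Per-core figures taken from the overview list above.
LOAD_BYTES_PER_CYCLE = 16
STORE_BYTES_PER_CYCLE = 16
ROB_ENTRIES = 256
RETIRE_WIDTH = 4  # instructions retired per cycle


def peak_bandwidth_gb_s(bytes_per_cycle: int, freq_ghz: float) -> float:
    """Bytes per cycle times cycles per nanosecond gives decimal GB/s."""
    return bytes_per_cycle * freq_ghz


def rob_drain_cycles(entries: int = ROB_ENTRIES, width: int = RETIRE_WIDTH) -> int:
    """Minimum cycles to retire a full reorder buffer (ceiling division)."""
    return -(-entries // width)


if __name__ == "__main__":
    # Frequencies are an assumption: 2.2 GHz base and 2.6 GHz turbo.
    for freq in (2.2, 2.6):
        load = peak_bandwidth_gb_s(LOAD_BYTES_PER_CYCLE, freq)
        store = peak_bandwidth_gb_s(STORE_BYTES_PER_CYCLE, freq)
        print(f"{freq} GHz: load {load:.1f} GB/s, store {store:.1f} GB/s per core")
    print(f"Full ROB drains in at least {rob_drain_cycles()} cycles")
```

These are upper bounds only; real throughput depends on cache hit rates and scheduling, which this page does not describe.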
Block Diagram
System Overview
Individual Core
Memory Hierarchy
- L0I Cache:
- 24 KiB, 3-way set associative
- 64-byte lines
- way-predicted
- parity with auto-correct
- 0-cycle penalty on an L0 hit
- Exclusive of L1
- L1I Cache:
- 64 KiB, 8-way set associative
- 64-byte lines
- parity with auto-correct
- 4-cycle penalty on an L1 hit (after an L0 miss)
- Hardware prefetch on L1 misses
- L1D Cache:
- 32 KiB, 8-way set associative
- 64-byte lines
- Write-through, read-allocate, write-no-allocate
- Split virtual and physical tags
- parity with auto-correct
- 3 cycles minimum latency on hits
- L2 Cache:
- Per duplex (shared by both cores)
- Unified 512 KiB, 8-way set associative
- 128-byte lines, interleaved
- Inclusive of L1D$ (both cores)
- ECC, single-error correction / double-error detection (SEC/DED)
- 15 cycles minimum latency on hits
- 32 B per direction per interleave per cycle
- L3 Cache:
- Distributed in 12 blocks along the ring
- 5 MiB/block (60 MiB in total)
- 20-way set associative
- 128-byte lines, 128 B interleaved
- Non-inclusive
- Standard cache or victim mode
- ECC, single-error correction / double-error detection (SEC/DED)
- Integrated L2 Snoop Filter
- QoS
- Way-based partitioning
- Line- and way-based locking support
- System DRAM:
- 768 GiB Max
- x64 DDR4-2666 memory
- ECC, single-error correction / double-error detection (SEC/DED)
- 6 channels, interleaved
- Up to quad-rank 3DS
- 16-128 GiB/channel
- RDIMM/LRDIMM
- TLBs:
- DTLB
- 64-entry
- STLB
- 512-entry "final"
- 64-entry "non-final"
- 64-entry Stage-2
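The cache parameters listed above imply a set count for each level, since sets = size / (ways × line size). The sketch below works this out and also checks the stated L3 and DRAM totals. It assumes plain set-associative geometry; the actual indexing (for example across the L2 interleaves or L3 blocks) is not described here and may differ.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (size in bytes, associativity, line size in bytes), as listed above.
CACHES = {
    "L0I": (24 * KIB, 3, 64),
    "L1I": (64 * KIB, 8, 64),
    "L1D": (32 * KIB, 8, 64),
    "L2": (512 * KIB, 8, 128),
    "L3 block": (5 * MIB, 20, 128),
}


def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets for a simple set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


def total_l3_bytes(blocks: int = 12, block_bytes: int = 5 * MIB) -> int:
    """Total L3 capacity across all blocks on the ring."""
    return blocks * block_bytes


def max_dram_bytes(channels: int = 6, per_channel_bytes: int = 128 * GIB) -> int:
    """Maximum DRAM capacity with every channel fully populated."""
    return channels * per_channel_bytes


if __name__ == "__main__":
    for name, geometry in CACHES.items():
        print(f"{name}: {num_sets(*geometry)} sets")
    print(f"L3 total: {total_l3_bytes() // MIB} MiB")
    print(f"DRAM max: {max_dram_bytes() // GIB} GiB")
```

The totals agree with the list: 12 blocks of 5 MiB give 60 MiB of L3, and 6 channels of 128 GiB give 768 GiB of DRAM.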
Overview
Qualcomm designed the chip specifically for the data center. The architecture targets high concurrency and high thread density while maintaining isolation and quality of service between processes. The chip consists of 24 core duplexes (48 cores in total) on a ring interconnect, along with 6 channels of DDR4 and an L3 cache distributed across the ring. It also integrates 32 PCIe 3.0 lanes, 8 SATA 3.0 lanes, and a mixture of other I/O peripherals.
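The chip-level totals follow from the per-unit figures on this page. The sketch below is a rough estimate only: the DDR4-2666 transfer rate and 64-bit channel width come from the memory hierarchy list, while the PCIe 3.0 per-lane rate (8 GT/s with 128b/130b encoding) is the standard figure for that generation and is not stated in this article.

```python
DUPLEXES = 24
CORES_PER_DUPLEX = 2
L2_PER_DUPLEX_KIB = 512

DDR4_MT_PER_S = 2666        # DDR4-2666, mega-transfers per second
DDR4_CHANNEL_BYTES = 8      # x64 channel: 64 data bits per transfer
DDR4_CHANNELS = 6

# Assumption: standard PCIe 3.0 signalling, not taken from this article.
PCIE3_GT_PER_S = 8.0
PCIE3_ENCODING = 128 / 130
PCIE3_LANES = 32


def total_cores() -> int:
    return DUPLEXES * CORES_PER_DUPLEX


def total_l2_mib() -> float:
    return DUPLEXES * L2_PER_DUPLEX_KIB / 1024


def dram_peak_gb_s() -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s across all channels."""
    return DDR4_CHANNELS * DDR4_MT_PER_S * DDR4_CHANNEL_BYTES / 1000


def pcie3_peak_gb_s(lanes: int = PCIE3_LANES) -> float:
    """Theoretical PCIe 3.0 bandwidth per direction in decimal GB/s."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING / 8


if __name__ == "__main__":
    print(f"Cores: {total_cores()}")
    print(f"Total L2: {total_l2_mib():.0f} MiB")
    print(f"Peak DRAM bandwidth: {dram_peak_gb_s():.1f} GB/s")
    print(f"Peak PCIe 3.0 bandwidth: {pcie3_peak_gb_s():.1f} GB/s per direction")
```

Both bandwidth figures are theoretical maxima; sustained throughput will be lower.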
System Architecture
Core
Die
- Samsung's 10 nm process (10LPE)
- 18,000,000,000 transistors
- 398 mm² die size
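The two figures above give an average transistor density for the die. This is a minimal sketch of the arithmetic; it is a whole-die average and says nothing about the density of individual blocks such as cores or cache.

```python
TRANSISTORS = 18_000_000_000
DIE_AREA_MM2 = 398


def density_mtr_per_mm2(transistors: int = TRANSISTORS,
                        area_mm2: float = DIE_AREA_MM2) -> float:
    """Average density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


if __name__ == "__main__":
    print(f"Average density: {density_mtr_per_mm2():.1f} MTr/mm²")
```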
Additional Shots
Additional wafer shots by Qualcomm.
All Falkor-based chips
List of Falkor-based Processors (main processor)
Model | Price | Launched | Cores | Threads | L3$ | Frequency | Turbo | TDP |
---|---|---|---|---|---|---|---|---|
2434 | $888.00 | 8 November 2017 | 40 | 40 | 50 MiB | 2.3 GHz | 2.5 GHz | 110 W |
2452 | $1,383.00 | 8 November 2017 | 46 | 46 | 57.5 MiB | 2.2 GHz | 2.6 GHz | 120 W |
2460 | $1,995.00 | 8 November 2017 | 48 | 48 | 60 MiB | 2.2 GHz | 2.6 GHz | 120 W |
Count: 3
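Per-core ratios make the three models easier to compare. The sketch below derives L3 capacity, list price, and TDP per core from the table values; the `Model` class and its field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_usd: float
    cores: int
    l3_mib: float
    tdp_w: float

    @property
    def l3_per_core_mib(self) -> float:
        return self.l3_mib / self.cores

    @property
    def price_per_core_usd(self) -> float:
        return self.price_usd / self.cores

    @property
    def tdp_per_core_w(self) -> float:
        return self.tdp_w / self.cores


# Values copied from the processor table above.
MODELS = [
    Model("2434", 888.00, 40, 50.0, 110),
    Model("2452", 1383.00, 46, 57.5, 120),
    Model("2460", 1995.00, 48, 60.0, 120),
]

if __name__ == "__main__":
    for m in MODELS:
        print(f"{m.name}: {m.l3_per_core_mib:.2f} MiB L3/core, "
              f"${m.price_per_core_usd:.2f}/core, "
              f"{m.tdp_per_core_w:.2f} W/core")
```

All three models work out to 1.25 MiB of L3 per core, so the enabled L3 capacity scales with the enabled core count.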
References
- Thomas Speier & Barry Wolford, "Qualcomm Centriq 2400 Processor." Hot Chips 29 Symposium (HCS), 2017 IEEE. IEEE, 2017.
- Barry Wolford, "Architecting a multi-core server SoC for the cloud", 2017 Linley Processor Conference
Documents