Keraunos System Architecture
PCIe Tile Integration in Keraunos-E100 Chiplet Ecosystem
Version: 2.0
Date: March 26, 2026
Author: System Architecture Team
Executive Summary
This document describes the system-level architecture of the Keraunos-E100 chiplet ecosystem and details how the Keraunos PCIe Tile integrates into the larger Grendel multi-chiplet architecture. The Keraunos PCIe Tile serves as a critical I/O interface, enabling host connectivity and system management while interfacing with the on-chip Network-on-Chip (NOC) infrastructure.
Table of Contents
1. System Overview
2. Keraunos-E100 Chiplet Architecture
3. PCIe Tile Position in the System
   3.4 Model Integration: Host–RC–EP–PCIe Tile Connection Diagram
   3.5 VDK Integration: PCIe Tile and Synopsys PCIe Controller in the Virtualizer
   3.5.6 DesignWare EP/RC Interfaces: Connect vs Stub
4. Connectivity Architecture
5. Data Flow Paths
6. Address Space Integration
7. System Use Cases
8. Final VDK Platform: Linux-Booting PCIe Tile Integration
1. System Overview
1.1 Grendel Chiplet Ecosystem
The Grendel chiplet ecosystem is a multi-chiplet heterogeneous computing platform designed for high-performance AI/ML workloads. The ecosystem consists of:
Quasar Chiplets: Compute chiplets containing AI/ML processing cores
Mimir Chiplets: Memory chiplets with GDDR interfaces
Athena Chiplets: Specialized compute chiplets
Keraunos-E100 Chiplets: I/O interface chiplets for high-speed connectivity
1.2 Keraunos-E100 Role
Keraunos-E100 is the I/O interface chiplet family in the Grendel ecosystem, providing:
Glueless scale-out connectivity via 400G/800G Ethernet (Quasar-to-Quasar across packages)
Host connectivity via PCIe Gen5 (x16)
Die-to-die (D2D) connectivity within the package using BoW (Bunch of Wires) technology
System management capabilities via integrated SMC (System Management Controller)
2. Keraunos-E100 Chiplet Architecture
2.1 High-Level Block Diagram
The Keraunos-E100 chiplet contains the following major subsystems. The diagram is split into two parts for reliable rendering.
Part A — Internal subsystems and NOC:
Part B — External interfaces:
2.2 Key Subsystems
2.2.1 Chiplet Harness
SMC (System Management Controller): 4-core RISC-V processor (Rocket core) running at 800 MHz
SEP (Security Engine Processor): Handles secure boot, attestation, and access filtering
SMU (System Management Unit): Clock generation (CGM PLLs), power management, reset sequencing
2.2.2 PCIe Subsystem
PCIe Tile: Contains TLB translation engines, configuration registers, and internal fabric; it interfaces to the PCIe Controller. The tile does not implement the link layer; it receives/sends TLPs via the controller.
PCIe Controller: On the Keraunos chip this is the Synopsys PCIe Controller IP (DesignWare), configured as an Endpoint (EP). The host system uses a Root Complex (RC). The link is therefore RC (host) ↔ EP (Keraunos).
PCIe SerDes: Physical layer (PHY) for PCIe Gen5/Gen6 connectivity
2.2.3 HSIO (High-Speed I/O) Tiles
CCE (Keraunos Compute Engine): DMA engines, DMRISC cores, data forwarding logic
TT Ethernet Controller: TX/RX queue controllers, packet processing, flow control
MAC/PCS: 800G Ethernet MAC and Physical Coding Sublayer (OmegaCore IP from AlphaWave)
SRAM: 8MB high-speed SRAM for packet buffering and data staging
HSIO Fabric: AXI-based crossbar interconnect
2.2.4 NOC Infrastructure
SMN (System Management Network): Carries control, configuration, and low-bandwidth traffic
QNP Mesh (NOC-N): High-bandwidth data fabric for payload transfer (1.5 GHz @ TT corner)
D2D (Die-to-Die): 5 BoW interfaces @ 2 GHz for chiplet-to-chiplet connectivity
3. PCIe Tile Position in the System
3.1 PCIe Tile Overview
The Keraunos PCIe Tile developed in this project is a SystemC/TLM-2.0 model representing the PCIe subsystem of the Keraunos-E100 chiplet. It provides:
Host Interface: PCIe Gen5 x16 connectivity to the host CPU
Internal Routing: Bidirectional routing between PCIe, NOC-N (QNP), and SMN
Address Translation: TLB-based address mapping between PCIe address space and system address space
Configuration Interface: SMN-accessible configuration registers for TLBs, MSI relay, and error handling
3.2 Architectural Position
3.3 Key Interfaces
| Interface | Protocol | Width | Purpose |
|---|---|---|---|
| `pcie_controller_target` | TLM-2.0 Target | 64-bit | Receives PCIe Memory Read/Write from host |
| `noc_n_initiator` | TLM-2.0 Initiator | 64-bit | Forwards inbound PCIe traffic to NOC after TLB translation |
| `smn_n_initiator` | TLM-2.0 Initiator | 64-bit | Forwards bypass/system traffic to SMN |
| `noc_n_target` | TLM-2.0 Target | 64-bit | Receives outbound NOC traffic destined for PCIe |
| `smn_n_target` | TLM-2.0 Target | 64-bit | Receives outbound SMN traffic destined for PCIe |
| `pcie_controller_initiator` | TLM-2.0 Initiator | 64-bit | Sends outbound transactions to PCIe controller |
| Config register target (SMN) | TLM-2.0 Target | 64-bit | SMN access to PCIe Tile configuration registers |
3.4 Model Integration: Host–RC–EP–PCIe Tile Connection Diagram
This section documents how the inband (TLM) and sideband (sc_in/sc_out) ports of the PCIe Tile connect across Host → Root Complex → Endpoint → PCIe Tile for model integration (e.g. connecting the tile to a Synopsys PCIe Controller model or test harness).
End-to-end connection path:
Inband (TLM) connections — EP ↔ PCIe Tile:
| Tile port | Direction | Width | Connected to | Description |
|---|---|---|---|---|
| `pcie_controller_target` | Target (in) | 64-bit | EP BusMaster (TLM master) | Inbound TLPs from host: EP pushes Memory Read/Write to tile |
| `pcie_controller_initiator` | Initiator (out) | 64-bit | EP AXI_Slave (TLM slave) | Outbound TLPs to host: tile pushes Memory Read/Write/Completion to EP |
| `noc_n_target` | Target (in) | 64-bit | NOC fabric | Outbound path: NOC traffic destined for PCIe |
| `noc_n_initiator` | Initiator (out) | 64-bit | NOC fabric | Inbound path: tile forwards translated traffic to NOC |
| `smn_n_target` | Target (in) | 64-bit | SMN fabric | Outbound path: SMN traffic destined for PCIe |
| `smn_n_initiator` | Initiator (out) | 64-bit | SMN fabric | Inbound path: tile forwards system traffic to SMN |
Sideband signals — EP → PCIe Tile (sc_in to tile):
These are driven by the Synopsys PCIe Controller (EP) or by the system; the tile receives them as sc_in.
| Tile port (sc_in) | Type | Source | Description |
|---|---|---|---|
| `pcie_core_clk` | bool | EP | PCIe core clock from controller |
| `pcie_controller_reset_n` | bool | EP | Controller reset (active low) |
| `pcie_cii_hv` | bool | EP | CII header valid (SII / config info) |
| `pcie_cii_hdr_type` | `sc_bv<5>` | EP | CII header type [4:0] |
| `pcie_cii_hdr_addr` | `sc_bv<12>` | EP | CII header address [11:0] |
| `pcie_flr_request` | bool | EP | Function Level Reset request |
| `pcie_hot_reset` | bool | EP | Hot reset from link |
| `pcie_ras_error` | bool | EP | RAS error indication |
| `pcie_dma_completion` | bool | EP | DMA completion notification |
| `pcie_misc_int` | bool | EP | Miscellaneous interrupt from controller |
| `cold_reset_n` | bool | System (SMC) | SoC cold reset (active low) |
| `warm_reset_n` | bool | System (SMC) | SoC warm reset (active low) |
| `isolate_req` | bool | System | Isolation request |
| `axi_clk` | bool | System | AXI clock |
Sideband signals — PCIe Tile → EP (sc_out from tile):
The tile drives these; the EP or system receives them.
| Tile port (sc_out) | Type | Sink | Description |
|---|---|---|---|
| `pcie_app_bus_num` | uint8_t | EP | PCIe bus number for app |
| `pcie_app_dev_num` | uint8_t | EP | PCIe device number for app |
| `pcie_device_type` | bool | EP | Device type indicator |
| `pcie_sys_int` | bool | EP | System interrupt to controller |
| `function_level_reset` | bool | EP | FLR completion / request to EP |
| `hot_reset_requested` | bool | EP | Hot reset requested |
| `config_update` | bool | EP | Configuration update indicator |
| `ras_error` | bool | EP | RAS error to controller |
| `dma_completion` | bool | EP | DMA completion to controller |
| `controller_misc_int` | bool | EP | Controller miscellaneous interrupt |
| `noc_timeout` | `sc_bv<3>` | EP / system | NOC timeout status |
Summary diagram — sideband and inband to PCIe Tile:
Integration notes:
Inband: Connect the EP's BusMaster (TLM master) to the tile's `pcie_controller_target` — the EP delivers inbound TLPs from the host to the tile. Connect the tile's `pcie_controller_initiator` to the EP's AXI_Slave (TLM slave) — the tile sends outbound TLPs to the EP, which forwards them over the link to the RC.
Sideband: Drive all tile `sc_in` ports from the EP model or system (clocks, resets, CII, FLR, `hot_reset`, `ras_error`, `dma_completion`, `pcie_misc_int`; plus `cold_reset_n`, `warm_reset_n`, `isolate_req`, `axi_clk` from system). Connect all tile `sc_out` ports to the EP or system as required by the EP datasheet and platform design.
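The two inband bindings above can be sketched in plain C++ (no SystemC), with the TLM sockets reduced to callbacks. This is a minimal illustration of the traffic directions only; `Txn`, the identity-stub TLB, and the class layout are assumptions, not the model's real interfaces — only the port names come from this document.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Minimal sketch of the EP <-> tile inband bindings. Everything except the
// documented port names (pcie_controller_target, pcie_controller_initiator,
// noc_n_initiator, noc_n_target) is hypothetical.
struct Txn {
    uint64_t addr = 0;
    bool is_write = false;
    std::vector<uint8_t> data;
};

struct PcieTile {
    std::function<void(Txn&)> noc_n_initiator;           // tile -> NOC fabric
    std::function<void(Txn&)> pcie_controller_initiator; // tile -> EP AXI_Slave

    // EP BusMaster -> tile: inbound TLPs from the host land here.
    void pcie_controller_target(Txn& t) {
        t.addr = inbound_tlb(t.addr); // inbound TLB translation (identity stub)
        noc_n_initiator(t);           // forward translated traffic to NOC
    }
    // NOC fabric -> tile: outbound traffic destined for PCIe lands here.
    void noc_n_target(Txn& t) {
        pcie_controller_initiator(t); // push outbound TLP to EP AXI_Slave
    }
    uint64_t inbound_tlb(uint64_t a) const { return a; }
};

struct DesignwareEp {
    std::function<void(Txn&)> bus_master; // bound to tile pcie_controller_target
    uint64_t last_outbound_addr = 0;
    void axi_slave(Txn& t) { last_outbound_addr = t.addr; } // tile -> EP
};
```

Binding `ep.bus_master` to `tile.pcie_controller_target` and `tile.pcie_controller_initiator` to `ep.axi_slave` reproduces the inbound and outbound directions described above.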
3.5 VDK Integration: PCIe Tile and Synopsys PCIe Controller in the Virtualizer
This section describes how the Keraunos PCIe Tile and the Synopsys PCIe Controller (DesignWare, as RC and EP) are connected in the Synopsys Virtualizer VDK so that the virtual platform aligns with the Keraunos system architecture. The final validated VDK uses a direct RC–EP link between two chiplet groups: Host_Chiplet (Root Complex side) and Keraunos_PCIE_Chiplet (Endpoint side).
3.5.1 VDK Topology
The VDK instantiates a Host_Chiplet (with PCIE_RC, RISC-V CPU running Linux, DRAM, UART, PLIC) and a Keraunos_PCIE_Chiplet (with PCIe_EP, PCIE_TILE, Target_Memory, and a second RISC-V CPU running bare-metal firmware).
PCIe model in VDK: Synopsys DESIGNWARE_PCIE / PCIe_2_0 is used for both the Root Complex (PCIE_RC on Host_Chiplet) and the Endpoint (PCIe_EP on Keraunos_PCIE_Chiplet). The PCIe link uses a direct peer-to-peer binding:
RC PCIMem (master) ↔ EP PCIMem_Slave (slave) — TLPs from RC to EP
RC PCIMem_Slave (slave) ↔ EP PCIMem (master) — TLPs from EP to RC
3.5.2 Alignment with Keraunos: Where the PCIe Tile Fits
In the Keraunos-E100 architecture, the host uses a Root Complex and the Keraunos chip uses a Synopsys PCIe Controller as Endpoint. The PCIe Tile sits behind the EP and provides TLB translation and routing to NOC/SMN. In the VDK:
RC: The PCIE_RC on Host_Chiplet models the host side.
EP: On the Keraunos_PCIE_Chiplet, the Synopsys PCIe EP is the PCIe controller; the Keraunos PCIe Tile (PCIE_TILE) is inserted between this EP and the rest of the chip (NOC/SMN/Target_Memory).
The topology is: Host ↔ RC ↔ [direct PCIe link] ↔ EP ↔ PCIe Tile ↔ NOC/SMN. The tile does not replace the EP; it connects to the EP’s application-side (AXI/TLM) and sideband interfaces as in Section 3.4.
3.5.3 Interface-Level Connection Diagram (VDK)
The following diagram shows the VDK topology and where the PCIe Tile and Synopsys RC/EP connect:
Inband (TLM) connections:
PCIE_RC: AXI_Slave, AXI_DBI, BusMaster bound to Host_Chiplet SharedMemoryMap (config and memory space).
PCIe_EP: AXI_DBI bound to Keraunos_PCIE_Chiplet SharedMemoryMap; PCIMem / PCIMem_Slave connected directly to PCIE_RC (TLP traffic). BusMaster connected to PCIE_TILE.pcie_controller_target (inbound TLPs to tile).
PCIe Tile connections:
EP BusMaster to the tile’s pcie_controller_target (inbound TLPs from host). The tile’s pcie_controller_initiator to EP AXI_Slave (outbound TLPs to host). EP PCIMem / PCIMem_Slave are connected directly to the RC for the PCIe link.
The tile’s noc_n_target / noc_n_initiator and smn_n_target / smn_n_initiator connect to the chiplet’s SharedMemoryMap, which decodes to Target_Memory and tile register windows.
3.5.4 Signal- and Interface-Level Mapping (EP ↔ PCIe Tile)
The Synopsys DesignWare PCIe model (PCIe_2_0) exposes the following interface groups. The mapping to the Keraunos PCIe Tile ports enables a drop-in style integration when the tile is added to the VDK.
TLM (inband) — DesignWare EP ↔ PCIe Tile:
| DesignWare EP interface (VDK) | Direction | PCIe Tile port | Description |
|---|---|---|---|
| AXI_Slave | Slave (in) | `pcie_controller_initiator` | Outbound TLPs: tile sends Memory Read/Write/Completion to EP; EP receives on AXI_Slave and sends over link to RC. |
| AXI_DBI | Slave (in) | — | DBI/config; may remain to SharedMemoryMap or be routed per platform. |
| BusMaster | Master (out) | `pcie_controller_target` | Inbound TLPs: EP delivers host Memory Read/Write to tile (EP BusMaster → tile target). |
| PCIMem | Master (out) | — | EP as master toward link (direct to RC). Not connected to tile. |
| PCIMem_Slave | Slave (in) | — | Inbound TLPs from link (RC → EP); connects directly to RC, not to tile. |
So: BusMaster (EP) → pcie_controller_target (Tile) for inbound TLPs; pcie_controller_initiator (Tile) → AXI_Slave (EP) for outbound TLPs. PCIMem/PCIMem_Slave stay on the link side (EP ↔ RC direct). AXI_DBI can remain to SharedMemoryMap.
Sideband — DesignWare EP ↔ PCIe Tile (sc_in / sc_out):
DesignWare PCIe_2_0 exposes a number of reset, clock, and sideband pins. Map them to the tile’s sc_in and sc_out as follows so that the VDK integration matches Section 3.4.
| Tile sc_in (receive) | Source (EP or system) | DesignWare EP / system signal (typical name) |
|---|---|---|
| `pcie_core_clk` | EP | `cc_core_clk` or equivalent core clock |
| `pcie_controller_reset_n` | EP | `pcie_axi_ares` or combined reset_n |
| `pcie_cii_hv` | EP | CII header valid |
| `pcie_cii_hdr_type` | EP | CII header type [4:0] |
| `pcie_cii_hdr_addr` | EP | CII header address [11:0] |
| `pcie_flr_request` | EP | FLR request |
| `pcie_hot_reset` | EP | Hot reset |
| `pcie_ras_error` | EP | RAS error |
| `pcie_dma_completion` | EP | DMA completion |
| `pcie_misc_int` | EP | Miscellaneous interrupt |
| `cold_reset_n` | System (e.g. CustomResetController) | SoC cold reset |
| `warm_reset_n` | System | SoC warm reset |
| `isolate_req` | System | Isolation request |
| `axi_clk` | System | AXI clock |
| Tile sc_out (drive) | Sink (EP or system) | DesignWare EP / system signal (typical name) |
|---|---|---|
| `pcie_app_bus_num` | EP | App bus number |
| `pcie_app_dev_num` | EP | App device number |
| `pcie_device_type` | EP | Device type |
| `pcie_sys_int` | EP | System interrupt to controller |
| `function_level_reset` | EP | FLR completion |
| `hot_reset_requested` | EP | Hot reset requested |
| `config_update` | EP | Config update |
| `ras_error` | EP | RAS error to controller |
| `dma_completion` | EP | DMA completion to controller |
| `controller_misc_int` | EP | Controller misc interrupt |
| `noc_timeout` | EP / system | NOC timeout [2:0] |
(Exact DesignWare signal names may vary by IP version; use the EP model’s documentation or RTL interface list to align names.)
3.5.5 Connection Diagram for Easy Integration
A single diagram that ties VDK instances to tile ports and EP ports is below. Use it as a checklist when wiring the PCIe Tile into the VDK behind the Synopsys EP.
Integration checklist:
Inband: Bind EP BusMaster (inbound TLPs to device) to the tile’s pcie_controller_target. Bind the tile’s pcie_controller_initiator (outbound TLPs to host) to EP AXI_Slave. Keep EP PCIMem / PCIMem_Slave connected directly to the RC.
Sideband: Connect all EP and system reset/clock/sideband outputs to the tile’s sc_in; connect all tile sc_out to the EP (and system) inputs as in the table above.
NOC/SMN: Connect tile noc_n_target / noc_n_initiator and smn_n_target / smn_n_initiator to the chiplet’s SharedMemoryMap, which decodes to Target_Memory and tile register windows.
RC–EP link: Use a direct link: bind RC PCIMem to EP PCIMem_Slave and RC PCIMem_Slave to EP PCIMem.
This ensures the VDK integration of the PCIe Tile and Synopsys PCIe Controller (RC and EP) matches the Keraunos system architecture and Section 3.4, and can be integrated with minimal rework.
3.5.6 DesignWare PCIe EP and RC Interfaces: Connect to Tile / System vs Stub
The Ascalon chiplet vdksys Peripherals section instantiates the Synopsys DesignWare PCIe_2_0 model for both PCIe_RC (Primary_Chiplet) and PCIe_EP (Secondary_Chiplet_1/2/3). The model exposes a large set of TLM, RESET, CLOCK, and Default (sideband) interfaces. This subsection first gives direct signal/interface correspondence tables (PCIe Tile ↔ EP, and Host/DRAM ↔ RC), then lists disposition (connect vs stub) for each interface group.
Table 1 — PCIe Tile ↔ DesignWare PCIe EP: signal and interface correspondence
Each row shows which PCIe Tile port connects to which DesignWare PCIe EP port. Connect the Tile column to the DesignWare EP column as indicated.
| PCIe Tile (signal / interface) | DesignWare PCIe EP (signal / interface) |
|---|---|
| TLM | |
| `pcie_controller_target` | BusMaster (inbound TLPs from host) |
| `pcie_controller_initiator` | AXI_Slave (outbound TLPs to host) |
| Clocks (tile sc_in) | |
| `pcie_core_clk` | cc_core_clk |
| `axi_clk` | cc_dbi_aclk (or system SYSCLK) |
| Resets (tile sc_in) | |
| `pcie_controller_reset_n` | pcie_axi_ares (or combined EP resets) |
| `cold_reset_n` | From system (e.g. CustomResetController), not EP |
| `warm_reset_n` | From system, not EP |
| CII — EP to tile (tile sc_in) | |
| `pcie_cii_hv` | lbc_cii_hv |
| `pcie_cii_hdr_type` | lbc_cii_hdr_type |
| `pcie_cii_hdr_addr` | lbc_cii_hdr_addr |
| FLR / hot reset — EP to tile (tile sc_in) | |
| `pcie_flr_request` | cfg_flr_pf_active_x |
| `pcie_hot_reset` | link_req_rst_not / smlh_req_rst_not |
| Error / DMA / misc — EP to tile (tile sc_in) | |
| `pcie_ras_error` | pcie_parc_int / app_err_* |
| `pcie_dma_completion` | dma_wdxfer_done_togg[] / dma_rdxfer_done_togg[] |
| `pcie_misc_int` | edma_int (or equivalent misc interrupt) |
| System to tile (tile sc_in) | |
| `isolate_req` | From system (isolation), not EP |
| Tile to EP (tile sc_out) | |
| `pcie_app_bus_num` | app_bus_num |
| `pcie_app_dev_num` | app_dev_num |
| `pcie_device_type` | device_type |
| `pcie_sys_int` | sys_int |
| `function_level_reset` | app_flr_pf_done_x |
| `hot_reset_requested` | To EP hot-reset input (e.g. app_init_rst or link side as applicable) |
| `config_update` | To EP config-update input if present; otherwise stub |
| `ras_error` | To EP RAS/error input (e.g. app_err_* side) |
| `dma_completion` | To EP DMA completion input (e.g. dma_*xfer_go_togg or equivalent) |
| `controller_misc_int` | To EP misc interrupt input |
| `noc_timeout` | To EP or system (NOC timeout status) |
Note: EP AXI_Slave is connected to the tile’s pcie_controller_initiator (outbound path). EP AXI_DBI, PCIMem, ELBIMaster are not connected to the tile: AXI_DBI can go to SharedMemoryMap; PCIMem/PCIMem_Slave connect directly to the RC for the PCIe link. Tile ports noc_n_target, noc_n_initiator, smn_n_target, smn_n_initiator connect to the chiplet SharedMemoryMap, not to the EP.
Table 2 — Host / DRAM ↔ DesignWare PCIe RC: signal and interface connection
Each row shows which Host- or system-side element connects to which DesignWare PCIe RC port. The RC has no PCIe Tile behind it; it connects to host resources and to the PCIe link (directly to EP).
| Host / DRAM (or system element) | DesignWare PCIe RC (signal / interface) |
|---|---|
| Host config / MMIO (config space, DBI) | |
| SharedMemoryMap (config space region) | AXI_Slave |
| SharedMemoryMap (DBI region) | AXI_DBI |
| Host memory (RC as master — downstream TLPs) | |
| SharedMemoryMap (memory region for host-initiated TLPs) | BusMaster |
| PCIe link (TLPs to/from EP) | |
| EP | PCIMem (RC master → EP PCIMem_Slave) |
| EP | PCIMem_Slave (← EP PCIMem) |
| Clocks | |
| SYSCLK (e.g. Primary_Chiplet SYSCLK) | cc_core_clk |
| SYSCLK | cc_dbi_aclk |
| Resets | |
| Peripherals Reset / RST_GEN | pcie_axi_ares, cc_*_ares |
| Interrupts (host side) | |
| TT_APLIC_TLM2 (e.g. irqS[3] in vdksys) | msi_ctrl_int |
| Not connected (stub) | |
| — | ELBIMaster |
| — | Internal/PHY clocks (cc_pipe_clk, cc_aux_clk, refclk, etc.) |
| — | All optional sideband (sys_int, device_type, link_up, app_ltssm_en, power/L1/L2, etc.) — stub |
Note: In the vdksys, Host/DRAM is represented by SharedMemoryMap and SYSCLK on Host_Chiplet. Host CPU traffic is modeled via the RC’s AXI_Slave (config), AXI_DBI (DBI), and BusMaster (memory TLPs) bound to SharedMemoryMap. The PCIMem / PCIMem_Slave connect the RC directly to the EP for the PCIe link.
A. PCIe Endpoint (EP) — interfaces and disposition
| Category | DesignWare EP interface (vdksys) | Connect to PCIe Tile | Connect to system / other | Stub | Notes |
|---|---|---|---|---|---|
| TLM | AXI_Slave | `pcie_controller_initiator` | — | — | Outbound TLPs: tile → EP (tile initiates to EP AXI_Slave). |
| TLM | AXI_DBI | — | SharedMemoryMap (DBI region) | — | DBI/config. |
| TLM | BusMaster | `pcie_controller_target` | — | — | Inbound TLPs: EP delivers to tile (EP BusMaster → tile target). |
| TLM | ELBIMaster | — | — | Yes (vdksys: auto stub) | Optional ELBI; not used for tile. |
| TLM | PCIMem | — | Link (direct to RC) | — | EP as master toward link. |
| TLM | PCIMem_Slave | — | Link (direct to RC) | — | EP receives from link; not connected to tile. |
| RESET | pcie_axi_ares, cc_dbi_ares, cc_core_ares, cc_pwr_ares, cc_phy_ares | `pcie_controller_reset_n` (or combine) | Peripherals Reset / RST_GEN | — | Drive tile reset from same source as EP. |
| CLOCK | cc_core_clk | `pcie_core_clk` (tile sc_in) | — | — | EP core clock to tile. |
| CLOCK | cc_dbi_aclk | — | SYSCLK (bound in vdksys) | — | DBI clock; also usable as axi_clk for tile. |
| CLOCK | cc_pipe_clk, cc_aclkSlv, cc_aclkMstr, cc_aux_clk, refclk | — | — | Yes (vdksys: cc_pipe_clk stubbed) | Internal/PHY clocks; stub if not driving tile. |
| Sideband (CII) | lbc_cii_hv, lbc_cii_dv, lbc_cii_hdr_type, lbc_cii_hdr_addr, lbc_cii_hdr_* | `pcie_cii_hv`, `pcie_cii_hdr_type`, `pcie_cii_hdr_addr` (tile sc_in) | — | Rest of CII if tile does not use | CII = Configuration Interface Info; map key signals to tile. |
| Sideband (FLR) | cfg_flr_pf_active_x, app_flr_pf_done_x | `pcie_flr_request` (in), `function_level_reset` (out) | — | — | FLR handshake between EP and tile. |
| Sideband (hot reset, etc.) | link_req_rst_not, training_rst_n, smlh_req_rst_not | `pcie_hot_reset` (tile sc_in), `hot_reset_requested` (tile sc_out) | — | — | As needed for tile behavior. |
| Sideband (bus/dev) | app_bus_num, app_dev_num | `pcie_app_bus_num`, `pcie_app_dev_num` (tile sc_out) | — | — | Tile drives EP with assigned BDF. |
| Sideband (device type) | device_type (slave on EP) | `pcie_device_type` (tile sc_out) | — | Yes if not using | vdksys stubs; connect from tile when integrated. |
| Sideband (interrupt) | sys_int (slave on EP) | `pcie_sys_int` (tile sc_out) | — | Yes in vdksys | Connect tile pcie_sys_int to EP sys_int when integrated. |
| Sideband (DMA) | dma_wdxfer_done_togg[], dma_rdxfer_done_togg[], edma_int_rd_chan[], edma_int_wr_chan[], edma_int | `pcie_dma_completion`, `controller_misc_int` (tile sc_in/sc_out) | — | Optional | Map DMA completion / misc int to tile as needed. |
| Sideband (RAS/error) | pcie_parc_int, app_err_*, cfg_aer_* | `pcie_ras_error` (tile sc_in), `ras_error` (tile sc_out) | — | Optional | Connect if tile implements RAS/error reporting. |
| MSI | msi_ctrl_int, msi_ctrl_int_vec_[], msi_gen, ven_msi_*, msix_addr, msix_data, cfg_msix_* | — | APLIC / interrupt controller (e.g. TT_APLIC_TLM2) | Stub msi_gen in vdksys | RC binds msi_ctrl_int to APLIC; EP same for device MSI. |
| Power / L1/L2 | apps_pm_xmt_turnoff, app_req_entr_l1, app_req_exit_l1, pme_en, pme_stat, clk_req, clk_req_in, pm_linkst_*, pm_dstate, radm_pm_* | — | — | Yes (vdksys stubs many) | Power management; stub for minimal tile integration. |
| Other sideband | ready_entr_l23, app_ltssm_en, link_up, sys_pre_det_state, app_unlock_msg, app_ltr_*, app_init_rst, bridge_flush_not, hp_int, hp_msi, RADM_inta/b/c/d, cfg_pme_*, radm_pm_to_ack, slv_misc_info, mstr_misc_info, app_hdr_log, app_tlp_prfx_log, ven_msi_tc, ven_msi_vector, cfg_msi_*, CxlRegAccess, ptm_*, etc. | — | — | Yes | Optional or debug; stub to simplify integration. |
B. PCIe Root Complex (RC) — interfaces and disposition
The RC has the same DesignWare PCIe_2_0 interface set. There is no PCIe Tile behind the RC (the tile is behind the EP on the Keraunos chiplet). So RC interfaces either connect to the link (directly to EP), to the system (SharedMemoryMap, SYSCLK, APLIC), or are stubbed.
| Category | DesignWare RC interface (vdksys) | Connect to link (EP / Switch) | Connect to system | Stub | Notes |
|---|---|---|---|---|---|
| TLM | AXI_Slave, AXI_DBI | — | SharedMemoryMap (config, DBI) | — | Host config space; bound in vdksys. |
| TLM | BusMaster | — | SharedMemoryMap (memory) | — | Host-initiated TLPs; bound in vdksys. |
| TLM | ELBIMaster | — | — | Yes (vdksys: auto stub) | Optional. |
| TLM | PCIMem | EP PCIMem_Slave (direct) | — | — | Downstream TLPs. |
| TLM | PCIMem_Slave | EP PCIMem (direct) | — | — | Upstream TLPs from EP. |
| RESET | pcie_axi_ares, cc_*_ares | — | Peripherals Reset / RST_GEN | — | Same as EP. |
| CLOCK | cc_core_clk, cc_dbi_aclk | — | SYSCLK | — | Bound in vdksys. |
| CLOCK | cc_pipe_clk, cc_aux_clk, refclk, cc_aclkSlv, cc_aclkMstr | — | — | Yes | Stub if not used. |
| MSI | msi_ctrl_int | — | TT_APLIC_TLM2 (irqS[3] in vdksys) | — | Host RC MSI to APLIC. |
| Sideband | sys_int, device_type, ready_entr_l23, app_ltssm_en, link_up, sys_pre_det_state, apps_pm_xmt_turnoff, clk_req_in, app_req_entr_l1, app_req_exit_l1, app_flr_pf_done_x, app_ltr_*, msi_gen, and all other Default/sideband | — | — | Yes (vdksys stubs many) | No tile; stub optional RC sideband. |
C. Summary
EP:
- Connect to PCIe Tile: EP BusMaster → tile `pcie_controller_target` (inbound); tile `pcie_controller_initiator` → EP AXI_Slave (outbound). Resets and cc_core_clk (and optionally cc_dbi_aclk) to tile sc_in; CII, FLR, hot reset, app_bus_num, app_dev_num, device_type, sys_int, dma_completion, ras_error to/from tile sc_in/sc_out as in Section 3.4.
- Connect to system: AXI_DBI to SharedMemoryMap; cc_dbi_aclk to SYSCLK; msi_ctrl_int to APLIC.
- Stub: ELBIMaster; optional/PHY clocks (cc_pipe_clk, etc.); power/L1/L2 and other optional sideband.

RC:
- Connect to link: PCIMem and PCIMem_Slave directly to EP.
- Connect to system: AXI_Slave, AXI_DBI, BusMaster to SharedMemoryMap; clocks to SYSCLK; msi_ctrl_int to APLIC.
- Stub: all optional sideband and PHY clocks as in vdksys.
4. Connectivity Architecture
4.1 Inbound Data Path (Host → Chip)
Use Case: Host CPU writes data to Quasar compute cores or Mimir memory. The host communicates with the PCIe Tile via a PCIe Controller. On the host side the controller is a Root Complex (RC); on the Keraunos chip side the controller is the Synopsys PCIe Controller IP (DesignWare), configured as an Endpoint (EP). So the link is Host (RC) ↔ Keraunos (EP); the PCIe Tile sits behind the endpoint and receives TLPs from it over the internal interface.
Key Steps:
1. Host initiates PCIe Memory Write targeting Keraunos BAR (Base Address Register); the request is sent via the PCIe Controller over the PCIe link.
2. PCIe Controller delivers the Memory Write TLP to the PCIe Tile; the tile receives the transaction via its `pcie_inbound` socket.
3. Inbound TLB translates host address to system address space.
4. PCIe-SMN-IO Switch routes based on address:
   - `0x0000_0000_0000 - 0x0000_FFFF_FFFF`: NOC-bound (via `noc_n_initiator`)
   - `0x1000_0000_0000 - 0x1FFF_FFFF_FFFF`: SMN-bound (via `smn_n_initiator`)
5. Transaction forwarded to NOC-N or SMN.
6. NOC-N routes via D2D links to destination Quasar/Mimir chiplet.
7. Response traverses back through the same path; PCIe Tile sends Completion TLP to the PCIe Controller, which delivers it to the Host.
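The routing rule in step 4 can be sketched as a plain C++ decode. The ranges come from the list above; the function and enum names are illustrative, not part of the model:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative decode for the PCIe-SMN-IO Switch. Addresses up to 4 GB are
// NOC-bound; the 0x1000_0000_0000 window is SMN-bound; anything else misses.
enum class InboundRoute { NocN, Smn, Unmapped };

InboundRoute route_inbound(uint64_t sys_addr) {
    if (sys_addr <= 0x0000FFFFFFFFULL)       // 0x0000_0000_0000 - 0x0000_FFFF_FFFF
        return InboundRoute::NocN;           // forwarded via noc_n_initiator
    if (sys_addr >= 0x100000000000ULL &&
        sys_addr <= 0x1FFFFFFFFFFFULL)       // 0x1000_0000_0000 - 0x1FFF_FFFF_FFFF
        return InboundRoute::Smn;            // forwarded via smn_n_initiator
    return InboundRoute::Unmapped;           // no window matches
}
```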
Sideband signal flow (inbound use case):
During inbound (Host → Chip), the EP and system drive sideband inputs to the PCIe Tile so the tile can accept and process TLPs; the tile drives sideband outputs back to the EP. The flow is:
| Direction | Signals | Role in inbound use case |
|---|---|---|
| EP → Tile (sc_in) | `pcie_core_clk`, `pcie_controller_reset_n`, `pcie_cii_hv`, `pcie_cii_hdr_type`, `pcie_cii_hdr_addr`, `pcie_flr_request`, `pcie_hot_reset`, `pcie_ras_error`, `pcie_dma_completion`, `pcie_misc_int` | Clock and reset so tile is ready; CII for config info; EP may assert FLR/hot_reset/errors during or after inbound TLPs. |
| System → Tile (sc_in) | `cold_reset_n`, `warm_reset_n`, `isolate_req`, `axi_clk` | SoC reset and isolation; AXI clock for config/MMIO. |
| Tile → EP (sc_out) | `function_level_reset`, `hot_reset_requested`, `config_update`, `ras_error`, `dma_completion`, `controller_misc_int`, `noc_timeout`, `pcie_app_bus_num`, `pcie_app_dev_num`, `pcie_device_type`, `pcie_sys_int` | FLR/hot_reset handshake; config and error reporting; NOC timeout; SII bus/dev and interrupt to EP. |
4.2 Outbound Data Path (Chip → Host)
Use Case: Quasar compute cores send results back to host DRAM or trigger MSI interrupts.
Key Steps:
1. Quasar initiates write targeting PCIe address range (typically host DRAM).
2. NOC-N routes to Keraunos PCIe Tile via `noc_n_outbound` socket.
3. Outbound TLB translates system address back to host physical address.
4. PCIe Controller (`pcie_controller_initiator`) generates PCIe Memory Write TLP.
5. Transaction sent over PCIe link to host.
6. Host DRAM responds with completion.
7. Response propagates back through PCIe Tile → NOC → Quasar.
Sideband signal flow (outbound use case):
During outbound (Chip → Host), the same sideband links carry status and handshake: the EP may drive reset/FLR; the tile uses sideband outputs to signal completion and errors to the EP so the EP can complete TLPs toward the host.
| Direction | Signals | Role in outbound use case |
|---|---|---|
| EP → Tile (sc_in) | Same as inbound | Clock and reset; CII; EP can assert FLR/hot_reset or error sideband during outbound. |
| Tile → EP (sc_out) | `dma_completion`, `ras_error`, `controller_misc_int`, `function_level_reset`, `hot_reset_requested`, `noc_timeout` | Tell EP when tile has completed work (e.g. outbound DMA) or hit errors; FLR/hot_reset handshake; NOC timeout so EP can report or retry. |
4.3 Configuration Path (SMN → PCIe Tile Registers)
Use Case: SMC programs PCIe Tile TLBs, enables MSI relay, or reads error status.
Addressable Registers (via SMN):
0x1804_0000 - 0x1804_07FF: Inbound TLB configurations (8 entries)
0x1804_0800 - 0x1804_0FFF: Outbound TLB configurations (8 entries)
0x1800_0000 - 0x1800_0FFF: MSI Relay registers
0x1802_0000 - 0x1802_0FFF: PCIe error status and control
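The register-window decode above can be sketched as a small helper. The function name and return strings are hypothetical; the address ranges are the documented ones:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative decode of the SMN-visible PCIe Tile register windows.
const char* pcie_tile_reg_block(uint32_t addr) {
    if (addr >= 0x18040000u && addr <= 0x180407FFu) return "inbound_tlb";  // 8 entries
    if (addr >= 0x18040800u && addr <= 0x18040FFFu) return "outbound_tlb"; // 8 entries
    if (addr >= 0x18000000u && addr <= 0x18000FFFu) return "msi_relay";
    if (addr >= 0x18020000u && addr <= 0x18020FFFu) return "pcie_err";
    return "unmapped";
}
```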
4.4 MSI Interrupt Path (Chip → Host)
Use Case: Ethernet controller or Quasar triggers interrupt to host driver.
5. Data Flow Paths
5.1 End-to-End Data Flow Example: Host DMA to Quasar
Scenario: Host writes 4KB of neural network weights to Quasar L1 memory.
Address Translation:
- Host Address: `0x8000_0000` (PCIe BAR + offset)
- Inbound TLB Lookup: Maps application region 0 → NOC address
- System Address: `0x0000_0000_4000_0000` (Quasar chiplet, NOC coordinates, L1 offset)
- Physical Routing: QNP mesh routes to D2D tile 2 → Quasar chiplet ID 1 → Tensix core (4,5)
5.2 Multi-Hop Data Flow: Quasar → PCIe → Host → PCIe → Quasar
Scenario: Quasar chiplet 0 sends data to Quasar chiplet 1 in a different Grendel package via host DRAM (zero-copy).
6. Address Space Integration
6.1 System Address Map
The Keraunos-E100 local address map is a subset of the broader Grendel system address map:
| Address Range | Target | Description |
|---|---|---|
| `0x0000_0000_0000 - 0x0000_FFFF_FFFF` | NOC-N | Quasar/Mimir chiplets via D2D |
| — | SMN (SEP) | Security Engine Processor |
| — | SMN (SMC) | System Management Controller |
| — | SMN (MSI) | MSI Relay in PCIe Tile |
| — | SMN (PCIe Err) | PCIe Tile error registers |
| — | SMN (TLB) | PCIe Tile TLB configurations |
| — | HSIO | HSIO tile 0 (CCE, Ethernet, SRAM) |
| — | HSIO | HSIO tile 1 (CCE, Ethernet, SRAM) |
6.2 PCIe BAR (Base Address Register) Mapping
The PCIe Tile exposes multiple BARs to the host:
| BAR | Size | Type | Purpose |
|---|---|---|---|
| BAR0 | 256MB | Memory, 64-bit | Main data path (DMA to/from Quasar) |
| BAR2 | 16MB | Memory, 64-bit | Configuration space (SMC mailboxes, TLB programming) |
| BAR4 | 64KB | Memory, 64-bit | MSI-X table |
BAR0 Inbound TLB Mapping Example:
- Host writes to `BAR0 + 0x1000_0000` (256MB offset)
- Inbound TLB Entry 1 (Application region 1):
  - Input Range: `0x1000_0000 - 0x1FFF_FFFF` (256MB)
  - Output Base: `0x0000_0000_4000_0000` (NOC address for Quasar chiplet 1)
- Translated Address: `0x0000_0000_4000_0000` (sent to NOC-N)
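The translation arithmetic in this example can be checked with a small sketch. The struct layout is illustrative; the tile's real TLB entry format is not shown here:

```cpp
#include <cassert>
#include <cstdint>

// Inbound TLB arithmetic for the BAR0 example above.
struct InboundTlbEntry {
    uint64_t in_base;   // start of the input range within BAR0
    uint64_t in_limit;  // inclusive end of the input range
    uint64_t out_base;  // translated NOC base address
};

// Returns the translated NOC address for a BAR0 offset, or 0 on a miss.
uint64_t translate_inbound(const InboundTlbEntry& e, uint64_t bar_offset) {
    if (bar_offset < e.in_base || bar_offset > e.in_limit) return 0;
    return e.out_base + (bar_offset - e.in_base);
}
```

With Entry 1 as `{0x1000_0000, 0x1FFF_FFFF, 0x4000_0000}`, `translate_inbound` maps `BAR0 + 0x1000_0000` to `0x0000_0000_4000_0000` as in the example above.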
6.3 Address Translation Stages
7. System Use Cases
7.1 Use Case 1: Model Initialization
Objective: Load a 10GB large language model from host to distributed Quasar memory.
Flow:
Host driver programs PCIe Tile Inbound TLBs (8 entries for 8 memory regions)
Host DMA engine streams model weights via PCIe Memory Writes
PCIe Tile translates addresses and routes to NOC-N
NOC-N distributes data across multiple Quasar chiplets via D2D links
Quasar chiplets store weights in local L1/DRAM
Performance:
PCIe Gen5 x16: ~64 GB/s theoretical, ~50 GB/s effective
Load time: 10GB / 50 GB/s = 200ms
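The load-time figure follows directly from the effective throughput; a back-of-envelope helper (illustrative, not part of the model):

```cpp
#include <cassert>
#include <cmath>

// Transfer time in milliseconds for a payload (GB) at a given effective
// throughput (GB/s). 10 GB at ~50 GB/s -> 200 ms, as stated above.
double transfer_time_ms(double gigabytes, double gb_per_s) {
    return gigabytes / gb_per_s * 1000.0;
}
```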
7.2 Use Case 2: Inference Execution
Objective: Run inference on Quasar chiplets, stream results back to host.
Flow:
Host sends inference request descriptor via PCIe write (small payload: 256 bytes)
Quasar chiplets execute inference using cached model weights
Quasar writes results to host DRAM via outbound TLB (PCIe Memory Write)
Quasar triggers MSI-X interrupt via SMN → MSI Relay → PCIe
Host driver processes results
Latency:
Request descriptor: ~1μs (PCIe TLP overhead)
Inference execution: Variable (model-dependent)
Result transfer (1MB): 1MB / 50 GB/s = 20μs
MSI interrupt latency: ~2μs
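The per-stage figures above sum to a round-trip latency budget. A minimal sketch (the stage constants are the document's estimates; the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Round-trip latency (us) for one inference with a 1 MB result transfer.
// Inference time is model-dependent and passed in by the caller.
double inference_roundtrip_us(double inference_us) {
    const double request_us  = 1.0;  // request descriptor (PCIe TLP overhead)
    const double transfer_us = 20.0; // 1 MB result at ~50 GB/s
    const double msi_us      = 2.0;  // MSI interrupt latency
    return request_us + inference_us + transfer_us + msi_us;
}
```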
7.3 Use Case 3: Package-to-Package Communication
Objective: Enable Quasar chiplets in Package 0 to communicate with Package 1 over Ethernet.
Flow (Keraunos Ethernet-based):
Quasar in Package 0 writes data to HSIO SRAM via NOC-N
CCE in HSIO tile prepares Ethernet packet
TT Ethernet Controller sends packet via 800G Ethernet to Package 1
Package 1 Ethernet Controller receives packet, writes to local HSIO SRAM
Local NOC-N forwards data to destination Quasar
Alternative Flow (PCIe-based, for same-host deployments):
Quasar in Package 0 writes to host DRAM via PCIe Tile (outbound)
Package 1 PCIe Tile reads from host DRAM (inbound)
Forwarded to Package 1 Quasar via NOC-N
7.4 Use Case 4: System Management
Objective: SMC monitors PCIe link status and reconfigures TLBs dynamically.
Flow:
SMC reads PCIe link status registers via SMN (0x1802_0xxx)
Detects link degradation (Gen5 x16 → Gen5 x8)
SMC reprograms TLB entries to reduce traffic load
SMC triggers software notification via MSI-X
Host driver adjusts DMA batch sizes
8. Final VDK Platform: Linux-Booting PCIe Tile Integration
8.1 Overview
The final validated VDK platform demonstrates a complete end-to-end PCIe data path with Linux running on the host. The platform:
- Boots RISC-V Linux on the Host_Chiplet via OpenSBI (`fw_payload.elf`)
- Enumerates the PCIe Endpoint using the Linux `snps,dw-pcie` driver
- Transfers data from the host through the PCIe complex to memory attached to the PCIe Tile's `noc_n_initiator` port
- Runs a userspace application (`pcie_xfer`) for interactive read/write operations through the PCIe BAR
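A `pcie_xfer`-style BAR access can be sketched as below. In the real flow the pointer would come from `mmap()` on the EP's sysfs `resource0` file; here an ordinary buffer stands in for the mapped BAR so the access pattern is testable. The helper names are illustrative, not the actual `pcie_xfer` implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 32-bit MMIO accessors over a mapped BAR. Each store becomes a PCIe Memory
// Write TLP and each load a Memory Read TLP when the pointer targets a real
// mapped BAR; volatile prevents the compiler from eliding the accesses.
static void bar_write32(volatile uint32_t* bar, std::size_t off_bytes, uint32_t v) {
    bar[off_bytes / sizeof(uint32_t)] = v;
}
static uint32_t bar_read32(volatile uint32_t* bar, std::size_t off_bytes) {
    return bar[off_bytes / sizeof(uint32_t)];
}
```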
This section documents the final validated architecture as implemented in the reference workspace.
8.2 Dual-Chiplet VDK Topology
The platform consists of two chiplet groups connected via a direct PCIe link:
Key architectural decisions in the final platform:
- **Direct RC–EP link** — the RC's `PCIMem` binds to the EP's `PCIMem_Slave`, and vice versa
- **Two independent RISC-V CPUs** — the host runs Linux; the device runs bare-metal firmware
- **Target_Memory on the noc_n_initiator path** — 16 MB of memory at address `0x0` on the chiplet bus, reachable from the host through EP → PCIE_TILE → noc_n_initiator → SharedMemoryMap → Target_Memory
- **MSI interrupt** — the RC's `msi_ctrl_int` is connected to the Host SMC's `irqS[11]` for PCIe MSI-to-host notification
8.3 Host Memory Map
The Host_Chiplet CPU sees the following address space:
| Address | Size | Component | Purpose |
|---|---|---|---|
| — | 64 KB | CLINT | Timer and software interrupts |
| `0x44000000` | 4 MB | PCIE_RC DBI | PCIe RC configuration (DBI registers) |
| — | 128 KB | PCIE_RC ATU | iATU outbound/inbound windows (via DBI CS2) |
| `0x70000000` | 256 MB | PCIE_RC AXI_Slave | PCIe config + memory window |
| `0x70000000` | 16 MB | — Config sub-window | Type 0/1 config TLPs via iATU |
| `0x71000000` | 240 MB | — MEM sub-window | Memory TLPs to EP BARs |
| `0x80000000` | 256 MB | DRAM | Host main memory (Linux runs here) |
| — | 256 B | UART | DW APB UART (115.2 MHz clock) |
| — | 2 MB | PLIC | Platform-Level Interrupt Controller |
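The two iATU sub-windows are meant to exactly tile the 256 MB AXI_Slave window. A quick arithmetic check, using the window bases stated in this document (`0x70000000` AXI_Slave/config, `0x71000000` MEM):

```python
# Check that the config and MEM sub-windows exactly tile the 256 MB
# RC AXI_Slave window at 0x70000000 (config first, MEM directly after).
AXI_SLAVE_BASE = 0x7000_0000
AXI_SLAVE_SIZE = 256 * 2**20                      # 256 MB
CFG_BASE, CFG_SIZE = 0x7000_0000, 16 * 2**20      # Type 0/1 config TLPs
MEM_BASE, MEM_SIZE = 0x7100_0000, 240 * 2**20     # Memory TLPs to EP BARs

assert CFG_BASE == AXI_SLAVE_BASE
assert CFG_BASE + CFG_SIZE == MEM_BASE            # sub-windows are contiguous
assert CFG_SIZE + MEM_SIZE == AXI_SLAVE_SIZE      # together they fill 256 MB
print(hex(MEM_BASE), hex(MEM_BASE + MEM_SIZE))    # 0x71000000 0x80000000
```

Note that the MEM sub-window ends exactly at `0x80000000`, the start of host DRAM, which matches the 240 MB (`0x0F000000`) `ranges` entry in the device tree node of Section 8.7.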
8.4 Device (Keraunos_PCIE_Chiplet) Memory Map
The Keraunos_PCIE_Chiplet SharedMemoryMap provides the following decode for all initiators (EP BusMaster, PCIE_TILE noc_n/smn_n, SMC_Configure CPU):
| Address | Size | Component | Purpose |
|---|---|---|---|
| `0x0` | 16 MB | Target_Memory | Main data memory (host-accessible via PCIe BAR) |
| `0x18000000` | 8 MB | PCIE_TILE smn_n_target | SMN-side target window into the tile |
| `0x44000000` | 4 MB | PCIe_EP AXI_DBI | EP DBI configuration registers |
| `0x44400000` | 16 MB | PCIE_TILE noc_n_target | NoC-side target window into the tile |
8.5 End-to-End Data Path
The critical data path for host-to-device memory transfers traverses:
Step-by-step flow:
1. The host application (`pcie_xfer`) performs an MMIO write to BAR0 (mapped via sysfs `resource0` or `/dev/mem`)
2. The CPU issues a store into the PCIe MEM window (e.g. `0x71000000 + offset`)
3. The RC AXI_Slave receives the transaction in its window at `0x70000000`
4. The iATU translates the address into a Memory Write TLP targeting the EP
5. The RC `PCIMem` sends the TLP over the virtual PCIe link
6. The EP `PCIMem_Slave` receives the TLP
7. The EP BusMaster forwards the decoded transaction to `PCIE_TILE.pcie_controller_target`
8. PCIE_TILE routes the transaction through its internal fabric to `noc_n_initiator`
9. `noc_n_initiator` accesses the chiplet SharedMemoryMap, which decodes to `Target_Memory` at address `0x0`
10. Target_Memory completes the write; the response propagates back along the same path
8.6 Linux Boot Flow
Boot details:
- OpenSBI (`fw_payload.elf`) is loaded at its ELF segment addresses (entry at `0x80000000`); the VP must not override the load address to `0x0`.
- No U-Boot — OpenSBI directly boots the embedded Linux kernel.
- The device tree (`keraunos_host.dts`) specifies the `snps,dw-pcie` compatible node with DBI at `0x44000000` and the config window at `0x70000000`.
- Boot arguments: `console=hvc0 earlycon=sbi rdinit=/init pci=realloc pci=assign-busses pci=noaer pcie_aspm=off`. The `pci=realloc` and `pci=assign-busses` flags are critical on the VP, where no BIOS/firmware has pre-assigned PCI resources.
- Host CPU ISA: `rv64imac` (no FPU) — the kernel is built with `CONFIG_FPU=n`, and userspace uses a musl `rv64imac` toolchain.
8.7 PCIe Enumeration
The Linux dw-pcie driver enumerates the endpoint:
| Property | Value |
|---|---|
| Bus topology | Bus 0: RC; Bus 1: EP (direct link) |
| Config access | iATU-programmed window at `0x70000000` |
| MEM window | `0x71000000` (240 MB) |
| EP BAR0 | 16 MB memory BAR |
| MSI | RC integrated MSI controller (`msi_ctrl_int` → Host SMC `irqS[11]`) |
| Lanes | 4 (VP config; physical Keraunos uses x16) |
The device tree PCIe node:
```dts
pcie@44000000 {
    compatible = "snps,dw-pcie";
    reg = <0x0 0x44000000 0x0 0x400000>,   /* DBI */
          <0x0 0x70000000 0x0 0x01000000>; /* config */
    bus-range = <0x0 0x1>;
    ranges = <0x02000000 0x0 0x71000000
              0x0 0x71000000 0x0 0x0F000000>; /* 240 MB MEM */
    interrupts = <32>, <33>; /* MSI, INTx */
};
```
8.8 The pcie_xfer Application
pcie_xfer is a Linux userspace utility that demonstrates the complete host-to-tile data path. It maps EP BAR0 via sysfs and performs MMIO read/write operations.
Capabilities:
| Command | Description |
|---|---|
| `write` | 32-bit MMIO write at BAR0 + offset |
| `read` | 32-bit MMIO read at BAR0 + offset |
| — | Fill |
| — | Hex dump |
| `pattern` | Write incrementing pattern and verify readback |
| — | Timed burst write for throughput measurement |
| — | Verify |
| — | Write binary file contents to BAR0 |
Data path confirmed by pcie_xfer:
```
Host CPU → RC AXI_Slave → iATU → PCIe TLP
        → EP PCIMem_Slave → EP BusMaster
        → PCIE_TILE.pcie_controller_target
        → noc_n_initiator → [Target_Memory]
```
Usage example:
```sh
# Auto-detect EP and enter interactive mode
pcie_xfer

# Write 0xDEADBEEF at BAR0 offset 0x100
pcie_xfer -c "write 0x100 0xDEADBEEF"

# Read back and verify
pcie_xfer -c "read 0x100"

# Write incrementing pattern (256 dwords) and verify
pcie_xfer -c "pattern 0x0 0x100"
```
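The `pattern` command's write-then-verify logic can be sketched against a plain buffer standing in for the mapped BAR (the real tool performs MMIO through the sysfs `resource0` mapping; here `bar` is just a `bytearray`, and the function is an illustrative model, not the tool's source):

```python
import struct

def pattern_test(bar: bytearray, offset: int, count: int) -> bool:
    """Write an incrementing 32-bit pattern, then read it back and verify,
    mimicking what pcie_xfer's 'pattern' command does over MMIO."""
    for i in range(count):
        struct.pack_into("<I", bar, offset + 4 * i, i)          # MMIO write
    for i in range(count):
        (val,) = struct.unpack_from("<I", bar, offset + 4 * i)  # MMIO read
        if val != i:
            return False                 # mismatch: data path is broken
    return True

bar0 = bytearray(16 * 2**20)             # stand-in for the 16 MB BAR0 mapping
print(pattern_test(bar0, 0x0, 0x100))    # True when readback matches
```

On the VP, a successful `pattern` run proves every hop of the data path in Section 8.5, since the readback traverses the same RC → EP → PCIE_TILE → Target_Memory chain in reverse.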
8.9 Device-Side Firmware (pcie_bringup)
The Keraunos_PCIE_Chiplet runs pcie_bringup.elf on its TT_Rocket_LT CPU (SMC_Configure). This bare-metal firmware:
- Initializes the PCIe Endpoint controller via DBI registers at `0x44000000`
- Programs inbound TLB entries for BAR address translation
- Sets BAR sizes (`BAR0_MASK = 0xFFFFFF` for 16 MB)
- Asserts `system_ready` to signal that the EP is ready for host enumeration
- Waits for PCIe link-up before the host attempts configuration reads
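The BAR sizing step follows the mask convention implied by the value above: the mask register holds size − 1, so `BAR0_MASK = 0xFFFFFF` yields a 16 MB BAR. A quick check (helper names are illustrative):

```python
# BAR sizing: the mask register holds (size - 1), so
# BAR0_MASK = 0xFFFFFF corresponds to a 16 MB BAR.

def bar_size_from_mask(mask: int) -> int:
    """Decode a BAR mask register value into the BAR size in bytes."""
    return mask + 1

def bar_mask_for_size(size: int) -> int:
    """Encode a power-of-two BAR size as a mask register value."""
    assert size & (size - 1) == 0, "BAR sizes must be powers of two"
    return size - 1

assert bar_size_from_mask(0xFF_FFFF) == 16 * 2**20
print(hex(bar_mask_for_size(16 * 2**20)))   # 0xffffff
```

The 16 MB size matches both the Target_Memory region and the EP BAR0 entry in the enumeration table of Section 8.7.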
The device CPU and host CPU boot independently; the VP configuration ensures both start with `active_at_start = true` on their respective RST_GEN.
8.10 VP Configuration
Two VP configurations are available:
| Config | Host Image | Device Image | Quantum | Use Case |
|---|---|---|---|---|
| — | — | — | 6000 ps | Full Linux with standard kernel |
| — | — | — | 1000 ps | Fast-boot mini Linux for rapid iteration |
Key VPCFG overrides (both configs):
- PCIE_RC / PCIe_EP clocks: `cc_pipe_clk` at 250 MHz; all AXI/DBI/aux clocks at 100 MHz
- `SHARED_DBI_ENABLED`: `false` (separate DBI window)
- `UART_CLK`: 115.2 MHz
- Chiplet SharedMemoryMap decode: `0x44000000:0x00400000:s;0x18000000:0x00800000;0x44400000:0x01000000;0x0:0x1000000`
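The decode string is compact; a small parser makes the resulting map explicit. The `base:size[:flags]` field format is inferred from the override shown above, and the parser is an illustrative sketch rather than VP tooling:

```python
def parse_decode(spec: str):
    """Parse a 'base:size[:flags]' SharedMemoryMap decode string into
    (base, size, flags) tuples. Format inferred from the VPCFG override."""
    entries = []
    for field in spec.split(";"):
        parts = field.split(":")
        base, size = int(parts[0], 16), int(parts[1], 16)
        flags = parts[2] if len(parts) > 2 else ""
        entries.append((base, size, flags))
    return entries

spec = ("0x44000000:0x00400000:s;0x18000000:0x00800000;"
        "0x44400000:0x01000000;0x0:0x1000000")
for base, size, flags in parse_decode(spec):
    print(f"{base:#010x} +{size // 2**20:3d} MB {flags}")
```

The four entries decode to 4 MB at `0x44000000`, 8 MB at `0x18000000`, 16 MB at `0x44400000`, and 16 MB at `0x0`, matching the device memory map in Section 8.4.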
8.11 Sideband Connections in the Final Platform
The final VDK connects the following sideband signals between the PCIe models and the PCIE_TILE:
EP → PCIE_TILE:
| Signal | Source | Destination |
|---|---|---|
| — | EP DMA interrupts | — |
| — | EP parity/RAS error | — |
| — | EP CII signals | — |
| — | EP FLR request | — |
System → PCIE_TILE:
| Signal | Source | Destination |
|---|---|---|
| Clock | Chiplet CLK_GEN | PCIE_TILE clock input |
| Reset | Chiplet RST_GEN | PCIE_TILE reset input |
Host MSI path: PCIE_RC.msi_ctrl_int → Host_Chiplet.SMC.irqS[11]
9. Appendices
9.1 Acronyms and Abbreviations
| Term | Definition |
|---|---|
| AXI | Advanced eXtensible Interface (ARM AMBA standard) |
| BAR | Base Address Register (PCIe configuration space) |
| BoW | Bridge-of-Wire (die-to-die interconnect technology) |
| CCE | Keraunos Compute Engine (DMA and packet processing) |
| D2D | Die-to-Die (chiplet interconnect interface) |
| DMA | Direct Memory Access |
| HSIO | High-Speed Input/Output (Ethernet subsystem in Keraunos) |
| ISR | Interrupt Service Routine |
| MAC | Media Access Control (Ethernet layer) |
| MSI | Message Signaled Interrupt (PCIe interrupt mechanism) |
| NOC | Network-on-Chip |
| PCS | Physical Coding Sublayer (Ethernet layer) |
| QNP | Quasar NOC Protocol (internal NOC protocol) |
| RISC-V | Fifth-generation Reduced Instruction Set Computer (open ISA) |
| SCML2 | SystemC Modeling Library 2 (Synopsys verification library) |
| SEP | Security Engine Processor |
| SMC | System Management Controller |
| SMN | System Management Network (control-plane NOC) |
| SMU | System Management Unit (clock/power/reset control) |
| SRAM | Static Random-Access Memory |
| TLB | Translation Lookaside Buffer (address translation cache) |
| TLP | Transaction Layer Packet (PCIe protocol) |
| TLM | Transaction-Level Modeling (SystemC abstraction) |
9.2 Reference Documents
Keraunos-E100 Architecture Specification (keraunos-e100-for-review.pdf, v0.9.14)
Keraunos PCIe Tile High-Level Design (Keraunos_PCIe_Tile_HLD.md, v2.0)
Keraunos PCIe Tile SystemC Design Document (Keraunos_PCIE_Tile_SystemC_Design_Document.md, v2.1)
Keraunos PCIe Tile Test Plan (Keraunos_PCIE_Tile_Testplan.md, v2.1)
PCIe Base Specification 5.0 (PCI-SIG, 2019)
AMBA AXI and ACE Protocol Specification (ARM IHI 0022E)
SystemC TLM-2.0 Language Reference Manual (IEEE 1666-2011)
9.3 Revision History
| Version | Date | Author | Description |
|---|---|---|---|
| 1.0 | 2026-02-10 | System Architecture Team | Initial release |
| 2.0 | 2026-03-26 | System Architecture Team | Added Section 8: Final VDK Platform with Linux boot, PCIe enumeration, pcie_xfer application, dual-chiplet topology, and end-to-end data path through noc_n_initiator to Target_Memory |
Document Control:
Classification: Internal Use Only
Distribution: Keraunos Project Team, Grendel Architecture Team
Review Cycle: Quarterly or upon major architecture changes
End of Document