Keraunos System Architecture

PCIe Tile Integration in Keraunos-E100 Chiplet Ecosystem

Version: 2.0
Date: March 26, 2026
Author: System Architecture Team


Executive Summary

This document describes the system-level architecture of the Keraunos-E100 chiplet ecosystem and details how the Keraunos PCIe Tile integrates into the larger Grendel multi-chiplet architecture. The Keraunos PCIe Tile serves as a critical I/O interface, enabling host connectivity and system management while interfacing with the on-chip Network-on-Chip (NOC) infrastructure.


Table of Contents

  1. System Overview

  2. Keraunos-E100 Chiplet Architecture

  3. PCIe Tile Position in the System

  4. Connectivity Architecture

  5. Data Flow Paths

  6. Address Space Integration

  7. System Use Cases

  8. Final VDK Platform: Linux-Booting PCIe Tile Integration

  9. Appendices


1. System Overview

1.1 Grendel Chiplet Ecosystem

The Grendel chiplet ecosystem is a multi-chiplet heterogeneous computing platform designed for high-performance AI/ML workloads. The ecosystem consists of:

  • Quasar Chiplets: Compute chiplets containing AI/ML processing cores

  • Mimir Chiplets: Memory chiplets with GDDR interfaces

  • Athena Chiplets: Specialized compute chiplets

  • Keraunos-E100 Chiplets: I/O interface chiplets for high-speed connectivity

1.2 Keraunos-E100 Role

Keraunos-E100 is the I/O interface chiplet family in the Grendel ecosystem, providing:

  • Glueless scale-out connectivity via 400G/800G Ethernet (Quasar-to-Quasar across packages)

  • Host connectivity via PCIe Gen5 (x16)

  • Die-to-die (D2D) connectivity within the package using BoW (Bunch of Wires) technology

  • System management capabilities via integrated SMC (System Management Controller)

```mermaid
graph TB
  subgraph pkg["Grendel Package"]
    QUASAR[Quasar]
    MIMIR[Mimir]
    KERAUNOS[Keraunos-E100]
    QUASAR --- KERAUNOS
    MIMIR --- QUASAR
  end
  HOST[Host] <-->|PCIe x16| KERAUNOS
  REMOTE[Remote Package] <-->|Ethernet| KERAUNOS
  style KERAUNOS fill:#e1f5ff
  style QUASAR fill:#ffe1e1
  style MIMIR fill:#e1ffe1
  style HOST fill:#fff4e1
  style REMOTE fill:#f0e1ff
```

2. Keraunos-E100 Chiplet Architecture

2.1 High-Level Block Diagram

The Keraunos-E100 chiplet contains the following major subsystems. The diagram is split into two parts for reliable rendering.

Part A — Internal subsystems and NOC:

```mermaid
graph TB
  subgraph harness["Chiplet Harness"]
    SMC[SMC]
    SEP[SEP]
    SMU[SMU]
  end
  subgraph pcie["PCIe Subsystem"]
    PCIE_TILE[PCIe Tile]
    PCIE_SERDES[PCIe SerDes]
  end
  subgraph hsio["HSIO Tiles (2)"]
    CCE0[CCE 0]
    CCE1[CCE 1]
    ETH0[TT Eth Ctrl 0]
    ETH1[TT Eth Ctrl 1]
    MAC0[MAC PCS 0]
    MAC1[MAC PCS 1]
    SRAM[HSIO SRAM]
    FABRIC[HSIO Fabric]
  end
  subgraph noc["NOC Infrastructure"]
    SMN[SMN]
    QNP[QNP Mesh]
    D2D[D2D Tiles]
  end
  PCIE_TILE --- SMN
  PCIE_TILE --- QNP
  PCIE_TILE --- PCIE_SERDES
  CCE0 --- FABRIC
  CCE1 --- FABRIC
  ETH0 --- FABRIC
  ETH1 --- FABRIC
  FABRIC --- SRAM
  FABRIC --- SMN
  FABRIC --- QNP
  ETH0 --- MAC0
  ETH1 --- MAC1
  SMN --- SMC
  SMN --- SEP
  SMN --- D2D
  QNP --- D2D
  style PCIE_TILE fill:#ffcccc,stroke:#c00
  style FABRIC fill:#d9f7d9
  style SMN fill:#e6ccff
  style QNP fill:#e6ccff
```

Part B — External interfaces:

```mermaid
graph LR
  subgraph chip["Keraunos-E100 Chiplet"]
    PCIE[PCIe Tile]
    MAC_A[MAC PCS 0]
    MAC_B[MAC PCS 1]
    D2D_T[D2D Tiles]
  end
  HOST[PCIe Host]
  ETH_A[Ethernet 0]
  ETH_B[Ethernet 1]
  QUASAR[Quasar Mimir]
  HOST <-->|PCIe x16| PCIE
  MAC_A -->|800G| ETH_A
  MAC_B -->|800G| ETH_B
  D2D_T -->|BoW| QUASAR
  style PCIE fill:#ffcccc,stroke:#c00
```

2.2 Key Subsystems

2.2.1 Chiplet Harness

  • SMC (System Management Controller): 4-core RISC-V processor (Rocket core) running at 800 MHz

  • SEP (Security Engine Processor): Handles secure boot, attestation, and access filtering

  • SMU (System Management Unit): Clock generation (CGM PLLs), power management, reset sequencing

2.2.2 PCIe Subsystem

  • PCIe Tile: Contains TLB translation engines, configuration registers, and internal fabric; it interfaces to the PCIe Controller. The tile does not implement the link layer; it receives/sends TLPs via the controller.

  • PCIe Controller: On the Keraunos chip this is the Synopsys PCIe Controller IP (DesignWare), configured as an Endpoint (EP). The host system uses a Root Complex (RC). The link is therefore RC (host) ↔ EP (Keraunos).

  • PCIe SerDes: Physical layer (PHY) for PCIe Gen5/Gen6 connectivity

2.2.3 HSIO (High-Speed I/O) Tiles

  • CCE (Keraunos Compute Engine): DMA engines, DMRISC cores, data forwarding logic

  • TT Ethernet Controller: TX/RX queue controllers, packet processing, flow control

  • MAC/PCS: 800G Ethernet MAC and Physical Coding Sublayer (OmegaCore IP from AlphaWave)

  • SRAM: 8MB high-speed SRAM for packet buffering and data staging

  • HSIO Fabric: AXI-based crossbar interconnect

2.2.4 NOC Infrastructure

  • SMN (System Management Network): Carries control, configuration, and low-bandwidth traffic

  • QNP Mesh (NOC-N): High-bandwidth data fabric for payload transfer (1.5 GHz @ TT corner)

  • D2D (Die-to-Die): 5 BoW interfaces @ 2 GHz for chiplet-to-chiplet connectivity


3. PCIe Tile Position in the System

3.1 PCIe Tile Overview

The Keraunos PCIe Tile developed in this project is a SystemC/TLM-2.0 model representing the PCIe subsystem of the Keraunos-E100 chiplet. It provides:

  1. Host Interface: PCIe Gen5 x16 connectivity to the host CPU

  2. Internal Routing: Bidirectional routing between PCIe, NOC-N (QNP), and SMN

  3. Address Translation: TLB-based address mapping between PCIe address space and system address space

  4. Configuration Interface: SMN-accessible configuration registers for TLBs, MSI relay, and error handling
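The TLB-based address translation in item 3 can be sketched as a windowed lookup. This is an illustrative model only; the entry fields and function names below are hypothetical and do not reflect the tile's actual register layout:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical windowed inbound-TLB entry: a contiguous PCIe address window
// is mapped onto a base address in the system address space.
struct TlbEntry {
    uint64_t pcie_base;  // start of the window in PCIe (host) address space
    uint64_t size;       // window size in bytes
    uint64_t sys_base;   // corresponding base in system address space
};

// Translate a PCIe address through the TLB; empty result means a TLB miss
// (the real tile would signal an error / unsupported request).
std::optional<uint64_t> translate(const std::vector<TlbEntry>& tlb,
                                  uint64_t pcie_addr) {
    for (const auto& e : tlb) {
        if (pcie_addr >= e.pcie_base && pcie_addr < e.pcie_base + e.size) {
            return e.sys_base + (pcie_addr - e.pcie_base);  // keep window offset
        }
    }
    return std::nullopt;
}
```

The same shape works for the outbound TLB with the address spaces swapped.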

3.2 Architectural Position

```mermaid
graph TB
  subgraph host["Host System"]
    CPU[Host CPU]
    DRAM[Host DRAM]
  end
  subgraph tile["PCIe Tile"]
    PCIE_CTRL[PCIe Controller]
    TLB_IN[Inbound TLB]
    TLB_OUT[Outbound TLB]
    SWITCH[PCIe-SMN-IO Switch]
    MSI_RELAY[MSI Relay]
  end
  subgraph chip["Keraunos-E100"]
    SMN_NET[SMN]
    QNP_NET[QNP NOC-N]
    HSIO_TILES[HSIO Tiles]
    D2D_LINKS[D2D Links]
  end
  CPU --- PCIE_CTRL
  PCIE_CTRL --- TLB_IN
  PCIE_CTRL --- TLB_OUT
  TLB_IN --- SWITCH
  SWITCH --- QNP_NET
  SWITCH --- SMN_NET
  TLB_OUT --- QNP_NET
  TLB_OUT --- SMN_NET
  SMN_NET --- PCIE_CTRL
  SMN_NET --- HSIO_TILES
  QNP_NET --- HSIO_TILES
  QNP_NET --- D2D_LINKS
  MSI_RELAY --- PCIE_CTRL
  SMN_NET --- MSI_RELAY
  style PCIE_CTRL fill:#ffcccc,stroke:#c00
  style TLB_IN fill:#ffe6cc
  style TLB_OUT fill:#ffe6cc
  style SWITCH fill:#d9f7d9
  style SMN_NET fill:#e6ccff
  style QNP_NET fill:#cce5ff
```

3.3 Key Interfaces

| Interface | Protocol | Width | Purpose |
|---|---|---|---|
| pcie_inbound | TLM-2.0 Target | 64-bit | Receives PCIe Memory Read/Write from host |
| noc_n_initiator | TLM-2.0 Initiator | 64-bit | Forwards inbound PCIe traffic to NOC after TLB translation |
| smn_n_initiator | TLM-2.0 Initiator | 64-bit | Forwards bypass/system traffic to SMN |
| noc_n_outbound | TLM-2.0 Target | 64-bit | Receives outbound NOC traffic destined for PCIe |
| smn_outbound | TLM-2.0 Target | 64-bit | Receives outbound SMN traffic destined for PCIe |
| pcie_controller_initiator | TLM-2.0 Initiator | 64-bit | Sends outbound transactions to PCIe controller |
| smn_config | TLM-2.0 Target | 64-bit | SMN access to PCIe Tile configuration registers |

3.4 Model Integration: Host–RC–EP–PCIe Tile Connection Diagram

This section documents how the inband (TLM) and sideband (sc_in/sc_out) ports of the PCIe Tile connect across Host → Root Complex → Endpoint → PCIe Tile for model integration (e.g. connecting the tile to a Synopsys PCIe Controller model or test harness).

End-to-end connection path:

```mermaid
graph LR
  subgraph host["Host System"]
    CPU[Host CPU]
    DRAM[Host DRAM]
  end
  subgraph rc["Root Complex"]
    RC_CORE[RC Core]
    RC_SIDEBAND[Reset / Power / Sideband]
  end
  subgraph link["PCIe Link"]
    TLP[TLPs]
  end
  subgraph ep["Endpoint - Synopsys PCIe Controller"]
    EP_CORE[EP Core]
    EP_AXI[AXI / Data IF]
    EP_SIDEBAND[Sideband Signals]
  end
  subgraph tile["PCIe Tile DUT"]
    TLM_IN[TLM Target]
    TLM_OUT[TLM Initiator]
    SC_IN[sc_in]
    SC_OUT[sc_out]
  end
  CPU --- RC_CORE
  RC_CORE --- TLP
  TLP --- EP_CORE
  EP_CORE --- EP_AXI
  EP_CORE --- EP_SIDEBAND
  EP_AXI --- TLM_IN
  EP_AXI --- TLM_OUT
  EP_SIDEBAND --- SC_IN
  EP_SIDEBAND --- SC_OUT
  RC_SIDEBAND -.->|optional| EP_SIDEBAND
  style TLP fill:#e3f2fd
  style EP_AXI fill:#fff3e0
  style EP_SIDEBAND fill:#f3e5f5
  style TLM_IN fill:#e8f5e9
  style TLM_OUT fill:#e8f5e9
  style SC_IN fill:#fce4ec
  style SC_OUT fill:#fce4ec
```

Inband (TLM) connections — EP ↔ PCIe Tile:

Tile port

Direction

Width

Connected to (EP side)

Description

pcie_controller_target

Target (in)

64-bit

EP BusMaster (TLM master)

Inbound TLPs from host: EP pushes Memory Read/Write to tile

pcie_controller_initiator

Initiator (out)

64-bit

EP AXI_Slave (TLM slave)

Outbound TLPs to host: tile pushes Memory Read/Write/Completion to EP

noc_n_target

Target (in)

64-bit

NOC fabric

Outbound path: NOC traffic destined for PCIe

noc_n_initiator

Initiator (out)

64-bit

NOC fabric

Inbound path: tile forwards translated traffic to NOC

smn_n_target

Target (in)

64-bit

SMN fabric

Outbound path: SMN traffic destined for PCIe

smn_n_initiator

Initiator (out)

64-bit

SMN fabric

Inbound path: tile forwards system traffic to SMN

Sideband signals — EP → PCIe Tile (sc_in to tile):

These are driven by the Synopsys PCIe Controller (EP) or by the system; the tile receives them as sc_in.

| Tile port (sc_in) | Type | Source | Description |
|---|---|---|---|
| pcie_core_clk | bool | EP | PCIe core clock from controller |
| pcie_controller_reset_n | bool | EP | Controller reset (active low) |
| pcie_cii_hv | bool | EP | CII header valid (SII / config info) |
| pcie_cii_hdr_type | sc_bv<5> | EP | CII header type [4:0] |
| pcie_cii_hdr_addr | sc_bv<12> | EP | CII header address [11:0] |
| pcie_flr_request | bool | EP | Function Level Reset request |
| pcie_hot_reset | bool | EP | Hot reset from link |
| pcie_ras_error | bool | EP | RAS error indication |
| pcie_dma_completion | bool | EP | DMA completion notification |
| pcie_misc_int | bool | EP | Miscellaneous interrupt from controller |
| cold_reset_n | bool | System (SMC) | SoC cold reset (active low) |
| warm_reset_n | bool | System (SMC) | SoC warm reset (active low) |
| isolate_req | bool | System | Isolation request |
| axi_clk | bool | System | AXI clock |

Sideband signals — PCIe Tile → EP (sc_out from tile):

The tile drives these; the EP or system receives them.

| Tile port (sc_out) | Type | Sink | Description |
|---|---|---|---|
| pcie_app_bus_num | uint8_t | EP | PCIe bus number for app |
| pcie_app_dev_num | uint8_t | EP | PCIe device number for app |
| pcie_device_type | bool | EP | Device type indicator |
| pcie_sys_int | bool | EP | System interrupt to controller |
| function_level_reset | bool | EP | FLR completion / request to EP |
| hot_reset_requested | bool | EP | Hot reset requested |
| config_update | bool | EP | Configuration update indicator |
| ras_error | bool | EP | RAS error to controller |
| dma_completion | bool | EP | DMA completion to controller |
| controller_misc_int | bool | EP | Controller miscellaneous interrupt |
| noc_timeout | sc_bv<3> | EP / system | NOC timeout status |

Summary diagram — sideband and inband to PCIe Tile:

```mermaid
graph TB
  subgraph host_rc["Host / Root Complex"]
    H[Host]
    RC[RC]
    H --- RC
  end
  subgraph pcie_link["PCIe Link - Inband TLPs"]
    L[TLP]
  end
  subgraph ep_block["Synopsys EP - PCIe Controller"]
    EP[EP]
    EP_CLK[clk, reset_n]
    EP_FLR[flr_request, hot_reset]
    EP_ERR[ras_error, dma_completion, misc_int]
    EP_CII[cii_hv, cii_hdr_type, cii_hdr_addr]
    EP --- EP_CLK
    EP --- EP_FLR
    EP --- EP_ERR
    EP --- EP_CII
  end
  subgraph tile["PCIe Tile"]
    TLM_T[pcie_controller_target]
    TLM_I[pcie_controller_initiator]
    IN[sc_in ports]
    OUT[sc_out ports]
  end
  RC --- L
  L --- EP
  EP ---|AXI/TLM| TLM_T
  EP ---|AXI/TLM| TLM_I
  EP_CLK --- IN
  EP_FLR --- IN
  EP_ERR --- IN
  EP_CII --- IN
  OUT --- EP
  IN -.->|cold_reset_n, warm_reset_n, isolate_req, axi_clk| SYS[System SMC]
  style L fill:#e3f2fd
  style TLM_T fill:#c8e6c9
  style TLM_I fill:#c8e6c9
  style IN fill:#f8bbd0
  style OUT fill:#f8bbd0
```

Integration notes:

  • Inband: Connect the EP’s BusMaster (TLM master) to the tile’s pcie_controller_target — the EP delivers inbound TLPs from the host to the tile. Connect the tile’s pcie_controller_initiator to the EP’s AXI_Slave (TLM slave) — the tile sends outbound TLPs to the EP, which forwards them over the link to the RC.

  • Sideband: Drive all tile sc_in ports from the EP model or system (clocks, resets, CII, FLR, hot_reset, ras_error, dma_completion, pcie_misc_int; plus cold_reset_n, warm_reset_n, isolate_req, axi_clk from system). Connect all tile sc_out ports to the EP or system as required by the EP datasheet and platform design.
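The FLR sideband handshake mentioned above (pcie_flr_request in, function_level_reset out) can be sketched as a small behavioral model. This is a simplified sketch with hypothetical method names, not the tile's actual implementation; the real tile must also drain in-flight TLM transactions before completing:

```cpp
#include <cassert>

// Illustrative FLR handshake between EP and tile. Signal names follow the
// port tables above; "quiesce" is a placeholder for draining outstanding
// transactions on the NOC/SMN and PCIe sides.
struct FlrHandshake {
    bool flr_in_progress = false;
    bool function_level_reset = false;  // tile sc_out: FLR done, to EP

    // EP asserts pcie_flr_request (tile sc_in)
    void on_flr_request() {
        flr_in_progress = true;
        function_level_reset = false;   // completion not yet signalled
    }

    // Tile has drained all outstanding transactions
    void on_quiesce_done() {
        if (flr_in_progress) {
            function_level_reset = true;  // signal FLR completion to EP
            flr_in_progress = false;
        }
    }
};
```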

3.5 VDK Integration: PCIe Tile and Synopsys PCIe Controller in the Virtualizer

This section describes how the Keraunos PCIe Tile and the Synopsys PCIe Controller (DesignWare, as RC and EP) are connected in the Synopsys Virtualizer VDK so that the virtual platform aligns with the Keraunos system architecture. The final validated VDK uses a direct RC–EP link between two chiplet groups: Host_Chiplet (Root Complex side) and Keraunos_PCIE_Chiplet (Endpoint side).

3.5.1 VDK Topology

The VDK instantiates a Host_Chiplet (with PCIE_RC, RISC-V CPU running Linux, DRAM, UART, PLIC) and a Keraunos_PCIE_Chiplet (with PCIe_EP, PCIE_TILE, Target_Memory, and a second RISC-V CPU running bare-metal firmware).

PCIe model in VDK: Synopsys DESIGNWARE_PCIE / PCIe_2_0 is used for both the Root Complex (PCIE_RC on Host_Chiplet) and the Endpoint (PCIe_EP on Keraunos_PCIE_Chiplet). The PCIe link uses a direct peer-to-peer binding:

  • RC PCIMem (master) ↔ EP PCIMem_Slave (slave) — TLPs from RC to EP

  • RC PCIMem_Slave (slave) ↔ EP PCIMem (master) — TLPs from EP to RC

3.5.2 Alignment with Keraunos: Where the PCIe Tile Fits

In the Keraunos-E100 architecture, the host uses a Root Complex and the Keraunos chip uses a Synopsys PCIe Controller as Endpoint. The PCIe Tile sits behind the EP and provides TLB translation and routing to NOC/SMN. In the VDK:

  • RC: The PCIE_RC on Host_Chiplet models the host side.

  • EP: On the Keraunos_PCIE_Chiplet, the Synopsys PCIe EP is the PCIe controller; the Keraunos PCIe Tile (PCIE_TILE) is inserted between this EP and the rest of the chip (NOC/SMN/Target_Memory).

The topology is: Host ↔ RC ↔ [direct PCIe link] ↔ EP ↔ PCIe Tile ↔ NOC/SMN. The tile does not replace the EP; it connects to the EP’s application-side (AXI/TLM) and sideband interfaces as in Section 3.4.

3.5.3 Interface-Level Connection Diagram (VDK)

The following diagram shows the VDK topology and where the PCIe Tile and Synopsys RC/EP connect:

```mermaid
graph TB
  subgraph vdk_root["VDK: Keraunos_PCIE_Tile"]
    subgraph host["Host_Chiplet"]
      RST_H[RST_GEN]
      SMM_P[SharedMemoryMap]
      SMC_H[SMC]
      RC[PCIE_RC - Synopsys RC]
      DRAM_H[DRAM]
      SMC_H --- RC
    end
    subgraph device["Keraunos_PCIE_Chiplet"]
      RST_D[RST_GEN]
      SMM_S[SharedMemoryMap]
      SMC_D[SMC_Configure]
      EP[PCIe_EP - Synopsys EP]
      TILE[PCIE_TILE - Keraunos PCIe Tile]
      MEM[Target_Memory]
      EP ---|BusMaster / AXI_Slave| TILE
      TILE ---|noc_n / smn_n| SMM_S
      SMM_S --- MEM
    end
  end
  RC ---|PCIMem / PCIMem_Slave direct| EP
  RC ---|AXI_Slave, AXI_DBI, BusMaster| SMM_P
  EP -.->|sideband| TILE
  style RC fill:#e3f2fd
  style EP fill:#fff3e0
  style TILE fill:#c8e6c9
  style MEM fill:#ffe6cc
```

Inband (TLM) connections:

  • PCIE_RC: AXI_Slave, AXI_DBI, BusMaster bound to Host_Chiplet SharedMemoryMap (config and memory space).

  • PCIe_EP: AXI_DBI bound to Keraunos_PCIE_Chiplet SharedMemoryMap; PCIMem / PCIMem_Slave connected directly to PCIE_RC (TLP traffic). BusMaster connected to PCIE_TILE.pcie_controller_target (inbound TLPs to tile).

PCIe Tile connections:

  • EP BusMaster to the tile’s pcie_controller_target (inbound TLPs from host). The tile’s pcie_controller_initiator to EP AXI_Slave (outbound TLPs to host). EP PCIMem / PCIMem_Slave are connected directly to the RC for the PCIe link.

  • The tile’s noc_n_target / noc_n_initiator and smn_n_target / smn_n_initiator connect to the chiplet’s SharedMemoryMap, which decodes to Target_Memory and tile register windows.

3.5.4 Signal- and Interface-Level Mapping (EP ↔ PCIe Tile)

The Synopsys DesignWare PCIe model (PCIe_2_0) exposes the following interface groups. The mapping to the Keraunos PCIe Tile ports enables a drop-in style integration when the tile is added to the VDK.

TLM (inband) — DesignWare EP ↔ PCIe Tile:

| DesignWare EP interface (VDK) | Direction | PCIe Tile port | Description |
|---|---|---|---|
| AXI_Slave | Slave (in) | pcie_controller_initiator | Outbound TLPs: tile sends Memory Read/Write/Completion to EP; EP receives on AXI_Slave and sends over link to RC. |
| AXI_DBI | Slave (in) | — | DBI/config; may remain to SharedMemoryMap or be routed per platform. |
| BusMaster | Master (out) | pcie_controller_target | Inbound TLPs: EP delivers host Memory Read/Write to tile (EP BusMaster → tile target). |
| PCIMem | Master (out) | — | EP as master toward link (direct to RC). Not connected to tile. |
| PCIMem_Slave | Slave (in) | — | Inbound TLPs from link (RC → EP); connects directly to RC, not to tile. |

So: BusMaster (EP) → pcie_controller_target (Tile) for inbound TLPs; pcie_controller_initiator (Tile) → AXI_Slave (EP) for outbound TLPs. PCIMem/PCIMem_Slave stay on the link side (EP ↔ RC direct). AXI_DBI can remain to SharedMemoryMap.

Sideband — DesignWare EP ↔ PCIe Tile (sc_in / sc_out):

DesignWare PCIe_2_0 exposes a number of reset, clock, and sideband pins. Map them to the tile’s sc_in and sc_out as follows so that the VDK integration matches Section 3.4.

| Tile sc_in (receive) | Source (EP or system) | DesignWare EP / system signal (typical name) |
|---|---|---|
| pcie_core_clk | EP | cc_core_clk or equivalent core clock |
| pcie_controller_reset_n | EP | pcie_axi_ares or combined reset_n |
| pcie_cii_hv | EP | CII header valid |
| pcie_cii_hdr_type | EP | CII header type [4:0] |
| pcie_cii_hdr_addr | EP | CII header address [11:0] |
| pcie_flr_request | EP | FLR request |
| pcie_hot_reset | EP | Hot reset |
| pcie_ras_error | EP | RAS error |
| pcie_dma_completion | EP | DMA completion |
| pcie_misc_int | EP | Miscellaneous interrupt |
| cold_reset_n | System (e.g. CustomResetController) | SoC cold reset |
| warm_reset_n | System | SoC warm reset |
| isolate_req | System | Isolation request |
| axi_clk | System | AXI clock |

| Tile sc_out (drive) | Sink (EP or system) | DesignWare EP / system signal (typical name) |
|---|---|---|
| pcie_app_bus_num | EP | App bus number |
| pcie_app_dev_num | EP | App device number |
| pcie_device_type | EP | Device type |
| pcie_sys_int | EP | System interrupt to controller |
| function_level_reset | EP | FLR completion |
| hot_reset_requested | EP | Hot reset requested |
| config_update | EP | Config update |
| ras_error | EP | RAS error to controller |
| dma_completion | EP | DMA completion to controller |
| controller_misc_int | EP | Controller misc interrupt |
| noc_timeout | EP / system | NOC timeout [2:0] |

(Exact DesignWare signal names may vary by IP version; use the EP model’s documentation or RTL interface list to align names.)

3.5.5 Connection Diagram for Easy Integration

A single diagram that ties VDK instances to tile ports and EP ports is below. Use it as a checklist when wiring the PCIe Tile into the VDK behind the Synopsys EP.

```mermaid
graph LR
  subgraph host_side["Host / RC (VDK Primary_Chiplet)"]
    RC[PCIe_RC]
  end
  subgraph keraunos_chiplet["Keraunos Chiplet (e.g. Secondary_Chiplet_1)"]
    subgraph ep_block["Synopsys EP"]
      EP[PCIe_2_0]
      EP_PCIMem[PCIMem]
      EP_PCIMemSlave[PCIMem_Slave]
      EP_BusM[BusMaster]
      EP_AXI_S[AXI_Slave]
      EP_AXI_DBI[AXI_DBI]
      EP_RST[resets]
      EP_CLK[clocks]
      EP_SB[sideband]
    end
    subgraph tile_block["Keraunos PCIe Tile"]
      T_tgt[pcie_controller_target]
      T_init[pcie_controller_initiator]
      T_noc_t[noc_n_target]
      T_noc_i[noc_n_initiator]
      T_smn_t[smn_n_target]
      T_smn_i[smn_n_initiator]
      T_sc_in[sc_in]
      T_sc_out[sc_out]
    end
  end
  RC -->|PCIMem direct| EP_PCIMemSlave
  EP_PCIMem -->|PCIMem direct| RC
  EP_BusM -->|TLM inbound| T_tgt
  T_init -->|TLM outbound| EP_AXI_S
  EP_RST --> T_sc_in
  EP_CLK --> T_sc_in
  EP_SB --> T_sc_in
  T_sc_out --> EP_SB
  T_noc_i --> NOC[NOC]
  T_smn_i --> SMN[SMN]
  NOC --> T_noc_t
  SMN --> T_smn_t
  style T_tgt fill:#c8e6c9
  style T_init fill:#c8e6c9
  style T_sc_in fill:#f8bbd0
  style T_sc_out fill:#f8bbd0
```

Integration checklist:

  1. Inband: Bind EP BusMaster (inbound TLPs to device) to the tile’s pcie_controller_target. Bind the tile’s pcie_controller_initiator (outbound TLPs to host) to EP AXI_Slave. Keep EP PCIMem / PCIMem_Slave connected directly to the RC.

  2. Sideband: Connect all EP and system reset/clock/sideband outputs to the tile’s sc_in; connect all tile sc_out to the EP (and system) inputs as in the table above.

  3. NOC/SMN: Connect tile noc_n_target / noc_n_initiator and smn_n_target / smn_n_initiator to the chiplet’s SharedMemoryMap, which decodes to Target_Memory and tile register windows.

  4. RC–EP link: Use a direct link: bind RC PCIMem to EP PCIMem_Slave and RC PCIMem_Slave to EP PCIMem.

This ensures the VDK integration of the PCIe Tile and Synopsys PCIe Controller (RC and EP) matches the Keraunos system architecture and Section 3.4, and can be integrated with minimal rework.
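One way to make the inband part of the checklist machine-checkable is to encode the required tile bindings as data and assert completeness before starting simulation. The table contents below mirror the checklist; the helper names and the string-based representation are hypothetical, not part of the VDK API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical binding table: tile TLM port -> peer it must be bound to,
// per the integration checklist (EP inband ports and SharedMemoryMap).
std::map<std::string, std::string> tile_bindings() {
    return {
        {"pcie_controller_target",    "EP.BusMaster"},        // inbound TLPs
        {"pcie_controller_initiator", "EP.AXI_Slave"},        // outbound TLPs
        {"noc_n_target",              "SharedMemoryMap"},     // outbound from NOC
        {"noc_n_initiator",           "SharedMemoryMap"},     // inbound to NOC
        {"smn_n_target",              "SharedMemoryMap"},     // outbound from SMN
        {"smn_n_initiator",           "SharedMemoryMap"},     // inbound to SMN
    };
}

// Check that every required tile TLM port has a non-empty binding.
bool fully_bound(const std::map<std::string, std::string>& b) {
    const char* required[] = {
        "pcie_controller_target", "pcie_controller_initiator",
        "noc_n_target", "noc_n_initiator", "smn_n_target", "smn_n_initiator"};
    for (const char* p : required) {
        auto it = b.find(p);
        if (it == b.end() || it->second.empty()) return false;
    }
    return true;
}
```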

3.5.6 DesignWare PCIe EP and RC Interfaces: Connect to Tile / System vs Stub

The Ascalon chiplet vdksys Peripherals section instantiates the Synopsys DesignWare PCIe_2_0 model for both PCIe_RC (Primary_Chiplet) and PCIe_EP (Secondary_Chiplet_1/2/3). The model exposes a large set of TLM, RESET, CLOCK, and Default (sideband) interfaces. This subsection first gives direct signal/interface correspondence tables (PCIe Tile ↔ EP, and Host/DRAM ↔ RC), then lists disposition (connect vs stub) for each interface group.

Table 1 — PCIe Tile ↔ DesignWare PCIe EP: signal and interface correspondence

Each row shows which PCIe Tile port connects to which DesignWare PCIe EP port. Connect the Tile column to the DesignWare EP column as indicated.

| PCIe Tile (signal / interface) | DesignWare PCIe EP (signal / interface) |
|---|---|
| **TLM** | |
| pcie_controller_target (TLM target) | BusMaster (EP TLM master) — EP delivers inbound TLPs from host to tile |
| pcie_controller_initiator (TLM initiator) | AXI_Slave (EP TLM slave) — tile sends outbound TLPs to EP |
| **Clocks (tile sc_in)** | |
| pcie_core_clk | cc_core_clk |
| axi_clk | cc_dbi_aclk (or system SYSCLK) |
| **Resets (tile sc_in)** | |
| pcie_controller_reset_n | pcie_axi_ares (invert for active-low), or combined from cc_dbi_ares, cc_core_ares, cc_pwr_ares, cc_phy_ares |
| cold_reset_n | From system (e.g. CustomResetController), not EP |
| warm_reset_n | From system, not EP |
| **CII — EP to tile (tile sc_in)** | |
| pcie_cii_hv | lbc_cii_hv |
| pcie_cii_hdr_type | lbc_cii_hdr_type |
| pcie_cii_hdr_addr | lbc_cii_hdr_addr |
| **FLR / hot reset — EP to tile (tile sc_in)** | |
| pcie_flr_request | cfg_flr_pf_active_x (EP drives FLR request to tile) |
| pcie_hot_reset | link_req_rst_not or training_rst_n / smlh_req_rst_not (link/controller hot reset) |
| **Error / DMA / misc — EP to tile (tile sc_in)** | |
| pcie_ras_error | pcie_parc_int or app_err_* / cfg_aer_* (RAS/error from EP) |
| pcie_dma_completion | dma_wdxfer_done_togg[] / dma_rdxfer_done_togg[] or edma_int (DMA completion) |
| pcie_misc_int | edma_int_rd_chan[] / edma_int_wr_chan[] or other controller misc interrupt |
| **System to tile (tile sc_in)** | |
| isolate_req | From system (isolation), not EP |
| **Tile to EP (tile sc_out)** | |
| pcie_app_bus_num | app_bus_num |
| pcie_app_dev_num | app_dev_num |
| pcie_device_type | device_type |
| pcie_sys_int | sys_int |
| function_level_reset | app_flr_pf_done_x (tile signals FLR done to EP) |
| hot_reset_requested | To EP hot-reset input (e.g. app_init_rst or link side as applicable) |
| config_update | To EP config-update input if present; otherwise stub |
| ras_error | To EP RAS/error input (e.g. app_err_* side) |
| dma_completion | To EP DMA completion input (e.g. dma_*xfer_go_togg or equivalent) |
| controller_misc_int | To EP misc interrupt input |
| noc_timeout | To EP or system (NOC timeout status) |

Note: EP AXI_Slave is connected to the tile’s pcie_controller_initiator (outbound path). EP AXI_DBI, PCIMem, ELBIMaster are not connected to the tile: AXI_DBI can go to SharedMemoryMap; PCIMem/PCIMem_Slave connect directly to the RC for the PCIe link. Tile ports noc_n_target, noc_n_initiator, smn_n_target, smn_n_initiator connect to the chiplet SharedMemoryMap, not to the EP.


Table 2 — Host / DRAM ↔ DesignWare PCIe RC: signal and interface connection

Each row shows which Host- or system-side element connects to which DesignWare PCIe RC port. The RC has no PCIe Tile behind it; it connects to host resources and to the PCIe link (directly to EP).

Host / DRAM (or system element)

DesignWare PCIe RC (signal / interface)

Host config / MMIO (config space, DBI)

SharedMemoryMap (config space region)

AXI_Slave

SharedMemoryMap (DBI region)

AXI_DBI

Host memory (RC as master — downstream TLPs)

SharedMemoryMap (memory region for host-initiated TLPs)

BusMaster

PCIe link (TLPs to/from EP)

EP PCIMem_Slave (direct link)

PCIMem — RC sends downstream TLPs

EP PCIMem (direct link)

PCIMem_Slave — RC receives upstream TLPs from EP

Clocks

SYSCLK (e.g. Primary_Chiplet SYSCLK)

cc_core_clk

SYSCLK

cc_dbi_aclk

Resets

Peripherals Reset / RST_GEN

pcie_axi_ares, cc_dbi_ares, cc_core_ares, cc_pwr_ares, cc_phy_ares

Interrupts (host side)

TT_APLIC_TLM2 (e.g. irqS[3] in vdksys)

msi_ctrl_int — RC MSI to host interrupt controller

Not connected (stub)

ELBIMaster — stub

cc_pipe_clk, cc_aux_clk, refclk, cc_aclkSlv, cc_aclkMstr — stub

All optional sideband (sys_int, device_type, link_up, app_ltssm_en, power/L1/L2, etc.) — stub

Note: In the vdksys, Host/DRAM is represented by SharedMemoryMap and SYSCLK on Host_Chiplet. Host CPU traffic is modeled via the RC’s AXI_Slave (config), AXI_DBI (DBI), and BusMaster (memory TLPs) bound to SharedMemoryMap. The PCIMem / PCIMem_Slave connect the RC directly to the EP for the PCIe link.


A. PCIe Endpoint (EP) — interfaces and disposition

| Category | DesignWare EP interface (vdksys) | Connect to PCIe Tile | Connect to system / other | Stub | Notes |
|---|---|---|---|---|---|
| TLM | AXI_Slave | pcie_controller_initiator | | | Outbound TLPs: tile → EP (tile initiates to EP AXI_Slave). |
| TLM | AXI_DBI | | SharedMemoryMap (DBI region) | | DBI/config. |
| TLM | BusMaster | pcie_controller_target | | | Inbound TLPs: EP delivers to tile (EP BusMaster → tile target). |
| TLM | ELBIMaster | | | Yes (vdksys: auto stub) | Optional ELBI; not used for tile. |
| TLM | PCIMem | | Link (direct to RC) | | EP as master toward link. |
| TLM | PCIMem_Slave | | Link (direct to RC) | | EP receives from link; not connected to tile. |
| RESET | pcie_axi_ares, cc_dbi_ares, cc_core_ares, cc_pwr_ares, cc_phy_ares | pcie_controller_reset_n (or combine) | Peripherals Reset / RST_GEN | | Drive tile reset from same source as EP. |
| CLOCK | cc_core_clk | pcie_core_clk (tile sc_in) | | | EP core clock to tile. |
| CLOCK | cc_dbi_aclk | | SYSCLK (bound in vdksys) | | DBI clock; also usable as axi_clk for tile. |
| CLOCK | cc_pipe_clk, cc_aclkSlv, cc_aclkMstr, cc_aux_clk, refclk | | | Yes (vdksys: cc_pipe_clk stubbed) | Internal/PHY clocks; stub if not driving tile. |
| Sideband (CII) | lbc_cii_hv, lbc_cii_dv, lbc_cii_hdr_type, lbc_cii_hdr_addr, lbc_cii_hdr_* | pcie_cii_hv, pcie_cii_hdr_type, pcie_cii_hdr_addr (tile sc_in) | | Rest of CII if tile does not use | CII = Configuration Interface Info; map key signals to tile. |
| Sideband (FLR) | cfg_flr_pf_active_x, app_flr_pf_done_x | pcie_flr_request (in), function_level_reset (out) | | | FLR handshake between EP and tile. |
| Sideband (hot reset, etc.) | link_req_rst_not, training_rst_n, smlh_req_rst_not | pcie_hot_reset (tile sc_in), hot_reset_requested (tile sc_out) | | | As needed for tile behavior. |
| Sideband (bus/dev) | app_bus_num, app_dev_num | pcie_app_bus_num, pcie_app_dev_num (tile sc_out) | | | Tile drives EP with assigned BDF. |
| Sideband (device type) | device_type (slave on EP) | pcie_device_type (tile sc_out) | | Yes if not using | vdksys stubs; connect from tile when integrated. |
| Sideband (interrupt) | sys_int (slave on EP) | pcie_sys_int (tile sc_out) | | Yes in vdksys | Connect tile pcie_sys_int to EP sys_int when integrated. |
| Sideband (DMA) | dma_wdxfer_done_togg[], dma_rdxfer_done_togg[], edma_int_rd_chan[], edma_int_wr_chan[], edma_int | pcie_dma_completion, controller_misc_int (tile sc_in/sc_out) | | Optional | Map DMA completion / misc int to tile as needed. |
| Sideband (RAS/error) | pcie_parc_int, app_err_*, cfg_aer_* | pcie_ras_error (tile sc_in), ras_error (tile sc_out) | | Optional | Connect if tile implements RAS/error reporting. |
| MSI | msi_ctrl_int, msi_ctrl_int_vec_[], msi_gen, ven_msi_*, msix_addr, msix_data, cfg_msix_* | | APLIC / interrupt controller (e.g. TT_APLIC_TLM2) | Stub msi_gen in vdksys | RC binds msi_ctrl_int to APLIC; EP same for device MSI. |
| Power / L1/L2 | apps_pm_xmt_turnoff, app_req_entr_l1, app_req_exit_l1, pme_en, pme_stat, clk_req, clk_req_in, pm_linkst_*, pm_dstate, radm_pm_* | | | Yes (vdksys stubs many) | Power management; stub for minimal tile integration. |
| Other sideband | ready_entr_l23, app_ltssm_en, link_up, sys_pre_det_state, app_unlock_msg, app_ltr_*, app_init_rst, bridge_flush_not, hp_int, hp_msi, RADM_inta/b/c/d, cfg_pme_*, radm_pm_to_ack, slv_misc_info, mstr_misc_info, app_hdr_log, app_tlp_prfx_log, app_err_*, ven_msi_tc, ven_msi_vector, cfg_msi_*, CxlRegAccess, ptm_*, etc. | | | Yes | Optional or debug; stub to simplify integration. |

B. PCIe Root Complex (RC) — interfaces and disposition

The RC has the same DesignWare PCIe_2_0 interface set. There is no PCIe Tile behind the RC (the tile is behind the EP on the Keraunos chiplet). So RC interfaces either connect to the link (directly to EP), to the system (SharedMemoryMap, SYSCLK, APLIC), or are stubbed.

| Category | DesignWare RC interface (vdksys) | Connect to link (EP / Switch) | Connect to system | Stub | Notes |
|---|---|---|---|---|---|
| TLM | AXI_Slave, AXI_DBI | | SharedMemoryMap (config, DBI) | | Host config space; bound in vdksys. |
| TLM | BusMaster | | SharedMemoryMap (memory) | | Host-initiated TLPs; bound in vdksys. |
| TLM | ELBIMaster | | | Yes (vdksys: auto stub) | Optional. |
| TLM | PCIMem | EP PCIMem_Slave (direct) | | | Downstream TLPs. |
| TLM | PCIMem_Slave | EP PCIMem (direct) | | | Upstream TLPs from EP. |
| RESET | pcie_axi_ares, cc_*_ares | | Peripherals Reset / RST_GEN | | Same as EP. |
| CLOCK | cc_core_clk, cc_dbi_aclk | | SYSCLK | | Bound in vdksys. |
| CLOCK | cc_pipe_clk, cc_aux_clk, refclk, cc_aclkSlv, cc_aclkMstr | | | Yes | Stub if not used. |
| MSI | msi_ctrl_int | | TT_APLIC_TLM2 (irqS[3] in vdksys) | | Host RC MSI to APLIC. |
| Sideband | sys_int, device_type, ready_entr_l23, app_ltssm_en, link_up, sys_pre_det_state, apps_pm_xmt_turnoff, clk_req_in, app_req_entr_l1, app_req_exit_l1, app_flr_pf_done_x, app_ltr_*, msi_gen, and all other Default/sideband | | | Yes (vdksys stubs many) | No tile; stub optional RC sideband. |

C. Summary

  • EP: Connect to PCIe Tile: EP BusMaster → tile pcie_controller_target (inbound); tile pcie_controller_initiator → EP AXI_Slave (outbound). Resets and cc_core_clk (and optionally cc_dbi_aclk) to tile sc_in; CII, FLR, hot reset, app_bus_num, app_dev_num, device_type, sys_int, dma_completion, ras_error to/from tile sc_in/sc_out as in Section 3.4. Connect to system: AXI_DBI to SharedMemoryMap; cc_dbi_aclk to SYSCLK; msi_ctrl_int to APLIC. Stub: ELBIMaster; optional/PHY clocks (cc_pipe_clk, etc.); power/L1/L2 and other optional sideband.

  • RC: Connect to link: PCIMem, PCIMem_Slave directly to EP. Connect to system: AXI_Slave, AXI_DBI, BusMaster to SharedMemoryMap; clocks to SYSCLK; msi_ctrl_int to APLIC. Stub: All optional sideband and PHY clocks as in vdksys.


4. Connectivity Architecture

4.1 Inbound Data Path (Host → Chip)

Use Case: Host CPU writes data to Quasar compute cores or Mimir memory. The host communicates with the PCIe Tile via a PCIe Controller. On the host side the controller is a Root Complex (RC); on the Keraunos chip side the controller is the Synopsys PCIe Controller IP (DesignWare), configured as an Endpoint (EP). So the link is Host (RC) ↔ Keraunos (EP); the PCIe Tile sits behind the endpoint and receives TLPs from it over the internal interface.

```mermaid
sequenceDiagram
  participant Host as Host CPU
  participant Ctrl as PCIe Controller
  participant Tile as PCIe Tile
  participant TLB as Inbound TLB
  participant Switch as PCIe-SMN-IO Switch
  participant NOC as NOC-N
  participant SMN as SMN
  participant Quasar as Quasar
  Host->>Ctrl: Memory Write
  Ctrl->>Tile: Memory Write TLP
  Tile->>TLB: Lookup Translation
  TLB-->>Tile: System Address
  Tile->>Switch: Forward
  Switch->>Switch: Route Decision
  alt NOC-bound
    Switch->>NOC: Forward
    NOC->>Quasar: D2D to Quasar
    Quasar-->>NOC: Response
    NOC-->>Switch: Response
  else SMN-bound
    Switch->>SMN: Forward
    SMN-->>Switch: Response
  end
  Switch-->>Tile: Completion
  Tile-->>Ctrl: Completion TLP
  Ctrl-->>Host: Completion
```

Key Steps:

  1. Host initiates PCIe Memory Write targeting Keraunos BAR (Base Address Register); the request is sent via the PCIe Controller over the PCIe link.

  2. PCIe Controller delivers the Memory Write TLP to the PCIe Tile; the tile receives the transaction via its pcie_inbound socket.

  3. Inbound TLB translates host address to system address space

  4. PCIe-SMN-IO Switch routes based on address:

    • 0x0000_0000_0000 - 0x0000_FFFF_FFFF: NOC-bound (via noc_n_initiator)

    • 0x1000_0000_0000 - 0x1FFF_FFFF_FFFF: SMN-bound (via smn_n_initiator)

  5. Transaction forwarded to NOC-N or SMN

  6. NOC-N routes via D2D links to destination Quasar/Mimir chiplet

  7. Response traverses back through the same path; PCIe Tile sends Completion TLP to the PCIe Controller, which delivers it to the Host.
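The switch's route decision in step 4 is a straightforward address-range decode. A minimal sketch, using only the range constants listed above (the function and port names mirror the text; everything else is illustrative):

```python
# Minimal model of the PCIe-SMN-IO Switch route decision (step 4 above).
# Range constants come from the address map in this section.
NOC_BASE, NOC_LIMIT = 0x0000_0000_0000, 0x0000_FFFF_FFFF
SMN_BASE, SMN_LIMIT = 0x1000_0000_0000, 0x1FFF_FFFF_FFFF

def route(system_addr: int) -> str:
    """Return the output port for a translated inbound system address."""
    if NOC_BASE <= system_addr <= NOC_LIMIT:
        return "noc_n_initiator"   # NOC-bound: D2D to Quasar/Mimir
    if SMN_BASE <= system_addr <= SMN_LIMIT:
        return "smn_n_initiator"   # SMN-bound: SEP/SMC/tile registers
    raise ValueError(f"unmapped system address {system_addr:#x}")
```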

Sideband signal flow (inbound use case):
During inbound (Host → Chip), the EP and system drive sideband inputs to the PCIe Tile so the tile can accept and process TLPs; the tile drives sideband outputs back to the EP. The flow is:

graph LR subgraph host_rc["Host / RC"] H[Host] end subgraph ep["EP - Synopsys Controller"] EP_IN[Sideband Out] EP_OUT[Sideband In] end subgraph tile["PCIe Tile"] SC_IN[sc_in] SC_OUT[sc_out] end subgraph sys["System SMC"] SYS[Reset / Isolate] end H -->|TLP| EP_IN EP_IN -->|clk, reset_n, CII, FLR, hot_reset, ras, dma_cpl, misc_int| SC_IN SYS -->|cold_reset_n, warm_reset_n, isolate_req, axi_clk| SC_IN SC_OUT -->|FLR out, hot_reset_req, config_update, ras, dma_cpl, controller_misc_int, noc_timeout, bus/dev, device_type, sys_int| EP_OUT EP_OUT -->|Optional to RC| H style SC_IN fill:#fce4ec style SC_OUT fill:#e8f5e9

| Direction | Signals | Role in inbound use case |
|---|---|---|
| EP → Tile (sc_in) | pcie_core_clk, pcie_controller_reset_n, pcie_cii_*, pcie_flr_request, pcie_hot_reset, pcie_ras_error, pcie_dma_completion, pcie_misc_int | Clock and reset so the tile is ready; CII for config info; EP may assert FLR/hot_reset/errors during or after inbound TLPs. |
| System → Tile (sc_in) | cold_reset_n, warm_reset_n, isolate_req, axi_clk | SoC reset and isolation; AXI clock for config/MMIO. |
| Tile → EP (sc_out) | function_level_reset, hot_reset_requested, config_update, ras_error, dma_completion, controller_misc_int, noc_timeout, pcie_app_bus_num, pcie_app_dev_num, pcie_device_type, pcie_sys_int | FLR/hot_reset handshake; config and error reporting; NOC timeout; SII bus/dev and interrupt to EP. |

4.2 Outbound Data Path (Chip → Host)

Use Case: Quasar compute cores send results back to host DRAM or trigger MSI interrupts.

sequenceDiagram participant Quasar as Quasar participant NOC as NOC-N participant PCIe as PCIe Tile participant TLB as Outbound TLB participant Ctrl as PCIe Controller participant Host as Host Quasar->>NOC: Write NOC->>PCIe: Forward PCIe->>TLB: TLB Lookup TLB-->>PCIe: Host Address PCIe->>Ctrl: Forward Ctrl->>Host: Memory Write TLP Host-->>Ctrl: Completion Ctrl-->>PCIe: Response PCIe-->>NOC: Response NOC-->>Quasar: Completion

Key Steps:

  1. Quasar initiates write targeting PCIe address range (typically host DRAM)

  2. NOC-N routes to Keraunos PCIe Tile via noc_n_outbound socket

  3. Outbound TLB translates system address back to host physical address

  4. PCIe Controller (pcie_controller_initiator) generates PCIe Memory Write TLP

  5. Transaction sent over PCIe link to host

  6. Host DRAM responds with completion

  7. Response propagates back through PCIe Tile → NOC → Quasar

Sideband signal flow (outbound use case):
During outbound (Chip → Host), the same sideband links carry status and handshake: the EP may drive reset/FLR; the tile uses sideband outputs to signal completion and errors to the EP so the EP can complete TLPs toward the host.

graph LR subgraph quasar_noc["Quasar / NOC"] Q[Quasar] end subgraph tile["PCIe Tile"] SC_IN[sc_in] SC_OUT[sc_out] end subgraph ep["EP - Synopsys Controller"] EP_IN[Sideband In] EP_OUT[Sideband Out] end subgraph host["Host"] H[Host DRAM] end Q -->|TLP| tile EP_OUT -->|clk, reset_n, CII, FLR, hot_reset, ras, dma_cpl, misc_int| SC_IN SC_OUT -->|dma_completion, controller_misc_int, config_update, ras_error, noc_timeout, FLR out, hot_reset_req| EP_IN EP_IN -->|TLP to host| H H -->|Completion| EP_IN EP_IN -.->|Completion sideband if any| EP_OUT style SC_IN fill:#fce4ec style SC_OUT fill:#e8f5e9

| Direction | Signals | Role in outbound use case |
|---|---|---|
| EP → Tile (sc_in) | Same as inbound | Clock and reset; CII; EP can assert FLR/hot_reset or error sideband during outbound. |
| Tile → EP (sc_out) | dma_completion, controller_misc_int, config_update, ras_error, noc_timeout, function_level_reset, hot_reset_requested | Tell the EP when the tile has completed work (e.g. outbound DMA) or hit errors; FLR/hot_reset handshake; NOC timeout so the EP can report or retry. |

4.3 Configuration Path (SMN → PCIe Tile Registers)

Use Case: SMC programs PCIe Tile TLBs, enables MSI relay, or reads error status.

sequenceDiagram participant SMC as SMC participant SMN as SMN participant PCIe as PCIe Config SMC->>SMN: SMN Write SMN->>PCIe: Forward PCIe->>PCIe: Update TLB or MSI PCIe-->>SMN: Response SMN-->>SMC: Complete

Addressable Registers (via SMN):

  • 0x1804_0000 - 0x1804_07FF: Inbound TLB configurations (8 entries)

  • 0x1804_0800 - 0x1804_0FFF: Outbound TLB configurations (8 entries)

  • 0x1800_0000 - 0x1800_0FFF: MSI Relay registers

  • 0x1802_0000 - 0x1802_0FFF: PCIe error status and control
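Assuming the eight entries per direction are packed evenly into their 2 KB windows (0x800 / 8 = 0x100 bytes per entry — an illustrative stride, not taken from the register specification), the SMN address of a TLB entry can be computed as:

```python
# Hypothetical layout: 0x800-byte window / 8 entries = 0x100-byte stride.
# The stride is an assumption for illustration; consult the register spec.
INBOUND_TLB_BASE  = 0x1804_0000
OUTBOUND_TLB_BASE = 0x1804_0800
TLB_ENTRY_STRIDE  = 0x100

def tlb_entry_addr(entry: int, outbound: bool = False) -> int:
    """SMN address of inbound/outbound TLB entry 0..7."""
    assert 0 <= entry < 8, "8 entries per direction"
    base = OUTBOUND_TLB_BASE if outbound else INBOUND_TLB_BASE
    return base + entry * TLB_ENTRY_STRIDE
```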

4.4 MSI Interrupt Path (Chip → Host)

Use Case: Ethernet controller or Quasar triggers interrupt to host driver.

sequenceDiagram participant HSIO as HSIO Tile participant SMN as SMN participant MSI as MSI Relay participant PCIe as PCIe Controller participant Host as Host CPU HSIO->>SMN: Trigger Interrupt SMN->>MSI: Forward MSI->>MSI: Translate to MSI-X MSI->>PCIe: MSI-X TLP PCIe->>Host: MSI-X Write Host->>Host: ISR
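Conceptually, the MSI Relay maps an internal interrupt source onto an MSI-X address/data pair and emits a memory write toward the host. A toy behavioral model — the table contents, addresses, and names below are invented for illustration, not taken from the MSI Relay register spec:

```python
# Toy MSI Relay model: interrupt source id -> MSI-X (address, data) pair.
# Addresses and data values are placeholders; a real table is programmed
# by the host driver into the MSI-X table behind BAR4.
msix_table = {
    0: (0x0900_0000, 0x41),  # e.g. Ethernet controller vector
    1: (0x0900_0000, 0x42),  # e.g. Quasar completion vector
}

def relay(source_id: int) -> tuple[int, int]:
    """Translate an SMN interrupt trigger into an MSI-X memory write."""
    addr, data = msix_table[source_id]
    # The tile then issues a PCIe Memory Write TLP of `data` to `addr`.
    return addr, data
```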

5. Data Flow Paths

5.1 End-to-End Data Flow Example: Host DMA to Quasar

Scenario: Host writes 4KB of neural network weights to Quasar L1 memory.

graph LR subgraph host["Host System"] A[Host DMA] end subgraph pcie_tile["PCIe Tile"] B[PCIe Inbound] C[Inbound TLB] D[Switch] E[NOC Initiator] end subgraph noc["NOC"] F[QNP Mesh] G[D2D Tile] end subgraph quasar["Quasar"] H[NOC Router] I[L1 Memory] end A -->|PCIe Write| B B --> C C -->|Translate| D D --> E E --> F F -->|QNP| G G -->|BoW| H H --> I style B fill:#ffcccc style C fill:#ffe6cc style D fill:#d9f7d9 style E fill:#cce5ff style F fill:#cce5ff style G fill:#e6ccff style I fill:#ffe6cc

Address Translation:

  • Host Address: 0x8000_0000 (PCIe BAR + offset)

  • Inbound TLB Lookup: Maps application region 0 → NOC address

  • System Address: 0x0000_0000_4000_0000 (Quasar chiplet, NOC coordinates, L1 offset)

  • Physical Routing: QNP mesh routes to D2D tile 2 → Quasar chiplet ID 1 → Tensix core (4,5)

5.2 Multi-Hop Data Flow: Quasar → PCIe → Host → PCIe → Quasar

Scenario: Quasar chiplet 0 sends data to Quasar chiplet 1 in a different Grendel package, staged through host DRAM (DMA in both directions, with no host-CPU copy).

graph TB subgraph pkg0["Package 0"] Q0[Quasar 0] K0[Keraunos PCIe 0] Q0 -->|1. NOC Write| K0 end H[Host DRAM] K0 -->|2. PCIe Write| H subgraph pkg1["Package 1"] K1[Keraunos PCIe 1] Q1[Quasar 1] K1 -->|4. NOC Write| Q1 end H -->|3. PCIe Read| K1 style Q0 fill:#ffe6cc style K0 fill:#ffcccc style H fill:#fff4e1 style K1 fill:#ffcccc style Q1 fill:#ffe6cc

6. Address Space Integration

6.1 System Address Map

The Keraunos-E100 local address map is a subset of the broader Grendel system address map:

| Address Range | Target | Description |
|---|---|---|
| 0x0000_0000_0000 - 0x0000_FFFF_FFFF | NOC-N | Quasar/Mimir chiplets via D2D |
| 0x1000_0000_0000 - 0x1000_0FFF_FFFF | SMN (SEP) | Security Engine Processor |
| 0x1001_0000_0000 - 0x1001_0FFF_FFFF | SMN (SMC) | System Management Controller |
| 0x1800_0000_0000 - 0x1800_0FFF_FFFF | SMN (MSI) | MSI Relay in PCIe Tile |
| 0x1802_0000_0000 - 0x1802_0FFF_FFFF | SMN (PCIe Err) | PCIe Tile error registers |
| 0x1804_0000_0000 - 0x1804_0FFF_FFFF | SMN (TLB) | PCIe Tile TLB configurations |
| 0x2000_0000_0000 - 0x2000_00FF_FFFF | HSIO | HSIO tile 0 (CCE, Ethernet, SRAM) |
| 0x2001_0000_0000 - 0x2001_00FF_FFFF | HSIO | HSIO tile 1 (CCE, Ethernet, SRAM) |

6.2 PCIe BAR (Base Address Register) Mapping

The PCIe Tile exposes multiple BARs to the host:

| BAR | Size | Type | Purpose |
|---|---|---|---|
| BAR0 | 256 MB | Memory, 64-bit | Main data path (DMA to/from Quasar) |
| BAR2 | 16 MB | Memory, 64-bit | Configuration space (SMC mailboxes, TLB programming) |
| BAR4 | 64 KB | Memory, 64-bit | MSI-X table |

BAR0 Inbound TLB Mapping Example:

  • Host writes to BAR0 + 0x1000_0000 (256MB offset)

  • Inbound TLB Entry 1 (Application region 1):

    • Input Range: 0x1000_0000 - 0x1FFF_FFFF (256MB)

    • Output Base: 0x0000_0000_4000_0000 (NOC address for Quasar chiplet 1)

  • Translated Address: 0x0000_0000_4000_0000 (sent to NOC-N)
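The translation above is a base-and-offset rewrite. A minimal sketch using the numbers from this example (the dict layout is illustrative, not the hardware register format):

```python
# Inbound TLB entry 1 from the BAR0 example above (illustrative model).
entry1 = {
    "input_base":  0x1000_0000,            # BAR0 offset range start
    "size":        0x1000_0000,            # 256 MB window
    "output_base": 0x0000_0000_4000_0000,  # NOC address, Quasar chiplet 1
}

def translate_inbound(bar_offset: int, e: dict) -> int:
    """Rewrite a BAR0 offset into a system (NOC) address."""
    if not (e["input_base"] <= bar_offset < e["input_base"] + e["size"]):
        raise ValueError("offset misses this TLB entry")
    return e["output_base"] + (bar_offset - e["input_base"])
```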

6.3 Address Translation Stages

graph LR A[Host Addr] B[PCIe TLP] C[Inbound TLB] D[System Addr] E[NOC Routing] F[D2D Translate] G[Quasar Addr] A --> B B --> C C --> D D --> E E --> F F --> G style C fill:#ffe6cc style D fill:#cce5ff style E fill:#e6ccff

7. System Use Cases

7.1 Use Case 1: Model Initialization

Objective: Load a 10GB large language model from host to distributed Quasar memory.

Flow:

  1. Host driver programs PCIe Tile Inbound TLBs (8 entries for 8 memory regions)

  2. Host DMA engine streams model weights via PCIe Memory Writes

  3. PCIe Tile translates addresses and routes to NOC-N

  4. NOC-N distributes data across multiple Quasar chiplets via D2D links

  5. Quasar chiplets store weights in local L1/DRAM

Performance:

  • PCIe Gen5 x16: ~64 GB/s theoretical, ~50 GB/s effective

  • Load time: 10GB / 50 GB/s = 200ms
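The load-time estimate follows directly from the effective bandwidth:

```python
# Worked arithmetic for the load-time estimate above.
model_bytes    = 10 * 10**9    # 10 GB model
effective_bw   = 50 * 10**9    # ~50 GB/s effective PCIe Gen5 x16
load_time_ms   = model_bytes / effective_bw * 1000
# 10 GB / 50 GB/s = 0.2 s = 200 ms
```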

7.2 Use Case 2: Inference Execution

Objective: Run inference on Quasar chiplets, stream results back to host.

Flow:

  1. Host sends inference request descriptor via PCIe write (small payload: 256 bytes)

  2. Quasar chiplets execute inference using cached model weights

  3. Quasar writes results to host DRAM via outbound TLB (PCIe Memory Write)

  4. Quasar triggers MSI-X interrupt via SMN → MSI Relay → PCIe

  5. Host driver processes results

Latency:

  • Request descriptor: ~1μs (PCIe TLP overhead)

  • Inference execution: Variable (model-dependent)

  • Result transfer (1MB): 1MB / 50 GB/s = 20μs

  • MSI interrupt latency: ~2μs

7.3 Use Case 3: Package-to-Package Communication

Objective: Enable Quasar chiplets in Package 0 to communicate with Package 1 over Ethernet.

Flow (Keraunos Ethernet-based):

  1. Quasar in Package 0 writes data to HSIO SRAM via NOC-N

  2. CCE in HSIO tile prepares Ethernet packet

  3. TT Ethernet Controller sends packet via 800G Ethernet to Package 1

  4. Package 1 Ethernet Controller receives packet, writes to local HSIO SRAM

  5. Local NOC-N forwards data to destination Quasar

Alternative Flow (PCIe-based, for same-host deployments):

  1. Quasar in Package 0 writes to host DRAM via PCIe Tile (outbound)

  2. Package 1 PCIe Tile reads from host DRAM (inbound)

  3. Forwarded to Package 1 Quasar via NOC-N

7.4 Use Case 4: System Management

Objective: SMC monitors PCIe link status and reconfigures TLBs dynamically.

Flow:

  1. SMC reads PCIe link status registers via SMN (0x1802_0xxx)

  2. Detects link degradation (Gen5 x16 → Gen5 x8)

  3. SMC reprograms TLB entries to reduce traffic load

  4. SMC triggers software notification via MSI-X

  5. Host driver adjusts DMA batch sizes


8. Final VDK Platform: Linux-Booting PCIe Tile Integration

8.1 Overview

The final validated VDK platform demonstrates a complete end-to-end PCIe data path with Linux running on the host. The platform:

  • Boots RISC-V Linux on the Host_Chiplet via OpenSBI (fw_payload.elf)

  • Enumerates the PCIe Endpoint using the Linux snps,dw-pcie driver

  • Transfers data from the host through the PCIe complex to memory attached to the PCIe Tile’s noc_n_initiator port

  • Runs a userspace application (pcie_xfer) for interactive read/write operations through the PCIe BAR

This section documents the final validated architecture as implemented in the reference workspace.

8.2 Dual-Chiplet VDK Topology

The platform consists of two chiplet groups connected via a direct PCIe link:

graph TB subgraph host["Host_Chiplet (Root Complex Side)"] HOST_CPU[TT_Rocket_LT RISC-V CPU<br/>Runs Linux via OpenSBI] HOST_DRAM[DRAM @ 0x80000000<br/>256 MB] HOST_UART[UART @ 0xC000A000] HOST_PLIC[PLIC @ 0xC4000000] HOST_CLINT[CLINT @ 0x02000000] HOST_RC[PCIE_RC<br/>Synopsys DWC PCIe 2.0<br/>Root Complex] HOST_SMM[SharedMemoryMap] HOST_RST[RST_GEN / CLK_GEN] HOST_CPU --> HOST_SMM HOST_SMM --> HOST_DRAM HOST_SMM --> HOST_UART HOST_SMM --> HOST_PLIC HOST_SMM --> HOST_CLINT HOST_SMM -->|DBI: 0x44000000| HOST_RC HOST_SMM -->|AXI: 0x70000000| HOST_RC HOST_RC --- HOST_RST end subgraph device["Keraunos_PCIE_Chiplet (Endpoint Side)"] DEV_CPU[TT_Rocket_LT RISC-V CPU<br/>Runs pcie_bringup firmware] DEV_EP[PCIe_EP<br/>Synopsys DWC PCIe 2.0<br/>Endpoint] DEV_TILE[PCIE_TILE<br/>Keraunos PCIe Tile] DEV_MEM[Target_Memory<br/>16 MB @ 0x0] DEV_SMM[SharedMemoryMap] DEV_RST[RST_GEN / CLK_GEN] DEV_CPU --> DEV_SMM DEV_EP -->|BusMaster| DEV_TILE DEV_TILE -->|noc_n_initiator| DEV_SMM DEV_TILE -->|smn_n_initiator| DEV_SMM DEV_SMM --> DEV_MEM DEV_EP --- DEV_RST DEV_TILE --- DEV_RST end HOST_RC <-->|PCIMem / PCIMem_Slave<br/>Direct PCIe Link| DEV_EP style HOST_CPU fill:#e3f2fd style HOST_RC fill:#ffcccc,stroke:#c00 style DEV_EP fill:#fff3e0 style DEV_TILE fill:#c8e6c9 style DEV_MEM fill:#ffe6cc style HOST_DRAM fill:#e1ffe1

Key architectural decisions in the final platform:

  1. Direct RC–EP link — RC’s PCIMem binds to EP’s PCIMem_Slave and vice versa

  2. Two independent RISC-V CPUs — Host runs Linux; Device runs bare-metal firmware

  3. Target_Memory on noc_n_initiator path — 16 MB memory at address 0x0 on the chiplet bus, reachable from the host through EP → PCIE_TILE → noc_n_initiator → SharedMemoryMap → Target_Memory

  4. MSI interrupt — RC’s msi_ctrl_int connected to Host SMC’s irqS[11] for PCIe MSI-to-host notification

8.3 Host Memory Map

The Host_Chiplet CPU sees the following address space:

| Address | Size | Component | Purpose |
|---|---|---|---|
| 0x02000000 | 64 KB | CLINT | Timer and software interrupts |
| 0x44000000 | 4 MB | PCIE_RC DBI | PCIe RC configuration (DBI registers) |
| 0x44300000 | 128 KB | PCIE_RC ATU | iATU outbound/inbound windows (via DBI CS2) |
| 0x70000000 | 256 MB | PCIE_RC AXI_Slave | PCIe config + memory window |
| 0x70000000 | 16 MB | — Config sub-window | Type 0/1 config TLPs via iATU |
| 0x71000000 | 240 MB | — MEM sub-window | Memory TLPs to EP BARs |
| 0x80000000 | 256 MB | DRAM | Host main memory (Linux runs here) |
| 0xC000A000 | 256 B | UART | DW APB UART (115.2 MHz clock) |
| 0xC4000000 | 2 MB | PLIC | Platform-Level Interrupt Controller |

8.4 Device (Keraunos_PCIE_Chiplet) Memory Map

The Keraunos_PCIE_Chiplet SharedMemoryMap provides the following decode for all initiators (EP BusMaster, PCIE_TILE noc_n/smn_n, SMC_Configure CPU):

| Address | Size | Component | Purpose |
|---|---|---|---|
| 0x00000000 | 16 MB | Target_Memory | Main data memory (host-accessible via PCIe BAR) |
| 0x18000000 | 8 MB | PCIE_TILE smn_n_target | SMN-side target window into the tile |
| 0x44000000 | 4 MB | PCIe_EP AXI_DBI | EP DBI configuration registers |
| 0x44400000 | 16 MB | PCIE_TILE noc_n_target | NoC-side target window into the tile |

8.5 End-to-End Data Path

The critical data path for host-to-device memory transfers traverses:

graph LR subgraph host["Host (Linux)"] APP[pcie_xfer app] CPU[RISC-V CPU] APP --> CPU end subgraph rc["Root Complex"] AXI_S[AXI_Slave<br/>0x70000000] iATU[iATU<br/>Address Translation] PCIMEM[PCIMem] AXI_S --> iATU iATU --> PCIMEM end subgraph link["PCIe Link"] TLP[Memory TLP] end subgraph ep["Endpoint"] PCIMEM_S[PCIMem_Slave] BUSM[BusMaster] PCIMEM_S --> BUSM end subgraph tile["PCIE_TILE"] PCT[pcie_controller_target] NOC_I[noc_n_initiator] PCT --> NOC_I end subgraph mem["Target Memory"] MEM[16 MB @ 0x0] end CPU -->|MMIO Write| AXI_S PCIMEM -->|TLP| TLP TLP --> PCIMEM_S BUSM --> PCT NOC_I --> MEM style APP fill:#e3f2fd style iATU fill:#ffe6cc style TLP fill:#e3f2fd style PCT fill:#c8e6c9 style NOC_I fill:#c8e6c9 style MEM fill:#ffe6cc

Step-by-step flow:

  1. Host application (pcie_xfer) performs MMIO write to BAR0 (mapped via sysfs resource0 or /dev/mem)

  2. CPU issues a store to the PCIe MEM window (e.g. 0x71000000 + offset)

  3. RC AXI_Slave receives the transaction at 0x70000000

  4. iATU translates the address to a Memory Write TLP targeting the EP

  5. RC PCIMem sends the TLP over the virtual PCIe link

  6. EP PCIMem_Slave receives the TLP

  7. EP BusMaster forwards the decoded transaction to PCIE_TILE.pcie_controller_target

  8. PCIE_TILE routes the transaction through internal fabric to noc_n_initiator

  9. noc_n_initiator accesses the chiplet SharedMemoryMap, which decodes to Target_Memory at address 0x0

  10. Target_Memory completes the write; the response propagates back through the same path
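Steps 2–4 amount to a window-based address rewrite in the RC's iATU. A simplified model: the window layout is taken from Section 8.3, while the PCIe-side base is an assumption (identity-mapped here, since Linux typically assigns BAR0 inside the MEM window; the actual value is whatever the kernel assigned):

```python
# Simplified iATU outbound translation for the MEM sub-window (Section 8.3).
MEM_WINDOW_BASE = 0x7100_0000   # CPU-side base of the 240 MB MEM window
MEM_WINDOW_SIZE = 0x0F00_0000
BAR0_PCIE_BASE  = 0x7100_0000   # ASSUMED identity mapping to EP BAR0

def iatu_outbound(cpu_addr: int) -> int:
    """Translate a CPU store address into the PCIe Memory TLP address."""
    if not (MEM_WINDOW_BASE <= cpu_addr < MEM_WINDOW_BASE + MEM_WINDOW_SIZE):
        raise ValueError("address outside the MEM sub-window")
    return BAR0_PCIE_BASE + (cpu_addr - MEM_WINDOW_BASE)
```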

8.6 Linux Boot Flow

sequenceDiagram participant VP as Virtualizer participant HOST as Host_Chiplet CPU participant SBI as OpenSBI (M-mode) participant LNX as Linux Kernel participant DRV as dw-pcie Driver participant EP as PCIe Endpoint VP->>HOST: Load fw_payload.elf (entry 0x80000000) VP->>EP: Load pcie_bringup.elf (device CPU) HOST->>SBI: CPU starts at 0x80000000 SBI->>SBI: Initialize M-mode, set up S-mode SBI->>LNX: Jump to kernel (S-mode) LNX->>LNX: Parse device tree (keraunos_host.dtb) LNX->>LNX: Initialize CLINT, PLIC, UART LNX->>DRV: Probe snps,dw-pcie (DBI @ 0x44000000) DRV->>DRV: Program iATU outbound windows DRV->>EP: Type 0 Config Read (bus 1, dev 0) EP-->>DRV: Return Vendor/Device ID DRV->>DRV: Enumerate EP, assign BARs DRV->>LNX: PCIe subsystem ready LNX->>LNX: Boot initramfs (rdinit=/init) LNX->>LNX: Shell prompt available

Boot details:

  • OpenSBI (fw_payload.elf) is loaded with ELF segment addresses (entry at 0x80000000). The VP must not override the load address to 0x0.

  • No U-Boot — OpenSBI directly boots the embedded Linux kernel.

  • Device tree (keraunos_host.dts) specifies the snps,dw-pcie compatible node with DBI at 0x44000000 and config window at 0x70000000.

  • Boot arguments: console=hvc0 earlycon=sbi rdinit=/init pci=realloc pci=assign-busses pci=noaer pcie_aspm=off

  • The pci=realloc and pci=assign-busses flags are critical for the VP where BIOS/firmware has not pre-assigned PCI resources.

  • Host CPU ISA: rv64imac (no FPU) — kernel built with CONFIG_FPU=n, userspace uses musl rv64imac toolchain.

8.7 PCIe Enumeration

The Linux dw-pcie driver enumerates the endpoint:

| Property | Value |
|---|---|
| Bus topology | Bus 0: RC; Bus 1: EP (direct link) |
| Config access | iATU-programmed window at 0x70000000 (16 MB), no ECAM |
| MEM window | 0x71000000–0x7FFFFFFF (240 MB, prefetchable) |
| EP BAR0 | 16 MB memory BAR (BAR0_MASK = 0xFFFFFF) |
| MSI | RC msi_ctrl_int → Host PLIC IRQ 32; INTx → PLIC IRQ 33 |
| Lanes | 4 (VP config; physical Keraunos uses x16) |

The device tree PCIe node:

pcie@44000000 {
    compatible = "snps,dw-pcie";
    reg = <0x0 0x44000000 0x0 0x400000>,    /* DBI */
          <0x0 0x70000000 0x0 0x01000000>;   /* config */
    bus-range = <0x0 0x1>;
    ranges = <0x02000000 0x0 0x71000000
              0x0 0x71000000 0x0 0x0F000000>; /* 240 MB MEM */
    interrupts = <32>, <33>;                  /* MSI, INTx */
};

8.8 The pcie_xfer Application

pcie_xfer is a Linux userspace utility that demonstrates the complete host-to-tile data path. It maps EP BAR0 via sysfs and performs MMIO read/write operations.

Capabilities:

| Command | Description |
|---|---|
| `write <offset> <value>` | 32-bit MMIO write at BAR0 + offset |
| `read <offset>` | 32-bit MMIO read at BAR0 + offset |
| `fill <offset> <count> <val>` | Fill count dwords with a value |
| `dump <offset> <count>` | Hex dump count dwords |
| `pattern <offset> <count>` | Write incrementing pattern and verify readback |
| `burst <offset> <count>` | Timed burst write for throughput measurement |
| `verify <offset> <count> <val>` | Verify count dwords match expected value |
| `send <offset> <file>` | Write binary file contents to BAR0 |

Data path confirmed by pcie_xfer:

Host CPU → RC AXI_Slave → iATU → PCIe TLP
         → EP PCIMem_Slave → EP BusMaster
         → PCIE_TILE.pcie_controller_target
         → noc_n_initiator → [Target_Memory]

Usage example:

# Auto-detect EP and enter interactive mode
pcie_xfer

# Write 0xDEADBEEF at BAR0 offset 0x100
pcie_xfer -c "write 0x100 0xDEADBEEF"

# Read back and verify
pcie_xfer -c "read 0x100"

# Write incrementing pattern (256 dwords) and verify
pcie_xfer -c "pattern 0x0 0x100"
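The pattern command's write-then-verify loop can be modeled over a plain buffer standing in for BAR0 — a behavioral sketch, not the actual pcie_xfer source:

```python
import struct

def pattern(bar0: bytearray, offset: int, count: int) -> bool:
    """Write an incrementing dword pattern, then read back and verify.

    `bar0` stands in for the mmap'd BAR0 region; on real hardware each
    pack_into/unpack_from would be an MMIO write/read through the tile.
    """
    for i in range(count):
        struct.pack_into("<I", bar0, offset + 4 * i, i & 0xFFFF_FFFF)
    return all(
        struct.unpack_from("<I", bar0, offset + 4 * i)[0] == (i & 0xFFFF_FFFF)
        for i in range(count)
    )
```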

8.9 Device-Side Firmware (pcie_bringup)

The Keraunos_PCIE_Chiplet runs pcie_bringup.elf on its TT_Rocket_LT CPU (SMC_Configure). This bare-metal firmware:

  1. Initializes the PCIe Endpoint controller via DBI registers at 0x44000000

  2. Programs Inbound TLB entries for BAR address translation

  3. Sets BAR sizes (BAR0_MASK = 0xFFFFFF for 16 MB)

  4. Asserts system_ready to signal the EP is ready for host enumeration

  5. Waits for PCIe link-up before the host attempts configuration reads

The device CPU and host CPU boot independently; the VP configuration ensures both start with active_at_start = true on their respective RST_GEN.
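The BAR0_MASK value in step 3 is simply the BAR size minus one, the standard encoding for a power-of-two BAR. A one-line check:

```python
def bar_mask(size_bytes: int) -> int:
    """BAR mask for a power-of-two BAR size: size - 1."""
    assert size_bytes and (size_bytes & (size_bytes - 1)) == 0, "power of two"
    return size_bytes - 1

# 16 MB BAR0 -> 0xFFFFFF, matching BAR0_MASK programmed by the firmware.
```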

8.10 VP Configuration

Two VP configurations are available:

| Config | Host Image | Device Image | Quantum | Use Case |
|---|---|---|---|---|
| default/default | riscv64-linux/output/fw_payload.elf | pcie_bringup/pcie_bringup.elf | 6000 ps | Full Linux with standard kernel |
| mini_riscv64_linux/mini_riscv64_linux | mini-riscv64-linux/output/fw_payload.elf | pcie_bringup/pcie_bringup.elf | 1000 ps | Fast-boot mini Linux for rapid iteration |

Key VPCFG overrides (both configs):

  • PCIE_RC / PCIe_EP clocks: cc_pipe_clk at 250 MHz, all AXI/DBI/aux clocks at 100 MHz

  • SHARED_DBI_ENABLED: false (separate DBI window)

  • UART_CLK: 115.2 MHz

  • Chiplet SharedMemoryMap decode: 0x44000000:0x00400000:s;0x18000000:0x00800000;0x44400000:0x01000000;0x0:0x1000000
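The decode string packs semicolon-separated base:size pairs with an optional third field. A small parser — the format is inferred from the string itself, and the meaning of the trailing `s` flag is deliberately left opaque here:

```python
def parse_decode(spec: str):
    """Parse 'base:size[:flag];...' into (base, size, flag) tuples.

    Format inferred from the VPCFG decode string above; the flag
    field's semantics are not interpreted."""
    out = []
    for part in spec.split(";"):
        fields = part.split(":")
        base, size = int(fields[0], 16), int(fields[1], 16)
        flag = fields[2] if len(fields) > 2 else None
        out.append((base, size, flag))
    return out

decode = parse_decode(
    "0x44000000:0x00400000:s;0x18000000:0x00800000;"
    "0x44400000:0x01000000;0x0:0x1000000"
)
```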

8.11 Sideband Connections in the Final Platform

The final VDK connects the following sideband signals between the PCIe models and the PCIE_TILE:

EP → PCIE_TILE:

| Signal | Source | Destination |
|---|---|---|
| edma_int / edma_int_* | EP DMA interrupts | pcie_misc_int on PCIE_TILE |
| pcie_parc_int | EP parity/RAS error | pcie_ras_error on PCIE_TILE |
| lbc_cii_hv, lbc_cii_hdr_type, lbc_cii_hdr_addr | EP CII signals | pcie_cii_* on PCIE_TILE |
| cfg_flr_pf_active_x[0] | EP FLR request | pcie_flr_request on PCIE_TILE |

System → PCIE_TILE:

| Signal | Source | Destination |
|---|---|---|
| Clock | Chiplet CLK_GEN | PCIE_TILE clock input |
| Reset | Chiplet RST_GEN | PCIE_TILE reset input |

Host MSI path: PCIE_RC.msi_ctrl_int → Host_Chiplet.SMC.irqS[11]


9. Appendices

9.1 Acronyms and Abbreviations

| Term | Definition |
|---|---|
| AXI | Advanced eXtensible Interface (ARM AMBA standard) |
| BAR | Base Address Register (PCIe configuration space) |
| BoW | Bridge-of-Wire (die-to-die interconnect technology) |
| CCE | Keraunos Compute Engine (DMA and packet processing) |
| D2D | Die-to-Die (chiplet interconnect interface) |
| DMA | Direct Memory Access |
| HSIO | High-Speed Input/Output (Ethernet subsystem in Keraunos) |
| ISR | Interrupt Service Routine |
| MAC | Media Access Control (Ethernet layer) |
| MSI | Message Signaled Interrupt (PCIe interrupt mechanism) |
| NOC | Network-on-Chip |
| PCS | Physical Coding Sublayer (Ethernet layer) |
| QNP | Quasar NOC Protocol (internal NOC protocol) |
| RISC-V | Reduced Instruction Set Computer, fifth generation (open ISA) |
| SCML2 | SystemC Modeling Library 2 (Synopsys verification library) |
| SEP | Security Engine Processor |
| SMC | System Management Controller |
| SMN | System Management Network (control plane NOC) |
| SMU | System Management Unit (clock/power/reset control) |
| SRAM | Static Random-Access Memory |
| TLB | Translation Lookaside Buffer (address translation cache) |
| TLP | Transaction Layer Packet (PCIe protocol) |
| TLM | Transaction-Level Modeling (SystemC abstraction) |

9.2 Reference Documents

  1. Keraunos-E100 Architecture Specification (keraunos-e100-for-review.pdf, v0.9.14)

  2. Keraunos PCIe Tile High-Level Design (Keraunos_PCIe_Tile_HLD.md, v2.0)

  3. Keraunos PCIe Tile SystemC Design Document (Keraunos_PCIE_Tile_SystemC_Design_Document.md, v2.1)

  4. Keraunos PCIe Tile Test Plan (Keraunos_PCIE_Tile_Testplan.md, v2.1)

  5. PCIe Base Specification 5.0 (PCI-SIG, 2019)

  6. AMBA AXI and ACE Protocol Specification (ARM IHI 0022E)

  7. SystemC TLM-2.0 Language Reference Manual (IEEE 1666-2011)

9.3 Revision History

| Version | Date | Author | Description |
|---|---|---|---|
| 1.0 | 2026-02-10 | System Architecture Team | Initial release |
| 2.0 | 2026-03-26 | System Architecture Team | Added Section 8: Final VDK Platform with Linux boot, PCIe enumeration, pcie_xfer application, dual-chiplet topology, and end-to-end data path through noc_n_initiator to Target_Memory |


Document Control:

  • Classification: Internal Use Only

  • Distribution: Keraunos Project Team, Grendel Architecture Team

  • Review Cycle: Quarterly or upon major architecture changes


End of Document