5th April, 2005 JEM FDR 1
JEM FDR: Design and Implementation
JEP system requirements
Architecture
  Modularity
  Data formats
  Data flow
Challenges: latency, connectivity, high-speed data paths
JEM revisions
JEM 1.1 - implementation details
  Daughter modules
  Energy sum algorithms
  FPGA resource use
  Performance
  Production tests
JEP system requirements
Process the -4.9 < η < 4.9 region
  ~32×32×2 = 2k trigger towers of Δη×Δφ = 0.2×0.2
  9-bit input data (0-511 GeV)
  32×32 10-bit "jet elements" after em/had pre-sum
2 multiplications per jet element: ET → (EX, EY)
3 adder trees spanning the JEP (JEMs, CMMs)
Sliding window jet algorithm, variable window size within 3×3 environment
Output data to CTP
  Thresholded total ET, missing ET
  Jet hit count
Output data to RODs
  Intermediate results, mainly captured from module boundaries
  RoI data for RoIB
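The two multiplications per jet element can be read as projecting ET onto the x and y axes. The following is a minimal Python sketch, assuming the usual projection EX = ET·cos φ, EY = ET·sin φ evaluated at the φ bin centre, with results expressed in 0.25 GeV steps. The function name, the bin numbering and the rounding are illustrative assumptions, not taken from the JEM firmware.

```python
import math

N_PHI_BINS = 32        # 32 jet-element bins in phi (from the slide)
EXY_LSB_GEV = 0.25     # assumed output step for EX / EY


def project_et(et_gev, phi_bin):
    """Return (EX, EY) in units of EXY_LSB_GEV for one jet element.

    Assumption: phi is taken at the centre of the bin, and bins are
    numbered 0..31 starting at phi = 0. Both are illustrative choices.
    """
    phi = (phi_bin + 0.5) * 2.0 * math.pi / N_PHI_BINS
    ex = round(et_gev * math.cos(phi) / EXY_LSB_GEV)   # first multiplication
    ey = round(et_gev * math.sin(phi) / EXY_LSB_GEV)   # second multiplication
    return ex, ey


# Tower count quoted on the slide: 32 x 32 bins, em and had layers.
N_TRIGGER_TOWERS = 32 * 32 * 2

ex, ey = project_et(100, 0)
print(ex * EXY_LSB_GEV, ey * EXY_LSB_GEV)
```

In hardware the cosine and sine factors would typically be constant coefficients per φ bin, which is why only two multiplications per element are needed.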
JEP system design considerations
Moderate data processing power
Tough latency requirements
Large number of signals to be processed → partition into parallel operating modules
Algorithm requires the environment of each jet element → high-bandwidth inter-module lanes
Data concentrator functionality, many → few
Severely pin-bound design, dominated by input connectivity
  Modules
  Processors (FPGAs)
Benefit from similarities to cluster processor
  Common infrastructure (backplane)
  Common serial link technology
System modularity
Two crates, each processing two quadrants in φ
  32 × 8 bins (jet elements) per quadrant
η range split over 8 JEMs
  4 × 8 jet elements per JEM
Four input processors per JEM
Single jet processor per JEM
Single sum processor per JEM
[Figure: crate and quadrant layout; 8 JEMs per quadrant, quadrants 0-3, η running from - to +]
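The modularity above amounts to a simple index mapping from a jet element to its quadrant and JEM. The following is a minimal sketch, assuming bins are numbered from 0 and that quadrants and JEMs are numbered in increasing φ and η respectively; the real numbering and the assignment of quadrants to crates are not given here, so the function and its conventions are hypothetical.

```python
N_ETA_BINS = 32
N_PHI_BINS = 32
PHI_BINS_PER_QUADRANT = 8   # 32 x 8 jet elements per quadrant
ETA_BINS_PER_JEM = 4        # 4 x 8 jet elements per JEM


def locate_jet_element(eta_bin, phi_bin):
    """Map a global (eta_bin, phi_bin) to (quadrant, jem, local_eta, local_phi).

    Assumed convention: quadrant = phi block of 8 bins, jem = eta block of 4 bins.
    """
    if not (0 <= eta_bin < N_ETA_BINS and 0 <= phi_bin < N_PHI_BINS):
        raise ValueError("bin index out of range")
    quadrant, local_phi = divmod(phi_bin, PHI_BINS_PER_QUADRANT)
    jem, local_eta = divmod(eta_bin, ETA_BINS_PER_JEM)
    return quadrant, jem, local_eta, local_phi


# Every (quadrant, jem) pair should own exactly 4 x 8 = 32 core elements.
owners = {}
for eta in range(N_ETA_BINS):
    for phi in range(N_PHI_BINS):
        q, j, _, _ = locate_jet_element(eta, phi)
        owners[(q, j)] = owners.get((q, j), 0) + 1
```

With these conventions the 1024 jet elements are shared among 4 × 8 = 32 JEMs, 16 per crate.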
Replication of environment elements (system and crate level)
JEM has 32 core algorithm cells
  4 × 8 jet elements
  Directly mapped: 4 PPMs (e, h) → 1 JEM
JEM operates on a total of 77 jet elements including 'environment': 7 × 11
Replication in φ via multiple copies of PPM output data
Replication in η via backplane fan-out
[Figure: PPM (em) outputs feeding the JEM, with backplane fan-out; axes η and φ]
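The element counts above can be checked with a few lines of arithmetic. The sketch below assumes that the three extra η columns arrive over the backplane and that the remaining 4 × 11 elements arrive directly from the PPMs in two layers (em, had); that split is an inference from the numbers in these slides, not a statement of the actual wiring.

```python
CORE_ETA, CORE_PHI = 4, 8        # core algorithm cells per JEM
WINDOW_ETA, WINDOW_PHI = 7, 11   # core plus environment

core_elements = CORE_ETA * CORE_PHI                  # 32
total_elements = WINDOW_ETA * WINDOW_PHI             # 77
environment_elements = total_elements - core_elements

# Assumed split: extra eta columns come from neighbouring JEMs via the backplane,
# everything else (including the extra phi rows) comes in on the input cables.
via_backplane = (WINDOW_ETA - CORE_ETA) * WINDOW_PHI  # 3 columns x 11 rows
via_cables = CORE_ETA * WINDOW_PHI                    # 4 columns x 11 rows
input_channels = via_cables * 2                       # em and had layers

print(core_elements, total_elements, environment_elements)
print(via_backplane, via_cables, input_channels)
```

Under these assumptions the cable-fed elements come to 88 channels, which agrees with the 88 differential inputs per JEM quoted elsewhere in the talk.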
JEM data formats - real-time data
JEM inputs from PPM:
  Physical layer: LVDS, 10 bits, 12-bit encoded with start/stop bit
  D0: odd parity bit
  D(9:1): 9-bit data, D1 = LSB = 1 GeV
Jet elements to jet processor:
  No parity bit
  D(9:0): 10-bit data, D0 = LSB = 1 GeV
  10 data bits muxed to 5 lines, least significant first
Energy sums to sum processor:
  No parity bit
  ET(11:0): 12-bit data, D0 = LSB = 1 GeV
  EX(13:0): 14-bit data, D0 = LSB = 0.25 GeV
  EY(13:0): 14-bit data, D0 = LSB = 0.25 GeV
JEM output to CMM:
  J(23:0): 8 × 3-bit saturating jet hits sent on bottom port
  J24: odd parity bit
  S(23:0): 3 × 8-bit quad-linear encoded energy sums on top port
    6-bit energy
    2-bit range
    Resolution 1 GeV, 4 GeV, 16 GeV, 64 GeV
  S24: odd parity bit
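The 8-bit quad-linear format (6-bit energy, 2-bit range, steps of 1, 4, 16 and 64 GeV) and the odd parity bit can be illustrated with a short sketch. Assumptions made here and not stated in the slides: the range field sits in the two most significant bits, values are truncated rather than rounded, and overflow saturates at the largest code. All function names are hypothetical.

```python
RANGE_STEPS = (1, 4, 16, 64)   # resolution per range, in input units
MANTISSA_MAX = 63              # 6-bit energy field


def quad_linear_encode(value):
    """Compress a non-negative integer into an 8-bit (range, energy) code."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for rng, step in enumerate(RANGE_STEPS):
        mantissa = value // step            # truncation (assumed)
        if mantissa <= MANTISSA_MAX:
            return (rng << 6) | mantissa    # range in bits 7:6 (assumed layout)
    return (3 << 6) | MANTISSA_MAX          # saturate on overflow


def quad_linear_decode(code):
    """Expand an 8-bit code back to the lower edge of its energy step."""
    rng = (code >> 6) & 0x3
    mantissa = code & 0x3F
    return mantissa * RANGE_STEPS[rng]


def odd_parity_bit(word):
    """Parity bit that makes the total number of set bits odd."""
    ones = bin(word).count("1")
    return 0 if ones % 2 == 1 else 1


for energy in (63, 64, 100, 257, 5000):
    code = quad_linear_encode(energy)
    print(energy, code, quad_linear_decode(code))
```

The largest representable value in this scheme is 63 × 64 = 4032, so anything above that decodes to 4032.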
JEM data formats - readout
Physical layer: 16 bits, 20-bit encoded (CIMT, alternating flag bit, fill-frames 1A/1B, HDMP1022 format)
Event separator: minimum of 1 fill-frame sent after each event's worth of data
All data streams odd parity protected (serial parity)
DAQ readout: 67-long stream per L1A / slice being read out
  Input data on D(14:0): 11 bits per channel (9 bits data, 1 bit parity error, 1 bit link error)
  12-bit Bcnum & 25-bit sum & 25-bit jet hits on D15
RoI readout: 45-long stream per L1A
  D(1:0): total of 8 RoIs
    2 bits location & saturation flag & 8 bits threshold passed
  D2: 12 bits Bcnum
  D(4:3): used on FCAL JEMs only (forward jets)
  D(15:5): always zero
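The quoted stream lengths can be reproduced with simple arithmetic under one plausible reading of the format. The sketch below assumes a 1-bit saturation flag, one trailing serial parity bit per stream, four RoIs per data line and six input channels per data line; none of these groupings is spelled out in the slides, so treat this as a consistency check rather than the frame definition.

```python
PARITY_BITS = 1                      # assumed: one serial parity bit per stream

# RoI stream: 8 RoIs shared between the two lines D(1:0)
ROI_BITS = 2 + 1 + 8                 # location + saturation flag (assumed 1 bit) + thresholds
ROIS_PER_LINE = 8 // 2
roi_stream_length = ROIS_PER_LINE * ROI_BITS + PARITY_BITS

# DAQ stream: input channels shared between the fifteen lines D(14:0)
DAQ_BITS_PER_CHANNEL = 9 + 1 + 1     # data + parity error + link error
CHANNELS_PER_LINE = 6                # assumed grouping
daq_stream_length = CHANNELS_PER_LINE * DAQ_BITS_PER_CHANNEL + PARITY_BITS
daq_channel_slots = 15 * CHANNELS_PER_LINE

# D15 carries Bcnum, sums and jet hits
d15_payload = 12 + 25 + 25

print(roi_stream_length, daq_stream_length, daq_channel_slots, d15_payload)
```

With these assumptions the RoI stream comes out at 45 bits and the DAQ stream at 67 bits, with 90 channel slots available for the 88 inputs and a 62-bit payload on D15 that fits within the stream.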
JEM data flow
[Figure: block diagram with LVDS deserialiser, input processor, jet processor + readout controller, sum processor + readout controller, link PHYs and outputs to CMM. Labelled rates: 400 Mbit/s serial data (480 Mbit/s with protocol); 40 MHz parallel; 80 Mb/s; 40 Mb/s parallel; 640 Mbit/s serial data (800 Mbit/s with protocol); "not synchronous to bunch clock"]
Multiple protocols, data speeds and signalling levels used throughout the board
Multiplexing up and down takes a considerable fraction of the latency budget
Re-synchronisation of data generally required on each chip and board boundary
  FIFO buffers
  Phase adjustment with firmware-based detection
  Delay scans
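The serial rates in the data flow follow from the word sizes and a 40 MHz word rate. The sketch below assumes one word per bunch-clock tick and a nominal 40 MHz clock, as used throughout these slides; the helper names are illustrative.

```python
BUNCH_CLOCK_MHZ = 40   # nominal value used in the slides


def serial_rate_mbit(bits_per_word, clock_mhz=BUNCH_CLOCK_MHZ):
    """Line rate in Mbit/s when one word is sent per clock tick."""
    return bits_per_word * clock_mhz


def muxed_line_rate_mbit(data_bits, lines, clock_mhz=BUNCH_CLOCK_MHZ):
    """Per-line rate when data_bits are time-multiplexed onto fewer lines."""
    return data_bits * clock_mhz // lines


lvds_payload = serial_rate_mbit(10)       # 10 data bits per tick
lvds_line = serial_rate_mbit(12)          # 12 bits including start/stop
readout_payload = serial_rate_mbit(16)    # 16-bit readout words
readout_line = serial_rate_mbit(20)       # 20-bit CIMT-encoded frames
jet_element_line = muxed_line_rate_mbit(10, 5)  # 10 bits muxed onto 5 lines

print(lvds_payload, lvds_line, readout_payload, readout_line, jet_element_line)
```

This reproduces the 400 (480) Mbit/s input figure, the 640 (800) Mbit/s readout figure and the 80 Mb/s jet element lines.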
Challenges: latency & connectivity
Latency budget for energy sum processor: 18.5 ticks (TDR)
  Input cables: ~2 ticks
  CMM: ~5 ticks
  Transmission to CTP: <2 ticks
  → ~9.5 ticks available on JEM from cable connector to backplane outputs to CMM
Module dimensions imposed by use of common backplane
  Large module: 9U × 40 cm
  Full height of backplane used for data transmission due to high signal count
  → long high-speed tracks unavoidable
  → need to use terminated lines throughout
  → need to properly adjust timing
High input count: 88 differential cables
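The latency figure available to the JEM is simply the TDR budget minus the other contributions. A minimal sketch, assuming the approximate values quoted above are taken at face value (with the "<2 ticks" CTP transmission counted as 2) and one tick equal to 25 ns:

```python
TICK_NS = 25.0                 # one bunch-clock period at 40 MHz

total_budget_ticks = 18.5      # energy sum path, from the TDR
other_contributions = {
    "input cables": 2.0,           # approximate
    "CMM": 5.0,                    # approximate
    "transmission to CTP": 2.0,    # quoted as an upper bound
}

jem_budget_ticks = total_budget_ticks - sum(other_contributions.values())
jem_budget_ns = jem_budget_ticks * TICK_NS

print(jem_budget_ticks, jem_budget_ns)
```

So the JEM has roughly 9.5 ticks, about 240 ns, from cable connector to backplane output.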
Connectivity: high-density input cabling
24 4-pair cable assemblies arranged in 6 blocks of 4 (2 φ bins × em, had)
Same coordinate system now on cables and crate: φ upwards, η left to right (as seen from front)
V cable rotated
Different cabling for FCAL JEMs → re-map FCAL channels in jet FPGA firmware
[Figure: AMP cable pinout (as seen from front of crate) and input connector layout; labels include em and had cables, pin columns a-e, input positions V, W, Z and A-H, cable connector, PPM and FCAL cabling]
Connectivity: details of differential data paths
Differential 100Ω termination at sink
400 (480) Mbit/s input data
  Use de-serialisers compatible with DS92LV1021 (LVDS signal level, not DC-balanced)
  88 signals per JEM arriving on shielded parallel pairs
  Run via long cables (<15 m) and short tracks (few cm)
  Require pre-compensation on transmitting end
640 (800) Mbit/s readout data
  PECL level electro-optical translator
  HDMP1022 protocol, 16-bit mode
  Use compatible low-power PHY
Connectivity: details of single-ended data paths
CMOS signals, point-to-point, 60Ω DCI source termination throughout on all FPGAs
40 Mb/s (25 ns)
  At 1.5V, no phase control:
    Energy sum path into sum processor: 40 lines per input processor
    General control paths
  At 2.5V: CMM merger signals via backplane (phase adjustment on receiving end)
80 Mb/s (12.5 ns) at 1.5V: jet elements
  7×11×5 bit = 385 lines into jet processor
  2×3×11×5 bit = 330 lines on backplane from/to adjacent modules
  Global phase adjustment via TTCrx
  All signals latched into jet processor on same clock edge
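The line counts above can be checked directly. The sketch below assumes that the factor 5 is the 10-bit jet element multiplexed 2:1, that the factor 2 on the backplane covers the two directions (from and to neighbours), and that the 40 energy-sum lines are the 12-bit ET plus the two 14-bit EX and EY words from the real-time data format; these readings are inferred from the numbers rather than stated.

```python
WINDOW_ETA, WINDOW_PHI = 7, 11     # jet elements seen by the jet processor
ENV_ETA_COLUMNS = 3                # columns exchanged with adjacent modules
JET_ELEMENT_BITS = 10
MUX_FACTOR = 2                     # 80 Mb/s lines on a 40 MHz clock

lines_per_element = JET_ELEMENT_BITS // MUX_FACTOR               # 5
jet_processor_lines = WINDOW_ETA * WINDOW_PHI * lines_per_element
backplane_lines = 2 * ENV_ETA_COLUMNS * WINDOW_PHI * lines_per_element

ET_BITS, EX_BITS, EY_BITS = 12, 14, 14
energy_sum_lines = ET_BITS + EX_BITS + EY_BITS

print(jet_processor_lines, backplane_lines, energy_sum_lines)
```

The results, 385, 330 and 40 lines, match the figures on the slide and illustrate why the design is pin-bound.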
JEM history
JEM 0.0 built from Dec. 2000
  LVDS de-serialiser DS92LV1224
  11 input processors covering one phi bin each, Spartan2
  Main processor performing jet and energy algorithms, Virtex-E
  Control FPGA, ROC, HDMP1022 PHY, coaxial output
  Complete failure due to assembly company
JEM 0.x built from Dec. 2003
  Minor design correction with respect to JEM 0.0
  New manufacturer (PCB / assembly)
  Fully functional prototype except CAN slow control and FPGA flash configuration
  TTC interface not to specs due to lack of final TTCrx chip
  Successfully tested all available functionality
JEM 0
[Figure: JEM 0 board, labelled: 11 input processors, main processor, 88 × DS92LV1224, ROC, VME interface, 2 × HDMP1022, backplane connector, TTCrx, CAN]
JEM history (2)
JEM 1.0 built in 2003
  All processors Virtex-2
  Input processors on daughter modules (R, S, T, U)
  LVDS de-serialiser SCAN921260 (6-channel)
  4 input processors covering three phi bins each
  1 jet processor on main board
  1 sum processor on main board
  1 board control CPLD (CC)
  Readout links (PHY & opto) on daughter module (RM)
  Flash configurator: SystemACE
  Slow control / CAN: Fujitsu microcontroller
  Successfully tested algorithms and all interfaces
    Some tuning required on SystemACE clock
    CAN not to new specs (L1Calo common design)
History: JEM 1.0
[Figure: JEM 1.0 board, labelled: power, Jet, Sum, input modules R, S, T, U, VME, CC, RM, ACE, CAN, Flash, TTC]
JEM 1.0 successfully tested
  Algorithms
  All interfaces
    LVDS in
    FIO inter-module links
    Merger out
    Optical readout
    VME
    CAN slow control
  Mainz, RAL slice test, CERN test beam
JEM 1.1
JEM 1.1 in production now
Identical to JEM 1.0
Additional daughter module: Control Module (CM)
  CAN
  VME control
  Fan-out of configuration lines
Expected back from assembly soon
[Figure: JEM 1.1 block diagram. Labels: 88 input pairs; de-serialisers (DES) on input modules R, S, T, U serving inputs V, A-H, W, X; 3 × 40 bit @ 40 Mb/s; FIO 60 bit @ 80 Mb/s, 165 pins each; Jet and Sum processors; outputs to JMM and to SMM; DAQ and RoI readout via TX / opto on RM; clock mirror; VME; TTCdec; System ACE; CAN on CM]
JEM details - main board
9U × 40 cm × 2 mm, bracing bars, ESD strips, shielded backplane connector
4 signal layers incl. top, bottom; 2 × Vcc, 4 × GND → total 10 layers
Micro vias on top, bottom; buried vias
All tracks controlled impedance: controlled / measured by manufacturer
  Single-ended 60Ω
  Differential 100Ω
Point-to-point links only
All hand-routed
60Ω DCI source termination on processors (CMOS levels)
Power distribution
  All circuitry supplied by local step-down regulators, fused 10A (estimated maximum consumption < 5A on any supply, 50W total)
  10A capacity, separate 1.5V regulator for daughter modules
  Defined ramp-up time (Virtex2 requirement)
  Staged bypass capacitors, low ESR
VME buffers scannable 3.3V (DTACK: open drain 3 × 24 mA), short stubs on signal lines, 20-75 mm
Vccaux for FPGAs: dedicated quiet 3.3V
Merger signals (directly driven by processors) on 2.5V banks
FPGA core, inter-processor and inter-module links: 1.5V
JEM details - main board (2)
Timing
  TTC signals terminated and buffered (LVPECL, DC) near backplane
  TTCdec module with PLL and crystal clock automatic backup
  DESKEW1 bunch clock used as a general purpose clock
  Low skew buffers (within TTCdec PLL loop) with series terminators
  DESKEW2 clock used for phase-controlled sampling of 80 Mb/s jet element data (local & FIO) on jet processor only
VME
  Synchronised to bunch clock
  Sum processor acts as VME controller
  Basic pre-configure VME access through CM
Readout located on RM (ROCs on sum and jet processor)
DCS/CAN located on CM (except PHY, near backplane)
Configuration via SystemACE / CF
  P2P links to keep ringing at bay
  Multiple configurations, slot dependent choice
JEM details - main board (3)
JTAG available on most active components. Separate chains:
  FPGAs (through SystemACE)
  Non-programmable devices on input daughters
  TTCdec and Readout Module
  Buffers
  Control Module
JTAG used for
  Connectivity tests at manufacturer & MZ
  CPLD configuration
  FPGA configuration (ACE)
Input modules
24 LVDS data channels per module
12-layer PCB with micro vias
Impedance controlled tracks
  60Ω single-ended
  100Ω differential
LVDS signals entering via 100Ω differential connector on short tracks (<1 cm)
Differential termination close to de-serialiser
4 × SCAN921260 6-channel de-serialiser
  PLL and analogue supply voltage only (3.3V) supplied from backplane
  Digital supply from step-down regulator on main board
  Reference clock supplied via FPGA
XC2V1500 input processor
1.5V CMOS 60Ω DCI signals to sum and jet processor
SMBus device for Vcc and temperature monitoring (new)
Readout Module RM
2 channels, 640 Mb/s
16 bit → 20 bit CIMT coded, fill-frame FF1, alternating flag bit, as defined in HDMP1022 specs
2 × PHY, 2 × SFP opto transceiver, so far 2-layer boards
High-speed tracks <1 cm
PHYs tested:
  HDMP1022 serialiser: 2.4W/chip (reference, tested in 16-bit and 20-bit mode)
  HDMP1032A serialiser: 660mW/chip, €27.86 @ 80pc (16-bit)
  TLK1201A serdes: 250mW/chip, < €5.00 @ 80pc, uncoded, requires data formatter firmware in ROC (16-bit, 20-bit)
Successfully run off bunch clock
Converted to Xtal clock due to unknown jitter situation on ATLAS TTC clock
Problems with Xtal clock distribution to ROI PHY (RAL, MZ)
RM seems to work with clock linked from DAQ PHY to ROI PHY
→ Want a local crystal oscillator on RM
→ Need new iteration of RM (HDMP1032A, TLK1201A)
Control Module CM
Combines CAN/DCS, VME pre-configure access and JTAG fan-out
CAN
  Controller to L1Calo specs now (common design for all processors, see CMM/CPM)
  Link to main board via SMBus only (Vcc, temperatures)
VME
  CPLD (pinout error corrected) generating DTACK for all accesses within module sub-address range to avoid bus timeout
  Providing basic access for FPGA configuration via VME
    Configuration reset
    ACE configuration selection / slot dependent
    ACE configuration selection via VME
Buffers for SystemACE-generated JTAG signals to FPGAs
TTCdec parallel initialisation (ID from geographical address)