JEM FDR: Design and Implementation


Page 1: JEM FDR: Design and Implementation

5th April, 2005 JEM FDR 1

JEM FDR: Design and Implementation

JEP system requirements
Architecture
  Modularity
  Data Formats
  Data Flow
Challenges:
  Latency
  Connectivity, high-speed data paths
JEM revisions
JEM 1.1 - implementation details
Daughter modules
Energy sum algorithms
FPGA resource use
Performance
Production tests


JEP system requirements

Process -4.9 < η < 4.9 region
  ~32×32×2 = 2k trigger towers of Δη×Δφ = 0.2×0.2
  9-bit input data (0-511 GeV)
  32×32 10-bit “jet elements” after em/had pre-sum
2 multiplications per jet element: ET → (EX, EY)
3 adder trees spanning the JEP (JEMs, CMMs)
Sliding window jet algorithm, variable window size within 3×3 environment
Output data to CTP
  Thresholded total ET, missing ET
  Jet hit count
Output data to RODs
  Intermediate results, mainly captured from module boundaries
  RoI data for RoIB


JEP system design considerations

Moderate data processing power
Tough latency requirements
Large number of signals to be processed → partition into parallel operating modules
Algorithm requires an environment around each jet element → high-bandwidth inter-module lanes
Data concentrator functionality, many → few
Severely pin-bound design, dominated by input connectivity
  Modules
  Processors (FPGAs)
Benefit from similarities to cluster processor
  Common infrastructure (backplane)
  Common serial link technology


System modularity

Two crates, each processing two quadrants in φ
  32 × 8 bins (jet elements) per quadrant
η range split over 8 JEMs
  4 × 8 jet elements per JEM
Four input processors per JEM
Single jet processor per JEM
Single sum processor per JEM

[Figure: crate layout with 8 JEMs per quadrant, η running from - to + across each quadrant; quadrants labelled quad 0 to quad 3]


Replication of environment elements - system and crate level -

JEM has 32 core algorithm cells
  4 × 8 jet elements
  Directly mapped: 4 PPMs (e,h) → 1 JEM
JEM operates on a total of 77 jet elements including ‘environment’: 7 × 11
Replication in φ via multiple copies of PPM output data
Replication in η via backplane fan-out

[Figure: PPM (em) outputs and backplane fan-out into a JEM, axes η and φ]
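The element counts on this slide can be cross-checked with a little arithmetic. The sketch below is only illustrative: the constant names are invented here, and the split of the 7 × 11 window into locally received and backplane-received elements is an assumption, inferred from the 88 input channels and the 3 × 11 backplane columns quoted elsewhere in this talk.

```python
# Illustrative bookkeeping of the jet-element window seen by one JEM.
# All names are hypothetical; the numbers are taken from the slides.
CORE_ETA, CORE_PHI = 4, 8        # core algorithm cells per JEM
WINDOW_ETA, WINDOW_PHI = 7, 11   # core plus environment

core = CORE_ETA * CORE_PHI              # core cells
window = WINDOW_ETA * WINDOW_PHI        # all elements the JEM operates on
environment = window - core             # environment-only elements

# Assumption: the full phi range of the window arrives on the JEM's own
# cables (phi replication at the PPM), the extra eta columns via backplane.
local = CORE_ETA * WINDOW_PHI
from_backplane = (WINDOW_ETA - CORE_ETA) * WINDOW_PHI
lvds_channels = local * 2               # one em and one had channel each

print(core, window, environment, local, from_backplane, lvds_channels)
```

Under these assumptions the local and backplane contributions add up to the 77 elements of the window, and the local part matches the 88 LVDS channels per JEM.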


JEM data formats - real-time data

JEM inputs from PPM:
  Physical layer: LVDS, 10 bits, 12-bit encoded with start/stop bit
  D0 odd parity bit
  D(9:1) 9-bit data, D1 = LSB = 1 GeV

Jet elements to jet processor:
  No parity bit
  D(9:0) 10-bit data, D0 = LSB = 1 GeV
  10 data bits muxed to 5 lines, least significant first

Energy sums to sum processor:
  No parity bit
  ET(11:0) 12-bit data, D0 = LSB = 1 GeV
  EX(13:0) 14-bit data, D0 = LSB = 0.25 GeV
  EY(13:0) 14-bit data, D0 = LSB = 0.25 GeV

JEM output to CMM:
  J(23:0) 8 × 3-bit saturating jet hits sent on bottom port
  J24 odd parity bit
  S(23:0) 3 × 8-bit quad-linear encoded energy sums on top port
    6-bit energy
    2-bit range
    Resolution 1 GeV, 4 GeV, 16 GeV, 64 GeV
  S24 odd parity bit
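As an illustration of the PPM input word layout listed above (D0 odd parity, D(9:1) data), here is a minimal Python sketch. The function names are hypothetical, and it assumes that the parity bit covers the nine data bits so that the complete 10-bit word always contains an odd number of ones; the slides do not spell this out.

```python
from typing import Tuple

# Hypothetical helpers for the 10-bit PPM-to-JEM word:
# D0 = odd parity bit, D(9:1) = 9-bit ET with D1 as LSB (1 GeV).
# Assumption: parity is computed over the 9 data bits.


def pack_ppm_word(et_gev: int) -> int:
    """Build the 10-bit word for a 9-bit energy (0-511 GeV)."""
    if not 0 <= et_gev <= 0x1FF:
        raise ValueError("energy must fit into 9 bits")
    ones = bin(et_gev).count("1")
    parity = 0 if ones % 2 == 1 else 1   # make the total number of ones odd
    return (et_gev << 1) | parity


def unpack_ppm_word(word: int) -> Tuple[int, bool]:
    """Return (energy in GeV, parity_ok) for a received 10-bit word."""
    word &= 0x3FF
    parity_ok = bin(word).count("1") % 2 == 1
    return word >> 1, parity_ok
```

A single flipped bit anywhere in the word makes `parity_ok` false, which is the condition the input stage could use to zero the channel.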


JEM data formats - readout

Physical layer: 16 bits, 20-bit encoded (CIMT, alternating flag bit, fill-frames 1A/1B, HDMP1022 format)
Event separator: minimum of 1 fill-frame sent after each event's worth of data
All data streams odd-parity protected (serial parity)

DAQ readout: 67-long stream per L1A / slice being read out
  Input data on D(14:0): 11 bits per channel (9 bits data, 1 bit parity error, 1 bit link error)
  12-bit Bcnum & 25-bit sum & 25-bit jet hits on D15

RoI readout: 45-long stream per L1A
  D(1:0): total of 8 RoIs
    2 bits location & saturation flag & 8 bits threshold passed
  D2: 12 bits Bcnum
  D(4:3): used on FCAL JEMs only (forward jets)
  D(15:5): always zero
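The stream lengths above translate into a rough readout time per event. The following is a back-of-the-envelope sketch, not a statement about the real readout controller: it assumes one 16-bit word is sent every 25 ns (640 Mbit/s divided by 16 bits), that words go out back to back, and that exactly one fill-frame separates events. All names are invented for the example.

```python
# Rough readout timing from the stream lengths quoted on the slide.
# Assumptions: one 16-bit word per 25 ns, no gaps, one fill-frame per event.
WORD_NS = 25.0
DAQ_WORDS_PER_SLICE = 67
ROI_WORDS_PER_L1A = 45
FILL_FRAMES_PER_EVENT = 1


def daq_readout_ns(slices: int) -> float:
    """Time to ship the DAQ data of one L1A with the given number of slices."""
    return (slices * DAQ_WORDS_PER_SLICE + FILL_FRAMES_PER_EVENT) * WORD_NS


def roi_readout_ns() -> float:
    """Time to ship the RoI data of one L1A."""
    return (ROI_WORDS_PER_L1A + FILL_FRAMES_PER_EVENT) * WORD_NS


def max_l1a_rate_khz(readout_ns: float) -> float:
    """Highest sustained trigger rate the link could carry without queueing."""
    return 1e6 / readout_ns


for n in (1, 3, 5):
    t = daq_readout_ns(n)
    print(n, "slices:", t, "ns,", round(max_l1a_rate_khz(t), 1), "kHz")
```

Under these assumptions the DAQ link, not the RoI link, limits the sustainable rate as soon as more than one slice is read out.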


JEM data flow

[Figure: JEM data-flow diagram. LVDS deserialiser (400 Mbit/s serial data, 480 Mbit/s with protocol) → input processor (40 MHz parallel) → jet processor + readout controller (80 Mb/s) and sum processor + readout controller (40 Mb/s parallel); each processor → CMM (40 Mb/s) and → link PHY (640 Mbit/s serial data, 800 Mbit/s with protocol, not synchronous to bunch clock)]

Multiple protocols, data speeds and signalling levels used throughout the board
Multiplexing up and down takes a considerable fraction of the latency budget
Re-synchronisation of data generally required on each chip and board boundary
  FIFO buffers
  Phase adjustment with firmware-based detection
  Delay scans
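The serial rates in the data-flow diagram follow from the word widths and the 40 MHz word rate. A small sketch of that arithmetic is given below; the names are illustrative, and 40 MHz is only nominal for the readout links, which are noted above as not synchronous to the bunch clock.

```python
# Nominal link rates implied by the word widths and a 40 MHz word rate.
# Names are illustrative only.
WORD_RATE_MHZ = 40


def rate_mbit_per_s(bits_per_word: int) -> int:
    """Serial rate for a link carrying one word of this width per tick."""
    return bits_per_word * WORD_RATE_MHZ


lvds_payload = rate_mbit_per_s(10)      # 10 data bits per input word
lvds_on_wire = rate_mbit_per_s(12)      # 12-bit encoded with start/stop bit
readout_payload = rate_mbit_per_s(16)   # 16-bit readout words
readout_on_wire = rate_mbit_per_s(20)   # 20-bit CIMT-encoded frames

# A 10-bit jet element multiplexed onto 5 lines needs twice the word rate.
jet_element_line_rate = rate_mbit_per_s(10) // 5

print(lvds_payload, lvds_on_wire, readout_payload, readout_on_wire,
      jet_element_line_rate)
```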


Challenges : latency & connectivity

Latency budget for energy sum processor: 18.5 ticks (TDR)
  Input cables: ~2 ticks
  CMM: ~5 ticks
  Transmission to CTP: <2 ticks
  → ~9.5 ticks available on JEM from cable connector to backplane outputs to CMM
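The 9.5-tick figure is simply what remains of the TDR budget once the other contributions are subtracted. A minimal sketch of that bookkeeping follows, treating the approximate and upper-limit values on the slide as exact numbers for the sake of the arithmetic; the dictionary keys are just labels chosen here.

```python
# Latency budget bookkeeping in bunch-clock ticks (values from the slide,
# with "~2", "~5" and "<2" taken at face value).
TDR_BUDGET_TICKS = 18.5

outside_jem_ticks = {
    "input cables": 2.0,
    "CMM": 5.0,
    "transmission to CTP": 2.0,
}

jem_budget_ticks = TDR_BUDGET_TICKS - sum(outside_jem_ticks.values())
print(jem_budget_ticks)
```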

Module dimensions imposed by use of common backplane
  Large module: 9U × 40 cm
  Full height of backplane used for data transmission due to high signal count
  → long high-speed tracks unavoidable
  → need to use terminated lines throughout
  → need to properly adjust timing
High input count: 88 differential cables


Connectivity : high-density input cabling

24 4-pair cable assemblies arranged in 6 blocks of 4 (2 φ bins × em, had)
Same coordinate system now on cables and crate: φ upwards, η left to right (as seen from front)
V cable rotated
Different cabling for FCAL JEMs → re-map FCAL channels in jet FPGA firmware

[Figure: AMP cable pinout (as seen from front of crate) and cable connector layout, showing em and had cable positions, pin rows a to e, and the connector numbering for PPM and FCAL inputs]


Connectivity : details of differential data paths

Differential 100Ω termination at sink

400 (480) Mbit/s input data
  Use de-serialisers compatible with DS92LV1021 (LVDS signal level, not DC-balanced)
  88 signals per JEM arriving on shielded parallel pairs
  Run via long cables (<15 m) and short tracks (few cm)
  Require pre-compensation on transmitting end

640 (800) Mbit/s readout data
  PECL level electro-optical translator
  HDMP1022 protocol, 16-bit mode
  Use compatible low-power PHY


Connectivity : details of single ended data paths

CMOS signals, point-to-point
60 Ω DCI source termination throughout on all FPGAs
40 Mb/s (25 ns)
  At 1.5 V, no phase control
    Energy sum path into sum processor: 40 lines per input processor
    General control paths
  At 2.5 V: CMM merger signals via backplane (phase adjustment on receiving end)
80 Mb/s (12.5 ns) at 1.5 V: jet elements
  7 × 11 × 5 bit = 385 lines into jet processor
  2 × 3 × 11 × 5 bit = 330 lines on backplane from/to adjacent modules
  Global phase adjustment via TTCrx
  All signals latched into jet processor on same clock edge
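The line counts quoted above can be reproduced from the window size and the data formats. The sketch below is a plausibility check with invented names; reading the factor 2 as "from and to" and the factor 3 as the number of exchanged η columns is an interpretation of the slide, not something it states explicitly.

```python
# Cross-check of the single-ended line counts (illustrative names only).
WINDOW_ETA, WINDOW_PHI = 7, 11
BITS_PER_JET_ELEMENT = 10
BASE_RATE_MBPS, JET_RATE_MBPS = 40, 80

# 10 bits at 40 Mb/s multiplexed onto lines running at 80 Mb/s.
lines_per_element = BITS_PER_JET_ELEMENT * BASE_RATE_MBPS // JET_RATE_MBPS

jet_processor_lines = WINDOW_ETA * WINDOW_PHI * lines_per_element

# Assumed reading: 3 eta columns of 11 phi bins, counted in both directions.
DIRECTIONS, EXCHANGED_COLUMNS = 2, 3
backplane_lines = DIRECTIONS * EXCHANGED_COLUMNS * WINDOW_PHI * lines_per_element

# Energy sum path per input processor: ET (12 bit) + EX (14 bit) + EY (14 bit).
sum_path_lines = 12 + 14 + 14

print(lines_per_element, jet_processor_lines, backplane_lines, sum_path_lines)
```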


JEM history

JEM0.0 built from Dec. 2000
  LVDS de-serialiser DS92LV1224
  11 input processors covering one phi bin each, Spartan2
  Main processor performing jet and energy algorithms, Virtex-E
  Control FPGA, ROC, HDMP1022 PHY, coaxial output
  Complete failure due to assembly company

JEM 0.x built from Dec. 2003
  Minor design correction with respect to JEM0.0
  New manufacturer (PCB / assembly)
  Fully functional prototype except CAN slow control and FPGA flash configuration
  TTC interface not to specs due to lack of final TTCrx chip
  Successfully tested all available functionality


JEM 0

[Figure: JEM 0 board with labelled parts: 11 input processors, main processor, 88 × DS92LV1224, ROC, VME interface, 2 × HDMP1022, backplane connector, TTCrx, CAN]


JEM history (2)

JEM1.0 built in 2003
  All processors Virtex-2
  Input processors on daughter modules (R, S, T, U)
  LVDS de-serialiser SCAN921260 (6-channel)
  4 input processors covering three phi bins each
  1 jet processor on main board
  1 sum processor on main board
  1 board control CPLD (CC)
  Readout links (PHY & opto) on daughter module (RM)
  Flash configurator: SystemACE
  Slow control / CAN: Fujitsu microcontroller
  Successfully tested algorithms and all interfaces
    Some tuning required on SystemACE clock
    CAN not to new specs (L1Calo common design)


History: JEM 1.0

[Figure: JEM 1.0 board with labelled parts: power, Jet, Sum, input modules R, S, T, U, VME, CC, RM, ACE, CAN, Flash, TTC]

JEM1.0 successfully tested
  Algorithms
  All interfaces
    LVDS in
    FIO inter-module links
    Merger out
    Optical readout
    VME
    CAN slow control
  Mainz, RAL slice test, CERN test beam


JEM 1.1

JEM1.1 in production now
Identical to JEM 1.0
Additional daughter module: Control Module (CM)
  CAN
  VME control
  Fan-out of configuration lines
Expected back from assembly soon

[Figure: JEM 1.1 block diagram with labels: 88 pair inputs; DES de-serialisers; input modules R, S, T, U (inputs V, A to H, W, X); FIO 60 bit @ 80 Mb/s, each 165 pins; 3 × 40 bit @ 40 Mb/s; Jet and Sum processors; To JMM, To SMM; DAQ and ROI via TX/Opto; VME, TTCdec, System ACE, clock mirror, CAN, CM, RM]


JEM details - main board

9U × 40 cm × 2 mm, bracing bars, ESD strips, shielded backplane connector
4 signal layers incl. top, bottom; 2 × Vcc; 4 × GND; total 10 layers
Micro vias on top, bottom; buried vias
All tracks controlled impedance: controlled / measured by manufacturer
  Single ended 60 Ω
  Differential 100 Ω
Point-to-point links only
All hand-routed
60 Ω DCI source termination on processors (CMOS levels)
Power distribution
  All circuitry supplied by local step-down regulators, fused 10 A (estimated maximum consumption < 5 A on any supply, 50 W total)
  10 A capacity, separate 1.5 V regulator for daughter modules
  Defined ramp-up time (Virtex2 requirement)
  Staged bypass capacitors, low ESR
VME buffers scannable, 3.3 V (DTACK: open drain 3 × 24 mA), short stubs on signal lines, 20-75 mm
Vccaux for FPGAs: dedicated quiet 3.3 V
Merger signals (directly driven by processors) on 2.5 V banks
FPGA core, inter-processor and inter-module links: 1.5 V


JEM details –main board (2)

Timing
  TTC signals terminated and buffered (LVPECL, DC) near backplane
  TTCdec module with PLL and crystal clock automatic backup
  DESKEW1 bunch clock used as a general purpose clock
  Low skew buffers (within TTCdec PLL loop) with series terminators
  DESKEW2 clock used for phase-controlled sampling of 80 Mb/s jet element data (local & FIO) on jet processor only
VME
  Synchronised to bunch clock
  Sum processor acts as VME controller
  Basic pre-configure VME access through CM
Readout located on RM (ROCs on sum and jet processor)
DCS/CAN located on CM (except PHY - near backplane)
Configuration via SystemACE / CF
  P2P links to keep ringing at bay
  Multiple configurations, slot dependent choice


JEM details –main board (3)

JTAG available on most active components. Separate chains:
  FPGAs (through SystemACE)
  Non-programmable devices on input daughters
  TTCdec and Readout Module
  Buffers
  Control Module
JTAG used for
  Connectivity tests at manufacturer & MZ
  CPLD configuration
  FPGA configuration (ACE)


Input modules

24 LVDS data channels per module
12-layer PCB with micro vias
Impedance-controlled tracks
  60 Ω single ended
  100 Ω differential
LVDS signals entering via 100 Ω differential connector on short tracks (<1 cm)
Differential termination close to de-serialiser
4 × SCAN921260 6-channel de-serialiser
  PLL and analogue supply voltage only (3.3 V) supplied from backplane
  Digital supply from step-down regulator on main board
  Reference clock supplied via FPGA
XC2V1500 input processor
1.5 V CMOS 60 Ω DCI signals to sum and jet processor
SMBus device for Vcc and temperature monitoring (new)


Readout Module RM

2 channels, 640 Mb/s
16 bit → 20 bit CIMT coded, fill-frame FF1, alternating flag bit, as defined in HDMP1022 specs
2 × PHY, 2 × SFP opto transceiver, so far 2-layer boards
High-speed tracks <1 cm
PHYs tested:
  HDMP1022 serialiser, 2.4 W/chip (reference, tested in 16-bit and 20-bit mode)
  HDMP1032A serialiser, 660 mW/chip, €27.86 @ 80 pcs (16-bit)
  TLK1201A serdes, 250 mW/chip, < €5.00 @ 80 pcs, uncoded, requires data formatter firmware in ROC (16-bit, 20-bit)
Successfully run off bunch clock
Converted to Xtal clock due to unknown jitter situation on ATLAS TTC clock
Problems with Xtal clock distribution to ROI PHY (RAL, MZ)
RM seems to work with clock linked from DAQ PHY to ROI PHY
→ Want a local crystal oscillator on RM
→ Need new iteration of RM (HDMP1032A, TLK1201A)


Control Module CM

Combines CAN/DCS, VME pre-configure access and JTAG fan-out
CAN
  Controller to L1Calo specs now (common design for all processors, see CMM/CPM)
  Link to main board via SMBus only (Vcc, temperatures)
VME
  CPLD (pinout error corrected) generating DTACK for all accesses within module sub-address range to avoid bus timeout
  Providing basic access for FPGA configuration via VME
    Configuration reset
    ACE configuration selection / slot dependent
    ACE configuration selection via VME
Buffers for SystemACE-generated JTAG signals to FPGAs
TTCdec parallel initialisation (ID from geographical address)


JEM Cost @ 40pcs

Main board, 10 layer
  PCB          21,393 €
  Assembly      7,657 €
  Components   49,725 €
  Sub-total    78,775 €

Input module, 12 layer
  PCB          11,625 €
  Assembly      6,500 €
  Components   43,000 €
  Sub-total    61,152 €

Total         139,927 € + control + readout + SFP


Energy Sum Algorithm

In all stages saturate outputs if input is saturated or arithmetic overflow occurs
Operate on 40 Mb/s data from LVDS de-serialisers:
  88 channels per JEM, 9-bit ET data, parity, link error
Latch incoming data on bunch clock, 2 samples per tick
Select the stable sample under VME control
  Automatic phase detection in f/w (remove that feature?)
  Delay scan (VME)
Correct for upstream latency differences, up to 3 ticks (shift register, VME controlled)
Send data to readout and spy circuitry
Zero data on parity error
Apply channel mask
Sum up electromagnetic and corresponding hadronic channel to 10-bit jet element
Multiplex jet elements to 80 Mb/s and send to jet processor and backplane
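The last few steps of the input stage (zeroing, masking, em + had sum with saturation, multiplexing) can be sketched in software as follows. This is a behavioural illustration, not the firmware: the function names are made up, parity error and channel mask are merged into one "bad channel" flag, and the exact order in which saturation is checked is an assumption.

```python
from typing import Tuple

IN_SAT = 0x1FF   # 9-bit input full scale (511 GeV)
JE_SAT = 0x3FF   # 10-bit jet-element full scale


def jet_element(em: int, had: int,
                em_bad: bool = False, had_bad: bool = False) -> int:
    """Sum an em and a had channel into a 10-bit jet element.

    A channel flagged bad (parity error or masked) is zeroed first.
    The result saturates if either surviving input is saturated or if
    the sum would exceed 10 bits.
    """
    em = 0 if em_bad else em
    had = 0 if had_bad else had
    if em == IN_SAT or had == IN_SAT:
        return JE_SAT
    return min(em + had, JE_SAT)


def mux_to_5_lines(je: int) -> Tuple[int, int]:
    """Split a 10-bit jet element into two 5-bit transfers, LSBs first."""
    return je & 0x1F, (je >> 5) & 0x1F


print(jet_element(100, 200), jet_element(511, 0), mux_to_5_lines(801))
```

With two 9-bit inputs the plain sum can never exceed 1022, so in this sketch saturation is only ever triggered by a saturated input.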


Energy Sum Algorithm (2)

Threshold jet elements and sum to ET (12 bits, 1 GeV resolution)
Threshold jet elements and multiply (cos φ, sin φ), 0.25 GeV resolution
Sum to 2 × 14 bit (EX, EY) missing energy vector
Transmit (EX, EY, ET) to sum processor
Calculate board-level total vector sum
Quad-linear encoding to 8 bit each
  6-bit value and 2-bit range indicator
  Resolution 1, 4, 16, 64 GeV, full scale 4 TeV
Send 25 bits of data incl. odd parity bit D(24) to backplane
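A software sketch of the two key steps above, the thresholded sums and the quad-linear encoding, might look as follows. Everything here is illustrative: the names are invented, the threshold comparison and rounding are assumptions, 14-bit saturation of EX and EY is omitted for brevity, and placing the 2-bit range in the upper bits of the byte is a guess at the bit layout. What is taken from the slide is the 6-bit value, the 2-bit range, and the step sizes of 1, 4, 16 and 64 GeV.

```python
import math

ET_FULL_SCALE = 0xFFF   # 12-bit ET sum, 1 GeV per count


def energy_sums(jet_elements, threshold_gev=0):
    """Return (ET in GeV, EX, EY) with EX, EY in counts of 0.25 GeV.

    jet_elements is an iterable of (energy_gev, phi_rad) pairs.
    """
    et = ex4 = ey4 = 0
    for energy, phi in jet_elements:
        if energy < threshold_gev:
            continue                      # below threshold: ignored
        et += energy
        ex4 += round(4 * energy * math.cos(phi))
        ey4 += round(4 * energy * math.sin(phi))
    return min(et, ET_FULL_SCALE), ex4, ey4


def quad_linear_encode(value_gev: int) -> int:
    """Encode a non-negative energy into 8 bits: range (2) + value (6)."""
    if value_gev < 0:
        raise ValueError("value must be non-negative")
    for rng in range(4):                  # step size 1, 4, 16, 64 GeV
        mantissa = value_gev >> (2 * rng)
        if mantissa < 64:
            return (rng << 6) | mantissa
    return 0xFF                           # saturate at full scale


def quad_linear_decode(code: int) -> int:
    """Lower edge in GeV of the bin represented by an 8-bit code."""
    return (code & 0x3F) << (2 * (code >> 6))
```

In this scheme the largest code decodes to 63 × 64 = 4032 GeV, consistent with the quoted full scale of about 4 TeV, and a 12-bit ET value always fits without saturating.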


FPGA resources used

Fully synchronous designs, I/O flip-flops used on all data lines

Input FPGAs XC2V1500-4FF896C
  Slice flip-flops: 27%
  LUTs: 59%
  Total IOBs: 90%
  Block RAMs: 68%
  Multipliers: 50%
  GCLKs: 12%
  DCMs: 12%
  40.6 MHz

Sum FPGA XC2V2000-4BF957C
  Slice flip-flops: 7%
  LUTs: 11%
  Total IOBs: 83%
  Block RAMs: 12%
  GCLKs: 25%
  DCMs: 12%
  42.8 MHz


Performance

All interfaces and the algorithms have been tested on JEM1.0 in Mainz, at the RAL slice test and in the CERN test beam.

Problems revealed:
  SystemACE configuration fails if incoming clock or TCK signal is of insufficient quality: signal distortions confirmed
    → re-layout of crystal clock and TCK distribution on JEM1.1
  At CERN 2 out of 4 PPR channels could not be received error-free: signal distortions confirmed
    → modifications required on the PPR LCD module
  Errors observed on ROI readout only recently: problems with on-JEM crystal clock distribution confirmed
    → re-layout of readout module, use local clock

Apart from the above problems, all interfaces and the algorithms have been shown to work error-free in all tests


Test setup

Up to 3 JEMs in a 9U crate allowing for FIO tests in either direction, along with VMM, TCM, CMM (and CPMs!)
Control: Concurrent CPU on VMM or via flat cable
External data sources for
  TTC: TTCvx, TTCvi, TTCex (CERN/RAL) via TCM
  LVDS:
    1 DSS 16-channel (MZ)
    Several DSS (RAL)
    LSM (RAL)
    PPR (CERN) (4 channels)
External data sinks for
  Merger signals: 2 CMMs (RAL)
  Readout path:
    Complete ROS (RAL)
    G-link tester with f/w pattern comparison (MZ)


Test strategies

Test the full system including all interfaces and algorithms at moderate statistics. Generally use physics-like test vectors
  Requires operation of a ROS and data comparison on a computer. Therefore, even in relatively long test runs, very low bit error rates would go undetected
Test interfaces with firmware-based test adapters and on-JEM diagnostic firmware allowing for real-time detection of pattern errors
  These tests will reveal even low-level errors quickly
Choice of test patterns - have a look at possible failure mechanisms:
  FIO data and merger data on backplane: source-terminated lines at moderate speed, no signal dispersion expected nor observed
  800 Mb/s readout data: due to optical transmission no dispersion expected nor observed
  LVDS links: the pre-compensation circuitry is required to compensate at a single time constant only, well below a single bit period. At the receiving end a slight overshoot should be observed
  → No inter-symbol interference expected on any of the transmission lines. Main source of errors: system noise. Any non-constant data pattern should do.
Use binary counter pattern. Useful on serial links:
  Has long stretches of many ones / many zeroes
  Has transitions all-one to all-zero
  Easy to detect errors
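The idea of a counter-pattern checker can be illustrated in a few lines of Python. This is a software stand-in for what the slides describe as firmware-based pattern comparison; the function name, the default word width and the choice of a free-running reference counter that locks onto the first received word are all assumptions made for this sketch.

```python
def count_counter_errors(words, width=10):
    """Count words that deviate from a free-running binary counter.

    The reference counter locks onto the first received word and then
    increments by one per word, wrapping at 2**width. It is not
    re-synchronised on errors, so one corrupted word counts once.
    """
    mask = (1 << width) - 1
    errors = 0
    expected = None
    for word in words:
        if expected is None:
            expected = word & mask        # lock onto the first word
        elif word != expected:
            errors += 1
        expected = (expected + 1) & mask
    return errors


print(count_counter_errors([1021, 1022, 1023, 0, 1]))   # wrap-around is fine
print(count_counter_errors([0, 1, 2, 99, 4, 5]))        # one corrupted word
```

The wrap from all ones to all zeroes in the first call is the "all-one to all-zero" transition mentioned on the slide.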


System test at RAL (slice test)

Setup with 2-stage merging in a single crate:
  DSS → JEM → crate CMM → system CMM → ROD → ROS
  ROD → ROS
Comparing readout data against simulation.
ROD type: 6U modules
Data format: old format (6U module specific)
Results (June 2004):
  Data taken with up to 5 slices of JEM DAQ data. Trigger rate up to 60 kHz, 4×10^6 events analysed, no errors observed on JEM readout.


Interface tests

At RAL:
  Playback from JEM (ramps) into CMM (parity detection). Merger signals crossing 2/3 of backplane length: no error in 10^13 bits
In Mainz:
  FIO tests with 3 JEMs (ramps, pattern comparison on central JEM): no error in 10^13 bits
  LVDS input tests. Source: DSS, 16 inputs exercised at a time, pattern comparison (ramp) in input module: no error in 10^15 bits
  Readout link tests: G-link tester with pattern comparison (ramp): no error in 10^14 bits (problems with crystal clock from jet processor)
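A result of "no error in N bits" can be turned into an upper limit on the bit error rate. The sketch below uses the standard Poisson zero-event limit, BER < -ln(1 - CL) / N, which is about 3/N at 95% confidence; this statistical treatment is an addition for illustration and is not quoted in the talk. The dictionary keys are just labels for the tests listed above.

```python
import math


def ber_upper_limit(bits_tested: float, confidence: float = 0.95) -> float:
    """Upper limit on the bit error rate after zero observed errors."""
    return -math.log(1.0 - confidence) / bits_tested


bits_without_error = {
    "merger signals (RAL)": 1e13,
    "FIO (Mainz)": 1e13,
    "LVDS inputs (Mainz)": 1e15,
    "readout link (Mainz)": 1e14,
}

for name, bits in bits_without_error.items():
    print(f"{name}: BER < {ber_upper_limit(bits):.1e} at 95% CL")
```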


FIO tests : delay scan

All data latched into jet processor on a common clock edge
Sweep TTCrx delay setting, 104 ps steps
Measure data errors on each channel: 10 bits, 5 signal lines
Single channel: 8 ns error free
All channels: 6.5 ns error free
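Extracting the error-free window from a delay scan amounts to finding the longest run of delay steps with zero errors. A minimal sketch is shown below, assuming the scan result is available as a list of error counts per 104 ps step; the function name and the toy data are hypothetical, and wrap-around of the window at the end of the scan range is not handled.

```python
STEP_NS = 0.104   # TTCrx delay step quoted on the slide


def widest_error_free_window(errors_per_step):
    """Return (first step, number of steps, width in ns) of the longest
    contiguous run of delay settings with zero errors."""
    best_start, best_len = 0, 0
    run_start = None
    # A sentinel with errors closes a run that reaches the end of the scan.
    for i, n_err in enumerate(list(errors_per_step) + [1]):
        if n_err == 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len, best_len * STEP_NS


# Toy scan: errors at the edges, clean in the middle.
scan = [5, 0, 0, 0, 3, 0, 0]
print(widest_error_free_window(scan))
```

For the all-channel window the per-step error counts of all channels would first be added, so that a step only counts as clean if every channel is clean there.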


Latency

Energy path: 183 ns
Jet path: 234 ns < 9.5 BC
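To compare the measured latencies with the budget, they can be converted to bunch crossings. The sketch below uses the 25 ns tick quoted earlier in the talk for 40 Mb/s signals; the variable names are illustrative.

```python
TICK_NS = 25.0          # one bunch crossing, as used for 40 Mb/s signals
JEM_BUDGET_BC = 9.5     # ticks available on the JEM

measured_ns = {"energy path": 183.0, "jet path": 234.0}

latency_bc = {name: ns / TICK_NS for name, ns in measured_ns.items()}
within_budget = all(bc < JEM_BUDGET_BC for bc in latency_bc.values())

for name, bc in latency_bc.items():
    print(f"{name}: {bc:.2f} BC")
print("within budget:", within_budget)
```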


CERN test beam

Within a wider test setup the following modules were available to generate / analyse JEM ‘test vectors’ based on true calorimeter signals:
  PPR, JEM, CMM, CTP, ROD, ROS
ROD type: 6U modules
Data received from PPR error free on 2 channels
Readout from PPR not possible → could not verify input signal integrity except parity error check
Energy sum signal processing verified internally


Test beam results

Sum algorithm error-free (see effects of quad-linear encoding)

[Figure: energy sum sent to CMM versus input data Eem+Ehad]


Production tests

Boundary scan at manufacturer: high coverage due to large fraction of scannable components → verify connectivity (static test)
Standalone tester for input module LVDS inputs, pattern comparison in firmware (high statistics)
Standalone tester for readout module, pattern comparison in firmware (high statistics)
DCI operation verified with oscilloscope (drive unterminated 50 Ω cable into scope, record pulse shape) → dynamic test
System-level tests in Mainz: 1 crate, 1 JEM supplied with LVDS data at a time, playback and spy facilities used to generate / capture data on board boundaries.
  FIO delay scan
  High statistics FIO BER tests, pattern detection in firmware, test full crate at a time with maximum activity on LVDS, VME, readout
System-level tests at CERN