

An Ultra-Low Power Address-Event Sensor Interface for Energy-Proportional Time-to-Information Extraction

Alfio Di Mauro

Integrated Systems Laboratory

ETH Zurich, Switzerland

[email protected]

Francesco Conti∗

Integrated Systems Laboratory

ETH Zurich, Switzerland

[email protected]

Luca Benini†

Integrated Systems Laboratory

ETH Zurich, Switzerland

[email protected]

ABSTRACT
Internet-of-Things devices need sensors with a low power footprint that are capable of producing semantically rich data. Promising candidates are spiking sensors that use asynchronous Address-Event Representation (AER), carrying information within inter-spike times. To minimize the overhead of coupling AER sensors with off-the-shelf microcontrollers, we propose an FPGA-based methodology that i) tags the AER spikes with timestamps to make them carriable by standard interfaces (e.g. I2S, SPI); ii) uses a recursively divided clock generated on-chip by a pausable ring oscillator, to reduce power while keeping timestamp accuracy above 97%. We prototyped our methodology on an IGLOOnano AGLN250 FPGA, consuming less than 4.5 mW under a 550 kevt/s spike rate (i.e. a noisy environment), and down to 50 µW in the absence of spikes.

ACM Reference format:
Alfio Di Mauro, Francesco Conti, and Luca Benini. 2017. An Ultra-Low Power Address-Event Sensor Interface for Energy-Proportional Time-to-Information Extraction. In Proceedings of Design Automation Conference, Austin, TX, USA, June 18-22, 2017 (DAC'17), 6 pages.
DOI: 10.1145/3061639.3062201

1 INTRODUCTION
The deployment on low-power devices of complex "smart" applications based on multi-sensor data streams is at the core of the so-called "Internet-of-Things" (IoT) revolution. In this context, small and unobtrusive edge computing devices such as low-power microcontrollers must be able to extract high-level information out of noisy, high-bandwidth and essentially "informationally sparse" data streams, such as those produced by off-the-shelf microphones and cameras. To extract high-level information out of these sensors, it is necessary to use data analytics algorithms such as principal component analysis [1] for dimensionality reduction, k-means [2] for clustering, and support-vector machines [3] or neural networks [4] for classification. These are typically too complex and computationally intensive for most microcontrollers, leaving only the alternatives of adding a hardware accelerator to the edge computing node (which is expensive in terms of cost and power), or sending raw data streams to a higher-level computing infrastructure in the cloud (which has an enormous energy overhead and requires relatively high-bandwidth communication over radio).

∗ Also with the EEES Laboratory - University of Bologna, Italy (contact at [email protected]).
† Also with the EEES Laboratory - University of Bologna, Italy (contact at [email protected]).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
DAC'17, Austin, TX, USA
© 2017 ACM. 978-1-4503-4927-7/17/06 . . . $15.00
DOI: 10.1145/3061639.3062201

A promising alternative approach is to move part of the semantic information extraction burden to the sensor itself, giving up some generality in exchange for an output data stream that already highlights data of interest to the specific application. This modification could significantly reduce the computational effort of a downstream computing node by reducing the volume of data that such a node has to process to obtain the same semantic information. Within these "smart" sensors, a particularly interesting class is that of event-based spiking devices [5][6][7], which, similarly to neurons in the retina and cochlea, mainly sense changes; that is, they produce events when the underlying sensed analog signal has a high energy content within a set of narrow frequency bands. The output of these sensors is therefore essentially a predistilled time-frequency representation of the original sensor signal.

Very often, event-based spiking sensors use an asynchronous interface to communicate with the external world. There are several motivations for this. First, the information content of the spike stream is contained not only in the spike "address" (i.e. position and/or frequency), but also in the relative inter-spike time delta, similarly to what happens in the human retina [8]. An asynchronous representation is therefore naturally suited to encapsulate this information in an implicit way. Second, many of these sensors were originally designed to couple with custom-designed brain-like interfaces that are internally asynchronous [9][10]. Finally, the asynchronous interface offers an opportunity for significant power savings due to its clock-less nature.

Unfortunately, the implicit nature of this essential component of the information embedded in a spike stream makes it difficult to transfer the stream to synchronous devices like microcontrollers. First of all, it is necessary to sample the spike stream with a sampling period sufficient to adequately represent small inter-spike times, since time in synchronous systems is quantized by definition. The smallest possible inter-spike time forces the choice of a high sampling frequency even if the average spike rate is in practice much smaller. Moreover, apart from fully streaming ASICs, most synchronous systems (such as off-the-shelf microcontrollers) require temporary data storage in a working memory to process and/or transfer any kind of data. To this end, data must be transformed into a latency-insensitive form, i.e. all time-related information has to be made explicit so that it can be conserved for an indefinite amount of time. For these reasons, building a link between an asynchronous event-based sensor and a commercial off-the-shelf microcontroller or a similar synchronous device can essentially be considered a time measurement problem, with the main constraint that consumption must be kept within a very low power envelope.

Our contribution to solving the problem of coupling an event-based sensor with a synchronous device is as follows. We propose a low-power architecture to measure the inter-spike time and apply a timestamp to each event, making explicit the time component of the information embedded within the spike train. Moreover, to increase the efficiency in sensing input signals, we progressively reduce the frequency of the sampling interface between two consecutive events, so that power consumption is significantly reduced when the rate is low. If two events are too far apart in time, we consider them uncorrelated and fully switch down the clock to enter an even lower power mode. We present an embodiment of this architecture on a low-power Microsemi IGLOOnano FPGA, using a pausable ring oscillator to generate the variable-frequency clock, and test it by transforming the spike stream from the low-power spiking cochlea designed by Liu et al. [11] into an I2S stream that can be consumed by most microcontrollers. We show that it is possible to significantly reduce the power of the interface when the spiking rate is decreased, ranging from 50 µW in the absence of spikes to 4.5 mW at a 550 kevt/s spike rate, equivalent to a noisy environment.

2 RELATED WORKS
Significant research interest has been shown in the topic of brain-inspired sensors, both as a way to explore and understand how human sensory organs work through their imitation, and as a means to provide "smarter" input to vision and audio processing. Event-based pixel sensors show data rates in the order of a few Mevt/s or less: for example, DVS128 [12], developed by iniLabs, features a maximum event rate of 1 Mevt/s within a power envelope of ∼23 mW. The sensor proposed by Gottardi et al. [7] buffers all events up to building a 128 × 64 "frame"; with a 25% pixel activity (equivalent to ∼100 kevt/s) it consumes 100 µW. Audio sensors such as silicon cochleas [13][14][11] typically work at tens to hundreds of kevt/s for typical speech scenarios, within a power envelope of less than 15 mW, and down to mere tens of µW for the latest sensor proposed by Yang et al. [15]. These sensors can be employed in many applications that can benefit from cognitive computing and semantically high-level input data, such as autonomous UAVs [16], robotics (both audio- [17] and vision-based [18]), and industrial [19] or traffic safety [20] applications.

To integrate these sensors in real systems, several examples of AER interfaces have been developed, exploiting a variety of hardware and software architectures. The purpose of most of these platforms is essentially to interface a neuronal chip with a PC for test, debug and data acquisition. For this reason, most of these interfaces are designed to cope with the worst case in terms of input data rate, and have no hard constraint on power consumption. For example, interfaces from AER to PCI or USB have been developed both on FPGA [21][22][23][24], achieving sustainable event rates up to 10 Mevt/s in a power envelope in the order of hundreds of mW or more, and on ASIC [25][26], with sustainable event rates up to 20 Mevt/s.

Rusci et al. [27] propose a smart wakeup interface to an event-based vision sensor integrated within an ultra-low-power multicore system-on-chip, which is similar to the one we propose here. It enables ultra-high efficiency in vehicle detection, with real-time performance achieved in less than 25 µW. With respect to our work, this proposal is less flexible, as it does not allow interfacing with a generic microcontroller, and it does not feature a locally generated variable-frequency clock, which is a key component of our work.

As previously mentioned, our proposal in this work relies on the availability of a variable-frequency pausable clock generated directly on chip. The reference clock used to synchronize incoming data is tuned according to the activity at the asynchronous boundary; this approach is similar to that used in a very different context in Globally Asynchronous, Locally Synchronous (GALS) systems [28]. Some of these systems are able to pause and reactivate the clock reference used for synchronization in a data-driven fashion, i.e. depending on the presence/absence of an asynchronous handshake. To this end, logic circuits similar to the one we propose in Section 3 have already been exploited to implement so-called pausible clocks [29].

    function AETRsampling(Tmin, θdiv, Ndiv)
        Tsample ← Tmin ; cnt_sample ← 0 ; cnt_div ← 0
        while True do
            if request() then
                sample() ; acknowledge()
                cnt_sample ← 0 ; cnt_div ← 0
                Tsample ← Tmin
            else if cnt_sample = θdiv then
                if cnt_div = Ndiv then
                    cnt_sample ← 0 ; cnt_div ← 0
                    shutdown_clk()
                    wait_for_request()
                    continue
                else
                    Tsample ← 2 · Tsample
                    cnt_sample ← 0 ; cnt_div ← cnt_div + 1
                end if
            else
                cnt_sample ← cnt_sample + 1
            end if
            wait_one_cycle()
        end while
    end function

Figure 1: Time-to-information extraction methodology. Tmin is the starting sampling period (i.e. the fastest); θdiv is the number of cycles between two successive divisions of the sampling clock; Ndiv is the number of times the clock is divided before it is switched off.

3 TIME-TO-INFORMATION EXTRACTION
Spike streams coming out of asynchronous brain-inspired sensors contain two different kinds of information: the transmitted data value itself (i.e. the address of the "neuron" that produced the spike), and the time delta between two successive events. Address-Event Representation (AER) [30] is the protocol used by many of these sensors; it employs a 4-phase asynchronous handshake and an address channel. This protocol does not provide any explicit information about the time that separates two consecutive elements; this information is implicit in the inter-event time. A completely asynchronous interface, by definition, is not able to explicitly extract information related to timing: it has to work as a continuous consumer of the event spike stream. Either the downstream computing device explicitly works as such (e.g. a brain-inspired architecture like TrueNorth [10]), or the time-domain information must be extracted explicitly. The former behavior can only be implemented in a typical microcontroller by forcing it to remain always-on and active to process collected events in real time; conversely, making the time-domain information explicit could enable storing and accumulating events so that they can be processed in batches, allowing more efficient usage of the downstream computing device. Increasing the efficiency of the event acquisition/timestamping unit becomes crucial in this architecture, since only this block would be active during the spike accumulation phase; all unused parts of the system could be clock-gated. In such an architecture, the actual achievable energy saving depends on two main factors: i) the ratio between the input and output bitrate; ii) the buffer size.

Figure 2: AER sampling clock with Ndiv = 3, θdiv = 8.

To extract the implicit time-domain information (the inter-spike time) from the spike stream, we propose the mechanism shown in Figure 1, which is based on variable-frequency sampling of the AER input. Each event arriving at the interface is tagged with a timestamp measured as the time delta from the previous spike event. We call the timestamp-enriched format of AER an Address-Event-Time Representation, or AETR. In the AETR format, spike events are made latency-insensitive because their arrival time is explicitly encoded, and can be stored for an indefinite amount of time before being processed, or carried over any other digital data transfer protocol without making additional assumptions of any kind. As we are interested in relative precision for inter-spike deltas, the sampling frequency can be progressively relaxed, reducing the frequency by one half every θdiv cycles, as shown in Figure 2. Eventually, if no spike is present at the input, after Ndiv clock divisions the clock is completely stopped to save even more power, and reactivated only when an AER request for handshake is asserted at the input.
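The policy of Figure 1 can be captured in a short behavioral model. The following is a sketch under our own conventions (events given as absolute arrival times, `None` standing in for a saturated timestamp, ideal continuous time for the clock), not the actual RTL:

```python
def aetr_sample(events, t_min, theta_div, n_div):
    """Tag each spike in `events` (sorted arrival times, seconds) with a
    timestamp measuring the delta from the previous event, quantized by
    the progressively divided sampling clock of Figure 1."""
    out = []
    now = 0.0                 # current (ideal) time
    t_sample = t_min          # current sampling period
    cnt_sample = cnt_div = 0  # cycles at this period / divisions so far
    ticks = 0.0               # timestamp counter in units of t_min; None = saturated
    i = 0
    while i < len(events):
        if events[i] <= now:                       # request() pending
            stamp = None if ticks is None else ticks * t_min
            out.append((events[i], stamp))         # sample() + acknowledge()
            i += 1
            ticks, cnt_sample, cnt_div = 0.0, 0, 0
            t_sample = t_min                       # back to the fastest clock
        elif cnt_sample == theta_div:
            if cnt_div == n_div:                   # shutdown_clk():
                now = events[i]                    # sleep until next request
                ticks = None                       # timestamp saturates
                cnt_sample = cnt_div = 0
                continue
            t_sample *= 2.0                        # halve the sampling frequency
            cnt_sample, cnt_div = 0, cnt_div + 1
        else:
            cnt_sample += 1
        if ticks is not None:                      # configurable increment step
            ticks += t_sample / t_min
        now += t_sample                            # wait_one_cycle()
    return out
```

For instance, with `t_min = 1.0`, `theta_div = 2`, `n_div = 1`, an event at t = 5 is tagged at the first clock edge past its arrival, while an event far beyond the shutdown horizon receives the saturated (`None`) timestamp.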

4 HARDWARE ARCHITECTURE
We deployed the time-to-information extraction methodology detailed in Section 3 on an AER-to-I2S interface implemented on a low-power Microsemi IGLOOnano FPGA. We targeted in particular the iniLabs DAS1 cochlea sensor¹, which mounts the Cochlea AMSC1c chip [11]. We selected an I2S stream as the carrier of the timestamp-augmented spike stream according to the audio nature of the cochlea signal; through the proposed interface, the cochlea can be connected to any I2S-equipped microcontroller unit (MCU), such as an STM32-L476 [31]. Figure 3 shows a high-level architectural diagram of the full system.

¹ http://inilabs.com/products/dynamic-audio-sensor

Figure 3: AER-to-I2S interface between the Cochlea AMSC1c and a microcontroller unit. The AMSC1c cochlea (with its microphone) connects to the IGLOOnano FPGA through the AER signals REQ, ACK and the 10-bit ADDR; inside the FPGA, an SPI unit, the clock generator, the AER-to-AETR sampling unit, a 9.2 kB AETR buffer and the I2S interface are linked by a configuration bus and a data crossbar; SPI (SCK, CSN, MOSI, MISO), I2S (SCK, WS, SD) and an INT line connect to an STM32-L476 microcontroller unit.

The hardware architecture of the AER-to-I2S interface is formed by four main macro-blocks: i) an AER front-end, which acts as a spike stream synchronization block and produces the timestamp-augmented AETR stream; ii) a buffer module, which can be configured to hold the AETR data to create a batch to be transferred in block; iii) the Clock Generator, which provides the recursively divided clock, based on a pausable ring oscillator; iv) the I2S interface. The blocks that send or receive AETR data are interconnected by a combinational crossbar, while a configuration bus, accessible from the outside through SPI, is used to modify the interface configuration registers at runtime. Except for the request monitor inside the AER front-end, all blocks are clock-gated by default and activated only when in active use; moreover, all modules use the same global variable-frequency clock generated on-chip by the clock generator.

An input spike is signaled by the assertion of the AER request signal (REQ). As shown in Figure 4, the input monitor used to receive the request is constituted by a simple cascade of two flip-flops to synchronize the request and reduce the occurrence of metastability. As AER requires the address (ADDR) signal to be already stable when REQ is asserted, the address is simply sampled by a single 10-bit register. A counter generates the timestamp used to tag the incoming events; it has a configurable increment step to produce timestamps coherent with the varying sampling period. The tagged AETR data stream is sent to an SRAM-based FIFO buffer, where the collected events are stored until reaching a certain threshold, at which point the buffered data is converted into an I2S stream towards the downstream microcontroller.
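A cycle-level sketch of this front-end might look as follows (hypothetical class and signal names; real metastability, which the two-flop chain only makes statistically unlikely, is not modeled):

```python
class InputMonitor:
    """Behavioral model of the Figure 4 input monitor: REQ crosses into
    the sampling-clock domain through two cascaded flip-flops, and the
    10-bit ADDR (stable before REQ, per the AER contract) is latched."""

    def __init__(self):
        self.ff1 = self.ff2 = 0   # two-stage REQ synchronizer
        self.addr_reg = 0         # 10-bit sampled address

    def clock(self, req, addr):
        """One rising edge of the always-on sampling clock."""
        self.ff2, self.ff1 = self.ff1, req   # shift REQ down the chain
        if self.ff2:                         # synchronized request seen
            self.addr_reg = addr & 0x3FF     # sample the 10-bit address
        return self.ff2                      # REQ (synch)
```

A request asserted at one edge becomes visible on `REQ (synch)` two edges later, at which point the address is captured.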

Figure 4: AER interface ADDR and REQ input monitor. REQ is synchronized by two cascaded flip-flops to produce REQ (synch); ADDR is sampled into a single 10-bit register; the front-end uses both an always-on and a gateable clock.


Figure 5: Schematic of the ring oscillator with start/stop circuit. A NOR gate replaces the first inverter of the loop; the SLEEP bit is converted into a SLEEP PULSE that stops the oscillation, and the generated clock is AND-gated on its way out.

4.1 Clock Generator
The Clock Generator is responsible for generating the variable-frequency clock described in Section 3. It is composed of a pausable ring oscillator, which provides the reference clock frequency, and a configurable clock divider.

The ring oscillator, shown in Figure 5, is implemented as a cascade of an odd number of inverting gates placed in a closed-loop configuration. Minimum-delay inverters have been used for higher granularity in the generated frequency selection, which can be performed by removing/inserting a pair of inverters. The input inverter is substituted by a NOR2 gate to interrupt the inverting chain and stop the oscillator. Since this clock is used as the reference for the whole system, all registers are frozen when it is deactivated, including the one generating the SLEEP bit. To avoid this becoming a deadlock condition, the SLEEP bit is converted into a pulse by a chain of inverters; the length of this chain is defined by the constraint that the pulse must be longer than a clock semiperiod and arrive during the low clock phase. As shown in Figure 5, the clock is stopped by the assertion of the SLEEP PULSE bit, which is AND'ed with the clock to avoid glitches.

The ring oscillator generates a 120 MHz clock that is fed to a cascade of frequency dividers to bring the frequency down to 30 MHz (reference clock). A finite state machine implements the algorithm detailed in Section 3, generating the global clock with a submultiple frequency with respect to the reference one. The θdiv and Ndiv configuration parameters can be loaded from the outside via the SPI configuration interface to change the interface configuration at run-time.
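As a rough numeric sanity check, the oscillator frequency follows from the stage count and per-stage delay. The 15-stage / 0.28 ns figures below are our assumptions, picked only so that the result lands near the 120 MHz reported above; the actual stage count and inverter delay are not given in the text:

```python
def ring_osc_freq(n_stages, t_stage):
    """Oscillation frequency of a ring of `n_stages` inverting gates
    (must be odd) with per-stage delay `t_stage` (seconds): one full
    period traverses the loop twice (rising and falling edge)."""
    assert n_stages % 2 == 1, "ring oscillator needs an odd inverter count"
    return 1.0 / (2.0 * n_stages * t_stage)

def reference_clock(f_osc, div_stages):
    """Cascade of divide-by-2 stages; 120 MHz needs two to reach 30 MHz."""
    return f_osc / (2 ** div_stages)
```

With these assumed numbers, `ring_osc_freq(15, 0.28e-9)` lands near 119 MHz, and `reference_clock(120e6, 2)` gives the 30 MHz reference; removing or inserting a pair of inverters changes `n_stages` by 2 and shifts the frequency accordingly.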

5 EXPERIMENTAL RESULTS
The system has been implemented on an IGLOOnano AGLN250V2 FPGA, using Synopsys Synplify Pro J2015.03M for logical synthesis and Microsemi Libero SoC 11.7 for placement & routing. The interface utilizes 31% of the available resources (∼600 equivalent logic gates). We constrained the design to work with a 30 MHz reference clock frequency generated by the ring oscillator, i.e. 15 MHz as the highest frequency available for sampling. This means that inter-spike times of 130 ns or more can be sensed by the interface; more than enough to respect the most commonly used standard for the AER protocol, CAVIAR [32], which requires each event to be completed within 700 ns.

Figure 6: Average relative error introduced by the AER-to-AETR conversion (average timestamp error vs. event rate, from 100 evt/s to ∼1 Mevt/s, for θdiv = 16, 32, 64).

5.1 Time-to-Information extraction accuracy
To evaluate the time accuracy and the error introduced by time quantization with our variable-frequency approach, we implemented a Matlab model of the clock generation unit, which can be fed with a Poisson-distributed spike stream of configurable event rate. In this model we assume a perfect clock with constant frequency and 50% duty cycle. The system has been simulated for different values of θdiv, and in a range of event rates between 100 evt/s and 2 Mevt/s. Figure 6 shows that in the event rate range of interest (e.g. for θdiv = 64, from 1 kevt/s to 550 kevt/s), the average error caused by frequency division can be kept significantly below the analytic 3% bound.

In the graph shown in Figure 6, we distinguish three different regions (e.g. for θdiv = 64): the inactive region, from 100 evt/s to 100 kevt/s, corresponding to a very low activity of the sensor; the active region, from 100 kevt/s to approximately 550 kevt/s, where the divided-clock methodology is applied; and the high-activity region, above ∼550 kevt/s, where the reference frequency is always the maximum one.
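A Python stand-in for the accuracy model described above might look as follows (Poisson stream via exponential inter-arrival times; the handling of sub-Nyquist intervals and saturated timestamps mirrors the discussion in this section, but the rates, sample count and seed are our own choices):

```python
import random

def mean_timestamp_error(rate, t_min, theta_div, n_div, n_events=10000, seed=0):
    """Mean relative timestamp error for a Poisson spike stream of the
    given event rate, sampled with the variable-period clock of Sec. 3."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_events):
        dt = rng.expovariate(rate)        # Poisson stream -> exponential gaps
        if dt < t_min:
            continue                      # below the base sampling period:
                                          # mis-tagged by construction
        t_sample, elapsed, cyc, divs = t_min, 0.0, 0, 0
        saturated = False
        while elapsed + t_sample < dt:    # walk edges until the event is seen
            elapsed += t_sample
            cyc += 1
            if cyc == theta_div:          # time to divide the clock again
                cyc = 0
                if divs == n_div:
                    saturated = True      # clock switched off before the event
                    break
                t_sample *= 2
                divs += 1
        if saturated:
            continue                      # saturated timestamp (inactive region)
        measured = elapsed + t_sample     # event tagged at the next clock edge
        errors.append((measured - dt) / dt)
    return sum(errors) / len(errors) if errors else float('nan')
```

Sweeping `rate` over several decades for a few θdiv values reproduces the qualitative shape of Figure 6: low error in the active region, growing error toward very low and very high rates.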

In the inactive region, the error is high because the event rate is so low that the interface is essentially always off, and therefore most spike events are tagged with the saturated timestamp: this corresponds to a region in which we are uninterested in the correlation between events. In the high-activity region, the behavior is different: when the event rate is very high, nearing the non-divided sampling frequency, the error increases because an increasing fraction of the spikes are separated by inter-spike times below the Nyquist period, and are therefore tagged incorrectly (this is a limit related to the choice of the non-divided sampling frequency, and not to our frequency division scheme). In the active region, which is our main region of interest, the error oscillates between two boundaries: the upper bound is given by a measurement of the inter-spike time done just after an iterative frequency division, while the lower bound is given when the inter-spike time is measured just before a new iterative frequency division. In other words, the peaks and valleys in the average error in this region are related to the Ndiv successive divisions of the clock.

Figure 7: Example of a single output channel of the cochlea sensor for a word extracted from a real sentence, with event rate and error distribution. (a) Address-Event Representation and event rate (spike address and event rate, up to ∼400 kevt/s, over ∼800 ms). (b) Distribution of timestamp errors for θdiv = 16, 32, 64.

Figure 7 shows an example of the output of the cochlea when sensing a word in a real conversation (Figure 7a), along with the error distribution at different values of θdiv. Figure 7b clearly shows how increasing θdiv improves overall accuracy, although this improvement comes with some power cost, as clarified in the following section.
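One plausible derivation of the ∼3% analytic bound and the active-region error boundaries discussed above (our reconstruction, not spelled out in the paper): once the clock has been divided k ≥ 1 times, at least θdiv cycles of every shorter period have already elapsed, so the quantization step Tmin·2^k is at most a fraction 2^k / (θdiv·(2^k − 1)) of the measured interval:

```python
def worst_case_relative_error(theta_div, k):
    """Upper bound on the relative quantization error for an event
    captured after k >= 1 frequency divisions: the quantization step is
    t_min * 2**k, while at least theta_div cycles of each of the k
    earlier (shorter) periods have elapsed, i.e. theta_div * (2**k - 1)
    units of t_min."""
    assert k >= 1
    period = 2 ** k                         # step, in units of t_min
    min_elapsed = theta_div * (2 ** k - 1)  # minimum measured interval
    return period / min_elapsed
```

The bound is largest at k = 1, where it equals 2/θdiv (≈3.1% for θdiv = 64, consistent with the "3% bound" quoted above) and shrinks toward 1/θdiv for deeper divisions, which would explain the oscillation between the two boundaries in the active region.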

5.2 Power consumptionTo measure the e�ciency introduced by the clock division method-

ology, we compared the power consumption with our approach

with a “naıve” constant frequency sampling approach utilizing the

same ring oscillator; in both cases we clock-gated the unused parts

of the circuit to highlight the improvements introduced by the sole

frequency division. We added to the design a variable rate pseudo-

random spike generator based on a linear-feedback shi� register to

feed the system with a �xed rate spike stream and measure power

directly on the FPGA board, in the range between 10 evt/s and 800 kevt/s, for three different values of θdiv.

Figure 8: Power consumption as a function of event rate, for θdiv = 16, 32, and 64, compared with the no-division and ideal cases.
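The pseudo-random spike generator can be sketched in software as follows; this is a behavioural model, not the actual RTL, and the 16-bit polynomial, seed, and threshold-based rate control are illustrative assumptions (a standard maximal-length Fibonacci LFSR).

```python
def lfsr16(seed=0xACE1):
    """16-bit maximal-length Fibonacci LFSR (x^16 + x^14 + x^13 + x^11 + 1):
    cycles through all 65535 non-zero states before repeating."""
    state = seed
    while True:
        bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = ((state >> 1) | (bit << 15)) & 0xFFFF
        yield state

def spike_stream(n, rate=0.25):
    """Emit n cycles of spikes (1) with average density `rate`, by
    thresholding the LFSR output (illustrative rate-control scheme)."""
    gen = lfsr16()
    return [1 if next(gen) < int(rate * 0x10000) else 0 for _ in range(n)]

spikes = spike_stream(65535, rate=0.25)
print(sum(spikes) / len(spikes))  # average spike density, close to 0.25
```

Because a maximal-length LFSR visits every non-zero state exactly once per period, thresholding its output yields a stream whose long-run spike rate is set precisely by the threshold, which is what makes it a convenient fixed-rate stimulus for power measurements.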

As can be observed in Figure 8, the proposed solution is vastly more efficient than the naïve clocking at all but extremely high rates, where the two are on par. Let us consider the ideal power consumption of the interface as a linear function of the rate r, i.e.

Pideal(r) = Espike · r + Pstatic,  (1)

where Pstatic is the static power consumed by the FPGA (50 µW) and Espike is the ideal dynamic energy per spike, which we estimated as the per-spike energy in the high-activity region. We can see from Figure 8 that the power consumption gets farther from ideality as the event rate decreases, but the clock division technique we propose in Section 3 drastically improves the situation with respect to the baseline technique with no clock division. Furthermore, when the event rate drops below ∼1 kevt/s the clock is often shut down completely, boosting efficiency up to near-ideal power consumption, particularly at event rates lower than 10 to 100 kevt/s. When the activity of the sensor is very low, the ring oscillator switches off often, determining a steeper decrease of power consumption when successive spikes are uncorrelated. Notice that switching off the ring oscillator can be performed without significantly worsening the acquisition time of the next incoming event, since the time to recover from the off-state is on the order of 100 ns, which is comparable with a single clock period at the maximum frequency. Therefore, with this clock methodology we measured a reduction in power consumption of up to 55% in the active region (on the order of a few kevt/s), down to only 50 µW in the inactive region.
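Eq. (1) can be evaluated numerically. The sketch below uses the measured 50 µW static power and an Espike value back-derived from the ≈4.5 mW consumption at 550 kevt/s; this Espike is an illustrative estimate, not the paper's exact fit.

```python
P_STATIC = 50e-6  # static FPGA power (W), from the measurements
# Back-derived estimate (~8.1 nJ/spike) from ~4.5 mW at 550 kevt/s:
E_SPIKE = (4.5e-3 - P_STATIC) / 550e3

def p_ideal(rate_evt_s):
    """Ideal energy-proportional power, Eq. (1): P = Espike * r + Pstatic."""
    return E_SPIKE * rate_evt_s + P_STATIC

for r in (10, 1e3, 100e3, 550e3):
    print(f"{r:>8.0f} evt/s -> {p_ideal(r) * 1e3:.3f} mW")
```

Against this line, the constant-clock baseline burns 4.5 mW at every rate; the divided clock's near-ideal scaling at low rates is what yields the roughly 90× spread between the active and inactive regimes.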

The maximum time interval the interface is able to measure depends directly on the values of θdiv and Ndiv. These two parameters can be used as two different knobs to match both the desired accuracy and the desired maximum time interval that the interface is able to cover. This time can be computed from Figure 8 as the inverse of the event rate at the inflection point of the power consumption trends.
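Under one plausible reading of the scheme (θdiv ticks counted at the base period and at each of the Ndiv successively halved frequencies), the covered interval is a geometric sum over the period levels; the formula below is an assumption for illustration, not taken from the paper.

```python
def t_max(t0_s, theta_div, n_div):
    """Hypothetical maximum measurable interval: theta_div ticks at each
    of the n_div + 1 period levels, with the period doubling at every
    division, i.e. t_max = t0 * theta_div * (2**(n_div + 1) - 1)."""
    return t0_s * theta_div * sum(2 ** k for k in range(n_div + 1))

# e.g. a 100 ns base period, theta_div = 64, n_div = 8 (assumed values):
print(t_max(100e-9, 64, 8))  # maximum covered interval, in seconds
```

The two knobs act as the text describes: θdiv stretches every level uniformly (better accuracy, linearly longer coverage), while each extra division step in Ndiv roughly doubles the covered interval at the cost of coarser late-interval resolution.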


6 CONCLUSION

In this work, we have shown a flexible architecture to be deployed on a small low-power FPGA to link asynchronous event-based sensors with commercial off-the-shelf microcontrollers. As an essential part of the spiking information is embodied in the inter-spike time, measuring these times efficiently is the key task of our interface. The approach we use, based on iterative clock divisions to save power with minor accuracy loss and on switching off the clock altogether when events are extremely sparse, achieves much better energy proportionality than simple sampling at a constant frequency. The system has been fully tested on an IGLOO nano FPGA, connected with a Cochlea AMSC1C sensor via AER and an STM32 microcontroller via I2S, and is fully functional. The power consumption for time-to-information extraction scales from 4.5 mW at a 550 kevt/s rate down to slightly more than 50 µW at rates lower than 10 evt/s (a 90× factor), while a naïve constant-clock methodology is stuck at the same 4.5 mW regardless of the event rate. At the same time, with our technique the accuracy reduction can be kept bounded below 3%, and on average it is even smaller. We believe this architecture can enable new applications of event-based sensors in all kinds of low-power devices, both those explicitly targeting the brain-like nature of these sensors and others that simply exploit their semantically rich output data.

7 ACKNOWLEDGEMENTS

We thank Shih-Chii Liu and Tobi Delbruck from the Institute of Neuroinformatics (INI), University of Zurich, for kindly lending us a Cochlea AMSC1C prototype. This work was supported by the EU project ExaNoDe (H2020-671578).
