An Ultra-Low Power Address-Event Sensor Interface for Energy-Proportional Time-to-Information Extraction
Alfio Di Mauro
Integrated Systems Laboratory
ETH Zurich, Switzerland
Francesco Conti∗
Integrated Systems Laboratory
ETH Zurich, Switzerland
Luca Benini†
Integrated Systems Laboratory
ETH Zurich, Switzerland
ABSTRACT
Internet-of-Things devices need sensors with a low power footprint, capable of producing semantically rich data. Promising candidates are spiking sensors that use asynchronous Address-Event Representation (AER), carrying information within inter-spike times. To minimize the overhead of coupling AER sensors with off-the-shelf microcontrollers, we propose an FPGA-based methodology that i) tags the AER spikes with timestamps to make them transferable over standard interfaces (e.g. I2S, SPI); ii) uses a recursively divided clock, generated on-chip by a pausable ring oscillator, to reduce power while keeping timestamp accuracy above 97%. We prototyped our methodology on an IGLOO nano AGLN250 FPGA, consuming less than 4.5 mW under a 550 kevt/s spike rate (i.e. a noisy environment), and down to 50 µW in the absence of spikes.
ACM Reference format:
Alfio Di Mauro, Francesco Conti, and Luca Benini. 2017. An Ultra-Low Power Address-Event Sensor Interface for Energy-Proportional Time-to-Information Extraction. In Proceedings of Design Automation Conference, Austin, TX, USA, June 18-22, 2017 (DAC'17), 6 pages.
DOI: 10.1145/3061639.3062201
1 INTRODUCTION
The deployment on low power devices of complex "smart" applications based on multi-sensor data streams is at the core of the so-called "Internet-of-Things" (IoT) revolution. In this context, small and unobtrusive edge computing devices such as low power microcontrollers must be able to extract high-level information out of noisy, high-bandwidth and essentially "informationally sparse" data streams, such as those produced by off-the-shelf microphones and cameras. To extract high-level information out of these sensors, it is necessary to use data analytics algorithms such as principal component analysis [1] for dimensionality reduction, k-means [2] for clustering, support-vector machines [3] or neural networks [4] for
∗Also with the EEES Laboratory, University of Bologna, Italy.
†Also with the EEES Laboratory, University of Bologna, Italy.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
DAC'17, Austin, TX, USA © 2017 ACM. 978-1-4503-4927-7/17/06...$15.00
DOI: 10.1145/3061639.3062201
classification. These are typically too complex and computationally intensive for most microcontrollers, leaving only the alternatives of adding a hardware accelerator to the edge computing node (which is expensive in terms of cost and power), or sending raw data streams to a higher-level computing infrastructure in the cloud (which has an enormous energy overhead and requires relatively high-bandwidth communication over radio).
A promising alternative approach is to move part of the semantic information extraction burden to the sensor itself, giving up some generality in exchange for an output data stream that already highlights data of interest to the specific application. This modification could significantly reduce the computational effort of a downstream computing node by reducing the volume of data that such a node has to process to obtain the same semantic information. Within these "smart" sensors, a particularly interesting class is that of event-based spiking devices [5][6][7], which, similarly to neurons in the retina and cochlea, mainly sense changes: they produce events when the content of the underlying sensed analog signal has a high energy content within a set of narrow frequency bands. The output of these sensors is therefore essentially a predistilled time-frequency representation of the original sensor signal.
Very often, event-based spiking sensors use an asynchronous interface to communicate with the external world. There are several motivations for this. First, the information content of the spike stream is contained not only in the spike "address" (i.e. position and/or frequency), but also in the relative inter-spike time delta, similarly to what happens in the human retina [8]. An asynchronous representation is therefore naturally suited to encapsulate this information in an implicit way. Second, many of these sensors were originally designed to couple with custom-designed brain-like interfaces that are internally asynchronous [9][10]. Finally, the asynchronous interface offers an opportunity for significant power savings due to its clock-less nature.
Unfortunately, the implicit nature of this essential component of the information embedded in a spike stream makes it difficult to transfer the stream to synchronous devices like microcontrollers. First of all, it is necessary to sample the spike stream with a sampling period short enough to adequately represent small inter-spike times, since time in synchronous systems is quantized by definition. The smallest possible inter-spike time forces the choice of a high sampling frequency even if the average spike rate is in practice much smaller. Moreover, apart from fully streaming ASICs, most synchronous systems (such as off-the-shelf microcontrollers) require temporary data storage in a working memory to process and/or transfer any kind of data. To this end, data must be transformed into a latency-insensitive form, i.e. all time-related information has to be made explicit so that it can be conserved for an indefinite amount of time. For these reasons, building a link between an asynchronous event-based sensor and a commercial off-the-shelf microcontroller or a similar synchronous device can essentially be considered a time measurement problem, with the main constraint of a very tight power envelope.
Our contribution to solve the problem of coupling an event-based sensor with a synchronous device is as follows. We propose a low-power architecture that measures the inter-spike time and applies a timestamp to each event, making explicit the time component of the information embedded within the spike train. Moreover, to increase the efficiency in sensing input signals, we progressively reduce the frequency of the sampling interface between two consecutive events, so that power consumption is significantly reduced when the rate is low. If two events are too far apart in time, we consider them uncorrelated and fully switch down the clock to enter an even lower power mode. We present an embodiment of this architecture on a low-power Microsemi IGLOO nano FPGA, using a pausable ring oscillator to generate the variable frequency clock, and test it by transforming the spike stream from the low power spiking cochlea designed by Liu et al. [11] into an I2S stream that can be consumed by most microcontrollers. We show that the power of the interface can be significantly reduced when the spiking rate decreases, ranging from 50 µW in the absence of spikes to 4.5 mW at a 550 kevt/s spike rate, equivalent to a noisy environment.
2 RELATED WORK
Significant research interest has been shown in the topic of brain-inspired sensors, both as a way to explore and understand how human sensory organs work through their imitation, and as a means to provide "smarter" input to vision and audio processing. Event-based pixel sensors show data rates in the order of a few Mevt/s or less: for example, the DVS128 [12], developed by iniLabs, features a maximum event rate of 1 Mevt/s within a power envelope of ∼23 mW. The sensor proposed by Gottardi et al. [7] buffers all events until building a 128 × 64 "frame"; with a 25% pixel activity (equivalent to ∼100 kevt/s) it consumes 100 µW. Audio sensors such as silicon cochleas [13][14][11] typically work at tens to hundreds of kevt/s for typical speech scenarios, within a power envelope of less than 15 mW, and down to mere tens of µW for the latest sensor proposed by Yang et al. [15]. These sensors can be employed in many applications that can benefit from cognitive computing and semantically high-level input data, such as autonomous UAVs [16], robotics (both audio- [17] and vision-based [18]), and industrial [19] or traffic safety [20] applications.
To integrate these sensors in real systems, several examples of AER interfaces have been developed, exploiting a variety of hardware and software architectures. The purpose of most of these platforms is essentially to interface a neuronal chip with a PC for test, debug and data acquisition. For this reason, most of these interfaces are designed to cope with the worst case in terms of input data rate, and have no hard constraint on power consumption. For example, interfaces from AER to PCI or USB have been developed both on FPGA [21][22][23][24], achieving sustainable event rates up to 10 Mevt/s in a power envelope in the order of hundreds of
function AETRsampling(Tmin, θdiv, Ndiv)
    Tsample ← Tmin; cntsample ← 0; cntdiv ← 0
    while True do
        if request() then
            sample(); acknowledge()
            cntsample ← 0; cntdiv ← 0
            Tsample ← Tmin
        else if cntsample = θdiv then
            if cntdiv = Ndiv then
                cntsample ← 0; cntdiv ← 0
                shutdown_clk()
                wait_for_request()
                continue
            else
                Tsample ← 2 · Tsample
                cntsample ← 0; cntdiv ← cntdiv + 1
            end if
        else
            cntsample ← cntsample + 1
        end if
        wait_one_cycle()
    end while
end function

Figure 1: Time-to-information extraction methodology. Tmin is the starting (i.e. fastest) sampling period; θdiv is the number of cycles between two successive divisions of the sampling clock; Ndiv is the number of times the clock is divided before it is switched off.
mW or more, and on ASIC [25][26], with sustainable event rates up to 20 Mevt/s.
Rusci et al. [27] propose a smart wakeup interface to an event-based vision sensor integrated within an ultra-low-power multicore system-on-chip, which is similar to the one we propose here. With respect to our work, this proposal is less flexible, as it does not allow interfacing with a generic microcontroller, and it does not feature a locally generated variable frequency clock, which is a key component of our work. It enables ultra-high efficiency in vehicle detection, with real-time performance achieved in less than 25 µW.
As previously mentioned, our proposal in this work relies on the availability of a variable frequency pausable clock generated directly on chip. The reference clock used to synchronize incoming data is tuned according to the activity at the asynchronous boundary; this approach is similar to that used in a very different context in Globally Asynchronous, Locally Synchronous (GALS) systems [28]. Some of these systems are able to pause and reactivate the clock reference used for synchronization in a data-driven fashion, i.e. depending on the presence/absence of an asynchronous handshake. To this end, logic circuits similar to the one we propose in Section 3 have already been exploited to implement so-called pausible clocks [29].
3 TIME-TO-INFORMATION EXTRACTION
Spike streams coming out of asynchronous brain-inspired sensors contain two different kinds of information: the transmitted data value itself (i.e. the address of the "neuron" that produced the spike), and the time delta between two successive events. Address-Event Representation (AER) [30] is the protocol used by many of these sensors; it employs a 4-phase asynchronous handshake and
Figure 2: AER sampling clock with Ndiv = 3, θdiv = 8.
an address channel. This protocol does not provide any explicit information about the time that separates two consecutive elements; this information is implicit in the inter-event time. A completely asynchronous interface, by definition, is not able to explicitly extract information related to timing: it has to work as a continuous consumer of the event spike stream. Either the downstream computing device explicitly works as such (e.g. a brain-inspired architecture like TrueNorth [10]), or the time domain information must be extracted explicitly. The former behavior can only be implemented in a typical microcontroller by forcing it to remain always on and active to process collected events in real time; conversely, making the time domain information explicit could enable storing and accumulating events so that they can be processed in batch, allowing more efficient usage of the downstream computing device. Increasing the efficiency of the event acquisition/timestamping unit becomes crucial in this architecture, since only this block would be active during the spike accumulation phase; all the unused parts of the system could be clock-gated. In such an architecture, the actually achievable energy saving depends on two main factors: i) the ratio between the input and output bitrate; ii) the buffer size.
To extract the implicit time domain information (the inter-spike time) from the spike stream, we propose the mechanism shown in Figure 1, which is based on variable frequency sampling of the AER input. Each event arriving at the interface is tagged with a timestamp measured as the time delta from the previous spike event. We call the timestamp-enriched format of AER an Address-Event-Time Representation or AETR. In the AETR format, spike events are made latency-insensitive because their arrival time is explicitly encoded, and can be stored for an indefinite amount of time before being processed or carried over any other digital data transfer protocol without making additional assumptions of any kind. As we are interested in relative precision for inter-spike deltas, the sampling frequency can be progressively relaxed, reducing the frequency by one half every θdiv cycles as shown in Figure 2. Eventually, if no spike is present on the input, after Ndiv clock divisions the clock is completely stopped to save even more power, and reactivated only when an AER request for handshake is asserted at the input.
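The sampling schedule of Figure 1 can be summarized as a short behavioral model. The sketch below mirrors the pseudocode, not the RTL; function and variable names are ours, and time units are arbitrary:

```python
def aetr_timestamp(delta, t_min=1.0, theta_div=8, n_div=3):
    """Measure one inter-spike interval `delta` with the recursively
    divided sampling clock of Figure 1: sample with period t_sample,
    double the period every theta_div cycles, and after n_div
    divisions shut the clock down (saturated timestamp).

    Returns the quantized interval, or float('inf') on saturation.
    Behavioral sketch only; names and units are illustrative."""
    t_sample = t_min      # current sampling period
    cnt_sample = 0        # cycles elapsed at the current period
    cnt_div = 0           # divisions applied so far
    measured = 0.0        # accumulated (quantized) time
    while measured < delta:
        if cnt_sample == theta_div:
            if cnt_div == n_div:
                return float('inf')   # clock off until the next request
            t_sample *= 2.0           # halve the sampling frequency
            cnt_sample = 0
            cnt_div += 1
        measured += t_sample
        cnt_sample += 1
    return measured
```

The measured value always rounds the true interval up to the next sampling edge, so the quantization error is one-sided and, relative to the interval, grows each time the period is doubled.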
4 HARDWARE ARCHITECTURE
We deployed the time-to-information extraction methodology detailed in Section 3 on an AER-to-I2S interface implemented on a low-power Microsemi IGLOO nano FPGA. We targeted in particular the iniLabs DAS1 cochlea sensor1, which mounts the Cochlea AMSC1c chip [11]. We selected an I2S stream as the carrier of the timestamp-augmented spike stream according to the audio
1 http://inilabs.com/products/dynamic-audio-sensor
Figure 3: AER-to-I2S interface between the Cochlea AMSC1C and a microcontroller unit. (Block diagram: the AMSC1c cochlea, with its microphone input, connects via the AER signals REQ, ACK and a 10-bit ADDR to the IGLOO nano FPGA, which hosts an SPI unit, the clock generator, the AER-to-AETR sampling unit, a 9.2 kB AETR buffer, and the I2S interface, joined by a configuration bus and a data crossbar; SPI (SCK, CSN, MOSI, MISO), I2S (SCK, WS, SD), and an INT line connect to an STM32-L476 microcontroller unit.)
nature of the cochlea signal; through the proposed interface, the cochlea can be connected to any I2S-equipped microcontroller unit (MCU), such as an STM32-L476 [31]. Figure 3 shows a high-level architectural diagram of the full system.
The hardware architecture of the AER-to-I2S interface is formed by four main macro-blocks: i) an AER front-end, which acts as a spike stream synchronization block and produces the timestamp-augmented AETR stream; ii) a buffer module, which can be configured to hold the AETR data to create a batch to be transferred in block; iii) the Clock Generator, which provides the recursively divided clock, based on a pausable ring oscillator; iv) the I2S interface. The blocks that send or receive AETR data are interconnected by a combinational crossbar, while a configuration bus, accessible from the outside through SPI, is used to modify the interface configuration registers at runtime. Except for the request monitor inside the AER front-end, all blocks are clock-gated by default and activated only when in active use; moreover, all modules use the same global variable frequency clock generated on-chip by the clock generator.
An input spike is signaled by the assertion of the AER request signal (REQ). As shown in Figure 4, the input monitor used to receive the request is constituted by a simple cascade of two flip-flops to synchronize the request and reduce the occurrence of metastability. As in AER the address (ADDR) signal is required to be already stable when REQ is asserted, the address is simply sampled by a single 10-bit register. A counter generates the timestamp used to tag the incoming events; it has a configurable increment step to produce timestamps coherent with the varying sampling period. The tagged AETR data stream is sent to an SRAM-based FIFO buffer, where the collected events are stored until reaching a certain threshold, at which point the buffered data is converted into an I2S stream towards the downstream microcontroller.
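The batch-accumulation behavior of the front-end and FIFO can be sketched as follows. This is a behavioral model under our own naming; the 9.2 kB SRAM buffer geometry, the AETR word layout and the I2S serialization are not modeled:

```python
from collections import deque

class AETRFrontEnd:
    """Behavioral sketch of the AER front-end plus FIFO buffer: each
    event is tagged with a timestamp and queued; once `threshold`
    events are collected, the batch is handed over (in hardware,
    converted into an I2S stream). Class and field names are ours."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.fifo = deque()        # stand-in for the SRAM-based FIFO
        self.batches = []          # stand-in for the I2S output

    def on_event(self, addr, timestamp):
        assert 0 <= addr < 1024    # AER address fits in 10 bits
        self.fifo.append((addr, timestamp))
        if len(self.fifo) >= self.threshold:
            self.batches.append(list(self.fifo))   # flush one batch
            self.fifo.clear()

# Tiny usage example: three events, threshold of two.
fe = AETRFrontEnd(threshold=2)
for addr, ts in [(3, 10), (7, 4), (3, 25)]:
    fe.on_event(addr, ts)
# fe.batches now holds one batch of two events; one event is pending.
```

Batching like this is what lets the downstream MCU stay clock-gated during accumulation and wake up only once per batch, as discussed in Section 3.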
Figure 4: AER interface ADDR and REQ input monitor. (Two cascaded flip-flops on the always-on clock synchronize REQ; a single 10-bit register on the gateable clock samples ADDR.)
Figure 5: Schematic of the ring oscillator with start/stop circuit.
4.1 Clock Generator
The Clock Generator is responsible for generating the variable frequency clock described in Section 3. It is composed of a pausable ring oscillator, which provides the reference clock frequency, and a configurable clock divider.
The ring oscillator, shown in Figure 5, is implemented as a cascade of an odd number of inverting gates placed in a closed-loop configuration. Minimum delay inverters have been used for higher granularity in the generated frequency selection, which can be performed by removing/inserting a pair of inverters. The input inverter is substituted by a NOR2 gate to interrupt the inverting chain and stop the oscillator. Since the clock is used as reference for the whole system, all registers are frozen when it is deactivated, including the one generating the SLEEP bit. To avoid this becoming a deadlock condition, the SLEEP bit is converted into a pulse by a chain of inverters; the length of this chain is defined by the constraint that the pulse must be longer than a clock half-period and arrive during the low clock phase. As shown in Figure 5, the clock is stopped by the assertion of the SLEEP PULSE bit, which is AND'ed with the clock to avoid glitches.
The ring oscillator generates a 120 MHz clock that is fed to a cascade of frequency dividers to bring the frequency down to 30 MHz (reference clock). A finite state machine implements the algorithm detailed in Section 3, generating the global clock at a submultiple of the reference frequency. The θdiv and Ndiv configuration parameters can be loaded from the outside via the SPI configuration interface to change the interface configuration at run-time.
5 EXPERIMENTAL RESULTS
The system has been implemented on an IGLOO nano AGLN250V2 FPGA, using Synopsys Synplify Pro J2015.03M for logic synthesis and Microsemi Libero SoC 11.7 for placement & routing. The interface utilizes 31% of the available resources (∼600 equivalent logic gates). We constrained the design to work with a 30 MHz reference clock frequency generated by the ring oscillator, i.e. 15 MHz as the highest frequency available for sampling. This means that inter-spike times of 130 ns or more can be sensed by the interface; more than enough to respect the most commonly used standard for the AER protocol, CAVIAR [32], which requires each event to be completed within 700 ns.
Figure 6: Average relative error introduced by the AER-to-AETR conversion (average error vs. event rate, from 100 evt/s to 1×10^6 evt/s, log-log; curves for θdiv = 16, 32, 64).
5.1 Time-to-Information extraction accuracy
To evaluate the time accuracy and the error introduced by time quantization with our variable frequency approach, we implemented a Matlab model of the clock generation unit, which can be fed with a Poisson-distributed spike stream of configurable event rate. In this model we assume a perfect clock with constant frequency and 50% duty cycle. The system has been simulated for different values of θdiv, and in a range of event rates between 100 evt/s and 2 Mevt/s. Figure 6 shows that in the event rate range of interest (e.g. for θdiv = 64, from 1 kevt/s to 550 kevt/s), the average error caused by frequency division can be kept significantly below the analytic 3% bound.
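The Matlab model is not reproduced here, but its behavior can be approximated with a short Python sketch. Assumptions on our part: Poisson arrivals give exponentially distributed inter-spike deltas, Tmin = 1/15 MHz as in the paragraph above, saturated events are excluded from the average, and all names are ours:

```python
import random

def quantized_delta(delta, t_min, theta_div, n_div):
    # Divided-clock measurement of one inter-spike interval (cf. Fig. 1):
    # double the sampling period every theta_div cycles; after n_div
    # divisions the clock is off and the timestamp saturates (None).
    t_s, cnt, div, meas = t_min, 0, 0, 0.0
    while meas < delta:
        if cnt == theta_div:
            if div == n_div:
                return None
            t_s, cnt, div = 2 * t_s, 0, div + 1
        meas += t_s
        cnt += 1
    return meas

def avg_relative_error(rate_evt_s, theta_div, n_div=3, n_spikes=1000,
                       t_min=1 / 15e6, seed=0):
    """Average relative timestamp error for a Poisson spike stream,
    a rough sketch in the spirit of the Matlab model of Section 5.1."""
    rng = random.Random(seed)
    errs = []
    for _ in range(n_spikes):
        delta = rng.expovariate(rate_evt_s)   # Poisson inter-arrival
        q = quantized_delta(delta, t_min, theta_div, n_div)
        if q is not None:                     # skip saturated events
            errs.append((q - delta) / delta)
    return sum(errs) / len(errs) if errs else float('nan')
```

Since the measurement rounds each interval up to the next sampling edge, the per-event error is always non-negative; sweeping `rate_evt_s` and `theta_div` reproduces the qualitative shape of Figure 6.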
In the graph shown in Figure 6, we distinguish three different regions (e.g. for θdiv = 64): the inactive region, from 100 evt/s to 100 kevt/s, corresponding to a very low activity of the sensor; the active region, from 100 kevt/s to approximately 550 kevt/s, where the divided clock methodology is applied; and the high-activity region, above ∼550 kevt/s, where the reference frequency is always the maximum one.
In the inactive region, the error is high as the event rate is so low that the interface is essentially always off; therefore most spike events are tagged with the saturated timestamp: this corresponds to a region in which we are uninterested in the correlation between events. In the high-activity region, the behavior is different: when the event rate is very high, approaching the non-divided sampling frequency, the error increases because an increasing fraction of the spikes are separated by inter-spike times below the Nyquist period, and are therefore tagged incorrectly (this is a limit related to the choice of the non-divided sampling frequency, and not to our frequency division scheme). In the active region, which is our main region of interest, the error oscillates between two boundaries: the upper bound is given by a measurement of the inter-spike time done just after an iterative frequency division, the lower bound by one done just before a new iterative frequency division. In other words, the
(a) Address-Event Representation and event rate (spike address and event rate vs. time, over ∼800 ms).
(b) Distribution of timestamp errors at different θdiv (probability vs. timestamp error %, for θdiv = 16, 32, 64).
Figure 7: Example of a single output channel of the cochlea sensor for a word extracted from a real sentence, with event rate and error distribution.
peaks and valleys in the average error in this region are related to the Ndiv successive divisions of the clock.
Figure 7 shows an example of the output of the cochlea when sensing a word in a real conversation (Figure 7a), along with the error distribution at different values of θdiv. Figure 7b clearly shows how increasing θdiv improves overall accuracy, although this improvement comes at some power cost, as clarified in the following section.
5.2 Power consumption
To measure the efficiency gain introduced by the clock division methodology, we compared the power consumption of our approach with a "naïve" constant frequency sampling approach utilizing the same ring oscillator; in both cases we clock-gated the unused parts of the circuit to highlight the improvements introduced by the frequency division alone. We added to the design a variable rate pseudo-random spike generator based on a linear-feedback shift register to feed the system with a fixed rate spike stream and measure power
Figure 8: Power consumption (mW) vs. event rate (kevt/s, log scale) for θdiv = 64, 32, 16, the no-division baseline, and the ideal model.
directly on the FPGA board, in the range from 10 evt/s to 800 kevt/s, for three different values of θdiv.
As can be observed in Figure 8, the proposed solution is vastly more efficient than the naïve clocking at all but extremely high rates, where they are on par. Let us consider the ideal power consumption of the interface as a linear function of the rate r, i.e.

    P_ideal(r) = E_spike · r + P_static,    (1)

where P_static is the static power consumed by the FPGA (50 µW) and E_spike is the ideal dynamic energy per spike, which we estimated from the high-activity region. We can see from Figure 8
that the power consumption gets farther from ideality as the event rate decreases, but the clock division technique we propose in Section 3 drastically improves the situation with respect to the baseline technique with no clock division. Furthermore, when the event rate drops below ∼1 kevt/s the clock is often shut down completely, boosting efficiency up to near-ideal power consumption, particularly at event rates lower than 10 to 100 kevt/s. When the activity of the sensor is very low, the ring oscillator switches off often, causing a steeper decrease of power consumption when successive spikes are uncorrelated. Notice that switching off the ring oscillator can be performed without significantly worsening the acquisition time of the next incoming event, since the time to recover from the off-state is in the order of 100 ns, which is comparable with a single clock period at the maximum frequency. Therefore, with this clock methodology we measured a reduction in power consumption of up to 55% in the active region (in the order of a few kevt/s), down to only 50 µW in the inactive region.
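Equation (1) can be evaluated numerically. The per-spike energy below is back-computed from the two measured endpoints (4.5 mW at 550 kevt/s, 50 µW static); that back-computation is our illustration, not a figure reported above:

```python
def p_ideal_mw(rate_evt_s, e_spike_j, p_static_w=50e-6):
    """Ideal energy-proportional power of Eq. (1), returned in mW:
    P_ideal(r) = E_spike * r + P_static."""
    return (e_spike_j * rate_evt_s + p_static_w) * 1e3

# Back-compute E_spike from the measured operating points
# (an illustrative estimate, roughly 8.1 nJ per event):
E_SPIKE = (4.5e-3 - 50e-6) / 550e3

high = p_ideal_mw(550e3, E_SPIKE)   # high-activity endpoint, in mW
floor = p_ideal_mw(0.0, E_SPIKE)    # zero-rate static floor, in mW
```

By construction the model passes through both endpoints; the measured curves of Figure 8 sit above this line at intermediate rates, which is exactly the gap the clock division scheme narrows.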
The maximum time interval the interface is able to measure depends directly on the values of θdiv and Ndiv. These two parameters can be used as two different knobs to match both the desired accuracy and the desired maximum time interval that the interface is able to cover. This time can be computed from Figure 8 as the inverse of the event rate at the knee of the power consumption trends.
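The dependence of this maximum interval on θdiv and Ndiv follows from summing θdiv cycles at each of the Ndiv + 1 period values; the closed form below is our own derivation from the schedule of Figure 1, not an expression given above:

```python
def t_max_interval(t_min, theta_div, n_div):
    """Longest inter-spike interval measurable before clock shutdown,
    assuming theta_div cycles at each period t_min * 2**k for
    k = 0 .. n_div (a geometric series):
        T_max = t_min * theta_div * (2**(n_div + 1) - 1)
    Values below are a sketch, not figures from the measurements."""
    return t_min * theta_div * (2 ** (n_div + 1) - 1)

# e.g. with t_min = 1/15 MHz, theta_div = 64, n_div = 3:
# T_max = 66.7 ns * 64 * 15, i.e. about 64 us before saturation.
```

Raising θdiv stretches T_max linearly (and improves accuracy, Figure 7b), while each extra division doubles the tail of the schedule, which is why the two parameters act as independent knobs.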
6 CONCLUSION
In this work, we have shown a flexible architecture to be deployed on a small low-power FPGA to link asynchronous event-based sensors with commercial off-the-shelf microcontrollers. As an essential part of the spiking information is embodied in the inter-spike times, measuring these efficiently is the key task of our interface. The approach we use, based on iterative clock divisions to save power with minor accuracy loss and on switching off the clock altogether when events are extremely sparse, achieves much better energy proportionality than simple sampling at a constant frequency. The system has been fully tested on an IGLOO nano FPGA, in connection with a Cochlea AMSC1C sensor via AER and an STM32 microcontroller via I2S, and is fully functional. The power consumption for time-to-information extraction scales from 4.5 mW at a 550 kevt/s rate down to slightly more than 50 µW at rates lower than 10 evt/s (a 90× factor), while a naïve constant clock methodology is stuck at the same 4.5 mW power regardless of the event rate. At the same time, with our technique the accuracy reduction can be kept bounded below 3%, and on average it is even smaller. We believe this architecture can enable new applications of event-based sensors in all kinds of low-power devices, both those explicitly targeting the brain-like nature of these sensors and others which simply exploit their semantically rich output data.
7 ACKNOWLEDGEMENTS
We thank Shih-Chii Liu and Tobi Delbruck from the Institute of Neuroinformatics (INI), University of Zurich, for kindly lending us a Cochlea AMSC1C prototype. This work was supported by EU project ExaNoDe (H2020-671578).
REFERENCES
[1] H. Abdi and L. J. Williams, "Principal component analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.
[2] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: Analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881–892, 2002.
[3] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1–27:27, May 2011.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.
[5] C. Posch, T. Serrano-Gotarredona, B. Linares-Barranco, and T. Delbruck, "Retinomorphic Event-Based Vision Sensors: Bioinspired Cameras With Spiking Output," Proceedings of the IEEE, vol. 102, no. 10, pp. 1470–1484, Oct. 2014.
[6] M. Yang, C. H. Chien, T. Delbruck, and S. C. Liu, "A 0.5V 55 µW 64×2-channel binaural silicon cochlea for event-driven stereo-audio sensing," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), Jan. 2016, pp. 388–389.
[7] M. Gottardi, N. Massari, and S. A. Jawed, "A 100 µW 128×64 pixels contrast-based asynchronous binary vision sensor for sensor networks applications," IEEE Journal of Solid-State Circuits, vol. 44, no. 5, pp. 1582–1592, May 2009.
[8] D. A. Butts, C. Weng, J. Jin, C.-I. Yeh, N. A. Lesica, J.-M. Alonso, and G. B. Stanley, "Temporal precision in the neural code and the timescales of natural vision," 2007.
[9] S. Moradi and G. Indiveri, "An event-based neural network architecture with an asynchronous programmable synaptic memory," IEEE Transactions on Biomedical Circuits and Systems, vol. 8, no. 1, pp. 98–107, Feb. 2014.
[10] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, vol. 345, no. 6197, pp. 668–673, Aug. 2014.
[11] S. C. Liu, A. van Schaik, B. A. Minch, and T. Delbruck, "Asynchronous Binaural Spatial Audition Sensor With 2×64×4 Channel Output," IEEE Transactions on Biomedical Circuits and Systems, vol. 8, no. 4, pp. 453–464, Aug. 2014.
[12] [Online]. Available: http://inilabs.com/products/dynamic-vision-sensors/specifications/
[13] B. Wen and K. Boahen, "A silicon cochlea with active coupling," IEEE Transactions on Biomedical Circuits and Systems, vol. 3, no. 6, pp. 444–455, Dec. 2009.
[14] S. C. Liu, A. van Schaik, B. A. Minch, and T. Delbruck, "Event-based 64-channel binaural silicon cochlea with Q enhancement mechanisms," in Proc. IEEE Int. Symp. Circuits and Systems, May 2010, pp. 2027–2030.
[15] M. Yang, C.-H. Chien, T. Delbruck, and S.-C. Liu, "A 0.5V 55µW 64×2 Channel Binaural Silicon Cochlea for Event-Driven Stereo-Audio Sensing," in 2016 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2016, pp. 388–389.
[16] M. Rusci, D. Rossi, M. Lecca, M. Gottardi, E. Farella, and L. Benini, "An Event-Driven Ultra-Low-Power Smart Visual Sensor," IEEE Sensors Journal, vol. 16, no. 13, pp. 5344–5353, Jul. 2016.
[17] F. Gomez-Rodriguez, A. Linares-Barranco, L. Miro, S. C. Liu, A. van Schaik, R. Etienne-Cummings, and M. A. Lewis, "AER Auditory Filtering and CPG for Robot Control," in Proceedings of the IEEE International Symposium on Circuits and Systems, May 2007, pp. 1201–1204.
[18] A. Jimenez-Fernandez, J. L. F. del Bosh, R. Paz-Vicente, A. Linares-Barranco, and G. Jimenez, "Neuro-inspired system for real-time vision sensor tilt correction," in Proc. IEEE Int. Symp. Circuits and Systems, May 2010, pp. 1394–1397.
[19] J. Conradt, R. Berner, M. Cook, and T. Delbruck, "An embedded AER dynamic vision sensor for low-latency pole balancing," in Proceedings of the IEEE 12th International Computer Vision Conference Workshops (ICCV Workshops), Sep. 2009, pp. 780–785.
[20] C. Conde, E. Orbe, I. M. d. Diego, and E. Cabello, "Bio-inspired Event Based Motion Detection for Traffic Safety in a Close-Real Automotive Environment," in Proc. IEEE Electronics, Robotics and Automotive Mechanics Conf. (CERMA), Nov. 2011, pp. 120–125.
[21] A. Linares-Barranco, R. Paz, A. Jimenez-Fernandez, C. D. Lujan, M. Rivas, J. L. Sevillano, G. Jimenez, and A. Civit, "Neuro-inspired real-time USB & PCI to AER interfaces for vision processing," in Proc. Int. Symp. Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2008), Jun. 2008, pp. 330–337.
[22] R. Berner, T. Delbruck, A. Civit-Balcells, and A. Linares-Barranco, "A 5 Meps USB2.0 Address-Event Monitor-Sequencer Interface," 2006.
[23] R. Paz-Vicente, A. Linares-Barranco, D. Cascado, M. A. Rodriguez, G. Jimenez, A. Civit, and J. L. Sevillano, "PCI-AER interface for neuro-inspired spiking systems," in 2006 IEEE International Symposium on Circuits and Systems, May 2006, 4 pp.
[24] S. O. Cisneros, J. J. R. Panduro, D. T. A. Bretn, and J. R. R. Barn, "Space-time AER protocol receiver asynchronously controlled on FPGA," Computing Science and Automatic Control (CCE), 2014.
[25] M. Hofstätter, P. Schön, and C. Posch, "A SPARC-compatible general purpose address-event processor with 20-bit 10 ns-resolution asynchronous sensor data interface in 0.18 µm CMOS," IEEE International Symposium on Circuits and Systems, 2010.
[26] C. Brandli, R. Berner, M. Yang, S. C. Liu, and T. Delbruck, "A 240×180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor," IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2333–2341, Oct. 2014.
[27] M. Rusci, D. Rossi, M. Lecca, M. Gottardi, L. Benini, and E. Farella, "Energy-efficient design of an always-on smart visual trigger," in Proc. IEEE Int. Smart Cities Conf. (ISC2), Sep. 2016, pp. 1–6.
[28] M. Krstic, E. Grass, F. K. Gurkaynak, and P. Vivet, "Globally Asynchronous, Locally Synchronous Circuits: Overview and Outlook," IEEE Design & Test of Computers, vol. 24, no. 5, pp. 430–441, Sep. 2007.
[29] K. Y. Yun and R. P. Donohue, "Pausible clocking: a first step toward heterogeneous systems," in Proc. IEEE Int. Conf. Computer Design: VLSI in Computers and Processors (ICCD '96), Oct. 1996, pp. 118–123.
[30] The Address-Event Representation Communication Protocol.
[31] "STMicroelectronics STM32L476xx Datasheet."
[32] CAVIAR Hardware Interface Standards, Version 2.01.