Embedded Data Management
Designing embedded data management techniques requires tackling a
combination of constraints such as scarce RAM, electronic stable storage,
and energy consumption. Studying these constraints with co-design
considerations in mind is highly desirable to make future hardware
platforms better adapted to data-driven applications. The most
significant contributions achieved in this direction are the following:
Secure chip DBMSs. Secure chips exhibit severe hardware constraints that
make traditional database techniques irrelevant. Building on our
experience in designing and developing database techniques for secure
chips, our first achievement in this field was the definition of a new
benchmark, called DiSC, dedicated to secure chip DBMSs. The DiSC
benchmark was
conceived
to: (1) compare the relative performance of candidate storage and
indexing data structures, (2) predict the limits of on-chip
applications, and (3) provide co-design hints to help calibrate the
resources of a future secure chip to meet the requirements of on-chip
data-intensive applications. In addition, we studied to what extent the
hardware evolution of secure chips should impact the design of embedded
data management techniques and drew research perspectives linked to a
broader usage of secure chip data management techniques.
Embedded data management in Flash memory. Thanks to its
excellent properties in terms of read performance, energy consumption
and shock resistance, NAND Flash has become the most popular persistent
data
storage medium for mobile and embedded devices. Designing embedded data
management techniques for Flash memory is very challenging because of
the combination of NAND Flash constraints (e.g., the
block-erase-before-page-rewrite constraint and the limited number of
erase cycles) and embedded system constraints (e.g., tiny RAM and the
need for predictable resource consumption). In this action, we focused on
(1) the definition of indexing models dedicated to NAND Flash/embedded
system constraints and (2) the design of a complete
DBMS engine coping with these same constraints. State-of-the-art work
has proposed adaptations of traditional indexing methods (e.g.,
based on B-Tree or hashing) to cope with Flash constraints by deferring
index updates using a log and batching them to decrease the number of
rewrite operations in Flash. However, these methods were not designed
with embedded system constraints in mind and do not address them
properly (notably, their RAM consumption far exceeds the capacity of the
targeted embedded platforms). We proposed a
new alternative for indexing Flash-resident data that specifically
addresses the embedded context. This approach, called PBFilter,
organizes the index
structure in a purely sequential way. Key lookups are sped up thanks to
two
principles called Summarization and Partitioning. We instantiated these
principles
with data structures and algorithms based on Bloom Filters and showed
the effectiveness of this approach. PBFilter was patented by INRIA and
Gemalto in 2007. More recently, we
addressed the problem of designing a complete DBMS engine dedicated to
embedded
platforms with NAND Flash. We proposed a new paradigm called
database Serialization & Stratification, which consists of organizing
the complete database (data, indexes, logs and buffers) sequentially and
reorganizing it in a sequential way as well. Hence random writes and
their negative side effects on Flash are simply precluded. While the
principle is simple to state, it introduces new challenges for
efficiently supporting database updates/deletes, transaction atomicity,
buffer management, and primary and secondary indexes. The Serialization
& Stratification paradigm allows managing a complete database without
generating any random writes, a property that may have wide
applicability in every context where random writes are detrimental in
terms of I/O cost, energy consumption, space occupancy or lifetime.
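To make the idea of a purely sequential index with Bloom-filter summaries more concrete, here is a minimal Python sketch. It is not the PBFilter implementation: the class names, the page size, the filter parameters and the in-memory lists standing in for Flash pages are all assumptions made for illustration, and the Partitioning principle is not modeled. It only shows how keys can be appended page by page, with one small summary appended per full page, so that a lookup reads only the pages whose summary may contain the key.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: k hash positions in an m-bit array (illustrative sizes)."""

    def __init__(self, m_bits=256, k=3):
        self.m = m_bits
        self.k = k
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SequentialIndex:
    """Hypothetical append-only key log with one summary per full page."""

    def __init__(self, keys_per_page=4):
        self.keys_per_page = keys_per_page
        self.pages = []      # full pages of (key, record_id), written once
        self.summaries = []  # one Bloom filter per full page, written once
        self.current = []    # page being filled (stands in for a RAM buffer)

    def insert(self, key, record_id):
        self.current.append((key, record_id))
        if len(self.current) == self.keys_per_page:
            self._flush()

    def _flush(self):
        summary = BloomFilter()
        for key, _ in self.current:
            summary.add(key)
        # Both structures only ever grow at the end: no in-place rewrite.
        self.pages.append(self.current)
        self.summaries.append(summary)
        self.current = []

    def lookup(self, key):
        """Return (matching record ids, number of key pages actually read)."""
        hits = [rid for k, rid in self.current if k == key]
        pages_read = 0
        for summary, page in zip(self.summaries, self.pages):
            if summary.might_contain(key):
                pages_read += 1
                hits.extend(rid for k, rid in page if k == key)
        return hits, pages_read
```

In this sketch a lookup scans the small summaries sequentially and touches a key page only when its summary answers "maybe", which is the intuition behind summarization; false positives cost an extra page read but never produce a wrong answer.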
Benchmarking flash devices. Thanks
to its excellent properties in terms of read performance, energy
consumption and shock resistance,
NAND Flash has become a credible competitor to traditional disks, even
on high-end servers. The new question is how database systems should
adapt to this new form of secondary storage. Before we can answer this
question, we need to fully
understand the performance characteristics of flash devices.
Unfortunately, while
flash chips are very precisely specified, flash devices, e.g., Solid
State Disks
(SSDs), do not behave like flash chips. They are complex devices
including controller hardware and proprietary software (the so-called
Flash Translation
Layer or FTL). FTLs are both complex and undocumented. As a result,
flash devices
are black boxes from a system's point of view. In order to understand
their performance characteristics, we have designed a benchmark, called
uFLIP, that casts light on all relevant usage patterns of current, as
well as
future, flash devices. uFLIP includes a benchmarking methodology which
takes into
account the particular characteristics of flash devices. In 2010, we
also devised a mechanism for measuring the energy consumption of flash
devices, as an extension to capture energy consumption. While
energy consumption cannot be traced to individual IOs, we can associate
energy consumption figures with IO patterns, which helps to further
understand the behavior of the devices. The uFLIP work was done in
cooperation with the IT University of Copenhagen and Reykjavík
University, and received the best paper award at CIDR'09 (see
www.uflip.org).
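As an illustration of what measuring IO patterns means in practice, the following Python sketch times sequential and random reads and writes against a small file. It is not the uFLIP code and does not reproduce its methodology: the function names, the pattern names and all sizes are hypothetical, and because the IOs go through the operating system's file cache the numbers say little about the underlying device. A real benchmark would have to bypass caches and control the device state before each run.

```python
import os
import random
import tempfile
import time


def make_offsets(pattern, io_count, io_size, target_size, seed=0):
    """Return the byte offsets of io_count IOs for a given access pattern."""
    slots = target_size // io_size
    if pattern == "sequential":
        return [(i % slots) * io_size for i in range(io_count)]
    if pattern == "random":
        rng = random.Random(seed)  # seeded so that runs are repeatable
        return [rng.randrange(slots) * io_size for _ in range(io_count)]
    raise ValueError(f"unknown pattern: {pattern}")


def run_pattern(path, mode, offsets, io_size):
    """Issue one IO per offset and return the per-IO response times in seconds."""
    payload = os.urandom(io_size)
    # O_SYNC and O_BINARY do not exist on every platform, hence getattr.
    flags = os.O_RDWR | getattr(os, "O_SYNC", 0) | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    times = []
    try:
        for offset in offsets:
            start = time.perf_counter()
            os.lseek(fd, offset, os.SEEK_SET)
            if mode == "write":
                os.write(fd, payload)
            else:
                os.read(fd, io_size)
            times.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return times


def benchmark(target_size=64 * 1024, io_size=4096, io_count=32):
    """Return the mean response time of each (pattern, mode) combination."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "target.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * target_size)
        for mode in ("read", "write"):
            for pattern in ("sequential", "random"):
                offsets = make_offsets(pattern, io_count, io_size, target_size)
                times = run_pattern(path, mode, offsets, io_size)
                results[(pattern, mode)] = sum(times) / len(times)
    return results
```

Calling `benchmark()` returns a dictionary keyed by `(pattern, mode)`. Keeping the per-IO times in `run_pattern`, rather than only a total, is a deliberate choice in this sketch: it leaves room to inspect how response time varies within a pattern instead of looking at averages alone.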
Bimodal flash devices. While disks have offered stable behavior for
decades, thus guaranteeing the timelessness of many database design
decisions, flash devices keep on mutating.
Many researchers have proposed adapting database algorithms to existing
flash devices. However, today, there is no reference DBMS design based
on solid assumptions about flash device behavior, precisely because this
behavior varies across models, across firmware updates and possibly over
time for the same model: database researchers are chasing flash memory
technology. In this
study, we took the reverse approach and defined how flash devices
should
support database management. We advocated that flash devices should
provide
guarantees to a DBMS so that it can devise stable and efficient IO
management
mechanisms. Based on the characteristics of flash chips, we defined a
bimodal
FTL that distinguishes between a minimal mode where sequential writes,
sequential
reads and random reads are optimal while updates and random writes are
forbidden, and a mode where updates and random writes are supported at
the cost
of sub-optimal IO performance.
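The two modes described above can be pictured with a toy model. The following Python sketch is only an assumption-laden illustration, not the proposed FTL: the class, the mode names `"minimal"` and `"update"`, the device-wide mode switch and the `costly_writes` counter are all hypothetical. It shows the contract from the DBMS side: in the minimal mode only a write at the current end of the written area is accepted, while reads are unrestricted.

```python
class ModeError(Exception):
    """Raised when an IO is not allowed in the device's current mode."""


class BimodalDevice:
    """Toy page-addressed device with a 'minimal' and an 'update' mode."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages
        self.high_water = 0     # next page expected by a sequential write
        self.mode = "minimal"   # hypothetical mode names
        self.costly_writes = 0  # writes served outside the sequential path

    def set_mode(self, mode):
        if mode not in ("minimal", "update"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def read(self, page_no):
        # Sequential and random reads are allowed in both modes.
        return self.pages[page_no]

    def write(self, page_no, data):
        sequential = page_no == self.high_water
        if not sequential:
            if self.mode == "minimal":
                kind = "update" if page_no < self.high_water else "random write"
                raise ModeError(
                    f"{kind} of page {page_no} is forbidden in minimal mode"
                )
            # Simplification: only non-sequential writes are counted as costly.
            self.costly_writes += 1
        self.pages[page_no] = data
        self.high_water = max(self.high_water, page_no + 1)
```

A DBMS written against such a contract knows in advance which IOs are accepted in the minimal mode, instead of discovering device behavior empirically; this is the kind of guarantee the text argues for.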
Note that the scope of uFLIP and bimodal flash devices is much broader
than the embedded data management context. It also applies to high-end
flash devices and Flash-based DBMSs on high-end servers.