Embedded Data Management

Designing embedded data management techniques requires tackling a combination of constraints such as scarce RAM resources, electronic stable storage, and energy consumption. Studying these constraints with co-design considerations in mind is highly desirable to make future hardware platforms better adapted to data-driven applications. The most significant contributions achieved in this direction are the following:

Secure chip DBMSs. Secure chips exhibit severe hardware constraints that make traditional database techniques irrelevant. Taking advantage of our experience in designing and developing database techniques for secure chips, a first achievement in this field was the definition of a new benchmark, called DiSC, dedicated to secure chip DBMSs. The DiSC benchmark was conceived to: (1) compare the relative performance of candidate storage and indexing data structures, (2) predict the limits of on-chip applications, and (3) provide co-design hints to help calibrate the resources of a future secure chip to meet the requirements of on-chip data-intensive applications. In addition, we studied to what extent the evolution of secure chip hardware should influence the design of embedded data management techniques, and we drew research perspectives linked to a broader usage of secure chip data management techniques.

Embedded data management in Flash memory. Thanks to its excellent properties in terms of read performance, energy consumption and shock resistance, NAND Flash has become the most popular persistent data storage medium for mobile and embedded devices. Embedded data management techniques for Flash memory are very challenging to design due to a combination of NAND Flash constraints (e.g., the block-erase-before-page-rewrite constraint and the limited number of erase cycles) and embedded system constraints (e.g., tiny RAM and predictability of resource consumption). In this action, we focused on (1) the definition of indexing models dedicated to NAND Flash and embedded system constraints and (2) the design of a complete DBMS engine coping with these same constraints. State-of-the-art work has proposed adaptations of traditional indexing methods (e.g., based on B-Trees or hashing) that cope with Flash constraints by deferring index updates in a log and batching them to decrease the number of rewrite operations in Flash. However, these methods were not designed with embedded system constraints in mind and do not address them properly (notably, their RAM consumption far exceeds the capacity of the targeted embedded platforms). We proposed a new alternative for indexing Flash-resident data that specifically addresses the embedded context. This approach, called PBFilter, organizes the index structure in a purely sequential way. Key lookups are sped up thanks to two principles called Summarization and Partitioning. We instantiated these principles with data structures and algorithms based on Bloom Filters and showed the effectiveness of this approach. PBFilter was patented by INRIA and Gemalto in 2007. More recently, we addressed the problem of designing a complete DBMS engine dedicated to embedded platforms with NAND Flash.
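To make the Summarization principle behind PBFilter more concrete, the following minimal Python sketch keeps an append-only list of key pages and one Bloom filter summary per page, so that a lookup only inspects the key pages whose summary reports a possible match. It is an illustration under stated assumptions, not the PBFilter implementation: the class names `BloomFilter` and `SequentialIndex`, the page size, the filter size and the hash construction are all hypothetical, Flash pages are modeled as Python lists, and the Partitioning principle is not modeled.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; an integer is used as the bit array."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SequentialIndex:
    """Append-only key pages, each summarized by one Bloom filter."""

    def __init__(self, keys_per_page=4):
        self.keys_per_page = keys_per_page
        self.key_pages = []   # full pages of (key, record_id), written once
        self.summaries = []   # summaries[i] summarizes key_pages[i]
        self.current = []     # page still being filled in RAM
        self.pages_read = 0   # key pages actually inspected by lookups

    def insert(self, key, record_id):
        self.current.append((key, record_id))
        if len(self.current) == self.keys_per_page:
            self._flush()

    def _flush(self):
        # Both the key page and its summary are appended, never rewritten.
        summary = BloomFilter()
        for key, _ in self.current:
            summary.add(key)
        self.key_pages.append(self.current)
        self.summaries.append(summary)
        self.current = []

    def lookup(self, key):
        results = [rid for k, rid in self.current if k == key]
        for page, summary in zip(self.key_pages, self.summaries):
            if summary.might_contain(key):
                self.pages_read += 1
                results.extend(rid for k, rid in page if k == key)
        return results


index = SequentialIndex()
for n in range(20):
    index.insert(f"k{n}", n)
print(index.lookup("k7"), "key pages read:", index.pages_read)
```

Because a Bloom filter never produces false negatives, skipping a page whose summary rejects the key is always safe; false positives only cost an extra page read.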
We proposed a new paradigm called database Serialization & Stratification, which consists of organizing the complete database (data, indexes, logs and buffers) sequentially and of reorganizing it in a sequential way as well. Hence random writes and their negative side effects on Flash are simply precluded. While the principle is simple to state, it introduces new challenges in efficiently supporting database updates and deletes, transaction atomicity, buffer management, and primary and secondary indexes. The Serialization & Stratification paradigm allows managing a complete database without generating any random writes, a property that may be widely applicable wherever random writes are detrimental in terms of I/O cost, energy consumption, space occupancy or lifetime.
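The idea of serializing all changes and periodically reorganizing them sequentially can be sketched as follows. This is a deliberately simplified Python illustration, not the actual engine: the names `SerializedStore`, `put`, `delete`, `get` and `stratify` are hypothetical, a single key-value table stands in for the whole database, Python lists stand in for sequentially written Flash areas, and transaction atomicity, buffers and secondary indexes are left out.

```python
import bisect

_TOMBSTONE = object()  # marker appended to the log to record a delete


class SerializedStore:
    """Append-only log plus a sorted image rebuilt by stratification."""

    def __init__(self):
        self.stratum_keys = []    # sorted keys of the last stratified image
        self.stratum_values = []  # values aligned with stratum_keys
        self.log = []             # serialized changes, only ever appended

    def put(self, key, value):
        # Inserts and updates are both plain appends: no in-place rewrite.
        self.log.append((key, value))

    def delete(self, key):
        self.log.append((key, _TOMBSTONE))

    def get(self, key):
        # The newest log entry wins; fall back to the stratified image.
        for k, v in reversed(self.log):
            if k == key:
                return None if v is _TOMBSTONE else v
        i = bisect.bisect_left(self.stratum_keys, key)
        if i < len(self.stratum_keys) and self.stratum_keys[i] == key:
            return self.stratum_values[i]
        return None

    def stratify(self):
        # Merge the old image and the log into a new sorted image that would
        # itself be written sequentially. A dict is used here for brevity; a
        # RAM-constrained engine would need a bounded-memory merge instead.
        merged = dict(zip(self.stratum_keys, self.stratum_values))
        for k, v in self.log:
            if v is _TOMBSTONE:
                merged.pop(k, None)
            else:
                merged[k] = v
        self.stratum_keys = sorted(merged)
        self.stratum_values = [merged[k] for k in self.stratum_keys]
        self.log = []


store = SerializedStore()
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)   # update, expressed as an append
store.delete("b")   # delete, expressed as an append
store.stratify()
print(store.stratum_keys, store.stratum_values)
```

In this sketch lookups get slower as the log grows, which is the cost that the sequential reorganization step is meant to bound.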

Benchmarking flash devices. Thanks to its excellent properties in terms of read performance, energy consumption and shock resistance, NAND Flash has become a credible competitor even for traditional disks on high-end servers. The new question is how database systems should adapt to this new form of secondary storage. Before we can answer this question, we need to fully understand the performance characteristics of flash devices. Unfortunately, while flash chips are very precisely specified, flash devices, e.g., Solid State Disks (SSDs), do not behave as flash chips. They are complex devices including controller hardware and proprietary software (the so-called Flash Translation Layer or FTL). FTLs are both complex and undocumented. As a result, flash devices are black boxes from a system's point of view. In order to understand their performance characteristics, we designed a benchmark, called uFLIP, that casts light on all relevant usage patterns of current, as well as future, flash devices. uFLIP includes a benchmarking methodology which takes into account the particular characteristics of flash devices. In 2010, we also extended uFLIP with a mechanism for measuring the energy consumption of flash devices. While energy consumption cannot be traced to individual IOs, we can associate energy consumption figures with IO patterns, which helps to further understand the behavior of the devices. This work was done in cooperation with the IT University of Copenhagen and Reykjavík University. uFLIP received the best paper award at CIDR'09 (see www.uflip.org).
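As an illustration of what measuring IO patterns means in practice, the following Python sketch times individual IOs for four basic patterns (sequential and random reads and writes) against a small temporary file. It is not the uFLIP code and does not follow its methodology: the function names, the pattern labels `SR`, `RR`, `SW`, `RW` and all parameter values are assumptions made for this example. It also goes through the file system and the operating system cache, so the timings say little about the underlying device; a real measurement would need to bypass caching and target the device under test.

```python
import os
import random
import tempfile
import time


def make_offsets(target_size, io_size, count, sequential, seed=0):
    """Return `count` aligned offsets inside a target area of `target_size` bytes."""
    slots = target_size // io_size
    if sequential:
        return [(i % slots) * io_size for i in range(count)]
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [rng.randrange(slots) * io_size for _ in range(count)]


def run_pattern(path, offsets, io_size, write):
    """Submit one IO per offset and return the individual response times."""
    buf = b"\xa5" * io_size
    timings = []
    fd = os.open(path, os.O_RDWR | getattr(os, "O_BINARY", 0))
    try:
        for off in offsets:
            start = time.perf_counter()
            os.lseek(fd, off, os.SEEK_SET)
            if write:
                os.write(fd, buf)
                os.fsync(fd)  # push the write out of the OS cache
            else:
                os.read(fd, io_size)
            timings.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return timings


def benchmark(target_size=64 * 1024, io_size=4096, count=16):
    """Run the four patterns on a scratch file; return timings per pattern."""
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        scratch.write(b"\0" * target_size)
        path = scratch.name
    patterns = [("SR", True, False), ("RR", False, False),
                ("SW", True, True), ("RW", False, True)]
    try:
        results = {}
        for name, sequential, write in patterns:
            offsets = make_offsets(target_size, io_size, count, sequential)
            results[name] = run_pattern(path, offsets, io_size, write)
        return results
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, timings in benchmark().items():
        print(name, f"mean response time: {sum(timings) / len(timings):.6f} s")
```

Keeping per-IO response times rather than a single average makes it possible to spot variations over the course of a run, which an aggregate figure would hide.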

Bimodal flash devices. While disks have offered a stable behavior for decades, thus guaranteeing the timelessness of many database design decisions, flash devices keep changing. Many researchers have proposed to adapt database algorithms to existing flash devices. However, today, there is no reference DBMS design based on solid assumptions about flash device behavior, precisely because this behavior varies across models, across firmware updates and possibly over time for the same model: database researchers are running after flash memory technology. In this study, we took the reverse approach and defined how flash devices should support database management. We advocated that flash devices should provide guarantees to a DBMS so that it can devise stable and efficient IO management mechanisms. Based on the characteristics of flash chips, we defined a bimodal FTL that distinguishes between a minimal mode, where sequential writes, sequential reads and random reads are optimal while updates and random writes are forbidden, and a mode where updates and random writes are supported at the cost of sub-optimal IO performance.
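The contract that such a bimodal interface offers to a DBMS can be sketched as follows. This Python model is only an illustration of the two modes described above, not the proposed FTL: the class `BimodalFlash`, the exception `ModeViolation`, the split of the address space into two fixed regions and the counter used to represent the extra cost of the second mode are all assumptions made for this example.

```python
class ModeViolation(Exception):
    """Raised when an IO breaks the rules of the minimal mode."""


class BimodalFlash:
    """Toy device with a minimal-mode region and an update-mode region."""

    def __init__(self, minimal_pages, update_pages):
        self.minimal = [None] * minimal_pages
        self.write_pointer = 0       # next page writable in minimal mode
        self.updatable = [None] * update_pages
        self.costly_writes = 0       # stands in for the sub-optimal IO cost

    def write_minimal(self, page_no, data):
        # Only strictly sequential writes are accepted: no update of a
        # written page and no write ahead of the write pointer.
        if self.write_pointer >= len(self.minimal):
            raise ModeViolation("minimal-mode region is full")
        if page_no != self.write_pointer:
            raise ModeViolation(
                f"expected page {self.write_pointer}, got {page_no}")
        self.minimal[page_no] = data
        self.write_pointer += 1

    def read_minimal(self, page_no):
        # Sequential and random reads are both allowed on written pages.
        if not 0 <= page_no < self.write_pointer:
            raise ModeViolation(f"page {page_no} has not been written")
        return self.minimal[page_no]

    def write_update(self, page_no, data):
        # Updates and random writes are supported here, at a price.
        self.updatable[page_no] = data
        self.costly_writes += 1

    def read_update(self, page_no):
        return self.updatable[page_no]


device = BimodalFlash(minimal_pages=4, update_pages=4)
device.write_minimal(0, b"log record 0")
device.write_minimal(1, b"log record 1")
try:
    device.write_minimal(0, b"overwrite")  # an update in minimal mode
except ModeViolation as err:
    print("rejected:", err)
device.write_update(3, b"v1")
device.write_update(3, b"v2")              # allowed, but counted as costly
```

Rejecting forbidden IOs explicitly, instead of silently serving them slowly, is what would let a DBMS rely on the performance of the minimal mode.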

Note that the scope of uFLIP and bimodal flash devices is much broader than the embedded data management context. It also applies to high-end flash devices and Flash-based DBMSs on high-end servers.