
Responsive Reliable Products Using Linux and Flash

Who Should Read This Paper?

This paper is intended to provide a high-level perspective on the challenges and opportunities for using flash memory in Linux-based designs.

The focus of this paper is mainly on software, and topics covered will be most useful for software developers and system engineers who interact with software.

Though most of the hardware details are left to other sources, hardware engineers may find the topics useful for a more comprehensive understanding of the software requirements of flash memory.

Introduction

There are three key trends driving new embedded designs. Individually, each one is making a positive contribution to the embedded market, but in combination these trends may be leading to a data reliability nightmare.

This paper describes the convergence of these trends (the adoption of Linux, the growth in flash memory usage, and the rising importance of data) and how together they represent a triple threat to embedded devices. It also includes seven best practices for software designers concerned about the success of their products.

Because the embedded market is so complex, no single paper can provide a comprehensive list of best practices for the software engineer. However, following the practices outlined here will minimize the most common issues in flash management and data reliability that we have encountered in our 25 years as a software provider.

Linux is now the de facto standard

The adoption of Linux is finally realizing the growth we’ve all expected for years. Embedded Linux design starts are growing at double-digit rates, a fact evidenced by near-daily announcements of Linux-based products. Well-known companies like MontaVista, ACCESS, Trolltech, TimeSys, LynuxWorks, and Wind River are offering Linux-based platforms and solutions to embedded developers. With this momentum, it is clear that the growth of Linux in embedded designs will remain strong.

The Linux developer and user community, a dedicated and often outspoken group, now has the commercial acceptance that reinforces what they’ve been saying all along: Linux is a viable solution for embedded designs. Even semiconductor companies, sometimes dismissive of software requirements, are actively participating in Linux kernel and application development by contributing code back into the community source trees.

Here are a few examples of the commitment of major semiconductor companies to Linux:

ARM : (Nasdaq: ARMHY) : Revenue - $549.1M

The ARM core is the fastest-growing design choice for new embedded designs. For several years, ARM has embraced Linux, and it has now begun releasing code back into the community. ARM has created several online forums for nurturing development using both ARM technology and Linux.

Intel Corporation : (Nasdaq: INTC) : Revenue - $37.32B

Intel has proven to be a strong contributor as well, for enterprise, desktop, and embedded applications. It has been releasing drivers and other source code, helping the kernel community and commercial distribution suppliers sustain Linux-based business on Intel Architecture and, more recently, on the ARM-licensed XScale line (now part of Marvell, which acquired it in 2006).

Texas Instruments : (Nasdaq: TXN) : Revenue - $13.74B

TI has devoted substantial resources to creating and funding ports of Linux and other open source software to its high-end platforms, including the OMAP mobile processors and the DaVinci digital media processors. TI provides drivers for the video and audio hardware on its reference boards for those and other devices.

Atmel : (Nasdaq: ATML) : Revenue - $1.62B

Atmel, an ARM licensee and tier II supplier of embedded hardware platforms, is investing in creating open source solutions for their silicon. In particular, they are pushing the envelope for memory-constrained ARM based devices.

The strengths of embedded Linux are numerous. Foremost, Linux reliability is demonstrated by its growing use among developers who must design to a “six-9’s” system availability standard. Although these requirements were first adopted in the design of enterprise servers, the 99.9999% availability philosophy is now being applied to many embedded systems with critical applications, such as telecom routers, automotive controls, and industrial automation. In addition, the availability of source code attracts many developers who fear difficulties in bringing up, debugging and maintaining platform software without it. Even with advanced debugging tools, the fastest way to resolve an issue is often to step through the source. Finally, the expansive community of developers provides an additional incentive for choosing Linux. Individuals and teams can look to the open source community at large for feedback, suggestions and informal support, often as a complement to, or even a replacement for, support from commercial vendors.

Bill Weinberg, Principal Analyst at LinuxPundit.com and General Manager of the Linux Phone Standards Forum, explains:

“Today, Linux is both a de facto standard for embedded development, and a more formal one. Its de facto market share advantage (>30%) comes from free [open] access to source code, ubiquitous enterprise deployment, use of Linux as a development host, and as a prototyping system for semiconductor suppliers to bring up their wares.

“Linux offers developers more than ‘just another embedded OS.’  Rather, the open source operating system presents a real standards-based platform, fostering interoperability and re-use of both embedded software components and thousands of COTS [commercial off-the-shelf] software packages from the desktop and enterprise.”

Research substantiates this trend of increasing Linux adoption. Analysts find Linux a strong competitor across the overall marketplace. Here are some findings from embedded analysts:

“Linux is putting increasing pressure on the traditional embedded operating system vendors,” — VDC Corporation

“40% of the embedded developers surveyed were using Linux,” — Evan Data Corporation

“Linux will power over 31% of all smartphones sold in 2011, with a CAGR of 75% through 2012,” — ABI Research

Embedded Linux has become the de facto standard platform for embedded applications. This is a long way from its humble beginnings in Linus Torvalds’ 1991 posting to the comp.os.minix newsgroup. There is little doubt that the momentum enjoyed by Linux will continue.

Flash memory fuels embedded designs

There is an undeniable trend in today’s semiconductor market – a strong and growing appetite for flash memory to house the ever-increasing amount of data being stored on embedded devices.

A full hardware description of flash memory is beyond the scope of this paper, but there are several aspects of flash memory that make it a good fit for common design requirements of embedded devices.

1.   Retention of data when power is off

2.   Access times several orders of magnitude faster than rotating storage such as hard-disk drives

3.   Shrinking form factors combined with increasing cell densities

4.   Lower power requirements than traditional hard-disk drives (HDDs)

5.   No moving parts, enabling operation under rugged conditions

A few years ago, a typical design included only a few kilobytes of flash memory for the storage of code. Designs used NOR flash rather than ROM so that code could be updated while still being retained when power was shut off. Over time, NAND technology was introduced, and its lower cost per megabyte made it a common choice as demands for storing data increased.

Web-Feet Research, a market research firm covering the memory market, predicts increased demand for both NAND and NOR memory across several industries, with CAGRs of over 50% through 2011 for most of them. Other research reports 150% growth in demand from the embedded market over the past several years. In response to this clamor for flash, the variety of flash parts and the number of flash vendors serving the embedded market have risen. As demand increases, vendors, incumbents and newcomers alike, are introducing new technologies through partnerships with competitors and multi-billion-dollar capital expenditures: Intel entered into joint ventures with both Micron (IM Flash) and STMicroelectronics (Numonyx), and both Samsung and Spansion invested heavily in production facilities during 2007.

Flash memory usage in devices has been quite a success story — fueled in part by Apple’s release of the iPod and then the iPhone. In addition to Apple’s runaway success, the embedded market as a whole is taking advantage of the relatively low price per megabyte that flash memory brings to designs. Cell phones, digital cameras, point-of-sale scanners, personal media players, PDAs, and MP3 players are all increasing the demand for flash memory.

Importance of data trumps value of device

Storing data has become part of our lifestyle. Today, our digital lives – both personal and business – revolve around the data we download. The typical teenager stores nearly 1,000 music and video tracks on their iPod and sends nearly 1,000 text messages a week. Employees do their work electronically and share digital data. As an engineering professional, you probably have your schedules, contacts, and other information on your gadgets.

The amount of data that we store is overwhelming. IDC, a leading market research firm, states that the amount of digital content created and copied in 2006 was 161 exabytes, and predicts that by 2010 it will grow to 988 exabytes. On a personal scale, that growth translates into an information library covering the past 10 to 15 years of your life: a music collection of over 500 digitized CDs, personal photos from your first child’s first day at kindergarten to first day at college, financial records documenting your investments over the past 10 years, and the business data and contact information for everyone you have ever known. That is a lot of information to be at risk!

While the amount of data continues to rise, so does its importance. Our digital data has been shifting from a leisure concern to a necessity of daily life. For the average user, the data stored on a personal device (PDA, cell phone, MP3 player) has grown to be more valuable than the device itself. It is now widely recognized that it is the data, not the device, that matters.

When a device suddenly crashes, it is quite possible to lose all of its data, and when this happens, it’s not the device we’re worried about but how to recover our files. Unfortunately, most device failures are software-related. In fact, past studies from Intel, Seagate and Hewlett-Packard have shown that 80-90% of all product returns are the result of data corruption problems caused by software, not by hardware.

In the world of an embedded device, the impact of data failure is multiplied, since embedded devices are found along the entire chain of data flow. A point of sale device, router, switch, cell phone, and back-up system all contain embedded software. At each step along its journey, embedded data is at risk.

Below are just a few scenarios that illustrate the consequences of embedded data loss.

POS system – After receiving new user data, the machine suddenly shuts down and loses a week’s worth of sales and credit card information.

Consumer MP3 player – A dropped device renders gigabytes of music, purchased at as much as $1 per song, corrupt and unrecoverable.

Portable video devices – An unexpected shutdown causes an entire video collection to become inaccessible.

Telecom switches and routers – A key router loses all DNS table entries, switching information, and DHCP settings.

Automotive devices – The control data and engine status data are lost when your mechanic accidentally causes a short.

From the perspective of an embedded developer, intrinsic data reliability should be a requirement for storage software. Even though the value of data has outpaced the value of the device, many developers still cling to the belief that data corruption can be mitigated by building safeguards on top of legacy software, rather than insisting that reliability be architected into the system. Unlike desktop or server environments, embedded devices cannot depend on a cumbersome restore/recovery process to protect against the consequences of corrupted data.

The Perfect Storm for Data Corruption Consequences

We have identified three distinct embedded trends: widespread adoption of Linux, exponential growth in the use of flash memory, and the growing value of data on the device. These trends are already intersecting, and as they continue, both the likelihood and the consequences of data corruption on the device will increase.

The widespread adoption of Linux has had the inevitable side effect of kernel and code fragmentation. In response to the opportunities created by the advance of Linux in the embedded marketplace, many vendors have appeared, providing platforms and tools, and many have created mechanisms to keep current with changes to the mainline kernel tree. For storage on flash systems, however, updates to the MTD (Memory Technology Device) subsystem, which evolves on its own schedule separately from the rest of the kernel, can be a major burden. The latest MTD changes must be tested and maintained against the changes to the kernel tree. While this may not be a huge issue by itself, the number of combinations of kernel versions and MTD changes can quickly become unmanageable.

The growth of flash memory is bringing higher densities and lower cost per megabit. These benefits are usually realized through the introduction of new parts, which in turn bring new demands for software support and testing. When a new part is introduced, the FTL (Flash Translation Layer), the software layer responsible for core read and write access to the chip, must be modified for (or at least tested against) that part. Higher densities can also require changes to the programming algorithm, performance considerations, and device optimization requirements.

Preventing Common Flash Problems

Discussion of flash memory often focuses on the advantages and largely ignores the ‘special’ characteristics that make it challenging to use. Key differences in writing and erasing data, management of bad blocks, and management of write/erase cycles are some of the more obvious ones.

In contrast to a hard-disk drive, flash memory performs writes and erases at two different levels of granularity. Data is programmed in small units (a page on NAND, as little as a byte or word on NOR), and programming can only change bits from 1 to 0. Returning bits to 1 requires erasing a much larger block (sector), and the entire block must be erased in a single operation. Block sizes vary by part: for example, small-page NAND parts up to 128MB commonly use 16KB blocks, while larger-capacity chips use blocks of 128KB, 256KB, or more. Further complicating the algorithms, unexpected bit changes can occur during these operations. To compensate, ECC (error-correcting code) algorithms must be implemented in either hardware or software to detect and correct random bit errors.

NAND chips ship from the factory with bad blocks, and additional bad blocks can develop over time. If bad blocks are not identified and managed, data corruption will occur when the system attempts to read from or write to an affected block. Blocks also have an inherent limit on the number of erase cycles they can endure: if you write to the same spot again and again, that block will eventually wear out and data corruption will occur. Well-written flash management software uses special algorithms to spread operations over the entire device (wear-leveling). Inefficiencies in the write/read/erase cycle can cause even the best flash memory to fail prematurely; wear-leveling algorithms extend the life of the device.

To further complicate matters, the flash industry is struggling to establish standards in how the flash chips interface to the software. Today, when you develop a software solution for one part, chances are slim that it will work for any other.

The application should be shielded from the complexity of dealing with these issues. Ideally, flash management software will handle all of these details without adding any programming burden. The operating system, and in turn the application, should have no need to know which part from which manufacturer is being used, or even that flash memory is being used at all. The flash management software should manage the device seamlessly, so that application developers can use standard I/O calls to read from and write to flash memory. By using quality flash management software with the embedded Linux kernel, developers can take advantage of the positive aspects of flash memory while minimizing the learning curve and using the memory more effectively.

Flash-chip-independent flash management software, like FlashFX™ Pro from Datalight, is one solution that handles the peculiarities of flash.

In choosing your flash management software, look for these key features:

Bad Block Management: Detects, corrects, and manages bad areas found in NAND flash technology. The format capability automatically detects bad blocks, manages existing bad blocks, and reserves a replacement block region on the flash array. This feature of FlashFX Pro is further documented in US Patent No. 6,260,156.

Special algorithms to handle the write capabilities of the flash array: Each flash part has an erase zone size, which may vary from a few KB to 512KB or more, and the software can be configured for the zone size of the part in use. The wear-leveling algorithms ensure that erase zones are used evenly across the entire flash array.

Support for the embedded Linux kernel: The advanced features managed by FlashFX Pro, such as erase operations, fault tolerance, wear leveling and bad block management, are completely transparent to applications using FlashFX Pro for Linux.

Data reliability – Lots of legacy to overcome, but an achievable goal

By far, the biggest threat to data reliability lies in using legacy file systems. Legacy file storage solutions were designed for file I/O access, not data reliability. Instead of replacing them, many device designers make the mistake of adding layer after layer of code in an attempt to mitigate corruption risk, often at a significant cost in performance.

Unfortunately, retrofitting solutions in an embedded system can lead to unexpected problems later. New advances in batteries, new protocols, and application upgrades all introduce unexpected behavior in a software product. The underlying storage architecture should be inherently reliable, no matter what new software is introduced.

As an example, consider JFFS2. Built upon the older JFFS, it has become the default file system for Linux platforms that use raw flash memory for storage. What many designers are unaware of is that, due to its fundamental design, the performance of JFFS2 degrades significantly as the embedded system approaches its storage capacity. As the storage fills, JFFS2 spends more and more time in garbage collection, consuming system resources and slowing down application performance.

File System Check (fsck): Cure or Kludge?

The fsck utility limits the impact of corruption by running an additional software check at startup to catch errors and inconsistencies before they corrupt the entire file system.

This after-the-fact recovery is necessary because data reliability was not designed in. In legacy desktop and server environments, inconsistent states were common, especially when the file system or operating system was unexpectedly interrupted. By adding a check before start-up, errors could be detected and fixed before they created more problems. Good idea, right?

There are two issues that make this approach problematic in an embedded system. First, an embedded device often cannot simply be rebooted and repaired, so data loss must be prevented proactively. Second, the start-up time of the device must be minimized, and a consistency check that scans the file system on every boot works against that goal.

The solution to error-free system start-ups is a software design that was built to be reliable, not dependent upon additional tools that attempt to remedy an inherent problem. A file system should have an architecture, operation, and internal structure that are all geared toward providing assurance with regard to data. Reliance™, the high-integrity file system from Datalight, is a leading example of an embedded file system that provides this assurance.

Reliance - architected for data reliability in embedded systems

Datalight Reliance was designed to guarantee data reliability on embedded systems. Because reliability that comes at the cost of performance is impractical, Reliance was designed to be optimized for diverse use cases without compromising fundamental system data reliability. This section discusses three of the core concepts used by Reliance: atomic operations and transaction points, committed versus working states, and Dynamic Transaction Point™ technology. Together, these concepts make Reliance distinct from other file systems.

A Reliance transaction point is the moment at which changes to the file system are committed to the storage media. The developer can configure the file system, according to the requirements of the application, to trigger transaction points on a system event (such as the completion of an operation), at a defined time interval, or on a combination of both. When a transaction point is executed, all file data and metadata changes made since the last transaction point are committed to the disk atomically. Because the commit is all-or-nothing, an interruption leaves the disk holding either the complete new state or the complete previous state, never a mixture of the two. Reliance achieves this without overwriting live data, so the logical structure of the data always maintains its integrity; this is the primary benefit of Reliance over a legacy file system like FAT.

The Reliance file system has two distinct states: the committed state and the working state. The committed state reflects the state of the file system on disk at the last completed transaction point. The working state is the committed state plus all modifications made since that transaction point. Some of the data in the working state is found in memory, while other data, such as saved files, already exists on the disk. When a transaction point completes, the working state becomes the new committed state. At initialization time, Reliance needs to read only three blocks of data to mount the file system, resulting in fast, consistent mount times regardless of volume size or utilization.

Dynamic Transaction Point technology lets the developer precisely control when transaction points are triggered throughout the operation of the device, so that performance can be tuned to the workload. Transaction point triggers can be set at compile time and can also be changed programmatically while the device is running. For example, a software update often involves many dependent files that need to “match”. With a legacy file system, if power were interrupted after some but not all of the files had been updated, the device might not boot when power is restored. With Reliance, the system can temporarily suspend transaction points during the upgrade and trigger a single one once all file replacements are complete, so the device continues to function.

All of these capabilities, taken together and combined with FlashFX Pro, provide a flash storage solution that guarantees reliability, has “wicked fast” mount times, and can be optimized to outperform the I/O of legacy file systems, particularly for multi-function devices such as smartphones.

Summary

Embedded Design Trends

#1: Embedded Linux is now a de facto standard in the embedded market, as demonstrated by a dramatically increasing number of design starts, a strong and growing developer community, and active participation by leading hardware and commercial software vendors.

#2: Flash memory is the largest driver of storage in embedded systems, outpacing the growth of all other non-volatile storage options. Improving price per megabyte inspires new categories of devices and dramatic increases in data storage capacity.

#3: Data is more valuable than the device. Information critical to business operations and thousands of dollars of purchased content are stored on devices costing less than $500.

These three trends are converging to create the perfect storm of data vulnerability.

Best Practices

#1: EVALUATE YOUR EMBEDDED LINUX PARTNERS ON A BROAD LIST OF CRITERIA

Your choice of a solution partner will impact the success of your current project, as well as the entire product line and product development roadmap. Your vendor should have considerable experience in the embedded market, a list of deployed applications, an understanding of your application and hardware, and a knowledgeable, accessible support team.

#2: CONSIDER THE ROADMAP AND LIFECYCLE OF YOUR FLASH PARTS CAREFULLY

High demand for flash can cause long-term availability and pricing issues. The rapid pace of change in lithography and power requirements can result in unusually short manufacturing lifecycles. Make sure your design (including tools, OS, and processor) can accommodate alternate parts, from both hardware and software perspectives.

#3: ANTICIPATE USERS' DATA USAGE HABITS AND INCORPORATE THEM INTO YOUR SPECIFICATIONS

Understanding the media formats, file sizes, and number of read/write operations done by the user will let you make better tradeoff decisions of cost vs performance.

#4: CALCULATE THE CONSEQUENCES OF DATA CORRUPTION

In addition to understanding users’ data usage habits, consider the consequences to your company’s reputation should data corruption occur. Calculate how much you can afford to invest to make sure that it doesn’t happen because of your product. Use this information when deciding on flash file system software.

#5: LEAVE THE LOW LEVEL COMPLEXITY TO EXPERTS

Ensure that your flash management software supports your flash part, allows customization according to your application and other hardware, and has been ported to and tested with your operating system software.

#6: PREVENT DATA CORRUPTION IN THE FIRST PLACE

Don’t rely on corrective measures that attempt to mitigate shortcomings in the parts of your software stack responsible for user and system data. Doing so may incur unacceptable performance penalties and leave you vulnerable to “gaps in coverage” as technology advances.

#7: CONSIDER DATALIGHT RELIANCE FILE SYSTEM FOR YOUR NEXT DESIGN PROJECT

Architected for reliability, Reliance Dynamic Transaction Point technology gives you complete control over performance optimization for your specific use cases.

About Datalight

Datalight is the market leader in software technologies that manage data reliably in embedded devices. For more than 25 years, our focus on portable, flexible solutions has enabled customers to save money, reduce development time and get to market faster. Our customers have discovered that Datalight solutions result in unparalleled interoperability and increased customer satisfaction. These accomplishments have earned Datalight a reputation as a provider of reliable and cost-effective software solutions that are backed by a commitment to customer service and satisfaction.

For more information, visit www.Datalight.com or call 425.951.8086.

 
