Some controllers are minimalistic, with little RAM and basic functionality. These are most often found in the least expensive consumer USB sticks. Controllers used on more sophisticated SSDs frequently have multi-core CPUs and a lot of RAM or other temporary storage to provide higher performance. The first consumer flash disks (CF cards) were used in the then-emerging digital cameras. The use case was simple: take a picture, store it, take another one. As applications developed more complex read/write use cases, they demanded more performance. That in turn required more sophisticated strategies, which placed higher demands on processing and temporary storage resources.

Blocks versus pages

Each flash chip is made up of pages, which are organized into blocks. The page size for a given flash chip might be 512 bytes (on older parts), 2 KB, 4 KB or larger, and typically there are 64 pages in each block. A table is required to identify which blocks represent usable areas of the disk and which blocks are bad or spare. The flash translation layer (FTL) is the set of firmware algorithms that manage how and where data is stored in the flash memory, and what happens when power loss occurs or when a block of the flash memory goes bad. There are two broad types of FTL: block-based, which manages a block at a time, and page-based, which can manage individual pages.

Block-based FTLs have been around since the dawn of flash. They require minimal resources to manage the logical-to-physical mapping, bad block management and wear-leveling. The old CF card flash disks from 1994 were block-based and used a simple 8-bit 8051 microcontroller and minimal RAM. When flash densities were small, block-based FTLs didn’t suffer as many limitations as they do on today’s multi-gigabyte disks. Since block-based FTLs can only write in block-sized chunks, using such an FTL with today’s high-density, large-block NAND demands more computing resources than low-end controllers provide.

Read performance for a block-based FTL is typically very fast. Write performance is fine when data is written sequentially to the disk for the first time, but it is much worse when data is written randomly or when previously used blocks must be overwritten. This is because block-based FTLs are often used in low-resource flash controllers, where new data cannot be cached and must be written synchronously to the flash media. The ideal use case for block-based flash disks is a read-only application in which updates happen only when the entire dataset can be image-copied using fast sequential writes.

Write performance on block-based FTLs also suffers because to update a page the entire block must be rewritten. The steps to update a page of flash within a block are to:

  1. Identify the spare block that will be used for writing
  2. Copy the pages that come before the page(s) that will be updated from the old block to the new block (for example, if the update will be to data in page 3, copy pages 0, 1 and 2 to the new block)
  3. Write the updated data (a new page 3) to the new block
  4. Copy the pages following the page that was updated (pages 4, …) from the old block to the new block
  5. Mark the old block as discarded so that it can be erased for reuse
  6. Mark the new block as a live data block
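
The steps above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the firmware of any real controller: the class name BlockFTL, its fields, and the choice to model a block as a Python list of page payloads are all hypothetical. The sketch copies every page of the block, including pages that were never written, so that the cost of a single-page update is easy to see.

```python
class BlockFTL:
    """Toy block-based FTL (hypothetical model, not real firmware)."""

    def __init__(self, num_logical, num_spare, pages_per_block=64):
        total = num_logical + num_spare
        # Each physical block is a list of page payloads; None means erased.
        self.flash = [[None] * pages_per_block for _ in range(total)]
        # Logical block number -> physical block number.
        self.mapping = list(range(num_logical))
        # Erased physical blocks available for the next rewrite.
        self.spares = list(range(num_logical, total))
        # Counts page programs so the cost of an update is visible.
        self.page_programs = 0

    def read(self, logical_block, page):
        return self.flash[self.mapping[logical_block]][page]

    def update_page(self, logical_block, page, data):
        old = self.mapping[logical_block]
        # Step 1: identify the spare block that will be written.
        new = self.spares.pop(0)
        # Steps 2-4: copy the pages before the target, write the updated
        # page, then copy the pages after it.
        for i, payload in enumerate(self.flash[old]):
            self.flash[new][i] = data if i == page else payload
            self.page_programs += 1
        # Step 5: discard the old block; here it is erased immediately
        # and returned to the spare pool.
        self.flash[old] = [None] * len(self.flash[old])
        self.spares.append(old)
        # Step 6: the new block becomes the live data block.
        self.mapping[logical_block] = new


ftl = BlockFTL(num_logical=2, num_spare=1, pages_per_block=4)
ftl.update_page(0, 3, "new page 3")
print(ftl.read(0, 3), ftl.page_programs)
```

In this model, updating one page always costs as many page programs as there are pages in the block, which is the write penalty the text describes.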

Multiple blocks are reserved as spares to replace blocks that fail during normal use. When data is written to a specific area of the disk, one of the reserved blocks is selected and the data is copied to it. The previous block is then erased and put back in the pool until it is needed. When a block fails, it is permanently removed from the overprovisioning pool, which shrinks the pool and thereby reduces the life of the flash disk.

A block-based FTL does not support TRIM or traditional Garbage Collection. Because the block is the unit of flash being managed, a single page cannot be TRIMed out of a block; only a full block can be TRIMed. Garbage Collection is not meaningful on a block-based FTL, because the data within a block can be neither TRIMed nor reordered, and both are required for Garbage Collection to be effective.

In contrast, a page-based FTL enables much faster random write performance, but slower read performance. A page-based FTL does not write a full block. It writes only the page(s) of data that are changing, along with the FTL metadata needed to keep track of the new position of the data. Given that a page is small (2-4 KB) and a block is large (256 KB to 8 MB), writing a single page is faster. However, tracking where each page of data lives as it is moved around by wear-leveling is much more complex.

For an 8 GB SD card, there would be 2,048 4 MB blocks or 2,097,152 4 KB pages. Managing 2,000,000 units of data, as opposed to 2,000, requires more RAM, a more powerful CPU, and a much more complex FTL. A major concern for a page-based FTL is Garbage Collection, which is defined as the consolidation of data from multiple blocks that have many empty pages in order to create more free blocks.
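
The unit counts can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 GB = 1024^3 bytes) and, purely for illustration, a 4-byte mapping entry per managed unit. That entry size is an assumption and is not taken from any particular controller.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

capacity = 8 * GIB      # 8 GB card, binary units assumed
block_size = 4 * MIB
page_size = 4 * KIB
ENTRY_BYTES = 4         # hypothetical size of one mapping-table entry

num_blocks = capacity // block_size
num_pages = capacity // page_size

# Rough RAM needed just to hold a flat logical-to-physical table.
block_table_bytes = num_blocks * ENTRY_BYTES
page_table_bytes = num_pages * ENTRY_BYTES

print(num_blocks, "blocks,", block_table_bytes // KIB, "KiB table")
print(num_pages, "pages,", page_table_bytes // MIB, "MiB table")
```

Under these assumptions a flat block table fits in a few kilobytes, while a flat page table needs several megabytes, which is one reason page-based FTLs call for more capable controllers.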

TRIM is the FTL API used by a file system to tell the FTL which data (pages) the file system no longer needs. For example, if a file is deleted, the file system should TRIM its pages of flash, letting the FTL know that those pages are no longer needed. If these pages are NOT TRIMed, the FTL cannot reclaim them during Garbage Collection. It then pays a performance and endurance penalty, because of the additional overhead and flash wear that result from continuing to wear-level pages that have no value to the file system or the user.

Since flash pages cannot be overwritten (because they must be erased first), an implicit TRIM occurs whenever an LBA is rewritten. The original version of the LBA is no longer needed because it has been replaced by the new data for that LBA. No TRIM call is made, but the original data is no longer live and is effectively TRIMed.
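
A minimal sketch of how a page-based FTL might track explicit and implicit TRIMs follows. Everything in it is a simplifying assumption: the class name PageFTL, the append-only allocation of physical pages, and the "fewest valid pages" victim choice are illustrative and do not describe any specific product. Relocating the valid pages and erasing the victim block are left out.

```python
class PageFTL:
    """Toy page-mapped FTL (hypothetical model, not real firmware)."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.lba_to_phys = {}  # LBA -> physical page number
        # One validity flag per physical page.
        self.valid = [False] * (num_blocks * pages_per_block)
        self.next_free = 0     # next erased page (append-only log)

    def write(self, lba):
        old = self.lba_to_phys.get(lba)
        if old is not None:
            # Implicit TRIM: the previous copy of this LBA is stale.
            self.valid[old] = False
        if self.next_free >= len(self.valid):
            raise RuntimeError("no erased pages left; GC needed")
        self.lba_to_phys[lba] = self.next_free
        self.valid[self.next_free] = True
        self.next_free += 1

    def trim(self, lba):
        # Explicit TRIM: the file system no longer needs this LBA.
        old = self.lba_to_phys.pop(lba, None)
        if old is not None:
            self.valid[old] = False

    def valid_pages_in_block(self, block):
        start = block * self.pages_per_block
        return sum(self.valid[start:start + self.pages_per_block])

    def gc_victim(self):
        """Fully written block with the fewest valid pages, or None."""
        written = self.next_free // self.pages_per_block
        if written == 0:
            return None
        return min(range(written), key=self.valid_pages_in_block)


ftl = PageFTL(num_blocks=2, pages_per_block=4)
for lba in range(4):
    ftl.write(lba)   # fills block 0
ftl.write(0)         # rewrite: implicit TRIM of the old page
ftl.trim(1)          # explicit TRIM from the file system
print(ftl.valid_pages_in_block(0), ftl.gc_victim())
```

Both kinds of TRIM lower the number of valid pages in a block, so Garbage Collection has less data to copy before that block can be erased.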