9 Jun 2020

NAND Flash Basics

Introduction

NAND flash is a type of non-volatile data storage technology, and is often used on embedded devices much the same way a hard-drive would be used on a desktop machine.

NAND flash is built on cells. The original flash devices stored one bit of information per cell, and were called single-level cell (SLC) technology. Later two bits were stored per cell, so this became (unfortunately named) multi-level cell (MLC) technology. When the technology came along to store three bits per cell, it was named triple-level cell (TLC) technology. As you can see: "single" means one, "multi" means two, and "triple" means three. According to Wikipedia there now exist (or are in development) quad-level cell (QLC) devices with 4 bits per cell and penta-level cell (PLC) devices with 5.

Unlike traditional hard-drives (i.e. built on spinning platters), the individual cells used to store bits in NAND flash can not be twiddled indefinitely. New or newly-erased flash shows up as all bits set to 1. The act of writing data to a freshly erased NAND device is simply the process of changing the necessary bits from 1s to 0s. The moment the data you want to write needs to flip a 0 back to a 1, an erase cycle is needed. If, coincidentally, every time you needed to write data to a NAND all that was required was to flip bits from 1s to 0s, you would never need to perform an erase cycle. Flipping bits from 1s to 0s comes "for free", but erasing bits from 0s to 1s comes at a cost. Each time a bit is switched from a 0 to a 1 (i.e. erased) the oxide layer of the cell is degraded. Over time it will stop being possible to erase a cell.
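
To make the 1s-to-0s rule concrete, here's a tiny illustrative snippet: without an erase in between, programming can only clear bits, so the cell ends up holding the bitwise AND of the old and new values.

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative only: model a freshly-erased byte and two writes to it.
     * Programming can only flip bits 1 -> 0, so without an erase the cell
     * effectively stores (old & new). */
    int main(void)
    {
        uint8_t cell = 0xff;        /* erased state: all bits set to 1 */

        uint8_t first = 0xf0;       /* first write: clears the low nibble */
        cell &= first;              /* cell is now 0xf0 */

        uint8_t second = 0x0f;      /* asks for bits that are already 0 to become 1 */
        cell &= second;             /* without an erase we get 0x00, not 0x0f */

        printf("cell = 0x%02x (wanted 0x%02x; an erase cycle is needed)\n",
               cell, second);
        return 0;
    }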
 
Every device comes with a claim of being able to support a given minimum number of program/erase (P/E) cycles, but the maximum number is never known. In general SLC NAND supports a higher number of P/E cycles than MLC, and MLC supports more P/E cycles than TLC. The drop-off in P/E endurance between the different technologies is quite significant, on the order of a magnitude at each step. You'll have to consult the datasheet for the actual performance of any specific device, but in general SLC devices are good for ~50,000 to 100,000 P/E cycles, MLC ~1,000 to 10,000, and TLC under ~1,000 P/E cycles.

Internally, cells are combined to form bytes, bytes are combined to form pages, pages are gathered into blocks, and blocks are combined to form planes.

From outside a NAND device, data can only be accessed one page at a time. If you want to read one byte of data, the entire page on which that byte is stored must be retrieved, then the specific byte of interest can be accessed. The same is true for writing; you can only write data to a NAND device one page at a time. More recent devices will often break a single page up into fixed, equal-sized sub-pages, which can help when fetching or writing data. Erasing, however, needs to be performed a whole block at a time. Fancier devices allow pages in separate planes to be read or written simultaneously. In general, pages are for reading/writing, blocks are for erasing, and operations can occur simultaneously on separate planes. You'll need to consult the datasheet of any specific device for its exact behaviour.

The sizes of pages, blocks, and planes vary by device. A typical device will have page sizes in the 2048-bytes/page range, will split a page into 4 sub-pages, will combine around 64 pages to form a block, and will use roughly 1024 blocks for each plane.
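
As a quick worked example using that hypothetical geometry (the plane count here is also an assumption), the sizes multiply out as follows:

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical geometry matching the "typical device" described above. */
    #define PAGE_SIZE        2048u   /* bytes per page (excluding OOB) */
    #define PAGES_PER_BLOCK    64u
    #define BLOCKS_PER_PLANE 1024u
    #define PLANES_PER_DEVICE   2u   /* assumed; varies by device      */

    int main(void)
    {
        uint64_t block_size  = (uint64_t)PAGE_SIZE * PAGES_PER_BLOCK;
        uint64_t plane_size  = block_size * BLOCKS_PER_PLANE;
        uint64_t device_size = plane_size * PLANES_PER_DEVICE;

        printf("block:  %llu KiB\n", (unsigned long long)(block_size  >> 10));
        printf("plane:  %llu MiB\n", (unsigned long long)(plane_size  >> 20));
        printf("device: %llu MiB\n", (unsigned long long)(device_size >> 20));
        return 0;
    }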

Although storing more bits per cell improves storage density, there are trade-offs, such as slower access times and reduced life expectancy of the device. So while TLC is newer than MLC and has some advantages over it, TLC doesn't displace MLC; MLC is still available. The same holds true for SLC; even though it is "older" technology, it is faster and offers an order of magnitude or better P/E endurance than the "newer" technologies. Therefore SLC is still used quite extensively when the circumstances call for improved reliability or speed over size or cost. As you would expect, SLC flash is the most expensive, with the price dropping with each increase in bits per cell.

ECC and OOB


In order to combat the strange situation of a device that will (by design) fail over time, in practice ECC checks are often calculated for the data that is stored on the device. In fact if you read the fine print in the datasheet regarding a device's P/E endurance you'll often find that these endurance claims are based on the expectation that ECC is being used. But if you're going to calculate an ECC for a chunk of data, where will you store this value? There's not much point to generating an ECC for a chunk of data if it isn't stored with the data so it can be checked later on. As such, each NAND flash device comes with extra storage space added to the device in order to store these ECC calculations (or any other book-keeping data you'd like to track). So if you buy (for example) a 512MB-sized NAND flash device, you might be given a device that has 512MB+8MB of storage. This extra area is referred to as the out-of-band (OOB) or spare area.

Typically ECCs are calculated per page of NAND data. When you ask a device for a page of data (the smallest unit of data you can request from a NAND device), in addition to receiving that page's worth of data you will also be given the data from that page's OOB area. The same is true for writing: when you want to write data to a NAND device, for each page you give it, you also have to provide the data for that page's OOB area.
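
As a rough sketch of how data and OOB travel together, assuming a 2048-byte page with a 64-byte OOB area (real layouts are dictated by the device and the ECC scheme in use):

    #include <stdint.h>

    /* Assumed geometry: 2048-byte pages with a 64-byte OOB/spare area.
     * Real devices and ECC schemes define their own OOB layouts. */
    #define PAGE_DATA_SIZE 2048u
    #define PAGE_OOB_SIZE    64u

    /* A page is always transferred as data + OOB together. */
    struct nand_page {
        uint8_t data[PAGE_DATA_SIZE];
        uint8_t oob[PAGE_OOB_SIZE];   /* ECC bytes, bad-block marker, etc. */
    };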

By the way, the answer is "yes it is" (if you're wondering whether or not the OOB area is susceptible to the same degradation as the rest of the data on a NAND device). One could add OOB areas to OOB areas ad infinitum, but one level is considered sufficient. A failure in either the main data area or the OOB area is a failure for that page+OOB combination.

Raw vs FTL

Getting the most out of your NAND device requires not just ECC checks, but the ability to mark blocks as bad, support for caching, tolerance for sudden power loss, wear-levelling, and other techniques that help it perform optimally and correctly. These algorithms are so necessary that many devices that use NAND flash internally will put a microcontroller and code between the user and the memory. These are called FTL or managed devices.

FTL stands for flash translation layer, and it forms a level of indirection between what the user requests and what actually happens to the memory. For example, the user might continuously read/modify/update one specific page over and over. Knowing that hammering on only one page would cause it to wear out much faster than the rest of the device, the FTL internally maps each of those requests for that one logical page onto a different physical page. This lets the user think they are reading and writing the same page over and over while in fact those requests are being spread over the entire device so as to not wear out any one part of the chip prematurely. This is called wear-levelling.
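
To illustrate the idea (and only the idea; real FTLs are far more sophisticated, handling garbage collection, power-loss recovery, and so on), here's a toy logical-to-physical remapping scheme that steers each rewrite to the least-worn free block:

    #include <stdint.h>

    #define NUM_BLOCKS 1024u   /* hypothetical device size in blocks */

    /* Toy FTL state: logical->physical mapping and per-block erase counts. */
    static uint32_t l2p[NUM_BLOCKS];          /* logical block -> physical block */
    static uint32_t erase_count[NUM_BLOCKS];  /* wear counter per physical block */
    static uint8_t  in_use[NUM_BLOCKS];       /* 1 if a logical block maps here  */

    /* Pick the least-worn physical block that isn't currently mapped.
     * Assumes at least one free block exists. */
    static uint32_t pick_least_worn_free(void)
    {
        uint32_t best = 0;
        uint8_t found = 0;
        for (uint32_t p = 0; p < NUM_BLOCKS; p++) {
            if (in_use[p])
                continue;
            if (!found || erase_count[p] < erase_count[best]) {
                best = p;
                found = 1;
            }
        }
        return best;
    }

    /* "Rewrite" a logical block (assumed to be mapped already): place the new
     * data on a fresh physical block and recycle (erase) the old one, so the
     * wear gets spread across the device instead of hammering one block. */
    void ftl_rewrite(uint32_t logical)
    {
        uint32_t old_phys = l2p[logical];
        uint32_t new_phys = pick_least_worn_free();

        /* ...program the new data into new_phys here... */

        l2p[logical] = new_phys;
        in_use[new_phys] = 1;

        in_use[old_phys] = 0;
        erase_count[old_phys]++;   /* the old block gets erased and recycled */
    }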

Managed devices will often present a different interface to the outside world than what one would expect from raw NAND. Therefore when working with a device that uses NAND technology, it's important to know whether you're dealing with a raw NAND chip, or one that is managed.

Most managed devices present themselves as hard-drives, so using them under Linux is simply a matter of having Linux treat the device the same as it would any other hard-drive: you partition it, format it with your favourite filesystem, mount it, then use it like normal.

If you're using a raw NAND device, then your best bet is to make use of raw NAND-handling software that is already available in order to deal with NAND's quirks. Under Linux, the MTD subsystem provides a uniform, though raw, interface to NAND. On top of the MTD subsystem you could use JFFS2, but it has been mostly superseded by UBI and UBIFS.
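
For example, under Linux each raw device shows up as an MTD character device (/dev/mtd0 below is just an example name), and you can query its geometry with the MEMGETINFO ioctl. A minimal sketch:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <mtd/mtd-user.h>

    int main(void)
    {
        /* Example device node; the actual number depends on your system. */
        int fd = open("/dev/mtd0", O_RDONLY);
        if (fd < 0) {
            perror("open /dev/mtd0");
            return 1;
        }

        struct mtd_info_user info;
        if (ioctl(fd, MEMGETINFO, &info) < 0) {
            perror("MEMGETINFO");
            close(fd);
            return 1;
        }

        printf("size:       %u bytes\n", info.size);
        printf("eraseblock: %u bytes\n", info.erasesize);
        printf("page:       %u bytes\n", info.writesize);
        printf("oob:        %u bytes\n", info.oobsize);

        close(fd);
        return 0;
    }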

Unlike JFFS2, UBIFS can't sit directly on top of the MTD subsystem; it needs to sit inside a UBI container. It is UBI that has all the fancy logic and algorithms for managing the physical memory, providing fast boot times, handling power interruptions, wear-levelling, and so forth. UBIFS is a NAND-aware filesystem designed to work on top of UBI. A single UBI container can hold one or more volumes, each of which can contain a UBIFS filesystem.

External Interface and ONFI

Interacting with NAND has been simplified and standardized thanks to the efforts of ONFI, the Open NAND Flash Interface. All NAND devices have either an 8- or 16-bit parallel I/O bus, plus a number of standardized control lines. Telling the NAND device what you want to do is simply a matter of some combination of giving it a command, providing it with an "address", then reading or writing the data. The I/O bus is multiplexed between commands, "addresses", and data. The NAND device distinguishes between these pieces of information based on the cycle and on the values of the logic levels on the control lines.

I use "address" in quotes because the address of any piece of data in a NAND device is not referenced the same way a piece of data would be referenced in, for example, RAM. As I've mentioned a couple times, you can only interface with a NAND device one page at a time. Telling the NAND device which page you want is a matter of specifying its column, its page address within a block, its block number, and specifying which plane it's on. The bits specifying this information are jumbled up together and sent to the device as either 3, 4, or 5 (depends on the device) 8-bit words (regardless whether the device has an 8- or 16-bit I/O interface) which are sent in subsequent I/O cycles during the "address" phase.

ONFI also specifies the bit patterns of the various commands that can be issued to a NAND device. In this way Linux's MTD software (for example) doesn't need to be chip-specific with regards to the command definitions.

Timing Charts

Like most pieces of silicon, NAND devices don't operate infinitely quickly; they certainly don't operate at the speed of the bus connecting your SoC to the NAND device. As such, one of the most important pieces of information contained in your device's datasheet is the table specifying minimum or maximum timings of various operations.

This table is usually found in a section called "AC Characteristics" and includes the timing information for around three dozen parameters. For example, the Address Latch Enable setup time is given as tALS. Sometimes the timing is specified as a minimum amount of time one needs to wait for an event, other times it specifies a maximum time. Each parameter has an associated unit, usually nanoseconds, but sometimes microseconds.

Some parts of a NAND's datasheet aren't as important as others from a software point of view. But when working with a NAND chip at a low level, the timing information is certainly one of the more important sections.

SoCs and NAND Controllers

It would be pretty rare to see a microcontroller connected to a NAND device directly using nothing but GPIO lines. Part of the difficulty in controlling a NAND device directly would be getting the timing both right and efficient. As such, most SoCs include a dedicated NAND Controller.

The job of the NAND Controller is to handle the interaction between the SoC and the NAND so that the SoC is freed from the lowest-level details of handling the NAND; it acts as a sort of buffer between the two. The SoC creates a request by loading the controller's registers with the correct values, and it's the controller's job to twiddle the various control and I/O lines in the correct sequence, at the right times.

An SoC's NAND Controller will often incorporate logic for handling ECC calculations and manipulating portions of the OOB areas as appropriate. For example, I mentioned earlier that when providing data to the NAND, one must also supply the OOB area. In some cases the software only needs to provide the data, and the controller will calculate the ECC and supply the OOB data to the NAND device itself. The reverse also applies when reading data: the controller can be instructed to check the ECC, and if an error is detected it will either correct the data itself (if it can), set flags to let the user know an issue was found, or both. In that case the user simply receives a page of corrected data.

The NAND Controller can only do its job properly if it is configured properly. On each SoC that has a NAND Controller, a portion of its registers is used to let the user specify the configuration and timing parameters of the specific NAND device being used.

Configuration usually involves telling the NAND Controller the bus width (8 or 16), the page size, whether or not sub-pages are used, how many bytes to use when specifying the "address" (3, 4, or 5), and various other things.
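
Just as a sketch of the kinds of settings involved (the structure and field names below are invented for illustration; every SoC defines its own register layout):

    #include <stdint.h>

    /* Hypothetical configuration block for an imaginary NAND controller.
     * Real SoCs spread these settings across their own register layouts. */
    struct nand_ctrl_config {
        uint8_t  bus_width;     /* 8 or 16 bits                        */
        uint32_t page_size;     /* e.g. 2048 bytes                     */
        uint8_t  subpages;      /* sub-pages per page, e.g. 4 (or 1)   */
        uint8_t  addr_cycles;   /* 3, 4, or 5 address bytes            */
        uint8_t  ecc_enabled;   /* let the controller handle ECC/OOB   */
    };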

For timing, the datasheet for the NAND device will always specify timing in absolute, "wall clock" values (e.g. 25[ns]), whereas the NAND Controller only knows how to count clock ticks. Therefore not only do you need to know the clock rate of the bus to which the NAND device is connected (which is almost guaranteed to not be the same as the clock rate of the CPU itself), but these values will need to be adjusted anytime the clock rate changes (e.g. in low-power or power-saving modes). The number of clock ticks the controller should wait is always specified as an integer. Knowing your clock rate, you'll need to figure out how many ticks are required to get at least that much delay, rounding up. For example, a given timing parameter might specify a minimum delay of 25[ns]; at a clock rate of 130[MHz] this translates to 3.25 clocks. But since the controller can't count a quarter of a clock, this value needs to be rounded up to 4. At this clock rate 4 clocks actually gives a delay of 30.7[ns], but we can't specify 3, otherwise the controller won't wait long enough for the NAND device, and errors will result.
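
Here's a small helper that does this conversion and reproduces the worked example above (just a sketch; a real driver would pull the clock rate from the SoC's clock framework):

    #include <stdio.h>
    #include <stdint.h>

    /* Convert a minimum delay in nanoseconds into an integer number of
     * bus-clock ticks, rounding up so the delay is never too short. */
    static uint32_t ns_to_ticks(uint32_t t_ns, uint32_t clk_hz)
    {
        /* ticks = ceil(t_ns * clk_hz / 1e9); 64-bit math avoids overflow */
        uint64_t num = (uint64_t)t_ns * clk_hz;
        return (uint32_t)((num + 999999999ull) / 1000000000ull);
    }

    int main(void)
    {
        uint32_t ticks = ns_to_ticks(25, 130000000);     /* 25 ns at 130 MHz */
        printf("25 ns at 130 MHz -> %u ticks\n", ticks); /* prints 4, not 3  */
        return 0;
    }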

Unfortunately it's rare to find a controller that has a 1:1 mapping between the timing parameters provided in the NAND device's datasheet and the timing parameters required by the NAND Controller. For reasons that can only be described as masochistic, the NAND Controller will almost always want timing values derived from calculations that, if you're lucky, are based on values found in the NAND device's datasheet. For example, a typical device's datasheet will (thankfully) provide a timing parameter called tRHZ. But instead of asking for this value, the NAND Controller might say: I need NAND_TA, and NAND_TA must satisfy:

((RD_HIGH - RD_LOW)/HCLK) + (NAND_TA/HCLK) ≥ tRHZ

RD_HIGH and RD_LOW are other timing parameters the controller wants, which you've already calculated in a manner similar to the above; you must re-arrange the inequality to isolate NAND_TA. Thankfully tRHZ is found in the datasheet; sometimes the controller will request a parameter that isn't in the datasheet, and you're left trying to figure out how to use the parameters the datasheet does give you to determine the value the controller wants.
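
Assuming HCLK is the bus clock frequency and RD_HIGH, RD_LOW, and NAND_TA are all counted in clock ticks (the reading that makes the units in the inequality work out), the rearrangement looks something like this sketch:

    #include <stdint.h>

    /* Ceiling conversion of a time in nanoseconds to clock ticks at clk_hz. */
    static uint32_t ns_to_ticks(uint32_t t_ns, uint32_t clk_hz)
    {
        uint64_t num = (uint64_t)t_ns * clk_hz;
        return (uint32_t)((num + 999999999ull) / 1000000000ull);
    }

    /* Solve ((RD_HIGH - RD_LOW)/HCLK) + (NAND_TA/HCLK) >= tRHZ for NAND_TA,
     * where RD_HIGH, RD_LOW, and NAND_TA are in ticks and tRHZ is in ns.
     * The parameter names follow the example in the text; your controller's
     * names and formulas will differ. */
    static uint32_t calc_nand_ta(uint32_t trhz_ns, uint32_t hclk_hz,
                                 uint32_t rd_high, uint32_t rd_low)
    {
        uint32_t trhz_ticks = ns_to_ticks(trhz_ns, hclk_hz);
        uint32_t already = rd_high - rd_low;     /* ticks already spent  */

        if (already >= trhz_ticks)
            return 0;                            /* no extra wait needed */
        return trhz_ticks - already;
    }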

Also, the above calculations depend on your ability to figure out the clock rate which requires an understanding of the clocking and PLL mechanisms of your SoC, which isn't trivial either.

Conclusion

NAND flash is an interesting technology, with its own advantages and quirks. To get NAND working on a specific device requires an understanding of the details of the specific NAND device you're using, as well as understanding the capabilities and limitations of your SoC.
