The Mars Opportunity Rover was in the news again this week, as NASA mission engineers try to overcome what they refer to as an increasingly troubling bout of rover "amnesia".
In September of 2014, the team reformatted the flash memory. The reformat procedure tested each of the cells within the 8 banks of flash, marking any additional bad cells and recalculating the new drive size. After the procedure, the new drive was reduced in capacity by 1.7 megabytes, and overall flash problems were greatly reduced for a time. The original design used 128 MB of DRAM, 256 MB of flash memory (in radiation-hardened card form), and another 3 MB of EEPROM.
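The scan-and-resize step described above can be sketched roughly as follows. This is only an illustration, not the rover's flight software: the function name `scan_and_resize`, the `cell_is_good` test callback, and the tiny geometry in the usage lines are all assumptions made up for the example.

```python
from typing import Callable, Set, Tuple

Cell = Tuple[int, int]  # (bank index, cell index within the bank)

def scan_and_resize(
    num_banks: int,
    cells_per_bank: int,
    cell_size: int,
    known_bad: Set[Cell],
    cell_is_good: Callable[[int, int], bool],
) -> Tuple[Set[Cell], int]:
    """Test every cell, extend the bad-cell set, and return it with the usable size in bytes."""
    bad = set(known_bad)
    for bank in range(num_banks):
        for cell in range(cells_per_bank):
            if (bank, cell) in bad:
                continue  # already retired, no need to retest
            if not cell_is_good(bank, cell):
                bad.add((bank, cell))  # newly discovered bad cell
    # The drive shrinks by one cell's worth of bytes for every bad cell.
    usable = (num_banks * cells_per_bank - len(bad)) * cell_size
    return bad, usable

# Hypothetical geometry: 8 banks x 4 cells x 1024 bytes.
# One cell was already marked bad; two more fail the (simulated) test.
failing = {(7, 1), (7, 3)}
bad, usable = scan_and_resize(
    8, 4, 1024, {(0, 0)}, lambda bank, cell: (bank, cell) not in failing
)
```

On real NAND parts the unit that gets retired is normally an erase block rather than an individual cell, so a production version would track bad blocks and persist the table somewhere that survives the reformat.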
The root cause is still under investigation. Possibilities include an intermittently bad spot in memory that isn't read frequently, a structure which is not subject to wear leveling (explaining the improvement after the first reformat), or even just a timing issue - the memory takes longer to complete writes as it ages. One outcome of this investigation was a shift to a working mode that avoids the use of the flash data-storage system.
Embedded designers using NAND flash today also have to contend with the situation where the media has too many bad blocks. With good wear leveling algorithms, the blocks that aren't bad are likely close to the suggested erase count limit, so a reformat wouldn't buy much time. Using DRAM instead of NAND flash is a bit extreme, but similar techniques are used today to reduce write amplification - such as caching data in memory until an entire NAND erase block can be written at once.
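As a rough sketch of that caching idea, the class below collects small writes in RAM and only hands data to the flash layer in whole erase-block-sized chunks. The class name `EraseBlockBuffer`, the `write_block` callback, and the 8-byte block size in the usage lines are hypothetical, chosen only to keep the example small; a real driver would also have to handle power loss while data is still sitting in the buffer.

```python
from typing import Callable

class EraseBlockBuffer:
    """Accumulate small writes in RAM and emit them one full erase block at a time."""

    def __init__(self, erase_block_size: int, write_block: Callable[[bytes], None]):
        self.erase_block_size = erase_block_size
        self.write_block = write_block  # assumed low-level "program one block" routine
        self.pending = bytearray()

    def write(self, data: bytes) -> None:
        self.pending.extend(data)
        # Drain only complete blocks; the remainder stays cached in RAM.
        while len(self.pending) >= self.erase_block_size:
            chunk = bytes(self.pending[: self.erase_block_size])
            del self.pending[: self.erase_block_size]
            self.write_block(chunk)

    def flush(self, pad: bytes = b"\xff") -> None:
        """Force out a partial block, padded with the erased-flash value."""
        if self.pending:
            padded = bytes(self.pending).ljust(self.erase_block_size, pad)
            self.pending.clear()
            self.write_block(padded)

# Usage with a stand-in flash layer that just records each block written.
flash_writes = []
buf = EraseBlockBuffer(8, flash_writes.append)
buf.write(b"abc")
buf.write(b"def")   # 6 bytes cached, nothing sent to flash yet
buf.write(b"ghi")   # 9 bytes total: one full block goes out, 1 byte stays cached
buf.flush()         # remaining byte is padded to a full block and written
```

Three small writes end up as two block writes instead of three partial ones, which is the whole point: fewer program/erase cycles for the same amount of user data.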
After another Opportunity reformat in early December, performance of the flash media remained intermittent. Repeated failures to write to the flash caused the watchdog software to reboot the rover, leading to a loss of communication over Christmas. Investigation is now centered on bank 7 of the flash memory, which seems to have the most failures.
I think it's awesome that a device with an original planned mission time of 90 days on Mars is approaching sol 3900. Here's hoping that your 2015 embedded designs keep going well beyond their expected lifetime!