In recent years, the adoption of NAND flash in enterprise storage systems has increased rapidly. Recent all-flash storage arrays exhibit excellent I/O throughput, latency, storage density, and energy efficiency. At the same time, flash manufacturers strive to reduce cost and increase storage density through continued technology-node scaling, a larger number of bits stored per cell, and vertical stacking of cells (3D-NAND). However, these gains come at the cost of reduced endurance of each block, increased variation across blocks, and lower performance. These issues cannot be compensated solely by stronger ECC or read-retry schemes, and they prevent the direct use of new flash technologies in enterprise storage systems.
We present several novel flash-management technologies that reduce write amplification, achieve better wear leveling, and significantly enhance endurance without sacrificing performance, and hence bring next-generation flash to the levels required in enterprise storage. In particular, we introduce block calibration, which determines optimal read threshold voltages; new garbage collection schemes with heat segregation; and health binning, which overcomes variations in flash block quality.
These complementary flash-management algorithms were first designed, then refined and enhanced in a simulator, and later implemented in an enterprise-level all-flash storage system. Our evaluations show that by introducing heat segregation and health binning, overall endurance becomes dictated by the average endurance of all blocks in a device (instead of by the worst block), thereby enhancing endurance by 33%. In addition, our heat-aware garbage collection schemes further improve endurance by up to 2.5x compared to the baseline.
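
The shift from worst-block to average-block endurance can be illustrated with a toy model. The sketch below is not the implementation evaluated here. It assumes a device that is considered worn out as soon as a write is directed to a block with no program/erase cycles left, and it replaces health binning with a simple stand-in policy that always writes to the block with the most remaining cycles. The function names and the per-block endurance values are hypothetical and are not taken from our measurements.

```python
def writes_until_first_wearout(endurance, pick_block):
    """Count writes served before the policy selects an exhausted block.

    endurance[i] is the (hypothetical) number of program/erase cycles
    block i can sustain; pick_block chooses the block for the next write.
    """
    wear = [0] * len(endurance)
    total = 0
    while True:
        i = pick_block(wear, endurance)
        if wear[i] >= endurance[i]:
            return total  # toy end-of-life condition (an assumption)
        wear[i] += 1
        total += 1


def least_worn(wear, endurance):
    """Health-agnostic wear leveling: equalize cycle counts across blocks."""
    return min(range(len(wear)), key=lambda i: wear[i])


def most_remaining(wear, endurance):
    """Health-aware stand-in: steer writes to the block with most cycles left."""
    return max(range(len(wear)), key=lambda i: endurance[i] - wear[i])


# Illustrative endurance values only, not measured data.
ENDURANCE = [5, 7, 8, 10]

uniform = writes_until_first_wearout(ENDURANCE, least_worn)
health_aware = writes_until_first_wearout(ENDURANCE, most_remaining)
print(uniform, health_aware)  # 20 30
```

In this model, health-agnostic wear leveling serves 4 x min(endurance) = 20 writes, because every block is worn at the same rate until the weakest one is exhausted. The health-aware policy serves sum(endurance) = 30 writes, which is 4 x the average endurance. The ratio of the two depends entirely on the assumed spread of block quality.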