How to Sustain Performance for NVMe Drives Under Thermal Stress Conditions

2022-04-07 ATP ELECTRONICS
NVMe SSDs,H/W Heatsink,Thermal Management,NVMe M.2 2280

NVMe drives are major disruptors in flash storage technology, offering unprecedented speed and performance in either the ultra-slim M.2 or the U.2 form factor. Breaking past Serial ATA (SATA) transfer rates, which are capped at 6 Gb/s, NVMe drives leverage the PCI Express (PCIe) interface, which connects directly to the CPU, resulting in 4-6X the speed of SATA in random workloads.
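To put the 6 Gb/s SATA cap in context, the short Python sketch below works out raw link bandwidth after line encoding. The PCIe Gen3 x4 link is an assumption chosen for illustration (the article does not name a PCIe generation or lane count), and these are theoretical link ceilings, not the measured random-workload figures quoted above.

```python
# Rough link-bandwidth arithmetic. PCIe Gen3 x4 is an assumed example link,
# not a figure taken from the article.

def sata3_bytes_per_sec() -> float:
    line_rate = 6e9      # SATA III line rate in bits per second
    encoding = 8 / 10    # 8b/10b line encoding: 8 payload bits per 10 line bits
    return line_rate * encoding / 8

def pcie_bytes_per_sec(gt_per_sec: float, lanes: int, encoding: float) -> float:
    # Per-lane transfer rate times encoding efficiency, summed over all lanes.
    return gt_per_sec * 1e9 * encoding * lanes / 8

sata = sata3_bytes_per_sec()
pcie_gen3_x4 = pcie_bytes_per_sec(8.0, 4, 128 / 130)  # Gen3 uses 128b/130b encoding

print(f"SATA III:     {sata / 1e6:.0f} MB/s")
print(f"PCIe Gen3 x4: {pcie_gen3_x4 / 1e6:.0f} MB/s")
print(f"Ratio:        {pcie_gen3_x4 / sata:.1f}x")
```

Real-world throughput depends on the controller, NAND, and workload, so the ratio printed here should be read only as an upper bound on what the link itself allows.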


The big leap in speed and performance reduces latency, enables faster access, and delivers higher input/output operations per second (IOPS) compared with other interfaces designed for mechanical storage devices.


With the increase in speed came overheating issues, which are exacerbated because NVMe drives are typically installed in compact embedded systems that are often fanless or have minimal airflow for heat dissipation. Overheating adversely affects the drive's data integrity, endurance, and data retention. The drive degrades quickly as the tunnel oxide weakens and allows electrons to leak out, which in turn results in more bit errors and more uncorrectable errors.

 

This article explores the thermal management challenges for NVMe drives and presents ATP ELECTRONICS's Customizable Thermal Management Solution based on different application needs, system mechanical design, and other important considerations.

 

Applications Requiring Thermal Management

Due to its speedy transfer rates, NVMe storage is gaining adoption in applications where microseconds count, such as those involving real-time customer interactions, time-critical data analytics, and more. In many of these scenarios, the devices are installed in enclosures with little or no airflow and are constantly subjected to intense workloads under harsh conditions. Multiple die stacking per integrated circuit (IC) and densely packed components in the limited printed circuit board (PCB) space, especially in double-sided designs, also contribute to the overheating issue.

 

Thermal management is therefore critical to sustain performance stability during operation at high temperatures.

Applications requiring thermal management

The following table shows possible scenarios with thermal and airflow conditions that need to be addressed.


  *LFM: Linear Feet per Minute
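Because airflow figures are often quoted in LFM while fan and chamber data may be given in metric units, a small conversion helper can be handy. The sketch below is a minimal Python example; the airflow values in the loop are illustrative only and are not taken from the table.

```python
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot

def lfm_to_mps(lfm: float) -> float:
    """Convert linear feet per minute (LFM) to meters per second."""
    return lfm * FEET_TO_METERS / 60.0

# Illustrative airflow values only; they are not taken from the article's table.
for lfm in (0, 100, 200, 400):
    print(f"{lfm:>4} LFM = {lfm_to_mps(lfm):.2f} m/s")
```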


ATP ELECTRONICS's Customizable Thermal Management Solution

ATP ELECTRONICS recognizes that thermal challenges are unique for different use cases and scenarios; hence, a "one-size-fits-all" approach may not be the most suitable. To meet a customer's specific thermal requirements, ATP offers a holistic and customizable solution that combines firmware and hardware technologies.


The process hinges on extensive collaboration with customers and is summarized in these four steps:



1. ASSESSMENT

Joint Validation for Thermal Management

ATP ELECTRONICS works with system developers to overcome the challenges unique to the specific case. By understanding the performance criteria, user application, and system specifications (including, but not limited to, temperature, workload, airflow, and mechanical design), ATP ELECTRONICS can customize an NVMe solution for the customer.



An important part of assessing heat dissipation is taking a close look at the mechanical design within the system. How much space is available for heatsink solutions? How can we make sure that no mechanical interference happens among all the components of the system printed circuit board?



The system's mechanical design may not have accounted for a heatsink solution at the outset. This is why it is important to examine the available space around the NVMe SSD as well as any mechanical interference that may occur.


2. SIMULATION

Influence of Air Inlet/Outlet and SSD Location

Since airflow may vary depending on the fan and drive location, simulation tests are also performed using a proprietary ATP-built mini chamber to recreate the thermal environment in the customer's profile as closely as possible. Airflow capability and SSD location, as well as performance requirements for the SSD given its distance from the air inlet, are among the factors considered. Necessary adjustments are then made to arrive at the optimal solution for the requirements.


The proprietary ATP-built mini chamber (Generation 2) is used to simulate and adjust thermal environments based on the customer's profile.


A pure hardware simulation test based on full-speed operation, which is the worst-case scenario, is conducted using the Cadence Simulation system. This gives hardware engineers insight into the heat distribution in each PCB layer, as well as the potential risk of heat accumulating in particular areas. Adjustments can then be made to the circuit layout, wire thickness, quantity and position of through-holes, and other parameters.


An example heat distribution simulation result for a PCB's top layer


3. CUSTOMIZATION

Thermal Management Consideration: Which Heatsink Fits the Mechanical Design?

ATP ELECTRONICS's customized thermal management solution consists of both firmware and hardware components:

 

  • Adaptive Thermal Control through the ATP ELECTRONICS Dynamic Thermal Throttling Mechanism

This strikes a delicate balance between performance and temperature instead of imposing a dramatic performance reduction. Temperature sensors continuously monitor the device temperature, and the firmware responds by lowering performance gradually so that the temperature is brought back under control.
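The difference between a sharp throttle and a gradual one can be sketched in a few lines of Python. This is only an illustration of the general idea, not ATP's firmware logic: the function names, temperature thresholds, and performance fractions below are all hypothetical.

```python
# Hypothetical thresholds and performance levels, chosen only for illustration;
# they are not ATP's firmware parameters.

def single_step_throttle(temp_c: float, limit_c: float = 80.0) -> float:
    """Baseline policy: full speed below the limit, then one sharp drop."""
    return 1.0 if temp_c < limit_c else 0.25

def gradual_throttle(temp_c: float, start_c: float = 70.0,
                     max_c: float = 85.0, floor: float = 0.25) -> float:
    """Lower the allowed performance fraction linearly between start_c and max_c."""
    if temp_c <= start_c:
        return 1.0
    if temp_c >= max_c:
        return floor
    span = (temp_c - start_c) / (max_c - start_c)
    return 1.0 - span * (1.0 - floor)

for t in (65, 70, 75, 80, 85):
    print(f"{t} C: single-step {single_step_throttle(t):.2f}, "
          f"gradual {gradual_throttle(t):.2f}")
```

In this toy model the gradual policy starts giving up a little performance earlier, which is the trade it makes to avoid the abrupt drop of the single-step policy.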

 

For NVMe M.2 2280 modules, a variety of HW heatsink options (materials, dimensions, types) are available to match the mechanical constraints of each system design. For high-density NVMe U.2 SSDs, a thermal pad covering the controller and NAND flash area dissipates heat through the U.2 aluminum housing.


HW thermal management options for NVMe M.2 2280 modules and U.2 SSD
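As a first-order way to reason about what a heatsink buys, the steady-state temperature of a component can be estimated as ambient temperature plus dissipated power times thermal resistance to ambient. The Python sketch below applies that standard relation; the ambient temperature, power, and thermal resistance values are hypothetical placeholders, not measurements of any ATP product.

```python
def steady_state_temp_c(ambient_c: float, power_w: float,
                        r_theta_c_per_w: float) -> float:
    """Steady-state temperature: ambient plus power times thermal resistance."""
    return ambient_c + power_w * r_theta_c_per_w

# All numbers below are hypothetical placeholders, not measurements.
ambient_c = 55.0   # air temperature inside the enclosure, in degrees C
power_w = 5.0      # power dissipated by the drive under load, in watts
for label, r_theta in (("bare module", 8.0), ("with heatsink", 5.0)):
    temp = steady_state_temp_c(ambient_c, power_w, r_theta)
    print(f"{label}: {temp:.1f} C")
```

A heatsink or thermal pad lowers the effective thermal resistance, so the same workload settles at a lower temperature; how much lower depends on the heatsink geometry and the available airflow.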


  • Garbage Collection F/W Tuning

A periodic background refresh offsets the significant performance drop caused by the long garbage collection process.

 

4. OPTIMIZATION


An optimized solution combines both HW and FW to meet the customer's needs. As the graph below shows, performance can drop sharply when standard thermal throttling is used. ATP NVMe SSDs with the customized thermal management solution, on the other hand, deliver higher sustained write performance.


The comparison graph shows that NVMe SSDs with ATP ELECTRONICS Thermal Management Solutions, which combine hardware and firmware, deliver better sustained write performance and avoid the drastic performance drops seen in SSDs using standard heatsinks and thermal throttling mechanisms.


Conclusion

Customization through ATP ELECTRONICS's Joint Validation Service offers effective hardware and firmware thermal management solutions to overcome NVMe heating challenges and to deliver better sustained performance. By working closely together, ATP and its customers can arrive at the optimal solution to meet thermal criteria and performance requirements.


ATP ELECTRONICS's customizable Thermal Management Solutions use both hardware (heatsinks) and advanced firmware (Dynamic Thermal Throttling mechanism) to make sure that NVMe SSDs remain cool even when installed in spaces with insufficient airflow and under varied thermal conditions.


With their blistering-fast performance, NVMe SSDs race not only against time but also against heat.
