NVMe SSD Thermal Management: What We Have Learned from Marathons

2024-05-24 ATP Official Website

Part 1: Environmental Assessment

NVMe solid state drives (NVMe SSDs) are known for their blistering speeds: they can be around 4X faster than Serial ATA (SATA) drives. That speed comes with higher power draw and more heat, so NVMe SSDs are prone to overheating, especially in systems with limited airflow. This series of articles explores the considerations behind ATP's thermal solutions, which help NVMe SSDs beat the heat and deliver reliable, sustained performance over extended periods of time.


Thermal Management is Like a Marathon

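Thermal management can be likened to a marathon, a long-distance footrace that requires endurance and strategy. The two have five things in common, and each is covered in one part of this series: environmental assessment, physical conditions, ambient simulation (training), gear/equipment, and pacing strategy. Each of these variables can affect the performance of both marathon runners and NVMe SSDs.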

Fig.1 Marathons and the thermal management of NVMe SSDs have these five things in common, each of which can impact performance.


Environmental Assessment

We start by understanding the system environment. The host could be a box PC, data logger, or IIoT server operating under various temperatures, airflow conditions, and customer criteria.

Fig.2 In assessing the system environment, we check the airflow, as systems are normally already equipped with fans. To ensure sufficient ventilation, we also check if powerful fans or cooling plates are needed for heat dissipation at high temperatures.


The table below shows examples of various test environments as well as customers’ criteria.

Table.1

Knowing the environment and requirements lets us set up the thermal simulation and optimize performance to meet each customer's expectations.

 

Part 2: Physical Conditions

Thermoregulation, the body's ability to maintain its core temperature, is crucial for avoiding potentially dangerous conditions. The same is true for an SSD: if it keeps heating up, the drive will trigger a thermal shutdown. Through thermal simulation, we aim to find the right balance between temperature and performance.
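In NVMe terms, a drive reports its Composite Temperature in Kelvin. It also advertises two limits: a Warning Composite Temperature Threshold (WCTEMP) and a Critical Composite Temperature Threshold (CCTEMP). The minimal sketch below classifies a reading against those two thresholds.

The threshold values used here (353 K and 358 K, roughly 80℃ and 85℃) are hypothetical. Each drive reports its own values, and what the firmware does at each level is vendor-specific.

```python
# Minimal sketch: classify an NVMe Composite Temperature reading against the
# drive's warning (WCTEMP) and critical (CCTEMP) thresholds. NVMe reports all
# three values in Kelvin. The default thresholds are hypothetical examples;
# a real drive advertises its own values.


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert a Kelvin reading to degrees Celsius."""
    return kelvin - 273.15


def thermal_state(composite_k: int, wctemp_k: int = 353, cctemp_k: int = 358) -> str:
    """Return 'normal', 'warning', or 'critical' for a composite temperature."""
    if composite_k >= cctemp_k:
        return "critical"  # drive may shut down to protect itself
    if composite_k >= wctemp_k:
        return "warning"  # overheating: slow down or improve cooling
    return "normal"


if __name__ == "__main__":
    for reading in (330, 355, 360):
        print(reading, round(kelvin_to_celsius(reading), 1), thermal_state(reading))
```

With the default thresholds, `thermal_state(355)` returns `"warning"`.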


Component-Level Simulation in Design Phase

During the product design phase, we use Cadence Simulation software to perform component-level IR drop analysis (power integrity) and thermal simulation.


Hardware engineers enter key component and package information into the software. This includes case dissipation, power loss, and printed circuit board (PCB) dissipation in watts, as well as junction temperature, case and board temperatures, and other relevant parameters.
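Several of these inputs are linked by the standard thermal-resistance relation Tj = Tc + P × θJC. Here θJC is the junction-to-case thermal resistance in ℃/W.

The sketch below shows the arithmetic. The helper names and all numbers are hypothetical illustrations. They are not datasheet values, and they do not reflect how the Cadence tools expose these parameters.

```python
# Minimal sketch of the junction-to-case thermal-resistance relation:
#   Tj = Tc + P * theta_jc
# All numbers are hypothetical illustrations, not datasheet values.


def junction_temperature(case_c: float, power_w: float, theta_jc: float) -> float:
    """Junction temperature (C) from case temperature (C), power (W), theta_jc (C/W)."""
    return case_c + power_w * theta_jc


def max_power(case_c: float, tj_max_c: float, theta_jc: float) -> float:
    """Largest power (W) that keeps the junction at or below tj_max_c."""
    return (tj_max_c - case_c) / theta_jc


if __name__ == "__main__":
    print(junction_temperature(85.0, 3.0, 5.0))  # 100.0
    print(max_power(85.0, 125.0, 5.0))  # 8.0
```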


The Cadence Simulation software then generates results showing the heat distribution in each PCB layer, which reveal the areas at risk of heat accumulation.
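Flagging those risk areas in a layer's temperature map can be sketched as a simple threshold scan. This is not the Cadence export format or API.

The sketch assumes the per-layer result has already been exported as a 2D grid of temperatures in ℃. The grid values and the 75℃ threshold are made up for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def find_hotspots(layer: Grid, threshold_c: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, temp) for every cell at or above threshold_c, hottest first."""
    hits = [
        (r, c, t)
        for r, row in enumerate(layer)
        for c, t in enumerate(row)
        if t >= threshold_c
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical 3 x 3 temperature map (C) of a PCB top layer.
    top_layer = [
        [62.0, 64.5, 63.0],
        [66.0, 91.5, 78.0],
        [61.0, 70.0, 65.5],
    ]
    print(find_hotspots(top_layer, 75.0))
```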


Based on these results, hardware engineers consider adjusting the circuit layout, trace thickness, the number and/or position of through-holes (vias), and other variables.

Fig.3 

Sample temperature distribution result for the PCB top layer, showing that heat accumulates on the controller.

NOTE: Pure hardware simulation of a worst-case scenario (full speed, without the firmware-based thermal throttling mechanism)


Checking the System’s Mechanical Design

Apart from the SSD itself, we check the system's mechanical design, since adding a heatsink can improve heat dissipation. Ideally, an 8 mm heatsink would be used because it outperforms copper foil, but not every system has enough space for one.


What factors do we consider when assessing heat dissipation for mechanical design within the system?

Space. Compact systems are cramped, and heatsinks are typically not considered during their design. How can we make sure a heatsink fits? We need to consider the space around the NVMe SSD in every dimension: the clearance above and below it (height), as well as its width and length.


Mechanical Interference. We also carefully verify that no component on the system PCB physically interferes or overlaps with the heatsink.

Fig.4 It is important to verify that the system has enough room for installing a heatsink.
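Both checks (space and mechanical interference) can be expressed as a simple bounding-box test. In practice this is done in mechanical CAD; the sketch below only shows the logic.

It models the heatsink envelope and nearby system components as axis-aligned boxes in millimetres. An M.2 2280 module is 22 mm wide and 80 mm long. All other coordinates, component names, and the height limit are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in millimetres: origin (x, y, z) plus size (w, d, h)."""
    name: str
    x: float
    y: float
    z: float
    w: float
    d: float
    h: float


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any volume (touching faces do not count)."""
    return (
        a.x < b.x + b.w and b.x < a.x + a.w
        and a.y < b.y + b.d and b.y < a.y + a.d
        and a.z < b.z + b.h and b.z < a.z + a.h
    )


def interference_report(heatsink: Box, components: List[Box], max_height_mm: float) -> List[str]:
    """List height violations and every component the heatsink envelope overlaps."""
    problems = []
    top = heatsink.z + heatsink.h
    if top > max_height_mm:
        problems.append(f"height {top:.1f} mm exceeds {max_height_mm:.1f} mm")
    problems += [f"overlaps {c.name}" for c in components if overlaps(heatsink, c)]
    return problems


if __name__ == "__main__":
    # Hypothetical envelope: 80 x 22 mm footprint, 8 mm tall.
    heatsink = Box("heatsink", 0, 0, 0, 80, 22, 8)
    parts = [
        Box("C12", 75, 24, 0, 3, 2, 2),  # beside the module: no overlap
        Box("J3", 70, 10, 5, 6, 6, 4),   # above the module: overlaps
    ]
    print(interference_report(heatsink, parts, max_height_mm=10.0))
```

Touching faces are treated as clearance, so a real check would add a safety margin to each box.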

ATP Heatsink Solutions: Design Considerations

Once the surrounding area has been assessed, the next step is to design the optimal heatsink solution. For example, ATP's specially designed 8 mm fin-type heatsink offers the following advantages for systems with limited space:

More surface area for heat dissipation

Lightweight aluminum material offers good thermal conductivity

Thermal pad with good adhesion for efficient heat conduction

Clips designed for assembly efficiency on SSDs
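Convective heat transfer follows Q = h × A × ΔT. For the same convection coefficient h and temperature difference ΔT, more exposed surface area A means more heat carried away. The sketch below shows how much area fins add over a flat plate.

The dimensions are hypothetical and are not taken from ATP's drawings. Fin end faces are ignored. Real gains are smaller than the area ratio suggests, because fins are not perfectly efficient and airflow between them is restricted.

```python
def flat_plate_area(length_mm: float, width_mm: float) -> float:
    """Exposed top area (mm^2) of a flat plate."""
    return length_mm * width_mm


def finned_area(length_mm: float, width_mm: float, n_fins: int,
                fin_height_mm: float, fin_thickness_mm: float) -> float:
    """Exposed area (mm^2) of a base plate with straight fins running its length.

    Counts the base not covered by fins, plus both sides and the tip of each fin.
    Fin end faces are ignored.
    """
    if n_fins * fin_thickness_mm > width_mm:
        raise ValueError("fins do not fit across the base width")
    base_exposed = length_mm * (width_mm - n_fins * fin_thickness_mm)
    fins = n_fins * length_mm * (2 * fin_height_mm + fin_thickness_mm)
    return base_exposed + fins


if __name__ == "__main__":
    flat = flat_plate_area(70, 22)
    finned = finned_area(70, 22, n_fins=8, fin_height_mm=6, fin_thickness_mm=1)
    print(flat, finned, round(finned / flat, 2))
```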

Fig.5

The figure above shows the top and bottom parts of an ATP 8 mm fin-type heatsink and how it looks when assembled onto an M.2 2280 NVMe SSD.

Fig.6

The clips, made of thin 0.3 mm stainless steel, fasten the top and bottom halves of the heatsink more reliably. The space-saving design suits systems with limited space, as it does not interfere with other components on the system PCB.

 

Part 3: Ambient Simulation (Training)

“Ambient simulation” can be likened to a runner’s training program in preparation for a marathon. This means subjecting SSDs to different ambient conditions that mimic real scenarios to make sure that the SSDs are fit to perform well under such conditions.


System-Level Simulation Using Cadence Thermal Simulation System

The Cadence Thermal Simulation System runs system/module-level simulation. The customer inputs thermal elements such as ambient temperature, airflow, and ATP SSD parameters.


Given different ambient conditions and airflow, the Cadence system can estimate the SSD temperature with or without the heatsink. The following figures demonstrate the effectiveness of ATP's 8 mm heatsink in pure hardware, full-operation mode (worst-case scenarios).

Fig.7

In this scenario, the ambient temperature is high, but sufficient airflow keeps the controller area temperature down to 145℃. Adding the 8 mm heatsink on top of this airflow reduces the overall temperature further.

NOTE: Pure hardware simulation test, full-operation mode (worst-case scenario)
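A very rough stand-in for this system-level estimate is a lumped thermal-resistance chain:

controller temperature = ambient + P × (R_internal + R_convection), where R_convection = 1 / (h × A).

More airflow raises the convection coefficient h, and a heatsink raises the effective area A. Both lower R_convection.

All values below are hypothetical and are not calibrated to the Cadence results in the figures. A real simulation also accounts for heat spreading, radiation, and airflow paths, which this sketch ignores.

```python
def controller_temperature(ambient_c: float, power_w: float, r_internal: float,
                           h_w_per_m2k: float, area_m2: float) -> float:
    """Lumped estimate of controller temperature (C).

    r_internal: controller-to-surface thermal resistance (C/W), hypothetical.
    h_w_per_m2k: convection coefficient; grows with airflow.
    area_m2: effective surface area exposed to the airflow.
    """
    r_convection = 1.0 / (h_w_per_m2k * area_m2)
    return ambient_c + power_w * (r_internal + r_convection)


if __name__ == "__main__":
    # Hypothetical: 70 C ambient, 4 W, bare drive vs. larger effective area.
    print(controller_temperature(70.0, 4.0, 2.0, 50.0, 0.002))   # bare
    print(controller_temperature(70.0, 4.0, 2.0, 50.0, 0.008))   # with heatsink
    print(controller_temperature(70.0, 4.0, 2.0, 100.0, 0.002))  # bare, more airflow
```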

 

ATP-Built Mini Chamber for SSD Testing

Aside from the Cadence software simulation tool, ATP also tests actual SSDs in a mini chamber we built ourselves. About the size of a notebook computer, it is more compact, flexible, and easier to use than typical large test chambers.

Fig.8 The ATP-built mini chamber

The mini chamber lets us test SSDs in a controlled environment. We can change the airflow, temperature setting, and test scripts from an external system and save the log files. The chamber also has an alarm that triggers an emergency stop and overheat protection if the temperature exceeds the threshold limits.
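The logging and emergency-stop behavior described above can be sketched as a simple monitoring loop. ATP's chamber control software is not described in the article. This hypothetical version logs each temperature sample and stops at the first reading above the limit.

It assumes one reading per second, so the sample index doubles as a timestamp.

```python
from typing import Iterable, List, Tuple


def monitor(readings: Iterable[float], limit_c: float) -> Tuple[List[str], bool]:
    """Log each reading; stop at the first one above limit_c.

    Returns (log lines, emergency_stop_triggered). Assumes 1 Hz sampling,
    so the sample index is used as the time in seconds.
    """
    log: List[str] = []
    for i, temp in enumerate(readings):
        log.append(f"t={i}s temp={temp:.1f}C")
        if temp > limit_c:
            log.append(f"ALARM: {temp:.1f}C > {limit_c:.1f}C, emergency stop")
            return log, True
    return log, False


if __name__ == "__main__":
    lines, stopped = monitor([40.0, 55.5, 71.2, 90.0], limit_c=70.0)
    print("\n".join(lines))
    print("stopped:", stopped)
```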


Part 4: Gear/Equipment

Choosing the right gear or equipment to dissipate heat is very important. For athletes participating in a marathon, choosing suitable clothing can provide protection and enhance breathability, and the right fabric allows easy evaporation of sweat for better cooling and comfort.


Material Selection

There are several factors to consider when choosing the right material for NVMe SSD heatsinks. In this article, we will discuss a few of the most important ones.

Fig.9

Reliability Test for Copper Foil

To evaluate the reliability of our copper foil solution, we test the adhesive layer's resistance to high and low temperatures to make sure it does not deform.

Fig.10

The adhesive strength of the copper foil heatsink is tested to ensure reliability and excellent retention of the heatsink to the SSD.


Shore Hardness Scale of Thermal Pad

Shore hardness, measured with a Shore durometer, indicates the hardness of materials such as plastics and rubber. A flexible, soft thermal pad should sit closely between the heatsink and SSD components to carry heat away from the SSD and keep its operating temperature low. However, a pad that is too soft has a high percentage of silicone and a low percentage of heat-dissipating filler, which can lead to poor thermal conductivity.

Fig.11 

This illustration shows that if the thermal pad is too soft, it compromises the heatsink’s attachment to the SSD components, resulting in lower heat dissipation.
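The conductivity side of this trade-off can be quantified with the one-dimensional conduction formula R = t / (k × A). Here t is pad thickness, k is thermal conductivity in W/(m·K), and A is contact area. The temperature drop across the pad is then P × R.

The pad dimensions and k values below are hypothetical. The formula ignores contact resistance, which is exactly what suffers when a pad does not attach well.

```python
def pad_resistance(thickness_mm: float, k_w_per_mk: float,
                   width_mm: float, length_mm: float) -> float:
    """Conductive thermal resistance (C/W) of a pad: R = t / (k * A).

    Contact resistance at the pad surfaces is not modelled.
    """
    area_m2 = (width_mm / 1000.0) * (length_mm / 1000.0)
    return (thickness_mm / 1000.0) / (k_w_per_mk * area_m2)


if __name__ == "__main__":
    power_w = 3.0  # hypothetical controller power
    for k in (3.0, 6.0):  # lower vs. higher filler content (hypothetical)
        r = pad_resistance(1.0, k, 20.0, 20.0)
        print(f"k={k} W/mK: R={r:.3f} C/W, drop across pad={power_w * r:.2f} C")
```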

 

Cadence Simulation Tool

The Cadence Simulation software can also run component-level thermal simulations to determine which material and/or heatsink solution is the best option. Component-level simulation takes several inputs, including ambient temperature, airflow, and the thermal resistance and power consumption of the main components. Cadence Simulation is a pure hardware simulation based on full-speed operation (worst-case scenario).

The thermography below compares thermal data for a bare SSD and an SSD with ATP's 8 mm heatsink. In this example, the ambient temperature is a high 70℃, with 1200 LFM (linear feet per minute) of airflow. At full speed, the controller temperature of the bare SSD rises to 145℃. With the 8 mm heatsink plus sufficient airflow, it drops to 133℃, a 12℃ reduction.

Fig.12

NOTE: Pure hardware simulation test of a worst-case scenario (full speed, without the firmware-based thermal throttling mechanism)
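Because both runs share the same 70℃ ambient, the 12℃ reduction can also be read as a share of the controller's temperature rise above ambient. The worked calculation below uses the 145℃, 133℃, and 70℃ values from this example.

If the controller dissipates the same power in both runs (an assumption), the heatsink lowers the effective controller-to-ambient thermal resistance by the same 16%.

```latex
\begin{aligned}
\Delta T_{\text{bare}} &= 145\,^{\circ}\mathrm{C} - 70\,^{\circ}\mathrm{C} = 75\,^{\circ}\mathrm{C} \\
\Delta T_{\text{heatsink}} &= 133\,^{\circ}\mathrm{C} - 70\,^{\circ}\mathrm{C} = 63\,^{\circ}\mathrm{C} \\
\frac{\Delta T_{\text{bare}} - \Delta T_{\text{heatsink}}}{\Delta T_{\text{bare}}} &= \frac{12}{75} = 0.16
\end{aligned}
```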

 

What You Wear Keeps You Cooler

Special quick-dry outfits worn by athletes provide protection and keep moisture or sweat away from the skin to keep it cool. The same is true for NVMe SSDs. ATP’s special heatsink solutions keep heat away from the SSD and reduce the temperature of the controller, where heat typically accumulates.


The graph below shows that the controller temperature drops from 68.5℃ on a bare SSD to 46.9℃ with ATP's 4 mm heatsink, and further to 30℃ with the 8 mm heatsink. These images were taken at room temperature with minimal airflow of 450 linear feet per minute (LFM), after a 30-minute 100% sequential write test.

Fig.13

ATP Heat Dissipation Solutions

Not every system has room or space for a powerful heatsink. Considering space constraints, ATP offers different heat dissipation solutions described in the table below.

Table.2

Here is another example showing the importance of choosing the right gear. The figures below show that the bare SSD repeatedly slows down to cool off whenever its composite temperature climbs too high. The 8 mm heatsink, supported by airflow, keeps the SSD cool by dissipating heat.


As the heatsink keeps the composite temperature of the NVMe SSD down, ATP's unique firmware (FW) algorithm maintains a steady pace, resulting in better sustained performance.

Fig.14

Part 5: Pacing Strategy

This part discusses the importance of a pacing strategy. For marathon runners, this means managing their energy throughout the race. At ATP, we use the Dynamic Thermal Throttling mechanism to manage the heat of our NVMe SSDs and sustain performance at optimal levels.


Steady Wins the Race!

When running a heavy workload at high temperatures, a drive triggers “thermal throttling,” slowing itself down to prevent overheating. The downside is that the SSD can no longer sustain the performance required to finish the tasks at hand.
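Conventional on/off throttling can be illustrated with a minimal simulation. The drive's temperature relaxes toward a steady-state value (ambient + power × thermal resistance) with time constant tau. A throttle with hysteresis drops to a low speed at a high threshold and resumes full speed once the drive cools to a low threshold.

All parameters are hypothetical, and power is assumed to scale linearly with speed. This illustrates the general pattern, not any vendor's firmware.

```python
from typing import List, Tuple


def simulate_threshold_throttle(
    steps: int = 600,
    ambient_c: float = 70.0,
    r_th: float = 10.0,           # C/W, drive-to-ambient (hypothetical)
    tau_s: float = 30.0,          # thermal time constant, seconds (hypothetical)
    full_power_w: float = 6.0,
    throttled_speed: float = 1.0 / 3.0,
    t_high_c: float = 110.0,      # throttle when reaching this temperature
    t_low_c: float = 100.0,       # resume full speed after cooling to this
    dt_s: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Return per-step (speed fraction, temperature) under on/off throttling."""
    temp = ambient_c
    throttled = False
    speeds: List[float] = []
    temps: List[float] = []
    for _ in range(steps):
        if temp >= t_high_c:
            throttled = True
        elif temp <= t_low_c:
            throttled = False
        speed = throttled_speed if throttled else 1.0
        # Assumption: power scales linearly with speed.
        steady_c = ambient_c + full_power_w * speed * r_th
        # First-order model: temperature relaxes toward the steady-state value.
        temp += (steady_c - temp) * dt_s / tau_s
        speeds.append(speed)
        temps.append(temp)
    return speeds, temps


if __name__ == "__main__":
    speeds, temps = simulate_threshold_throttle()
    switches = sum(1 for a, b in zip(speeds, speeds[1:]) if a != b)
    print("speed changes:", switches, "average speed:", round(sum(speeds) / len(speeds), 3))
```

The speed trace is a sawtooth: full speed until the high threshold, a sharp drop, then full speed again once the drive has cooled.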


ATP Dynamic Thermal Throttling Mechanism

Unlike standard throttling mechanisms, the ATP Dynamic Thermal Throttling mechanism does not push the SSD to its temperature limit and then sharply drop its speed to cool it down. Instead, it continuously monitors the device temperature and adjusts the pace, striking a delicate balance between performance and temperature.
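The general idea of continuous pacing can be illustrated with a generic proportional scheme. This is not ATP's Dynamic Thermal Throttling algorithm, which the article does not publish.

In this hypothetical sketch, the drive's temperature relaxes toward ambient + power × thermal resistance with time constant tau. Above a "knee" temperature, speed is reduced smoothly in proportion to how far the temperature has risen. All parameters are hypothetical, and power is assumed to scale linearly with speed.

```python
from typing import List, Tuple


def simulate_proportional_pacing(
    steps: int = 600,
    ambient_c: float = 70.0,
    r_th: float = 10.0,           # C/W, drive-to-ambient (hypothetical)
    tau_s: float = 30.0,          # thermal time constant, seconds (hypothetical)
    full_power_w: float = 6.0,
    knee_c: float = 95.0,         # start easing off above this temperature
    slope_per_c: float = 0.05,    # speed reduction per degree above the knee
    min_speed: float = 0.2,
    dt_s: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Return per-step (speed fraction, temperature) under proportional pacing."""
    temp = ambient_c
    speeds: List[float] = []
    temps: List[float] = []
    for _ in range(steps):
        # Speed falls smoothly as temperature rises past the knee.
        speed = 1.0 - slope_per_c * max(0.0, temp - knee_c)
        speed = min(1.0, max(min_speed, speed))
        # Assumption: power scales linearly with speed.
        steady_c = ambient_c + full_power_w * speed * r_th
        temp += (steady_c - temp) * dt_s / tau_s
        speeds.append(speed)
        temps.append(temp)
    return speeds, temps


if __name__ == "__main__":
    speeds, temps = simulate_proportional_pacing()
    print("final speed:", round(speeds[-1], 4), "final temp:", round(temps[-1], 2))
```

With these parameters the drive settles at a constant speed and temperature instead of oscillating. Solving T = 70 + 60 × (1 − 0.05 × (T − 95)) gives an equilibrium of 103.75℃ at 56.25% speed.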


ATP Thermal Management Solutions combine hardware (heatsink) and firmware (Dynamic Thermal Throttling mechanism) to make sure that the SSD delivers optimum sustained performance throughout its operation.

Because the drive continuously monitors its temperature and adjusts its pace, it consumes less power than SSDs that always run at full speed and waste a lot of energy. With fewer fan operations, less energy is required and less noise is generated.

Fig.15

ATP Thermal Management Solutions combine HW and FW to provide a pacing strategy that results in steady speed and efficient power management.
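The fan affinity laws give an idealized rule of thumb for why fewer or slower fan operations save energy. Airflow scales roughly with fan speed, while fan power scales with the cube of speed.

The sketch applies that rule. The 5 W fan rating and 24-hour duration in the usage lines are hypothetical, and real fans deviate from the ideal cube law.

```python
def fan_power_fraction(speed_fraction: float) -> float:
    """Idealized fan affinity law: power scales with the cube of fan speed."""
    if not 0.0 <= speed_fraction <= 1.0:
        raise ValueError("speed_fraction must be between 0 and 1")
    return speed_fraction ** 3


def fan_energy_wh(rated_power_w: float, speed_fraction: float, hours: float) -> float:
    """Energy (Wh) used by a fan running at a constant fraction of full speed."""
    return rated_power_w * fan_power_fraction(speed_fraction) * hours


if __name__ == "__main__":
    # Hypothetical 5 W fan over 24 hours: full speed vs. 80% speed.
    print(fan_energy_wh(5.0, 1.0, 24.0))
    print(round(fan_energy_wh(5.0, 0.8, 24.0), 2))
```

Under this idealization, running at 80% speed needs only about half the power of full speed.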




Simulation and Customization: One Scenario Does Not Fit All

At ATP, we believe that every application scenario presents unique thermal requirements. This is why we carefully consider different factors to come up with a solution that fits the specific needs. One scenario does not fit all, so we offer customization options.


The following table summarizes some of the scenarios presented by customers — different airflow environments, ambient temperatures, workloads, heatsink types according to available space in the system design, and respective test results.

Table.3 Simulation and Customization Table

With customers' varied application requirements in mind, ATP welcomes inquiries for customization. For more information on ATP's customizable thermal management solutions for M.2 and U.2 NVMe drives, contact an ATP representative.


This document is provided by Sekorm Platform for VIP exclusive service. The copyright is owned by Sekorm. Without authorization, any medias, websites or individual are not allowed to reprint. When authorizing the reprint, the link of www.sekorm.com must be indicated.
