AI’s Big Shift: Serving User Queries Requires Inference Data Centers
Most of the narrative you’ve seen around Artificial Intelligence (AI) data centers has focused on the massive facilities developing the large language models (LLMs) that form the backbone of AI. For economic reasons, these models are usually trained in a giant data center that can be located anywhere, typically in a low-cost rural area.
But it turns out that the data centers built to train LLMs aren’t the most efficient places to serve the models they create. As a result, the majority of projected data center growth is now happening in another kind of facility with different geographic requirements: the fastest growth in the sector is now in inference data centers.
Low Latency Response: When an application queries an LLM for usable information, speed of response is key. If you are incorporating AI throughout your business, consistent added latency of even a few hundred milliseconds can render your applications unusable, so low latency is critical. To deliver responses in a time-effective, low-friction manner, data centers need to be near people, as the back-of-the-envelope sketch below suggests. This serving tier is known as the inference level.
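As a rough illustration of the distance term in that latency budget, consider the physics alone. The following is a minimal Python sketch assuming signals travel through optical fiber at roughly two-thirds the speed of light; real round trips add routing, queuing, and model compute time on top of this floor.

```python
# A back-of-the-envelope sketch of why proximity matters for inference latency.
# Assumes fiber propagation at roughly two-thirds the speed of light; actual
# network paths add routing, queuing, and model compute time on top.

SPEED_OF_LIGHT_KM_S = 299_792   # speed of light in a vacuum, km/s
FIBER_FACTOR = 2 / 3            # typical propagation speed in optical fiber

def round_trip_ms(distance_km: float) -> float:
    """Best-case network round trip for a query and its response."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return one_way_s * 2 * 1000  # there and back, in milliseconds

for km in (50, 500, 2500):
    print(f"{km:>5} km away: ~{round_trip_ms(km):.1f} ms round trip, before any compute")
```

Propagation delay is only one slice of the latency budget, but it is the slice that siting a data center near its users eliminates entirely.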
The use case for inference data centers echoes the case made for the expected boom in edge data centers around a decade ago, when data centers were to be placed in high-population areas to deliver data to end users with low latency. The edge seemed sensible, but when COVID hit and Work From Home (WFH) became almost universal for a few months, our largely centralized infrastructure proved sufficient for most use cases. And the applications requiring the lowest latency, like autonomous vehicles and virtual reality, did not develop as quickly as projected. As a result, demand for the edge cooled.
Inference Data Centers: The New Edge: To draw quickly and effectively from LLMs, inference data centers must be near people for low latency response to queries. Right now, most inference is happening in the cloud on the big models, which is hardly economical but is the best available answer while greater infrastructure capacity is under development. It isn’t ideal for the long term: it is like renting a Lamborghini as your daily commuter vehicle until the Toyota factory produces enough Celicas for economical, efficient use.
The need for the inference level is not in serious debate; it is relatively certain. Research firm Gartner projects that by 2029 nearly twice as much will be spent on inferencing as on training: $72 billion versus $37 billion, by its analysts’ calculations.
Inference Tech Developing Rapidly: Recognizing the need for inference data centers, the industry has made developing the software and hardware to support their operations a priority. Today, many enterprises run inference servers on standard CPUs, without the expensive, hot-running Nvidia chips required to train LLM AI models. Inference servers will become even easier to run as smaller models are developed, and with experience the tech ecosystem will get better at routing tasks to the smallest viable model to reduce cost and increase efficiency, as the sketch below illustrates.
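Here is a minimal Python sketch of what “smallest viable model” routing could look like. The model names, capability scores, and per-token costs are hypothetical placeholders, not any vendor’s actual catalog or API.

```python
# A minimal sketch of "smallest viable model" routing: pick the cheapest
# model that can handle the task. All names and numbers are hypothetical.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    capability: int            # rough task-difficulty ceiling this model handles
    cost_per_1k_tokens: float  # hypothetical serving cost

# Ordered cheapest-first so the first match is the least expensive viable option.
MODELS = [
    Model("tiny-cpu-model", capability=2, cost_per_1k_tokens=0.0001),
    Model("mid-size-model", capability=5, cost_per_1k_tokens=0.002),
    Model("frontier-model", capability=9, cost_per_1k_tokens=0.03),
]

def route(task_difficulty: int) -> Model:
    """Return the cheapest model whose capability covers the task."""
    for model in MODELS:
        if model.capability >= task_difficulty:
            return model
    return MODELS[-1]  # fall back to the most capable model

print(route(1).name)  # tiny-cpu-model  -> e.g., classify a support ticket
print(route(7).name)  # frontier-model  -> e.g., multi-step reasoning
```

The economics follow directly: every query that a small CPU-served model can answer is a query that never touches expensive accelerator capacity.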
A Memorable Development: Models are now coming out that run well on CPUs, with no water-cooled, gigawatt-scale AI data center required. Every component of the compute stack is evolving rapidly. In late March, Google introduced Turboquant, designed to cut memory requirements sixfold while delivering an 8x inference speedup. After Google’s announcement, markets were not kind to the stocks of incumbent memory providers, reflecting expectations that a new generation of solutions is on the way.
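Quantization is the general lever behind numbers like those. As a generic illustration only (plain symmetric int8 quantization in NumPy, not Google’s Turboquant technique itself), shrinking the number format cuts weight memory directly:

```python
# A generic illustration of why quantization shrinks inference memory:
# storing weights as int8 instead of fp32 cuts their footprint 4x.
# This is simple symmetric quantization, NOT the Turboquant method.

import numpy as np

weights = np.random.randn(1024, 1024).astype(np.float32)  # stand-in layer weights

scale = np.abs(weights).max() / 127.0                 # map the fp32 range onto int8
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale    # reconstructed at inference time

print(f"fp32 size: {weights.nbytes / 1e6:.1f} MB")    # ~4.2 MB
print(f"int8 size: {quantized.nbytes / 1e6:.1f} MB")  # ~1.0 MB, a 4x saving
print(f"max error: {np.abs(weights - dequantized).max():.4f}")
```

Smaller weights mean less memory, less memory bandwidth, and ultimately less power per query, which is exactly what the inference level needs.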
Prepared for Inference: With significant experience delivering high-density power and cooling solutions for High Performance Computing, Direct LTx is prepared for the shift to inference. Is your organization an early adopter running your own AI solutions and in need of a data center home that can handle your power density and cooling requirements? If so, we would welcome your consideration. Email strategy@DirectLTx.com to schedule a conversation.