The OpenAI/Microsoft partnership is once again making headlines, as reports suggest the two are looking to build a new data center that would effectively act as a supercomputer, taking their game to the next level.
The Next Platform sums it up best:
“The first thing to note about the rumored ‘Stargate’ system that Microsoft is planning to build to support the computational needs of its large language model partner, OpenAI, is that the people doing the talking – reportedly OpenAI chief executive officer Sam Altman – are talking about a datacenter, not a supercomputer.
And that is because the datacenter – and perhaps multiple datacenters within a region with perhaps as many as 1 million XPU computational devices – will be the supercomputer.
The report on the Stargate effort, published by The Information as the Easter holiday was starting last Friday, is consistent with the scalability goal of 1 million interconnected endpoints that the Ultra Ethernet Consortium – of which Microsoft is a founding member – has set for future Ethernet networks.
The Stargate system – and we will continue to call it a system even if it is at datacenter or region scale – is, at this point, something on the back of an envelope and almost certainly something that Altman leaked to get tongues wagging. Altman seems unable to decide if OpenAI should be wholly dependent on Microsoft, and who can blame him? This is why there are also rumors about OpenAI designing its own chips for AI training and inference, and outrageous comments about Altman trying to spearhead a $7 trillion investment in chip manufacturing that he subsequently backed away from.
You can’t blame Altman for throwing around the big numbers he is staring down. Training AI models is enormously expensive, and running inference – which at this point is mostly the generation of tokens and is properly labeled as such, as Nvidia co-founder and chief executive officer Jensen Huang recently pointed out in his keynote at the GTC 2024 conference – is unsustainably expensive. And that is why Microsoft, Amazon Web Services, Google, and Meta Platforms have created or are creating their own CPUs and XPUs. And as parameter counts go up and data shifts from text to other formats, LLMs are only going to get larger and larger – 100X to 1,000X over the next few years if present trends persist and the iron can scale.
And so, we are hearing chatter about Stargate, which demonstrates that the upper echelon of AI training is without question a rich person’s game.
Depending on what you read in the interpretations of the reports following the initial Stargate rumor, Stargate is the fifth phase of a project that will cost somewhere between $100 billion and $115 billion, with Stargate being delivered in 2028 and operating through 2030 and beyond. Microsoft is apparently in the third phase of the buildout right now. Presumably, those funding numbers cover all five phases of the machine, and it is not clear if this number covers the datacenter, the machinery inside of it, and the cost of power.”
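For a quick sense of scale, here is some napkin math on the rumored figures quoted above. This is purely illustrative: the $100 billion to $115 billion total and the 1 million XPU count are the rumored numbers, and it is not clear what the total actually covers.

    # Napkin math on the rumored Stargate figures (illustrative only)
    total_spend_low, total_spend_high = 100e9, 115e9  # rumored total across all five phases, USD
    xpu_count = 1_000_000                             # rumored accelerator count

    low = total_spend_low / xpu_count
    high = total_spend_high / xpu_count
    print(f"${low:,.0f} to ${high:,.0f} of total project spend per XPU")
    # -> $100,000 to $115,000 per device, if the rumored total really covers
    #    the datacenters, the machinery inside them, and the power, not just the chips

Whatever the exact split, the implied per-device spend sits in the six-figure range, which is exactly why only a handful of companies can even entertain a build like this.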
But what is clear is that powering the future of AI is going to take a lot of money and a lot of energy, and that OpenAI and Microsoft are committing to building out the infrastructure required to enable AI to continue to grow.
Is Stargate the Greatest Idea Ever?