What is an AI factory?
MICHAEL BIRD
Hello and welcome back to Technology Now, a weekly show from Hewlett Packard Enterprise where we take what's happening in the world and explore how it's changing the way organizations are using technology.
We’re your hosts Michael Bird…
AUBREY LOVELL
and Aubrey Lovell, and this week we’re going back to basics and looking at building AIs.
- We’ll be exploring what an AI factory is,
- We’ll be looking into who uses these factories,
- And we will be asking the important question: why use an AI factory at all?
MICHAEL BIRD
Yeah, that's right, Aubrey. We'll be learning all about AI factories, so if you’re the kind of person who needs to know *why* what’s going on in the world matters to your organisation, this podcast is for you.
And if you haven’t yet, subscribe on your podcast app of choice so you don’t miss out.
Right, let’s get into it!
AUBREY LOVELL
Let's do it.
AUBREY LOVELL
So unsurprisingly, Michael, we obviously talk a lot about AI on the show. We know that it's an ongoing story that's constantly evolving and changing, and it's intimately intertwined with the world that we live in. However, we don't often talk about how AIs are made, or for that matter, where AIs are made. So have you ever thought about where AIs come from?
MICHAEL BIRD
Well, you know, we work for HPE, so I think we sort of think about this, but it sometimes feels a bit like how people used to think about the cloud. Do you remember when cloud first came out and everyone was like, cloud? Where is the cloud? Doesn't matter where the cloud is, it's the cloud. It's everywhere. I guess the thing with an AI is just that it's so fantastically complicated. I think a lot about the people involved in creating it, hundreds of thousands of people potentially involved in creating an AI that does something clever and intelligent, particularly with all these new large language models that have hit the scene in the last few years.
AUBREY LOVELL
Right, and we kind of do have a golden rule, right? That golden rule really is that with AI, the input equals the output. So whatever you're putting into it has to be of good caliber in order to receive a good output, a positive output, right? And I think, you know, you can program a very, very basic neural network on your own laptop, right? People are learning to do that with coding, et cetera. But this would be almost a gimmick rather than anything useful on a grand scale like what we're talking about with some of the things that we do within HPE and within the overall enterprise space.
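Aubrey's laptop-scale example is easy to demonstrate. Below is a minimal sketch of that kind of very basic neural network, a tiny two-layer model learning XOR; the layer sizes, learning rate and iteration count are illustrative choices, not anything from the episode.

```python
import numpy as np

# A toy two-layer network learning XOR -- roughly the "very basic neural
# network on your own laptop" scale. All hyperparameters are illustrative.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])  # XOR truth table

W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)  # hidden layer
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)  # output layer
lr = 0.05

for _ in range(20_000):
    h = np.tanh(X @ W1 + b1)           # forward pass
    out = h @ W2 + b2
    err = out - y                      # backpropagate the squared error
    dh = (err @ W2.T) * (1 - h ** 2)   # gradient through tanh
    W2 -= lr * (h.T @ err); b2 -= lr * err.sum(0)
    W1 -= lr * (X.T @ dh);  b1 -= lr * dh.sum(0)

out = np.tanh(X @ W1 + b1) @ W2 + b2
mse = float(((out - y) ** 2).mean())
print("predictions:", out.ravel().round(2))
print("mean squared error:", round(mse, 4))
```

This trains in well under a second on an ordinary laptop, which is exactly the point: the gap between this gimmick and factory-scale AI is data volume and compute, not the underlying idea.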
MICHAEL BIRD
Right, because the amount of energy and processing required to program and train sophisticated AI models can be enormous. According to a 2024 article from The Economist, one major large language model took over 50 gigawatt-hours of electricity just to train, and we’ve linked to that stat in the show notes.
But it’s not just about the power required to build an AI. Nowadays, AI production can be a matter of national security.
AUBREY LOVELL
That's true, Michael. And one way to make things more secure is to do them all in one place, which you can control. Enter the AI factory. Now to clarify, this isn't a factory which is run by AI. It is quite literally a factory in which AIs are produced, but I don't want to give too much away because we have a brilliant guest joining us to delve deeper into this topic.
MICHAEL BIRD
Yes Aubrey we do. Iveta Lohovska is the global CTO for AI factories and sovereign AI at HPE. She is here to shed a little bit more light on what an AI factory is and how they work. Iveta, thank you so much for joining us. What exactly is an AI factory?
IVETA LOHOVSKA
So this is a very good question, because most of the time when you say an AI factory, people think of a facility where you produce different things and use AI to enhance the production or the operations of the factory. But this is not the case. An AI factory is a specialized environment designed specifically to create, train, deploy and run inference on AI models. It takes a specialized setup, in this case specific hardware, software and networking, to produce high-quality AI solutions at scale. So the factory itself is producing AI at scale.
MICHAEL BIRD
So what exactly are the processes which need to happen inside an AI factory?
IVETA LOHOVSKA
Well, there are small and big AI factories. Whether they're small or big, depending on the size and the requirements, either of a customer or a government or a specific public sector institution, ultimately you need to be able to do several things in sequence or in parallel. First is data ingestion and data preparation. So the AI factory needs to have the capacity to collect, hold, clean, label and process all of this data. And in the case of an AI factory, we don't mean megabytes or gigabytes. We mean terabytes, petabytes and beyond.
Then you need to have all of the tools to be able to develop a model, to develop a model for scientific purpose or some simulation like traditional HPC workloads, high-performance computing workloads, or to build a huge large language model at scale with billions of parameters.
If you want to train large language models, you need to have the compute, the specialized AI compute, to be able to train them. And then validate, evaluate, deploy and serve them to your end users: serve them to the science department, serve these models to the general public, to average users like us, or serve them to public sector users or departments of different ministries, the healthcare ministry, the defense ministry. So those are kind of specialized users. And of course, horizontally and vertically, you need to be able to monitor and maintain this system, most of the time with the use of AI. We call this AIOps: the use of AI in the operational aspects of the factory.
MICHAEL BIRD
Hmm. And so these AI factories, it sort of sounds like they're almost industrializing the process of building an AI.
IVETA LOHOVSKA
Yeah, there is definitely a focus on making the production of AI happen at scale. And this is what AI factories are trying to do.
MICHAEL BIRD
So I think what I find most surprising is that there is actually so much AI at scale happening. The perception in the wider world, I think, is maybe that AI models cost so much and are so tricky to create that an organization would only have one. They would just take all of the data of an organization and maybe make some sort of large language model to interpret it. But from the sounds of it, there are actually organizations or countries or governments that are creating different models and workloads for specific purposes.
IVETA LOHOVSKA
That is correct, yes. And you have both approaches. At least I have seen both approaches, where based on the maturity of the end users or the organization building these, and also the budget and the muscle they have, or where they are in this journey, some will start with a more generic system, but one still capable enough to enable some kind of innovation, some kind of scientific research or product development. But there are also very advanced users and customers in organizations with a very specific niche. In healthcare, for example, they're working with very specialized models and very specialized datasets, and they know exactly what they need and what they don't need.
MICHAEL BIRD
And AI factories are a big part of the sovereign AI movement.
IVETA LOHOVSKA
They are. Yes.
MICHAEL BIRD
And so can you go into some detail on how an AI factory, I guess a sovereign AI factory, can be important and can work for national security, particularly when we talk about governments?
IVETA LOHOVSKA
Yes. So in the context of sovereign AI, with an AI factory you demonstrate control of the AI system. You control the data, the gravity of the data, the model development and the IP of the models, which are key factors when it comes to national security, but also to how innovative a country is. Then you reduce the dependency on foreign technology, enhancing your own resilience within the country. Like every piece of critical infrastructure, water, electricity, postal services, transport and train services, AI infrastructure, AI factories and sovereign AI factories are treated the same way.
MICHAEL BIRD
So what is the alternative to an AI factory? Because clearly not every organization is doing this, has the capability to do this, or is aware of it. So if you weren't implementing an AI factory, what would your options be? And what would be the positives and, I guess, the negatives of it?
IVETA LOHOVSKA
So when we speak of an AI factory, we don't mean one or five or ten GPUs. We mean hundreds or thousands of GPUs or other types of accelerators. Of course, in the context of a small enterprise or small business, you can start with a small, scalable system built on the same first principles and physical laws, one that will help you scale if your business model, operational model or nation's demands require it. You can definitely start small. You don't have to start with those large-scale systems. You can start with a small computing environment. So that's one path forward: you stand up an environment which is specifically designed for AI, but has the principles of scaling out and up, so that once your needs require it, you can build something closer to an AI factory, or an actual AI factory.
AUBREY LOVELL
Thanks so much Iveta.
The control you get by creating an AI in a single location is one of the most fascinating parts of this because yes, of course there is the efficiency benefit of building it all on dedicated systems, but the additional security implications of having complete control at every step of the way is incredibly important.
Looking forward to learning more about AI Factories later in the show.
AUBREY LOVELL
Alright then. Now it’s time for “Today I learned”, the part of the show where we take a look at something happening in the world that we think you should know about. Michael, what have you got for us this week…
MICHAEL BIRD
Alright, well, I'm going to stay on topic this week as we are about to talk a little bit more about large language models, LLMs for short. And the reason that we're discussing them today in an already AI-filled episode is that when scientists put a large number of LLMs into a room together and left them to interact, they started to display emergent behavior and develop social norms.
AUBREY LOVELL
Hmm, interesting.
MICHAEL BIRD
Yes, and in an article published in Science Advances a group of London-based researchers presented an experiment in the form of a game. In this game, the LLMs were paired up and asked to each select an answer from a pool of 10. If both models in the pair chose the same option, they would win and get a reward. If not, they would be shown the other model's choice, and both would be penalised for not selecting the same option.
Now as the experiment went on, models would swap partners and start the process again with only the memory of their own previous interactions to guide their choices. At the start of the game, the answer chosen was random, but very quickly, models began showing bias towards specific answers which made them more likely to win faster. When the experiment was repeated using 26 answers instead of 10, and up to 200 different AIs, the same thing happened. These social conventions kept reappearing.
Even more bizarre, the bias could not be traced back to any one individual. The bias arose as a result of the interactions between the LLMs alone, and it was replicated using multiple types of LLM. The researchers hope that this discovery will help pave the way to heading off and avoiding harmful biases developing in LLMs as we continue to integrate them more and more into our lives. I think that's quite fascinating. It's sort of the sociology of large language models, isn't it?
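The pairing game the researchers describe can be sketched as a small simulation. This is not the paper's code: the agent count, memory length and decision rule below are simplifying assumptions, with each "model" reduced to an agent that picks whichever option dominates its recent memory.

```python
import random
from collections import Counter

# A minimal sketch of the coordination game: agents are paired at random,
# each picks from a shared pool of options using only the memory of its own
# past interactions, and a convention emerges with no central coordinator.
random.seed(1)
OPTIONS, AGENTS, MEMORY, ROUNDS = list(range(10)), 50, 5, 3000

memories = [[] for _ in range(AGENTS)]  # each agent's recent observations

def choose(mem):
    # Favour whatever has been seen most often; explore randomly if naive.
    return Counter(mem).most_common(1)[0][0] if mem else random.choice(OPTIONS)

for _ in range(ROUNDS):
    a, b = random.sample(range(AGENTS), 2)         # random pairing
    ca, cb = choose(memories[a]), choose(memories[b])
    # On a match both reinforce the shared choice; on a mismatch each is
    # shown the partner's choice and remembers it.
    memories[a] = (memories[a] + [cb if ca != cb else ca])[-MEMORY:]
    memories[b] = (memories[b] + [ca if ca != cb else cb])[-MEMORY:]

final = [choose(m) for m in memories]
winner, count = Counter(final).most_common(1)[0]
print(f"most popular convention: {winner}, shared by {count}/{AGENTS} agents")
```

Run it with different seeds and the population tends to settle on a shared option, but a different one each time, mirroring the finding that the convention emerges from the interactions rather than from any one agent.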
AUBREY LOVELL
It really is. I mean, I think you're seeing all these use cases, especially in the last couple of years, of how AI really can help us. But I think it's also important to underscore responsible AI, right? Whether it's in some of these larger models or large language models, making sure that there is human governance, control and responsibility baked in, so that these things don't happen or don't go off base, right?
AUBREY LOVELL
Right then. Now it’s time to return to Iveta and Michael’s discussion about AI factories.
MICHAEL BIRD
All right, Iveta, we talked a little bit about why people would want an AI factory. Can you talk me through the process of setting one up?
IVETA LOHOVSKA
So let's start with the initial building blocks of what an AI factory is. You need the location, the facility, access to electricity or a source of energy; there are multiple physical constraints and things that need to be taken into consideration. Then, from the architectural point of view, you have the HPC clusters, the high-performance computing clusters, which you need to design, prepare and set up in the right architecture based on what your requirements and scale are. Then you have the operating systems which will run this hardware. And then you have the control plane. The control plane is the layer that manages, in a single view, all of the users or tenants of this system.
I mentioned multi-tenancy, and this control plane layer is ideally where it gets handled, where you separate entities. If you have different departments or different companies that don't want to share data, that don't want to share what they're doing while running on the same system, this is where that gets addressed.
Then further up in the stack, you have the AI workloads: the different types of applications and use cases that will run and flourish on this system. Then you have high-bandwidth, low-latency network infrastructure. And of course storage, storage for handling all of the data and the processing. Those are kind of the fundamental pieces before moving to the more software-like layers of an AI factory.
MICHAEL BIRD
And so how much HPC, so how much high performance computing is needed for an AI factory? Is it the primary use these days for high performance computing?
IVETA LOHOVSKA
It is, yes. Traditionally, HPC systems were used for 80 to 90% traditional HPC jobs, with maybe some kind of algorithmic work and AI work in what was left. And now it's exactly the opposite. At least from my work, I would say 70 to 80% of the workloads are AI-related workloads, and the remainder are traditional HPC workloads. And this is a dramatic change, a change that happened in such a short time that for the people in the industry, for the vendors, for the builders, it's very difficult to keep up with. And that's what we're trying to address.
MICHAEL BIRD
And how long have AI factories been around for?
IVETA LOHOVSKA
So I would say AI factory is a relatively new term that we're using to describe those large-scale HPC clusters. It's just a new, fancy word that we came up with. So I would say the term is relatively recent. The type of work is not recent; we have been doing it for quite some time.
MICHAEL BIRD
Yeah, so AI factories have been around for a little while, but the term is quite new.
IVETA LOHOVSKA
Exactly, yes.
MICHAEL BIRD
So what does the future look like for AI factories?
IVETA LOHOVSKA
I think it's bright and it's very exciting, because it's changing so fast, and in a way it's complex because it requires a multidisciplinary approach. You have hardware folks, HPC folks, networking folks and software folks, all the way up to the use cases and the scientists, working together to design the best system to run and operationalize AI. But then think of the high-impact use cases that you are able to address with AI factories, and only with AI factories at scale: the defense programs, the healthcare and life science research programs that you can only enable with these specialized systems. And I hope more countries, more nations, more teams will have access to those and are able to enable their work, get trained, get excited by the compute, the speed and the performance they're getting out of those systems, so that they can push their respective fields, be it biology or materials science or healthcare.
MICHAEL BIRD
So final question then, why do think our listeners should care about AI factories?
IVETA LOHOVSKA
AI factories drive innovation. The best example I can give is LUMI in the Nordics, in Finland. Just the fact that you have this very powerful machine hosted in the country, and so many startups and enterprises have access to it to build new models, new services, new products, and then a whole ecosystem gets built around it, is itself kind of the most important factor. You're building educational services around those AI factories for people to adopt new skills, to know how HPC works, how AI on HPC works, how AI at scale works.
I know for sure that it will impact the future of our work, what we do, how we do it and how fast we do it. I think we need to pay attention to this, and AI factories ultimately will help us shape the future of work and education in the area of AI.
AUBREY LOVELL
Thank you so much again for joining us today, Iveta.
It's really interesting how we've talked about sovereign AI a lot on this show. So with the crossover between AI factories and sovereignty, it would make sense that these AI factories have unofficially existed for a while, but just gone by other names. And the way in which they're taking off will be really exciting to kind of watch over the next few years. So it would be really interesting to kind of circle back on this topic in the future once we have seen, you know, quite how much of an impact these factories have on our lives.
MICHAEL BIRD
Right then, we are getting towards the end of the show which means it’s time for This Week In History. What was your clue for us last week Aubrey?
AUBREY LOVELL
Ok, so it's 1919 and this solar eclipse is about to prove this theory relatively correct.
MICHAEL BIRD
Mmm, yes, I think I thought this might be something to do with the theory of relativity, mainly because our producer had put the word relatively in italics.
AUBREY LOVELL
Well, your intuition is correct. Let me set the scene for you, okay? Because I'm going to tell you a little story. So, story time. On May 29th, 1919, very specific, 106 years to the day before the release of this episode, there was a total solar eclipse. But we need to go back a little further, because in 1905, Albert Einstein published his special theory of relativity.
Within 10 years, by 1915, he would go on to publish and finalize his theory of general relativity. But a theory is just an idea until you have hard proof, right? So in 1919, astronomers Frank Dyson and Arthur Eddington set out to test the theory and to do that, they used a solar eclipse and what would go on to be called the Eddington experiment, which is an experiment designed to measure how much the sun's gravity would bend and deflect starlight passing by it.
During a solar eclipse, the moon acts as a natural barrier, blocking the sun and leaving only its corona, which is the outer atmosphere, visible to us on Earth. When the sun is blocked out, dim stars, which would normally be hidden by the sun's dazzling brightness, become visible. Throughout the eclipse, Eddington took photos of the sun and, later that year, would go on to present the results of his experiment: the starlight was deflected in the way predicted by Einstein's theory, an announcement which would also signal the replacement of Newton's theory of gravity with the theory of relativity.
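For the curious, the number Eddington was chasing is easy to compute. General relativity predicts a deflection of 4GM/(c²R) for light grazing the Sun, twice the Newtonian value; plugging in standard solar constants gives roughly 1.75 arcseconds, the figure the 1919 plates were compared against.

```python
import math

# Predicted bending of starlight grazing the Sun: 4GM / (c^2 * R).
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 1.989e30    # mass of the Sun, kg
c = 2.998e8     # speed of light, m/s
R = 6.957e8     # radius of the Sun, m

deflection_rad = 4 * G * M / (c ** 2 * R)
deflection_arcsec = math.degrees(deflection_rad) * 3600
print(f"predicted deflection: {deflection_arcsec:.2f} arcseconds")  # ~1.75"
```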
One of the most interesting parts of this story though is that general relativity doesn't explain everything. It's still incomplete. It's also worth noting that there are questions as to the accuracy of Eddington's equipment and whether he simply discounted any findings which didn't match up with what he was looking for. Of the 16 photographic plates he made during the eclipse, he noted that quote, one plate I measured gave a result agreeing with Einstein.
MICHAEL BIRD
My goodness. Well, there we go. That took me back to GCSE and A-level physics, high school science for you.
AUBREY LOVELL
Definitely. Ok then, what have you got for us next week Michael?
MICHAEL BIRD
Good, well thank you very much. So... Well Aubrey, I'm glad you asked. It's 1920 and this scientist is about to present a theory about the existence of another part of an atom which would take over a decade to be proved right.
Part of an atom. I'm gonna say... I'm gonna put my hat in the ring for quarks. That's what I'm gonna go with. I'm gonna say quarks.
AUBREY LOVELL
Interesting
MICHAEL BIRD
Maybe I'm too early for quarks. Electrons, protons? 1920 feels like it would be... that's what would have been discovered. I'm gonna say quarks.
AUBREY LOVELL
I feel like that's a good guess. I don't think I'm going to weigh in on this one. We'll have to find out next time.
AUBREY LOVELL
Okay that brings us to the end of Technology Now for this week.
Thank you to our guest, Iveta,
And of course, to our listeners.
Thank you so much for joining us.
MICHAEL BIRD
If you’ve enjoyed this episode, please do let us know – rate and review us wherever you listen to episodes and if you want to get in contact with us, send us an email to technology now at HPE dot com.
Technology Now is hosted by Aubrey Lovell and myself, Michael Bird
This episode was produced by Harry Lampert and Izzie Clarke with production support from Alysha Kempson-Taylor, Beckie Bird, Alissa Mitry and Renee Edwards.
AUBREY LOVELL
Our social editorial team is Rebecca Wissinger, Judy-Anne Goldman and Jacqueline Green and our social media designers are Alejandra Garcia, and Ambar Maldonado.
MICHAEL BIRD
Technology Now is a Fresh Air Production for Hewlett Packard Enterprise.
(and) we’ll see you next week. Cheers!