Summary:
Hugo Huang discusses the challenges CEOs face when adopting generative AI, including unpredictable costs, infrastructure bottlenecks, workforce shifts, and ethical concerns. He emphasizes lifecycle planning, cost dashboards, and strategic implementation to balance innovation and risk.
This transcript has been edited for clarity and length.
Mike Sacopulos: My guest today is Hugo Huang, an expert in cloud computing and business models. He works for Canonical, which is a vendor for Google. Hugo holds an MBA from the MIT Sloan School of Management, and he is going to guide us through generative AI considerations. Hugo Huang, welcome to SoundPractice.
Hugo Huang: Thank you, Mike.
Sacopulos: For our audience today, we're going to be talking about an article of yours that appeared in the Harvard Business Review. The title of that article is "What CEOs Need to Know About the Costs of Adopting Gen AI." So, first, you have to walk me through: what is generative AI?
Huang: In most cases, people like to compare the current wave of AI to the previous generation of AI excitement, which we now refer to as predictive AI. That means we use an algorithm, an AI model, to predict what the machine is looking at. In the visual case, for example, when a machine or camera captures a picture of a cat or a dog, the algorithm can recognize, "This is a dog, this is not a cat." It can output that judgment without any human guidance and tell you whether this is the object you were searching for. That's the previous generation we call predictive AI.
Currently, we're more excited about generative AI, or Gen AI. The difference is that generative AI can generate something. What can it generate? In most cases, when we talk about large language models, it's based on language. You give the model an input, basically the way you'd talk with a person, a few words or a few sentences, and it generates something that responds to you or completes your sentence. You get the sense that you're talking with a real person. The persona of that person can differ depending on the settings you choose: it could be your coach, or it could be a teacher. There are many ways to configure the model so it works the way you want it to.
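As an illustration of what "configuring a persona" looks like in practice, here is a minimal sketch using the OpenAI Python client; the model name and persona text are placeholder assumptions, not recommendations.

```python
# Minimal sketch: steering a chat model's persona with a system message.
# Model name and persona text are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The system message sets the persona: a coach, a teacher, etc.
        {"role": "system", "content": "You are a patient executive coach."},
        {"role": "user", "content": "How should I prepare for a board meeting?"},
    ],
)
print(response.choices[0].message.content)
```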
Sacopulos: In your article, you open by noting that some companies are pulling back from cloud migration because of unexpected cost pressures. Why do so many CEOs underestimate the true price tag of generative AI? Is it a misunderstanding of the technology, or are the billing and usage models simply that complex?
Huang: I'll share a case in a moment, but before that, I'd say 70% of the cases come down to missing the full lifecycle consideration. An AI application isn't like a physical machine you buy, a tool with a fixed purchase cost and a maintenance cost for that machine. AI is really a process of dealing with data, not a one-time program. So you never have a clear estimate of how much data you'll need to process.
Two or three weeks ago there was a widely discussed case: an acquisition between Meta and a startup called Metus. Meta spent more than $2 billion on this small startup. What did this company do that made Meta so excited? In fact, what the company does has made many of its users worried about cost. From its financial data, it looks very promising: in just nine months, the company went from launching its product to reaching $100 million in annual recurring revenue. That's amazing for any startup, just nine months. People were excited to realize that many of its enterprise customers are paying thousands of dollars per person for a subscription to its service.
So, what do they do? If you're talking to ChatGPT or Google Gemini, you input a sentence and the language model outputs appropriate content based on it. That's a one-off situation: one conversation. You ask, and it responds. But Metus is quite different. They run an agent business.
Suppose I, as a user, ask, "Hey, Metus, could you help me research all the startups in Y Combinator's latest batch?" Metus will do that. It assigns multiple agents to different jobs. Some agents scrape the Y Combinator websites. Others take that list and research any public information about those startups. It's a back-and-forth information exchange. You don't really know how many tokens you're consuming or how much information you're feeding to those agents. Then the bill arrives and you're shocked: "How much did I use? I asked one sentence. How could one sentence, one request, cost me maybe $100?" You probably never thought about that.
That's what makes this business unpredictable. As a CEO, you might say, "We have to run an AI project. In the future, all our employees must use these AI tools and agents, and they should use the latest and greatest model," which is also the most expensive model. Then you're very surprised by the bill at the end of the month or the year. That's why it's unpredictable: you have to have a full-lifecycle sense of how much you're consuming.
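To make the arithmetic concrete, here is a minimal back-of-envelope sketch of how a single agentic request can cost orders of magnitude more than a chat question. All agent counts, token figures, and prices below are illustrative assumptions, not Metus's actual numbers or any vendor's real pricing.

```python
# Back-of-envelope cost of one request, chat-style vs. agentic.
# All figures are illustrative assumptions, not real pricing.

PRICE_PER_1K_INPUT = 0.01   # assumed $ per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.03  # assumed $ per 1K output tokens

def request_cost(agents, steps_per_agent, in_tokens_per_step, out_tokens_per_step):
    """Cost of one user request fanned out to multiple agents, where each
    agent runs several steps that both read context and generate output."""
    steps = agents * steps_per_agent
    return (steps * in_tokens_per_step / 1000 * PRICE_PER_1K_INPUT
            + steps * out_tokens_per_step / 1000 * PRICE_PER_1K_OUTPUT)

# A one-off chat question: one "agent", one step, small context.
print(f"Chat question:       ${request_cost(1, 1, 500, 500):.2f}")

# "Research the whole Y Combinator batch": 10 agents, 50 scrape-and-
# summarize steps each, with large context fed back in at every step.
print(f"Agent research task: ${request_cost(10, 50, 8000, 2000):.2f}")
```

Under these assumptions the chat question costs about two cents, while the one-sentence research request costs about $70: the fan-out of agents and steps, not the user's prompt, drives the bill.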
Sacopulos: In your article, you describe a client who invested heavily in GPUs but ran into a bandwidth bottleneck that made the system inefficient. Can you walk us through that example? Why isn't it enough for a company just to buy fast chips? And what hidden infrastructure costs do leaders commonly miss?
Huang: Yeah, I can walk through the real case and then use a metaphor for what it means in everyday terms. The company is a real customer of Google's, and we worked together to serve them. The customer purchased a lot of Nvidia GPUs and had them installed in a Google data center. Unfortunately, when they clustered the computers together, the problem appeared: when a GPU finishes processing data, it has to send the results back through its machine's CPU. That CPU talks to the CPU on another machine, and that CPU channels the data to its own GPUs. Each machine has one CPU but multiple GPUs, so there's a bottleneck between the CPUs.

When you cluster multiple machines together, GPU-to-GPU communication across machines always relies on the CPUs. You have to solve that problem. Otherwise, no matter how many GPUs you attach to each CPU, a single machine can be fast, but the cluster as a whole cannot be. The bottleneck remains.
It's like having a very good car with a very powerful engine but weak brakes. What can you do? You can't drive that fast, because you have to consider how hard you can press the pedal to stop the car. That's the real issue, and it's not only CEOs who need to consider it, but also the IT managers and others who handle the infrastructure.
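For a rough sense of why the slowest link gates the whole cluster, here is a minimal sketch of the arithmetic; the bandwidth and data-volume figures are illustrative assumptions, not measurements from this customer's deployment.

```python
# Why the slowest link gates the whole cluster.
# Bandwidth figures are illustrative assumptions, not vendor specs.

GPU_LINK_GBPS = 600  # assumed GPU-to-GPU bandwidth inside one machine (GB/s)
CPU_LINK_GBPS = 25   # assumed CPU-routed link between machines (GB/s)

def transfer_seconds(gigabytes: float, bandwidth_gbps: float) -> float:
    """Seconds to move a given volume of data over one link."""
    return gigabytes / bandwidth_gbps

data_gb = 100  # data the GPUs must exchange in one training step (assumed)

within = transfer_seconds(data_gb, GPU_LINK_GBPS)
across = transfer_seconds(data_gb, CPU_LINK_GBPS)

print(f"Within one machine: {within:.2f} s per step")
print(f"Across machines:    {across:.2f} s per step")
# Adding GPUs to each machine improves only the first number;
# the cluster-wide step time stays pinned to the second.
```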
Sacopulos: Very interesting. You also write that generative AI shifts organizations from a traditional talent pyramid to more of a diamond structure. What does that mean for CEOs' hiring strategies?
Huang: Well, that's a very exciting topic these days. You've probably heard the term "one-man company." Many startups aren't designed to hire thousands or hundreds of thousands of people. They'll stay at one founder, who uses multiple agents to do the work.
When we talk about the diamond-shaped organization, we're thinking about the workforce of the future. AI can replace some of the junior work. We're probably not replacing employees entirely; think of it as replacing certain types of jobs, certain types of tasks. You'll have someone who can do the work himself or herself, but who can also monitor multiple agents doing the same job.
In the past, one IC, an individual contributor, did the work alone. In the future, this person can still be an individual contributor on your organization chart, but at the same time manage maybe 100 agents to do a job that previously required his or her manager and a whole team. That's the diamond.
So in the future, we'll still need some individual contributors; some jobs can't be replaced by agents. We'll see the middle layer get bigger and bigger, because people get promoted not for managing new hires but because their skills let them manage more, and more complicated, tasks. They have more bandwidth to think strategically and take on the important work their manager previously handled. So we get a much stronger middle layer. The top layer stays the same, I assume, though it may change if your CEO can use that many agents directly. Who knows?
At the bottom layer, we won't need so many junior graduates performing the most basic jobs.
Sacopulos: Do you think that organizations will gravitate towards that diamond shape over time, or do you think it will be an implemented strategy? I just wonder if there are enough employees now with sufficient skills to enter into a diamond-shaped model. Is this going to take time to evolve, or do you think it's a competitive advantage for someone to try to put that into place sooner rather than later?
Huang: Well, going back to the 1990s, people were saying "the world is flat." The idea was that you could outsource any kind of work to wherever it was cheaper. It didn't happen as dramatically as predicted, though we still see it as a trend, and big companies still outsource parts of their workforce. I think that's an option if you can get a benefit from it.
Comparing an agent with a real person, there are definitely some advantages. Agents are always loyal, always friendly. If you ask ChatGPT, "Can you accuse me of something?" I suspect it will refuse and keep flattering you instead. But there are always problems.
People are very good at strategic thinking; we can think outside the box, while agents have constraints. I recently read an article in the MIT Technology Review. It reported, surprisingly, that current large language models appeared powerful enough to solve some unsolved math problems. So people started asking whether we still need so many mathematicians. Are they actually useful?
But when people examined those supposedly unsolved problems, it turned out they had already been solved by someone, just not well documented or published in a way that everyone knew about. So the models are very good at finding things. They're not good at solving new problems or driving innovation from current information; they're very good at finding where we've already solved a problem. Agents have their advantages, and people have ours.
It's a kind of dilemma, and people argue endlessly about whether AI will replace us. My own view is that as long as I can learn to harness some of the AI tools, then at least for the next couple of years I can make use of them rather than be afraid of them. In the long term, what should I do? That's my philosophy.
Sacopulos: Well, I think that's a good philosophy. In your article, you argue that security and ethics are not just governance issues — they're major cost centers if neglected. Between shadow IT, data leakage, and model bias, where do you see companies losing the most money when they cut corners on responsible AI?
Huang: We recently did another piece of research: we surveyed executives and decision makers at large enterprises about what they think of shadow IT. In most cases, shadow IT is the top concern. But it depends; the survey captures people's opinions, not facts.
Shadow IT basically gives you, and especially gives managers and leadership, uncertainty. What are employees using? Are they using ChatGPT or Gemini? If my company doesn't have a contract with Google for Gemini, are they feeding my confidential data to Gemini? Then Gemini knows my business secrets. That's part of the risk of shadow IT.
As for the cost of bias: bias doesn't make you feel you're losing anything at the beginning. But by the time you discover you've had a biased model, a biased decision chain, for a long time, it's too late to remediate. You've already made a lot of mistakes.
So it's very hard to say which one to avoid first, which should have priority. Security is always a checklist, a compliance list. You have to keep asking: Am I compliant enough? Is there any risk that bad actors could get onto my floor and do something I wouldn't notice?
It comes down to the wisdom of running a business: relying on someone who really cares about the business and pays attention to the work, to current conditions, and to how different tools are actually being used. These people don't necessarily need to be full-time, but they should have KPIs and responsibility for managing this kind of risk, and you obviously have to pay for that.
Sacopulos: I was very interested in your discussion of a generative AI cost dashboard for CEOs. I thought this was a great idea. If a CEO sat down with you next week and opened up this dashboard that you're discussing, what are the top three metrics that should be on that dashboard?
Huang: Well, as the Metus case showed, generative AI costs are very hard to predict. Everyone hopes to sit in front of a dashboard the way they sit behind the wheel of a car: you turn the wheel and the car turns. No surprises. But for AI, and really for IT systems in general, the response to your inputs isn't always as immediate or as direct as the feedback you get from people. So it's very hard, but that doesn't mean we can't do anything.
Many big companies have dashboards about their own business; Salesforce's are the best-known example. It's very important for business leaders to have a good sense of where the business is going. Do I truly believe those deal close-rate percentages are accurate all the time? Probably not. But with a dashboard, you have something to rely on.
For generative AI, for an enterprise AI system, it's the same. If you've set up some metrics, for example, how many virtual machines were started today, how big they are, and how much I'm paying in the cloud right now (because that's real time), you can make an estimate: based on today's consumption, I can estimate tomorrow's.
There are other levers, like comparing different cost models. Some models are metered, like ChatGPT and the OpenAI APIs: you pay in real time for however many tokens you consume. But there are other models, like DeepSeek, or models run locally through Ollama, that can be installed on a specific machine and never even talk to the internet. You can still ask them questions and get genuine answers, maybe not as good as the latest model like GPT-5, but they work in many cases.
Once you have a better sense of that, you can set a strategy. "Okay, those models answer the very basic questions. They run on my virtual machine, the machine is there, and it's running every day. If it costs one dollar today, it will cost one dollar tomorrow as well." Then I know that's one of my dashboard data points.
For truly novel problems, we need scientists, and we need our most demanding customers engaging with these models and feeding in new data. We know that usage is unpredictable, but we should keep this kind of model within a limited scope. Set a limit of, say, 2,000 or 10,000 tokens per conversation. With a limit in place, we know nobody can feed in 10,000 tokens every second; it's not possible. So you have a bound on how much it can cost, and a much better sense of the spend. You split the dashboard so you have at least a worst case and a best case, and you can see where costs will go.
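Here is a minimal sketch of that worst-case/best-case split, combining a fixed-cost local model with a capped metered one; the caps, rates, and prices are illustrative placeholders, not real limits or pricing.

```python
# Bounding metered-model spend with per-conversation token caps.
# Caps, rates, and prices are illustrative assumptions.

PRICE_PER_1K_TOKENS = 0.02         # assumed blended $ per 1K metered tokens
TOKEN_CAP_PER_CONVERSATION = 10_000
MAX_CONVERSATIONS_PER_DAY = 2_000  # assumed org-wide rate limit

LOCAL_VM_COST_PER_DAY = 1.00  # the fixed-cost local model: $1 today, $1 tomorrow

worst_case = (MAX_CONVERSATIONS_PER_DAY * TOKEN_CAP_PER_CONVERSATION
              / 1000 * PRICE_PER_1K_TOKENS)
best_case = worst_case * 0.10  # assume typical usage is 10% of the cap

print(f"Fixed local model:   ${LOCAL_VM_COST_PER_DAY:.2f}/day")
print(f"Metered, best case:  ${best_case:.2f}/day")
print(f"Metered, worst case: ${worst_case:.2f}/day  (hard upper bound)")
```

The point of the cap is the last line: however unpredictable the usage, the dashboard can always show a hard ceiling on tomorrow's bill.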
Sacopulos: Very interesting. As our time together comes to a conclusion, I'm interested: if you could give one piece of advice to the leader out there who's afraid of the cost of generative AI, but at the same time is also afraid of being left behind, what's the sweet spot strategy for getting started safely while still keeping pace with innovation?
Huang: Well, maybe I can't give very good advice, but I can give a warning. If you fear missing out and you don't try, you'll always be missing this trend, always missing something you should be doing. But if you do try, pay attention to the worst case. Then you'll understand how much you can learn, and you'll get better and more prepared.
Sacopulos: Hugo Huang, sound advice. Thank you so much for being on SoundPractice.
Huang: Thank you, Mike. My pleasure.