In recent years, and especially since ChatGPT launched in November 2022, AI has been driving innovation at an unprecedented rate, transforming industries and the way businesses operate. Virtually every company, and its executive leadership, understands that AI needs to be part of its future strategy or it will be left behind. That is why we are witnessing a race to deliver the greatest possible innovation on top of AI-powered everything, largely a byproduct of AI’s democratization, which has made it consumable for the masses, from users to innovators.

Tech executives today are telling their engineering teams, “We need to have an AI story NOW,” with little regard for how this is ultimately implemented in their systems. The race is real, and it has its own set of unique implications, especially for those managing cloud infrastructure. This AI rush is creating AI tech debt at an unprecedented scale, and understanding these implications is crucial for ensuring that our cloud environments remain efficient, secure and cost-effective.

A Look at the Dual Impact of AI

As a cloud asset management company, we are witnessing the AI disruption firsthand: our telemetry data shows AI-driven cloud services and assets being consumed in ever-growing numbers. From GPUs to managed retrieval-augmented generation (RAG) databases, large language models (LLMs) and everything in between, all of this AI innovation is built on some of the most costly cloud resources available today. We urge you to check the costs of managed graph databases.

AI’s influence is twofold, affecting both consumers and the infrastructure that supports it. For consumers, there’s a growing need to ensure that the code generated by AI is aware of and compatible with their environments. This includes making sure that AI-driven applications adhere to existing policies, security protocols and compliance requirements.

On the infrastructure side, AI demands significant resources and scalability. The recent Datadog “State of Cloud Costs” 2024 report highlights a 40% increase in spending on GPU instances as organizations experiment with AI; spending on GPU instances alone now makes up 14% of compute costs. Arm spending has doubled in the past year, as Arm becomes the new backbone of AI-driven development and the architecture of choice for processors like AWS’s Graviton that are powering this AI revolution.

This surge in resource requirements and cloud spend can lead to the AI tech debt that many CTOs are starting to lament. We are at a point where the velocity of AI development often outpaces an organization’s ability to manage and optimize it effectively. One example is costly machines being spun up without proper teardown or cleanup, leaving cloud costs to spiral out of control; another is data that is not properly managed being fed into models and machines that later expose it in unexpected ways.
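To make this concrete, here is a minimal sketch of the kind of audit a cloud asset manager might run to surface GPU instances that were spun up without governance tags. It assumes an AWS environment with boto3 and credentials configured; the required tag keys and GPU instance-family prefixes are illustrative assumptions, not a standard.

```python
# Minimal sketch: flag running GPU instances that are missing ownership/TTL tags.
# Assumes AWS credentials are configured and boto3 is installed; the tag keys and
# GPU instance-family prefixes below are illustrative, not an established convention.
import boto3

REQUIRED_TAGS = {"owner", "project", "ttl"}          # assumed tagging convention
GPU_FAMILIES = ("p3", "p4", "p5", "g4", "g5", "g6")  # common GPU instance families

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instances")

for page in paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            if not instance["InstanceType"].startswith(GPU_FAMILIES):
                continue
            tags = {t["Key"].lower() for t in instance.get("Tags", [])}
            missing = REQUIRED_TAGS - tags
            if missing:
                print(
                    f"{instance['InstanceId']} ({instance['InstanceType']}) "
                    f"launched {instance['LaunchTime']:%Y-%m-%d} is missing tags: "
                    f"{', '.join(sorted(missing))}"
                )
```

A report like this, run on a schedule, is one small way to catch the orphaned, untagged machines that quietly accumulate as AI experiments multiply.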

Balancing Innovation With Governance

While AI presents incredible opportunities for innovation, it also sheds light on the need to reevaluate existing governance awareness and frameworks to include AI-driven development. Historically, DORA metrics were introduced to identify elite engineering organizations along two critical dimensions: speed and safety. Speed alone does not indicate elite engineering if the safety aspects are disregarded altogether, and AI-driven development cannot be exempt from the same scrutiny when it comes to the safety of AI-driven applications.

Running AI applications according to data privacy, governance, FinOps and policy standards is more critical now than ever, before this tech debt spirals out of control and data privacy is infringed upon by machines that are no longer under human control. Data is not the only thing at stake, of course; costs and breakage should also be a consideration.

If the CrowdStrike outage from last month has taught us anything, it’s that even seemingly simple code changes can bring down entire mission-critical systems at a global scale when not properly released and governed. Proper governance involves enforcing rigorous data policies, cost-conscious policies, compliance checks and comprehensive tagging of AI-related resources.
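As an illustration of what such enforcement can look like in practice, below is a minimal, hypothetical policy-as-code check that could run in CI before resources are provisioned. The tag keys and the data-handling rule are assumptions made for the sake of the example, not an established policy.

```python
# Minimal sketch of a pre-provisioning governance check: given a resource definition
# (for example, parsed out of an IaC plan), fail fast if required tags are missing or
# a data-handling rule is violated. The tag keys and rules are illustrative assumptions.
REQUIRED_TAGS = {"owner", "cost-center", "data-classification"}

def check_tag_policy(resource_name: str, tags: dict[str, str]) -> list[str]:
    """Return a list of policy violations for a single resource."""
    normalized = {k.lower(): v.lower() for k, v in tags.items()}
    violations = []

    missing = REQUIRED_TAGS - normalized.keys()
    if missing:
        violations.append(f"{resource_name}: missing required tags {sorted(missing)}")

    # Hypothetical rule: sensitive data may not feed AI training without explicit approval.
    if normalized.get("data-classification") == "sensitive" and \
            normalized.get("ai-training-approved") != "true":
        violations.append(f"{resource_name}: sensitive data not approved for AI training use")

    return violations

# Example: a GPU training node defined without a cost-center tag or approval flag.
for violation in check_tag_policy(
    "aws_instance.training_node",
    {"Owner": "ml-platform", "Data-Classification": "sensitive"},
):
    print(violation)
```

Wiring a check like this into the delivery pipeline makes governance a gate rather than an afterthought, which is exactly where the CrowdStrike lesson points.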

The recent acquisition of Qwak.ai by JFrog is another indication that companies with deep enough pockets will be snatching up emerging AI players to shorten time to market for competing AI solutions. With more than 50% of programmers today regularly leveraging AI to write or augment code, any tools and platforms promising greater agility in this domain are drawing closer scrutiny and interest as potential acquisitions. Stay tuned for more happening on this front.

One interesting data point emerging from recent AI-driven research and development concerns code quality (apropos DORA metrics). A recent report by GitClear suggests that code quality is being adversely affected by AI, finding a significant uptick in code churn and a serious decline in code reuse. An interesting post on the findings can be read here.

Some recent critiques of AI point out that text-based AI assistants are great because there are massive amounts of text-based data to learn from. That is why AI assistants are able to augment typical text and deliver above-average results when generating creative or functional prose. However, the same does not hold true for code.

The large majority of code available for AI modeling is actually below average: early projects by aspiring engineers and students, and open code that is not in commercial use. Producing performant, cost-effective, quality code requires many years of domain expertise, yet these lower-quality repositories are often what gets parsed and collected for code-focused LLMs, leaving AI-assisted code quality, at this point, below that of senior engineers. High-quality code repositories are often closed source and belong to commercial applications that are not available to LLMs for training.

This underscores the importance of integrating AI-driven innovations with robust governance structures. Cloud asset managers must be equipped with the tools and knowledge to monitor and manage AI workloads effectively, within their context, understanding the nuances of the complex systems they are managing. This includes ensuring visibility into AI operations and maintaining stringent compliance with governance policies.

Preparing for the Future of AI

As we look to the future, it’s essential to ask: What does this mean for the day after tomorrow when it comes to running AI? For organizations not developing their own LLMs or models, the focus shifts to managing expensive cloud infrastructure. This needs to be done with the same governance and cost-efficiency in mind as any other cloud operation.

Organizations must develop strategies to balance the innovation AI brings with the need for greater, even more meticulous, governance. This involves leveraging AI-aware tools and platforms that provide visibility and control over AI resources. By doing so, companies can channel the power of AI toward higher-order goals while maintaining a secure, compliant and cost-effective cloud environment.

As AI continues to drive innovation, its implications for cloud infrastructure and governance cannot be overlooked. Balancing the benefits of AI with effective management and governance practices is key to ensuring sustainable AI innovation powered by emerging cloud technologies.
