Is This the End of IaC Management As We Know It?

The original goal of Infrastructure as Code (IaC), which HashiCorp distilled in 2021, was to enable reusability of configurations for resource provisioning. “Infrastructure automation is a four-phased journey,” we were told, the fourth phase including the availability of self-service provisioning.
The open question, heightened recently, has been whether cloud native environments have stalled before reaching Phase 4.
It’s a deliberation that organizations might have comfortably back-burnered until last summer, when HashiCorp changed Terraform’s license from a purely open source model to a less-than-open alternative and Terraform’s continued viability as an IaC industry standard suddenly came under intense scrutiny. Since that time, HashiCorp has charged that the Linux Foundation-backed OpenTofu initiative changed the headers of code HashiCorp had released under its new Business Source License (BUSL) and rereleased it under the MPL 2.0 license. OpenTofu denies the charge.
“Folks,” wrote InfoWorld’s Matt Asay, “that’s not how open source works.”
The Disparate Centralization Paradox
Whenever a platform provider moves away from an open source licensing model, the immediate impact of that move is felt by the third parties that, up to that point, had been supporting the platform with tools, integrations and plug-ins. For Terraform, the major category of support came from management tools. Their providers were suddenly and unexpectedly faced with a choice: Enter into a business arrangement with HashiCorp (which wouldn’t come without costs) or follow OpenTofu down whatever road it takes.
Perhaps just as painful is the decision Terraform’s users face now, especially if they’ve become accustomed to managing their virtual infrastructure in the style of one or another Terraform tool or system. It’s making them explore alternatives they hadn’t considered before, including AWS CloudFormation, Red Hat Ansible and Perforce’s Puppet. If the IaC market didn’t appear so fragmented before, it might now.
As organizations ponder their decisions, Quali argues, they may be investing more time and resources into gathering their IaC skill sets into singular talent clusters. You’re starting to see the Terraform talent pool separate itself from the Ansible talent pool… and ne’er the twain shall meet.
“The centralization of [IaC] is both a symptom and a cause of IaC’s complexity,” said Colin Neagle, Quali’s vice president of marketing, in an interview. “It’s centralized because only a select few people know how to provision it. They manage the IaC modules, they know where they are, and they know what the parameters are for where it’s being deployed — on which cloud region. They have the security credentials. When it comes to individual cloud services, that’s fine. Where we see this really start to break down is in application environments.”
In those environments, he said, configurations need to be as accessible and immediately deployable as the applications themselves. Otherwise, every time a dev requests a resource or a staged environment, it launches a whole new lab experiment.
IaC: ‘Utopia for Devs’?
“When IaC [first] came around, it was a utopia for a lot of developers,” Quali’s head of product strategy David Williams told me, “especially in the early DevOps days.” Back before cloud computing became suitable for production workloads, the task of provisioning infrastructure was delegated to capacity planners. These were operations personnel, many of whom charted their resource capacity estimates on paper.
Developers really enjoyed the freedom of choice that IaC gave them, Williams said. “Terraform was built around that ability to do democratization of code, aligned with what they wanted. The biggest problem was, like most bottoms-up practices, once it takes grip, it requires all these things like governance, policies and tagging standards… There’s a lot of fragmentation. If the three of us were using the same IaC platform (Terraform, OpenTofu or whatever), it’s unlikely we’d write our code and interpret that code in the same way.”
With the ratio of each organization’s software engineers to infrastructure engineers generally estimated to be 10:1 — perhaps 20:1 for a company experimenting with generative AI (GenAI) — the result is a kind of supply chain crisis. Here, the demand for IaC as a commodity clearly outweighs the supply. And since the supply task falls to people whose skills not only have to be certified but also have become widely diversified, there’s no longer a single lever anyone can apply to open the floodgates for preconfigured and battle-tested infrastructure resources.
“I think IaC automation was designed for the right purpose, at the right time,” remarked Neagle. Now “more and more people are reliant upon cloud infrastructure, and the tools used to provision them weren’t designed for those people. There’s a learning curve, an education barrier to using them. Platform engineering as a concept is designed to bring down that barrier and deliver a developer platform where people can access what they need without having to learn the intricacies of the technologies used to provision it.”
Configuration by Blueprint
Quali’s solution involves its developer platform, called Torque. With it, organizations can leverage their existing configuration code files, letting Torque recompose them into interlocking blocks of specifications. Those blueprints become wrappers around the existing code, so they’re not imported so much as referenced.
Torque can consume a variety of sources, including Helm charts and Kubernetes YAML-based deployment files, Williams explained. During this process, Torque uses its GitOps integration to bring in the functions stated in imperative IaC files (such as those used by Ansible), declarative IaC files (such as those used by Terraform), and the JSON and YAML lists used by AWS CloudFormation and others. It then reevaluates them under a policy-driven mechanism. That reevaluation leads to blueprints that follow governance standards and policies set by the Torque user.
One common example case involves a reusable blueprint for configuring and deploying an AWS S3 bucket for binary large object (blob) storage, with the intent of making it available to developers. Governance practices stipulate rules and restrictions upon how, or even when, devs can issue such a request for resources. That’s one reason IaC engineering specialists are special: They’re the gatekeepers for corporate IT governance, and it’s one reason Quali believes they’ve become the modern data center’s equivalent of the “priests in lab coats” of the 1960s and ’70s.
Torque features an integrated policy that organizations can deploy to restrict S3 bucket requests to private storage only, and perhaps also to designated dev teams. When a user requests public S3 blob storage, Torque responds by first examining all active resource configurations. If there’s already a public bucket active, Torque can deny the attempt until it’s accompanied by express approval. The absence of this feature, Neagle said, would open the door to security vulnerabilities.
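The request flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Torque's actual policy engine or API: `BucketConfig`, `Decision` and `evaluate_request` are hypothetical names, and the rule that a public request needs approval even when no public bucket is active is an assumption drawn from the private-only policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketConfig:
    """Hypothetical description of a requested or active S3 bucket."""
    name: str
    team: str
    public: bool


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def evaluate_request(request, active, designated_teams, approved=False):
    """Apply a private-only bucket policy to a single request.

    `active` holds the configurations that are already running, and
    `approved` stands in for an express approval from a reviewer.
    """
    if request.team not in designated_teams:
        return Decision(False, "team is not designated for bucket requests")
    if not request.public:
        return Decision(True, "private bucket permitted by policy")

    # Public request: examine every active configuration first.
    existing_public = [cfg.name for cfg in active if cfg.public]
    if approved:
        return Decision(True, "public bucket allowed by express approval")
    if existing_public:
        names = ", ".join(existing_public)
        return Decision(False, f"public bucket already active ({names}); approval required")
    # Assumption: with no public bucket active, the private-only rule still applies.
    return Decision(False, "policy is private-only; approval required")


if __name__ == "__main__":
    active = [BucketConfig("assets", "web", public=True)]
    request = BucketConfig("uploads", "web", public=True)
    print(evaluate_request(request, active, {"web"}))
    print(evaluate_request(request, active, {"web"}, approved=True))
```

Returning a reason alongside the yes-or-no answer means a denied developer can see why the request was blocked and what approval would unblock it.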
So let’s say there’s a blueprint for Torque to launch a production environment. An IaC engineer can write the code for validating the launch request for security compliance. More likely, that code has already been written. But that compliance code will probably run only once in a standard IaC environment. With Torque, said Neagle, compliance checks and validations may be automated to run daily or twice daily. In case of a validation failure, Torque can trigger a notification policy, which may involve adding approval workflows to the organization’s IT service management platform, such as BMC IT Service Management (ITSM).
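To make the idea of recurring validation concrete, here is a rough Python sketch of one compliance pass that a scheduler could invoke daily or twice daily. Everything in it is assumed for illustration: the environment dictionaries, the two checks in `CHECKS`, and the `notify` callback (which stands in for whatever opens an approval workflow in an ITSM tool) are hypothetical and do not reflect Torque's real interfaces.

```python
from typing import Callable, Dict, List, Tuple

# A check takes an environment description and returns True when compliant.
Check = Callable[[dict], bool]


def run_compliance_pass(
    environments: List[dict],
    checks: Dict[str, Check],
    notify: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Run every check against every environment and report failures."""
    failures = []
    for env in environments:
        for check_name, check in checks.items():
            if not check(env):
                failures.append((env["name"], check_name))
                # Hypothetical hook: could open an approval ticket in an ITSM tool.
                notify(env["name"], check_name)
    return failures


# Hypothetical checks; real rules would come from the organization's policies.
CHECKS: Dict[str, Check] = {
    "encryption_enabled": lambda env: env.get("encrypted", False),
    "no_public_buckets": lambda env: not env.get("public_buckets", []),
}

if __name__ == "__main__":
    envs = [
        {"name": "prod", "encrypted": True, "public_buckets": []},
        {"name": "staging", "encrypted": False, "public_buckets": ["assets"]},
    ]
    run_compliance_pass(envs, CHECKS, lambda env, chk: print(f"{env}: {chk} failed"))
```

The scheduling itself is left out on purpose. The point is that the same checks run repeatedly against whatever is live, rather than once at launch.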
“You’re not only democratizing the access,” Neagle said, “but you’re consistently monitoring everything that’s running from that platform to make sure that it is secure. You couldn’t democratize it [before] because you didn’t know what people were able to do. It was the cart before the horse. Now you know what they’re able to do, you can democratize it.”
In a more complex example, Neagle explained creating a blueprint for a staging environment for a web app, tied to an S3 bucket and a virtual private cloud (VPC). Suppose such a configuration option isn’t already available in the catalog. A user with permission may invoke a text prompt with a GenAI interface. This user can use natural language to prompt, “Create a staging environment for a web app using AWS S3 and VPC.” Torque interprets this text using GenAI, produces the internal YAML code for configuring this environment and allows the user to run it. The user sees a graphical representation of the new environment, with the S3 bucket clearly marked, and an option to view the configuration code in YAML.
That YAML code, he continued, will fully illustrate the dependencies between the web app and VPC. “You can see all the parameters, everything that’s needed to provision each resource [while] following those dependencies.”
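One way to picture how a blueprint's dependencies drive provisioning is a small dependency graph. The sketch below is an assumption-laden illustration in Python, not Torque's blueprint format: the `BLUEPRINT` structure, its parameter values and the `provisioning_order` helper are all hypothetical. It uses the standard library's `graphlib.TopologicalSorter` to put each resource after the resources it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical blueprint: each resource lists the resources it depends on.
BLUEPRINT = {
    "vpc": {"depends_on": [], "params": {"cidr": "10.0.0.0/16"}},
    "s3_bucket": {"depends_on": [], "params": {"access": "private"}},
    "web_app": {"depends_on": ["vpc", "s3_bucket"], "params": {"stage": "staging"}},
}


def provisioning_order(blueprint):
    """Return resource names so that dependencies come before dependents."""
    graph = {name: set(spec["depends_on"]) for name, spec in blueprint.items()}
    # TopologicalSorter raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in provisioning_order(BLUEPRINT):
        print(name, BLUEPRINT[name]["params"])
```

In this sketch the web app always comes last, because both the VPC and the bucket must exist before it can be provisioned. The relative order of the two independent resources is not guaranteed.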
Generative Glut
It won’t shock anyone to learn that Quali, along with the rest of the cloud computing ecosystem, is grappling with the sudden influx of AI workloads.
“It creates huge amounts of demands on the infrastructure that are not always feasible for the engineers to work out: ‘How much cloud do I use? How much data? How much storage?’” said Williams. Quali is talking with cloud providers, asking how to help organizations use GenAI technology without it bankrupting them. “If you’ve got a badly defined model, you’re going to be eating the world.”
Organizations will want to impose restrictions on developers’ resource usage, Williams foresees. Those restrictions will be based not on capacity — which the IaC engineer understands more readily — but instead upon cost. Presently, enabling the restrictions necessary to maintain compliance and achieve security objectives requires, at the very least, expert guidance. Meanwhile, the influx of talent in platform engineering is weighted towards AI engineers who may not know what these infrastructure resources even are.
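A cost-based restriction, as opposed to a capacity-based one, might look something like the following Python sketch. It is purely illustrative: the function names, the request structure and the hourly rates are invented for the example (they are not real cloud prices or any vendor's API), and the 730-hour month is a common approximation.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


def projected_monthly_cost(resources):
    """Sum hourly_rate * count over the requested resources for one month."""
    return sum(r["hourly_rate"] * r["count"] * HOURS_PER_MONTH for r in resources)


def within_budget(resources, already_committed, monthly_budget):
    """Allow a request only if it keeps the team under its monthly budget."""
    projected = projected_monthly_cost(resources)
    return already_committed + projected <= monthly_budget, projected


if __name__ == "__main__":
    # Made-up rates for illustration only, not real cloud prices.
    request = [
        {"name": "gpu-node", "hourly_rate": 3.0, "count": 4},
        {"name": "cpu-node", "hourly_rate": 0.5, "count": 2},
    ]
    ok, projected = within_budget(request, already_committed=2000.0, monthly_budget=10000.0)
    print(ok, projected)
```

The developer never has to reason about instance counts or storage tiers here. The guardrail translates the request into a monthly figure and compares it with what the team is allowed to spend.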
“We’re trying to provide the ability for companies to optimize the amount of skills they have,” said Williams, “so not everybody becomes an infrastructure expert. And they have enough knowledge of the way we’re articulating what the infrastructure is, in a natural language-type of way, that enables them to ask the things that [require] an understanding of cloud services. We will do that for you.”
As the ratio of IaC experts to developers becomes ever more lopsided, Quali believes these experts will still find value in their organizations by contributing to a system that is more actively and regularly governed. “We’re not saying that it’s auto-magic,” said Williams. “We’re saying, if you have minimum infrastructure skills, we can maximize them.”
For more insight, check out Quali’s self-starting Torque Playground, accessible from Quali’s homepage, to see how it provisions space and resources.