> terraform-lsp is supposed to provide autocomplete, but it mostly doesn’t, in my experience.
I find the Terraform plugin for JetBrains IDEs to work a hundred times better than the VS Code one. I can really recommend it; it does wonders for autocompletion, peek definition, snippet insertion, and linting TF/HCL files.
The author of this blog post has not mentioned Terragrunt [0], though I think it is worth mentioning. It is a nice tool, especially if you work on bigger projects with multiple modules and per-environment variables.
Also, a tip from a person who works on a team using Terraform: use brew/apt repositories to keep the binaries up to date, or at least on the same version as your teammates. I remember at least a few situations where a patch update of the Terraform binary was crucial to making some issues disappear.
Psst... I am also using Terraform to bootstrap Pop!_OS with my dotfiles, and it is surprisingly good at mimicking a declarative and atomic NixOS-style configuration :)
[0]: https://terragrunt.gruntwork.io/
> Also, a tip from a person who works on a team using Terraform: use brew/apt repositories to keep the binaries up to date, or at least on the same version as your teammates.
Use tfenv instead. That way you can have a .terraform-version file in your repo and tfenv will refuse to run without the right version.
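For anyone who hasn't set this up: a minimal sketch, with an illustrative version number. tfenv picks the version from a plain .terraform-version file, and pinning required_version in the configuration gives teammates who don't use tfenv the same guard rail.

```
# .terraform-version (read by tfenv; version is just an example)
0.13.5
```

```hcl
# versions.tf - Terraform itself also refuses to run with a mismatched binary
terraform {
  required_version = "~> 0.13.5"
}
```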
For distributed work on environments, ideally you want something like a GitLab pipeline that does the terraform plan for you automatically, with a manual approve button of sorts. That way you have a consistent TF version defined in .gitlab-ci.yml, and thanks to centralized logging it's easier to find out when and what changed in the envs.
I implemented this on top of Bitbucket Pipelines with a 'manual' pipeline step. The pipeline runs the plan step automatically and outputs it to the console. Pipelines then shows a button next to it called 'deploy', which is only clicked after manually verifying the plan results.
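For reference, a hedged sketch of the GitLab variant described above (image tag, stage names and timings are placeholders, and the backend/credentials setup is omitted):

```yaml
stages: [plan, apply]

plan:
  stage: plan
  image: hashicorp/terraform:0.13.5   # one pinned version for the whole team
  script:
    - terraform init -input=false
    - terraform plan -input=false -out=plan.tfplan
  artifacts:
    paths: [plan.tfplan]

apply:
  stage: apply
  image: hashicorp/terraform:0.13.5
  script:
    - terraform init -input=false
    - terraform apply -input=false plan.tfplan
  dependencies: [plan]
  when: manual   # the "approve button": apply only runs when someone clicks it
```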
> or at least on the same version as your teammates. I remember at least a few situations where a patch update of the Terraform binary was crucial to making some issues disappear.
I find it a little evil that Terraform is used so widely in production yet still hangs on to the "0.x.y" semantic version meaning: we break the API whenever we want. It should be more stable for a tool so widely used. C'mon HashiCorp, be fair to us and release a 1.0!
The arguments around the language and syntax inconsistencies here really echo my experience.
I can't help but think it just wasn't designed, but patched together by different people over time, with no over-arching vision. There's nothing in Terraform that couldn't just be YAML with some semantics (like Ansible which I find much more approachable), and while a custom language can absolutely do better than that, it really needs to justify itself, and I'm not sure Terraform does yet.
There are some hints of why it should be a custom language, but I honestly think it needs an overhaul focusing on consistency and clarity. These things really make a difference; I'd say it's one of the main reasons Go took off in such a big way, for example.
I have been using more and more of https://www.pulumi.com/ for my deployments. It’s nice that the deployment code is written in js/python/go/.net so I can make it as customizable as I want.
I'm seriously considering switching to Pulumi, mostly because it seems a lot more flexible. Not to mention that developers are already familiar with the language, rather than having to learn all the quirks and inconsistencies of HCL on top of using the tool itself.
From my ~1 year of experience with Terraform, the simplicity of HCL leads to some pretty complicated config. In particular the absence of a true if, the inability to define custom functions, and the inability to use any kind of condition or iteration on modules.
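For anyone who hasn't run into this: the usual workaround is a ternary driving count, which works per resource but (until 0.13 added count/for_each on modules) did not extend to modules. A rough sketch, with hypothetical names:

```hcl
variable "create_bastion" {
  type    = bool
  default = false
}

variable "bastion_ami" {
  type    = string
  default = "ami-00000000"   # placeholder
}

# "if" emulated via count: zero or one copies of the resource
resource "aws_instance" "bastion" {
  count         = var.create_bastion ? 1 : 0
  ami           = var.bastion_ami
  instance_type = "t3.micro"
}

# every reference then needs indexing plus a guard for the empty case
output "bastion_ip" {
  value = var.create_bastion ? aws_instance.bastion[0].public_ip : null
}
```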
I keep seeing this pop up in search results, and I'm curious to try it in anger.
I've recently switched roles, and have gone from sharing a pretty large terraform deployment on AWS/GCP to managing infrastructure written with cloudformation and troposphere.
(Troposphere is a Python library that generates CloudFormation templates. Nice to have a real programming language, but still pretty constrained by the fact that the output is "just" CloudFormation.)
Of the two I have had less pain with terraform, but both have caused their days of pain in different ways.
If you've been using TF for any amount of time though, 0.13 is a breaking change (because semver... oh wait...). The migration from 0.11 to 0.12 was a PITA, and if you're using any third-party modules or resources, they still don't have consistent support for 0.13. Using 0.12 currently gives you the most consistent experience.
This is not bashing TF, it's just reality. I wish I had time to update everything to 0.13. The third party provider support alone makes it valuable.
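For context, the bulk of a 0.13 upgrade is declaring explicit provider source addresses; third-party providers won't resolve from the registry without something like this (versions are illustrative):

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
    # third-party providers now need an explicit source address too
    github = {
      source  = "integrations/github"
      version = "~> 3.0"
    }
  }
}
```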
I read you - I'm currently still going through the 0.13 upgrade. The pain, however, is considerably smaller than the 0.12 upgrade, and I've already observed two 0.12 bugs disappearing on 0.13. The bugfix backporting seems sluggish and it doesn't encourage me at all to keep using 0.12, despite it being "the most consistent experience".
If your only comparison is "I used to work with Heroku", Terraform might not seem great. I would argue this is a Blub example, in that this person's lens does not contain enough context or experience to be able to assess its value.
Terraform's value becomes clear in scenarios like moving from CloudFormation to Terraform, or when trying to integrate a second cloud's resources into your infrastructure workflow. Without experience in anything other than Heroku, of course a tool designed to do many, many things is going to seem complex and at times frustrating.
> Iteration times are way longer than with even mobile apps. Like, “you’re liable to task-switch while waiting to see plan output” longer.
I'm over here laughing in my "over an hour of CloudFormation only to have a run fail, and then a rollback fail, and have to contact AWS support" life.
> I'm over here laughing in my "over an hour of CloudFormation only to have a run fail, and then a rollback fail, and have to contact AWS support" life.
Oh, I see I am not alone in having this experience with the AWS CloudFormation (CF) update flow. It took me about 6-8 months to feel "good enough" with CF to write templates from memory (accompanied by a documentation page). The funny thing about CF is that everybody I know had a working flow based on searching GitHub/Gist for an example, then edit, refine, deploy, push fixes, deploy, rinse and repeat.
I noticed the blog post mentioned Pulumi. To be honest, I have not used it yet, but I have tried AWS CDK [0]. I suppose it is AWS's go-to solution for everybody who wants to keep using CloudFormation for managing deployment state while also writing templates programmatically in a language of their choice (like TypeScript or Python). It is an interesting solution that I suggest investigating if you haven't already. It supports "importing" CF templates as-is, so they can be incrementally translated to CDK.
I came to Terraform optimistically from CloudFormation, thinking it would fix many of the latter's warts, and it sort of has, except it's introduced as many of its own. I'm still undecided about which set of problems I prefer, but in general I'm disappointed with Terraform. Some particular things that bother me:

I get the distinct impression that it's trying to badly reinvent a programming language (with "locals" replacing variables, "variables" replacing parameters, "modules" replacing functions, "for_each" replacing loops or comprehensions, and so on).

Additionally, I find myself wanting rollback support like CloudFormation has. It's unsettling that TF makes it easy to get into a bad state.

Further still (and maybe this is just my organization's use of Terraform), the convention seems to be to split the whole architecture up into lots of root modules, but the links between resources in these modules are basically string identifiers (e.g., ARNs in the AWS world), which will likely change if a resource gets deleted and recreated, or if AWS changes its naming conventions, and so on. Similarly, people seem to build these identifiers from strings instead of referencing them directly from resource attributes (I've seen this practice advertised in some of the AWS provider docs, IIRC), which is bad for all of the same reasons that pointer arithmetic is bad.

I do like that custom providers aren't full-on Lambdas that I have to deploy, unlike in the CloudFormation world, but mostly I've been disappointed.
I wonder if Pulumi or AWS CDK is the solution I’ve been searching for, or if I should just stick to generating CloudFormation from YAML.
> I'm over here laughing in my "over an hour of CloudFormation only to have a run fail, and then a rollback fail, and have to contact AWS support" life.
Heh, same. Any of the cloud-provided orchestration tools (Cloudformation, Openstack Heat etc.) are great only for the most basic tasks; using them to provision complex infrastructure is just begging for a world of hurt.
That being said, I think Terraform could do better. I use Terraform a lot, and yet I agree with the author's complaints that the syntax can be super confusing, it is not documented very well, and the providers have their own idiosyncrasies. The last one isn't strictly an issue with Terraform, it's with the provider implementations. But if Terraform aims to be the Swiss army knife of infrastructure provisioning, I think the criticism that it's hard to standardize even within the same provider is fair.
> Heh, same. Any of the cloud-provided orchestration tools (Cloudformation, Openstack Heat etc.) are great only for the most basic tasks; using them to provision complex infrastructure is just begging for a world of hurt.
I disagree. I can't speak to OpenStack Heat (and I have no idea what you're referring to by 'cloud-provided orchestration tools' beyond these two specifically), but my own experience using CloudFormation to provision complex infrastructure is that it is in fact great for all but the most basic tasks (where any orchestration tool would just add unnecessary overhead).
I imagine other clouds have similar tools (or perhaps they have converged towards more general ones like Ansible or Terraform).
As for your disagreement: you're free to have your own opinion on this. But my personal experience has been that eventually converging infrastructure provisioning systems like CF have complex failure modes that make it hard to modularize and scale them up.
With Terraform, the cloud provider is reduced to a dumb API, and most of the issues you see can be resolved client-side. Whereas when dealing with issues in CloudFormation, it's not something you can figure out yourself; you have to hope that the error is something that's clearly displayed to you and/or open a support ticket with AWS.
> my own experience using CloudFormation to provision complex infrastructure
If we're talking anecdotes, my experience is vastly different. It's difficult to assess what's going to be a replacement (other than browsing the docs); replacements cause a chain of other replacements; a leaf replacement fails due to some syntax in an SSM doc that wasn't checked at the very beginning, wasting 45 minutes, so everything gets rolled back, but you used Retain policies, so those ASGs are now unmanaged and still live, so you need to delete them manually.
Building something complex means breaking stacks down into multiple pieces, and you can only use URLs. Which can only point to S3. Which means you have to pre-upload your sources there. For which you have to build the tooling, because AWS provides nothing. So not only do you have to build your infrastructure: you need to build the infrastructure to build and develop your infrastructure.
You want to know why a nested stack is going to replace that ASG you thought was safe? Well, you can always dive into the nested stack changeset... aaand nothing there. You can't. Maybe in the parent stack JSON.
Complexity without loops? Good luck. Or lots of copy pastes, I guess. And the Conditions are just rudimentary and clunky.
The Resources/Events UI is just meh, with no sensible sizing for the otherwise huge columns (big names, big ARNs). It's impossible to get the sorting of incoming events right; every refresh reshuffles the rows.
And CFN L1 support is hit-or-miss: 50% of the time it's simply not useful because of the complexity of the infrastructure; we just get the problem echoed back to us. I'm lucky we have Enterprise and can escalate. There are issues we wouldn't have solved for weeks otherwise.
I very much like the fact that we have a state management tool. But calling it great is an overstatement.
> Difficult to assess what's going to be a replacement (other than browsing the doc)
Do you know of any better solution than docs? Or better implementations of docs? I struggle to imagine how they could handle this better - but I've been using CF for so long that my imagination is limited.
For a start, I'd want to see, directly from the changeset window:
- the nested stack elements that are going to change, not only the parent ones
- the logical members/fields that prompt the change
All this info they have, and it should be trivial to show, which tells me they either don't use their own tooling (no dogfooding), or they have internal, better diagnostic tools not exposed to customers.
> create-change-set – Change sets for nested stacks is not enabled by default for the AWS CLI. To create a change set for the entire stack hierarchy, specify the --include-nested-stacks parameter. For more information, see To create a change set (AWS CLI).
> Any of the cloud-provided orchestration tools (Cloudformation, Openstack Heat etc.) are great only for the most basic tasks; using them to provision complex infrastructure is just begging for a world of hurt.
It's also fun when you have to standardize something that was either created by hand in the web UI (and has a bunch of hidden default values set) or was created by some other orchestration tool, and then you have to import/recreate it in Terraform. It's usually a PITA, but you'll learn a lot!
Yes! It's great to capture a bespoke configuration in code. You may already know this, but there is a tool called Terraformer that has limited support for doing just this: https://github.com/GoogleCloudPlatform/terraformer
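For anyone who hasn't done a manual import: the stock workflow is to write a stub resource first and then attach the live object to it, roughly like this (names are made up):

```hcl
# main.tf - stub for a bucket that already exists in the account
resource "aws_s3_bucket" "legacy" {
  bucket = "my-legacy-bucket"
}
```

```
terraform import aws_s3_bucket.legacy my-legacy-bucket
terraform plan   # the diff now reveals the hidden defaults you still need to codify
```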
> I'm over here laughing in my "over an hour of CloudFormation only to have a run fail, and then a rollback fail, and have to contact AWS support" life.
Your experience must be over four years out of date, because CloudFormation has supported continuing the rollback of a failed update since February 2016 [1].
Their experience is accurate. If you use CloudFormation for anything non-trivial, you'll be rudely corrected from thinking that this works, and then spend time migrating to Terraform so you never go through that again; plus you'll get support for most new AWS features faster. I spent 2019 and early 2020 helping people make that switch after they got burned by CF "impossible" situations.
Yes, it is true. However, I am hesitant to say AWS CloudFormation is a bulletproof tool that never needs help from the support side. If somebody is too impatient and manually removes a resource that is managed by CFN, they might end up in a deadlock where the only help is actually a support team member. Been there; don't repeat that error - please do not create (non-auto-detectable) drift changes on purpose (it can make the stack hard to modify/remove in the future).
> If somebody is too impatient and manually removes a resource that is managed by CFN, they might end up in a deadlock where the only help is actually a support team member.
This specific situation can be resolved by doing a 'continue update rollback' and skipping the already-deleted resources - see the troubleshooting guide [1] for more detailed info.
Needing to contact support for this issue in 2019 only means that the user didn't know how to use the 'continue update rollback' feature properly, a feature added back in 2016 specifically to support rollback-failed scenarios.
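For reference, the CLI form looks roughly like this (stack name and logical IDs are made up):

```
aws cloudformation continue-update-rollback \
  --stack-name my-stack \
  --resources-to-skip MyDeletedInstance AnotherDeadResource
```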
It's funny, then, how frequently the AWS support reps agree that the issues we find aren't ones that would have been resolvable on our own - each time we file a ticket, we follow up with our reps to ask for the appropriate training and support.
I understand that you like this tool, but you do not need to fight everyone suggesting it doesn't work as well as you want it to.
I’ve also run into these from time to time, at an old gig where we didn’t have a support contract, and thus we just had borked stacks here and there and prayed that we would never similarly bork prod. On the other hand, we worked hard to get our CloudFormation templates to a state that we could rebuild prod from scratch fairly easily if it came down to it.
Terraform died for me not because it didn't work well to set up one set of resources, but because when I went back to update things much later, a backwards-incompatible new version of TF totally broke my state file, and the third-party providers I was using had not caught up to the new version of TF yet. Short of starting all over again, I was dead in the water.
As much as I really liked how quickly I was originally able to get my infra up and running with TF, if I'm ever in need of similar functionality again, I'll find something else to use.
It is in v0, breaking changes are expected. If the new version is not compatible, wait until it is and then upgrade. It is also completely reasonable to not want to use it until it is stable.
> Terraform 0.14 release. That's our last chance to deprecate features or introduce breaking changes, before 1.0. Will we have 0.15? That's anyone's guess at this point. I honestly don't know.
Calling software that's been out for 6+ years, powering everyone's IaC from startups to multi-billion-dollar companies around the world, "initial development" to fit into SemVer is a stretch for me.
> When it doesn’t, it generally fails in a useful way, and then I can fix it and try again.
This is the "killer feature" for me when comparing Terraform and CloudFormation. I know they've been working on making CF better, but the way Terraform handles errors (and allows you to pick up from where you left off, instead of waiting around for stuff to roll back) is a lot better suited to tight experimentation and feedback loops.
I think this feature lends itself well to tight feedback loops, as you said. However, after I crossed the painful divide from CloudFormation novice to... well, whatever is after that, CloudFormation rollbacks became a friend instead of an enemy. The ability to return to the last known good checkpoint is a powerful feature that is extremely useful in production systems and the pipelines that deploy them. In those scenarios, you're not experimenting and learning, but enhancing and evolving existing infrastructure.
Some secrets (like IAM access keys) can be encrypted with a public PGP key.
Some other secrets (like RDS instance master passwords) can't be encrypted, but I like to use a trick where a local provisioner runs after the instance gets created, updates the just-created DB to set the password to a random value, and prints it to stdout so you can save it in your secrets management tool of choice. The value saved in the Terraform state is then no longer valid.
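A minimal sketch of that trick, assuming a Postgres instance and psql available wherever Terraform runs; resource names and the reset command are illustrative, not a drop-in:

```hcl
resource "aws_db_instance" "main" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"
  password          = "temporary-password"   # ends up in state, but is invalidated below

  # runs once, right after creation: swap the throwaway password for a random
  # one that never touches the state file, and print it a single time to stdout
  provisioner "local-exec" {
    command = <<-EOT
      NEWPW=$(openssl rand -base64 24)
      PGPASSWORD='temporary-password' psql -h ${self.address} -U app postgres \
        -c "ALTER USER app WITH PASSWORD '$NEWPW';"
      echo "new master password: $NEWPW"
    EOT
  }
}
```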
Some other other secrets, like elasticache auth tokens - there's no solution, they'll always be available as plaintext in the statefile.
Overall agreed that Terraform needs to do better here.
On RDS, my typical pattern is to use Secrets Manager to auto-rotate the master password immediately. AWS has sample Lambdas for the various databases.
I have a Lambda that runs every day to cycle the RDS master password. I create the password using the random provider and save it to a Secrets Manager secret - the first time the environment is created, the secret is in plaintext in the state, but it will not be valid the next time the Lambda runs (less than 24 hours later).
You can do the same with ElastiCache auth tokens, and ensure your application reads the token from a value in secrets manager.
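Roughly, the Terraform side of that pattern looks like the following; the rotation Lambda itself and its scheduling are out of scope here, and all names are illustrative:

```hcl
# generated by Terraform, so it sits in the state only until the first rotation
resource "random_password" "rds_master" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "rds_master" {
  name = "app/rds/master"   # both the application and the rotation Lambda read this
}

resource "aws_secretsmanager_secret_version" "rds_master" {
  secret_id     = aws_secretsmanager_secret.rds_master.id
  secret_string = random_password.rds_master.result
}

resource "aws_db_instance" "app" {
  identifier        = "app-db"
  engine            = "mysql"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"
  password          = random_password.rds_master.result
}
```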
No, once you've logged in with MySQL, changing the password doesn't close the connection.
For rotating application passwords we use the same technique but we update the usernames, i.e. app_1@'%' becomes app_2@'%', and then rotates back to app_1@'%' to prevent issues with unsynced config files.
I am using Terraform daily, and it's excruciating at times. First, HCL is very limited - HashiCorp didn't even use it for Sentinel! I won't switch to TypeScript either - I wish they had adopted an embedded language like Starlark, as in this POC [0].
There are many inconsistencies; refactoring is very, very painful. While `terraform import` and `terraform state mv` work, it's all manual, and the former doesn't work with their SaaS, Terraform Cloud, as imports, unlike runs, are always local, but sensitive variables are only available in the cloud. As pointed out, secrets are in plain text - the GCP state even had my personal access token, which it's not supposed to, as it's not something that has to be shared with the team.
The good thing is that the velocity is greatly improving - we got v0.13 this year, and v0.14 is almost out - many features of v0.15 are already done, so that pace compared to v0.12 is new!
My biggest issue is how much copypasta you need just for reusing modules! There's no clean way to do dependency injection, and you have to replicate code all over the place, as you can't "export" resources or data sources - you can only export primitive values, and then you need a data source which uses that primary key of the resource.
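Concretely, the pattern being complained about looks something like this (module layout and names are hypothetical): the module can only hand back an ID, and every consumer has to rehydrate the object with its own data source:

```hcl
# modules/network/main.tf
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

# modules/network/outputs.tf - only primitive values cross the module boundary
output "vpc_id" {
  value = aws_vpc.this.id
}

# root configuration - look the object up all over again
module "network" {
  source = "./modules/network"
}

data "aws_vpc" "shared" {
  id = module.network.vpc_id
}

resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.shared.id
  cidr_block = cidrsubnet(data.aws_vpc.shared.cidr_block, 8, 1)
}
```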
Terraform could turn into a huge, missed opportunity. I hope they realized the potential and that they need to execute fast.
Also, it's very maddening that their prioritization is based on the number of thumbs-ups!
Last but not least - many key providers are run by people who don't follow standards and who work part-time on key projects, instead of HashiCorp making sure there's a great DX with major vendors. One such example is the GitHub provider. A recent huge fiasco was a backward-incompatible minor release which also didn't update the docs. After three weeks, the maintainer was still considering unreleasing it, when people had already found workarounds and reverse-engineered the issues and the missing documentation. The same provider still does not support organization-level secrets even though this feature has been available for more than 6 months!
"Resource and module object values: An entire resource or module can now be treated as an object value within expressions, including passing them through input variables and output values to other modules, using an attribute-less reference syntax, like `aws_instance.foo`"
There are definitely some oddities in HCL. Regarding the dynamic blocks, it gets weirder. If you have a dynamic block inside of a resource that is using for_each, you can still reference the 'each' value alongside the value that the 'dynamic' block is looking at.
The original for_each is referencing its object as 'each', while the for_each inside the dynamic is referencing the iteration as 'foo'. Why not just name the for_each inside the dynamic block as something else?
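A sketch of what's being described, using security group rules as a made-up example; the resource-level for_each keeps the fixed name "each", while the dynamic block's iterator defaults to the block name (the iterator argument exists precisely to rename it):

```hcl
variable "groups" {
  type = map(object({
    ingress_ports = list(number)
  }))
}

resource "aws_security_group" "this" {
  for_each = var.groups                    # resource-level iteration: "each"
  name     = each.key

  dynamic "ingress" {
    for_each = each.value.ingress_ports    # block-level iteration: "ingress" by default
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
```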
With that said, I absolutely love Terraform, and it is IMO the most powerful tool I've had the pleasure to work with. I also am a fan of DSLs if they are thoughtfully done. I think with some improvements, like some of the ones the author suggests, HCL could be a great language to work with.
For me the biggest disappointment is that it won't import an existing AWS stack into config files. The idea of declarative infrastructure seems amazing, but this implementation hasn't overwhelmed me.
A very good summary; this resonates well with my views. Terraform survived because devops folks mostly come from a non-programming background and have low expectations of a DSL. When developers look at TF, there are obvious things that glare.
I would add to the list the fact that for_each cannot be used with pre-computed values.
So if you have a module/resource that outputs a list of resources, you can't just plug it into another resource and use for_each to iterate over it.
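The usual failure mode, sketched with hypothetical names: keys that aren't known until apply make Terraform refuse the plan, so the common workaround is to key the for_each on statically known values and keep the computed values on the right-hand side only:

```hcl
# Fails at plan time when the IDs are not yet known:
#   for_each = toset(module.network.subnet_ids)
# ("The for_each value depends on resource attributes that cannot be
#  determined until apply ...")

# Workaround: iterate over statically known keys instead.
variable "az_suffixes" {
  type    = set(string)
  default = ["a", "b", "c"]
}

resource "aws_eip" "nat" {
  for_each = var.az_suffixes
  vpc      = true
  tags = {
    Name = "nat-${each.key}"
  }
}
```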
I also think that using multiple state files is the only way to keep the config from being too fragile. Which is another point in which Terraform provides very little help and why Terragrunt is a popular combo.
Terraform certainly has its fair share of quirks, but that's no different from any other language.
However, I think it's fair to say that the infrastructure as code ecosystem is still much younger and therefore less mature. And there are various things that are standard in application development toolchains that don't exist for IaC yet.
Take my focus, use-case-specific frameworks, for example. If I'm building a web application, I don't write my own request routing or authentication. But for Terraform, the majority of teams have to start from scratch, even for common use-cases. Yes, there are reusable modules, but they're comparable to libraries, and integrating various modules is still a lot of effort.
If my use-case is building a Jamstack website, doing so using Gatsby gets me started faster, gives me a modern developer experience and means I can re-use community tested and maintained components to reduce the bespoke code I have to write and maintain.
I'm trying to do the exact same thing for IaC with Kubestack.
Kubestack is an open-source Terraform framework for managed Kubernetes. If your use-case is provisioning and maintaining EKS, AKS or GKE using Terraform, Kubestack may be worth trying. In my obviously creator-biased opinion.
It helps you with the typical framework-like workflow to get started faster: scaffold a repository with one command, then bring up a local development environment with the next.
Yes, the long feedback cycles of IaC can be annoying. That's why I'm trying to improve the developer experience by providing an auto updating local development environment.
For all modules (EKS, AKS, GKE) I maintain as part of the Kubestack framework, I also maintain a local variant. These accept the same input variables, but instead of provisioning cloud resources, they provision "mock" clusters locally using Docker containers as the cluster nodes.
The kbst CLI watches for changes in the repository, and then runs Terraform locally and dynamically replaces the module source of the real cluster module with the local variant. Here's a video showing that in action: https://youtu.be/_VtakP6AdCs
Similarly, I provide a Docker image for each framework release to provide a tested combination of versions of Terraform, its providers and the cloud CLIs (aws, gcloud, az). The images are used to bootstrap, for CI/CD runs, and for the occasionally required manual tasks (state mv, etc.) or disaster recovery.
Just to name two examples from the discussion here where the developer experience of Terraform lags behind the equivalent application development tooling.
Many people underestimate IaC, probably because of all the 5 minute Terraform tutorials and demos out there. But what these miss is that the real work only starts when you have to get your automation ready for day-2 operations.
This is where I'm trying to provide a better developer experience through reusable, use-case-specific modules, inheritance-based environment configuration to avoid drift, and integration into a convenient but reliable GitOps workflow for teams, from local development all the way to production.
Kubestack's code will have been open source for two years this December. But I only got around to writing documentation after leaving the DevOps consulting job that inspired the framework.
Anyone interested can give it a try: https://www.kubestack.com/ The site has links to the Slack channel (#kubestack on the Kubernetes Slack) and also the source on Github.