Developer machine automation: Dependencies
As I mentioned in a post last week we’ve been automating the setup of our developer machines with puppet over the last week and one thing that we’ve learnt is that you need to be careful about how you define dependencies.
The aim is to get your scripts to the point where the outcome is reasonably deterministic so that we can have confidence they’re going to work the next we run them.
We noticed two ways in which we haven’t quite achieved determinism yet:
Accidental Dependencies
The first few times that we ran the scripts on top of a vanilla image we were doing it on a virtual machine which had VMware tools installed on it.
We’d forgotten that VMware tools had been installed on those VMs and ran into a problem with Oracle dependencies not being satisfied when we ran puppet on some machines which had CentOS installed directly (i.e. not on a virtual machine).
Those dependencies had been satisfied by our VMware tools installation on the VMs so we didn’t realise that we hadn’t explicitly stated those dependencies, something which we have done now.
External Dependencies
We couldn’t find the Firefox version that we wanted install on the default yum repositories so we created a puppet task which linked to a Firefox RPM on an external server and then installed it.
It worked originally but at some stage over the last couple of weeks the URI was changed as a minor version had been upgraded, breaking our script.
We also came across another way that external dependencies can fail today - if a corporate proxy blocks access to the URL!
We’re trying to get to the stage where we’re only relying on artifacts either coming from a yum repository or an internal repository where we can store any libraries which aren’t available through yum.
Don’t assume determinism
While trying to solve these dependency problems in our puppet scripts I made the mistake of assuming that if the script runs through once and works that it’s always going to be that way in the future.
Since we had achieved that previously in my mind it was impossible for it to fail in future which stopped me from properly investigating why it had stopped working.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.