A post about DevOps, ChatOps, ChatBots and how we used them to create a fully automated deployment experience for a Fortune 500 company.
For two years, we have accompanied one of our customers on the journey to move applications from "classic" infrastructure into the cloud (Amazon Web Services). The goal was not only to reduce cost and to speed up deployments but also to use the tools available in AWS to create an automated deployment pipeline.
Everything should run seamlessly, from committing code to running tests, building artifacts and deploying. We wanted to reduce manual interaction as much as possible, both to free up precious time for the employees and to reduce human error.
The applications that we had to move to the cloud are pretty complex. At the time of writing this, we have moved five major enterprise applications in cooperation with the customer's internal DevOps team. More are sure to come.
A big challenge with the internal team, and with the separate teams attached to each enterprise application, is that they are all remote teams. There is no place where you can go to meet all of them at the same time. They are spread out all over Europe and are managed by the internal DevOps team here in Austria.
The first thing we did was to conduct interviews with everyone involved. We wanted to gain perspective on the needs, wishes and fears that employees had about the move to the cloud. You can imagine that not everyone was a fan right from the start.
Questions we asked looked something like this:
The answers gave us insight into the current process and needs:
Our main takeaway from these interviews was that we wanted to give people the power to deploy as often as they needed to. To limit the number of tools people needed access to, we decided not to grant direct access to any new tools. Everything should be managed through a chat tool. We also needed to improve the visibility of the work being done across the whole company and make information available instantly.
ChatOps is a collaboration model that connects people, tools, processes and automation into a transparent workflow. It is all about conversations, across your whole company, happening inside a chat tool. It was exactly what we needed to solve most of the customer's communication and permission problems.
Every employee already had some experience with chat. It is no different from sending SMS or WhatsApp messages. In the enterprise, you may not be using WhatsApp or Facebook Messenger, but Skype or Jabber are common in inter-office communication. People are already using chat tools to talk to each other in the office.
But does your IT infrastructure also chat with you?
We are big fans of the chat tool Slack and use it in our everyday conversations. We have also integrated our IT infrastructure with it.
This screenshot shows a typical chat conversation here at Infralovers:
What you can see here is that we have created chat rooms (channels) for each topic or project in our company, for example marketing or trainings. To keep our customers safe, I have blacked out the channels that might give away their names.
You can also see my co-worker Theresa making a direct reference to me (@jaybrueder) and giving me some information in the marketing channel.
However, this is not the most exciting thing you can see here. Two external tools (Trello, Mailchimp) are also leaving messages for us, informing us of new signups to our company newsletter or of progress on some tasks.
The Trello message you can see right at the top is even interactive. You can click buttons to interact with the Trello service right from your chat window.
Human to human interaction is driven by communication. However, human to machine interaction used to be driven by buttons. And it is still the same in our Slack channel. I can interact with the external services, but only by using buttons. A direct message towards the Trello user would not yield any results.
For our customer's project, we wanted to bridge that gap. We wanted the teams to be able to really talk to their IT services and infrastructure.
The above image was created by Nordicapis in their post 12 Frameworks to build chatops bots
Just because we are using Slack does not mean that our customer wants to use it too. You might also want to look at other ChatOps tools. Here is a list of possible candidates:
Keep in mind that Slack and HipChat are hosted on their creators' servers. Content and traffic are encrypted. Still, never share confidential data through one of these tools.
If you cannot see yourself using a tool that is hosted by someone else, you might like Mattermost. It is basically a self-hosted, open-source alternative to Slack.
The good thing about Slack and Mattermost is that external tools often already have integrations with them out-of-the-box. This saves a lot of time!
Now that we had settled on the ChatOps tool, we needed a way to make IT infrastructure understand our plain-text demands. This is where ChatBots come in.
A ChatBot is a piece of software that can read your message, understand it and answer. Using artificial intelligence, it should mimic a conversation with a person.
Not only is "ChatBot" the buzzword of the year, we are also convinced that it will play a major role in IT automation.
ChatOps was a big part of solving our customer's problem. We created our own bot that joins the conversation. This bot can interact with the deployment pipeline and AWS services. Employees can interact with the bot through normal chat conversations. However, they are not able to interact with the deployment pipeline or AWS services directly.
We created our own bot based on the Lita bot framework. We extended its functionality heavily by writing custom plugins and extensions in Ruby, the programming language Lita is written in. We are big fans of Ruby here at Infralovers and we felt that Lita was the easiest for us to customize.
You might want to check out a bot framework that uses your favorite programming language: Hubot, Cog, Errbot.
The bot behaves like any other user in Slack. You can send it messages and it will respond. The focus of our custom bot is to trigger deployments with a tool called Jenkins. Feel free to learn more about Jenkins here: [Jenkins link]
I do not want to go into too much detail. You may want to use a different tool (like TravisCI or CodeShip). For the rest of this post I will refer to it as the "Deploy System". However, there will be a follow-up post on this in the future.
The process starts with an employee sending a specific message to the ChatBot. The bot understands the message and triggers a new deployment on the Deploy System. This system knows all the necessary steps to start a new deployment. The bot itself is oblivious to the exact process. It only knows what information to pass on and what system to trigger.
No employee of our customer can access the Deploy System directly. The bot acts as a proxy in between. If there are any problems with the deployment, the Deploy System will tell the bot. The bot will then tell the user. This way you also have all error messages and warnings that might come up inside your chat tool.
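The proxy idea can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: it does not use the Lita API, and `DeployBot`, `DeployRequest` and the injected `deploy_system` callable are hypothetical names standing in for our bot and the Deploy System.

```ruby
# Hypothetical sketch: the bot as a proxy in front of the Deploy System.
# None of these names come from Lita or Jenkins; they are illustrative only.

DeployRequest = Struct.new(:app, :environment, :version)

class DeployBot
  # deploy_system: any object responding to #call(request). It returns
  # normally on success and raises a StandardError on failure.
  def initialize(deploy_system)
    @deploy_system = deploy_system
  end

  # Passes the request on and turns the outcome into a chat reply.
  # The bot knows nothing about the individual deployment steps.
  def handle(request)
    @deploy_system.call(request)
    "#{request.app}: deployed version #{request.version} to #{request.environment}"
  rescue StandardError => e
    "#{request.app}: deployment of #{request.version} to #{request.environment} failed: #{e.message}"
  end
end

# Stand-ins for the real Deploy System, used only for demonstration.
working_system = ->(_request) { :ok }
broken_system  = ->(_request) { raise "artifact not found" }

request = DeployRequest.new("cms", "staging", "1.0.2")
puts DeployBot.new(working_system).handle(request)
puts DeployBot.new(broken_system).handle(request)
```

Because the Deploy System is injected, the bot never needs to know how a deployment works, and every error it raises ends up as a message in the chat.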
To make the power of the bot more controllable, we created two separate bots. One bot we called "Marvin", after the paranoid android from The Hitchhiker's Guide to the Galaxy, and the second one is called "Bender", after the robot from the TV series Futurama.
Marvin is responsible for handling deployments in the "staging" environment. Here every developer can deploy a new version and test-drive it. No real customer data is being used and errors are no big deal. You can imagine that these deployments may fail often. Therefore, we chose Marvin as the name for this bot.
However, everything else is the same as when doing a real production deployment. Everyone is permitted to interact with Marvin. This enables all developers to do deployments on their own.
Here is a short example of what such an interaction with Marvin, deploying version 1.0.2 of the content management system (CMS), might look like:
@marvin deploy cms staging 1.0.2
Marvin will then answer with success or failure once the deployment is done:
Content Management System: deployed version 1.0.2 to staging in 6 min, 1 sec
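To give an idea of what "understanding" such a message involves, here is a minimal sketch in plain Ruby. The regular expression, the helper names and the assumption that versions always have three numeric parts are ours for illustration; this is not the actual plugin code.

```ruby
# Hypothetical sketch: parsing a deploy command and formatting the duration.
# The pattern and helper names are assumptions, not the real bot's code.

DEPLOY_PATTERN = /\A@(?<bot>\w+)\s+deploy\s+(?<app>\w+)\s+(?<environment>\w+)\s+(?<version>\d+\.\d+\.\d+)\z/

# Returns a hash with the named parts of the command, or nil if the
# message is not a deploy command.
def parse_deploy_command(message)
  match = DEPLOY_PATTERN.match(message.strip)
  return nil unless match

  match.named_captures.transform_keys(&:to_sym)
end

# Formats a duration given in seconds as "M min, S sec".
def format_duration(seconds)
  minutes, rest = seconds.divmod(60)
  "#{minutes} min, #{rest} sec"
end

command = parse_deploy_command("@marvin deploy cms staging 1.0.2")
puts command.inspect
puts "deployed version #{command[:version]} to #{command[:environment]} in #{format_duration(361)}"
```

Anything that does not match the pattern yields `nil`, so the bot can simply ignore ordinary conversation in the channel.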
If the staging deployment is all right, we can promote the deployment to production. This is the real deal. This is where the "heavy lifting" starts. These are the actual services that are being used by our customer's customers. Only specific employees are permitted to talk to Bender. The interaction happens the same way as with Marvin, through a simple chat message. Instead of addressing Marvin, we address Bender:
@bender deploy cms production 1.0.2

Content Management System: deployed version 1.0.2 to production in 5 min, 57 sec
All these permissions can be controlled by our customer. This also happens through bot commands. We can grant and revoke deployment rights on a per user basis. Only admin users are able to grant or revoke permissions.
The following command would allow me to deploy the CMS application with Bender:
@bender auth jbrueder cms
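Behind a command like this sits a simple rule: rights are stored per user and per application, and only admins may change them. A minimal sketch of that rule in plain Ruby follows. The class, its method names and the `admin_user` account are hypothetical, and how the real bot stores its permissions is not shown here.

```ruby
require "set"

# Hypothetical sketch of per-user, per-application deployment rights.
# Class and method names are illustrative only.
class DeployPermissions
  def initialize(admins)
    @admins = Set.new(admins)
    @grants = Hash.new { |hash, app| hash[app] = Set.new }
  end

  # Only admins may grant rights; returns true if the grant was applied.
  def grant(admin, user, app)
    return false unless @admins.include?(admin)

    @grants[app].add(user)
    true
  end

  # Only admins may revoke rights; returns true if the revoke was applied.
  def revoke(admin, user, app)
    return false unless @admins.include?(admin)

    @grants[app].delete(user)
    true
  end

  # True if the user currently holds deployment rights for the application.
  def allowed?(user, app)
    @grants.key?(app) && @grants[app].include?(user)
  end
end

permissions = DeployPermissions.new(["admin_user"])
permissions.grant("admin_user", "jbrueder", "cms")
puts permissions.allowed?("jbrueder", "cms")
permissions.revoke("admin_user", "jbrueder", "cms")
puts permissions.allowed?("jbrueder", "cms")
```

A bot would call `allowed?` before passing any deploy request on, so an unauthorized message never reaches the Deploy System.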
Additionally, external services might be allowed to post messages into Slack. For example, a build system might post a message that addresses Marvin to trigger the staging deployment. This truly integrates continuous build and continuous deployment. It still all happens inside the Slack chat, but this time no manual interaction is involved at all and everyone can still see what is happening.
One of the most important things that we achieved here was the empowerment of the developers. They feel way more in control of their work. They can deploy at any time in a couple of minutes and see the result on the same infrastructure that the production deployment would be on.
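As a rough sketch, the message such a build system posts could be assembled like this. We assume here that the chat tool accepts a JSON body with a `text` field, as Slack's incoming webhooks do. The helper name and version 1.0.3 are made up for illustration, and whether the bot reacts to messages from other integrations, and how mentions must be encoded, depends on the chat tool and the bot configuration.

```ruby
require "json"

# Hypothetical sketch: the JSON body a build system could post into the chat
# after a successful build. The helper name and bot name are assumptions.
def staging_deploy_payload(app, version, bot: "marvin")
  JSON.generate({ "text" => "@#{bot} deploy #{app} staging #{version}" })
end

# Sending the payload over HTTP is left out on purpose; this only builds it.
puts staging_deploy_payload("cms", "1.0.3")
```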
We also removed unnecessary human interaction from the deployments. All that is needed now is a simple chat message. No more logging into the Deploy System, looking for the deployment you want to trigger, entering version numbers or target environments. It is so much simpler now.
Quality assurance is still a big thing. We have a staging environment that mimics production in every detail. However, it is way easier to do quality assurance and testing now.
Whenever a deployment in production succeeds, we also send a success message into a management-only channel. This way management sees work happening. They were very impressed when they saw that employees deploy multiple times a day and that deployments now take mere minutes.
Another benefit is that there is now a common place for information. All deployments that are happening are documented inside the ChatOps tool.
We are very excited about Amazon Alexa, Google Assistant and Apple Siri. It would be awesome to take the bot to the next level and support not only text interaction but also speech.
We are also working on a prototype where you can use a smartwatch for two-factor authentication when talking to a bot.
Maybe ChatOps and bots are something that you want to implement in your company as well. We would be happy to hear from you and help you implement them.
I am also happy to answer any questions that you might have, either in the comments, via Twitter (@jaybrueder) or by email at jbrueder@infralovers.com.