AI for DevOps Engineers - Part 2: Building AI Applications with LangChain


In the first part of our blog series, we explored the different challenges of DevOps and how AI can address them. We also introduced the fundamentals of Generative AI and Large Language Models (LLMs) as transformative tools for DevOps engineers. In this second part, we’ll dive deeper into building our own AI applications using LangChain, a powerful framework designed to simplify and enhance the development of LLM-powered solutions.

Why Use Frameworks for AI Applications?

Building AI applications from scratch can be complex and time-consuming. LLM frameworks provide tools and interfaces that simplify the integration of Large Language Models into applications:

  1. Reduce Boilerplate Code: Frameworks can handle repetitive tasks like API calls, error handling, and data processing, which enables developers to focus on core logic.
  2. Simplify Development: Abstracted tools and interfaces make it easier to experiment, iterate, and build applications quickly.
  3. Improve Maintainability: Structured approaches to prompt engineering and model architecture make codebases easier to manage and scale.
  4. Enable Advanced Workflows: Support for techniques like few-shot learning, Retrieval-Augmented Generation (RAG), and chaining LLM interactions minimizes custom coding.
  5. Integrate with the Ecosystem: Simple connections to databases, vector stores, and other tools enable quick end-to-end solutions.

Introducing LangChain

LangChain is an open-source framework designed to simplify working with LLMs. It stands out for its support for agentic workflows, in which multiple LLM interactions are chained together so that tasks can be executed autonomously. This makes it ideal for creating applications with complex workflows, such as:

  • Retrieval-Augmented Generation (RAG)
  • Multi-step reasoning
  • Integration with external tools and APIs

Key Components of LangChain

LangChain’s modular design features several key components that work together to build cool AI applications:

  1. Document Loaders: With loaders we can load and process data from different sources like PDFs, websites, or databases, preparing it for use.
  2. Vector Stores: Embeddings are important for processing and comparing text data. Vector stores help manage and query these embeddings efficiently.
  3. Prompts: Manage and structure instructions or queries to ensure accurate and relevant model responses.
  4. LLMs: Of course, we can use any large language model (e.g., GPT-4, LLaMA), whether cloud-based or local.
  5. Chains: Link multiple LLM interactions to create complex workflows (e.g., retrieval, summarization, and Q&A); a short sketch follows this list.
  6. Agents: Complete tasks autonomously by making decisions, interacting with tools, and adapting their behavior along the way.
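
To make prompts, LLMs, and chains concrete before diving into full examples, here is a minimal sketch that combines all three (assuming a local Ollama model, as in the examples below; the prompt text is purely illustrative):

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama

# A prompt template structures the instruction sent to the model
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in one short paragraph.",
)

# A chain links the prompt to the model
chain = LLMChain(llm=Ollama(model="llama2"), prompt=prompt)
print(chain.run(topic="Kubernetes"))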

Building AI Applications with LangChain

Let’s now explore how LangChain can be used to build AI applications through some practical examples. Note that we're using the programming language Python as it is widely used in AI development and integrates well with LangChain; the examples assume the relevant packages (e.g., langchain and langchain-community) are installed via pip.

Example 1: Basic Chat Application

We start off easy with a simple chat application.

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import Ollama

# Initialize the LLM
llm = Ollama(model="llama2")

# Set up memory to store conversation history
memory = ConversationBufferMemory()

# Create a conversation chain
conversation = ConversationChain(llm=llm, memory=memory)

# Chat loop
while True:
    user_input = input("Input: ")
    response = conversation.run(user_input)
    print(f"AI: {response}")

Let’s break down the code:

  1. First of all, the necessary LangChain components have to be imported.
  2. Then, we initialize an LLM (local Ollama model) with the desired model and settings.
  3. A memory buffer is set up to store the conversation history so that the AI can refer back to previous interactions (a short sketch of this follows the list).
  4. Also, a conversation chain is created to manage the interaction between the user and the AI.
  5. Finally, a chat loop is established where the user can input messages, and the AI responds accordingly.
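
To see the memory buffer at work, we can run a couple of turns and inspect the stored transcript; a quick sketch reusing the objects from the example above:

# The buffer accumulates the full transcript, which the chain
# prepends to each new prompt so the model can refer back to it
conversation.run("My name is Alice.")
conversation.run("What is my name?")  # answered correctly thanks to the buffer
print(memory.buffer)                  # prints the recorded Human/AI turns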

See how LangChain simplifies the process of setting up a complete chat application with an LLM! Of course, this example is very basic as we're only chatting in the console and our model is a simple chatbot. But it shows how easy it is to get started with LangChain.

Example 2: Retrieval-Augmented Generation (RAG)

As we saw in the first part of this series, Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with language generation. LangChain can be used to implement RAG workflows:

from langchain.document_loaders import TextLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Load documents
loader = TextLoader("example.txt")
documents = loader.load()

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vector_store = Chroma.from_documents(documents, embeddings)

# Set up the retriever
retriever = vector_store.as_retriever()

# Initialize the LLM
llm = ChatOpenAI(model="gpt-3.5-turbo")

# Create a RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

# Ask a question
query = "What is the main topic of the document?"
response = qa_chain.run(query)
print(f"Answer: {response}")

We can see that there is a bit more going on in this example:

  1. Note that we need to import more components like document loaders, vector stores, and embeddings to handle the retrieval process.
  2. We load a document from a file and create embeddings for it using OpenAI's embeddings. LangChain supports various retrieval sources, including files (e.g., PDFs, CSVs), web content, and cloud providers. Embeddings are a way to represent text data as numerical vectors. These vectors capture the semantic meaning of the text, allowing for more effective processing and comparison by the model.
  3. Then, we create a vector store to manage all the embeddings and set up a retriever to retrieve relevant information based on the user query.
  4. Finally, we initialize an LLM and create a RetrievalQA chain to handle the interaction between the user query and the retrieved information.

As we can see, the overall structure of the code is similar to the chat application, but with additional components to handle the retrieval process.
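
One practical detail: longer documents are usually split into smaller chunks before embedding, so that each vector captures a focused piece of context. A minimal sketch using LangChain's text splitter (the chunk sizes are illustrative):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split documents into overlapping chunks before embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
vector_store = Chroma.from_documents(chunks, embeddings)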

Note that when working with cloud-based models like GPT-3.5, we need an API key to access the model! You can get one from the respective provider.
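
The OpenAI integrations used above read the key from the environment, so a typical setup looks like this (the key shown is a placeholder):

import os

# Set the API key before creating OpenAIEmbeddings or ChatOpenAI
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, use your own key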

Example 3: Chatting with PDF Content

Using LangChain and Streamlit, we can build interactive applications like chatting with PDF content.

Streamlit

Streamlit is an open-source Python framework designed for building interactive web applications quickly and easily. It is popular among data scientists and machine learning engineers because it allows us to create user-friendly interfaces for models and data workflows with minimal effort.

  • Simplicity: We can build web apps with just a few lines of Python code (see the minimal sketch after this list).
  • Real-time Interactivity: Supports widgets like sliders, text inputs, and file uploaders for dynamic user interactions.
  • Rapid Prototyping: We can see changes instantly as we modify the code.
  • Integration with AI and ML Tools: Streamlit works seamlessly with popular libraries like LangChain, TensorFlow, and PyTorch.
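
To illustrate how little code a Streamlit app needs, here is a minimal standalone sketch (the filename app.py is hypothetical):

import streamlit as st

st.title("Hello, Streamlit!")

# Widgets like text inputs rerun the script on every interaction
name = st.text_input("What's your name?")
if name:
    st.write(f"Nice to meet you, {name}!")

Running streamlit run app.py in a terminal starts the app and opens it in the browser.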

Combining Streamlit's UI elements with the LangChain components we already know, a PDF chat application can look like this:

import tempfile

import streamlit as st
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

# Streamlit UI
st.title("Chat with Your PDF")
uploaded_file = st.file_uploader("Upload a PDF", type="pdf")

if uploaded_file:
    # PyPDFLoader expects a file path, so persist the upload to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded_file.read())
        pdf_path = tmp.name

    # Load PDF content
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()

    # Create embeddings and vector store
    embeddings = OpenAIEmbeddings()
    vector_store = Chroma.from_documents(documents, embeddings)

    # Set up conversational retrieval chain
    retriever = vector_store.as_retriever()
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

    # Chat interface
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = []

    user_input = st.text_input("Ask a question about the PDF:")
    if user_input:
        # The chain expects the question together with the chat history so far
        result = conversation_chain(
            {"question": user_input, "chat_history": st.session_state.chat_history}
        )
        st.session_state.chat_history.append((user_input, result["answer"]))

    # Display chat history
    for user_msg, ai_msg in st.session_state.chat_history:
        st.write(f"You: {user_msg}")
        st.write(f"AI: {ai_msg}")

With this simple web-based Streamlit application, users can upload a PDF, ask questions about its content, and receive context-aware responses. The LangChain components handle the PDF processing, text extraction, and LLM interactions, while Streamlit provides the user-friendly interface. The structure is again similar to the previous examples, but now we have integrated it with Streamlit to create a more interactive experience.

  1. Our application starts with a Streamlit UI that allows users to upload a PDF file.
  2. If a file is uploaded, we load the PDF content using a PyPDFLoader.
  3. We create embeddings and a vector store to manage the text data from the PDF.
  4. A conversational retrieval chain is set up to handle the interaction between the user queries and the PDF content.
  5. The user can input questions about the PDF, and the AI responds accordingly. The chat history is stored in the session state and displayed to the user.

Ensuring Quality in AI Applications

Building AI apps is one thing, but ensuring their quality is another. AI can be unpredictable, so we have to make sure that the application is reliable, accurate, and performs as expected in real-world scenarios. For applications powered by Large Language Models (LLMs), testing and evaluation are critical to maintaining performance and identifying issues.

In this section, we’ll explore the importance of testing in LLM applications, different testing approaches, and how tools like LangSmith can help simplify this process.

Why Testing is Crucial in LLM Applications

Testing LLM applications aims to ensure:

  • Reliability: The application consistently performs as expected, even as models or prompts are updated.
  • Accuracy: The outputs generated by the model align with user expectations and application goals.
  • Performance: The application can handle real-world scenarios, including edge cases, without breaking or producing irrelevant results.
  • User Satisfaction: By validating outputs, testing ensures that the application meets the needs of its users.

LLMs are probabilistic models, meaning their outputs can vary depending on the input, prompt, or context. This variability makes testing even more critical to ensure consistent and high-quality results.
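
One practical lever for taming this variability during tests is the temperature parameter, which the LangChain model wrappers expose; a quick sketch:

from langchain.chat_models import ChatOpenAI

# temperature=0 makes outputs as deterministic as the model allows,
# which helps when comparing results against fixed expectations
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)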

Testing in LLM Applications: Approaches and Examples

Testing LLM applications involves several approaches, each tailored to specific use cases. Next, we’ll explore some common testing methods and examples.

Defining Evaluation Metrics

Evaluation metrics are important for assessing the performance of LLM applications. These metrics should align with the goals of the application or the company’s objectives. For example:

  • Accuracy: How well the model’s output matches the expected result (a simple sketch of this follows the list).
  • Relevance: Whether the output is contextually appropriate.
  • Creativity: For tasks like story generation, how original and engaging the output is.
  • Coherence: Whether the output is logically consistent and well-structured.
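
For deterministic tasks, the accuracy metric from the list above can be as simple as an exact-match rate; a minimal sketch (the helper name is our own):

def exact_match_accuracy(predictions, expected):
    """Fraction of predictions that exactly match the expected outputs."""
    matches = sum(p.strip() == e.strip() for p, e in zip(predictions, expected))
    return matches / len(expected)

# Example: 1 of 2 predictions matches, so accuracy is 0.5
print(exact_match_accuracy(["Bonjour", "Salut"], ["Bonjour", "Au revoir"]))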

Test Cases with Expected Outputs

One of the simplest ways to test an LLM application is by creating test cases with clear inputs and expected outputs. This approach is ideal for tasks with deterministic answers, such as question-answering or translation.

Here, we build a small English-to-French translation chain (the prompt is illustrative) and check its outputs against the expected translations:

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# A simple translation chain to test against
translation_prompt = PromptTemplate(
    input_variables=["text"],
    template="Translate the following English text to French. Reply with the translation only: {text}",
)
chain = LLMChain(llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0), prompt=translation_prompt)

test_cases = [
    {"input": "Hello, how are you?", "expected_output": "Bonjour, comment ça va?"},
    {"input": "Goodbye", "expected_output": "Au revoir"},
]

for test in test_cases:
    result = chain.run(text=test["input"])
    print(f"Input: {test['input']}")
    print(f"Expected: {test['expected_output']}")
    print(f"Actual: {result.strip()}")
    print(f"Pass: {result.strip() == test['expected_output']}\n")

Using LLMs as Judges

For tasks with subjective or complex outputs (e.g., story generation, summarization), exact matches between expected and actual outputs may not be feasible. In such cases, an LLM can act as a "judge" to evaluate the quality of its own or another model’s outputs.

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Define the prompt for generating a story
story_prompt_template = PromptTemplate(
    input_variables=["theme"],
    template="Write a short story based on the following theme: {theme}."
)

# Define the prompt for evaluating the story
evaluation_prompt_template = PromptTemplate(
    input_variables=["story"],
    template=(
        "You are a judge evaluating short stories. Please rate the following story on a scale from 1 to 10 "
        "for creativity, coherence, and grammar. Provide a brief explanation for each rating.\n\n"
        "Story: {story}\n\n"
        "Creativity: \nCoherence: \nGrammar: \nExplanation:"
    )
)

# Initialize the LLMs (text-davinci-003 has been retired; gpt-3.5-turbo-instruct is its successor)
story_generator = OpenAI(model="gpt-3.5-turbo-instruct")
story_evaluator = OpenAI(model="gpt-3.5-turbo-instruct")

# Create the LLMChains
story_chain = LLMChain(llm=story_generator, prompt=story_prompt_template)
evaluation_chain = LLMChain(llm=story_evaluator, prompt=evaluation_prompt_template)

# Generate a story based on a theme
theme = "adventure in a mystical forest"
generated_story = story_chain.run(theme=theme)

# Evaluate the generated story
evaluation = evaluation_chain.run(story=generated_story)

# Output the generated story and its evaluation
print("Generated Story:\n", generated_story)
print("\nEvaluation:\n", evaluation)

Prompt Refactoring

Prompt refactoring is an iterative process to improve the quality of prompts for better results. This approach includes analyzing the model’s outputs, identifying areas for improvement, and refining the prompt to achieve the desired outcome.

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Define the prompt for enhancing user input
enhancement_prompt_template = PromptTemplate(
    input_variables=["user_input"],
    template=(
        "You are an expert in creating detailed prompts for language models. "
        "Given the user input: '{user_input}', generate a comprehensive prompt that includes "
        "specific details, context, and any relevant questions to ensure a thorough response."
    )
)

# Define the prompt for the actual task
task_prompt_template = PromptTemplate(
    input_variables=["enhanced_prompt"],
    template="{enhanced_prompt}"
)

# Initialize the LLMs (text-davinci-003 has been retired; gpt-3.5-turbo-instruct is its successor)
enhancer_model = OpenAI(model="gpt-3.5-turbo-instruct")
task_model = OpenAI(model="gpt-3.5-turbo-instruct")

# Create the LLMChains
enhancement_chain = LLMChain(llm=enhancer_model, prompt=enhancement_prompt_template)
task_chain = LLMChain(llm=task_model, prompt=task_prompt_template)

# Example user input
user_input = "Explain the impact of the Industrial Revolution."

# Step 1: Enhance the user input to create a detailed prompt
enhanced_prompt = enhancement_chain.run(user_input=user_input)

# Step 2: Use the enhanced prompt to perform the actual task
task_output = task_chain.run(enhanced_prompt=enhanced_prompt)

# Output the enhanced prompt and the task result
print("Enhanced Prompt:\n", enhanced_prompt)
print("\nTask Output:\n", task_output)

LangSmith: LangChain's Tool for Testing and Managing LLM Applications

LangSmith is a powerful tool integrated into the LangChain ecosystem that provides features for testing, monitoring, and managing LLM applications.

  • Tracing: LangSmith allows developers to visualize the execution of LLM chains. This makes it easier to identify bottlenecks, errors, and inefficiencies. It also provides insights into token usage and latency to optimize performance (a configuration sketch follows this list).
  • Evaluation Datasets: It also enables the creation of datasets for consistent evaluation of LLM applications. These datasets allow developers to systematically test their applications and ensure reliable performance.
  • Testing: LangSmith supports automated testing of LLM chains to validate outputs and ensure accuracy. It also allows developers to compare performance for different model versions.
  • Monitoring: The tool tracks production performance metrics such as latency, accuracy, and user interactions. This helps developers to evaluate how their applications perform in real-world scenarios and identify areas for improvement.
  • Versioning: With LangSmith, developers can also manage different versions of prompts, chains, and models, enabling them to track changes and compare performance. It also supports A/B testing.
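
Getting started with tracing usually requires no code changes: LangChain picks up LangSmith configuration from environment variables. A minimal sketch (assuming a LangSmith account; the project name is hypothetical):

import os

# Enable LangSmith tracing for all LangChain runs in this process
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "..."            # your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "pdf-chat-demo"  # hypothetical project name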

Note that LangSmith can be used with or without LangChain.

Conclusion

In this second part of the blog series, we explored how to build AI applications using LangChain, a powerful framework that simplifies the development of LLM-powered solutions. From creating basic chat applications to implementing advanced workflows like Retrieval-Augmented Generation (RAG) and building interactive tools with Streamlit, LangChain provides a modular and flexible approach to working with large language models.

We also emphasized the importance of ensuring quality in AI applications through testing and evaluations. By using tools like LangSmith, developers can streamline testing, monitor performance, and manage versions effectively, ensuring their applications remain reliable, accurate, and adaptable over time.

In the next and last part of this series, we’ll look at the deployment of AI applications and explore operational and security considerations for LLM-powered solutions as well as AI agents. Stay tuned!

If you're eager to learn more, check out our video recordings of the latest AI for DevOps Engineers Workshop on YouTube.
