Engineering

Four ways to optimize your AI feature post-launch

Shipping a V1 of your AI feature is just the first step. This article is for software engineers and product managers looking to improve the speed, accuracy, and cost of their AI features.

Ship, analyze, test, train—repeat

Like any release, shipping an AI feature to production is only the starting point. Unlike the rest of your tech stack, LLMs introduce a non-deterministic element to your product. To maintain consistency, you need a data-driven evaluation system in place.

Once your feature is in the hands of real customers, you'll iterate continually to improve response accuracy, speed, and cost. Below we share examples of how to analyze AI features, make pre-trained models more specific, and leverage your data effectively.

Analyze your usage, performance, and costs

There’s a marginal cost to each call you make to OpenAI and other paid models. Beyond cost, your choice of model and prompt treatment affects the output of your AI feature. The requests and responses you send and receive are part of your tech stack and can be queried just like any other product data you collect.

Example: You want to understand how much you paid OpenAI for your “summarize ratings” feature, and how satisfied customers are with the quality and accuracy of the summarization. You can query your logs to break down costs, usage, and feedback at a granular level.

Your LLM logs store rich data: query by model, endpoint, user, feature, and any other parameter you capture. Analyze your features post-launch to make better product decisions and, most importantly, improve the end-user experience.
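Here's a minimal sketch of that kind of analysis, assuming you warehouse one row per LLM call with a feature tag, token counts, and a user rating. The field names and prices are hypothetical, so swap in your own schema and your provider's current price sheet.

```python
# A minimal sketch: break down cost, usage, and feedback per feature from
# warehoused LLM logs. Column names and prices are hypothetical.
from collections import defaultdict

# Example rows, as they might come back from your log warehouse.
logs = [
    {"feature": "summarize_ratings", "model": "gpt-4o-mini",
     "prompt_tokens": 812, "completion_tokens": 164, "user_rating": 5},
    {"feature": "summarize_ratings", "model": "gpt-4o-mini",
     "prompt_tokens": 905, "completion_tokens": 143, "user_rating": 3},
]

# Hypothetical per-1M-token prices; use your provider's current price sheet.
PRICE_PER_1M = {"gpt-4o-mini": {"prompt": 0.15, "completion": 0.60}}

totals = defaultdict(lambda: {"cost": 0.0, "calls": 0, "rating_sum": 0})
for row in logs:
    price = PRICE_PER_1M[row["model"]]
    cost = (row["prompt_tokens"] * price["prompt"]
            + row["completion_tokens"] * price["completion"]) / 1_000_000
    agg = totals[row["feature"]]
    agg["cost"] += cost
    agg["calls"] += 1
    agg["rating_sum"] += row["user_rating"]

for feature, agg in totals.items():
    print(f"{feature}: ${agg['cost']:.4f} across {agg['calls']} calls, "
          f"avg rating {agg['rating_sum'] / agg['calls']:.1f}")
```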

Once you understand the marginal cost and satisfaction with this feature, you can take steps to improve it moving forward.

Evaluate prompt engineering effectiveness

Prompt engineering and retrieval-augmented generation (RAG) are cost-effective techniques to produce focused results from pre-trained LLMs. To optimize a feature, you'll need to test variations and evaluate how they perform in production.

Example: You want to understand how effective your context is. You pass several prompts and real-time documents to OpenAI, but it's unclear which of them actually improve responses.

After testing locally, you ship to production with a first implementation of prompts and RAG. The following week, you notice engagement drops off after the first interaction with this feature. You analyze a sample of responses and implement changes to test.

Now you can compare the effectiveness of the prompts, documents, and other parameters you're passing to OpenAI. You can iterate daily until you've achieved a good baseline, and eventually run formal evaluations to optimize the feature.
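As a sketch of what that comparison can look like, the snippet below groups logged requests by a prompt-variant tag and document source, then computes an engagement rate per combination. The field names are assumptions, not a fixed schema.

```python
# A minimal sketch: compare prompt variants and retrieved-document sources
# by engagement rate, assuming each logged request carries these tags.
from collections import defaultdict

logs = [
    {"variant": "prompt_v1", "doc_source": "help_center", "user_engaged": True},
    {"variant": "prompt_v2", "doc_source": "release_notes", "user_engaged": False},
    {"variant": "prompt_v2", "doc_source": "help_center", "user_engaged": True},
]

# Group by (variant, document source) and compute an engagement rate.
buckets = defaultdict(lambda: {"calls": 0, "engaged": 0})
for row in logs:
    key = (row["variant"], row["doc_source"])
    buckets[key]["calls"] += 1
    buckets[key]["engaged"] += int(row["user_engaged"])

for (variant, source), agg in sorted(buckets.items()):
    rate = agg["engaged"] / agg["calls"]
    print(f"{variant} + {source}: {rate:.0%} engagement over {agg['calls']} calls")
```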

Fine-tune a model for your use case

Once you’ve validated your AI feature with techniques like prompt engineering, you may consider fine-tuning a model that can reason about your specific use case. Fine-tuning can lead to lower costs, faster responses, and more accurate results. The process involves training a model on a labeled dataset that's specific to your system.

Example: You want to fine-tune an OpenAI model so it can intelligently reason about your specific use case. You'll still use some prompt engineering, but you want to leverage a more cost-effective, resilient, and accurate model.

To fine-tune a model, you need a dataset of relevant inputs and outputs. The easiest way to build your own dataset is to store requests and responses over time, even while you're using a general model.
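For example, a minimal sketch of that conversion might look like the following, assuming your warehouse stores the system prompt, user input, model response, and a quality signal per call. The field names and quality filter are illustrative; the output follows OpenAI's chat fine-tuning JSONL format.

```python
# A minimal sketch: turn warehoused request/response pairs into a JSONL
# training file. The `logs` structure and rating filter are hypothetical.
import json

logs = [
    {"system": "Summarize customer ratings in two sentences.",
     "user_input": "5 stars: great battery. 2 stars: screen scratches easily.",
     "response": "Customers praise the battery life but report that the screen scratches easily.",
     "user_rating": 5},
]

with open("training_data.jsonl", "w") as f:
    for row in logs:
        if row["user_rating"] < 4:  # keep only well-rated examples
            continue
        example = {"messages": [
            {"role": "system", "content": row["system"]},
            {"role": "user", "content": row["user_input"]},
            {"role": "assistant", "content": row["response"]},
        ]}
        f.write(json.dumps(example) + "\n")
```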

After fine-tuning, your model can reason on its own about the use case you're applying it to. Done well, fine-tuning yields better accuracy, faster responses, and cost savings. Note that fine-tuning is time- and resource-intensive to do right, so it only makes sense to invest in it if you're dissatisfied with the results from the other techniques at your disposal.
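Once the dataset is ready, kicking off a job with the OpenAI Python SDK is a short script. The base model name below is a placeholder; check OpenAI's documentation for models that currently support fine-tuning.

```python
# A minimal sketch: upload a training file and start a fine-tuning job
# with the OpenAI Python SDK (openai>=1.0).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training file built from your warehoused logs.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Create the fine-tuning job against a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base model
)
print(job.id, job.status)
```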

Forward requests to any platform

Your request logs can be used with a variety of platforms and frameworks. Once your logs are warehoused, you can forward them to any platform your team wants to use.

Example: You want to give non-technical stakeholders access to your OpenAI logs in an analytics dashboard. You forward your logs to an observability platform so everyone can view responses, identify problems, and define solutions.
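As an illustration, a forwarding job can be as simple as reading a batch of warehoused rows and posting them to your platform's ingest API. Everything below (endpoint, headers, payload shape) is a placeholder to adapt to the tool you choose.

```python
# A minimal sketch: forward a batch of warehoused LLM logs over HTTP.
# The endpoint, auth header, and payload shape are hypothetical; most
# observability platforms document their own ingest API and batch limits.
import json
import urllib.request

INGEST_URL = "https://observability.example.com/api/ingest"  # placeholder
API_KEY = "YOUR_API_KEY"

batch = [
    {"feature": "summarize_ratings", "model": "gpt-4o-mini",
     "latency_ms": 840, "prompt_tokens": 812, "completion_tokens": 164},
]

request = urllib.request.Request(
    INGEST_URL,
    data=json.dumps({"events": batch}).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {API_KEY}"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.status)
```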

Once your feature is past the MVP stage, you'll want to optimize it just like any other part of your product. Warehousing your LLM requests and responses keeps your options open to adopt whatever platform your team prefers.

Optimize your AI features

In summary, we covered four best practices for evolving your AI features beyond MVP.

  • Analyze usage, performance, and costs of your AI features
  • Evaluate the effectiveness of your prompt engineering
  • Prepare your data for fine-tuning models
  • Forward logs to any platform

We'd love to hear from you. What are you building, what's your AI tech stack, and what tactics have been most effective? Reach us at team@usevelvet.com.
