How we use OpenAI to automate our data copilot

Velvet’s first feature was a data copilot. Our AI-powered editor turns natural language queries into SQL, returning real-time database results. In this article, we share lessons learned optimizing our usage of OpenAI.

Using LLMs to automate workflows

We needed a way for customers to easily query their production data. So we set out build a best-in-class AI SQL editor using OpenAI to automate SQL.

Requirements: Customers can connect a database, ask natural language questions to automate SQL, and return real-time data in milliseconds.

Why? A SQL query that might take a human hours to construct should take only a few seconds using our data copilot. This not only has a 10x effect in engineering workflows, but also empowers any stakeholder to query production data.

Building an MVP with OpenAI

Connecting to OpenAI was relatively easy. We started with basic prompting to instruct OpenAI, something like “you are a data engineer and you only write SQL”. We shipped an MVP in a few weeks that could capture any event-based data source. The AI copilot could write contextual SQL and output live data results.

This first version ingested webhooks from tools like Stripe or Webflow. These tools throw off thousands of events, so it was an easy way to capture and unify customer data. Webhooks are often unstructured and incomplete, which caused challenges in writing relevant SQL.

  • Problem: The editor often returned errors, irrelevant data, or nothing at all.
  • Solution: Capture complete and structured data, and pass through more context.

Experiments to improve the UX and LLM responses

There were some obvious customer bottlenecks we knew we had to fix. Connecting data was onerous and incomplete, the editor only surfaced one SQL statement at a time, and our context generation was rudimentary. We set out to solve these problems first.

App improvements: Customers were frustrated when they hit an error and couldn't find a path forward. We updated the UI so users could iteratively build SQL on queries without losing context, and added self-healing so errors could resolved themselves. And most importantly, we added direct database connections instead of event-based data sources.

AI improvements: We started to see patterns of when OpenAI had trouble processing user requests. After a variety of experiments, we found that reducing temperature, adding additional prompts, and improving the database schema context allowed the model to return more predictable SQL.

Watch a demo of the query editor here.

This version of the product felt like a 100x improvement. It was faster and more intuitive than writing SQL by hand or using ChatGPT. We loved using it internally and quickly onboarded beta customers.

Using data to optimize the editor experience

We had moved past the MVP validation stage, and were ready to optimize. But it was hard to tell what was working well, and where the problems lived. When a customer reported a problem, was it because of their database connection, an error in our application layer, the way we managed context, or limitations to the model itself?

A positive or negative message from a user was largely dependent on the LLM response, but we had no quantitative insight into response performance.

There were two things we couldn’t observe or control; the customer’s database and the OpenAI model. We could at least trouble shoot database connection issues with our customers. But without raw LLM logs in a queryable database, it felt like we were crossing our fingers and hoping for the best from the OpenAI gods.

  • Problem: Limited observability, no way to query our full tech stack.
  • Solution: Warehouse our LLM request and response logs.

Unlocking control and optionality

We started warehousing our LLM requests to analyze what was happening. Granular access to our OpenAI logs created a sense of familiarity and control. We could utilize their models as part of our tech stack, instead of as a black box.

  • Analysis: We can granularly query logs to understand costs, usage, and feedback across various customers, features, models, endpoints, and parameters.
  • Experimentation: We can test the impact of prompts, function calls, and RAG. And apply different tactics, endpoints, or models to various use cases within our product.
  • Fine-tuning: Eventually, we’ll have the data and resourcing to fine-tune our own models. This gives us an eventual path to reduce costs and increase accuracy at scale.

No one can predict the future, especially when it comes to AI. We felt it was an unnecessary risk to be dependent on one company, one model, or one data platform. With our OpenAI logs warehoused in our own database, we have technical optionality moving forward.

Use a transparent proxy for your data

Velvet is an AI-first data pipeline. We help you warehouse your LLM requests to your database. It’s free to get started, and you have full ownership of your data.

ai-first data PIPELINE

Warehouse LLM requests, optimize AI features.

Try Velvet for free

More articles

Why Find AI logs OpenAI requests with Velvet

AI-powered B2B search engine logged 1,500 requests per second.

How we use OpenAI to automate our data copilot

Lessons learned using LLMs to automate data workflows.

Four ways to optimize your AI feature post launch

Tactics to analyze, test, and improve your AI-features.