Engineering

Why Find AI logs OpenAI requests with Velvet

Find AI is an AI-powered search engine for companies and people. Their app makes millions of requests to OpenAI every week, and warehouses every single request using Velvet. In this post, we'll explore how the Find AI engineering team uses LLM request logs to optimize accuracy, manage costs, and compare models.

Building a next-generation search app for companies and people

Find AI aims to be the source of truth for all people and companies online. Imagine Perplexity, but focused on data you might find on LinkedIn, Pitchbook, and CB Insights. Search a natural language query like "biotech startups focused on sustainability" and it returns a list of accurate results, each with a photo, a summary, and links to learn more.

Find AI leverages OpenAI to build its knowledge base and answer questions.

Ahead of their public launch, the team sought to store all their requests, responses, and metadata from OpenAI. They wanted to analyze usage and costs, evaluate how different models and prompts performed, and eventually fine-tune some of their own models.

Logging 1,500 requests per second

Find AI had an impressive launch. Thousands of new users visited the site to ask questions, triggering millions of requests to OpenAI. Free users, tired of antiquated search tools like LinkedIn, converted to paying subscribers.

Find AI uses four OpenAI endpoints—chat, batch, moderations, and embeddings. On launch day, their system scaled quickly. At peak times, they were sending 1,500 requests to OpenAI per second.
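Warehousing at that rate amounts to capturing every request, response, and piece of metadata as a structured record. The sketch below illustrates the general idea in Python; the field names, the `forward` stub, and the JSONL format are illustrative assumptions, not Velvet's actual schema or implementation.

```python
import io
import json
import time
import uuid

def log_openai_call(forward, payload, metadata, log_file):
    """Forward an OpenAI-style request, then append a structured record
    of the request, response, metadata, and latency to a JSONL log."""
    started = time.time()
    response = forward(payload)  # stand-in for the real OpenAI call
    record = {
        "id": str(uuid.uuid4()),
        "endpoint": payload.get("endpoint", "chat"),
        "request": payload,
        "response": response,
        "metadata": metadata,  # e.g. search_id, service name
        "latency_ms": round((time.time() - started) * 1000),
    }
    log_file.write(json.dumps(record) + "\n")
    return response

# Usage with a stubbed model call and an in-memory log:
log = io.StringIO()
fake_model = lambda p: {"choices": [{"message": {"content": "ok"}}]}
out = log_openai_call(
    fake_model,
    {"endpoint": "chat", "model": "gpt-4o"},
    {"search_id": "s_123", "service": "summarize"},
    log,
)
```

Because every record carries its own metadata, the log can later be queried by search, service, or model without touching application code.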

OpenAI costs quickly became a major concern inside the company, exceeding even their cloud hosting bills. OpenAI's invoices contain minimal information, so the company needed to analyze request logs to decode their OpenAI bill.

Find AI leverages their inputs and outputs to optimize AI features

After launch, Find AI had stored millions of requests in their PostgreSQL database. They set out to analyze and optimize their system.

Find AI's post-launch goals:

  1. Return fast, accurate, and complete results
  2. Measure and minimize marginal costs
  3. Evaluate performance of different models

1. Return fast, accurate, and complete results

Find AI uses OpenAI to power a variety of features including natural language search, data analysis, and text summarization. They wanted to optimize results as much as possible.

Improve search result accuracy: AI apps commonly feature "👍 / 👎" on results to gather user feedback. Find AI collects these user reviews across their app. When a user gives a negative rating, the engineering team treats it like a bug in their prompt.

"We use Velvet to trace back the inaccurate result to the OpenAI LLM request log, tweak the prompt and parameters to get an accurate answer, and then deploy a fix. It's not unlike how we handle errors in our code," reported Find AI CTO Philip Thomas.

Improve summary quality: Find AI provides a summary explaining why each result is a good fit for the user's query. The effectiveness of this feature is less about accuracy and more about the quality of generated text.

When the team makes changes to prompts, they use Velvet logs to replay past requests with the new prompts, then compare the output head-to-head. Reviewing text is a qualitative process, so sampling from live requests helps the team confidently deploy changes.
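One way to picture the replay step: take each logged chat request, swap in the candidate system prompt, and re-run the result so old and new outputs can be compared side by side. A minimal sketch of that transformation (the log shape is assumed; this is not Velvet's API):

```python
def build_replay(logged_request, new_system_prompt):
    """Copy a logged chat request, replacing its system prompt so the
    same user input can be re-run under the candidate prompt."""
    messages = [
        {"role": "system", "content": new_system_prompt}
        if m["role"] == "system" else m
        for m in logged_request["messages"]
    ]
    return {**logged_request, "messages": messages}

# A hypothetical logged request:
logged = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "Summarize tersely."},
        {"role": "user", "content": "Why is Acme Corp a good match?"},
    ],
}
replay = build_replay(logged, "Summarize in two friendly sentences.")
```

The original record is left untouched, so the same log entry can be replayed against any number of candidate prompts.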

Trace errors and evaluate model performance: On launch day, Find AI saw occasional errors in their logs. With their logs warehoused and queryable, they traced those events to internal server errors at OpenAI.

Moving forward, the team can granularly query their OpenAI usage and uncover insights. For example, though OpenAI batch calls promise a 24-hour turnaround, most queries completed within three hours in production.
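The batch-turnaround finding is exactly the kind of question a queryable log answers directly: compute completion-time percentiles over logged batch jobs. A sketch with made-up turnaround values (nearest-rank percentile, not any specific Velvet feature):

```python
def percentile(values, p):
    """Nearest-rank percentile over a sorted copy of values."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

# Hypothetical hours from submission to completion for batch jobs.
turnaround_hours = [0.5, 1.2, 2.1, 2.4, 2.8, 3.0, 5.5, 9.0]
p50 = percentile(turnaround_hours, 50)
p90 = percentile(turnaround_hours, 90)
```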

The Find AI engineering team has a robust set of structured inputs and outputs to evaluate prompts, improve quality, and monitor usage over time.

2. Measure and minimize marginal costs

Find AI needs to measure and manage the cost per query, while maintaining accuracy and speed. Using logs, they can run granular cost analysis using Search ID meta tags and other system-specific parameters.

We'll walk through a few example queries from Find AI's launch day.

Average cost per query: Each user search in the Find AI app makes a variety of calls to OpenAI. The team measures average costs and identifies outlier high-cost searches.
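With token counts and a search ID logged on every call, average cost per query reduces to a group-by over the logs. A sketch with illustrative per-million-token prices and an assumed log shape (real OpenAI pricing varies by model and changes over time):

```python
from collections import defaultdict

# Illustrative prices in dollars per 1M tokens; not current OpenAI rates.
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def cost_per_search(logs):
    """Sum token costs for each search_id across all of its calls."""
    totals = defaultdict(float)
    for rec in logs:
        p = PRICES[rec["model"]]
        totals[rec["search_id"]] += (
            rec["input_tokens"] * p["input"]
            + rec["output_tokens"] * p["output"]
        ) / 1_000_000
    return dict(totals)

# Hypothetical log records: two calls for search s1, one for s2.
logs = [
    {"search_id": "s1", "model": "gpt-4o", "input_tokens": 1000, "output_tokens": 200},
    {"search_id": "s1", "model": "gpt-4o", "input_tokens": 500, "output_tokens": 100},
    {"search_id": "s2", "model": "gpt-4o", "input_tokens": 2000, "output_tokens": 400},
]
costs = cost_per_search(logs)
avg = sum(costs.values()) / len(costs)
```

Sorting the per-search totals also surfaces the outlier high-cost searches mentioned above.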

Cost per service: Find AI divides its prompts into 'services', which are labeled as parameters on each call to OpenAI. They want to identify the highest cost services and then optimize prompts to reduce spend.

"Shortening prompts can decrease costs dramatically, but it's only worth it on high-frequency services," said Philip.

Cost per model: The team wants to understand the difference between running the same service on different models. How does each model impact speed and cost?
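Comparing models is then another aggregation over the same logs: group records by model and average their cost and latency. A sketch with an assumed log shape and invented numbers:

```python
from collections import defaultdict

def summarize_by_model(logs):
    """Average cost and latency per model from logged records."""
    buckets = defaultdict(list)
    for rec in logs:
        buckets[rec["model"]].append(rec)
    return {
        model: {
            "avg_cost": sum(r["cost"] for r in recs) / len(recs),
            "avg_latency_ms": sum(r["latency_ms"] for r in recs) / len(recs),
        }
        for model, recs in buckets.items()
    }

# Hypothetical records for the same service run on two models.
logs = [
    {"model": "gpt-4o", "cost": 0.009, "latency_ms": 1200},
    {"model": "gpt-4o", "cost": 0.011, "latency_ms": 800},
    {"model": "gpt-4o-mini", "cost": 0.001, "latency_ms": 400},
]
summary = summarize_by_model(logs)
```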

With this data on hand, the team can run experiments to optimize costs. They can modify inputs, implement batching and caching, evaluate different models, and prompt end users to interact with the system differently.

3. Evaluate performance of different models

Find AI wants the flexibility to switch between models and fine-tune their own models.

"We use LLMs to make decisions or generate text. For models used to make decisions, we can replay the Velvet request logs across different models or vendors to evaluate their comparative accuracy," said Philip.

As Find AI's logs grow, they're building a data set they can use for fine-tuning.

"We can bootstrap a data set from OpenAI, then pull that data from Velvet to fine-tune a foundation model like BERT. These self-hosted models end up having about a $0 marginal cost, which can improve margin a lot," said Philip.

Launch is just the beginning

In the days since launch, the Find AI engineering team has credited Velvet as an important part of their launch strategy. Warehousing their OpenAI calls has given them a robust data set for growing and optimizing their AI production infrastructure.

"Velvet enables us to turn our OpenAI calls into valuable data sets. It turns OpenAI from throwaway calls into a cornerstone of a sophisticated AI program with in-house models." - Philip Thomas, Find AI co-founder

Want to try Find AI yourself? Search vetted data on people and tech startups. For example, type “technical founders building AI companies who care about ethical AI”.

Use code “VELVET” for a free month of Find AI Premium. Check it out

