Never built an AI feature? This week’s article is for you. We’ll walk through the evolution of building an AI-powered book recommendation app. Read about 6 experiments for developing a valuable AI feature.
You're an app developer tasked with helping users discover new books to read. Your app has a large, constantly evolving catalog of books. Because of the volume of books available, your end users struggle to find books that are relevant to them.
Your feature needs to take the user's search criteria, past reads, and interests into consideration. You decide to use OpenAI to return relevant results.
You make an MVP chat interface that accepts user inputs, sends them to OpenAI, and returns answers. Without additional prompting work, your feature acts like a thin wrapper around OpenAI: it has no personality, content, or capability unique to your product or end users.
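In code, that MVP is only a few lines. Here's a minimal sketch using OpenAI's Python SDK (the model name and function name are placeholders):

```python
# Minimal MVP: pass the user's input straight to OpenAI and return the answer.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recommend_books(user_input: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": user_input}],
    )
    return response.choices[0].message.content

print(recommend_books("What should I read next? I loved The Name of the Wind."))
```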
Next, you layer on prompts to instruct the LLM how to act. While this is often called prompt engineering, it doesn’t require much coding expertise. A "prompt" is a text-based note passed to the LLM. This extra context instructs the model to behave in a particular way. A prompt changes the results you'll get from this specific call to OpenAI, but it won’t train the model’s general knowledge for every future call (more on that later).
An example prompt might be: “You're a librarian who specializes in fictional literature. People come to you when they don't know what to read next. You take into consideration what that person has read in the past to inform your recommendations and suggest four books at a time. You can ask additional questions to produce better recommendations.”
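In practice, that prompt is passed to the model as a system message ahead of the user's input. A sketch, reusing the client from the MVP above:

```python
# The librarian prompt goes in as a system message; the user's text follows it.
LIBRARIAN_PROMPT = (
    "You're a librarian who specializes in fictional literature. People come to you "
    "when they don't know what to read next. You take into consideration what that "
    "person has read in the past to inform your recommendations and suggest four "
    "books at a time. You can ask additional questions to produce better recommendations."
)

def recommend_books(user_input: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": LIBRARIAN_PROMPT},
            {"role": "user", "content": user_input},
        ],
    )
    return response.choices[0].message.content
```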
Now your chat feature talks more like a librarian, it returns four results at a time, and it focuses on fictional literature from the beginning.
While text-based chat is functional, you start to see a decline in engagement after someone tries the feature once. Plus, you notice that most users type in just a few keywords, something like "beach reads based in italy with a female protagonist".
You try implementing a more focused book-discovery experience that includes a mix of prepared recommendations and chat.
Now your app starts feeling more familiar to users. It's not so obviously "AI", but it recommends relevant books quickly, and keeps the user focused on the task at hand.
You realize your book recommendation feature isn't effectively converting to book rentals, and each query to OpenAI is expensive and time-consuming. You want the LLM to focus on your app's themes and available catalog, rather than every book on the internet. And because your library catalog is constantly changing, the data needs to be updated frequently.
You implement retrieval-augmented generation (RAG) to address these problems. Now the LLM narrows its scope to a particular database of books, improving the relevance of results. Adding RAG impacts the results of each call, but doesn't train the model to reason differently.
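One common way to build this is to embed your catalog, retrieve the closest matches to the user's query, and hand only those matches to the model. A rough sketch under the same assumptions as before; in production you'd likely use a vector database and re-embed books as the catalog changes:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(result.data[0].embedding)

# Pre-compute embeddings for the current catalog (re-run when the catalog changes).
catalog = [
    {"title": "Beautiful Ruins", "description": "A love story set on the Italian coast."},
    {"title": "The Name of the Wind", "description": "An epic fantasy about a gifted young musician."},
    # ...the rest of your catalog
]
for book in catalog:
    book["embedding"] = embed(f'{book["title"]}: {book["description"]}')

def retrieve(query: str, k: int = 5) -> list[dict]:
    # OpenAI embeddings are unit-length, so a dot product is a cosine similarity.
    query_vec = embed(query)
    scored = sorted(catalog, key=lambda b: float(np.dot(query_vec, b["embedding"])), reverse=True)
    return scored[:k]

def recommend_from_catalog(user_input: str) -> str:
    matches = retrieve(user_input)
    context = "\n".join(f'- {b["title"]}: {b["description"]}' for b in matches)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": LIBRARIAN_PROMPT},
            {"role": "system", "content": f"Only recommend books from this list:\n{context}"},
            {"role": "user", "content": user_input},
        ],
    )
    return response.choices[0].message.content
```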
At this point, your recommendation feature is working - you’ve found product-market fit with millions of end users.
You notice some people use the search bar to find a book they've already read. In this case, they aren't looking for different (more creative) results, they simply want to find the exact book or recommendation list they found previously.
You add caching so you can surface past search results without sending new context to OpenAI every time. This makes your repeat searches faster and cheaper.
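A simple version keys results on the normalized query, so identical repeat searches never hit OpenAI. A sketch building on the function above (a real deployment would persist the cache, for example in Redis, and expire entries as the catalog changes):

```python
# Naive in-memory cache keyed on the normalized query.
_cache: dict[str, str] = {}

def cached_recommend(user_input: str) -> str:
    key = user_input.strip().lower()
    if key not in _cache:
        _cache[key] = recommend_from_catalog(user_input)
    return _cache[key]
```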
You built a useful feature using one of OpenAI's pre-trained models. Experimenting with prompt engineering, RAG, UI, and caching helped you validate this feature at a reasonable cost. Now you have broad adoption, and you're ready to optimize further.
You decide to invest in fine-tuning a model to further improve costs, latency, and accuracy. With fine-tuning, you’re changing the reasoning capabilities of the model itself, not just the results of each call.
Note: The fine-tuning process is expensive and can result in catastrophic failures, so it’s best to depend on other tactics until you identify a problem worth the investment.
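If you do reach that point, the mechanics with OpenAI's fine-tuning API look roughly like this: upload a JSONL file of example conversations, start a fine-tuning job, then call the resulting model by name. The file name and base model below are illustrative:

```python
# Each line of the JSONL file is one example conversation the model should imitate, e.g.:
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "beach reads set in italy"},
#               {"role": "assistant", "content": "1. Beautiful Ruins ..."}]}

training_file = client.files.create(
    file=open("librarian_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # illustrative base model
)

# Once the job finishes, use the fine-tuned model like any other model name:
# job_result = client.fine_tuning.jobs.retrieve(job.id)
# client.chat.completions.create(model=job_result.fine_tuned_model, messages=[...])
```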
We discussed 6 experiments for testing and optimizing a new AI feature: an MVP chat wrapper, prompt engineering, a focused UI, RAG, caching, and fine-tuning.
LLM requests, responses, and parameters can be used to analyze, optimize, and fine-tune your AI features. Velvet makes it easy to warehouse every request from OpenAI.
Email team@usevelvet.com, or schedule a call to get started.
Use our data copilot to query your AI request logs with SQL.
Use Velvet to observe, analyze, and optimize your AI features.