Not Classic RAG: Building a Structured-Retrieval Discovery Agent with LangGraph

 

I just added a new feature to Kino, my educational movie-discovery project built with LangGraph: a prompt-driven discovery flow that finds grounded titles from a local catalog.

The easy label would be RAG.

But more precisely, it is not classic RAG.

What I built is closer to a structured-retrieval agent: the model interprets the request, a narrow service returns structured facts, and deterministic code enforces the final result.

That distinction sounds academic at first.

In practice, it changed almost every implementation choice.

What I actually built

Kino is an educational project, but this feature forced a very real architecture decision.

I wanted a user to be able to type something like "Discover comedy movies from 2010 onward from Kino's catalog" and get back grounded titles from the project's own data.

The final flow is intentionally small:

  • an LLM interprets the user's request
  • a single search_titles tool turns that into a structured catalog query
  • the project's data service returns matching titles
  • deterministic middleware validates the result and emits the final payload

That final payload is also deliberately narrow: titles plus notes.

I removed extra narrative fields because the system is not trying to be a movie critic. It is trying to be a grounded discovery interface.
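
To make that shape concrete, here is a sketch of the kind of payload the flow aims to return. The field names and values below are illustrative, not Kino's exact contract.

```python
# Illustrative final payload (hypothetical values, not real catalog data):
# grounded titles plus notes, and nothing else.
example_payload = {
    "titles": [
        {"title": "<catalog title A>", "year": 2012, "genres": ["Comedy"]},
        {"title": "<catalog title B>", "year": 2016, "genres": ["Comedy", "Romance"]},
    ],
    "notes": "Non-adult comedies released in 2010 or later, from the local catalog.",
}
```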

Why this is not classic RAG

A lot of AI features get called RAG by default.

That label is too broad here.

Classic RAG usually means retrieving unstructured text, such as document chunks, and injecting that material back into the model context so the model can answer from it.

The project is doing something different.

The tool is not a web search. It is not a vector search over article chunks. It is not a document retriever.

It is a domain-specific structured retrieval API.

The model is translating natural language into structured query arguments like:

  • genres
  • title_type
  • is_adult
  • min_year
  • max_year

Then the catalog service returns structured records.
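
As a sketch, the tool's argument surface could be declared along these lines. The catalog_client import and its search method are assumptions for illustration, not the project's actual code.

```python
from typing import Optional

from langchain_core.tools import tool

from kino.catalog import catalog_client  # hypothetical data-service client


@tool
def search_titles(
    genres: Optional[list[str]] = None,
    title_type: Optional[str] = None,
    is_adult: Optional[bool] = None,
    min_year: Optional[int] = None,
    max_year: Optional[int] = None,
) -> list[dict]:
    """Search the local catalog for titles matching structured filters."""
    # The model fills these arguments in from natural language;
    # the catalog service applies them as exact filters and returns records.
    return catalog_client.search(
        genres=genres,
        title_type=title_type,
        is_adult=is_adult,
        min_year=min_year,
        max_year=max_year,
    )
```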

That puts the feature much closer to a structured-retrieval agent than to classic document RAG.

If you want the research lineage, the loop is also closer to ReAct than to a standard document-grounded RAG pipeline.

The known industry pattern

The most useful taxonomy I found for explaining this is Anthropic's framing of workflows and agents, along with the corresponding LangGraph workflows and agents documentation.

Under that framing, this feature sits in the middle.

It is not a pure workflow, because the model is doing real interpretation work.

It is not a free-form autonomous agent either, because the tool surface is tiny and the final output is tightly controlled.

The best plain-English description is:

a workflow-agent hybrid over a structured retrieval backend

Or, shorter:

a structured-retrieval discovery agent

That phrasing matters because it tells people what the model is actually allowed to do.

Why the distinction changed the implementation

Once I stopped pretending this was generic RAG or open-ended recommendation, a few design choices became obvious.

I renamed recommend to discover

That was not branding fluff.

Recommend suggests some kind of ranking intelligence or taste layer.

At the moment, the project does not have ratings, popularity signals, user history, or editorial scoring. So calling it a recommendation engine would be overclaiming.

Discover is much more honest.

It says: the system helps the user find grounded candidates from the catalog.

I let the model interpret language, but not enforce truth

This was one of the most important separations in the whole feature.

The LLM is good at understanding phrases like:

  • from 2010 onward
  • between 1990 and 2000
  • comedy movies
  • non-adult

So the model should do that linguistic work.

But the system should still enforce the structured constraints.

That is why I moved toward explicit query arguments like min_year and max_year, and why the backend contract now uses clearer minYear and maxYear semantics instead of leaking database-style operator names into the user-facing architecture.

In other words:

  • the model interprets intent
  • the retrieval service applies structured filters
  • the middleware validates the grounded result

That split ended up being much cleaner than trying to re-parse English later in the pipeline.
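
A minimal sketch of the deterministic side of that split, assuming hypothetical record and query shapes: after retrieval, plain Python re-checks every record against the structured constraints instead of trusting any model-written summary of them.

```python
def enforce_constraints(records: list[dict], query: dict) -> list[dict]:
    """Keep only records that actually satisfy the structured filters."""
    kept = []
    for record in records:
        # Each check is a hard filter; a record that fails any of them is dropped.
        if query.get("min_year") is not None and record["year"] < query["min_year"]:
            continue
        if query.get("max_year") is not None and record["year"] > query["max_year"]:
            continue
        if query.get("is_adult") is not None and record["is_adult"] != query["is_adult"]:
            continue
        if query.get("genres") and not set(query["genres"]) & set(record["genres"]):
            continue
        kept.append(record)
    return kept
```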

I simplified the final response on purpose

At one point, the response included extra explanation fields like reason and tradeoff.

In theory, that sounds useful.

In practice, for a narrow structured-retrieval flow, those fields were mostly filler.

They pushed me toward brittle heuristics or shallow generated text without adding much product value.

So I simplified the contract.

Now the system returns only the grounded titles and any relevant notes.

That made the feature easier to trust and easier to inspect.
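
As a sketch, that narrowed contract fits in a very small schema. The field names below follow the description above but are still illustrative rather than the production definitions.

```python
from typing import Optional

from pydantic import BaseModel, Field


class DiscoveredTitle(BaseModel):
    """One grounded catalog record surfaced by the discovery flow."""
    title: str
    year: int
    genres: list[str] = Field(default_factory=list)


class DiscoveryResponse(BaseModel):
    """The entire final payload: grounded titles plus optional notes."""
    titles: list[DiscoveredTitle]
    notes: Optional[str] = None
```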

LangSmith Studio made the architecture visible

One nice side effect of building this in LangGraph is that the graph is inspectable in LangSmith Studio.

That matters more than it sounds.

When you can literally see the flow as:

  • __start__
  • model
  • tools
  • model
  • after_agent
  • __end__

it becomes much easier to explain what the system is and is not doing.
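
For reference, a minimal LangGraph wiring that produces that exact node sequence could look like the sketch below. The bound chat model and the search_titles tool are assumed to exist (as sketched earlier), and the after_agent body is left as a stub.

```python
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode

# Assumes llm_with_tools = chat_model.bind_tools([search_titles]) exists.


def call_model(state: MessagesState) -> dict:
    return {"messages": [llm_with_tools.invoke(state["messages"])]}


def route_after_model(state: MessagesState) -> str:
    # If the model requested a tool call, run tools; otherwise finalize.
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "after_agent"


def after_agent(state: MessagesState) -> dict:
    # Deterministic step: validate the grounded results and build the final
    # titles-plus-notes payload. This sketch is a no-op placeholder.
    return {}


builder = StateGraph(MessagesState)
builder.add_node("model", call_model)
builder.add_node("tools", ToolNode([search_titles]))
builder.add_node("after_agent", after_agent)
builder.add_edge(START, "model")
builder.add_conditional_edges("model", route_after_model)
builder.add_edge("tools", "model")
builder.add_edge("after_agent", END)
graph = builder.compile()
```

A single discovery request that triggers one tool call then walks exactly the __start__ → model → tools → model → after_agent → __end__ path shown in Studio.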

It also made debugging much more concrete.

I could see when the model produced the right year bounds, when the tool call returned old titles, when a provider was failing upstream, and when a stale local server was being mistaken for the deployed agent.

For an educational project, that visibility is a real advantage.

The architecture is not hidden behind marketing language.

Reality Check

This feature is useful, but it is still intentionally narrow.

It is not:

  • a full recommendation engine
  • a multi-agent system
  • a web-search agent
  • a classic RAG stack

And some user preferences still cannot be enforced unless the catalog actually has the right metadata.

For example, "general audience" sounds natural in a prompt, but it is only enforceable if the underlying data exposes a reliable signal for that concept.

There is also a more boring operational truth: provider reliability still matters.

Even when the architecture is clean, upstream model providers can still return transient failures, timeouts, or internal errors.

That is exactly why narrowing the tool surface and keeping the final response contract deterministic was worth it.
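
One pragmatic mitigation, sketched here under the assumption that the model call is the flaky step, is a thin retry wrapper. The example uses the tenacity library, which is not necessarily part of this project's stack.

```python
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
def call_model_with_retry(messages):
    # Retries transient provider failures (timeouts, internal errors) up to
    # three times with exponential backoff before surfacing the exception.
    # llm_with_tools is the same bound chat model assumed in earlier sketches.
    return llm_with_tools.invoke(messages)
```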

The Takeaway

The practical lesson from this feature is simple:

If your domain data is structured, do not force a classic RAG story onto it.

Sometimes the better answer is a small agent that interprets the user's language, calls a narrow retrieval API, and hands the final truth back to deterministic code.

That is what I ended up building in this project.

Not a generic AI search box.

Not a recommendation engine.

A structured-retrieval discovery agent.

And once I named it correctly, the implementation got better.

References

  • Anthropic, "Building Effective Agents" (the workflows vs. agents framing)
  • LangGraph documentation on workflows and agents
