Adding Session Memory Without Building a Preference Engine
I just added another feature to Kino, my educational, cinema-themed distributed systems project. The LangGraph-based agent service is one part of that platform.
This time the feature was memory.
But not the vague, hyped kind.
I did not want Kino to pretend it knows a user's taste forever.
I wanted something much narrower and much more useful:
Short-term conversational memory for follow-up turns that helps the agent continue the current search without pretending to know more than it actually does
That decision mattered a lot.
It kept the feature small enough to trust, but still visible enough to feel like real agent behavior.
What changed
The new capability is simple to describe.
A user can start with:
Discover exactly 3 comedy movies from 2010 onward from Kino's catalog.
Then follow up with:
I didn't like them, please discover different ones.
And Kino can continue from the current thread instead of treating the second turn like a brand-new request.
That means it carries forward the latest search constraints, such as:
- Genre
- Title type
- Year bounds
And when the user explicitly rejects the previous picks, it excludes those exact title IDs from the next search.
The result is not “AI memory” in the broad sense.
It is short-term conversational memory used for follow-up turns.
What this memory actually is
In plain English, the memory works like this:
- The user makes an initial grounded discovery request.
- The agent turns that into structured search arguments.
- The thread keeps the search context.
- A follow-up turn can refine that context instead of rebuilding it from scratch.
That is enough to support useful follow-ups like:
- I didn't like them, please discover different ones.
- Movie only.
- Make it older.
The two turns look like this:
Turn 1
flowchart LR
U1[Turn 1 request] --> L1[LLM]
L1 --> S1[Catalog search]
S1 --> R1[Grounded titles]
Turn 2
flowchart LR
U2[Turn 2 follow-up] --> C[Short-term conversation context]
R1[Turn 1 titles] --> C
C --> L2[LLM]
L2 --> S2[Search + exclusions]
S2 --> R2[Different grounded titles]
All of this happens without turning the system into a long-term preference engine.
That distinction is important.
I did not want the project to jump straight into:
- Inferred taste profiles
- Cross-session memory
- Hidden preference ranking
- Generic “the AI remembers you” product language
Kino is still a grounded discovery system that uses a structured catalog.
The memory feature had to match that architecture.
In Kino, that short-term memory stays inside the current thread instead of being saved as long-term preference memory.
The practical part
The best way to explain the feature is to show the actual flow.
Turn 1
The first turn asks for:
Discover exactly 3 comedy movies from 2010 onward from Kino's catalog.
Kino returns grounded titles such as:
- The Wandering Soap Opera
- A Thin Life
- Joe Finds Grace
And the structured response now exposes an activeContext object that shows the effective search context used for that turn.
That context looked like this:
genres=["Comedy"]
titleType="movie"
minYear=2010
excludedTitleIds=[]
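In JSON terms, a Turn 1 structured response could look roughly like this. The activeContext field names come from the article; the surrounding response envelope is my assumption, sketched for illustration:

```python
import json

# Hypothetical structured response for Turn 1. Only the activeContext
# field names are taken from Kino's real output; the envelope shape
# around it is illustrative.
response = {
    "titles": ["The Wandering Soap Opera", "A Thin Life", "Joe Finds Grace"],
    "activeContext": {
        "genres": ["Comedy"],
        "titleType": "movie",
        "minYear": 2010,
        "excludedTitleIds": [],
    },
}

print(json.dumps(response["activeContext"], indent=2))
```

Because the context is plain data, it can be logged, diffed between turns, and shown in a demo without any interpretation layer.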
Turn 2
Then the user says:
I didn't like them, please discover different ones.
That is where the new behavior becomes useful.
Kino carries forward the current discovery context, increases the search window, and passes exclude_ids with the titles it already showed.
So the second search still uses the same structured constraints, but now it looks for fresh candidates inside the same limited search scope.
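A minimal sketch of how those follow-up search arguments could be assembled. The build_followup_args helper and exact field names are hypothetical, and tripling the limit is just one example of widening the search window:

```python
def build_followup_args(ctx: dict, seen_ids: list[str], base_limit: int = 3) -> dict:
    # Carry forward the structured constraints from the current thread,
    # add the already-shown titles as exclusions, and widen the fetch
    # window so exclusions don't starve the result set.
    return {
        **ctx,  # genre / title type / year bounds survive unchanged
        "exclude_ids": ctx.get("exclude_ids", []) + seen_ids,
        "limit": base_limit * 3,  # illustrative widening factor
    }

turn1_ctx = {"genres": ["Comedy"], "title_type": "movie", "min_year": 2010}
args = build_followup_args(turn1_ctx, seen_ids=["tt101", "tt102", "tt103"])
```

The key property is that the second search call is still fully structured: nothing about the user's taste is inferred, only the explicit rejection is encoded.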
The follow-up results included:
- Blood Type
- Foodfight!
- Return to Babylon
That is a much better user experience than forcing the user to restate the entire request every turn.
Why the model choice mattered
This feature also reinforced something practical for me: the quality did not come only from the model.
I tested this flow with Gemini 3.1 Flash Lite, which Google positioned in March 2026 as its fastest and most cost-efficient Gemini 3 series model for high-volume workloads. That fit this feature well.
It gave me a few clear advantages:
- Lower cost
- Lower latency
- Faster iteration while testing prompts, traces, and edge cases
More importantly, it was a useful architectural signal.
If this feature only worked on a much larger model, I would trust the design less. But when a lighter model can handle the follow-up turn, carry forward the right search constraints, and trigger the right tool call, it usually means the system design is doing more of the real work.
Why I kept it narrow
There is a strong temptation to call any multi-turn behavior “memory” and stop there.
I think that is where a lot of agent features become unclear.
Useful memory is not just about remembering more.
It is about remembering the right thing at the right scope.
For Kino, that scope is the current discovery thread.
That is why this feature is:
- Thread-scoped
- Explicit
- Inspectable
- Grounded in structured search arguments
And not:
- Long-term user profiles
- Hidden ranking bias
- A broad semantic memory layer
I would rather have a smaller memory feature that is easy to reason about than a bigger one that sounds impressive but is hard to trust.
What had to be true for this to be worth keeping
The implementation only became useful after I enforced a few correctness rules.
The latest turn has to win
If a follow-up turn is the newest turn, it has to be the one that counts.
That includes cases where the newest search returns fewer results, no results, or even an upstream error.
If Kino reused older successful results after a newer follow-up turn, the memory would feel fake.
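One way to enforce that rule is to always read the newest entry in the thread's turn history, with no fallback to an older success. This is a sketch under that assumption, not Kino's implementation:

```python
def current_result(turn_results: list[dict]) -> dict:
    # turn_results is ordered oldest -> newest. The newest turn always
    # wins, even if it has fewer results, no results, or an error;
    # silently reusing an older success would make the memory feel fake.
    return turn_results[-1]

history = [
    {"turn": 1, "titles": ["A", "B", "C"], "error": None},
    {"turn": 2, "titles": [], "error": "upstream timeout"},
]
latest = current_result(history)
```

Surfacing the empty or failed newest turn is deliberate: the user just asked for something different, so showing the old list again would contradict them.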
Rejection has to lead to fresh candidates, not hidden duplicates
It was not enough to filter repeated titles out at the very end.
The agent had to keep paging through the grounded catalog results until it found unseen candidates or ran out of pages.
Otherwise the result window could still be burned on already-seen titles.
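That paging rule can be sketched as a loop that keeps fetching grounded result pages until enough unseen titles are found or the pages run out. The fetch_page callback and fresh_candidates name are hypothetical:

```python
from typing import Callable

def fresh_candidates(
    fetch_page: Callable[[int], list[str]],  # returns title IDs for a page
    excluded: set[str],
    want: int,
    max_pages: int = 10,
) -> list[str]:
    # Filtering only the final page could burn the whole result window
    # on already-seen titles, so keep paging until we have enough unseen
    # candidates or the catalog runs out of pages.
    out: list[str] = []
    for page in range(max_pages):
        batch = fetch_page(page)
        if not batch:
            break
        out.extend(t for t in batch if t not in excluded and t not in out)
        if len(out) >= want:
            break
    return out[:want]

# Page 0 is entirely rejected titles; the loop must reach page 1.
pages = [["tt1", "tt2", "tt3"], ["tt4", "tt5", "tt6"]]
picks = fresh_candidates(
    lambda p: pages[p] if p < len(pages) else [],
    excluded={"tt1", "tt2", "tt3"},
    want=3,
)
```

In the worked example, the first page contributes nothing, so the loop pages forward instead of returning an undersized result.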
Informational turns should stay informational
If a user asks a no-tool question like “How does this work?”, the structured response should not pretend that a search failed.
That sounds obvious, but it is exactly the kind of detail that makes an agent feel either solid or sloppy.
Why activeContext matters
One of the most useful additions was exposing activeContext in the structured response.
That object is not a magic memory profile.
It is a simple, inspectable view of the search context Kino used for the current turn.
That makes the feature easier to debug, easier to demo, and easier to explain.
It also keeps the project honest.
Instead of saying “the agent remembers,” I can show exactly what the current turn carried forward:
- Free text
- Genres
- Title type
- Year bounds
- Adult flag
- Excluded title IDs
That is much better than hiding the memory behavior behind vague claims.
Reality Check
This is useful memory, but it is still intentionally limited.
It is not:
- A personal recommendation engine
- Durable taste memory
- A profile store
- A general-purpose long-term memory system
And that is fine.
For this project, the right first memory feature was not “remember everything.”
It was “help the next turn make sense.”
That is a much stronger starting point.
The takeaway
If you are adding memory to a small agent, I think there is a good lesson here:
Start with follow-up memory before you start talking about preferences.
Short-term memory in one session is often enough to make the system feel much more capable.
And if it is grounded, inspectable, and narrow in scope, it is also much easier to trust.
That is where Kino is now.
Not a giant memory system.
Just a better second turn.
