Adding Session Memory Without Building a Preference Engine
I just added another feature to Kino, my educational, cinema-themed distributed systems project. The LangGraph-based agent service is one part of that platform.
This time the feature was memory.
But not the vague, hyped kind.
I did not want Kino to pretend it knows a user's taste forever.
I wanted something much narrower and much more useful:
Short-term conversational memory for follow-up turns that helps the agent continue the current search without pretending to know more than it actually does
That decision mattered a lot.
It kept the feature small enough to trust, but still visible enough to feel like real agent behavior.
What changed
The new capability is simple to describe.
A user can start with:
Discover exactly 3 comedy movies from 2010 onward from Kino's catalog.
Then follow up with:
I didn't like them, please discover different ones.
And Kino can continue from the current thread instead of treating the second turn like a brand-new request.
That means it carries forward the latest search constraints, such as:
- Genre
- Title type
- Year bounds
And when the user explicitly rejects the previous picks, it excludes those exact title IDs from the next search.
The result is not “AI memory” in the broad sense.
It is short-term conversational memory used for follow-up turns.
What this memory actually is
In plain English, the memory works like this:
- The user makes an initial grounded discovery request.
- The agent turns that into structured search arguments.
- The thread keeps the search context.
- A follow-up turn can refine that context instead of rebuilding it from scratch.
That is enough to support useful follow-ups like:
- I didn't like them, please discover different ones.
- Movie only.
- Make it older.
The two turns look like this:
Turn 1
flowchart LR
U1[Turn 1 request] --> L1[LLM]
L1 --> S1[Catalog search]
S1 --> R1[Grounded titles]
Turn 2
flowchart LR
U2[Turn 2 follow-up] --> C[Short-term conversation context]
R1[Turn 1 titles] --> C
C --> L2[LLM]
L2 --> S2[Search + exclusions]
S2 --> R2[Different grounded titles]
All of this happens without turning the system into a long-term preference engine.
That distinction is important.
I did not want the project to jump straight into:
- Inferred taste profiles
- Cross-session memory
- Hidden preference ranking
- Generic “the AI remembers you” product language
Kino is still a grounded discovery system that uses a structured catalog.
The memory feature had to match that architecture.
In Kino, that short-term memory stays inside the current thread instead of being saved as long-term preference memory.
The practical part
The best way to explain the feature is to show the actual flow.
Turn 1
The first turn asks for:
Discover exactly 3 comedy movies from 2010 onward from Kino's catalog.
Kino returns grounded titles such as:
- The Wandering Soap Opera
- A Thin Life
- Joe Finds Grace
And the structured response now exposes an activeContext object that shows the effective search context used for that turn.
That context looked like this:
genres=["Comedy"]
titleType="movie"
minYear=2010
excludedTitleIds=[]
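In JSON terms, a Turn 1 structured response could look roughly like this. The activeContext field names come from the article; the surrounding response envelope is my assumption, sketched for illustration:

```python
import json

# Hypothetical structured response for Turn 1. Only the activeContext
# field names are taken from Kino's real output; the envelope shape
# around it is illustrative.
response = {
    "titles": ["The Wandering Soap Opera", "A Thin Life", "Joe Finds Grace"],
    "activeContext": {
        "genres": ["Comedy"],
        "titleType": "movie",
        "minYear": 2010,
        "excludedTitleIds": [],
    },
}

print(json.dumps(response["activeContext"], indent=2))
```

Because the context is plain data, it can be logged, diffed between turns, and shown in a demo without any interpretation layer.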
Turn 2
Then the user says:
I didn't like them, please discover different ones.
That is where the new behavior becomes useful.
Kino carries forward the current discovery context, increases the search window, and passes exclude_ids with the titles it already showed.
So the second search still uses the same structured constraints, but now it looks for fresh candidates inside the same limited search scope.
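A minimal sketch of how those follow-up search arguments could be assembled. The build_followup_args helper and exact field names are hypothetical, and tripling the limit is just one example of widening the search window:

```python
def build_followup_args(ctx: dict, seen_ids: list[str], base_limit: int = 3) -> dict:
    # Carry forward the structured constraints from the current thread,
    # add the already-shown titles as exclusions, and widen the fetch
    # window so exclusions don't starve the result set.
    return {
        **ctx,  # genre / title type / year bounds survive unchanged
        "exclude_ids": ctx.get("exclude_ids", []) + seen_ids,
        "limit": base_limit * 3,  # illustrative widening factor
    }

turn1_ctx = {"genres": ["Comedy"], "title_type": "movie", "min_year": 2010}
args = build_followup_args(turn1_ctx, seen_ids=["tt101", "tt102", "tt103"])
```

The key property is that the second search call is still fully structured: nothing about the user's taste is inferred, only the explicit rejection is encoded.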
The follow-up results included:
- Blood Type
- Foodfight!
- Return to Babylon
That is a much better user experience than forcing the user to restate the entire request every turn.
Why the model choice mattered
This feature also reinforced something practical for me: the quality did not come only from the model.
I tested this flow with Gemini 3.1 Flash Lite, which Google positioned in March 2026 as its fastest and most cost-efficient Gemini 3 series model for high-volume workloads. That fit this feature well.
It gave me a few clear advantages:
- Lower cost
- Lower latency
- Faster iteration while testing prompts, traces, and edge cases
More importantly, it was a useful architectural signal.
If this feature only worked on a much larger model, I would trust the design less. But when a lighter model can handle the follow-up turn, carry forward the right search constraints, and trigger the right tool call, it usually means the system design is doing more of the real work.
Why I kept it narrow
There is a strong temptation to call any multi-turn behavior “memory” and stop there.
I think that is where a lot of agent features become unclear.
Useful memory is not just about remembering more.
It is about remembering the right thing at the right scope.
For Kino, that scope is the current discovery thread.
That is why this feature is:
- Thread-scoped
- Explicit
- Inspectable
- Grounded in structured search arguments
And not:
- Long-term user profiles
- Hidden ranking bias
- A broad semantic memory layer
I would rather have a smaller memory feature that is easy to reason about than a bigger one that sounds impressive but is hard to trust.
What had to be true for this to be worth keeping
The implementation only became useful after I enforced a few correctness rules.
The latest turn has to win
If a follow-up turn is the newest turn, it has to be the one that counts.
That includes cases where the newest search returns fewer results, no results, or even an upstream error.
If Kino reused older successful results after a newer follow-up turn, the memory would feel fake.
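One way to enforce that rule is to always read the newest entry in the thread's turn history, with no fallback to an older success. This is a sketch under that assumption, not Kino's implementation:

```python
def current_result(turn_results: list[dict]) -> dict:
    # turn_results is ordered oldest -> newest. The newest turn always
    # wins, even if it has fewer results, no results, or an error;
    # silently reusing an older success would make the memory feel fake.
    return turn_results[-1]

history = [
    {"turn": 1, "titles": ["A", "B", "C"], "error": None},
    {"turn": 2, "titles": [], "error": "upstream timeout"},
]
latest = current_result(history)
```

Surfacing the empty or failed newest turn is deliberate: the user just asked for something different, so showing the old list again would contradict them.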
Rejection has to lead to fresh candidates, not hidden duplicates
It was not enough to filter repeated titles out at the very end.
The agent had to keep paging through the grounded catalog results until it found unseen candidates or ran out of pages.
Otherwise the result window could still be burned on already-seen titles.
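That paging rule can be sketched as a loop that keeps fetching grounded result pages until enough unseen titles are found or the pages run out. The fetch_page callback and fresh_candidates name are hypothetical:

```python
from typing import Callable

def fresh_candidates(
    fetch_page: Callable[[int], list[str]],  # returns title IDs for a page
    excluded: set[str],
    want: int,
    max_pages: int = 10,
) -> list[str]:
    # Filtering only the final page could burn the whole result window
    # on already-seen titles, so keep paging until we have enough unseen
    # candidates or the catalog runs out of pages.
    out: list[str] = []
    for page in range(max_pages):
        batch = fetch_page(page)
        if not batch:
            break
        out.extend(t for t in batch if t not in excluded and t not in out)
        if len(out) >= want:
            break
    return out[:want]

# Page 0 is entirely rejected titles; the loop must reach page 1.
pages = [["tt1", "tt2", "tt3"], ["tt4", "tt5", "tt6"]]
picks = fresh_candidates(
    lambda p: pages[p] if p < len(pages) else [],
    excluded={"tt1", "tt2", "tt3"},
    want=3,
)
```

In the worked example, the first page contributes nothing, so the loop pages forward instead of returning an undersized result.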
Informational turns should stay informational
If a user asks a no-tool question like “How does this work?”, the structured response should not pretend that a search failed.
That sounds obvious, but it is exactly the kind of detail that makes an agent feel either solid or sloppy.
Why activeContext matters
One of the most useful additions was exposing activeContext in the structured response.
That object is not a magic memory profile.
It is a simple, inspectable view of the search context Kino used for the current turn.
That makes the feature easier to debug, easier to demo, and easier to explain.
It also keeps the project honest.
Instead of saying “the agent remembers,” I can show exactly what the current turn carried forward:
- Free text
- Genres
- Title type
- Year bounds
- Adult flag
- Excluded title IDs
That is much better than hiding the memory behavior behind vague claims.
Reality Check
This is useful memory, but it is still intentionally limited.
It is not:
- A personal recommendation engine
- Durable taste memory
- A profile store
- A general-purpose long-term memory system
And that is fine.
For this project, the right first memory feature was not “remember everything.”
It was “help the next turn make sense.”
That is a much stronger starting point.
The takeaway
If you are adding memory to a small agent, I think there is a good lesson here:
Start with follow-up memory before you start talking about preferences.
Short-term memory in one session is often enough to make the system feel much more capable.
And if it is grounded, inspectable, and narrow in scope, it is also much easier to trust.
That is where Kino is now.
Not a giant memory system.
Just a better second turn.
