Overview
The Dual Memory System (DMS) powers conversation continuity by combining two complementary memory types:
- Short Term Memory: The most recent messages the AI can see directly in its context window
- Long Term Memory: Relevant memories automatically retrieved from your entire chat history
{{user}} likes pasta). These memories are what the system retrieves during generation.
DMS runs automatically in the background - no setup or manual management required.
Why it matters
- Continuity: Keeps plots, relationships, and facts consistent over long sessions
- Relevance: Surfaces the right past details at the right time
- Zero setup: Works out of the box; you focus on the story
How DMS works (at a glance)
1. Ingest recent messages: DMS collects the latest part of the conversation to form Short Term Memory.
2. Select correct swipes: For each message with swipes, DMS picks the appropriate one for context.
3. Retrieve relevant history: When helpful, DMS searches your entire chat for semantically relevant memories.
4. Compose final context: Short Term Memory + retrieved Long Term memories are provided to the AI for generation.
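The four steps above can be pictured with a small Python sketch. This is an illustration only, not the actual DMS implementation: the `Message` class, `retrieve_relevant`, `compose_context`, the word-overlap scoring (a stand-in for real semantic search), and the choice to place memories before recent messages are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One chat turn; `swipes` holds its alternative versions (hypothetical model)."""
    swipes: list[str]
    selected: int = 0  # index of the swipe that was kept

    @property
    def text(self) -> str:
        return self.swipes[self.selected]


def retrieve_relevant(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    """Stand-in for semantic search: rank memories by words shared with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(m.lower().split())), m) for m in memories]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [m for score, m in ranked[:top_k] if score > 0]


def compose_context(history: list[Message], memories: list[str], window: int = 3) -> list[str]:
    # Steps 1-2: take the most recent messages, using each one's selected swipe.
    short_term = [m.text for m in history[-window:]]
    # Step 3: look up memories related to the latest message.
    long_term = retrieve_relevant(short_term[-1], memories) if short_term else []
    # Step 4 (assumed ordering): retrieved memories first, then the recent conversation.
    return [f"[memory] {m}" for m in long_term] + short_term
```

For instance, with a memory such as "user likes pasta for dinner" stored and a latest message asking what to cook for dinner, the sketch would place that memory ahead of the recent messages, while an unrelated memory would be left out.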
Memory Details
- Short Term
- Long Term
- Memories
Short Term Memory is the direct, visible portion of the conversation the AI can access.
Trade-offs
- Very large windows can dilute what is most important on smaller models
- Messages outside the window are not directly visible
- Larger windows can increase latency and cost on lower-end models
How it works
- DMS gathers the latest messages
- Selects the correct swipes
- Sends the last X messages within your token limit to the AI
Standard Short Term Limits by Tier
| Tier | Tokens |
|---|---|
| Free | 8,000 tokens |
| Premium Tier 1 | 10,000 tokens |
| Premium Tier 2 | 14,000 tokens |
| Premium Tier 3 | 18,000 tokens |
Higher token limits allow longer recent context without relying on retrieval.
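As a rough sketch of how "the last X messages within your token limit" could be chosen, the Python below walks backwards from the newest message until the tier's budget is used up. The limits come from the table above; everything else is an assumption for illustration, including the names `TIER_LIMITS`, `estimate_tokens`, and `short_term_window`, and the 4-characters-per-token heuristic, which real tokenizers will not match exactly.

```python
# Short Term limits per tier, taken from the table above.
TIER_LIMITS = {
    "Free": 8_000,
    "Premium Tier 1": 10_000,
    "Premium Tier 2": 14_000,
    "Premium Tier 3": 18_000,
}


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); real tokenizers differ."""
    return max(1, len(text) // 4)


def short_term_window(messages: list[str], tier: str) -> list[str]:
    """Return the newest messages that fit within the tier's token limit."""
    budget = TIER_LIMITS[tier]
    kept: list[str] = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > budget:
            break  # this message does not fit; older ones are dropped too
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Under these assumptions, the same chat keeps more of its recent history on a higher tier: messages that fall outside the Free window may still fit within the Premium Tier 2 budget.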