Overview

The Dual Memory System (DMS) powers conversation continuity by combining two complementary memory types:
  • Short Term Memory: The most recent messages the AI can see directly in its context window
  • Long Term Memory: Relevant memories automatically retrieved from your entire chat history
Long Term Memory is powered by compact memories - short factual statements extracted from your chat (for example: {{user}} likes pasta). These memories are what the system retrieves during generation.
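As a rough illustration, a compact memory can be pictured as a single short statement stored with a placeholder for the user. The sketch below is not DMS's actual storage format: the `CompactMemory` class, its `render` method, and the idea that `{{user}}` is replaced with a display name when the memory is shown are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactMemory:
    """Hypothetical container for one short factual statement."""

    text: str  # e.g. "{{user}} likes pasta"

    def render(self, user_name: str) -> str:
        # Assumption: the {{user}} placeholder stands for the user's name.
        return self.text.replace("{{user}}", user_name)


memory = CompactMemory("{{user}} likes pasta")
print(memory.render("Alex"))
```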
DMS runs automatically in the background - no setup or manual management required.

Why it matters

  • Continuity: Keeps plots, relationships, and facts consistent over long sessions
  • Relevance: Surfaces the right past details at the right time
  • Zero setup: Works out of the box; you focus on the story

How DMS works (at a glance)

1. Ingest recent messages
DMS collects the latest part of the conversation to form Short Term Memory.

2. Select correct swipes
For each message with swipes, DMS picks the appropriate one for context.

3. Retrieve relevant history
When helpful, DMS searches your entire chat for semantically relevant memories.

4. Compose final context
Short Term Memory + retrieved Long Term memories are provided to the AI for generation.
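
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how DMS is implemented: the `Message` class, its `selected` field, and the function names are hypothetical, and the word-overlap ranking is a simple stand-in for the semantic search the real system performs.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    swipes: list[str]   # alternative versions of one message
    selected: int = 0   # hypothetical: index of the swipe to use


def words(text: str) -> set[str]:
    # Lowercased alphabetic words; a crude substitute for embeddings.
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(memories: list[str], query: str, top_k: int = 2) -> list[str]:
    # Step 3 (stand-in): rank memories by how many words they share
    # with the query, dropping memories with no overlap at all.
    query_words = words(query)
    scored = [(len(query_words & words(m)), m) for m in memories]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)
    return [m for _, m in ranked[:top_k]]


def compose_context(messages: list[Message], memories: list[str],
                    window: int = 3) -> dict[str, list[str]]:
    # Steps 1 and 2: take the latest messages and the chosen swipe of each.
    recent = [m.swipes[m.selected] for m in messages[-window:]]
    # Step 3: look up memories related to the recent conversation.
    relevant = retrieve(memories, " ".join(recent))
    # Step 4: both memory types together form the final context.
    return {"long_term": relevant, "short_term": recent}


chat = [
    Message(["Hello"]),
    Message(["I made rice", "I made soup"], selected=1),
    Message(["What pasta should I cook?"]),
]
print(compose_context(chat, ["{{user}} likes pasta", "{{user}} owns a cat"], window=2))
```

Here the message count `window` stands in for the token limit that bounds Short Term Memory in practice.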

Memory Details

Short Term Memory is the direct, visible portion of the conversation the AI can access.
Trade-offs
  • Very large windows can dilute what is most important on smaller models
  • Messages outside the window are not directly visible
  • Larger windows can increase latency and cost on lower-end models
How it’s filled
  • DMS gathers the latest messages
  • Selects the correct swipes
  • Sends as many of the most recent messages as fit within your token limit to the AI
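
A minimal sketch of the filling logic, assuming messages are added newest-first until the token limit would be exceeded. The function names are hypothetical, and `count_tokens` simply counts whitespace-separated words; a real tokenizer would give different numbers.

```python
def count_tokens(text: str) -> int:
    # Rough stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fill_short_term(messages: list[str], token_limit: int) -> list[str]:
    """Keep the most recent messages that fit within token_limit."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message.
    for text in reversed(messages):
        cost = count_tokens(text)
        if used + cost > token_limit:
            break  # this message and everything older stays outside the window
        kept.append(text)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a b c", "d e", "f g h i", "j"]
print(fill_short_term(history, token_limit=6))
```

With a limit of 6, only the last two messages fit (4 + 1 tokens); the older ones fall outside the window and are no longer directly visible.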

Standard Short Term Limits by Tier

Tier              Tokens
Free              8,000 tokens
Premium Tier 1    10,000 tokens
Premium Tier 2    14,000 tokens
Premium Tier 3    18,000 tokens
Higher token limits allow longer recent context without relying on retrieval.
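
For quick comparison, the limits can be written as a lookup table, reading the tiers as Free (8,000), Premium Tier 1 (10,000), Premium Tier 2 (14,000), and Premium Tier 3 (18,000). The dictionary and helper names below are made up for this example and do not correspond to any DMS setting or API.

```python
# Standard Short Term limits by tier, in tokens.
SHORT_TERM_LIMITS = {
    "Free": 8_000,
    "Premium Tier 1": 10_000,
    "Premium Tier 2": 14_000,
    "Premium Tier 3": 18_000,
}


def extra_tokens_over_free(tier: str) -> int:
    # How much more recent context a tier allows compared with Free.
    return SHORT_TERM_LIMITS[tier] - SHORT_TERM_LIMITS["Free"]


for tier, limit in SHORT_TERM_LIMITS.items():
    print(f"{tier}: {limit:,} tokens (+{extra_tokens_over_free(tier):,} over Free)")
```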