When we started building AIThreads, we had a simple goal: create AI agents that respond to customer emails automatically. The first version worked great—until customers replied.

Customer: "Hi, I'm having trouble with my order #12345"
AI: "I'd be happy to help! Could you provide your order number?"

The AI had no memory. Every email was a blank slate. Customers had to repeat themselves, and our AI looked... well, dumb.

We needed memory. Not just for the current conversation, but across all conversations with the same customer. This is the story of how we built it.

The Problem with Stateless AI

Most AI email systems work like this:

┌─────────────────────────────────────────────────────────────┐
│                    STATELESS AI FLOW                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   📧 Email      ➜    🔄 Rebuild      ➜    🤖 Generate      │
│   Arrives           Context              Reply              │
│                     from scratch                            │
│                          │                                  │
│                          ▼                                  │
│                    ❌ No Memory                             │
│                    ❌ No History                            │
│                    ❌ No Learning                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The context is rebuilt from scratch every time. This creates several problems:

Repetition fatigue. Customers say things like "As I mentioned in my last email..." but the AI has no idea what they mentioned.

Lost context. A customer upgrades to the Pro plan, but the AI still asks about their free trial two weeks later.

No learning. The same customer reports the same bug five times, and the AI treats each report as a brand new issue.

Inconsistent experience. Each reply feels like it comes from a different person because there's no continuity.

What we wanted was different:

┌─────────────────────────────────────────────────────────────┐
│                    STATEFUL AI FLOW                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   📧 Email      ➜    🧠 Load         ➜    🤖 Generate      │
│   Arrives           Memory               Reply              │
│                        │                    │               │
│                        ▼                    ▼               │
│              ┌─────────────────┐    ┌─────────────┐        │
│              │ Customer Facts  │    │   Update    │        │
│              │ Thread State    │    │   Memory    │        │
│              │ Past Context    │    │             │        │
│              └─────────────────┘    └─────────────┘        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Our Three-Layer Memory Architecture

After researching systems like Mem0, LangMem, and various RAG architectures, we designed a three-layer system. Each layer serves a different purpose and operates at a different timescale.

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   ┌─────────────────────────────────────────────────────┐  │
│   │  LAYER 3: PARTICIPANT MEMORY                        │  │
│   │  ────────────────────────────────────────────────── │  │
│   │  • Discrete facts about customers                   │  │
│   │  • Persists across ALL conversations                │  │
│   │  • Semantic search with pgvector                    │  │
│   │  • "John is on Enterprise plan"                     │  │
│   └─────────────────────────────────────────────────────┘  │
│                           ▲                                 │
│                           │                                 │
│   ┌─────────────────────────────────────────────────────┐  │
│   │  LAYER 2: THREAD SUMMARY                            │  │
│   │  ────────────────────────────────────────────────── │  │
│   │  • Rolling AI-generated summary                     │  │
│   │  • Scoped to current conversation                   │  │
│   │  • Cached in Redis, stored in Postgres              │  │
│   │  • Max 200 words, updated incrementally             │  │
│   └─────────────────────────────────────────────────────┘  │
│                           ▲                                 │
│                           │                                 │
│   ┌─────────────────────────────────────────────────────┐  │
│   │  LAYER 1: THREAD STATE                              │  │
│   │  ────────────────────────────────────────────────── │  │
│   │  • Explicit state machine (new → resolved)          │  │
│   │  • Workflow tracking & SLA deadlines                │  │
│   │  • Real-time updates in PostgreSQL                  │  │
│   │  • AI actions & goals logged                        │  │
│   └─────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Layer 1: Thread State sits at the bottom. It's an explicit state machine that tracks where each conversation stands in its workflow. Is this a new thread? Are we waiting for the customer to respond? Has it been escalated to a human? This layer uses PostgreSQL and updates in real time.

Layer 2: Thread Summary sits in the middle. It's a rolling, AI-generated summary of the current conversation that evolves as emails are exchanged. Instead of feeding the AI 50 raw emails, we give it a 200-word summary of what's happened. This layer uses PostgreSQL for persistence and Redis for caching.

Layer 3: Participant Memory sits at the top. This is our Mem0-like system that stores facts about customers across all their conversations. It's not a summary—it's discrete, searchable pieces of knowledge. "John is on the Enterprise plan." "His company is Acme Corp." "He had a billing dispute in December that we resolved." This layer uses PostgreSQL with pgvector for semantic search.

Layer 1: Thread State Machine

Every conversation has a workflow, and making that workflow explicit prevents the AI from getting confused about where things stand.

THREAD STATE MACHINE
    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │                      ┌─────────┐                       │
    │           ┌─────────▶│  NEW    │                       │
    │           │          └────┬────┘                       │
    │           │               │                            │
    │           │               ▼                            │
    │           │        ┌────────────┐                      │
    │           │        │IN_PROGRESS │◀─────────┐           │
    │           │        └─────┬──────┘          │           │
    │           │              │                 │           │
    │           │     ┌────────┴────────┐        │           │
    │           │     ▼                 ▼        │           │
    │      ┌──────────────┐      ┌───────────┐   │           │
    │      │AWAITING_REPLY│      │ ESCALATED │───┘           │
    │      └──────┬───────┘      └─────┬─────┘               │
    │             │                    │                     │
    │             │    ┌───────────────┘                     │
    │             ▼    ▼                                     │
    │         ┌───────────┐                                  │
    │         │ RESOLVED  │                                  │
    │         └─────┬─────┘                                  │
    │               │                                        │
    │               ▼                                        │
    │          ┌─────────┐                                   │
    │          │ CLOSED  │                                   │
    │          └─────────┘                                   │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

We defined six states a thread can be in: new (just received, not yet processed), in_progress (AI is actively working on it), awaiting_reply (we've responded and are waiting for the customer), escalated (handed off to a human agent), resolved (issue is complete), and closed (no further action needed).

The key insight is that not all transitions are valid. You can't go directly from "new" to "closed" without passing through the work states. The AI also can't mark an escalated thread "resolved" on its own: the human agent either resolves it or hands it back first. We enforce these rules in code, which prevents the AI from making nonsensical state changes.
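A minimal Python sketch of enforcing these rules with a transition table. The allowed transitions are our reading of the diagram above, and `ThreadState`, `ALLOWED`, `HUMAN_ONLY`, and `can_transition` are hypothetical names, not AIThreads code. We also assume both ways out of `escalated` are reserved for a human.

```python
from enum import Enum


class ThreadState(str, Enum):
    NEW = "new"
    IN_PROGRESS = "in_progress"
    AWAITING_REPLY = "awaiting_reply"
    ESCALATED = "escalated"
    RESOLVED = "resolved"
    CLOSED = "closed"


S = ThreadState

# Allowed transitions as read from the diagram (assumption, not AIThreads code).
ALLOWED: dict[ThreadState, set[ThreadState]] = {
    S.NEW: {S.IN_PROGRESS},
    S.IN_PROGRESS: {S.AWAITING_REPLY, S.ESCALATED},
    S.AWAITING_REPLY: {S.NEW, S.RESOLVED},  # a customer reply re-queues the thread
    S.ESCALATED: {S.IN_PROGRESS, S.RESOLVED},
    S.RESOLVED: {S.CLOSED},
    S.CLOSED: set(),
}

# Hypothetical rule: only a human may resolve an escalated thread or hand it back.
HUMAN_ONLY = {(S.ESCALATED, S.RESOLVED), (S.ESCALATED, S.IN_PROGRESS)}


def can_transition(current: ThreadState, target: ThreadState, actor: str) -> bool:
    """Return True if actor ("ai", "human" or "system") may move current -> target."""
    if target not in ALLOWED[current]:
        return False
    if (current, target) in HUMAN_ONLY and actor != "human":
        return False
    return True
```

Keeping the rules as data rather than scattered `if` statements makes them easy to audit, and a rejected transition can be reported back to the AI as a tool error.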

Every state transition gets logged with a timestamp, what triggered it (AI, human, or system), and a reason. This audit trail is invaluable for debugging and for understanding how conversations flow through the system.

The AI can update thread state using a tool call. When it sends a reply that asks the customer a question, it can simultaneously set the state to "awaiting_reply" and add a goal like "waiting for customer to confirm shipping address."
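The logging and tool-call behavior in the last two paragraphs can be sketched as one handler. `update_thread_state`, the `Thread` dataclass, and the log-entry fields are hypothetical; the transition table repeats our reading of the diagram so the block stands alone.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Transition table as read from the state diagram (assumption).
ALLOWED = {
    "new": {"in_progress"},
    "in_progress": {"awaiting_reply", "escalated"},
    "awaiting_reply": {"new", "resolved"},
    "escalated": {"in_progress", "resolved"},
    "resolved": {"closed"},
    "closed": set(),
}


@dataclass
class Thread:
    state: str = "new"
    goals: list[str] = field(default_factory=list)
    log: list[dict] = field(default_factory=list)  # audit trail of transitions


def update_thread_state(thread: Thread, new_state: str, triggered_by: str,
                        reason: str, goal: Optional[str] = None) -> None:
    """Hypothetical tool handler: validate, record, then apply a state change."""
    if new_state not in ALLOWED[thread.state]:
        raise ValueError(f"invalid transition {thread.state} -> {new_state}")
    thread.log.append({
        "from": thread.state,
        "to": new_state,
        "triggered_by": triggered_by,  # "ai", "human" or "system"
        "reason": reason,
        "at": datetime.now(timezone.utc),
    })
    thread.state = new_state
    if goal:
        thread.goals.append(goal)
```

Called with `new_state="awaiting_reply"` and `goal="waiting for customer to confirm shipping address"`, the handler changes the state, records who did it and why, and stores the goal in one step.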

Layer 2: Thread Summaries

Raw email chains get long. Feeding 50 emails into context is expensive and noisy. Most of that content is greetings, signatures, and quoted replies. What the AI actually needs is a distilled understanding of what's happened.

ROLLING SUMMARY UPDATES
    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │   Email 1        Email 2        Email 3       Email N   │
    │      │              │              │             │      │
    │      ▼              ▼              ▼             ▼      │
    │  ┌───────┐     ┌─────────┐    ┌─────────┐   ┌───────┐  │
    │  │ Init  │ ──▶ │ Update  │ ─▶ │ Update  │─▶ │  ...  │  │
    │  │Summary│     │ +Email2 │    │ +Email3 │   │       │  │
    │  └───────┘     └─────────┘    └─────────┘   └───────┘  │
    │      │              │              │             │      │
    │      ▼              ▼              ▼             ▼      │
    │   50 words      80 words      120 words     200 words   │
    │                                              (max cap)  │
    │                                                         │
    │   ✓ O(1) per email, not O(n)                           │
    │   ✓ Recent events emphasized                            │
    │   ✓ Historical context preserved                        │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

Our approach is rolling summaries. When a thread is created, there's no summary. After the first email arrives, we generate an initial summary. After each subsequent email, we update the summary incrementally—we don't regenerate from scratch.

The update prompt is simple: here's the current summary, here are the new emails since the last update, produce an updated summary that incorporates the new information while staying under 200 words. Focus on what was requested, what was done, and what's still pending.
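The update prompt described above might be assembled like this. The exact wording and the `build_summary_update_prompt` name are assumptions; the point is that the input is only the capped summary plus the emails since the last update, never the whole thread.

```python
def build_summary_update_prompt(current_summary: str, new_emails: list[str],
                                max_words: int = 200) -> str:
    """Assemble an incremental summary-update prompt (hypothetical wording)."""
    emails = "\n\n".join(
        f"Email {i}:\n{body}" for i, body in enumerate(new_emails, start=1)
    )
    return (
        f"Current summary:\n{current_summary or '(none yet)'}\n\n"
        f"New emails since the last update:\n{emails}\n\n"
        f"Produce an updated summary that incorporates the new information "
        f"in under {max_words} words. Focus on what was requested, what was "
        f"done, and what is still pending."
    )
```

Passing an empty summary covers the first email with the same prompt, which is one way (an assumption on our part) to produce the initial summary.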

This approach has two benefits. First, it's efficient—we're only processing new emails, not re-reading the entire thread. Second, the summary naturally emphasizes recent events while still preserving important historical context.

We cache summaries in Redis with a one-hour TTL. Most email threads have bursts of activity followed by quiet periods, so caching dramatically reduces redundant AI calls.
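Assuming the summary of record lives in Postgres, a cache-aside read with the one-hour TTL might look like this. The key format and the `load_from_db` callback are hypothetical; `get(name)` and `set(name, value, ex=seconds)` match redis-py's client methods, so a `redis.Redis` instance fits the `Cache` protocol.

```python
from typing import Callable, Optional, Protocol

SUMMARY_TTL_SECONDS = 3600  # one-hour TTL


class Cache(Protocol):
    """The subset of a redis-py client this sketch relies on."""
    def get(self, name: str) -> Optional[bytes]: ...
    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...


def get_thread_summary(cache: Cache, load_from_db: Callable[[str], str],
                       thread_id: str) -> str:
    """Cache-aside read: try Redis first, else load from Postgres and repopulate."""
    key = f"thread:{thread_id}:summary"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        # redis-py returns bytes unless the client uses decode_responses=True.
        return cached.decode() if isinstance(cached, bytes) else cached
    summary = load_from_db(thread_id)
    cache.set(key, summary, ex=SUMMARY_TTL_SECONDS)
    return summary
```

During a burst of replies, only the first read in each hour reaches the database; the rest are served from Redis.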

The summary generation happens after the reply is sent, not before. We don't want to add latency to the response. The AI works with the previous summary, sends its reply, and then we update the summary asynchronously in the background.
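A minimal asyncio sketch of that ordering. `generate_reply`, `send`, and `summarize` are hypothetical stand-ins passed in as parameters; a production system might hand the update to a job queue instead of an in-process task.

```python
import asyncio
from typing import Awaitable, Callable


async def handle_email(
    thread: dict,
    email: str,
    generate_reply: Callable[[str, str], str],
    send: Callable[[str], Awaitable[None]],
    summarize: Callable[[str, list[str]], Awaitable[str]],
    background: set,
) -> str:
    """Reply first, then refresh the rolling summary off the critical path."""
    # The reply is generated from the summary as it stood before this email.
    reply = generate_reply(thread["summary"], email)
    await send(reply)  # the customer-facing latency ends here

    async def refresh() -> None:
        # Fold both the incoming email and our reply into the summary.
        thread["summary"] = await summarize(thread["summary"], [email, reply])

    task = asyncio.create_task(refresh())
    background.add(task)  # keep a reference so the task isn't garbage-collected
    task.add_done_callback(background.discard)
    return reply
```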

Layer 3: Participant Memory

This is where things get interesting. We want to remember facts about customers across all their conversations, even if they email different inboxes or their threads get closed and reopened months later.

The core insight is that we don't store entire conversations. We extract discrete facts.

FACT EXTRACTION FLOW
    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │  📧 "Hi, I'm John from Acme Corp. We just upgraded     │
    │      to Enterprise last week. Quick question about      │
    │      the API rate limits..."                            │
    │                            │                            │
    │                            ▼                            │
    │                    ┌───────────────┐                   │
    │                    │  AI Extracts  │                   │
    │                    │    Facts      │                   │
    │                    └───────┬───────┘                   │
    │                            │                            │
    │           ┌────────────────┼────────────────┐          │
    │           ▼                ▼                ▼          │
    │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │
    │   │ FACT 1      │  │ FACT 2      │  │ FACT 3      │   │
    │   │─────────────│  │─────────────│  │─────────────│   │
    │   │type: info   │  │type: info   │  │type: info   │   │
    │   │key: name    │  │key: company │  │key: plan    │   │
    │   │val: "John"  │  │val: "Acme"  │  │val: "Ent."  │   │
    │   │conf: 0.95   │  │conf: 0.90   │  │conf: 0.95   │   │
    │   └─────────────┘  └─────────────┘  └─────────────┘   │
    │           │                │                │          │
    │           └────────────────┼────────────────┘          │
    │                            ▼                            │
    │                   ┌─────────────────┐                  │
    │                   │    pgvector     │                  │
    │                   │   (semantic     │                  │
    │                   │    search)      │                  │
    │                   └─────────────────┘                  │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

When John emails us, we don't save "John had a long conversation about billing in December." Instead, we extract and store: "John works at Acme Corp." "John is on the Enterprise plan." "John had a billing dispute in December." "The billing dispute was resolved with a $50 credit."

Each fact has a type (customer info, order info, preference, issue history, contact info), a key (like "company" or "plan_tier"), a value, and a confidence score. Facts can also have expiration dates—"John is on vacation until January 15th" should automatically become irrelevant after that date.
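One way to model such a fact, with expiration built in. The five fact types come from the text above; the enum values, field names, and the inclusive expiry rule are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class FactType(str, Enum):
    CUSTOMER_INFO = "customer_info"
    ORDER_INFO = "order_info"
    PREFERENCE = "preference"
    ISSUE_HISTORY = "issue_history"
    CONTACT_INFO = "contact_info"


@dataclass(frozen=True)
class Fact:
    type: FactType
    key: str                            # e.g. "company", "plan_tier"
    value: str
    confidence: float                   # 0.0 to 1.0
    expires_on: Optional[date] = None   # None means the fact never expires

    def is_active(self, today: date) -> bool:
        """A fact counts as relevant through its expiry date (inclusive)."""
        return self.expires_on is None or today <= self.expires_on
```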

The AI extracts facts during email processing. When it reads an email that says "By the way, we just upgraded to your Enterprise plan last week," it calls a tool to store that fact. The extraction is opportunistic—we're not trying to parse every email exhaustively, just capture notable information as it appears naturally.
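Here is one way to expose extraction as a tool. The `store_fact` name, the JSON-Schema parameters, and the validator are hypothetical, and the envelope around a tool definition differs between model providers, so treat the outer keys as placeholders.

```python
# Hypothetical tool definition; "parameters" holds a JSON Schema for the arguments.
STORE_FACT_TOOL = {
    "name": "store_fact",
    "description": "Save a notable, durable fact about the email sender.",
    "parameters": {
        "type": "object",
        "properties": {
            "type": {"type": "string", "enum": ["customer_info", "order_info",
                                                "preference", "issue_history",
                                                "contact_info"]},
            "key": {"type": "string"},
            "value": {"type": "string"},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
            "expires_on": {"type": "string", "format": "date"},
        },
        "required": ["type", "key", "value", "confidence"],
    },
}


def validate_store_fact_args(args: dict) -> list[str]:
    """Minimal hand-rolled checks on the model's tool arguments; returns errors."""
    schema = STORE_FACT_TOOL["parameters"]
    errors = [f"missing {name}" for name in schema["required"] if name not in args]
    allowed_types = schema["properties"]["type"]["enum"]
    if "type" in args and args["type"] not in allowed_types:
        errors.append(f"unknown fact type {args['type']!r}")
    conf = args.get("confidence")
    if conf is not None and not (isinstance(conf, (int, float)) and 0 <= conf <= 1):
        errors.append("confidence must be between 0 and 1")
    return errors
```

Returning errors instead of raising lets the handler send them back to the model, which can correct its call.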

Semantic search makes facts useful. We generate vector embeddings for each fact, which allows the AI to find relevant context even when the wording doesn't match exactly. If a customer asks about "that order from last month," we can find facts about their orders even though they didn't specify an order number.
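To show the ranking idea without a database, here is cosine similarity over toy three-dimensional vectors standing in for real embeddings. In production the vectors would come from an embedding model and the ranking would run inside pgvector; `cosine_similarity` and `search_facts` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_facts(query_vec: list[float], facts: list[tuple[str, list[float]]],
                 k: int = 3) -> list[str]:
    """Return the k fact texts whose embeddings point closest to the query."""
    ranked = sorted(facts, key=lambda f: cosine_similarity(query_vec, f[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

If "that order from last month" embeds near the vector of an order fact, that fact ranks first even though no order number appears in the query.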

We chose pgvector over dedicated vector databases for simplicity. Having one database for everything means we can join facts with other tables, and we get ACID transactions for free. The performance is good enough for our scale—sub-50ms queries on thousands of facts.
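A sketch of what that single-database query could look like. The `participant_facts` table and its columns are hypothetical; `<=>` is pgvector's cosine-distance operator (smaller means more similar), and the `%(name)s` placeholders follow psycopg's named-parameter style.

```python
# Hypothetical schema: participant_facts(participant_id, key, value,
# confidence, expires_at, embedding vector(N)).
FACT_SEARCH_SQL = """
SELECT key, value, confidence
FROM participant_facts
WHERE participant_id = %(participant_id)s
  AND (expires_at IS NULL OR expires_at > now())
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(limit)s
"""


def fact_search_params(participant_id: str, query_embedding: list[float],
                       limit: int = 5) -> dict:
    """Bind parameters for FACT_SEARCH_SQL; pgvector parses text like '[0.1,0.2]'."""
    literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    return {"participant_id": participant_id, "query_embedding": literal,
            "limit": limit}
```

With psycopg this would run as `cur.execute(FACT_SEARCH_SQL, fact_search_params(pid, embedding))`. Filtering expired facts and restricting to one participant happen in the same statement as the vector search, which is the kind of convenience a single database provides.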

The Participant State Machine

Beyond facts, we track each customer's lifecycle state. This is like a CRM status that helps the AI understand who it's talking to.

PARTICIPANT LIFECYCLE STATES
    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │                      ┌─────────┐                       │
    │                      │ UNKNOWN │                       │
    │                      └────┬────┘                       │
    │                           │                            │
    │              ┌────────────┴────────────┐               │
    │              ▼                         ▼               │
    │       ┌──────────┐              ┌──────────┐          │
    │       │ PROSPECT │──────────────▶│ CUSTOMER │          │
    │       └──────────┘   converts   └─────┬────┘          │
    │                                       │                │
    │                          ┌────────────┼────────────┐   │
    │                          ▼            │            ▼   │
    │                    ┌─────────┐        │      ┌───────┐│
    │                    │   VIP   │        │      │AT_RISK││
    │                    └─────────┘        │      └───┬───┘│
    │                          │            │          │     │
    │                          │            │          ▼     │
    │                          │            │    ┌─────────┐ │
    │                          │            │    │ CHURNED │ │
    │                          │            │    └────┬────┘ │
    │                          │            │         │      │
    │                          │            │         ▼      │
    │                          │            │   ┌──────────┐ │
    │                          └────────────┴──▶│RETURNING │ │
    │                                           └──────────┘ │
    │                                                         │
    │   Signals: sentiment, risk score, issue frequency       │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

The states form a progression: unknown (we haven't classified them yet), prospect (they're asking questions but not a customer), customer (active paying customer), vip (high-value customer who gets extra attention), at_risk (showing churn signals), churned (no longer active), and returning (a churned customer who came back).

The AI can update this state based on signals in emails. Someone asking about pricing is probably a prospect. Someone mentioning they just renewed their annual subscription is definitely a customer, maybe a VIP. Someone saying "I'm considering switching to Competitor X" is at risk.

We also track risk scores and sentiment scores that the AI updates over time. A customer with three frustrated emails in a row will have a high risk score, which tells the AI to be extra careful and empathetic.
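The post doesn't say how the risk score is computed, so here is one plausible approach: an exponential moving average of negative sentiment. The technique, `ALPHA = 0.5`, and the 0.65 threshold are all assumptions for illustration.

```python
ALPHA = 0.5                  # weight given to the newest email (assumption)
HIGH_RISK_THRESHOLD = 0.65   # assumption


def update_risk(risk: float, sentiment: float, alpha: float = ALPHA) -> float:
    """Blend the previous risk score with this email's negativity.

    sentiment is in [-1, 1]. Only negative sentiment raises risk; neutral or
    positive emails let the score decay back toward 0.
    """
    negativity = max(0.0, -sentiment)
    return (1 - alpha) * risk + alpha * negativity
```

With these numbers, one or two frustrated emails (sentiment -0.8) leave the score at 0.4 and 0.6, below the threshold, and a third pushes it to about 0.7.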

Putting It All Together

When an email arrives, here's what the AI sees:

    ┌─────────────────────────────────────────────────────────┐
    │                   AI CONTEXT ASSEMBLY                   │
    ├─────────────────────────────────────────────────────────┤
    │                                                         │
    │  📧 NEW EMAIL                                          │
    │  ─────────────────                                      │
    │  From: john@acme.com                                    │
    │  Subject: Re: API rate limits                           │
    │  Body: "Thanks, but now I'm hitting a different..."     │
    │                                                         │
    │            +                                            │
    │                                                         │
    │  👤 PARTICIPANT MEMORY                                 │
    │  ─────────────────────                                  │
    │  State: VIP Customer | Risk: Low                        │
    │  Facts:                                                 │
    │    • Name: John                                         │
    │    • Company: Acme Corp                                 │
    │    • Plan: Enterprise                                   │
    │    • History: Billing dispute Dec (resolved)            │
    │                                                         │
    │            +                                            │
    │                                                         │
    │  📝 THREAD SUMMARY                                     │
    │  ─────────────────                                      │
    │  "John asked about API rate limits for Enterprise.      │
    │   We explained the 10k/min limit. He confirmed it       │
    │   meets his needs. Now has follow-up question."         │
    │                                                         │
    │            +                                            │
    │                                                         │
    │  🔄 THREAD STATE                                       │
    │  ───────────────                                        │
    │  Status: IN_PROGRESS                                    │
    │  Goal: "Help with API implementation"                   │
    │                                                         │
    │            +                                            │
    │                                                         │
    │  📚 RAG KNOWLEDGE                                      │
    │  ───────────────                                        │
    │  "Enterprise rate limits: 10,000 requests/minute..."    │
    │                                                         │
    │            ▼                                            │
    │  ┌─────────────────────────────────────────────────┐   │
    │  │  🤖 AI GENERATES CONTEXTUAL REPLY               │   │
    │  │     with full memory of who John is              │   │
    │  └─────────────────────────────────────────────────┘   │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

First, the customer context: their email, name, current lifecycle state (like "VIP Customer"), risk level, how many past issues they've had and how many were resolved, any tags we've applied (like "enterprise" or "decision-maker"), and all their relevant facts organized by category.

Second, the thread summary: a concise description of the current conversation, including what was requested, what's been done, and what's still pending.

Third, the thread state: where this conversation stands in its workflow and what goals are outstanding.

Fourth, any relevant knowledge from our RAG system that might help answer the customer's question.

Finally, the new email itself.
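A sketch of assembling that context into one prompt string, in the order listed above. The dictionary shapes, field names, and section headings are illustrative assumptions, not the format AIThreads actually sends to the model.

```python
def assemble_context(customer: dict, facts: dict[str, list[str]], summary: str,
                     thread_state: dict, knowledge: list[str], new_email: str) -> str:
    """Concatenate the memory layers into one prompt (hypothetical format)."""
    fact_lines = [
        f"  {category}:" + "".join(f"\n    - {item}" for item in items)
        for category, items in facts.items()
    ]
    sections = [
        "## Customer\n"
        f"Email: {customer['email']} | Name: {customer['name']}\n"
        f"State: {customer['state']} | Risk: {customer['risk']}\n"
        f"Issues: {customer['issues_resolved']}/{customer['issues_total']} resolved\n"
        f"Tags: {', '.join(customer['tags'])}\n"
        "Facts:\n" + "\n".join(fact_lines),
        f"## Thread summary\n{summary}",
        f"## Thread state\nStatus: {thread_state['status']}\n"
        "Goals:\n" + "\n".join(f"- {g}" for g in thread_state["goals"]),
        "## Relevant knowledge\n" + "\n".join(knowledge),
        f"## New email\n{new_email}",
    ]
    return "\n\n".join(sections)
```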

With all this context, the AI can respond intelligently. It knows who John is, what he's asked about before, where this conversation stands, and has access to relevant documentation. It can pick up right where the last message left off.

The Killer Feature: Resume as Same Agent

The magic happens when a customer emails again after weeks or months of silence.

    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │   ❌ WITHOUT MEMORY                                     │
    │   ─────────────────                                     │
    │                                                         │
    │   John (March): "Can you help with billing?"            │
    │   AI: "Sure! What's your account email?"                │
    │                                                         │
    │   John (June): "Following up on our chat..."            │
    │   AI: "I don't have context. Can you explain?"    😕   │
    │                                                         │
    ├─────────────────────────────────────────────────────────┤
    │                                                         │
    │   ✅ WITH MEMORY                                        │
    │   ───────────────                                       │
    │                                                         │
    │   John (March): "Can you help with billing?"            │
    │   AI: "Sure! What's your account email?"                │
    │   [STORES: name=John, issue=billing]                    │
    │                                                         │
    │   John (June): "Following up on our chat..."            │
    │   AI: "Hi John! Yes, I remember we were working on      │
    │        your billing question. Let me pull that up..." 🎯 │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

Traditional systems treat this as a brand new interaction. The customer has to re-explain who they are and what they need. The AI has no memory of previous conversations.

With our memory system, the AI immediately knows: "Oh, this is John from Acme Corp, our VIP Enterprise customer. He had that billing issue in December, which we resolved with a $50 credit. He switched to annual billing last month. His sentiment has been positive lately."

The AI doesn't have to ask for context. It doesn't need to look anything up. It just continues the relationship as if no time had passed.

This is what we mean by "resume as same agent." The AI isn't just responding to an email—it's maintaining an ongoing relationship with a customer, with full memory of everything that's happened between them.

What We Learned

Extract facts, not conversations. Storing full email chains is expensive and noisy. Discrete facts are searchable and composable.

Explicit state machines beat implicit state. Having a clear state field with valid transitions prevents the AI from getting confused about where things stand.

Rolling summaries scale. Regenerating summaries for every email would be expensive. Incremental updates are O(1) regardless of thread length.

Semantic search is essential. Customers don't say "order number ORD-12345"—they say "that order from last week." Embeddings bridge this gap.

Confidence scores matter. A fact from ten verified emails is more reliable than a fact the AI inferred once. Weighting by confidence produces better context.
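A small sketch of confidence weighting when choosing which facts go into the prompt. The relevance-times-confidence score, the 0.5 cutoff, and keeping only the most confident fact per key are assumptions for illustration, not a description of our production ranking.

```python
def select_facts(candidates: list[dict], k: int = 5,
                 min_confidence: float = 0.5) -> list[dict]:
    """Rank facts by relevance x confidence (hypothetical scoring).

    Each candidate has a "relevance" (e.g. cosine similarity to the email)
    and a "confidence" in [0, 1]. Low-confidence guesses are dropped, and when
    two facts share a key only the more confident one is kept.
    """
    best_by_key: dict[str, dict] = {}
    for fact in candidates:
        if fact["confidence"] < min_confidence:
            continue
        current = best_by_key.get(fact["key"])
        if current is None or fact["confidence"] > current["confidence"]:
            best_by_key[fact["key"]] = fact
    ranked = sorted(best_by_key.values(),
                    key=lambda f: f["relevance"] * f["confidence"], reverse=True)
    return ranked[:k]
```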

What's Next

We're exploring automatic fact decay, where facts that aren't reinforced gradually lose weight, so facts mentioned in many conversations end up weighted higher than one-off mentions. We're also looking at proactive outreach: if a customer hasn't emailed in 30 days and their renewal is coming up, maybe we should reach out first.

Memory is what transforms a stateless AI into an agent that truly knows your customers. It's the difference between "How can I help you today?" and "Hi John, good to hear from you again—how did that enterprise deployment go?"


Building something similar? We'd love to hear about it. Reach out at hey@aithreads.io