AI Conversation System: Audit and Redesign — January 2026
Date: January 27, 2026
Scope: Comprehensive audit of AI conversation flow, context management, and memory systems
Status: COMPLETE — Recommendations ready for implementation
Executive Summary
MyStoryFlow is a Next.js application that helps seniors create life stories through AI-assisted conversations. This audit was conducted to assess the current state of the AI conversation system and align it with 2025 best practices for conversational AI.
Key Findings:
- DB Foundation (Phase 1 from Nov 2024) is fully intact and ready — pgvector, user_memory table, HNSW indexes, and search functions all verified working
- Critical gap: Context manager uses regex-based extraction — This is the weakest link in the system, missing semantic meaning and family relationships
- No cross-session memory — AI does not remember users across conversations
- Fixed 6-message context window — Long conversations lose important earlier details
Recommendation: Build custom pgvector solution (scored 4.20/5 in weighted decision matrix) rather than adopting external memory services.
Current State Assessment
What Works
| Component | Location | Status |
|---|---|---|
| Conversation API route | apps/web-app/app/api/conversation/route.ts | Basic flow working |
| AI service orchestration | apps/web-app/lib/ai/enhanced-server-ai-service.ts | Multi-provider support |
| Gemini provider | apps/web-app/lib/ai/providers/gemini-provider.ts | Primary provider, working |
| Book context service | apps/web-app/lib/ai/book-context-service.ts | Fetches story previews |
| Conversation tagging | AI service | Working |
| Quality analysis | AI service | Working |
| Usage tracking | Rate limits | Working |
| Prompt templates | Database | Admin-configurable |
What’s Weak (Critical Gaps)
| Gap | Impact | Severity |
|---|---|---|
| No cross-session memory | AI doesn’t remember user across conversations | Critical |
| Regex-based context extraction | Misses semantic meaning, family relationships | Critical |
| 6-message fixed context window | Long conversations lose important earlier details | High |
| No story/recording retrieval | Conversations can’t reference user’s existing content | High |
| No user profiling | AI doesn’t adapt to communication style over time | Medium |
| No conversation compaction | No strategy for token limits | Medium |
| No memory transparency | Users can’t see/edit what AI remembers | Low (MVP) |
Current Context Manager Analysis
File: apps/web-app/lib/conversation/context-manager.ts
Current Flow:
User Message -> Regex extraction -> JSON summary -> LLM
Problems:
- Regex patterns miss semantic meaning
- Only last 6 messages in context
- No cross-session memory
- No story/recording retrieval
- Max 2000 characters (loses detail)
- Every API call sends full context (inefficient)

DB Foundation Verification (Phase 1 — COMPLETE)
Verified against live Supabase project qrlygafaejovxxlnkpxa on January 27, 2026:
| Component | Status | Notes |
|---|---|---|
| pgvector extension | Enabled | Version 0.8.0, working |
| stories.content_embedding vector(1536) | Column exists | No data yet — pending embedding pipeline |
| recordings.transcript_embedding vector(1536) | Column exists | No data yet — pending embedding pipeline |
| user_memory table | Exists | 13 columns including metadata jsonb |
| HNSW index on stories | idx_stories_embedding | Created, ready |
| HNSW index on recordings | idx_recordings_embedding | Created, ready |
| HNSW index on user_memory | idx_user_memory_embedding | Created, ready |
| search_unified_context() SQL function | Exists | Ready to use |
| RLS policies on user_memory | Configured | Users can only access own memories |
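Once embeddings exist, the verified search_unified_context() function would be invoked via a supabase-js RPC call. The sketch below shows one plausible call shape; the parameter names (p_user_id, p_query_embedding, p_match_count) and the unifiedContextArgs helper are assumptions, not the confirmed signature, and should be checked against the migration SQL.

```typescript
// Hypothetical argument builder for calling search_unified_context() via
// supabase-js RPC. The parameter names are assumptions; verify them against
// the actual function signature in the migration.
function unifiedContextArgs(userId: string, queryEmbedding: number[], matchCount = 8) {
  return {
    p_user_id: userId,
    p_query_embedding: queryEmbedding,
    p_match_count: matchCount,
  }
}

// Usage with a supabase-js client (not executed here):
//   const { data, error } = await supabase.rpc(
//     'search_unified_context',
//     unifiedContextArgs(userId, queryEmbedding)
//   )
```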
user_memory Table Schema (Verified)
```sql
CREATE TABLE user_memory (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  memory_type text NOT NULL CHECK (memory_type IN ('fact', 'preference', 'relationship', 'event', 'theme')),
  content text NOT NULL,
  embedding vector(1536),
  source_type text, -- 'conversation', 'story', 'recording', 'explicit'
  source_id uuid,
  importance_score numeric(3,2) DEFAULT 0.5,
  confidence numeric(3,2) DEFAULT 0.8,
  metadata jsonb DEFAULT '{}',
  last_referenced_at timestamptz DEFAULT now(),
  reference_count int DEFAULT 0,
  created_at timestamptz DEFAULT now(),
  updated_at timestamptz DEFAULT now()
);
```

Memory Layer Decision (Weighted Analysis)
Evaluated three approaches for the memory layer:
| Criteria | Weight | Custom pgvector | Mem0 Self-Hosted | Mem0 Cloud |
|---|---|---|---|---|
| Cost at MVP | 30% | 5 ($0 infra) | 3 (needs Qdrant) | 1 (per-op cost) |
| Implementation effort | 20% | 3 | 4 | 5 |
| Control and customization | 15% | 5 | 3 | 2 |
| Performance | 10% | 4 | 5 | 5 |
| Maintenance burden | 10% | 2 | 3 | 5 |
| Migration risk | 10% | 5 | 3 | 2 |
| Domain fit (stories) | 5% | 5 | 3 | 3 |
| Weighted Score | 100% | 4.20 | 3.40 | 2.95 |
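For verifiability, the totals can be reproduced from the per-criterion rows above. The weightedScore helper is just a sketch of the matrix arithmetic, not existing code.

```typescript
// Reproduce the weighted totals from the decision-matrix rows above.
// Weights: cost 30%, effort 20%, control 15%, performance 10%,
// maintenance 10%, migration risk 10%, domain fit 5%.
const weights = [0.30, 0.20, 0.15, 0.10, 0.10, 0.10, 0.05]

function weightedScore(scores: number[]): number {
  const total = scores.reduce((sum, s, i) => sum + s * weights[i], 0)
  return Math.round(total * 100) / 100 // round to 2 decimals
}

const customPgvector = weightedScore([5, 3, 5, 4, 2, 5, 5]) // 4.2
const mem0SelfHosted = weightedScore([3, 4, 3, 5, 3, 3, 3]) // 3.4
const mem0Cloud = weightedScore([1, 5, 2, 5, 5, 2, 3])      // 2.95
```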
Decision: Custom pgvector solution
Rationale:
- Already available in Supabase (zero additional infrastructure cost)
- Data stays in PostgreSQL (no sync issues, GDPR compliance easier)
- Full control over memory extraction logic (story-domain customization)
- Sufficient performance for projected scale (<1M vectors)
- Easy migration path to dedicated vector DB if needed later
2025 Best Practices Applied
Based on research from Anthropic, Mem0, LangChain, and industry standards:
1. Conversation Compaction (Anthropic-Recommended)
3-tier strategy for managing conversation context:
Tier 1: Recent Messages (full fidelity)
- Last 8-10 conversation turns
- Full message content preserved
Tier 2: Session Summaries (compressed)
- Compressed summaries of older session content
- Created after every 8 turns
- Stores: key topics, emotional tone, decisions made
Tier 3: Cross-Session Memories (persistent)
- Extracted facts, preferences, relationships
- Stored in user_memory table with embeddings
- Retrieved via semantic similarity

Implementation: CompactionService with configurable thresholds
2. LLM-Based Memory Extraction (Replaces Regex)
The current regex-based extraction in context-manager.ts will be replaced:
Current (Regex):
```typescript
// Misses semantic meaning
const familyPattern = /\b(mother|father|grandmother|grandfather|sister|brother)\b/gi
```

New (LLM Extraction):
```typescript
// Uses cheap model for structured extraction
const extraction = await gemini.extractStructured(message, {
  model: 'gemini-2.0-flash-lite', // Fastest, cheapest
  schema: {
    people: [{ name: string, relationship: string, details: string }],
    events: [{ description: string, timeframe: string, emotions: string[] }],
    preferences: [{ category: string, preference: string }],
    facts: [{ subject: string, fact: string }]
  }
})
```

3. Adaptive Context Window
Replace fixed 6-message limit with dynamic sizing:
```typescript
class AdaptiveContextWindow {
  private tokenBudget = 4000 // Adjustable per conversation

  buildContext(systemPrompt: string, history: Message[]): Message[] {
    // Always reserve budget for the system prompt
    let tokens = this.countTokens(systemPrompt)
    const result: Message[] = []

    // Walk newest-to-oldest without mutating history; stop when budget is exhausted
    for (let i = history.length - 1; i >= 0; i--) {
      const msgTokens = this.countTokens(history[i].content)
      if (tokens + msgTokens > this.tokenBudget) break
      result.unshift(history[i])
      tokens += msgTokens
    }

    // Prepend a compressed summary of the excluded older messages
    if (result.length < history.length) {
      result.unshift(this.getSummary(history.slice(0, history.length - result.length)))
    }
    return result
  }
}
```

4. Memory Transparency
User-facing memory dashboard for trust and GDPR compliance:
- “What do you know about me?” query handler in conversation
- Memory dashboard in user settings
- Edit/delete individual memories
- Export all memories (GDPR data portability)
- Clear all memories option
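The "What do you know about me?" reply could group memories by type for display. A minimal sketch follows; the Memory shape mirrors the user_memory columns, but summarizeMemories itself is a hypothetical helper, not existing code.

```typescript
// Hypothetical helper for the "What do you know about me?" handler.
// Memory mirrors user_memory columns; the grouping logic is an assumption.
interface Memory { memory_type: string; content: string; confidence: number }

function summarizeMemories(memories: Memory[]): Record<string, string[]> {
  const grouped: Record<string, string[]> = {}
  for (const m of memories) {
    // Hedge low-confidence memories, per the anti-sycophancy guidance below
    const line = m.confidence < 0.6 ? `${m.content} (I may be misremembering)` : m.content
    grouped[m.memory_type] = grouped[m.memory_type] || []
    grouped[m.memory_type].push(line)
  }
  return grouped
}
```

Edit/delete and export then operate on the same rows via the memory transparency API.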
5. Session Continuity
When resuming a conversation:
```typescript
async function resumeConversation(sessionId: string, userId: string) {
  // 1. Load session checkpoint (compressed summary)
  const checkpoint = await loadCheckpoint(sessionId)
  // 2. Retrieve relevant memories for current topic
  const memories = await searchMemories(userId, checkpoint.lastTopic)
  // 3. Find related stories/recordings
  const relatedContent = await searchUnifiedContext(userId, checkpoint.lastTopic)
  // 4. Build resumption context
  return {
    systemPrompt: buildResumptionPrompt(checkpoint, memories, relatedContent),
    suggestedFollowUp: generateFollowUp(checkpoint)
  }
}
```

6. Anthropic Constitution Alignment
Following Anthropic’s guidance on AI assistants:
“Brilliant Friend” Model for Seniors:
- Patient, unhurried responses
- Gentle prompts, never pressuring
- Validates emotions before asking follow-ups
- Explains what it remembers and why
Anti-Sycophancy in Memory Recall:
- Admits uncertainty about remembered facts
- Asks for confirmation before using old memories
- Never fabricates details to seem helpful
Emotional Safety Layer:
```typescript
const emotionalSafetyCheck = async (message: string) => {
  const indicators = await detectDistressIndicators(message)
  if (indicators.grief || indicators.trauma) {
    return {
      responseModifier: 'gentle_acknowledgment',
      avoidFollowUps: true,
      suggestPause: indicators.severity > 0.7
    }
  }
  return null // No distress detected; respond normally
}
```

Vulnerable User Safeguards:
- Detect confusion or frustration
- Offer to simplify or take a break
- Never rush through difficult topics
- Provide clear exit options
Implementation Phases
| Phase | Status | Description |
|---|---|---|
| Phase 1 | COMPLETE | DB foundation (pgvector, user_memory, indexes, search function) |
| Phase 2 | COMPLETE | Embedding Pipeline (EmbeddingService, background jobs, save hooks) |
| Phase 3 | COMPLETE | Enhanced Context Manager (replaces regex context-manager.ts) |
| Phase 4 | COMPLETE | Memory System (lifecycle, transparency API, GDPR compliance) |
| Phase 5 | COMPLETE | Quality and Analytics (memory-aware tagging, adaptive pace) |
Implementation Summary (January 2026)
All phases have been implemented. Key files created:
Phase 2 — Embedding Pipeline:
- apps/web-app/lib/ai/embedding-service.ts — OpenAI embedding generation
- apps/web-app/lib/jobs/embedding-pipeline.ts — Batch processing
- apps/web-app/lib/hooks/use-embedding-trigger.ts — React hooks
- scripts/backfill-embeddings.ts — CLI backfill script
- apps/web-app/app/api/embeddings/generate/route.ts — API endpoint
Phase 3 — Enhanced Context Manager:
- apps/web-app/lib/conversation/enhanced-context-manager.ts — pgvector semantic search
- apps/web-app/lib/conversation/compaction-service.ts — Conversation summarization
- apps/web-app/lib/conversation/memory-extractor.ts — LLM-based memory extraction
- supabase/migrations/20260127000000_conversation_context_enhancements.sql — DB migration
Phase 4 — Memory System:
- apps/web-app/lib/conversation/memory-manager.ts — Memory lifecycle management
- apps/web-app/app/api/user/memories/route.ts — Memory transparency API
- apps/web-app/app/api/user/memories/[id]/route.ts — Individual memory operations
Phase 5 — Quality and Analytics:
- apps/web-app/lib/ai/conversation-quality-service.ts — Quality tracking
- apps/web-app/lib/ai/conversation-tagging-service.ts — Enhanced with memory context
- supabase/migrations/20260127100000_quality_tracking.sql — Quality tables
Key Files to Modify/Create (Original Audit Plan)
The tables below record the plan as written at audit time; the Implementation Summary above lists the files actually created.
Existing Files to Modify
| File | Change |
|---|---|
apps/web-app/app/api/conversation/route.ts | Integrate EnhancedContextManager |
apps/web-app/lib/ai/enhanced-server-ai-service.ts | Add memory extraction hooks |
apps/web-app/lib/conversation/context-manager.ts | Deprecate (replace with EnhancedContextManager) |
New Files to Create
| File | Purpose |
|---|---|
apps/web-app/lib/ai/embedding-service.ts | Generate embeddings via OpenAI |
apps/web-app/lib/conversation/enhanced-context-manager.ts | New context manager with memory retrieval |
apps/web-app/lib/conversation/memory-extractor.ts | LLM-based memory extraction |
apps/web-app/lib/conversation/compaction-service.ts | Conversation summary generation |
apps/web-app/lib/jobs/embedding-backfill.ts | Backfill existing content |
Cost Projections
Embedding Costs (text-embedding-3-small)
| Scale | Stories | Recordings | Memories | Queries/mo | Monthly Cost |
|---|---|---|---|---|---|
| 100 users | 5K | 2K | 10K | 10K | ~$0.50 |
| 1,000 users | 50K | 20K | 100K | 100K | ~$5 |
| 10,000 users | 500K | 200K | 1M | 1M | ~$50 |
Memory Extraction Costs (gemini-2.0-flash-lite)
| Scale | Extractions/mo | Avg Tokens | Monthly Cost |
|---|---|---|---|
| 100 users | 1K | 500 | ~$0.10 |
| 1,000 users | 10K | 500 | ~$1 |
| 10,000 users | 100K | 500 | ~$10 |
Total projected cost at 1,000 users: ~$6/month
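As a sanity check on the figures above, a back-of-envelope calculation, assuming text-embedding-3-small pricing of $0.02 per 1M tokens and roughly 500 tokens per embedded item (both assumptions; re-verify against current OpenAI pricing):

```typescript
// Back-of-envelope embedding cost. The $0.02 per 1M tokens price and
// ~500 tokens per item are assumptions to re-check against current pricing.
const PRICE_PER_1M_TOKENS = 0.02
const TOKENS_PER_ITEM = 500

function embeddingCostUSD(itemCount: number, tokensPerItem: number = TOKENS_PER_ITEM): number {
  return (itemCount * tokensPerItem) / 1_000_000 * PRICE_PER_1M_TOKENS
}

// 1,000-user row: 50K stories + 20K recordings + 100K memories + 100K queries
const monthly = embeddingCostUSD(50_000 + 20_000 + 100_000 + 100_000) // ≈ $2.70
```

That lands under the table's ~$5 estimate, leaving headroom for longer transcripts, retries, and re-embedding on edits.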
References
External
- Anthropic’s Claude Constitution — Guidance on AI assistant behavior
- Anthropic: Effective Context Engineering for AI Agents (2025)
- Anthropic: Protecting Well-Being of Users (2025)
- Mem0 Research Paper (arXiv:2504.19413) — Memory layer architecture
- Supabase pgvector Documentation
Internal Documents
- AI Constitution — MyStoryFlow AI Constitution (north star for all AI behavior)
- Contextual Memory and RAG — Cost analysis and ADR-001
- Unified Context Architecture — Original architecture proposal
Audit Sign-Off
| Role | Name | Date |
|---|---|---|
| Audit Lead | AI System Review | January 27, 2026 |
| Architecture Review | — | Pending |
| Implementation Lead | — | Pending |
This audit document should be referenced for all AI conversation system changes. Update implementation phases as work progresses.