AI Conversation System: Audit and Redesign — January 2026
Date: January 27, 2026
Scope: Comprehensive audit of AI conversation flow, context management, and memory systems
Status: COMPLETE — Recommendations ready for implementation
Executive Summary
MyStoryFlow is a Next.js application that helps seniors create life stories through AI-assisted conversations. This audit was conducted to assess the current state of the AI conversation system and align it with 2025 best practices for conversational AI.
Key Findings:
- DB Foundation (Phase 1 from Nov 2024) is fully intact and ready — pgvector, user_memory table, HNSW indexes, and search functions all verified working
- Critical gap: Context manager uses regex-based extraction — This is the weakest link in the system, missing semantic meaning and family relationships
- No cross-session memory — AI does not remember users across conversations
- Fixed 6-message context window — Long conversations lose important earlier details
Recommendation: Build custom pgvector solution (scored 4.20/5 in weighted decision matrix) rather than adopting external memory services.
Current State Assessment
What Works
| Component | Location | Status |
|---|---|---|
| Conversation API route | apps/web-app/app/api/conversation/route.ts | Basic flow working |
| AI service orchestration | apps/web-app/lib/ai/enhanced-server-ai-service.ts | Multi-provider support |
| Gemini provider | apps/web-app/lib/ai/providers/gemini-provider.ts | Primary provider, working |
| Book context service | apps/web-app/lib/ai/book-context-service.ts | Fetches story previews |
| Conversation tagging | AI service | Working |
| Quality analysis | AI service | Working |
| Usage tracking | Rate limits | Working |
| Prompt templates | Database | Admin-configurable |
What’s Weak (Critical Gaps)
| Gap | Impact | Severity |
|---|---|---|
| No cross-session memory | AI doesn’t remember user across conversations | Critical |
| Regex-based context extraction | Misses semantic meaning, family relationships | Critical |
| 6-message fixed context window | Long conversations lose important earlier details | High |
| No story/recording retrieval | Conversations can’t reference user’s existing content | High |
| No user profiling | AI doesn’t adapt to communication style over time | Medium |
| No conversation compaction | No strategy for token limits | Medium |
| No memory transparency | Users can’t see/edit what AI remembers | Low (MVP) |
Current Context Manager Analysis
File: apps/web-app/lib/conversation/context-manager.ts
Current Flow:
User Message -> Regex extraction -> JSON summary -> LLM
Problems:
- Regex patterns miss semantic meaning
- Only last 6 messages in context
- No cross-session memory
- No story/recording retrieval
- Max 2000 characters (loses detail)
- Every API call sends full context (inefficient)

DB Foundation Verification (Phase 1 — COMPLETE)
Verified against live Supabase project qrlygafaejovxxlnkpxa on January 27, 2026:
| Component | Status | Notes |
|---|---|---|
| pgvector extension | Enabled | Version 0.8.0, working |
| stories.content_embedding vector(1536) | Column exists | No data yet — pending embedding pipeline |
| recordings.transcript_embedding vector(1536) | Column exists | No data yet — pending embedding pipeline |
| user_memory table | Exists | 13 columns including metadata jsonb |
| HNSW index on stories | idx_stories_embedding | Created, ready |
| HNSW index on recordings | idx_recordings_embedding | Created, ready |
| HNSW index on user_memory | idx_user_memory_embedding | Created, ready |
| search_unified_context() SQL function | Exists | Ready to use |
| RLS policies on user_memory | Configured | Users can only access own memories |
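Once embeddings exist, the verified search_unified_context() function would be invoked via a supabase-js RPC call. The sketch below shows one plausible call shape; the parameter names (p_user_id, p_query_embedding, p_match_count) and the unifiedContextArgs helper are assumptions, not the confirmed signature, and should be checked against the migration SQL.

```typescript
// Hypothetical argument builder for calling search_unified_context() via
// supabase-js RPC. The parameter names are assumptions; verify them against
// the actual function signature in the migration.
function unifiedContextArgs(userId: string, queryEmbedding: number[], matchCount = 8) {
  return {
    p_user_id: userId,
    p_query_embedding: queryEmbedding,
    p_match_count: matchCount,
  }
}

// Usage with a supabase-js client (not executed here):
//   const { data, error } = await supabase.rpc(
//     'search_unified_context',
//     unifiedContextArgs(userId, queryEmbedding)
//   )
```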
user_memory Table Schema (Verified)
```sql
CREATE TABLE user_memory (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  memory_type text NOT NULL CHECK (memory_type IN ('fact', 'preference', 'relationship', 'event', 'theme')),
  content text NOT NULL,
  embedding vector(1536),
  source_type text, -- 'conversation', 'story', 'recording', 'explicit'
  source_id uuid,
  importance_score numeric(3,2) DEFAULT 0.5,
  confidence numeric(3,2) DEFAULT 0.8,
  metadata jsonb DEFAULT '{}',
  last_referenced_at timestamptz DEFAULT now(),
  reference_count int DEFAULT 0,
  created_at timestamptz DEFAULT now(),
  updated_at timestamptz DEFAULT now()
);
```

Memory Layer Decision (Weighted Analysis)
Evaluated three approaches for the memory layer:
| Criteria | Weight | Custom pgvector | Mem0 Self-Hosted | Mem0 Cloud |
|---|---|---|---|---|
| Cost at MVP | 30% | 5 ($0 infra) | 3 (needs Qdrant) | 1 (per-op cost) |
| Implementation effort | 20% | 3 | 4 | 5 |
| Control and customization | 15% | 5 | 3 | 2 |
| Performance | 10% | 4 | 5 | 5 |
| Maintenance burden | 10% | 2 | 3 | 5 |
| Migration risk | 10% | 5 | 3 | 2 |
| Domain fit (stories) | 5% | 5 | 3 | 3 |
| Weighted Score | 100% | 4.20 | 3.40 | 2.95 |
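For verifiability, the totals can be reproduced from the per-criterion rows above. The weightedScore helper is just a sketch of the matrix arithmetic, not existing code.

```typescript
// Reproduce the weighted totals from the decision-matrix rows above.
// Weights: cost 30%, effort 20%, control 15%, performance 10%,
// maintenance 10%, migration risk 10%, domain fit 5%.
const weights = [0.30, 0.20, 0.15, 0.10, 0.10, 0.10, 0.05]

function weightedScore(scores: number[]): number {
  const total = scores.reduce((sum, s, i) => sum + s * weights[i], 0)
  return Math.round(total * 100) / 100 // round to 2 decimals
}

const customPgvector = weightedScore([5, 3, 5, 4, 2, 5, 5]) // 4.2
const mem0SelfHosted = weightedScore([3, 4, 3, 5, 3, 3, 3]) // 3.4
const mem0Cloud = weightedScore([1, 5, 2, 5, 5, 2, 3])      // 2.95
```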
Decision: Custom pgvector solution
Rationale:
- Already available in Supabase (zero additional infrastructure cost)
- Data stays in PostgreSQL (no sync issues, GDPR compliance easier)
- Full control over memory extraction logic (story-domain customization)
- Sufficient performance for projected scale (<1M vectors)
- Easy migration path to dedicated vector DB if needed later
2025 Best Practices Applied
Based on research from Anthropic, Mem0, LangChain, and industry standards:
1. Conversation Compaction (Anthropic-Recommended)
3-tier strategy for managing conversation context:
Tier 1: Recent Messages (full fidelity)
- Last 8-10 conversation turns
- Full message content preserved
Tier 2: Session Summaries (compressed)
- Compressed summaries of older session content
- Created after every 8 turns
- Stores: key topics, emotional tone, decisions made
Tier 3: Cross-Session Memories (persistent)
- Extracted facts, preferences, relationships
- Stored in user_memory table with embeddings
- Retrieved via semantic similarity

Implementation: CompactionService with configurable thresholds
2. LLM-Based Memory Extraction (Replaces Regex)
The current regex-based extraction in context-manager.ts will be replaced:
Current (Regex):
```typescript
// Misses semantic meaning
const familyPattern = /\b(mother|father|grandmother|grandfather|sister|brother)\b/gi
```

New (LLM Extraction):
```typescript
// Uses cheap model for structured extraction
const extraction = await gemini.extractStructured(message, {
  model: 'gemini-2.0-flash-lite', // Fastest, cheapest
  schema: {
    people: [{ name: string, relationship: string, details: string }],
    events: [{ description: string, timeframe: string, emotions: string[] }],
    preferences: [{ category: string, preference: string }],
    facts: [{ subject: string, fact: string }]
  }
})
```

3. Adaptive Context Window
Replace fixed 6-message limit with dynamic sizing:
```typescript
class AdaptiveContextWindow {
  private tokenBudget = 4000 // Adjustable per conversation

  buildContext(systemPrompt: string, history: Message[]): Message[] {
    // Always reserve budget for the system prompt
    let tokens = this.countTokens(systemPrompt)
    const result: Message[] = []

    // Walk newest-to-oldest without mutating history; stop when budget is exhausted
    for (let i = history.length - 1; i >= 0; i--) {
      const msgTokens = this.countTokens(history[i].content)
      if (tokens + msgTokens > this.tokenBudget) break
      result.unshift(history[i])
      tokens += msgTokens
    }

    // Prepend a compressed summary of the excluded older messages
    if (result.length < history.length) {
      result.unshift(this.getSummary(history.slice(0, history.length - result.length)))
    }
    return result
  }
}
```

4. Memory Transparency
User-facing memory dashboard for trust and GDPR compliance:
- “What do you know about me?” query handler in conversation
- Memory dashboard in user settings
- Edit/delete individual memories
- Export all memories (GDPR data portability)
- Clear all memories option
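The "What do you know about me?" reply could group memories by type for display. A minimal sketch follows; the Memory shape mirrors the user_memory columns, but summarizeMemories itself is a hypothetical helper, not existing code.

```typescript
// Hypothetical helper for the "What do you know about me?" handler.
// Memory mirrors user_memory columns; the grouping logic is an assumption.
interface Memory { memory_type: string; content: string; confidence: number }

function summarizeMemories(memories: Memory[]): Record<string, string[]> {
  const grouped: Record<string, string[]> = {}
  for (const m of memories) {
    // Hedge low-confidence memories, per the anti-sycophancy guidance below
    const line = m.confidence < 0.6 ? `${m.content} (I may be misremembering)` : m.content
    grouped[m.memory_type] = grouped[m.memory_type] || []
    grouped[m.memory_type].push(line)
  }
  return grouped
}
```

Edit/delete and export then operate on the same rows via the memory transparency API.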
5. Session Continuity
When resuming a conversation:
```typescript
async function resumeConversation(sessionId: string, userId: string) {
  // 1. Load session checkpoint (compressed summary)
  const checkpoint = await loadCheckpoint(sessionId)
  // 2. Retrieve relevant memories for current topic
  const memories = await searchMemories(userId, checkpoint.lastTopic)
  // 3. Find related stories/recordings
  const relatedContent = await searchUnifiedContext(userId, checkpoint.lastTopic)
  // 4. Build resumption context
  return {
    systemPrompt: buildResumptionPrompt(checkpoint, memories, relatedContent),
    suggestedFollowUp: generateFollowUp(checkpoint)
  }
}
```

6. Anthropic Constitution Alignment
Following Anthropic’s guidance on AI assistants:
“Brilliant Friend” Model for Seniors:
- Patient, unhurried responses
- Gentle prompts, never pressuring
- Validates emotions before asking follow-ups
- Explains what it remembers and why
Anti-Sycophancy in Memory Recall:
- Admits uncertainty about remembered facts
- Asks for confirmation before using old memories
- Never fabricates details to seem helpful
Emotional Safety Layer:
```typescript
const emotionalSafetyCheck = async (message: string) => {
  const indicators = await detectDistressIndicators(message)
  if (indicators.grief || indicators.trauma) {
    return {
      responseModifier: 'gentle_acknowledgment',
      avoidFollowUps: true,
      suggestPause: indicators.severity > 0.7
    }
  }
  return null // No distress detected; respond normally
}
```

Vulnerable User Safeguards:
- Detect confusion or frustration
- Offer to simplify or take a break
- Never rush through difficult topics
- Provide clear exit options
Implementation Phases
| Phase | Status | Description |
|---|---|---|
| Phase 1 | COMPLETE | DB foundation (pgvector, user_memory, indexes, search function) |
| Phase 2 | COMPLETE | Embedding Pipeline (EmbeddingService, background jobs, save hooks) |
| Phase 3 | COMPLETE | Enhanced Context Manager (replaces regex context-manager.ts) |
| Phase 4 | COMPLETE | Memory System (lifecycle, transparency API, GDPR compliance) |
| Phase 5 | COMPLETE | Quality and Analytics (memory-aware tagging, adaptive pace) |
Implementation Summary (January 2026)
All phases have been implemented. Key files created:
Phase 2 — Embedding Pipeline:
- apps/web-app/lib/ai/embedding-service.ts — OpenAI embedding generation
- apps/web-app/lib/jobs/embedding-pipeline.ts — Batch processing
- apps/web-app/lib/hooks/use-embedding-trigger.ts — React hooks
- scripts/backfill-embeddings.ts — CLI backfill script
- apps/web-app/app/api/embeddings/generate/route.ts — API endpoint
Phase 3 — Enhanced Context Manager:
- apps/web-app/lib/conversation/enhanced-context-manager.ts — pgvector semantic search
- apps/web-app/lib/conversation/compaction-service.ts — Conversation summarization
- apps/web-app/lib/conversation/memory-extractor.ts — LLM-based memory extraction
- supabase/migrations/20260127000000_conversation_context_enhancements.sql — DB migration
Phase 4 — Memory System:
- apps/web-app/lib/conversation/memory-manager.ts — Memory lifecycle management
- apps/web-app/app/api/user/memories/route.ts — Memory transparency API
- apps/web-app/app/api/user/memories/[id]/route.ts — Individual memory operations
Phase 5 — Quality and Analytics:
- apps/web-app/lib/ai/conversation-quality-service.ts — Quality tracking
- apps/web-app/lib/ai/conversation-tagging-service.ts — Enhanced with memory context
- supabase/migrations/20260127100000_quality_tracking.sql — Quality tables
Key Files to Modify/Create (Original Audit Plan)
The tables below record the plan as written at audit time; the Implementation Summary above lists the files actually created.
Existing Files to Modify
| File | Change |
|---|---|
apps/web-app/app/api/conversation/route.ts | Integrate EnhancedContextManager |
apps/web-app/lib/ai/enhanced-server-ai-service.ts | Add memory extraction hooks |
apps/web-app/lib/conversation/context-manager.ts | Deprecate (replace with EnhancedContextManager) |
New Files to Create
| File | Purpose |
|---|---|
apps/web-app/lib/ai/embedding-service.ts | Generate embeddings via OpenAI |
apps/web-app/lib/conversation/enhanced-context-manager.ts | New context manager with memory retrieval |
apps/web-app/lib/conversation/memory-extractor.ts | LLM-based memory extraction |
apps/web-app/lib/conversation/compaction-service.ts | Conversation summary generation |
apps/web-app/lib/jobs/embedding-backfill.ts | Backfill existing content |
Cost Projections
Embedding Costs (text-embedding-3-small)
| Scale | Stories | Recordings | Memories | Queries/mo | Monthly Cost |
|---|---|---|---|---|---|
| 100 users | 5K | 2K | 10K | 10K | ~$0.50 |
| 1,000 users | 50K | 20K | 100K | 100K | ~$5 |
| 10,000 users | 500K | 200K | 1M | 1M | ~$50 |
Memory Extraction Costs (gemini-2.0-flash-lite)
| Scale | Extractions/mo | Avg Tokens | Monthly Cost |
|---|---|---|---|
| 100 users | 1K | 500 | ~$0.10 |
| 1,000 users | 10K | 500 | ~$1 |
| 10,000 users | 100K | 500 | ~$10 |
Total projected cost at 1,000 users: ~$6/month
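As a sanity check on the figures above, a back-of-envelope calculation, assuming text-embedding-3-small pricing of $0.02 per 1M tokens and roughly 500 tokens per embedded item (both assumptions; re-verify against current OpenAI pricing):

```typescript
// Back-of-envelope embedding cost. The $0.02 per 1M tokens price and
// ~500 tokens per item are assumptions to re-check against current pricing.
const PRICE_PER_1M_TOKENS = 0.02
const TOKENS_PER_ITEM = 500

function embeddingCostUSD(itemCount: number, tokensPerItem: number = TOKENS_PER_ITEM): number {
  return (itemCount * tokensPerItem) / 1_000_000 * PRICE_PER_1M_TOKENS
}

// 1,000-user row: 50K stories + 20K recordings + 100K memories + 100K queries
const monthly = embeddingCostUSD(50_000 + 20_000 + 100_000 + 100_000) // ≈ $2.70
```

That lands under the table's ~$5 estimate, leaving headroom for longer transcripts, retries, and re-embedding on edits.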
References
External
- Anthropic’s Claude Constitution — Guidance on AI assistant behavior
- Anthropic: Effective Context Engineering for AI Agents (2025)
- Anthropic: Protecting Well-Being of Users (2025)
- Mem0 Research Paper (arXiv:2504.19413) — Memory layer architecture
- Supabase pgvector Documentation
Internal Documents
- AI Constitution — MyStoryFlow AI Constitution (north star for all AI behavior)
- Contextual Memory and RAG — Cost analysis and ADR-001
- Unified Context Architecture — Original architecture proposal
Audit Sign-Off
| Role | Name | Date |
|---|---|---|
| Audit Lead | AI System Review | January 27, 2026 |
| Architecture Review | — | Pending |
| Implementation Lead | — | Pending |
This audit document should be referenced for all AI conversation system changes. Update implementation phases as work progresses.