
Contextual Memory & RAG Architecture Research

Research Date: November 2024
Status: APPROVED - Database Foundation Complete
Decision: Use pgvector (Supabase built-in) + OpenAI text-embedding-3-small


Decision Summary

After evaluating multiple vector database solutions, we’ve decided to use pgvector with Supabase for the following reasons:

| Factor | Decision |
| --- | --- |
| Vector Database | pgvector (already included in Supabase) |
| Embedding Model | text-embedding-3-small ($0.02/1M tokens) |
| Additional Cost | $0 for storage/search, ~$5/month for embeddings |
| Scale Supported | Up to 500K-1M vectors (sufficient for 10K+ users) |
| Migration Risk | Low - no new services, data stays in PostgreSQL |

Architecture Decision Record (ADR-001)

ADR-001: Vector Search Infrastructure

Status: APPROVED
Date: November 2024

Context: StoryFlow needs semantic search across stories, recordings, and user memories to provide contextual AI assistance during conversations and editing.

Decision: Use pgvector (PostgreSQL extension) with OpenAI text-embedding-3-small.

Rationale:

  1. Zero additional infrastructure cost (included in Supabase)
  2. Data stays in PostgreSQL (no sync issues)
  3. Sufficient performance for fewer than 1M vectors
  4. Simpler architecture (one less service)
  5. Easy migration path to a dedicated DB if needed later

Consequences:

  • Must monitor vector count and query latency
  • Limited to ~1M vectors before performance degrades
  • No multi-region vector search (acceptable for now)

Review Triggers:

  • Vector count exceeds 800K
  • P95 query latency exceeds 150ms
  • Need for multi-region deployment

Current State Analysis

How StoryFlow Handles Context Today

Location: apps/web-app/lib/conversation/context-manager.ts

Current Flow: User Message → Rule-based extraction → JSON summary → LLM

Current Implementation (sketched in code after the list):

  1. Keep last 6 messages as raw conversation history
  2. Regex-based extraction for:
    • Family members (mother, father, etc.)
    • Dates (years, β€œX years ago”)
    • Emotions (love, happy, sad, etc.)
    • Locations (capitalized words after prepositions)
  3. Text summary concatenated and passed to LLM
  4. Max 2000 characters for context summary
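
For concreteness, here is a minimal sketch of this rule-based flow. The patterns and names are illustrative assumptions, not the actual context-manager.ts code:

```typescript
// Hypothetical approximation of the current rule-based extraction.
// The real patterns live in apps/web-app/lib/conversation/context-manager.ts.
const FAMILY_TERMS = /\b(mother|father|grandmother|grandfather|sister|brother)\b/gi
const DATE_TERMS = /\b(19|20)\d{2}\b|\b\d+\s+years\s+ago\b/gi
const EMOTION_TERMS = /\b(love|happy|sad|proud)\b/gi

function extractContext(message: string) {
  return {
    familyMembers: message.match(FAMILY_TERMS) ?? [],
    dates: message.match(DATE_TERMS) ?? [],
    emotions: message.match(EMOTION_TERMS) ?? [],
  }
}

// The concatenated summary sent to the LLM is capped at 2,000 characters.
const summary = JSON.stringify(
  extractContext('My mother was so happy when we moved in 1985')
).slice(0, 2000)
```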

Problems:

  • No semantic understanding (regex misses nuanced references)
  • No long-term memory across sessions
  • No story content retrieval (conversations don’t reference past stories)
  • Context truncation loses important details
  • Every API call sends full context (inefficient)

Memory Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                        MEMORY SYSTEM                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌────────────────────┐  ┌────────────────────────────────┐  │
│  │   SHORT-TERM       │  │   LONG-TERM                    │  │
│  │   MEMORY (STM)     │  │   MEMORY (LTM)                 │  │
│  │                    │  │                                │  │
│  │ • Last 6 turns     │  │ • Story embeddings             │  │
│  │ • Current topic    │  │ • Family member graph          │  │
│  │ • Active emotion   │  │ • User preferences             │  │
│  │                    │  │ • Historical themes            │  │
│  │ [In-Memory/        │  │                                │  │
│  │  Redis Cache]      │  │ [pgvector + PostgreSQL]        │  │
│  └────────────────────┘  └────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │              USER PROFILE (Persistent)                 │  │
│  │                                                        │  │
│  │ • Writing style preferences                            │  │
│  │ • Family tree structure                                │  │
│  │ • Dialect/language patterns                            │  │
│  │ • Topic interests & sensitivities                      │  │
│  │ • Conversation patterns (talker vs. brief)             │  │
│  │                                                        │  │
│  │ [profiles table + JSONB enrichment]                    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
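
A rough TypeScript shape for the three tiers above; the field names are illustrative assumptions, not the final schema:

```typescript
// Hypothetical shapes for the memory tiers diagrammed above.
interface ShortTermMemory {
  recentTurns: string[]      // last 6 conversation turns (in-memory/Redis)
  currentTopic?: string
  activeEmotion?: string
}

interface LongTermRecord {
  kind: 'story' | 'recording' | 'user_memory'
  content: string
  embedding: number[]        // vector(1536) column in PostgreSQL
}

interface UserProfile {
  writingStyle?: string
  familyTree?: Record<string, unknown>  // JSONB enrichment on the profiles table
  topicSensitivities?: string[]
  conversationPattern?: 'talker' | 'brief'
}
```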

Implementation Progress

Last Updated: November 29, 2024
Status: Database Foundation Complete - TypeScript Implementation Pending

Phase 1: Database Foundation - COMPLETE

Applied migrations (in order):

| Version | Name | Description |
| --- | --- | --- |
| 20251129135540 | enable_pgvector_extension | Enabled pgvector 0.8.0 |
| 20251129135551 | add_embedding_columns | Added content_embedding to stories, transcript_embedding to recordings |
| 20251129135609 | create_user_memory_table | Created user_memory table with RLS policies |
| 20251129135622 | create_hnsw_indexes | Created HNSW indexes on all embedding columns |
| 20251129135643 | create_unified_search_function | Created search_unified_context() SQL function |

Database changes applied:

  • Enable pgvector extension
  • Add content_embedding vector(1536) to stories table
  • Add transcript_embedding vector(1536) to recordings table
  • Create user_memory table with embedding column
  • Create HNSW indexes for fast similarity search
  • Create search_unified_context() function
  • RLS policies on user_memory table

Phase 2: Embedding Pipeline - PENDING

  • Implement EmbeddingService TypeScript class (lib/ai/embedding-service.ts)
  • Add embedding generation on story save (hook into save flow)
  • Add embedding generation on recording transcription complete
  • Create backfill script for existing stories/recordings

Phase 3: Enhanced Context Manager - PENDING

  • Implement EnhancedContextManager (lib/conversation/enhanced-context-manager.ts)
  • Integrate with conversation API (/api/conversation/route.ts)
  • Add context retrieval to editor AI features

Phase 4: Memory System - PENDING

  • Implement memory extraction from conversations (LLM-based)
  • Add memory importance scoring algorithm
  • Implement memory decay/cleanup job
  • Add memory reference tracking

Embedding Service Implementation

```typescript
// lib/ai/embedding-service.ts
import { OpenAI } from 'openai'
import { createTypedClient } from '@/lib/supabase/typed-client'

const openai = new OpenAI()

export class EmbeddingService {
  private model = 'text-embedding-3-small' // 1536 dimensions, $0.00002/1K tokens

  async generateEmbedding(text: string): Promise<number[]> {
    const response = await openai.embeddings.create({
      model: this.model,
      // Truncate by characters to stay safely under the model's input limit
      input: text.slice(0, 8000),
    })
    return response.data[0].embedding
  }

  async embedStory(storyId: string): Promise<void> {
    const supabase = await createTypedClient()
    const { data: story } = await supabase.raw
      .from('stories')
      .select('id, title, content')
      .eq('id', storyId)
      .single()
    if (!story) return

    // Combine title and content for a richer embedding
    const textToEmbed = `${story.title}\n\n${story.content}`
    const embedding = await this.generateEmbedding(textToEmbed)

    await supabase.raw
      .from('stories')
      .update({
        content_embedding: embedding,
        embedding_updated_at: new Date().toISOString(),
      })
      .eq('id', storyId)
  }

  async findSimilarStories(
    userId: string,
    query: string,
    limit: number = 5
  ): Promise<Array<{ id: string; title: string; similarity: number }>> {
    const queryEmbedding = await this.generateEmbedding(query)
    const supabase = await createTypedClient()

    const { data } = await supabase.raw.rpc('search_stories_hybrid', {
      query_text: query,
      query_embedding: queryEmbedding,
      user_id: userId,
      match_count: limit,
    })
    return data || []
  }
}
```
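
A quick usage sketch, assuming OPENAI_API_KEY and the Supabase environment are configured (the IDs are placeholders):

```typescript
const embeddings = new EmbeddingService()

// Embed a story after save, then retrieve semantically similar ones.
await embeddings.embedStory('story-uuid')
const similar = await embeddings.findSimilarStories(
  'user-uuid',
  'summers at the lake house',
  3
)
console.log(similar) // [{ id, title, similarity }, ...]
```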

Cost Analysis

Embedding Generation Costs

| Model | Dimensions | Cost/1M tokens | Quality |
| --- | --- | --- | --- |
| text-embedding-3-small | 1536 | $0.02 | Good (Selected) |
| text-embedding-ada-002 | 1536 | $0.10 | Good |
| text-embedding-3-large | 3072 | $0.13 | Best |

Monthly Cost Projection

| Scale | Stories | Recordings | Memories | Queries | Embedding Cost |
| --- | --- | --- | --- | --- | --- |
| 100 users | 5K | 2K | 10K | 10K | ~$0.50 |
| 1,000 users | 50K | 20K | 100K | 100K | ~$5 |
| 10,000 users | 500K | 200K | 1M | 1M | ~$50 |
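
As a rough sanity check on the 1,000-user row (the average token counts here are assumptions): 50K stories at ~1,000 tokens each is 50M tokens, or $1.00 at $0.02/1M tokens; 20K recording transcripts at ~2,000 tokens add another $0.80. Memories and query embeddings are short, so the ~$5 estimate leaves comfortable headroom for re-embedding edited content.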

Why NOT Pinecone/Weaviate Now

```
StoryFlow Expected Scale (Year 1):
├── Users:         ~1,000-5,000
├── Stories:       ~50,000-250,000
├── Recordings:    ~20,000-100,000
├── User Memories: ~100,000-500,000
└── Total Vectors: ~170,000-850,000

pgvector Comfortable Limit:   ~1,000,000 vectors
Pinecone Advantage Threshold: more than 1,000,000 vectors with a sub-20ms latency requirement
```

Decision: pgvector handles our scale with no additional cost.


Performance Optimization

pgvector Best Practices

| Configuration | Recommendation | Why |
| --- | --- | --- |
| Index Type | HNSW | Better query performance than IVFFlat |
| m parameter | 16 | Good balance for under 100K vectors |
| ef_construction | 64 | Higher = better recall, slower build |
| Dimensions | 1536 | OpenAI text-embedding-3-small |

Latency Expectations (Supabase)

| Dataset Size | Query Latency | Notes |
| --- | --- | --- |
| 1K documents | ~15ms | Instant |
| 10K documents | ~25ms | Excellent |
| 100K documents | ~85ms | Good |
| 1M documents | ~250ms | May need sharding |

Comparison: Current vs. Proposed

| Aspect | Current | Proposed |
| --- | --- | --- |
| Context retrieval | Regex patterns | Semantic similarity |
| Story references | None | Vector search |
| Cross-session memory | None | Persistent embeddings |
| User personalization | Basic JSON | Profile + memories |
| Latency | ~100ms (JSON stringify) | ~50ms (cached) + ~25ms (vector) |
| Accuracy | ~60% (regex misses) | ~90%+ (semantic) |
| Cost per query | $0 | ~$0.00002 |

Quick Start: Resuming Implementation

When you’re ready to continue, here’s where to pick up:

1. Regenerate TypeScript Types

```bash
# Run from the project root
npx supabase gen types typescript --project-id qrlygafaejovxxlnkpxa > apps/web-app/types/database.ts
```

2. Create EmbeddingService

Create apps/web-app/lib/ai/embedding-service.ts - see implementation above.

3. Test the Search Function

```sql
-- Test with a sample embedding (1536 zeros for testing)
SELECT * FROM search_unified_context(
  array_fill(0::real, ARRAY[1536])::vector(1536),
  'your-user-id-here'::uuid,
  true,
  true,
  true,
  5,
  0.5
);
```

4. Key Files to Modify

  • apps/web-app/lib/ai/embedding-service.ts (create)
  • apps/web-app/lib/conversation/enhanced-context-manager.ts (create)
  • apps/web-app/app/api/conversation/route.ts (integrate context)
  • apps/web-app/app/components/editor/SidebarAITab.tsx (add context-aware suggestions)

Monitoring & Alerts

Key Metrics to Track

| Metric | Warning | Critical | Action |
| --- | --- | --- | --- |
| Vector count | 500K | 800K | Evaluate dedicated DB |
| Query P95 latency | 100ms | 200ms | Add caching |
| Embedding API errors | 1% | 5% | Check OpenAI status |
| Memory table size | 1GB | 5GB | Implement cleanup |
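
A minimal sketch of a scheduled check against the vector-count thresholds above, shown for the stories table only (recordings and user_memory would need the same check); the client setup and job scheduling are assumptions:

```typescript
import type { SupabaseClient } from '@supabase/supabase-js'

// Hypothetical scheduled job: warn as embedded rows approach the review triggers.
async function checkVectorCount(supabase: SupabaseClient): Promise<void> {
  const { count, error } = await supabase
    .from('stories')
    .select('id', { count: 'exact', head: true })
    .not('content_embedding', 'is', null)
  if (error || count == null) return

  if (count > 800_000) {
    console.error(`CRITICAL: ${count} vectors - evaluate a dedicated vector DB`)
  } else if (count > 500_000) {
    console.warn(`WARNING: ${count} vectors - plan the migration path`)
  }
}
```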
