Contextual Memory & RAG Architecture Research
Research Date: November 2024
Status: APPROVED - Database Foundation Complete
Decision: Use pgvector (Supabase built-in) + OpenAI text-embedding-3-small
Decision Summary
After evaluating multiple vector database solutions, we've decided to use pgvector with Supabase for the following reasons:
| Factor | Decision |
|---|---|
| Vector Database | pgvector (already included in Supabase) |
| Embedding Model | text-embedding-3-small ($0.02/1M tokens) |
| Additional Cost | $0 for storage/search, ~$5/month for embeddings |
| Scale Supported | Up to 500K-1M vectors (sufficient for 10K+ users) |
| Migration Risk | Low - no new services, data stays in PostgreSQL |
Architecture Decision Record (ADR-001)
ADR-001: Vector Search Infrastructure
Status: APPROVED
Date: November 2024
Context:
StoryFlow needs semantic search across stories, recordings, and user memories
to provide contextual AI assistance during conversations and editing.
Decision:
Use pgvector (PostgreSQL extension) with OpenAI text-embedding-3-small.
Rationale:
1. Zero additional infrastructure cost (included in Supabase)
2. Data stays in PostgreSQL (no sync issues)
3. Sufficient performance for less than 1M vectors
4. Simpler architecture (one less service)
5. Easy migration path to dedicated DB if needed later
Consequences:
- Must monitor vector count and query latency
- Limited to ~1M vectors before performance degrades
- No multi-region vector search (acceptable for now)
Review Triggers:
- Vector count exceeds 800K
- P95 query latency exceeds 150ms
- Need for multi-region deployment
Current State Analysis
How StoryFlow Handles Context Today
Location: apps/web-app/lib/conversation/context-manager.ts
Current Flow:
User Message → Rule-based extraction → JSON summary → LLM
Current Implementation:
- Keep last 6 messages as raw conversation history
- Regex-based extraction for:
  - Family members (mother, father, etc.)
  - Dates (years, "X years ago")
  - Emotions (love, happy, sad, etc.)
  - Locations (capitalized words after prepositions)
- Text summary concatenated and passed to LLM
- Max 2000 characters for context summary
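To make the limitation concrete, here is a minimal sketch of this rule-based approach; the patterns are simplified stand-ins, not the actual ones in `context-manager.ts`:

```typescript
// Simplified sketch of the current rule-based extraction; the real
// patterns in lib/conversation/context-manager.ts are more involved.
interface ExtractedContext {
  familyMembers: string[]
  dates: string[]
  emotions: string[]
}

const FAMILY_RE = /\b(mother|father|grandma|grandpa|sister|brother)\b/gi
const DATE_RE = /\b(19|20)\d{2}\b|\b\d+ years ago\b/gi
const EMOTION_RE = /\b(love|happy|sad|proud|afraid)\b/gi

function extractContext(message: string): ExtractedContext {
  return {
    familyMembers: message.match(FAMILY_RE) ?? [],
    dates: message.match(DATE_RE) ?? [],
    emotions: message.match(EMOTION_RE) ?? [],
  }
}
```

Anything outside these fixed patterns (a nickname, an implied date, an unlisted emotion) is silently dropped, which is exactly the gap semantic search closes.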
Problems:
- No semantic understanding (regex misses nuanced references)
- No long-term memory across sessions
- No story content retrieval (conversations don't reference past stories)
- Context truncation loses important details
- Every API call sends full context (inefficient)
Memory Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                        MEMORY SYSTEM                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────────┐      ┌──────────────────────────────┐  │
│  │   SHORT-TERM     │      │          LONG-TERM           │  │
│  │   MEMORY (STM)   │      │         MEMORY (LTM)         │  │
│  │                  │      │                              │  │
│  │ • Last 6 turns   │      │ • Story embeddings           │  │
│  │ • Current topic  │      │ • Family member graph        │  │
│  │ • Active emotion │      │ • User preferences           │  │
│  │                  │      │ • Historical themes          │  │
│  │ [In-Memory/      │      │                              │  │
│  │  Redis Cache]    │      │ [pgvector + PostgreSQL]      │  │
│  └──────────────────┘      └──────────────────────────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              USER PROFILE (Persistent)               │   │
│  │                                                      │   │
│  │ • Writing style preferences                          │   │
│  │ • Family tree structure                              │   │
│  │ • Dialect/language patterns                          │   │
│  │ • Topic interests & sensitivities                    │   │
│  │ • Conversation patterns (talker vs. brief)           │   │
│  │                                                      │   │
│  │ [profiles table + JSONB enrichment]                  │   │
│  └──────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
```
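As a sketch, the three tiers map onto TypeScript shapes roughly as follows (field names are illustrative assumptions, not the production schema):

```typescript
// Illustrative shapes for the three memory tiers; every field name
// here is an assumption for the sketch, not the actual schema.
interface ShortTermMemory {
  recentTurns: { role: 'user' | 'assistant'; content: string }[] // last 6 turns
  currentTopic?: string
  activeEmotion?: string
}

interface LongTermMemoryEntry {
  id: string
  userId: string
  content: string
  embedding: number[] // stored as vector(1536) in pgvector
  importance: number
  createdAt: string
}

interface UserProfile {
  writingStylePreferences?: string
  familyTree?: Record<string, unknown> // JSONB enrichment on the profiles table
  dialectPatterns?: string[]
  topicSensitivities?: string[]
}
```

Implementation Progress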
Last Updated: November 29, 2024
Status: Database Foundation Complete - TypeScript Implementation Pending
Phase 1: Database Foundation - COMPLETE
Applied migrations (in order):
| Version | Name | Description |
|---|---|---|
| 20251129135540 | enable_pgvector_extension | Enabled pgvector 0.8.0 |
| 20251129135551 | add_embedding_columns | Added content_embedding to stories, transcript_embedding to recordings |
| 20251129135609 | create_user_memory_table | Created user_memory table with RLS policies |
| 20251129135622 | create_hnsw_indexes | Created HNSW indexes on all embedding columns |
| 20251129135643 | create_unified_search_function | Created search_unified_context() SQL function |
Database changes applied:
- Enable pgvector extension
- Add `content_embedding vector(1536)` to `stories` table
- Add `transcript_embedding vector(1536)` to `recordings` table
- Create `user_memory` table with embedding column
- Create HNSW indexes for fast similarity search
- Create `search_unified_context()` function
- RLS policies on `user_memory` table
Phase 2: Embedding Pipeline - PENDING
- Implement `EmbeddingService` TypeScript class (`lib/ai/embedding-service.ts`)
- Add embedding generation on story save (hook into save flow)
- Add embedding generation on recording transcription complete
- Create backfill script for existing stories/recordings (see the sketch below)
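A minimal backfill sketch, reusing the `EmbeddingService` described later in this document; the script path and batch size are assumptions chosen to stay within OpenAI rate limits:

```typescript
// scripts/backfill-embeddings.ts -- hypothetical location.
// Embeds every story that does not yet have an embedding,
// in small batches to respect OpenAI rate limits.
import { EmbeddingService } from '@/lib/ai/embedding-service'
import { createTypedClient } from '@/lib/supabase/typed-client'

async function backfillStories(batchSize = 50) {
  const service = new EmbeddingService()
  const supabase = await createTypedClient()

  for (;;) {
    // Fetch the next batch of stories still missing an embedding.
    const { data: stories } = await supabase.raw
      .from('stories')
      .select('id')
      .is('content_embedding', null)
      .limit(batchSize)
    if (!stories?.length) break

    for (const { id } of stories) {
      await service.embedStory(id) // generates and persists the vector
    }
    console.log(`Embedded ${stories.length} stories`)
  }
}

backfillStories().catch(console.error)
```

The same loop shape applies to recordings, querying on `transcript_embedding` instead.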
Phase 3: Enhanced Context Manager - PENDING
- Implement `EnhancedContextManager` (`lib/conversation/enhanced-context-manager.ts`), sketched below
- Integrate with conversation API (`/api/conversation/route.ts`)
- Add context retrieval to editor AI features
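One plausible shape for the manager, shown as a sketch: it keeps the existing six-turn window and adds vector retrieval via `findSimilarStories`. Everything beyond that method is an assumption:

```typescript
// lib/conversation/enhanced-context-manager.ts -- sketch only.
// Combines the last-6-turns window with semantic retrieval so the
// LLM prompt carries both recent dialogue and relevant past stories.
import { EmbeddingService } from '@/lib/ai/embedding-service'

interface Turn { role: 'user' | 'assistant'; content: string }

export class EnhancedContextManager {
  private embeddings = new EmbeddingService()

  async buildContext(userId: string, history: Turn[]): Promise<string> {
    const recent = history.slice(-6) // keep the existing STM window
    const lastUserMessage = recent.filter(t => t.role === 'user').at(-1)

    // Semantic retrieval replaces the old regex extraction.
    const related = lastUserMessage
      ? await this.embeddings.findSimilarStories(userId, lastUserMessage.content, 3)
      : []

    const storyContext = related
      .map(s => `- "${s.title}" (similarity ${s.similarity.toFixed(2)})`)
      .join('\n')

    return [
      'Relevant past stories:',
      storyContext || '(none found)',
      '',
      'Recent conversation:',
      ...recent.map(t => `${t.role}: ${t.content}`),
    ].join('\n')
  }
}
```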
Phase 4: Memory System - PENDING
- Implement memory extraction from conversations (LLM-based)
- Add memory importance scoring algorithm (see the sketch after this list)
- Implement memory decay/cleanup job
- Add memory reference tracking
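As a sketch of how importance scoring and decay could combine (the weights and the 90-day half-life are assumptions, not decided values):

```typescript
// Hypothetical scoring: newer, frequently referenced memories rank higher.
interface Memory {
  importance: number     // 0..1, assigned at extraction time
  referenceCount: number // how often this memory was surfaced
  createdAt: Date
}

const HALF_LIFE_DAYS = 90 // assumed decay half-life

function memoryScore(m: Memory, now = new Date()): number {
  const ageDays = (now.getTime() - m.createdAt.getTime()) / 86_400_000
  const decay = Math.pow(0.5, ageDays / HALF_LIFE_DAYS) // exponential decay
  const reinforcement = Math.log1p(m.referenceCount)    // diminishing returns
  return m.importance * decay + 0.1 * reinforcement
}

// A cleanup job could then delete memories whose score falls below
// a threshold, e.g. memoryScore(m) < 0.05.
```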
Embedding Service Implementation
```typescript
// lib/ai/embedding-service.ts
import { OpenAI } from 'openai'
import { createTypedClient } from '@/lib/supabase/typed-client'

const openai = new OpenAI()

export class EmbeddingService {
  private model = 'text-embedding-3-small' // 1536 dimensions, $0.02/1M tokens

  async generateEmbedding(text: string): Promise<number[]> {
    const response = await openai.embeddings.create({
      model: this.model,
      input: text.slice(0, 8000), // rough character cut to stay under the 8,191-token input limit
    })
    return response.data[0].embedding
  }

  async embedStory(storyId: string): Promise<void> {
    const supabase = await createTypedClient()
    const { data: story } = await supabase.raw
      .from('stories')
      .select('id, title, content')
      .eq('id', storyId)
      .single()
    if (!story) return

    // Combine title and content for a richer embedding
    const textToEmbed = `${story.title}\n\n${story.content}`
    const embedding = await this.generateEmbedding(textToEmbed)

    await supabase.raw
      .from('stories')
      .update({
        content_embedding: embedding,
        embedding_updated_at: new Date().toISOString()
      })
      .eq('id', storyId)
  }

  async findSimilarStories(
    userId: string,
    query: string,
    limit: number = 5
  ): Promise<Array<{ id: string; title: string; similarity: number }>> {
    const queryEmbedding = await this.generateEmbedding(query)
    const supabase = await createTypedClient()

    // Note: search_stories_hybrid is a planned hybrid (text + vector) function;
    // the Phase 1 migrations only created search_unified_context().
    const { data } = await supabase.raw.rpc('search_stories_hybrid', {
      query_text: query,
      query_embedding: queryEmbedding,
      user_id: userId,
      match_count: limit
    })
    return data || []
  }
}
```
Cost Analysis
Embedding Generation Costs
| Model | Dimensions | Cost/1M tokens | Quality |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 | Good (Selected) |
| text-embedding-ada-002 | 1536 | $0.10 | Good |
| text-embedding-3-large | 3072 | $0.13 | Best |
Monthly Cost Projection
| Scale | Stories | Recordings | Memories | Queries | Embedding Cost |
|---|---|---|---|---|---|
| 100 users | 5K | 2K | 10K | 10K | ~$0.50 |
| 1,000 users | 50K | 20K | 100K | 100K | ~$5 |
| 10,000 users | 500K | 200K | 1M | 1M | ~$50 |
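A back-of-envelope check on the 1,000-user row, assuming an average of ~500 tokens per embedded item (an assumption; actual story lengths will vary):

```typescript
// Back-of-envelope check for the 1,000-user row of the table above.
const items = 50_000 + 20_000 + 100_000 + 100_000 // stories + recordings + memories + queries
const tokensPerItem = 500                          // assumed average
const pricePerMTokens = 0.02                       // text-embedding-3-small

const cost = (items * tokensPerItem / 1_000_000) * pricePerMTokens
console.log(`~$${cost.toFixed(2)}`) // ~$2.70, the same order as the ~$5 estimate
```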
Why NOT Pinecone/Weaviate Now
```
StoryFlow Expected Scale (Year 1):
├── Users: ~1,000-5,000
├── Stories: ~50,000-250,000
├── Recordings: ~20,000-100,000
├── User Memories: ~100,000-500,000
└── Total Vectors: ~170,000-850,000
```
pgvector Comfortable Limit: ~1,000,000 vectors
Pinecone Advantage Threshold: more than 1,000,000 vectors with a sub-20ms latency requirement
Decision: pgvector handles our scale at no additional cost.
Performance Optimization
pgvector Best Practices
| Configuration | Recommendation | Why |
|---|---|---|
| Index Type | HNSW | Better query performance than IVFFlat |
| `m` parameter | 16 | Good balance for under 100K vectors |
| `ef_construction` | 64 | Higher = better recall, slower build |
| Dimension | 1536 | OpenAI text-embedding-3-small |
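For reference, this configuration corresponds to the following pgvector DDL (cosine distance assumed; the index name, and possibly other details, may differ in the actual Phase 1 migration):

```sql
-- HNSW index with the recommended parameters, using cosine distance
-- as is typical for OpenAI embeddings.
CREATE INDEX stories_content_embedding_idx
  ON stories
  USING hnsw (content_embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```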
Latency Expectations (Supabase)
| Dataset Size | Query Latency | Notes |
|---|---|---|
| 1K documents | ~15ms | Instant |
| 10K documents | ~25ms | Excellent |
| 100K documents | ~85ms | Good |
| 1M documents | ~250ms | May need sharding |
Comparison: Current vs. Proposed
| Aspect | Current | Proposed |
|---|---|---|
| Context retrieval | Regex patterns | Semantic similarity |
| Story references | None | Vector search |
| Cross-session memory | None | Persistent embeddings |
| User personalization | Basic JSON | Profile + memories |
| Latency | ~100ms (JSON stringify) | ~50ms (cached) + ~25ms (vector) |
| Accuracy | ~60% (regex misses) | ~90%+ (semantic) |
| Cost per query | $0 | ~$0.00002 |
Quick Start: Resuming Implementation
When you're ready to continue, here's where to pick up:
1. Regenerate TypeScript Types
```bash
# Run from project root
npx supabase gen types typescript --project-id qrlygafaejovxxlnkpxa > apps/web-app/types/database.ts
```
2. Create EmbeddingService
Create `apps/web-app/lib/ai/embedding-service.ts`; see the implementation above.
3. Test the Search Function
```sql
-- Test with a sample embedding (1536 zeros for testing)
SELECT * FROM search_unified_context(
  array_fill(0::real, ARRAY[1536])::vector(1536),
  'your-user-id-here'::uuid,
  true, true, true, 5, 0.5
);
```
4. Key Files to Modify
- `apps/web-app/lib/ai/embedding-service.ts` (create)
- `apps/web-app/lib/conversation/enhanced-context-manager.ts` (create)
- `apps/web-app/app/api/conversation/route.ts` (integrate context)
- `apps/web-app/app/components/editor/SidebarAITab.tsx` (add context-aware suggestions)
Monitoring & Alerts
Key Metrics to Track
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Vector count | 500K | 800K | Evaluate dedicated DB |
| Query P95 latency | 100ms | 200ms | Add caching |
| Embedding API errors | 1% | 5% | Check OpenAI status |
| Memory table size | 1GB | 5GB | Implement cleanup |
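A sketch of how the vector-count metric could be checked, assuming the `user_memory` embedding column is named `embedding` and reusing the typed client from the EmbeddingService example:

```typescript
// Sketch of a vector-count check against the warning/critical thresholds.
import { createTypedClient } from '@/lib/supabase/typed-client'

const WARNING = 500_000
const CRITICAL = 800_000

async function checkVectorCount(): Promise<void> {
  const supabase = await createTypedClient()

  // Count rows that actually have an embedding, per table.
  async function countEmbedded(table: string, column: string): Promise<number> {
    const { count } = await supabase.raw
      .from(table)
      .select('*', { count: 'exact', head: true })
      .not(column, 'is', null)
    return count ?? 0
  }

  const total =
    (await countEmbedded('stories', 'content_embedding')) +
    (await countEmbedded('recordings', 'transcript_embedding')) +
    (await countEmbedded('user_memory', 'embedding'))

  if (total >= CRITICAL) console.error(`CRITICAL: ${total} vectors - evaluate dedicated DB`)
  else if (total >= WARNING) console.warn(`WARNING: ${total} vectors - plan ahead`)
}
```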