Contextual Memory & RAG Architecture Research
Research Date: November 2024
Status: APPROVED - Database Foundation Complete
Decision: Use pgvector (Supabase built-in) + OpenAI text-embedding-3-small
Decision Summary
After evaluating multiple vector database solutions, we've decided to use pgvector with Supabase for the following reasons:
| Factor | Decision |
|---|---|
| Vector Database | pgvector (already included in Supabase) |
| Embedding Model | text-embedding-3-small ($0.02/1M tokens) |
| Additional Cost | $0 for storage/search, ~$5/month for embeddings |
| Scale Supported | Up to 500K-1M vectors (sufficient for 10K+ users) |
| Migration Risk | Low - no new services, data stays in PostgreSQL |
Architecture Decision Record (ADR-001)
ADR-001: Vector Search Infrastructure
Status: APPROVED
Date: November 2024
Context:
StoryFlow needs semantic search across stories, recordings, and user memories
to provide contextual AI assistance during conversations and editing.
Decision:
Use pgvector (PostgreSQL extension) with OpenAI text-embedding-3-small.
Rationale:
1. Zero additional infrastructure cost (included in Supabase)
2. Data stays in PostgreSQL (no sync issues)
3. Sufficient performance for less than 1M vectors
4. Simpler architecture (one less service)
5. Easy migration path to dedicated DB if needed later
Consequences:
- Must monitor vector count and query latency
- Limited to ~1M vectors before performance degrades
- No multi-region vector search (acceptable for now)
Review Triggers:
- Vector count exceeds 800K
- P95 query latency exceeds 150ms
- Need for multi-region deployment
Current State Analysis
How StoryFlow Handles Context Today
Location: apps/web-app/lib/conversation/context-manager.ts
Current Flow:
User Message → Rule-based extraction → JSON summary → LLM
Current Implementation:
- Keep last 6 messages as raw conversation history
- Regex-based extraction for:
  - Family members (mother, father, etc.)
  - Dates (years, "X years ago")
  - Emotions (love, happy, sad, etc.)
  - Locations (capitalized words after prepositions)
- Text summary concatenated and passed to LLM
- Max 2000 characters for context summary
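To make the limitation concrete, here is a minimal sketch of this rule-based approach; the patterns are simplified stand-ins, not the actual ones in `context-manager.ts`:

```typescript
// Simplified sketch of the current rule-based extraction; the real
// patterns in lib/conversation/context-manager.ts are more involved.
interface ExtractedContext {
  familyMembers: string[]
  dates: string[]
  emotions: string[]
}

const FAMILY_RE = /\b(mother|father|grandma|grandpa|sister|brother)\b/gi
const DATE_RE = /\b(19|20)\d{2}\b|\b\d+ years ago\b/gi
const EMOTION_RE = /\b(love|happy|sad|proud|afraid)\b/gi

function extractContext(message: string): ExtractedContext {
  return {
    familyMembers: message.match(FAMILY_RE) ?? [],
    dates: message.match(DATE_RE) ?? [],
    emotions: message.match(EMOTION_RE) ?? [],
  }
}
```

Anything outside these fixed patterns (a nickname, an implied date, an unlisted emotion) is silently dropped, which is exactly the gap semantic search closes.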
Problems:
- No semantic understanding (regex misses nuanced references)
- No long-term memory across sessions
- No story content retrieval (conversations don't reference past stories)
- Context truncation loses important details
- Every API call sends full context (inefficient)
Memory Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                        MEMORY SYSTEM                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────────┐      ┌──────────────────────────────┐  │
│  │   SHORT-TERM     │      │          LONG-TERM           │  │
│  │   MEMORY (STM)   │      │         MEMORY (LTM)         │  │
│  │                  │      │                              │  │
│  │ • Last 6 turns   │      │ • Story embeddings           │  │
│  │ • Current topic  │      │ • Family member graph        │  │
│  │ • Active emotion │      │ • User preferences           │  │
│  │                  │      │ • Historical themes          │  │
│  │ [In-Memory/      │      │                              │  │
│  │  Redis Cache]    │      │ [pgvector + PostgreSQL]      │  │
│  └──────────────────┘      └──────────────────────────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              USER PROFILE (Persistent)               │   │
│  │                                                      │   │
│  │ • Writing style preferences                          │   │
│  │ • Family tree structure                              │   │
│  │ • Dialect/language patterns                          │   │
│  │ • Topic interests & sensitivities                    │   │
│  │ • Conversation patterns (talker vs. brief)           │   │
│  │                                                      │   │
│  │ [profiles table + JSONB enrichment]                  │   │
│  └──────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
```
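As a sketch, the three tiers map onto TypeScript shapes roughly as follows (field names are illustrative assumptions, not the production schema):

```typescript
// Illustrative shapes for the three memory tiers; every field name
// here is an assumption for the sketch, not the actual schema.
interface ShortTermMemory {
  recentTurns: { role: 'user' | 'assistant'; content: string }[] // last 6 turns
  currentTopic?: string
  activeEmotion?: string
}

interface LongTermMemoryEntry {
  id: string
  userId: string
  content: string
  embedding: number[] // stored as vector(1536) in pgvector
  importance: number
  createdAt: string
}

interface UserProfile {
  writingStylePreferences?: string
  familyTree?: Record<string, unknown> // JSONB enrichment on the profiles table
  dialectPatterns?: string[]
  topicSensitivities?: string[]
}
```

Implementation Progress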
Last Updated: November 29, 2024
Status: Database Foundation Complete - TypeScript Implementation Pending
Phase 1: Database Foundation - COMPLETE
Applied migrations (in order):
| Version | Name | Description |
|---|---|---|
| 20251129135540 | enable_pgvector_extension | Enabled pgvector 0.8.0 |
| 20251129135551 | add_embedding_columns | Added content_embedding to stories, transcript_embedding to recordings |
| 20251129135609 | create_user_memory_table | Created user_memory table with RLS policies |
| 20251129135622 | create_hnsw_indexes | Created HNSW indexes on all embedding columns |
| 20251129135643 | create_unified_search_function | Created search_unified_context() SQL function |
Database changes applied:
- Enable pgvector extension
- Add `content_embedding vector(1536)` to `stories` table
- Add `transcript_embedding vector(1536)` to `recordings` table
- Create `user_memory` table with embedding column
- Create HNSW indexes for fast similarity search
- Create `search_unified_context()` function
- RLS policies on `user_memory` table
Phase 2: Embedding Pipeline - PENDING
- Implement `EmbeddingService` TypeScript class (`lib/ai/embedding-service.ts`)
- Add embedding generation on story save (hook into save flow)
- Add embedding generation on recording transcription complete
- Create backfill script for existing stories/recordings (see the sketch below)
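A minimal backfill sketch, reusing the `EmbeddingService` described later in this document; the script path and batch size are assumptions chosen to stay within OpenAI rate limits:

```typescript
// scripts/backfill-embeddings.ts -- hypothetical location.
// Embeds every story that does not yet have an embedding,
// in small batches to respect OpenAI rate limits.
import { EmbeddingService } from '@/lib/ai/embedding-service'
import { createTypedClient } from '@/lib/supabase/typed-client'

async function backfillStories(batchSize = 50) {
  const service = new EmbeddingService()
  const supabase = await createTypedClient()

  for (;;) {
    // Fetch the next batch of stories still missing an embedding.
    const { data: stories } = await supabase.raw
      .from('stories')
      .select('id')
      .is('content_embedding', null)
      .limit(batchSize)
    if (!stories?.length) break

    for (const { id } of stories) {
      await service.embedStory(id) // generates and persists the vector
    }
    console.log(`Embedded ${stories.length} stories`)
  }
}

backfillStories().catch(console.error)
```

The same loop shape applies to recordings, querying on `transcript_embedding` instead.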
Phase 3: Enhanced Context Manager - PENDING
- Implement `EnhancedContextManager` (`lib/conversation/enhanced-context-manager.ts`), sketched below
- Integrate with conversation API (`/api/conversation/route.ts`)
- Add context retrieval to editor AI features
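One plausible shape for the manager, shown as a sketch: it keeps the existing six-turn window and adds vector retrieval via `findSimilarStories`. Everything beyond that method is an assumption:

```typescript
// lib/conversation/enhanced-context-manager.ts -- sketch only.
// Combines the last-6-turns window with semantic retrieval so the
// LLM prompt carries both recent dialogue and relevant past stories.
import { EmbeddingService } from '@/lib/ai/embedding-service'

interface Turn { role: 'user' | 'assistant'; content: string }

export class EnhancedContextManager {
  private embeddings = new EmbeddingService()

  async buildContext(userId: string, history: Turn[]): Promise<string> {
    const recent = history.slice(-6) // keep the existing STM window
    const lastUserMessage = recent.filter(t => t.role === 'user').at(-1)

    // Semantic retrieval replaces the old regex extraction.
    const related = lastUserMessage
      ? await this.embeddings.findSimilarStories(userId, lastUserMessage.content, 3)
      : []

    const storyContext = related
      .map(s => `- "${s.title}" (similarity ${s.similarity.toFixed(2)})`)
      .join('\n')

    return [
      'Relevant past stories:',
      storyContext || '(none found)',
      '',
      'Recent conversation:',
      ...recent.map(t => `${t.role}: ${t.content}`),
    ].join('\n')
  }
}
```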
Phase 4: Memory System - PENDING
- Implement memory extraction from conversations (LLM-based)
- Add memory importance scoring algorithm (see the sketch after this list)
- Implement memory decay/cleanup job
- Add memory reference tracking
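As a sketch of how importance scoring and decay could combine (the weights and the 90-day half-life are assumptions, not decided values):

```typescript
// Hypothetical scoring: newer, frequently referenced memories rank higher.
interface Memory {
  importance: number     // 0..1, assigned at extraction time
  referenceCount: number // how often this memory was surfaced
  createdAt: Date
}

const HALF_LIFE_DAYS = 90 // assumed decay half-life

function memoryScore(m: Memory, now = new Date()): number {
  const ageDays = (now.getTime() - m.createdAt.getTime()) / 86_400_000
  const decay = Math.pow(0.5, ageDays / HALF_LIFE_DAYS) // exponential decay
  const reinforcement = Math.log1p(m.referenceCount)    // diminishing returns
  return m.importance * decay + 0.1 * reinforcement
}

// A cleanup job could then delete memories whose score falls below
// a threshold, e.g. memoryScore(m) < 0.05.
```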
Embedding Service Implementation
```typescript
// lib/ai/embedding-service.ts
import { OpenAI } from 'openai'
import { createTypedClient } from '@/lib/supabase/typed-client'

const openai = new OpenAI()

export class EmbeddingService {
  private model = 'text-embedding-3-small' // 1536 dimensions, $0.02/1M tokens

  async generateEmbedding(text: string): Promise<number[]> {
    const response = await openai.embeddings.create({
      model: this.model,
      input: text.slice(0, 8000), // rough character cut to stay under the 8,191-token input limit
    })
    return response.data[0].embedding
  }

  async embedStory(storyId: string): Promise<void> {
    const supabase = await createTypedClient()
    const { data: story } = await supabase.raw
      .from('stories')
      .select('id, title, content')
      .eq('id', storyId)
      .single()
    if (!story) return

    // Combine title and content for a richer embedding
    const textToEmbed = `${story.title}\n\n${story.content}`
    const embedding = await this.generateEmbedding(textToEmbed)

    await supabase.raw
      .from('stories')
      .update({
        content_embedding: embedding,
        embedding_updated_at: new Date().toISOString()
      })
      .eq('id', storyId)
  }

  async findSimilarStories(
    userId: string,
    query: string,
    limit: number = 5
  ): Promise<Array<{ id: string; title: string; similarity: number }>> {
    const queryEmbedding = await this.generateEmbedding(query)
    const supabase = await createTypedClient()

    // Note: search_stories_hybrid is a planned hybrid (text + vector) function;
    // the Phase 1 migrations only created search_unified_context().
    const { data } = await supabase.raw.rpc('search_stories_hybrid', {
      query_text: query,
      query_embedding: queryEmbedding,
      user_id: userId,
      match_count: limit
    })
    return data || []
  }
}
```
Cost Analysis
Embedding Generation Costs
| Model | Dimensions | Cost/1M tokens | Quality |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 | Good (Selected) |
| text-embedding-ada-002 | 1536 | $0.10 | Good |
| text-embedding-3-large | 3072 | $0.13 | Best |
Monthly Cost Projection
| Scale | Stories | Recordings | Memories | Queries | Embedding Cost |
|---|---|---|---|---|---|
| 100 users | 5K | 2K | 10K | 10K | ~$0.50 |
| 1,000 users | 50K | 20K | 100K | 100K | ~$5 |
| 10,000 users | 500K | 200K | 1M | 1M | ~$50 |
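A back-of-envelope check on the 1,000-user row, assuming an average of ~500 tokens per embedded item (an assumption; actual story lengths will vary):

```typescript
// Back-of-envelope check for the 1,000-user row of the table above.
const items = 50_000 + 20_000 + 100_000 + 100_000 // stories + recordings + memories + queries
const tokensPerItem = 500                          // assumed average
const pricePerMTokens = 0.02                       // text-embedding-3-small

const cost = (items * tokensPerItem / 1_000_000) * pricePerMTokens
console.log(`~$${cost.toFixed(2)}`) // ~$2.70, the same order as the ~$5 estimate
```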
Why NOT Pinecone/Weaviate Now
```
StoryFlow Expected Scale (Year 1):
├── Users: ~1,000-5,000
├── Stories: ~50,000-250,000
├── Recordings: ~20,000-100,000
├── User Memories: ~100,000-500,000
└── Total Vectors: ~170,000-850,000
```
pgvector Comfortable Limit: ~1,000,000 vectors
Pinecone Advantage Threshold: more than 1,000,000 vectors with a sub-20ms latency requirement
Decision: pgvector handles our scale at no additional cost.
Performance Optimization
pgvector Best Practices
| Configuration | Recommendation | Why |
|---|---|---|
| Index Type | HNSW | Better query performance than IVFFlat |
| `m` parameter | 16 | Good balance for under 100K vectors |
| `ef_construction` | 64 | Higher = better recall, slower build |
| Dimension | 1536 | OpenAI text-embedding-3-small |
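For reference, this configuration corresponds to the following pgvector DDL (cosine distance assumed; the index name, and possibly other details, may differ in the actual Phase 1 migration):

```sql
-- HNSW index with the recommended parameters, using cosine distance
-- as is typical for OpenAI embeddings.
CREATE INDEX stories_content_embedding_idx
  ON stories
  USING hnsw (content_embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```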
Latency Expectations (Supabase)
| Dataset Size | Query Latency | Notes |
|---|---|---|
| 1K documents | ~15ms | Instant |
| 10K documents | ~25ms | Excellent |
| 100K documents | ~85ms | Good |
| 1M documents | ~250ms | May need sharding |
Comparison: Current vs. Proposed
| Aspect | Current | Proposed |
|---|---|---|
| Context retrieval | Regex patterns | Semantic similarity |
| Story references | None | Vector search |
| Cross-session memory | None | Persistent embeddings |
| User personalization | Basic JSON | Profile + memories |
| Latency | ~100ms (JSON stringify) | ~50ms (cached) + ~25ms (vector) |
| Accuracy | ~60% (regex misses) | ~90%+ (semantic) |
| Cost per query | $0 | ~$0.00002 |
Quick Start: Resuming Implementation
When you're ready to continue, here's where to pick up:
1. Regenerate TypeScript Types
```bash
# Run from project root
npx supabase gen types typescript --project-id qrlygafaejovxxlnkpxa > apps/web-app/types/database.ts
```
2. Create EmbeddingService
Create `apps/web-app/lib/ai/embedding-service.ts`; see the implementation above.
3. Test the Search Function
```sql
-- Test with a sample embedding (1536 zeros for testing)
SELECT * FROM search_unified_context(
  array_fill(0::real, ARRAY[1536])::vector(1536),
  'your-user-id-here'::uuid,
  true, true, true, 5, 0.5
);
```
4. Key Files to Modify
- `apps/web-app/lib/ai/embedding-service.ts` (create)
- `apps/web-app/lib/conversation/enhanced-context-manager.ts` (create)
- `apps/web-app/app/api/conversation/route.ts` (integrate context)
- `apps/web-app/app/components/editor/SidebarAITab.tsx` (add context-aware suggestions)
Monitoring & Alerts
Key Metrics to Track
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Vector count | 500K | 800K | Evaluate dedicated DB |
| Query P95 latency | 100ms | 200ms | Add caching |
| Embedding API errors | 1% | 5% | Check OpenAI status |
| Memory table size | 1GB | 5GB | Implement cleanup |
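A sketch of how the vector-count metric could be checked, assuming the `user_memory` embedding column is named `embedding` and reusing the typed client from the EmbeddingService example:

```typescript
// Sketch of a vector-count check against the warning/critical thresholds.
import { createTypedClient } from '@/lib/supabase/typed-client'

const WARNING = 500_000
const CRITICAL = 800_000

async function checkVectorCount(): Promise<void> {
  const supabase = await createTypedClient()

  // Count rows that actually have an embedding, per table.
  async function countEmbedded(table: string, column: string): Promise<number> {
    const { count } = await supabase.raw
      .from(table)
      .select('*', { count: 'exact', head: true })
      .not(column, 'is', null)
    return count ?? 0
  }

  const total =
    (await countEmbedded('stories', 'content_embedding')) +
    (await countEmbedded('recordings', 'transcript_embedding')) +
    (await countEmbedded('user_memory', 'embedding'))

  if (total >= CRITICAL) console.error(`CRITICAL: ${total} vectors - evaluate dedicated DB`)
  else if (total >= WARNING) console.warn(`WARNING: ${total} vectors - plan ahead`)
}
```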