📚 MyStoryFlow Docs — Your guide to preserving family stories

F002 - Analyzer Database Schema

Objective

Create a separate analyzer schema within the existing MyStoryFlow Supabase database to support manuscript analysis, genre templates, and usage tracking. This approach provides logical separation while maintaining unified authentication and user management across all MyStoryFlow apps.

Requirements

Functional Requirements

  • Store manuscripts with metadata (genre, word count, analysis status)
  • Support multiple file formats (PDF, DOCX, TXT, Google Docs)
  • Track analysis sessions and progress
  • Store detailed analysis results and scores
  • Support manuscript versioning and revision tracking
  • Implement usage tracking for billing integration
  • Maintain separation between core NextSaaS and analyzer features

Technical Requirements

  • Maintain Row Level Security (RLS) for data isolation
  • Support vector embeddings for content similarity
  • Efficient indexing for large manuscript content
  • Integrate with existing auth.users and public schemas
  • Track AI usage for admin-app monitoring
  • GDPR-compliant data export and deletion
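Because the analyzer tables below hang off analyzer.manuscripts with ON DELETE CASCADE, GDPR deletion and export can stay simple. A sketch, using a placeholder user id (not a real query from the spec):

```sql
-- Hedged sketch: deleting a user's manuscripts cascades to analyses,
-- chunks, sessions, versions, and usage rows.
DELETE FROM analyzer.manuscripts
WHERE user_id = '00000000-0000-0000-0000-000000000000';

-- Export: one JSON document per manuscript, with its analyses attached.
SELECT jsonb_build_object(
  'manuscript', to_jsonb(m),
  'analyses',  COALESCE(jsonb_agg(to_jsonb(a)) FILTER (WHERE a.id IS NOT NULL), '[]'::jsonb)
) AS export
FROM analyzer.manuscripts m
LEFT JOIN analyzer.manuscript_analyses a ON a.manuscript_id = m.id
WHERE m.user_id = '00000000-0000-0000-0000-000000000000'
GROUP BY m.id;
```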

Database Schema Architecture

Schema Creation

-- Create analyzer schema within the MyStoryFlow database
CREATE SCHEMA IF NOT EXISTS analyzer;

-- Grant permissions to Supabase roles
GRANT USAGE ON SCHEMA analyzer TO anon, authenticated, service_role;
GRANT ALL ON ALL TABLES IN SCHEMA analyzer TO authenticated;
GRANT SELECT ON ALL TABLES IN SCHEMA analyzer TO anon;

-- Ensure tables created later in the schema are also granted to authenticated
ALTER DEFAULT PRIVILEGES IN SCHEMA analyzer GRANT ALL ON TABLES TO authenticated;

-- Cross-schema helper functions
CREATE OR REPLACE FUNCTION analyzer.get_user_id()
RETURNS UUID AS $$
  SELECT auth.uid()
$$ LANGUAGE SQL SECURITY DEFINER;

-- Parameter is named p_user_id to avoid ambiguity with the
-- subscriptions.user_id column inside the function body.
CREATE OR REPLACE FUNCTION analyzer.check_user_subscription(p_user_id UUID)
RETURNS BOOLEAN AS $$
  SELECT EXISTS (
    SELECT 1 FROM public.subscriptions s
    WHERE s.user_id = p_user_id
      AND s.status = 'active'
      AND s.expires_at > NOW()
  )
$$ LANGUAGE SQL SECURITY DEFINER;
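One way these helpers could be used is to gate manuscript creation on an active subscription. This is an illustrative policy, not part of the schema above:

```sql
-- Illustrative only: require an active subscription to upload manuscripts.
CREATE POLICY "Active subscribers can insert manuscripts"
  ON analyzer.manuscripts
  FOR INSERT
  WITH CHECK (
    user_id = analyzer.get_user_id()
    AND analyzer.check_user_subscription(analyzer.get_user_id())
  );
```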

Database Changes

1. Core Manuscripts Table

-- Core Manuscripts Table
CREATE TABLE analyzer.manuscripts (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title VARCHAR(500) NOT NULL,
  file_url TEXT NOT NULL,
  file_name VARCHAR(255) NOT NULL,
  file_size BIGINT NOT NULL,
  file_type VARCHAR(50) NOT NULL,                  -- pdf, docx, txt, google_docs
  word_count INTEGER,
  page_count INTEGER,
  genre VARCHAR(100),
  genre_confidence DECIMAL(3,2),
  language VARCHAR(10) DEFAULT 'en',
  upload_status VARCHAR(50) DEFAULT 'processing',  -- processing, completed, failed
  content_extracted BOOLEAN DEFAULT FALSE,
  analysis_status VARCHAR(50) DEFAULT 'pending',   -- pending, processing, completed, failed
  analysis_started_at TIMESTAMPTZ,
  analysis_completed_at TIMESTAMPTZ,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

2. Analysis Results Table

-- Analysis Results Table
CREATE TABLE analyzer.manuscript_analyses (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  analysis_version INTEGER DEFAULT 1,
  genre_detected VARCHAR(100),
  genre_confidence DECIMAL(3,2),
  overall_score INTEGER CHECK (overall_score >= 0 AND overall_score <= 100),
  structure_score INTEGER CHECK (structure_score >= 0 AND structure_score <= 100),
  character_score INTEGER CHECK (character_score >= 0 AND character_score <= 100),
  plot_score INTEGER CHECK (plot_score >= 0 AND plot_score <= 100),
  writing_quality_score INTEGER CHECK (writing_quality_score >= 0 AND writing_quality_score <= 100),
  pacing_score INTEGER CHECK (pacing_score >= 0 AND pacing_score <= 100),
  dialogue_score INTEGER CHECK (dialogue_score >= 0 AND dialogue_score <= 100),
  market_readiness_score INTEGER CHECK (market_readiness_score >= 0 AND market_readiness_score <= 100),
  detailed_feedback JSONB DEFAULT '{}',
  improvement_suggestions JSONB DEFAULT '[]',
  strengths JSONB DEFAULT '[]',
  weaknesses JSONB DEFAULT '[]',
  market_insights JSONB DEFAULT '{}',
  publishing_recommendations JSONB DEFAULT '{}',
  report_url TEXT,
  processing_time_seconds INTEGER,
  ai_model_version VARCHAR(50),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
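Since a manuscript can accumulate several analyses over time, a common read is "the most recent analysis per manuscript". A sketch of that query for the current user:

```sql
-- Latest analysis per manuscript for the signed-in user.
SELECT DISTINCT ON (m.id)
       m.id, m.title, a.overall_score, a.created_at AS analyzed_at
FROM analyzer.manuscripts m
JOIN analyzer.manuscript_analyses a ON a.manuscript_id = m.id
WHERE m.user_id = auth.uid()
ORDER BY m.id, a.analysis_version DESC, a.created_at DESC;
```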

3. Manuscript Chunks for AI Processing

-- Manuscript Chunks for AI Processing
-- Requires the vector extension to be enabled first:
-- CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE analyzer.manuscript_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  chunk_number INTEGER NOT NULL,
  content TEXT NOT NULL,
  content_type VARCHAR(50),       -- chapter, section, paragraph
  word_count INTEGER,
  character_count INTEGER,
  vector_embedding VECTOR(1536),  -- OpenAI embedding dimensions
  processed BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
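With pgvector, content similarity over these chunks is an ORDER BY on the distance operator; `<=>` is cosine distance, matching the vector_cosine_ops index defined later. A sketch, where :query_embedding is a 1536-dimension vector supplied by the application:

```sql
-- Five most similar processed chunks to a query embedding.
SELECT c.manuscript_id, c.chunk_number, c.content,
       c.vector_embedding <=> :query_embedding AS distance
FROM analyzer.manuscript_chunks c
WHERE c.processed
ORDER BY c.vector_embedding <=> :query_embedding
LIMIT 5;
```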

4. Analysis Sessions for Progress Tracking

-- Analysis Sessions for Progress Tracking
CREATE TABLE analyzer.analysis_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  session_type VARCHAR(50) NOT NULL,   -- full_analysis, quick_review, revision_check
  status VARCHAR(50) DEFAULT 'queued', -- queued, processing, completed, failed, cancelled
  progress_percentage INTEGER DEFAULT 0 CHECK (progress_percentage >= 0 AND progress_percentage <= 100),
  current_step VARCHAR(100),
  steps_completed JSONB DEFAULT '[]',
  estimated_completion_time TIMESTAMPTZ,
  error_message TEXT,
  retry_count INTEGER DEFAULT 0,
  priority INTEGER DEFAULT 5,          -- 1-10, higher = more priority
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
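If workers poll this table directly (an assumption; the spec does not prescribe a queue mechanism), the status and priority columns support a safe claim-next-job pattern:

```sql
-- Sketch: a worker atomically claims the highest-priority queued session.
-- FOR UPDATE SKIP LOCKED prevents two workers from claiming the same row.
UPDATE analyzer.analysis_sessions
SET status = 'processing', started_at = NOW(), updated_at = NOW()
WHERE id = (
  SELECT id FROM analyzer.analysis_sessions
  WHERE status = 'queued'
  ORDER BY priority DESC, created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, manuscript_id, session_type;
```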

5. Manuscript Versions for Revision Tracking

-- Manuscript Versions for Revision Tracking
CREATE TABLE analyzer.manuscript_versions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  version_number INTEGER NOT NULL,
  version_name VARCHAR(255),
  file_url TEXT NOT NULL,
  word_count INTEGER,
  changes_summary TEXT,
  comparison_data JSONB DEFAULT '{}',
  analysis_id UUID REFERENCES analyzer.manuscript_analyses(id),
  created_by UUID NOT NULL REFERENCES auth.users(id),
  created_at TIMESTAMPTZ DEFAULT NOW()
);
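The optional analysis_id link lets a revision history show how scores changed across drafts. A sketch, where :manuscript_id is supplied by the application:

```sql
-- Version history for one manuscript, newest first,
-- with the score of the analysis attached to each version (if any).
SELECT v.version_number, v.version_name, v.word_count,
       v.changes_summary, a.overall_score
FROM analyzer.manuscript_versions v
LEFT JOIN analyzer.manuscript_analyses a ON a.id = v.analysis_id
WHERE v.manuscript_id = :manuscript_id
ORDER BY v.version_number DESC;
```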

6. Genre Templates System

-- Genre Templates and Criteria
CREATE TABLE analyzer.genre_templates (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  genre_name VARCHAR(100) NOT NULL UNIQUE,
  genre_slug VARCHAR(100) NOT NULL UNIQUE,
  parent_genre VARCHAR(100),
  description TEXT,
  analysis_criteria JSONB NOT NULL,
  scoring_weights JSONB NOT NULL,
  market_insights JSONB DEFAULT '{}',
  example_manuscripts JSONB DEFAULT '[]',
  is_active BOOLEAN DEFAULT TRUE,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
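An illustrative seed row showing the expected shape of the two JSONB columns; the actual criteria and weights are product decisions, not defined by this spec:

```sql
-- Illustrative only: weights sum to 1.0 so scores can be combined linearly.
INSERT INTO analyzer.genre_templates
  (genre_name, genre_slug, description, analysis_criteria, scoring_weights)
VALUES (
  'Memoir', 'memoir',
  'Personal narrative nonfiction',
  '{"voice": "First-person authenticity", "arc": "Transformation over time"}',
  '{"structure": 0.2, "character": 0.25, "plot": 0.15,
    "writing_quality": 0.25, "pacing": 0.1, "dialogue": 0.05}'
);
```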

7. Usage Tracking for Billing

-- Usage Tracking for Billing
CREATE TABLE analyzer.manuscript_usage (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  manuscript_id UUID REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  usage_type VARCHAR(50) NOT NULL,  -- analysis, export, api_call
  word_count INTEGER,
  credits_used DECIMAL(10,2),
  plan_tier VARCHAR(50),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- AI Usage Tracking (for admin-app monitoring)
CREATE TABLE analyzer.ai_usage (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  manuscript_id UUID REFERENCES analyzer.manuscripts(id),
  feature_name VARCHAR(100) NOT NULL,  -- genre-detection, full-analysis, etc.
  model_provider VARCHAR(50) NOT NULL, -- openai, anthropic, gemini, xai
  model_name VARCHAR(100) NOT NULL,
  tokens_used INTEGER NOT NULL,
  cost_usd DECIMAL(10,4),
  response_time_ms INTEGER,
  success BOOLEAN DEFAULT TRUE,
  error_message TEXT,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);
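The billing integration can aggregate manuscript_usage by calendar month. A sketch of the rollup query:

```sql
-- Monthly credit totals per user and usage type.
SELECT user_id,
       date_trunc('month', created_at) AS billing_month,
       usage_type,
       SUM(credits_used) AS total_credits
FROM analyzer.manuscript_usage
GROUP BY user_id, billing_month, usage_type
ORDER BY billing_month DESC, total_credits DESC;
```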

8. Reference Data System Tables

-- Reference Data Collections
CREATE TABLE analyzer.reference_collections (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  collection_name VARCHAR(100) NOT NULL UNIQUE,
  version VARCHAR(20) NOT NULL DEFAULT '1.0.0',
  description TEXT,
  is_active BOOLEAN DEFAULT TRUE,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Reference Data Items
CREATE TABLE analyzer.reference_data_items (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  collection_id UUID NOT NULL REFERENCES analyzer.reference_collections(id) ON DELETE CASCADE,
  code VARCHAR(100) NOT NULL,
  label VARCHAR(255) NOT NULL,
  description TEXT,
  category VARCHAR(100),
  sort_order INTEGER DEFAULT 0,
  is_active BOOLEAN DEFAULT TRUE,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(collection_id, code)
);
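An illustrative seed for one collection, using the file formats listed in the functional requirements; the collection name and labels here are examples, not spec-defined values:

```sql
-- Illustrative seed: supported manuscript file formats.
WITH c AS (
  INSERT INTO analyzer.reference_collections (collection_name, description)
  VALUES ('file_types', 'Supported manuscript file formats')
  RETURNING id
)
INSERT INTO analyzer.reference_data_items (collection_id, code, label, sort_order)
SELECT c.id, v.code, v.label, v.sort_order
FROM c, (VALUES
  ('pdf',         'PDF Document',   1),
  ('docx',        'Microsoft Word', 2),
  ('txt',         'Plain Text',     3),
  ('google_docs', 'Google Docs',    4)
) AS v(code, label, sort_order);
```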

Indexes for Performance

-- Manuscript indexes
CREATE INDEX idx_manuscripts_user_id ON analyzer.manuscripts(user_id);
CREATE INDEX idx_manuscripts_genre ON analyzer.manuscripts(genre);
CREATE INDEX idx_manuscripts_analysis_status ON analyzer.manuscripts(analysis_status);
CREATE INDEX idx_manuscripts_created_at ON analyzer.manuscripts(created_at);

-- Analysis indexes
CREATE INDEX idx_manuscript_analyses_manuscript_id ON analyzer.manuscript_analyses(manuscript_id);
CREATE INDEX idx_manuscript_analyses_overall_score ON analyzer.manuscript_analyses(overall_score);
CREATE INDEX idx_manuscript_analyses_created_at ON analyzer.manuscript_analyses(created_at);

-- Chunk indexes for vector search
CREATE INDEX idx_manuscript_chunks_manuscript_id ON analyzer.manuscript_chunks(manuscript_id);
CREATE INDEX idx_manuscript_chunks_vector_embedding ON analyzer.manuscript_chunks
  USING ivfflat (vector_embedding vector_cosine_ops);

-- Session tracking indexes
CREATE INDEX idx_analysis_sessions_manuscript_id ON analyzer.analysis_sessions(manuscript_id);
CREATE INDEX idx_analysis_sessions_status ON analyzer.analysis_sessions(status);
CREATE INDEX idx_analysis_sessions_priority ON analyzer.analysis_sessions(priority DESC);

-- Usage tracking indexes
-- idx_manuscript_usage_organization_id removed: no organization_id in the user-centric model
CREATE INDEX idx_manuscript_usage_user_id ON analyzer.manuscript_usage(user_id);
CREATE INDEX idx_manuscript_usage_created_at ON analyzer.manuscript_usage(created_at);

-- Reference data indexes
CREATE INDEX idx_reference_collections_name ON analyzer.reference_collections(collection_name);
CREATE INDEX idx_reference_collections_active ON analyzer.reference_collections(is_active);
CREATE INDEX idx_reference_data_items_collection ON analyzer.reference_data_items(collection_id);
CREATE INDEX idx_reference_data_items_code ON analyzer.reference_data_items(collection_id, code);
CREATE INDEX idx_reference_data_items_active ON analyzer.reference_data_items(is_active);
CREATE INDEX idx_reference_data_items_sort ON analyzer.reference_data_items(collection_id, sort_order);

Row Level Security (RLS) Policies

-- Enable RLS on all tables
ALTER TABLE analyzer.manuscripts ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_analyses ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_chunks ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.analysis_sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_versions ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.genre_templates ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_usage ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.ai_usage ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.reference_collections ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.reference_data_items ENABLE ROW LEVEL SECURITY;

-- Manuscript access policies
CREATE POLICY "Users can access their own manuscripts"
  ON analyzer.manuscripts FOR ALL
  USING (user_id = auth.uid());

-- Analysis access policies
CREATE POLICY "Users can access analyses for their manuscripts"
  ON analyzer.manuscript_analyses FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM analyzer.manuscripts m
      WHERE m.id = manuscript_analyses.manuscript_id
        AND m.user_id = auth.uid()
    )
  );

-- Note: chunks, sessions, and versions need equivalent policies following
-- the same ownership pattern through manuscript_id; with RLS enabled and
-- no policy, those tables are inaccessible to authenticated users.

-- AI Usage access policies
CREATE POLICY "Users can view their AI usage"
  ON analyzer.ai_usage FOR SELECT
  USING (user_id = auth.uid());

CREATE POLICY "System can insert AI usage"
  ON analyzer.ai_usage FOR INSERT
  WITH CHECK (true);

-- Usage tracking policies
CREATE POLICY "Users can view their usage"
  ON analyzer.manuscript_usage FOR SELECT
  USING (user_id = auth.uid());

-- Reference data policies (public read access)
CREATE POLICY "Everyone can read active reference collections"
  ON analyzer.reference_collections FOR SELECT
  USING (is_active = true);

CREATE POLICY "Everyone can read active reference data items"
  ON analyzer.reference_data_items FOR SELECT
  USING (is_active = true);

-- Genre templates are read-only reference data; without this policy,
-- RLS would block all reads on the table.
CREATE POLICY "Everyone can read active genre templates"
  ON analyzer.genre_templates FOR SELECT
  USING (is_active = true);

CREATE POLICY "Admins can manage reference data"
  ON analyzer.reference_collections FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM auth.users
      WHERE id = auth.uid()
        AND raw_user_meta_data->>'role' = 'admin'
    )
  );

CREATE POLICY "Admins can manage reference data items"
  ON analyzer.reference_data_items FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM auth.users
      WHERE id = auth.uid()
        AND raw_user_meta_data->>'role' = 'admin'
    )
  );

-- Admin access to manuscripts (same admin check as above, for consistency)
CREATE POLICY "Admins have full access"
  ON analyzer.manuscripts FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM auth.users
      WHERE id = auth.uid()
        AND raw_user_meta_data->>'role' = 'admin'
    )
  );

API Endpoints (Future)

This schema will support future API endpoints for:

  • /api/manuscripts - CRUD operations
  • /api/analysis - Analysis management
  • /api/usage - Billing and limits

Testing Requirements

Unit Tests

  • Test all table constraints and validations
  • Verify RLS policies block unauthorized access
  • Verify that the indexes improve query performance
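RLS checks can be sketched by simulating a signed-in user inside a transaction. This assumes Supabase's JWT-claim settings (auth.uid() reads the sub claim); in practice pgTAP or the Supabase test helpers make this cleaner:

```sql
-- Hedged sketch of an RLS unit test with a placeholder user id.
BEGIN;
SET LOCAL ROLE authenticated;
SET LOCAL request.jwt.claims = '{"sub": "00000000-0000-0000-0000-000000000001"}';
-- Should return only manuscripts owned by that user id.
SELECT count(*) FROM analyzer.manuscripts;
ROLLBACK;
```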

Integration Tests

  • Test manuscript upload and analysis workflow
  • Test usage tracking and billing calculations
  • Verify reference data loading

E2E Tests

  • Complete author journey from upload to analysis
  • Admin manuscript management and analytics
  • Per-user data isolation verification

Acceptance Criteria

Must Have

  • All tables created with proper constraints
  • RLS policies implemented for data security
  • Indexes created for performance
  • Vector extension enabled for embeddings
  • Migration scripts tested in development

Should Have

  • Audit triggers for all manuscript operations
  • Soft delete functionality for manuscripts
  • Data export functionality for GDPR
  • Performance benchmarks documented

Could Have

  • Automated backup policies
  • Data archiving for old manuscripts
  • Advanced analytics views
  • Database monitoring alerts

Dependencies

  • F001-PROJECT-SETUP must be completed first
  • Requires Supabase vector extension enabled
  • Requires existing NextSaaS auth and user tables (the analyzer uses the user-centric model)

Estimated Effort

  • Development: 3 days
  • Testing: 2 days
  • Documentation: 1 day
  • Total: 6 days

Implementation Notes

Migration Strategy

  1. Create new tables in development environment
  2. Test RLS policies with sample data
  3. Create migration scripts for production
  4. Implement rollback procedures
  5. Monitor performance after deployment

Performance Considerations

  • Vector embeddings require significant storage
  • Index large text content carefully
  • Consider partitioning for high-volume usage tracking

Next Feature

After completion, proceed to F003-FILE-STORAGE to configure manuscript file storage and processing.