📚 MyStoryFlow Docs — Your guide to preserving family stories

F002 - Analyzer Database Schema

Objective

Create a separate analyzer schema within the existing MyStoryFlow Supabase database to support manuscript analysis, genre templates, and usage tracking. This approach provides logical separation while maintaining unified authentication and user management across all MyStoryFlow apps.

Requirements

Functional Requirements

  • Store manuscripts with metadata (genre, word count, analysis status)
  • Support multiple file formats (PDF, DOCX, TXT, Google Docs)
  • Track analysis sessions and progress
  • Store detailed analysis results and scores
  • Support manuscript versioning and revision tracking
  • Implement usage tracking for billing integration
  • Maintain separation between core NextSaaS and analyzer features

Technical Requirements

  • Maintain Row Level Security (RLS) for data isolation
  • Support vector embeddings for content similarity
  • Efficient indexing for large manuscript content
  • Integrate with existing auth.users and public schemas
  • Track AI usage for admin-app monitoring
  • GDPR-compliant data export and deletion
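Because the analyzer tables below hang off analyzer.manuscripts with ON DELETE CASCADE, GDPR deletion and export can stay simple. A sketch, using a placeholder user id (not a real query from the spec):

```sql
-- Hedged sketch: deleting a user's manuscripts cascades to analyses,
-- chunks, sessions, versions, and usage rows.
DELETE FROM analyzer.manuscripts
WHERE user_id = '00000000-0000-0000-0000-000000000000';

-- Export: one JSON document per manuscript, with its analyses attached.
SELECT jsonb_build_object(
  'manuscript', to_jsonb(m),
  'analyses',  COALESCE(jsonb_agg(to_jsonb(a)) FILTER (WHERE a.id IS NOT NULL), '[]'::jsonb)
) AS export
FROM analyzer.manuscripts m
LEFT JOIN analyzer.manuscript_analyses a ON a.manuscript_id = m.id
WHERE m.user_id = '00000000-0000-0000-0000-000000000000'
GROUP BY m.id;
```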

Database Schema Architecture

Schema Creation

-- Create analyzer schema within the MyStoryFlow database
CREATE SCHEMA IF NOT EXISTS analyzer;

-- Grant permissions to Supabase roles
GRANT USAGE ON SCHEMA analyzer TO anon, authenticated, service_role;
GRANT ALL ON ALL TABLES IN SCHEMA analyzer TO authenticated;
GRANT SELECT ON ALL TABLES IN SCHEMA analyzer TO anon;

-- Ensure tables created later in the schema are also granted to authenticated
ALTER DEFAULT PRIVILEGES IN SCHEMA analyzer GRANT ALL ON TABLES TO authenticated;

-- Cross-schema helper functions
CREATE OR REPLACE FUNCTION analyzer.get_user_id()
RETURNS UUID AS $$
  SELECT auth.uid()
$$ LANGUAGE SQL SECURITY DEFINER;

-- Parameter is named p_user_id to avoid ambiguity with the
-- subscriptions.user_id column inside the function body.
CREATE OR REPLACE FUNCTION analyzer.check_user_subscription(p_user_id UUID)
RETURNS BOOLEAN AS $$
  SELECT EXISTS (
    SELECT 1 FROM public.subscriptions s
    WHERE s.user_id = p_user_id
      AND s.status = 'active'
      AND s.expires_at > NOW()
  )
$$ LANGUAGE SQL SECURITY DEFINER;
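One way these helpers could be used is to gate manuscript creation on an active subscription. This is an illustrative policy, not part of the schema above:

```sql
-- Illustrative only: require an active subscription to upload manuscripts.
CREATE POLICY "Active subscribers can insert manuscripts"
  ON analyzer.manuscripts
  FOR INSERT
  WITH CHECK (
    user_id = analyzer.get_user_id()
    AND analyzer.check_user_subscription(analyzer.get_user_id())
  );
```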

Database Changes

1. Core Manuscripts Table

-- Core Manuscripts Table
CREATE TABLE analyzer.manuscripts (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title VARCHAR(500) NOT NULL,
  file_url TEXT NOT NULL,
  file_name VARCHAR(255) NOT NULL,
  file_size BIGINT NOT NULL,
  file_type VARCHAR(50) NOT NULL,                  -- pdf, docx, txt, google_docs
  word_count INTEGER,
  page_count INTEGER,
  genre VARCHAR(100),
  genre_confidence DECIMAL(3,2),
  language VARCHAR(10) DEFAULT 'en',
  upload_status VARCHAR(50) DEFAULT 'processing',  -- processing, completed, failed
  content_extracted BOOLEAN DEFAULT FALSE,
  analysis_status VARCHAR(50) DEFAULT 'pending',   -- pending, processing, completed, failed
  analysis_started_at TIMESTAMPTZ,
  analysis_completed_at TIMESTAMPTZ,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

2. Analysis Results Table

-- Analysis Results Table
CREATE TABLE analyzer.manuscript_analyses (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  analysis_version INTEGER DEFAULT 1,
  genre_detected VARCHAR(100),
  genre_confidence DECIMAL(3,2),
  overall_score INTEGER CHECK (overall_score >= 0 AND overall_score <= 100),
  structure_score INTEGER CHECK (structure_score >= 0 AND structure_score <= 100),
  character_score INTEGER CHECK (character_score >= 0 AND character_score <= 100),
  plot_score INTEGER CHECK (plot_score >= 0 AND plot_score <= 100),
  writing_quality_score INTEGER CHECK (writing_quality_score >= 0 AND writing_quality_score <= 100),
  pacing_score INTEGER CHECK (pacing_score >= 0 AND pacing_score <= 100),
  dialogue_score INTEGER CHECK (dialogue_score >= 0 AND dialogue_score <= 100),
  market_readiness_score INTEGER CHECK (market_readiness_score >= 0 AND market_readiness_score <= 100),
  detailed_feedback JSONB DEFAULT '{}',
  improvement_suggestions JSONB DEFAULT '[]',
  strengths JSONB DEFAULT '[]',
  weaknesses JSONB DEFAULT '[]',
  market_insights JSONB DEFAULT '{}',
  publishing_recommendations JSONB DEFAULT '{}',
  report_url TEXT,
  processing_time_seconds INTEGER,
  ai_model_version VARCHAR(50),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
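Since a manuscript can accumulate several analyses over time, a common read is "the most recent analysis per manuscript". A sketch of that query for the current user:

```sql
-- Latest analysis per manuscript for the signed-in user.
SELECT DISTINCT ON (m.id)
       m.id, m.title, a.overall_score, a.created_at AS analyzed_at
FROM analyzer.manuscripts m
JOIN analyzer.manuscript_analyses a ON a.manuscript_id = m.id
WHERE m.user_id = auth.uid()
ORDER BY m.id, a.analysis_version DESC, a.created_at DESC;
```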

3. Manuscript Chunks for AI Processing

-- Manuscript Chunks for AI Processing
-- Requires the vector extension to be enabled first:
-- CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE analyzer.manuscript_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  chunk_number INTEGER NOT NULL,
  content TEXT NOT NULL,
  content_type VARCHAR(50),       -- chapter, section, paragraph
  word_count INTEGER,
  character_count INTEGER,
  vector_embedding VECTOR(1536),  -- OpenAI embedding dimensions
  processed BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
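With pgvector, content similarity over these chunks is an ORDER BY on the distance operator; `<=>` is cosine distance, matching the vector_cosine_ops index defined later. A sketch, where :query_embedding is a 1536-dimension vector supplied by the application:

```sql
-- Five most similar processed chunks to a query embedding.
SELECT c.manuscript_id, c.chunk_number, c.content,
       c.vector_embedding <=> :query_embedding AS distance
FROM analyzer.manuscript_chunks c
WHERE c.processed
ORDER BY c.vector_embedding <=> :query_embedding
LIMIT 5;
```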

4. Analysis Sessions for Progress Tracking

-- Analysis Sessions for Progress Tracking
CREATE TABLE analyzer.analysis_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  session_type VARCHAR(50) NOT NULL,   -- full_analysis, quick_review, revision_check
  status VARCHAR(50) DEFAULT 'queued', -- queued, processing, completed, failed, cancelled
  progress_percentage INTEGER DEFAULT 0 CHECK (progress_percentage >= 0 AND progress_percentage <= 100),
  current_step VARCHAR(100),
  steps_completed JSONB DEFAULT '[]',
  estimated_completion_time TIMESTAMPTZ,
  error_message TEXT,
  retry_count INTEGER DEFAULT 0,
  priority INTEGER DEFAULT 5,          -- 1-10, higher = more priority
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
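If workers poll this table directly (an assumption; the spec does not prescribe a queue mechanism), the status and priority columns support a safe claim-next-job pattern:

```sql
-- Sketch: a worker atomically claims the highest-priority queued session.
-- FOR UPDATE SKIP LOCKED prevents two workers from claiming the same row.
UPDATE analyzer.analysis_sessions
SET status = 'processing', started_at = NOW(), updated_at = NOW()
WHERE id = (
  SELECT id FROM analyzer.analysis_sessions
  WHERE status = 'queued'
  ORDER BY priority DESC, created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, manuscript_id, session_type;
```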

5. Manuscript Versions for Revision Tracking

-- Manuscript Versions for Revision Tracking
CREATE TABLE analyzer.manuscript_versions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  version_number INTEGER NOT NULL,
  version_name VARCHAR(255),
  file_url TEXT NOT NULL,
  word_count INTEGER,
  changes_summary TEXT,
  comparison_data JSONB DEFAULT '{}',
  analysis_id UUID REFERENCES analyzer.manuscript_analyses(id),
  created_by UUID NOT NULL REFERENCES auth.users(id),
  created_at TIMESTAMPTZ DEFAULT NOW()
);
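The optional analysis_id link lets a revision history show how scores changed across drafts. A sketch, where :manuscript_id is supplied by the application:

```sql
-- Version history for one manuscript, newest first,
-- with the score of the analysis attached to each version (if any).
SELECT v.version_number, v.version_name, v.word_count,
       v.changes_summary, a.overall_score
FROM analyzer.manuscript_versions v
LEFT JOIN analyzer.manuscript_analyses a ON a.id = v.analysis_id
WHERE v.manuscript_id = :manuscript_id
ORDER BY v.version_number DESC;
```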

6. Genre Templates System

-- Genre Templates and Criteria
CREATE TABLE analyzer.genre_templates (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  genre_name VARCHAR(100) NOT NULL UNIQUE,
  genre_slug VARCHAR(100) NOT NULL UNIQUE,
  parent_genre VARCHAR(100),
  description TEXT,
  analysis_criteria JSONB NOT NULL,
  scoring_weights JSONB NOT NULL,
  market_insights JSONB DEFAULT '{}',
  example_manuscripts JSONB DEFAULT '[]',
  is_active BOOLEAN DEFAULT TRUE,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
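An illustrative seed row showing the expected shape of the two JSONB columns; the actual criteria and weights are product decisions, not defined by this spec:

```sql
-- Illustrative only: weights sum to 1.0 so scores can be combined linearly.
INSERT INTO analyzer.genre_templates
  (genre_name, genre_slug, description, analysis_criteria, scoring_weights)
VALUES (
  'Memoir', 'memoir',
  'Personal narrative nonfiction',
  '{"voice": "First-person authenticity", "arc": "Transformation over time"}',
  '{"structure": 0.2, "character": 0.25, "plot": 0.15,
    "writing_quality": 0.25, "pacing": 0.1, "dialogue": 0.05}'
);
```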

7. Usage Tracking for Billing

-- Usage Tracking for Billing
CREATE TABLE analyzer.manuscript_usage (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  manuscript_id UUID REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
  usage_type VARCHAR(50) NOT NULL,  -- analysis, export, api_call
  word_count INTEGER,
  credits_used DECIMAL(10,2),
  plan_tier VARCHAR(50),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- AI Usage Tracking (for admin-app monitoring)
CREATE TABLE analyzer.ai_usage (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  manuscript_id UUID REFERENCES analyzer.manuscripts(id),
  feature_name VARCHAR(100) NOT NULL,  -- genre-detection, full-analysis, etc.
  model_provider VARCHAR(50) NOT NULL, -- openai, anthropic, gemini, xai
  model_name VARCHAR(100) NOT NULL,
  tokens_used INTEGER NOT NULL,
  cost_usd DECIMAL(10,4),
  response_time_ms INTEGER,
  success BOOLEAN DEFAULT TRUE,
  error_message TEXT,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);
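The billing integration can aggregate manuscript_usage by calendar month. A sketch of the rollup query:

```sql
-- Monthly credit totals per user and usage type.
SELECT user_id,
       date_trunc('month', created_at) AS billing_month,
       usage_type,
       SUM(credits_used) AS total_credits
FROM analyzer.manuscript_usage
GROUP BY user_id, billing_month, usage_type
ORDER BY billing_month DESC, total_credits DESC;
```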

8. Reference Data System Tables

-- Reference Data Collections
CREATE TABLE analyzer.reference_collections (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  collection_name VARCHAR(100) NOT NULL UNIQUE,
  version VARCHAR(20) NOT NULL DEFAULT '1.0.0',
  description TEXT,
  is_active BOOLEAN DEFAULT TRUE,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Reference Data Items
CREATE TABLE analyzer.reference_data_items (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  collection_id UUID NOT NULL REFERENCES analyzer.reference_collections(id) ON DELETE CASCADE,
  code VARCHAR(100) NOT NULL,
  label VARCHAR(255) NOT NULL,
  description TEXT,
  category VARCHAR(100),
  sort_order INTEGER DEFAULT 0,
  is_active BOOLEAN DEFAULT TRUE,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(collection_id, code)
);
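An illustrative seed for one collection, using the file formats listed in the functional requirements; the collection name and labels here are examples, not spec-defined values:

```sql
-- Illustrative seed: supported manuscript file formats.
WITH c AS (
  INSERT INTO analyzer.reference_collections (collection_name, description)
  VALUES ('file_types', 'Supported manuscript file formats')
  RETURNING id
)
INSERT INTO analyzer.reference_data_items (collection_id, code, label, sort_order)
SELECT c.id, v.code, v.label, v.sort_order
FROM c, (VALUES
  ('pdf',         'PDF Document',   1),
  ('docx',        'Microsoft Word', 2),
  ('txt',         'Plain Text',     3),
  ('google_docs', 'Google Docs',    4)
) AS v(code, label, sort_order);
```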

Indexes for Performance

-- Manuscript indexes
CREATE INDEX idx_manuscripts_user_id ON analyzer.manuscripts(user_id);
CREATE INDEX idx_manuscripts_genre ON analyzer.manuscripts(genre);
CREATE INDEX idx_manuscripts_analysis_status ON analyzer.manuscripts(analysis_status);
CREATE INDEX idx_manuscripts_created_at ON analyzer.manuscripts(created_at);

-- Analysis indexes
CREATE INDEX idx_manuscript_analyses_manuscript_id ON analyzer.manuscript_analyses(manuscript_id);
CREATE INDEX idx_manuscript_analyses_overall_score ON analyzer.manuscript_analyses(overall_score);
CREATE INDEX idx_manuscript_analyses_created_at ON analyzer.manuscript_analyses(created_at);

-- Chunk indexes for vector search
CREATE INDEX idx_manuscript_chunks_manuscript_id ON analyzer.manuscript_chunks(manuscript_id);
CREATE INDEX idx_manuscript_chunks_vector_embedding ON analyzer.manuscript_chunks
  USING ivfflat (vector_embedding vector_cosine_ops);

-- Session tracking indexes
CREATE INDEX idx_analysis_sessions_manuscript_id ON analyzer.analysis_sessions(manuscript_id);
CREATE INDEX idx_analysis_sessions_status ON analyzer.analysis_sessions(status);
CREATE INDEX idx_analysis_sessions_priority ON analyzer.analysis_sessions(priority DESC);

-- Usage tracking indexes
-- idx_manuscript_usage_organization_id removed: no organization_id in the user-centric model
CREATE INDEX idx_manuscript_usage_user_id ON analyzer.manuscript_usage(user_id);
CREATE INDEX idx_manuscript_usage_created_at ON analyzer.manuscript_usage(created_at);

-- Reference data indexes
CREATE INDEX idx_reference_collections_name ON analyzer.reference_collections(collection_name);
CREATE INDEX idx_reference_collections_active ON analyzer.reference_collections(is_active);
CREATE INDEX idx_reference_data_items_collection ON analyzer.reference_data_items(collection_id);
CREATE INDEX idx_reference_data_items_code ON analyzer.reference_data_items(collection_id, code);
CREATE INDEX idx_reference_data_items_active ON analyzer.reference_data_items(is_active);
CREATE INDEX idx_reference_data_items_sort ON analyzer.reference_data_items(collection_id, sort_order);

Row Level Security (RLS) Policies

-- Enable RLS on all tables
ALTER TABLE analyzer.manuscripts ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_analyses ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_chunks ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.analysis_sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_versions ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.genre_templates ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_usage ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.ai_usage ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.reference_collections ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.reference_data_items ENABLE ROW LEVEL SECURITY;

-- Manuscript access policies
CREATE POLICY "Users can access their own manuscripts"
  ON analyzer.manuscripts FOR ALL
  USING (user_id = auth.uid());

-- Analysis access policies
CREATE POLICY "Users can access analyses for their manuscripts"
  ON analyzer.manuscript_analyses FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM analyzer.manuscripts m
      WHERE m.id = manuscript_analyses.manuscript_id
        AND m.user_id = auth.uid()
    )
  );

-- Note: chunks, sessions, and versions need equivalent policies following
-- the same ownership pattern through manuscript_id; with RLS enabled and
-- no policy, those tables are inaccessible to authenticated users.

-- AI Usage access policies
CREATE POLICY "Users can view their AI usage"
  ON analyzer.ai_usage FOR SELECT
  USING (user_id = auth.uid());

CREATE POLICY "System can insert AI usage"
  ON analyzer.ai_usage FOR INSERT
  WITH CHECK (true);

-- Usage tracking policies
CREATE POLICY "Users can view their usage"
  ON analyzer.manuscript_usage FOR SELECT
  USING (user_id = auth.uid());

-- Reference data policies (public read access)
CREATE POLICY "Everyone can read active reference collections"
  ON analyzer.reference_collections FOR SELECT
  USING (is_active = true);

CREATE POLICY "Everyone can read active reference data items"
  ON analyzer.reference_data_items FOR SELECT
  USING (is_active = true);

-- Genre templates are read-only reference data; without this policy,
-- RLS would block all reads on the table.
CREATE POLICY "Everyone can read active genre templates"
  ON analyzer.genre_templates FOR SELECT
  USING (is_active = true);

CREATE POLICY "Admins can manage reference data"
  ON analyzer.reference_collections FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM auth.users
      WHERE id = auth.uid()
        AND raw_user_meta_data->>'role' = 'admin'
    )
  );

CREATE POLICY "Admins can manage reference data items"
  ON analyzer.reference_data_items FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM auth.users
      WHERE id = auth.uid()
        AND raw_user_meta_data->>'role' = 'admin'
    )
  );

-- Admin access to manuscripts (same admin check as above, for consistency)
CREATE POLICY "Admins have full access"
  ON analyzer.manuscripts FOR ALL
  USING (
    EXISTS (
      SELECT 1 FROM auth.users
      WHERE id = auth.uid()
        AND raw_user_meta_data->>'role' = 'admin'
    )
  );

API Endpoints (Future)

This schema will support future API endpoints for:

  • /api/manuscripts - CRUD operations
  • /api/analysis - Analysis management
  • /api/usage - Billing and limits

Testing Requirements

Unit Tests

  • Test all table constraints and validations
  • Verify RLS policies block unauthorized access
  • Verify that the indexes improve query performance
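RLS checks can be sketched by simulating a signed-in user inside a transaction. This assumes Supabase's JWT-claim settings (auth.uid() reads the sub claim); in practice pgTAP or the Supabase test helpers make this cleaner:

```sql
-- Hedged sketch of an RLS unit test with a placeholder user id.
BEGIN;
SET LOCAL ROLE authenticated;
SET LOCAL request.jwt.claims = '{"sub": "00000000-0000-0000-0000-000000000001"}';
-- Should return only manuscripts owned by that user id.
SELECT count(*) FROM analyzer.manuscripts;
ROLLBACK;
```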

Integration Tests

  • Test manuscript upload and analysis workflow
  • Test usage tracking and billing calculations
  • Verify reference data loading

E2E Tests

  • Complete author journey from upload to analysis
  • Admin manuscript management and analytics
  • Per-user data isolation verification

Acceptance Criteria

Must Have

  • All tables created with proper constraints
  • RLS policies implemented for data security
  • Indexes created for performance
  • Vector extension enabled for embeddings
  • Migration scripts tested in development

Should Have

  • Audit triggers for all manuscript operations
  • Soft delete functionality for manuscripts
  • Data export functionality for GDPR
  • Performance benchmarks documented

Could Have

  • Automated backup policies
  • Data archiving for old manuscripts
  • Advanced analytics views
  • Database monitoring alerts

Dependencies

  • F001-PROJECT-SETUP must be completed first
  • Requires Supabase vector extension enabled
  • Requires existing NextSaaS auth and user tables (the analyzer uses the user-centric model)

Estimated Effort

  • Development: 3 days
  • Testing: 2 days
  • Documentation: 1 day
  • Total: 6 days

Implementation Notes

Migration Strategy

  1. Create new tables in development environment
  2. Test RLS policies with sample data
  3. Create migration scripts for production
  4. Implement rollback procedures
  5. Monitor performance after deployment

Performance Considerations

  • Vector embeddings require significant storage
  • Index large text content carefully
  • Consider partitioning for high-volume usage tracking

Next Feature

After completion, proceed to F003-FILE-STORAGE to configure manuscript file storage and processing.