F002 - Analyzer Database Schema
Objective
Create a separate analyzer schema within the existing MyStoryFlow Supabase database to support manuscript analysis, genre templates, and usage tracking. This approach provides logical separation while maintaining unified authentication and user management across all MyStoryFlow apps.
Requirements
Functional Requirements
- Store manuscripts with metadata (genre, word count, analysis status)
- Support multiple file formats (PDF, DOCX, TXT, Google Docs)
- Track analysis sessions and progress
- Store detailed analysis results and scores
- Support manuscript versioning and revision tracking
- Implement usage tracking for billing integration
- Maintain separation between core NextSaaS and analyzer features
Technical Requirements
- Maintain Row Level Security (RLS) for data isolation
- Support vector embeddings for content similarity
- Provide efficient indexing for large manuscript content
- Integrate with the existing auth.users table and public schema
- Track AI usage for admin-app monitoring
- Support GDPR-compliant data export and deletion
Database Schema Architecture
Schema Creation
-- Create analyzer schema within MyStoryFlow database
CREATE SCHEMA IF NOT EXISTS analyzer;
-- Enable pgvector for the embedding columns defined below (see Dependencies)
CREATE EXTENSION IF NOT EXISTS vector;
-- Grant permissions to Supabase roles
GRANT USAGE ON SCHEMA analyzer TO anon, authenticated, service_role;
GRANT ALL ON ALL TABLES IN SCHEMA analyzer TO authenticated;
GRANT SELECT ON ALL TABLES IN SCHEMA analyzer TO anon;
-- Default privileges for tables created later in the analyzer schema
-- (RLS is enabled per table in the policies section below)
ALTER DEFAULT PRIVILEGES IN SCHEMA analyzer
GRANT ALL ON TABLES TO authenticated;
-- Create cross-schema helper functions
CREATE OR REPLACE FUNCTION analyzer.get_user_id()
RETURNS UUID AS $$
SELECT auth.uid()
$$ LANGUAGE SQL SECURITY DEFINER;
CREATE OR REPLACE FUNCTION analyzer.check_user_subscription(user_id UUID)
RETURNS BOOLEAN AS $$
SELECT EXISTS (
SELECT 1 FROM public.subscriptions
WHERE user_id = $1
AND status = 'active'
AND expires_at > NOW()
)
$$ LANGUAGE SQL SECURITY DEFINER;
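As an illustration, application code or later policies could call these helpers to gate analyzer features on subscription status. The query below is a minimal sketch; it assumes public.subscriptions exposes the status and expires_at columns exactly as referenced in the function above.
-- Illustrative check: does the current caller hold an active subscription?
SELECT analyzer.check_user_subscription(analyzer.get_user_id()) AS has_active_subscription;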
Database Changes
1. Core Manuscripts Table
-- Core Manuscripts Table
CREATE TABLE analyzer.manuscripts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
title VARCHAR(500) NOT NULL,
file_url TEXT NOT NULL,
file_name VARCHAR(255) NOT NULL,
file_size BIGINT NOT NULL,
file_type VARCHAR(50) NOT NULL, -- pdf, docx, txt, google_docs
word_count INTEGER,
page_count INTEGER,
genre VARCHAR(100),
genre_confidence DECIMAL(3,2),
language VARCHAR(10) DEFAULT 'en',
upload_status VARCHAR(50) DEFAULT 'processing', -- processing, completed, failed
content_extracted BOOLEAN DEFAULT FALSE,
analysis_status VARCHAR(50) DEFAULT 'pending', -- pending, processing, completed, failed
analysis_started_at TIMESTAMPTZ,
analysis_completed_at TIMESTAMPTZ,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
deleted_at TIMESTAMPTZ
);
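The manuscripts table carries an updated_at column; one common way to keep it current is a row trigger like the sketch below. This is illustrative rather than part of the migration, and the function and trigger names are placeholders (the "Should Have" audit triggers would build on the same pattern).
-- Hypothetical helper: refresh updated_at on every row update
CREATE OR REPLACE FUNCTION analyzer.set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_manuscripts_set_updated_at
BEFORE UPDATE ON analyzer.manuscripts
FOR EACH ROW EXECUTE FUNCTION analyzer.set_updated_at();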
2. Analysis Results Table
-- Analysis Results Table
CREATE TABLE analyzer.manuscript_analyses (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
analysis_version INTEGER DEFAULT 1,
genre_detected VARCHAR(100),
genre_confidence DECIMAL(3,2),
overall_score INTEGER CHECK (overall_score >= 0 AND overall_score <= 100),
structure_score INTEGER CHECK (structure_score >= 0 AND structure_score <= 100),
character_score INTEGER CHECK (character_score >= 0 AND character_score <= 100),
plot_score INTEGER CHECK (plot_score >= 0 AND plot_score <= 100),
writing_quality_score INTEGER CHECK (writing_quality_score >= 0 AND writing_quality_score <= 100),
pacing_score INTEGER CHECK (pacing_score >= 0 AND pacing_score <= 100),
dialogue_score INTEGER CHECK (dialogue_score >= 0 AND dialogue_score <= 100),
market_readiness_score INTEGER CHECK (market_readiness_score >= 0 AND market_readiness_score <= 100),
detailed_feedback JSONB DEFAULT '{}',
improvement_suggestions JSONB DEFAULT '[]',
strengths JSONB DEFAULT '[]',
weaknesses JSONB DEFAULT '[]',
market_insights JSONB DEFAULT '{}',
publishing_recommendations JSONB DEFAULT '{}',
report_url TEXT,
processing_time_seconds INTEGER,
ai_model_version VARCHAR(50),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
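Downstream reads typically want the newest analysis for a manuscript. A sketch of that query, with a placeholder manuscript id:
-- Latest analysis and headline scores for one manuscript (placeholder id)
SELECT analysis_version, overall_score, structure_score, character_score,
       plot_score, writing_quality_score, created_at
FROM analyzer.manuscript_analyses
WHERE manuscript_id = '00000000-0000-0000-0000-000000000000'
ORDER BY analysis_version DESC, created_at DESC
LIMIT 1;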
3. Manuscript Chunks for AI Processing
-- Manuscript Chunks for AI Processing
CREATE TABLE analyzer.manuscript_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
chunk_number INTEGER NOT NULL,
content TEXT NOT NULL,
content_type VARCHAR(50), -- chapter, section, paragraph
word_count INTEGER,
character_count INTEGER,
vector_embedding VECTOR(1536), -- OpenAI embedding dimensions
processed BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ DEFAULT NOW()
);
4. Analysis Sessions for Progress Tracking
-- Analysis Sessions for Progress Tracking
CREATE TABLE analyzer.analysis_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
session_type VARCHAR(50) NOT NULL, -- full_analysis, quick_review, revision_check
status VARCHAR(50) DEFAULT 'queued', -- queued, processing, completed, failed, cancelled
progress_percentage INTEGER DEFAULT 0 CHECK (progress_percentage >= 0 AND progress_percentage <= 100),
current_step VARCHAR(100),
steps_completed JSONB DEFAULT '[]',
estimated_completion_time TIMESTAMPTZ,
error_message TEXT,
retry_count INTEGER DEFAULT 0,
priority INTEGER DEFAULT 5, -- 1-10, higher = higher priority
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
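A background worker could claim the next session roughly as follows. This is a sketch only; the exact worker design is out of scope for this feature.
-- Claim the highest-priority queued session; SKIP LOCKED lets several
-- workers poll concurrently without picking the same row
UPDATE analyzer.analysis_sessions
SET status = 'processing', started_at = NOW(), updated_at = NOW()
WHERE id = (
  SELECT id
  FROM analyzer.analysis_sessions
  WHERE status = 'queued'
  ORDER BY priority DESC, created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING id, manuscript_id, session_type;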
5. Manuscript Versions for Revision Tracking
-- Manuscript Versions for Revision Tracking
CREATE TABLE analyzer.manuscript_versions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
manuscript_id UUID NOT NULL REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
version_number INTEGER NOT NULL,
version_name VARCHAR(255),
file_url TEXT NOT NULL,
word_count INTEGER,
changes_summary TEXT,
comparison_data JSONB DEFAULT '{}',
analysis_id UUID REFERENCES analyzer.manuscript_analyses(id),
created_by UUID NOT NULL REFERENCES auth.users(id),
created_at TIMESTAMPTZ DEFAULT NOW()
);
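Version numbers are assigned per manuscript. One way to compute the next number at insert time is sketched below; the UUID and file URL are placeholders, and a production implementation might serialize on the manuscript row to avoid races between concurrent uploads.
-- Insert the next revision for a manuscript (placeholder id and URL)
INSERT INTO analyzer.manuscript_versions (manuscript_id, version_number, file_url, created_by)
SELECT m.id,
       COALESCE(MAX(v.version_number), 0) + 1,
       'https://storage.example.com/manuscripts/draft-2.docx',
       m.user_id
FROM analyzer.manuscripts m
LEFT JOIN analyzer.manuscript_versions v ON v.manuscript_id = m.id
WHERE m.id = '00000000-0000-0000-0000-000000000000'
GROUP BY m.id, m.user_id;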
6. Genre Templates System
-- Genre Templates and Criteria
CREATE TABLE analyzer.genre_templates (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
genre_name VARCHAR(100) NOT NULL UNIQUE,
genre_slug VARCHAR(100) NOT NULL UNIQUE,
parent_genre VARCHAR(100),
description TEXT,
analysis_criteria JSONB NOT NULL,
scoring_weights JSONB NOT NULL,
market_insights JSONB DEFAULT '{}',
example_manuscripts JSONB DEFAULT '[]',
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
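An example template row might look like the following. The JSONB shapes for analysis_criteria and scoring_weights are illustrative assumptions, not a fixed contract; the genre and criteria keys are placeholders.
-- Illustrative genre template; criteria and weight keys are placeholders
INSERT INTO analyzer.genre_templates (genre_name, genre_slug, description, analysis_criteria, scoring_weights)
VALUES (
  'Psychological Thriller',
  'psychological-thriller',
  'Suspense-driven fiction centred on character psychology',
  '{"structure": ["escalating_tension", "three_act"], "character": ["complex_protagonist"]}',
  '{"structure": 0.25, "character": 0.25, "plot": 0.20, "pacing": 0.15, "writing_quality": 0.15}'
);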
7. Usage Tracking for Billing
-- Usage Tracking for Billing
CREATE TABLE analyzer.manuscript_usage (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
manuscript_id UUID REFERENCES analyzer.manuscripts(id) ON DELETE CASCADE,
usage_type VARCHAR(50) NOT NULL, -- analysis, export, api_call
word_count INTEGER,
credits_used DECIMAL(10,2),
plan_tier VARCHAR(50),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- AI Usage Tracking (for admin-app monitoring)
CREATE TABLE analyzer.ai_usage (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
manuscript_id UUID REFERENCES analyzer.manuscripts(id),
feature_name VARCHAR(100) NOT NULL, -- genre-detection, full-analysis, etc
model_provider VARCHAR(50) NOT NULL, -- openai, anthropic, gemini, xai
model_name VARCHAR(100) NOT NULL,
tokens_used INTEGER NOT NULL,
cost_usd DECIMAL(10,4),
response_time_ms INTEGER,
success BOOLEAN DEFAULT TRUE,
error_message TEXT,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
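For billing and admin-app monitoring, both usage tables are typically rolled up per user and month; the queries below sketch that aggregation.
-- Credits consumed per user per month (billing)
SELECT user_id, date_trunc('month', created_at) AS month, SUM(credits_used) AS credits
FROM analyzer.manuscript_usage
GROUP BY user_id, date_trunc('month', created_at);
-- AI token volume and cost per user per month (admin-app monitoring)
SELECT user_id, date_trunc('month', created_at) AS month,
       SUM(tokens_used) AS tokens, SUM(cost_usd) AS cost_usd
FROM analyzer.ai_usage
GROUP BY user_id, date_trunc('month', created_at);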
8. Reference Data System Tables
-- Reference Data Collections
CREATE TABLE analyzer.reference_collections (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
collection_name VARCHAR(100) NOT NULL UNIQUE,
version VARCHAR(20) NOT NULL DEFAULT '1.0.0',
description TEXT,
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Reference Data Items
CREATE TABLE analyzer.reference_data_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
collection_id UUID NOT NULL REFERENCES analyzer.reference_collections(id) ON DELETE CASCADE,
code VARCHAR(100) NOT NULL,
label VARCHAR(255) NOT NULL,
description TEXT,
category VARCHAR(100),
sort_order INTEGER DEFAULT 0,
is_active BOOLEAN DEFAULT TRUE,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(collection_id, code)
);
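Seeding a collection and its items could look like the sketch below; the collection name, codes, and labels are hypothetical placeholders.
-- Hypothetical seed: a genre picker collection with three items
WITH c AS (
  INSERT INTO analyzer.reference_collections (collection_name, description)
  VALUES ('manuscript_genres', 'Genres selectable at upload time')
  RETURNING id
)
INSERT INTO analyzer.reference_data_items (collection_id, code, label, sort_order)
SELECT c.id, v.code, v.label, v.sort_order
FROM c,
     (VALUES ('fantasy', 'Fantasy', 1),
             ('romance', 'Romance', 2),
             ('thriller', 'Thriller', 3)) AS v(code, label, sort_order);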
Indexes for Performance
-- Manuscript indexes
CREATE INDEX idx_manuscripts_user_id ON analyzer.manuscripts(user_id);
CREATE INDEX idx_manuscripts_genre ON analyzer.manuscripts(genre);
CREATE INDEX idx_manuscripts_analysis_status ON analyzer.manuscripts(analysis_status);
CREATE INDEX idx_manuscripts_created_at ON analyzer.manuscripts(created_at);
-- Analysis indexes
CREATE INDEX idx_manuscript_analyses_manuscript_id ON analyzer.manuscript_analyses(manuscript_id);
CREATE INDEX idx_manuscript_analyses_overall_score ON analyzer.manuscript_analyses(overall_score);
CREATE INDEX idx_manuscript_analyses_created_at ON analyzer.manuscript_analyses(created_at);
-- Chunk indexes for vector search
CREATE INDEX idx_manuscript_chunks_manuscript_id ON analyzer.manuscript_chunks(manuscript_id);
CREATE INDEX idx_manuscript_chunks_vector_embedding ON analyzer.manuscript_chunks
USING ivfflat (vector_embedding vector_cosine_ops);
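With the ivfflat index in place (ivfflat indexes are best built after the table has data, so recall may suffer if it is created on an empty table), a cosine similarity lookup might look like the sketch below. :query_embedding is a placeholder for a 1536-dimension vector bound by the application.
-- Nearest chunks to an application-supplied query embedding (cosine distance)
SELECT id, manuscript_id, chunk_number, content_type
FROM analyzer.manuscript_chunks
ORDER BY vector_embedding <=> :query_embedding
LIMIT 5;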
-- Session tracking indexes
CREATE INDEX idx_analysis_sessions_manuscript_id ON analyzer.analysis_sessions(manuscript_id);
CREATE INDEX idx_analysis_sessions_status ON analyzer.analysis_sessions(status);
CREATE INDEX idx_analysis_sessions_priority ON analyzer.analysis_sessions(priority DESC);
-- Usage tracking indexes
-- CREATE INDEX idx_manuscript_usage_organization_id ON analyzer.manuscript_usage(organization_id); -- Removed: No organization_id in user-centric model
CREATE INDEX idx_manuscript_usage_user_id ON analyzer.manuscript_usage(user_id);
CREATE INDEX idx_manuscript_usage_created_at ON analyzer.manuscript_usage(created_at);
-- Reference data indexes
CREATE INDEX idx_reference_collections_name ON analyzer.reference_collections(collection_name);
CREATE INDEX idx_reference_collections_active ON analyzer.reference_collections(is_active);
CREATE INDEX idx_reference_data_items_collection ON analyzer.reference_data_items(collection_id);
CREATE INDEX idx_reference_data_items_code ON analyzer.reference_data_items(collection_id, code);
CREATE INDEX idx_reference_data_items_active ON analyzer.reference_data_items(is_active);
CREATE INDEX idx_reference_data_items_sort ON analyzer.reference_data_items(collection_id, sort_order);
Row Level Security (RLS) Policies
-- Enable RLS on all tables
ALTER TABLE analyzer.manuscripts ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_analyses ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_chunks ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.analysis_sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_versions ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.genre_templates ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.manuscript_usage ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.ai_usage ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.reference_collections ENABLE ROW LEVEL SECURITY;
ALTER TABLE analyzer.reference_data_items ENABLE ROW LEVEL SECURITY;
-- Manuscript access policies
CREATE POLICY "Users can access their own manuscripts" ON analyzer.manuscripts
FOR ALL USING (user_id = auth.uid());
-- Analysis access policies
CREATE POLICY "Users can access analyses for their manuscripts" ON analyzer.manuscript_analyses
FOR ALL USING (
EXISTS (
SELECT 1 FROM analyzer.manuscripts m
WHERE m.id = manuscript_analyses.manuscript_id
AND m.user_id = auth.uid()
)
);
-- AI Usage access policies
CREATE POLICY "Users can view their AI usage" ON analyzer.ai_usage
FOR SELECT USING (user_id = auth.uid());
CREATE POLICY "System can insert AI usage" ON analyzer.ai_usage
FOR INSERT WITH CHECK (true);
-- Usage tracking policies
CREATE POLICY "Users can view their usage" ON analyzer.manuscript_usage
FOR SELECT USING (user_id = auth.uid());
-- Reference data policies (public read access)
CREATE POLICY "Everyone can read active reference collections" ON analyzer.reference_collections
FOR SELECT USING (is_active = true);
CREATE POLICY "Everyone can read active reference data items" ON analyzer.reference_data_items
FOR SELECT USING (is_active = true);
CREATE POLICY "Admins can manage reference data" ON analyzer.reference_collections
FOR ALL USING (
EXISTS (
SELECT 1 FROM auth.users
WHERE id = auth.uid()
AND raw_user_meta_data->>'role' = 'admin'
)
);
CREATE POLICY "Admins can manage reference data items" ON analyzer.reference_data_items
FOR ALL USING (
EXISTS (
SELECT 1 FROM auth.users
WHERE id = auth.uid()
AND raw_user_meta_data->>'role' = 'admin'
)
);
-- Admin access for all tables
CREATE POLICY "Admins have full access" ON analyzer.manuscripts FOR ALL USING (
EXISTS (
SELECT 1 FROM users
WHERE id = auth.uid()
AND metadata->>'role' = 'admin'
)
);API Endpoints (Future)
This schema will support future API endpoints for:
- /api/manuscripts - CRUD operations
- /api/analysis - Analysis management
- /api/usage - Billing and limits
Testing Requirements
Unit Tests
- Test all table constraints and validations
- Verify RLS policies block unauthorized access (see the sketch below)
- Confirm that indexes improve query performance
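One way to exercise the RLS policies from a SQL test session is to assume the authenticated role and set the JWT claims that Supabase's auth.uid() reads. This sketch assumes the standard Supabase claim layout, and the UUID is a placeholder.
BEGIN;
SET LOCAL ROLE authenticated;
-- auth.uid() resolves the "sub" claim from request.jwt.claims
SELECT set_config('request.jwt.claims', '{"sub": "11111111-1111-1111-1111-111111111111"}', true);
-- Should return only this user's manuscripts; rows owned by other users must not appear
SELECT id, title FROM analyzer.manuscripts;
ROLLBACK;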
Integration Tests
- Test manuscript upload and analysis workflow
- Test usage tracking and billing calculations
- Verify reference data loading
E2E Tests
- Complete author journey from upload to analysis
- Admin manuscript management and analytics
- Cross-user data isolation verification
Acceptance Criteria
Must Have
- All tables created with proper constraints
- RLS policies implemented for data security
- Indexes created for performance
- Vector extension enabled for embeddings
- Migration scripts tested in development
Should Have
- Audit triggers for all manuscript operations
- Soft delete functionality for manuscripts
- Data export functionality for GDPR
- Performance benchmarks documented
Could Have
- Automated backup policies
- Data archiving for old manuscripts
- Advanced analytics views
- Database monitoring alerts
Dependencies
- F001-PROJECT-SETUP must be completed first
- Requires Supabase vector extension enabled
- Requires existing NextSaaS auth and organization tables
Estimated Effort
- Development: 3 days
- Testing: 2 days
- Documentation: 1 day
- Total: 6 days
Implementation Notes
Migration Strategy
- Create new tables in development environment
- Test RLS policies with sample data
- Create migration scripts for production
- Implement rollback procedures
- Monitor performance after deployment
Performance Considerations
- Vector embeddings require significant storage
- Index large text content carefully
- Consider partitioning for high-volume usage tracking
Next Feature
After completion, proceed to F003-FILE-STORAGE to configure manuscript file storage and processing.