Conversation Testing Dashboard — Test Guide

Location: /dashboard/admin/conversation-testing
Access: Admin users only (requires user_profiles.role = 'admin')
Created: January 28, 2026

Prerequisites

Dev server running — npm run dev from the web-app workspace
Logged in as admin — Your account must have role = 'admin' in user_profiles
OpenAI API key — The persona simulator uses GPT-4o for generating persona responses and conversation analysis. Ensure OPENAI_API_KEY is set in .env.local

How to Access

Navigate to: http://localhost:3000/dashboard/admin/conversation-testing

Or from any admin page, manually enter the URL (no nav link exists yet).

Feature Overview

The dashboard lets you:

Configure tests — Pick personas, book types, turn count, duration
Start a test run — Creates a run with N persona x book-type combinations
View results — See pass/fail, quality/empathy scores per test
View transcripts — Read full Elena <-> persona conversations
Track history — Compare runs over time
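
A run expands the selected personas and book types into one test per pair. The sketch below illustrates that expansion only; `buildTestMatrix` and `TestCombination` are hypothetical names, and the real orchestration code may build runs differently.

```typescript
// Hypothetical helper: shows the persona x book-type expansion, nothing more.
interface TestCombination {
  personaId: string;
  bookType: string;
}

// Produce one test per (persona, book type) pair.
function buildTestMatrix(personaIds: string[], bookTypes: string[]): TestCombination[] {
  const combinations: TestCombination[] = [];
  for (const personaId of personaIds) {
    for (const bookType of bookTypes) {
      combinations.push({ personaId, bookType });
    }
  }
  return combinations;
}

// The Scenario 2 selection: 3 personas x 1 book type.
const scenarioTwo = buildTestMatrix(
  ["grandma_margaret", "veteran_ray", "young_mom_priya"],
  ["family-history"],
);
console.log(scenarioTwo.length); // 3
```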

Test Scenarios

Scenario 1: Smoke Test (Single Persona, Single Book Type)

Goal: Verify the full pipeline works end-to-end.

Setting	Value
Personas	Uncheck “All”, select Margaret O’Sullivan only
Book Types	Uncheck “All”, select Memoir only
Max Turns	4
Duration	3 min

Expected: 1 test runs. After completion, you should see:

Status changes from pending -> running -> passed/failed
Quality and empathy scores appear (0-100)
Clicking “View” shows the transcript with persona and Elena messages

What to verify:

Test run appears in the Run History sidebar
Summary cards show 1 total test
Results table shows Margaret with Memoir book type
Transcript viewer opens with conversation bubbles
Scores bar chart renders correctly

Scenario 2: Persona Variety (3 Personas, 1 Book Type)

Goal: Compare how Elena adapts to different personality types.

Setting	Value
Personas	Select: Margaret O’Sullivan, Ray Washington, Priya Sharma
Book Types	Select Family History only
Max Turns	6
Duration	5 min

Expected: 3 tests. Compare:

Margaret (82, warm Irish grandmother) — should get high empathy, slower pacing
Ray (78, reserved veteran) — Elena should be patient, not push too hard
Priya (34, young mom) — more energetic tone, different topics

What to verify:

All 3 results appear in the table
Quality/empathy scores differ per persona
Filter buttons work (All, Passed, Failed)
Sorting by quality score orders correctly
Each transcript reflects the persona’s personality

Scenario 3: Book Type Variety (1 Persona, 3 Book Types)

Goal: Verify Elena adapts conversation style to different book types.

Setting	Value
Personas	Select Carlos Rivera only
Book Types	Select: Cookbook, Military Memoir, Poetry Collection
Max Turns	6
Duration	5 min

Expected: 3 tests. Compare:

Cookbook — Elena should ask about recipes, ingredients, kitchen memories, sensory details
Military Memoir — Elena should handle sensitive topics, ask about service timeline, honor experiences
Poetry Collection — Elena should explore emotional expression, creative process, life moments that inspired poems

What to verify:

Book type label shows correctly in results table
Transcript viewer shows book type badge
Elena’s questions differ meaningfully across book types
Information extracted section shows book-type-relevant data

Scenario 4: Emotional Safety (Sensitive Personas)

Goal: Verify Elena handles emotionally sensitive conversations appropriately.

Setting	Value
Personas	Select: Linda Chen (cancer survivor), Frank Kowalski (widower)
Book Types	Select Legacy Book only
Max Turns	8
Duration	5 min

Expected: 2 tests. Watch for:

Linda — conversations may touch health struggles, mortality; Elena should acknowledge without pushing
Frank — conversations may reference late wife; Elena should be compassionate, not redirect too quickly

What to verify:

Empathy scores are high (80+) if Elena handles emotions well
Elena Performance section notes emotional handling
Transcript shows Elena acknowledging difficult moments, offering choice to continue or pivot
No dismissive or rushed responses in transcript

Scenario 5: Full Matrix (All Personas, All Book Types)

Goal: Comprehensive quality assessment across all 130 combinations.

Setting	Value
Personas	Check “All 10 personas”
Book Types	Check “All 13 book types”
Max Turns	8
Duration	5 min

Expected: 130 tests. This will take significant time and API calls.

Warning: This consumes substantial OpenAI API credits (GPT-4o for persona simulation + analysis per test). Estimate ~260 API calls minimum (130 tests x 2, counting one persona-simulation call and one analysis call per test).
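
The lower bound on API calls is plain arithmetic. A minimal sketch, assuming each test needs at least two OpenAI calls (one for persona simulation, one for analysis); `estimateMinApiCalls` is a hypothetical helper, and real runs may issue more calls per test.

```typescript
// Assumption: `callsPerTest` is a floor (one simulation call + one analysis call).
function estimateMinApiCalls(
  personaCount: number,
  bookTypeCount: number,
  callsPerTest = 2,
): number {
  return personaCount * bookTypeCount * callsPerTest;
}

console.log(estimateMinApiCalls(10, 13)); // 260
```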

What to verify:

Progress polling updates every 5 seconds
Run History shows running status with remaining count
Results table populates incrementally
Filter by status shows running/pending/passed/failed counts
Pass rate card updates in real-time
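
The 5-second progress polling can be pictured roughly as follows. This is a sketch, not the dashboard's actual code: `pollRun` and `PolledRun` are hypothetical, and the fetch step is injected as `fetchRun` so that no response shape is assumed beyond the documented run columns.

```typescript
interface PolledRun {
  status: "running" | "completed" | "failed";
  total_tests: number;
  passed_tests: number;
  failed_tests: number;
}

// Calls `fetchRun` repeatedly until the run leaves the "running" state,
// reporting every snapshot through `onUpdate`.
async function pollRun(
  fetchRun: () => Promise<PolledRun>,
  onUpdate: (run: PolledRun) => void,
  intervalMs = 5000,
): Promise<PolledRun> {
  for (;;) {
    const run = await fetchRun();
    onUpdate(run);
    if (run.status !== "running") {
      return run;
    }
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the dashboard, `fetchRun` would presumably wrap a GET to `/api/admin/conversation-tests/[runId]`, and `onUpdate` would refresh the summary cards and results table.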

Scenario 6: Error Handling

Goal: Verify the UI handles failures gracefully.

Test	How to Trigger	Expected
No API key	Remove `OPENAI_API_KEY` from env	Error status on results, error message visible in transcript viewer
Rate limiting	Run multiple tests quickly	Some tests show `error` status with rate limit message
Network failure	Disconnect during run	Run stays in `running` state, can refresh later
No admin access	Log in as non-admin user	403 error, page should show error state
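
The access rule behind the last row can be sketched as a small pure function. `checkAdminAccess` is a hypothetical name; the 403 status and the error messages come from this guide, while the 401 status for a missing session is an assumption.

```typescript
interface UserProfile {
  role: string;
}

interface AccessDecision {
  status: number;
  error?: string;
}

// `profile` is null when there is no valid session.
function checkAdminAccess(profile: UserProfile | null): AccessDecision {
  if (profile === null) {
    // Assumption: unauthenticated requests get 401.
    return { status: 401, error: "Not authenticated" };
  }
  if (profile.role !== "admin") {
    return { status: 403, error: "Admin access required" };
  }
  return { status: 200 };
}

console.log(checkAdminAccess({ role: "user" }).status); // 403
```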

UI Walkthrough

1. Test Configuration Panel (Left Side)


+-------------------------------+
| Test Configuration            |
|                               |
| Personas:                     |
| [x] All 10 personas           |
|     [Select] to pick specific |
|                               |
| Book Types:                   |
| [x] All 13 book types         |
|     [Select] to pick specific |
|                               |
| Max Turns: [8 v]              |
| Duration:  [5 min v]          |
|                               |
| 130 tests will run            |
| 10 personas x 13 book types   |
|                               |
| [ Run 130 Tests ]             |
+-------------------------------+

2. Run History (Left Side, Below Config)

Shows previous test runs with:

Status icon (spinning = running, check = completed, X = failed)
Test count and date
Pass/fail counts
Click to select and view results

3. Summary Cards (Right Side, Top)

4 cards showing: Total Tests | Passed | Failed | Pass Rate %
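
One way the four card values could be derived from per-test statuses is sketched below. `summarizeResults` is hypothetical. Counting errored tests as failed follows the `failed_tests` column description; dividing by all tests and rounding to a whole percent are assumptions.

```typescript
type ResultStatus = "pending" | "running" | "passed" | "failed" | "error";

interface RunSummary {
  total: number;
  passed: number;
  failed: number;
  passRate: number; // percentage, 0-100
}

function summarizeResults(statuses: ResultStatus[]): RunSummary {
  const total = statuses.length;
  const passed = statuses.filter((s) => s === "passed").length;
  // Errored tests are counted as failed (see the failed_tests column).
  const failed = statuses.filter((s) => s === "failed" || s === "error").length;
  // Guard against division by zero for an empty run.
  const passRate = total === 0 ? 0 : Math.round((passed / total) * 100);
  return { total, passed, failed, passRate };
}

console.log(summarizeResults(["passed", "passed", "failed", "error"]));
// { total: 4, passed: 2, failed: 2, passRate: 50 }
```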

4. Results Table (Right Side, Main Area)

Filter tabs: All, Passed, Failed, Error, Pending
Sortable columns: Status, Persona, Book Type, Quality, Empathy, Turns
Actions: Click “View” to open transcript
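
The filter tabs and the Quality column sort amount to a filter followed by a sort. A minimal sketch, assuming rows without a score yet carry `null` and should sort last; `filterAndSortByQuality` is a hypothetical helper, not the component's actual code.

```typescript
interface ResultRow {
  persona_name: string;
  status: "pending" | "running" | "passed" | "failed" | "error";
  overall_quality: number | null; // assumed null until analysis has run
}

type StatusFilter = "all" | ResultRow["status"];

function filterAndSortByQuality(
  rows: ResultRow[],
  filter: StatusFilter,
  descending = true,
): ResultRow[] {
  const visible = filter === "all" ? rows : rows.filter((r) => r.status === filter);
  // Copy before sorting so the caller's array is not mutated.
  return [...visible].sort((a, b) => {
    if (a.overall_quality === null && b.overall_quality === null) return 0;
    if (a.overall_quality === null) return 1; // unscored rows go last
    if (b.overall_quality === null) return -1;
    return descending
      ? b.overall_quality - a.overall_quality
      : a.overall_quality - b.overall_quality;
  });
}
```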

5. Transcript Viewer (Modal)

Opens as an overlay showing:

Header: Persona name, book type badge, turn count, duration, status
Scores: Bar charts for quality, empathy, narrative depth, question quality, emotional safety, pacing
Elena Performance: Strengths and areas for improvement
Information Extracted: Characters, places, events, themes found
Conversation: Chat bubble view with persona (blue) and Elena (amber)
Raw Analysis: Collapsible JSON of full analysis data

Database Tables

`conversation_test_runs`

Column	Type	Description
`id`	UUID	Primary key
`status`	TEXT	`running`, `completed`, `failed`
`total_tests`	INT	Total test combinations
`passed_tests`	INT	Tests that passed
`failed_tests`	INT	Tests that failed or errored
`config`	JSONB	`{ personaIds, bookTypes, maxTurns, conversationDuration }`
`triggered_by`	UUID	Admin user who started the run
`started_at`	TIMESTAMPTZ	When run began
`completed_at`	TIMESTAMPTZ	When run finished
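
For reference, the run row maps onto a TypeScript shape roughly like the one below. This is a sketch built from the table above, not the project's generated types: timestamps as ISO strings, a nullable `completed_at`, and the `remainingTests` helper are all assumptions.

```typescript
type RunStatus = "running" | "completed" | "failed";

interface ConversationTestRun {
  id: string; // UUID
  status: RunStatus;
  total_tests: number;
  passed_tests: number;
  failed_tests: number; // failed or errored
  config: {
    personaIds: string[];
    bookTypes: string[];
    maxTurns: number;
    conversationDuration: number;
  };
  triggered_by: string; // UUID of the admin who started the run
  started_at: string; // assumed ISO 8601 string
  completed_at: string | null; // assumed null while the run is in progress
}

// Hypothetical helper: tests that have neither passed nor failed yet.
function remainingTests(run: ConversationTestRun): number {
  return run.total_tests - run.passed_tests - run.failed_tests;
}
```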

`conversation_test_results`

Column	Type	Description
`id`	UUID	Primary key
`test_run_id`	UUID	FK to test run
`persona_id`	TEXT	e.g. `grandma_margaret`
`persona_name`	TEXT	e.g. `Margaret O'Sullivan`
`book_type`	TEXT	e.g. `cookbook`
`status`	TEXT	`pending`, `running`, `passed`, `failed`, `error`
`overall_quality`	FLOAT	0-100 score
`empathy_score`	FLOAT	0-100 score
`turn_count`	INT	Number of conversation turns
`duration_seconds`	INT	Time taken
`conversation_transcript`	JSONB	Array of `{ role, content }` turns
`analysis_scores`	JSONB	Detailed scoring breakdown
`elena_performance`	JSONB	Strengths, improvements, pacing notes
`information_extracted`	JSONB	Characters, places, events, themes
`error_message`	TEXT	Error details if status is `error`
`conversation_metadata`	JSONB	`{ compositeId, displayLabel, bookTypeLabel }`
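
A partial TypeScript view of a result row, plus the status lifecycle described in Scenario 1, might look like this. The lifecycle map is an assumption (in particular, that a test can go straight from `pending` to `error`), and `canTransition` is a hypothetical helper.

```typescript
type TestStatus = "pending" | "running" | "passed" | "failed" | "error";

interface TranscriptTurn {
  role: string; // exact role values are not documented in this guide
  content: string;
}

interface ConversationTestResult {
  id: string;
  test_run_id: string;
  persona_id: string;
  book_type: string;
  status: TestStatus;
  overall_quality: number | null; // assumed null until analysis completes
  empathy_score: number | null;
  conversation_transcript: TranscriptTurn[];
  error_message: string | null;
}

// Assumed lifecycle: pending -> running -> passed | failed | error.
const nextStatuses: Record<TestStatus, TestStatus[]> = {
  pending: ["running", "error"],
  running: ["passed", "failed", "error"],
  passed: [],
  failed: [],
  error: [],
};

function canTransition(from: TestStatus, to: TestStatus): boolean {
  return nextStatuses[from].indexOf(to) !== -1;
}
```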

API Endpoints

Method	Endpoint	Description
`GET`	`/api/admin/conversation-tests`	List all test runs
`POST`	`/api/admin/conversation-tests`	Start a new test run
`GET`	`/api/admin/conversation-tests/[runId]`	Get run details
`PATCH`	`/api/admin/conversation-tests/[runId]`	Update run status
`GET`	`/api/admin/conversation-tests/[runId]/results`	Get results for a run
`PATCH`	`/api/admin/conversation-tests/[runId]/results`	Update a specific result

Example: Start a test run via API


curl -X POST http://localhost:3000/api/admin/conversation-tests \
  -H "Content-Type: application/json" \
  -H "Cookie: <your-session-cookie>" \
  -d '{
    "personaIds": ["grandma_margaret"],
    "bookTypes": ["cookbook"],
    "maxTurns": 4,
    "conversationDuration": 3
  }'
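
The same request can be assembled in TypeScript. In this sketch `buildStartRunRequest` is a hypothetical helper that only builds the request, so it runs without a server; the response shape is not documented here and is not assumed.

```typescript
interface StartRunConfig {
  personaIds: string[];
  bookTypes: string[];
  maxTurns: number;
  conversationDuration: number;
}

interface StartRunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildStartRunRequest(
  baseUrl: string,
  sessionCookie: string,
  config: StartRunConfig,
): StartRunRequest {
  return {
    url: `${baseUrl}/api/admin/conversation-tests`,
    method: "POST",
    headers: { "Content-Type": "application/json", Cookie: sessionCookie },
    body: JSON.stringify(config),
  };
}

const startRunRequest = buildStartRunRequest("http://localhost:3000", "<your-session-cookie>", {
  personaIds: ["grandma_margaret"],
  bookTypes: ["cookbook"],
  maxTurns: 4,
  conversationDuration: 3,
});
// To send it from a Node script: await fetch(startRunRequest.url, startRunRequest);
console.log(startRunRequest.url);
```

Setting the `Cookie` header by hand suits a Node script. In a browser, cookies are attached by the browser itself and scripts cannot set that header.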

Example: Fetch results


curl http://localhost:3000/api/admin/conversation-tests/<run-id>/results \
  -H "Cookie: <your-session-cookie>"
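
Once results are fetched, comparing personas is a matter of grouping by name. The sketch below assumes the endpoint yields rows carrying the `persona_name` and `overall_quality` columns and that unscored rows hold `null`; `averageQualityByPersona` is a hypothetical helper.

```typescript
interface ScoredResult {
  persona_name: string;
  overall_quality: number | null;
}

// Average the quality score per persona, skipping tests with no score yet.
function averageQualityByPersona(results: ScoredResult[]): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const result of results) {
    if (result.overall_quality === null) continue;
    const entry = sums.get(result.persona_name) ?? { total: 0, count: 0 };
    entry.total += result.overall_quality;
    entry.count += 1;
    sums.set(result.persona_name, entry);
  }
  const averages = new Map<string, number>();
  sums.forEach((entry, persona) => {
    averages.set(persona, entry.total / entry.count);
  });
  return averages;
}
```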

Available Personas

ID	Name	Age	Description
`grandma_margaret`	Margaret O’Sullivan	82	Irish immigrant grandmother, warm storyteller
`veteran_ray`	Ray Washington	78	Korean War veteran, reserved
`young_mom_priya`	Priya Sharma	34	Indian-American mom recording for baby
`struggling_artist_carlos`	Carlos Rivera	45	Mexican-American artist, creative
`college_student_malik`	Malik Johnson	20	College student interviewing grandparents
`cancer_survivor_linda`	Linda Chen	62	Cancer survivor documenting journey
`widower_frank`	Frank Kowalski	75	Widower recording late wife’s memories
`entrepreneur_sarah`	Sarah Mitchell	55	Entrepreneur writing business memoir
`rural_farmer_tom`	Tom Erikson	70	Rural farmer, quiet, heritage focus
`nurse_patricia`	Patricia Williams	58	Nurse with healthcare stories

Available Book Types

Key	Label	Focus
`memoir`	Memoir	Personal life experiences, turning points
`autobiography`	Autobiography	Chronological life story
`family-history`	Family History	Genealogy, family traditions, heritage
`cookbook`	Family Cookbook	Recipes with stories, kitchen memories
`recipe-collection`	Recipe Collection	Focused recipe documentation
`travel-journal`	Travel Journal	Travel experiences, cultural encounters
`childrens-book`	Children’s Book	Stories for grandchildren, lessons
`poetry-collection`	Poetry Collection	Emotional expression, creative writing
`business-biography`	Business Biography	Career journey, business lessons
`military-memoir`	Military Memoir	Service experiences, comrades
`spiritual-journey`	Spiritual Journey	Faith, beliefs, spiritual growth
`photo-book`	Photo Book	Stories behind photos, visual memories
`legacy-book`	Legacy Book	Wisdom, life lessons, values to pass on

Scoring Guide

Score Range	Color	Meaning
80-100	Green	Excellent — Elena performed well
60-79	Amber	Acceptable — Room for improvement
0-59	Red	Poor — Needs attention
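
The color bands translate directly into a threshold check. A minimal sketch: `scoreBand` is a hypothetical helper, and because scores are stored as floats it assumes a fractional value between bands (such as 79.5) falls into the lower one.

```typescript
type ScoreBand = "green" | "amber" | "red";

function scoreBand(score: number): ScoreBand {
  if (score >= 80) return "green"; // Excellent
  if (score >= 60) return "amber"; // Acceptable
  return "red"; // Poor
}

console.log(scoreBand(92)); // "green"
console.log(scoreBand(60)); // "amber"
console.log(scoreBand(59)); // "red"
```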

Quality Score measures: narrative depth, question relevance, story extraction effectiveness, conversation flow

Empathy Score measures: emotional acknowledgment, pacing sensitivity, gentle handling of difficult topics, avoiding dismissiveness

Troubleshooting

Issue	Cause	Fix
“Not authenticated” error	Session expired	Log out and back in
“Admin access required”	Account isn’t admin	Check `user_profiles.role` in Supabase
Tests stuck in “pending”	Test execution hasn’t started	The POST only creates placeholder rows; execution is a separate step, so confirm it was actually started
No scores showing	Analysis hasn’t run yet	Check if OpenAI API key is valid
Empty transcript	Test errored before conversation started	Check `error_message` in transcript viewer
Run stays “running” forever	Server crashed mid-run	Manually update run status in Supabase to `completed`
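
Before fixing a stuck run by hand, it helps to identify which runs are actually stale. The sketch below flags runs still marked `running` after a cutoff; `findStaleRuns` and the age threshold are hypothetical, and `started_at` is assumed to be an ISO timestamp string.

```typescript
interface RunRecord {
  id: string;
  status: string;
  started_at: string; // assumed ISO 8601 string
}

// Return runs still marked "running" that started more than `maxAgeMinutes` ago.
function findStaleRuns(runs: RunRecord[], now: Date, maxAgeMinutes: number): RunRecord[] {
  const cutoff = now.getTime() - maxAgeMinutes * 60000;
  return runs.filter(
    (run) => run.status === "running" && new Date(run.started_at).getTime() < cutoff,
  );
}
```

Pick a threshold that comfortably exceeds the longest legitimate run; a full-matrix run takes far longer than a smoke test.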

File Locations


apps/web-app/
├── app/
│   ├── (dashboard)/dashboard/admin/
│   │   └── conversation-testing/page.tsx          # Main dashboard page
│   ├── api/admin/conversation-tests/
│   │   ├── route.ts                                # POST (create run), GET (list runs)
│   │   └── [runId]/
│   │       ├── route.ts                            # GET (run details), PATCH (update run)
│   │       └── results/route.ts                    # GET (results), PATCH (update result)
│   └── components/admin/conversation-testing/
│       ├── TestConfigPanel.tsx                     # Config UI
│       ├── TestResultsTable.tsx                    # Results grid
│       └── TranscriptViewer.tsx                    # Transcript modal
└── lib/testing/
    ├── realistic-personas.ts                       # 10 persona definitions
    ├── book-type-modifiers.ts                      # 13 book-type modifiers
    ├── persona-composer.ts                         # Persona + modifier composition
    ├── ai-persona-simulator.ts                     # GPT-4o persona simulation
    └── enhanced-conversation-tester.ts             # Test orchestration
