Conversation Testing Dashboard — Test Guide
Location: `/dashboard/admin/conversation-testing`
Access: Admin users only (requires `user_profiles.role = 'admin'`)
Created: January 28, 2026
Prerequisites
- Dev server running: run `npm run dev` from the web-app workspace
- Logged in as admin: your account must have `role = 'admin'` in `user_profiles`
- OpenAI API key: the persona simulator uses GPT-4o to generate persona responses and to analyze conversations. Ensure `OPENAI_API_KEY` is set in `.env.local`
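A quick preflight can catch a missing key before a run starts. The sketch below uses a hypothetical helper (`missingPrereqs` is not part of the codebase) that only inspects an already-loaded environment object. Next.js loads `.env.local` automatically for `npm run dev`, but a standalone script would need its own loader.

```typescript
// Hypothetical preflight helper, not part of the app. It checks only the
// environment variables this guide names, on an env object that is already loaded.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = ["OPENAI_API_KEY"] as const;

export function missingPrereqs(env: Env): string[] {
  // A variable counts as missing if it is absent or contains only whitespace.
  return REQUIRED_VARS.filter((key) => !env[key] || env[key]!.trim() === "");
}

// Usage (assumes Node, where process.env holds the loaded variables):
// const missing = missingPrereqs(process.env);
// if (missing.length > 0) console.error(`Missing: ${missing.join(", ")}`);
```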
How to Access
Navigate to: http://localhost:3000/dashboard/admin/conversation-testing
There is no nav link yet, so type the URL into the address bar (from any admin page, for example).
Feature Overview
The dashboard lets you:
- Configure tests — Pick personas, book types, turn count, duration
- Start a test run — Creates a run with N persona x book-type combinations
- View results — See pass/fail, quality/empathy scores per test
- View transcripts — Read full Elena <-> persona conversations
- Track history — Compare runs over time
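A run's size is the cartesian product of the selected personas and book types. The sketch below illustrates that expansion. The config field names mirror the POST body and the `config` JSONB column documented later in this guide. The `TestCase` shape and the `expandCombinations` helper are hypothetical.

```typescript
// Sketch: expand a run config into one test per (persona, book type) pair.
// Field names mirror the run config JSON; TestCase is a hypothetical shape.
interface RunConfig {
  personaIds: string[];
  bookTypes: string[];
  maxTurns: number;
  conversationDuration: number; // minutes
}

interface TestCase {
  personaId: string;
  bookType: string;
}

export function expandCombinations(config: RunConfig): TestCase[] {
  // Cartesian product: every persona is paired with every book type.
  return config.personaIds.flatMap((personaId) =>
    config.bookTypes.map((bookType) => ({ personaId, bookType })),
  );
}

// The smoke-test configuration yields exactly one test case.
const smoke = expandCombinations({
  personaIds: ["grandma_margaret"],
  bookTypes: ["memoir"],
  maxTurns: 4,
  conversationDuration: 3,
});
```

With all 10 personas and all 13 book types, this expansion produces the 130 tests that the full-matrix scenario runs.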
Test Scenarios
Scenario 1: Smoke Test (Single Persona, Single Book Type)
Goal: Verify the full pipeline works end-to-end.
| Setting | Value |
|---|---|
| Personas | Uncheck “All”, select Margaret O’Sullivan only |
| Book Types | Uncheck “All”, select Memoir only |
| Max Turns | 4 |
| Duration | 3 min |
Expected: 1 test. After completion, you should see:
- Status changes from `pending` -> `running` -> `passed`/`failed`
- Quality and empathy scores appear (0-100)
- Clicking “View” shows the transcript with persona and Elena messages
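The status progression can be read as a small state machine. In the sketch below, the status values come from the `conversation_test_results.status` column. The allowed transitions are an assumption inferred from this guide, not taken from the code. For example, `pending -> error` covers a test that fails before its conversation starts.

```typescript
// Sketch of the per-test status lifecycle. Status values come from the
// conversation_test_results.status column; the transition table is an
// assumption inferred from this guide, not the app's actual logic.
type ResultStatus = "pending" | "running" | "passed" | "failed" | "error";

const NEXT: Record<ResultStatus, ResultStatus[]> = {
  pending: ["running", "error"],
  running: ["passed", "failed", "error"],
  passed: [],
  failed: [],
  error: [],
};

export function canTransition(from: ResultStatus, to: ResultStatus): boolean {
  return NEXT[from].includes(to);
}

// Terminal statuses have no outgoing transitions.
export function isTerminal(s: ResultStatus): boolean {
  return NEXT[s].length === 0;
}
```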
What to verify:
- Test run appears in the Run History sidebar
- Summary cards show 1 total test
- Results table shows Margaret with Memoir book type
- Transcript viewer opens with conversation bubbles
- Scores bar chart renders correctly
Scenario 2: Persona Variety (3 Personas, 1 Book Type)
Goal: Compare how Elena adapts to different personality types.
| Setting | Value |
|---|---|
| Personas | Select: Margaret O’Sullivan, Ray Washington, Priya Sharma |
| Book Types | Select Family History only |
| Max Turns | 6 |
| Duration | 5 min |
Expected: 3 tests. Compare:
- Margaret (82, warm Irish grandmother) — should get high empathy, slower pacing
- Ray (78, reserved veteran) — Elena should be patient, not push too hard
- Priya (34, young mom) — more energetic tone, different topics
What to verify:
- All 3 results appear in the table
- Quality/empathy scores differ per persona
- Filter buttons work (All, Passed, Failed)
- Sorting by quality score orders correctly
- Each transcript reflects the persona’s personality
Scenario 3: Book Type Variety (1 Persona, 3 Book Types)
Goal: Verify Elena adapts conversation style to different book types.
| Setting | Value |
|---|---|
| Personas | Select Carlos Rivera only |
| Book Types | Select: Cookbook, Military Memoir, Poetry Collection |
| Max Turns | 6 |
| Duration | 5 min |
Expected: 3 tests. Compare:
- Cookbook — Elena should ask about recipes, ingredients, kitchen memories, sensory details
- Military Memoir — Elena should handle sensitive topics, ask about service timeline, honor experiences
- Poetry Collection — Elena should explore emotional expression, creative process, life moments that inspired poems
What to verify:
- Book type label shows correctly in results table
- Transcript viewer shows book type badge
- Elena’s questions differ meaningfully across book types
- Information extracted section shows book-type-relevant data
Scenario 4: Emotional Safety (Sensitive Personas)
Goal: Verify Elena handles emotionally sensitive conversations appropriately.
| Setting | Value |
|---|---|
| Personas | Select: Linda Chen (cancer survivor), Frank Kowalski (widower) |
| Book Types | Select Legacy Book only |
| Max Turns | 8 |
| Duration | 5 min |
Expected: 2 tests. Watch for:
- Linda — conversations may touch health struggles, mortality; Elena should acknowledge without pushing
- Frank — conversations may reference late wife; Elena should be compassionate, not redirect too quickly
What to verify:
- Empathy scores are high (80+) if Elena handles emotions well
- Elena Performance section notes emotional handling
- Transcript shows Elena acknowledging difficult moments, offering choice to continue or pivot
- No dismissive or rushed responses in transcript
Scenario 5: Full Matrix (All Personas, All Book Types)
Goal: Comprehensive quality assessment across all 130 combinations.
| Setting | Value |
|---|---|
| Personas | Check “All 10 personas” |
| Book Types | Check “All 13 book types” |
| Max Turns | 8 |
| Duration | 5 min |
Expected: 130 tests. This will take significant time and API calls.
Warning: This consumes substantial OpenAI API credits (GPT-4o handles both persona simulation and analysis for every test). Expect a minimum of roughly 260 API calls (130 tests x 2).
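The ~260 figure is simple arithmetic, sketched below. The floor assumes at least one persona-simulation call and one analysis call per test. The per-turn variant is a hypothetical alternative that assumes one simulation call per turn. Neither function models Elena's own model calls.

```typescript
// Back-of-the-envelope API call estimates for a run (hypothetical helpers).
// Floor: each test makes at least one simulation call and one analysis call.
export function minApiCalls(personaCount: number, bookTypeCount: number): number {
  const tests = personaCount * bookTypeCount;
  const callsPerTestFloor = 2; // 1 persona simulation + 1 analysis
  return tests * callsPerTestFloor;
}

// Assumption: one persona-simulation call per turn, plus one analysis call per test.
export function apiCallsIfPerTurn(
  personaCount: number,
  bookTypeCount: number,
  maxTurns: number,
): number {
  return personaCount * bookTypeCount * (maxTurns + 1);
}
```

For the full matrix, `minApiCalls(10, 13)` gives 260. Under the per-turn assumption with 8 turns, the count is 1170.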
What to verify:
- Progress polling updates every 5 seconds
- Run History shows running status with remaining count
- Results table populates incrementally
- Filter by status shows running/pending/passed/failed counts
- Pass rate card updates in real-time
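The 5-second progress polling can be sketched as a pure summary function plus a timer loop. Both helpers are hypothetical. `fetchResults` stands in for a call to `GET /api/admin/conversation-tests/[runId]/results` that returns each result's status. The loop stops once nothing is pending or running.

```typescript
// Sketch of progress polling. summarizeProgress is pure; startPolling repeats it
// every intervalMs (5 s by default). Both are hypothetical helpers.
type ResultStatus = "pending" | "running" | "passed" | "failed" | "error";

interface Progress {
  counts: Record<ResultStatus, number>;
  remaining: number; // tests still pending or running
}

export function summarizeProgress(statuses: ResultStatus[]): Progress {
  const counts: Record<ResultStatus, number> = {
    pending: 0, running: 0, passed: 0, failed: 0, error: 0,
  };
  for (const s of statuses) counts[s] += 1;
  return { counts, remaining: counts.pending + counts.running };
}

export function startPolling(
  fetchResults: () => Promise<ResultStatus[]>,
  onUpdate: (p: Progress) => void,
  intervalMs = 5000,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    try {
      const progress = summarizeProgress(await fetchResults());
      if (stopped) return;
      onUpdate(progress);
      if (progress.remaining === 0) return; // run finished: stop polling
    } catch {
      // Transient fetch failure: fall through and retry on the next tick.
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick();
  // Calling the returned function stops any further polling.
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```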
Scenario 6: Error Handling
Goal: Verify the UI handles failures gracefully.
| Test | How to Trigger | Expected |
|---|---|---|
| No API key | Remove OPENAI_API_KEY from env | Error status on results, error message visible in transcript viewer |
| Rate limiting | Run multiple tests quickly | Some tests show error status with rate limit message |
| Network failure | Disconnect during run | Run stays in running state, can refresh later |
| No admin access | Log in as non-admin user | 403 error, page should show error state |
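The admin-only rule reduces to a small guard. The sketch below is hypothetical: the 401 status for a missing session and the exact message strings are assumptions based on the errors listed under Troubleshooting. Only the 403 for non-admins comes from the table above.

```typescript
// Hypothetical guard mirroring the admin-only rule (user_profiles.role = 'admin').
// The 401 for a missing session and the message strings are assumptions.
interface Profile {
  role: string;
}

type GuardResult =
  | { ok: true }
  | { ok: false; status: 401 | 403; error: string };

export function requireAdmin(profile: Profile | null): GuardResult {
  if (profile === null) {
    return { ok: false, status: 401, error: "Not authenticated" };
  }
  if (profile.role !== "admin") {
    return { ok: false, status: 403, error: "Admin access required" };
  }
  return { ok: true };
}
```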
UI Walkthrough
1. Test Configuration Panel (Left Side)
```
+-------------------------------+
| Test Configuration            |
|                               |
| Personas:                     |
|   [x] All 10 personas         |
|   [Select] to pick specific   |
|                               |
| Book Types:                   |
|   [x] All 13 book types       |
|   [Select] to pick specific   |
|                               |
| Max Turns: [8 v]              |
| Duration:  [5 min v]          |
|                               |
| 130 tests will run            |
| 10 personas x 13 book types   |
|                               |
| [ Run 130 Tests ]             |
+-------------------------------+
```

2. Run History (Left Side, Below Config)
Shows previous test runs with:
- Status icon (spinning = running, check = completed, X = failed)
- Test count and date
- Pass/fail counts
- Click to select and view results
3. Summary Cards (Right Side, Top)
4 cards showing: Total Tests | Passed | Failed | Pass Rate %
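The card values can be derived from a run's result statuses. This sketch rests on two assumptions. First, "Failed" counts both `failed` and `error`, matching how `failed_tests` is described for `conversation_test_runs`. Second, pass rate divides by the total number of tests, rounded to a whole percent.

```typescript
// Sketch of the summary-card numbers computed from a run's results.
// Assumptions: "Failed" includes error results, and pass rate uses total tests.
type ResultStatus = "pending" | "running" | "passed" | "failed" | "error";

interface Summary {
  total: number;
  passed: number;
  failed: number;
  passRate: number; // percent, 0-100
}

export function summarize(statuses: ResultStatus[]): Summary {
  const total = statuses.length;
  const passed = statuses.filter((s) => s === "passed").length;
  const failed = statuses.filter((s) => s === "failed" || s === "error").length;
  // Guard against division by zero for an empty run.
  const passRate = total === 0 ? 0 : Math.round((passed / total) * 100);
  return { total, passed, failed, passRate };
}
```

For `["passed", "passed", "error"]` this returns a total of 3, with 2 passed, 1 failed, and a pass rate of 67. Dividing by total means the rate climbs as pending tests finish. Dividing by finished tests instead would be an equally reasonable choice.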
4. Results Table (Right Side, Main Area)
- Filter tabs: All, Passed, Failed, Error, Pending
- Sortable columns: Status, Persona, Book Type, Quality, Empathy, Turns
- Actions: Click “View” to open transcript
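The filter tabs and sortable columns amount to a filter plus a comparator. In the sketch below, row fields follow the `conversation_test_results` columns. The helpers, and the choice to always sort unscored (null) rows last, are hypothetical.

```typescript
// Sketch of the results-table filter tabs and column sort (hypothetical helpers).
type ResultStatus = "pending" | "running" | "passed" | "failed" | "error";
type FilterTab = "all" | "passed" | "failed" | "error" | "pending";

interface ResultRow {
  persona_name: string;
  book_type: string;
  status: ResultStatus;
  overall_quality: number | null; // null until analysis has run
  empathy_score: number | null;
  turn_count: number;
}

type SortKey =
  | "status" | "persona_name" | "book_type"
  | "overall_quality" | "empathy_score" | "turn_count";

export function filterRows(rows: ResultRow[], tab: FilterTab): ResultRow[] {
  return tab === "all" ? rows : rows.filter((r) => r.status === tab);
}

export function sortRows(rows: ResultRow[], key: SortKey, dir: "asc" | "desc"): ResultRow[] {
  const sign = dir === "asc" ? 1 : -1;
  // Copy first so the caller's array is not mutated.
  return [...rows].sort((a, b) => {
    const x = a[key];
    const y = b[key];
    // Rows without a value (e.g. unscored tests) always sort last.
    if (x === null && y === null) return 0;
    if (x === null) return 1;
    if (y === null) return -1;
    if (typeof x === "number" && typeof y === "number") return sign * (x - y);
    return sign * String(x).localeCompare(String(y));
  });
}
```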
5. Transcript Viewer (Modal)
Opens as an overlay showing:
- Header: Persona name, book type badge, turn count, duration, status
- Scores: Bar charts for quality, empathy, narrative depth, question quality, emotional safety, pacing
- Elena Performance: Strengths and areas for improvement
- Information Extracted: Characters, places, events, themes found
- Conversation: Chat bubble view with persona (blue) and Elena (amber)
- Raw Analysis: Collapsible JSON of full analysis data
Database Tables
conversation_test_runs
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| status | TEXT | running, completed, failed |
| total_tests | INT | Total test combinations |
| passed_tests | INT | Tests that passed |
| failed_tests | INT | Tests that failed or errored |
| config | JSONB | { personaIds, bookTypes, maxTurns, conversationDuration } |
| triggered_by | UUID | Admin user who started the run |
| started_at | TIMESTAMPTZ | When run began |
| completed_at | TIMESTAMPTZ | When run finished |
conversation_test_results
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| test_run_id | UUID | FK to test run |
| persona_id | TEXT | e.g. grandma_margaret |
| persona_name | TEXT | e.g. Margaret O'Sullivan |
| book_type | TEXT | e.g. cookbook |
| status | TEXT | pending, running, passed, failed, error |
| overall_quality | FLOAT | 0-100 score |
| empathy_score | FLOAT | 0-100 score |
| turn_count | INT | Number of conversation turns |
| duration_seconds | INT | Time taken |
| conversation_transcript | JSONB | Array of { role, content } turns |
| analysis_scores | JSONB | Detailed scoring breakdown |
| elena_performance | JSONB | Strengths, improvements, pacing notes |
| information_extracted | JSONB | Characters, places, events, themes |
| error_message | TEXT | Error details if status is error |
| conversation_metadata | JSONB | { compositeId, displayLabel, bookTypeLabel } |
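On the client, a result row might be typed as sketched below, following the column table above. Which columns are nullable is an assumption. Nested JSONB shapes are left loose where the table does not spell them out, and the exact `role` values in a transcript turn are not documented here. The runtime guard checks transcript JSON read from the database.

```typescript
// Sketch of types for a conversation_test_results row, following the column
// table. Nullability and the loose JSONB shapes are assumptions.
export type ResultStatus = "pending" | "running" | "passed" | "failed" | "error";

export interface TranscriptTurn {
  role: string; // exact role values (persona vs. Elena) are not documented here
  content: string;
}

export interface ConversationTestResult {
  id: string;
  test_run_id: string;
  persona_id: string;
  persona_name: string;
  book_type: string;
  status: ResultStatus;
  overall_quality: number | null;
  empathy_score: number | null;
  turn_count: number | null;
  duration_seconds: number | null;
  conversation_transcript: TranscriptTurn[] | null;
  analysis_scores: Record<string, unknown> | null;
  elena_performance: Record<string, unknown> | null;
  information_extracted: Record<string, unknown> | null;
  error_message: string | null;
  conversation_metadata: {
    compositeId?: string;
    displayLabel?: string;
    bookTypeLabel?: string;
  } | null;
}

// Runtime guard: true only for an array of { role: string, content: string }.
export function isTranscript(value: unknown): value is TranscriptTurn[] {
  return (
    Array.isArray(value) &&
    value.every(
      (t) =>
        typeof t === "object" &&
        t !== null &&
        typeof (t as { role?: unknown }).role === "string" &&
        typeof (t as { content?: unknown }).content === "string",
    )
  );
}
```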
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/admin/conversation-tests | List all test runs |
| POST | /api/admin/conversation-tests | Start a new test run |
| GET | /api/admin/conversation-tests/[runId] | Get run details |
| PATCH | /api/admin/conversation-tests/[runId] | Update run status |
| GET | /api/admin/conversation-tests/[runId]/results | Get results for a run |
| PATCH | /api/admin/conversation-tests/[runId]/results | Update a specific result |
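From TypeScript, the paths and the POST request for the endpoints in the table above can be built as sketched below. Only the paths, methods, and POST body fields come from this guide. The helper names are hypothetical, and the cookie header mirrors the curl examples.

```typescript
// Sketch: path and request builders for the conversation-tests endpoints.
// Helper names are hypothetical; paths and body fields come from this guide.
interface RunConfig {
  personaIds: string[];
  bookTypes: string[];
  maxTurns: number;
  conversationDuration: number; // minutes
}

interface StartRunRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

const BASE_PATH = "/api/admin/conversation-tests";

export const paths = {
  runs: () => BASE_PATH,
  run: (runId: string) => `${BASE_PATH}/${encodeURIComponent(runId)}`,
  results: (runId: string) => `${BASE_PATH}/${encodeURIComponent(runId)}/results`,
};

export function startRunRequest(baseUrl: string, config: RunConfig, cookie: string): StartRunRequest {
  // Drop trailing slashes so "http://localhost:3000/" and "http://localhost:3000" match.
  const origin = baseUrl.replace(/\/+$/, "");
  return {
    url: `${origin}${paths.runs()}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Cookie: cookie },
      body: JSON.stringify(config),
    },
  };
}

// Usage (Node 18+): const { url, init } = startRunRequest("http://localhost:3000", config, "<your-session-cookie>");
// const res = await fetch(url, init);
```

In a browser, the `Cookie` header is ignored. A same-origin `fetch` sends the session cookie automatically, so the header only matters for scripts run outside the browser.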
Example: Start a test run via API
```bash
curl -X POST http://localhost:3000/api/admin/conversation-tests \
  -H "Content-Type: application/json" \
  -H "Cookie: <your-session-cookie>" \
  -d '{
    "personaIds": ["grandma_margaret"],
    "bookTypes": ["cookbook"],
    "maxTurns": 4,
    "conversationDuration": 3
  }'
```

Example: Fetch results

```bash
curl http://localhost:3000/api/admin/conversation-tests/<run-id>/results \
  -H "Cookie: <your-session-cookie>"
```

Available Personas
| ID | Name | Age | Description |
|---|---|---|---|
| grandma_margaret | Margaret O’Sullivan | 82 | Irish immigrant grandmother, warm storyteller |
| veteran_ray | Ray Washington | 78 | Korean War veteran, reserved |
| young_mom_priya | Priya Sharma | 34 | Indian-American mom recording for baby |
| struggling_artist_carlos | Carlos Rivera | 45 | Mexican-American artist, creative |
| college_student_malik | Malik Johnson | 20 | College student interviewing grandparents |
| cancer_survivor_linda | Linda Chen | 62 | Cancer survivor documenting journey |
| widower_frank | Frank Kowalski | 75 | Widower recording late wife’s memories |
| entrepreneur_sarah | Sarah Mitchell | 55 | Entrepreneur writing business memoir |
| rural_farmer_tom | Tom Erikson | 70 | Rural farmer, quiet, heritage focus |
| nurse_patricia | Patricia Williams | 58 | Nurse with healthcare stories |
Available Book Types
| Key | Label | Focus |
|---|---|---|
| memoir | Memoir | Personal life experiences, turning points |
| autobiography | Autobiography | Chronological life story |
| family-history | Family History | Genealogy, family traditions, heritage |
| cookbook | Family Cookbook | Recipes with stories, kitchen memories |
| recipe-collection | Recipe Collection | Focused recipe documentation |
| travel-journal | Travel Journal | Travel experiences, cultural encounters |
| childrens-book | Children’s Book | Stories for grandchildren, lessons |
| poetry-collection | Poetry Collection | Emotional expression, creative writing |
| business-biography | Business Biography | Career journey, business lessons |
| military-memoir | Military Memoir | Service experiences, comrades |
| spiritual-journey | Spiritual Journey | Faith, beliefs, spiritual growth |
| photo-book | Photo Book | Stories behind photos, visual memories |
| legacy-book | Legacy Book | Wisdom, life lessons, values to pass on |
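When scripting runs through the API, a config can be checked against the persona IDs and book-type keys listed in the two tables above before posting. The ID sets below are copied from those tables. The other rules (non-empty selections, positive whole-number turns and minutes) are assumptions, not the API's own validation.

```typescript
// Sketch: validate a run config against the documented persona IDs and
// book-type keys. The non-empty and positive-integer rules are assumptions.
const PERSONA_IDS = new Set([
  "grandma_margaret", "veteran_ray", "young_mom_priya", "struggling_artist_carlos",
  "college_student_malik", "cancer_survivor_linda", "widower_frank",
  "entrepreneur_sarah", "rural_farmer_tom", "nurse_patricia",
]);

const BOOK_TYPES = new Set([
  "memoir", "autobiography", "family-history", "cookbook", "recipe-collection",
  "travel-journal", "childrens-book", "poetry-collection", "business-biography",
  "military-memoir", "spiritual-journey", "photo-book", "legacy-book",
]);

interface RunConfig {
  personaIds: string[];
  bookTypes: string[];
  maxTurns: number;
  conversationDuration: number; // minutes
}

// Returns a list of problems; an empty list means the config looks valid.
export function validateConfig(c: RunConfig): string[] {
  const errors: string[] = [];
  if (c.personaIds.length === 0) errors.push("select at least one persona");
  if (c.bookTypes.length === 0) errors.push("select at least one book type");
  for (const id of c.personaIds) {
    if (!PERSONA_IDS.has(id)) errors.push(`unknown persona: ${id}`);
  }
  for (const bt of c.bookTypes) {
    if (!BOOK_TYPES.has(bt)) errors.push(`unknown book type: ${bt}`);
  }
  if (!Number.isInteger(c.maxTurns) || c.maxTurns < 1) {
    errors.push("maxTurns must be a positive integer");
  }
  if (!Number.isInteger(c.conversationDuration) || c.conversationDuration < 1) {
    errors.push("conversationDuration must be a positive integer (minutes)");
  }
  return errors;
}
```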
Scoring Guide
| Score Range | Color | Meaning |
|---|---|---|
| 80-100 | Green | Excellent — Elena performed well |
| 60-79 | Amber | Acceptable — Room for improvement |
| 0-59 | Red | Poor — Needs attention |
Quality Score measures: narrative depth, question relevance, story extraction effectiveness, conversation flow
Empathy Score measures: emotional acknowledgment, pacing sensitivity, gentle handling of difficult topics, avoiding dismissiveness
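The color bands map directly to thresholds. The sketch below uses `>=` cutoffs at 80 and 60. How fractional scores at the boundaries (such as 79.5) are treated is an assumption, since the table lists only integer ranges.

```typescript
// Sketch: map a 0-100 score to the scoring guide's color bands.
// Using >= cutoffs for fractional scores is an assumption.
type Band = "green" | "amber" | "red";

export function scoreBand(score: number): Band {
  if (score >= 80) return "green"; // Excellent
  if (score >= 60) return "amber"; // Acceptable
  return "red"; // Poor
}
```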
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| “Not authenticated” error | Session expired | Log out and back in |
| “Admin access required” | Account isn’t admin | Check user_profiles.role in Supabase |
| Tests stuck in “pending” | Test execution hasn’t started | The POST only creates placeholder rows; execution is a separate step |
| No scores showing | Analysis hasn’t run yet | Check if OpenAI API key is valid |
| Empty transcript | Test errored before conversation started | Check error_message in transcript viewer |
| Run stays “running” forever | Server crashed mid-run | Manually update run status in Supabase to completed |
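Before fixing a stuck run by hand, it can help to list which runs look stuck. The sketch below is hypothetical: the 2-hour threshold, the helper names, and setting `completed_at` alongside `status` are all assumptions. Apply the resulting fields in Supabase as described in the table above, or through the PATCH run endpoint if it accepts them.

```typescript
// Sketch: find runs stuck in "running" and build the fields to close them.
// Threshold, helper names, and the completed_at update are assumptions.
interface TestRun {
  id: string;
  status: "running" | "completed" | "failed";
  started_at: string; // ISO timestamp (TIMESTAMPTZ)
}

export function findStaleRuns(
  runs: TestRun[],
  now: Date,
  maxAgeMs = 2 * 60 * 60 * 1000, // 2 hours
): TestRun[] {
  return runs.filter(
    (r) => r.status === "running" && now.getTime() - new Date(r.started_at).getTime() > maxAgeMs,
  );
}

export function closeRunPatch(now: Date): { status: "completed"; completed_at: string } {
  return { status: "completed", completed_at: now.toISOString() };
}
```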
File Locations
```
apps/web-app/
├── app/
│   ├── (dashboard)/dashboard/admin/
│   │   └── conversation-testing/page.tsx   # Main dashboard page
│   ├── api/admin/conversation-tests/
│   │   ├── route.ts                        # POST (create run), GET (list runs)
│   │   └── [runId]/
│   │       ├── route.ts                    # GET (run details), PATCH (update run)
│   │       └── results/route.ts            # GET (results), PATCH (update result)
│   └── components/admin/conversation-testing/
│       ├── TestConfigPanel.tsx             # Config UI
│       ├── TestResultsTable.tsx            # Results grid
│       └── TranscriptViewer.tsx            # Transcript modal
└── lib/testing/
    ├── realistic-personas.ts               # 10 persona definitions
    ├── book-type-modifiers.ts              # 13 book-type modifiers
    ├── persona-composer.ts                 # Persona + modifier composition
    ├── ai-persona-simulator.ts             # GPT-4o persona simulation
    └── enhanced-conversation-tester.ts     # Test orchestration
```