Database Schema — Code Insights

Reference for the Code Insights SQLite database schema — projects, sessions, messages, and insights tables at ~/.code-insights/data.db.

The local SQLite database is stored at ~/.code-insights/data.db. It uses WAL mode for concurrent reads during CLI sync. All timestamps are ISO 8601 strings. Arrays and nested objects are stored as JSON strings.

Tables

projects

Stores project metadata. The project ID is derived from the git remote URL (stable across devices) or a hash of the working directory path.

Column	Type	Default	Description
`id`	TEXT PRIMARY KEY	—	Hash of git remote URL or directory path
`name`	TEXT NOT NULL	—	Project display name
`path`	TEXT NOT NULL	—	Working directory path
`git_remote_url`	TEXT	NULL	Git remote URL (nullable)
`project_id_source`	TEXT NOT NULL	`'path-hash'`	How the ID was derived: `'git-remote'` or `'path-hash'`
`session_count`	INTEGER NOT NULL	`0`	Number of synced sessions
`last_activity`	TEXT NOT NULL	—	ISO 8601 timestamp of most recent session
`created_at`	TEXT NOT NULL	`datetime('now')`	ISO 8601 timestamp
`updated_at`	TEXT NOT NULL	`datetime('now')`	ISO 8601 timestamp
`total_input_tokens`	INTEGER	`0`	Cumulative input tokens across all sessions
`total_output_tokens`	INTEGER	`0`	Cumulative output tokens
`cache_creation_tokens`	INTEGER	`0`	Cumulative cache write tokens
`cache_read_tokens`	INTEGER	`0`	Cumulative cache read tokens
`estimated_cost_usd`	REAL	`0`	Cumulative estimated cost in USD

sessions

Stores session metadata with generated titles, character classification, token usage, and device info.

Column	Type	Default	Description
`id`	TEXT PRIMARY KEY	—	Session ID from source tool
`project_id`	TEXT NOT NULL	—	Foreign key to `projects.id`
`project_name`	TEXT NOT NULL	—	Denormalized project name
`project_path`	TEXT NOT NULL	—	Denormalized project path
`git_remote_url`	TEXT	NULL	Git remote URL at sync time
`summary`	TEXT	NULL	LLM-generated session summary
`custom_title`	TEXT	NULL	User-provided title
`generated_title`	TEXT	NULL	Auto-generated title
`title_source`	TEXT	NULL	Which tier generated the title
`session_character`	TEXT	NULL	Classification: `deep_focus`, `bug_hunt`, `feature_build`, `exploration`, `refactor`, `learning`, `quick_task`
`started_at`	TEXT NOT NULL	—	ISO 8601 session start time
`ended_at`	TEXT NOT NULL	—	ISO 8601 session end time
`message_count`	INTEGER NOT NULL	`0`	Total messages
`user_message_count`	INTEGER NOT NULL	`0`	User messages only
`assistant_message_count`	INTEGER NOT NULL	`0`	Assistant messages only
`tool_call_count`	INTEGER NOT NULL	`0`	Total tool calls
`git_branch`	TEXT	NULL	Git branch at session time
`claude_version`	TEXT	NULL	Claude version string (Claude Code only)
`source_tool`	TEXT NOT NULL	`'claude-code'`	`claude-code`, `cursor`, `codex-cli`, `copilot-cli`, `copilot`
`device_id`	TEXT	NULL	Stable anonymous device identifier
`device_hostname`	TEXT	NULL	Machine hostname
`device_platform`	TEXT	NULL	`darwin`, `linux`, `win32`
`synced_at`	TEXT NOT NULL	`datetime('now')`	ISO 8601 sync timestamp
`total_input_tokens`	INTEGER	NULL	Total input tokens for session
`total_output_tokens`	INTEGER	NULL	Total output tokens
`cache_creation_tokens`	INTEGER	NULL	Cache write tokens
`cache_read_tokens`	INTEGER	NULL	Cache read tokens
`estimated_cost_usd`	REAL	NULL	Estimated cost in USD
`models_used`	TEXT	NULL	JSON array of all model IDs used
`primary_model`	TEXT	NULL	Most-used model ID
`usage_source`	TEXT	NULL	How usage was calculated
`deleted_at`	TEXT	NULL	ISO 8601 soft-delete timestamp (V5); NULL = visible
`compact_count`	INTEGER NOT NULL	`0`	Number of manual context compactions in this session (V6)
`auto_compact_count`	INTEGER NOT NULL	`0`	Number of auto-triggered context compactions (V6)
`slash_commands`	TEXT NOT NULL	`'[]'`	JSON array of slash commands used in this session (V6)

Indexes: project_id, started_at DESC, source_tool, deleted_at

messages

Stores full message content for all sessions.

Column	Type	Default	Description
`id`	TEXT PRIMARY KEY	—	Message ID
`session_id`	TEXT NOT NULL	—	Foreign key to `sessions.id`
`type`	TEXT NOT NULL	—	`user` or `assistant`
`content`	TEXT NOT NULL	`''`	Full message content
`thinking`	TEXT	NULL	Extended thinking content (JSON)
`tool_calls`	TEXT	NULL	JSON array of tool calls
`tool_results`	TEXT	NULL	JSON array of tool results
`usage`	TEXT	NULL	JSON token usage for this message
`timestamp`	TEXT NOT NULL	—	ISO 8601 message timestamp
`parent_id`	TEXT	NULL	Parent message ID

Indexes: session_id, (session_id, timestamp ASC)

insights

Stores LLM-generated insights. Written by the embedded dashboard server, not the CLI.

Column	Type	Default	Description
`id`	TEXT PRIMARY KEY	—	UUID
`session_id`	TEXT NOT NULL	—	Foreign key to `sessions.id`
`project_id`	TEXT NOT NULL	—	Foreign key to `projects.id`
`project_name`	TEXT NOT NULL	—	Denormalized project name
`type`	TEXT NOT NULL	—	`summary`, `decision`, `learning`, `technique`, `prompt_quality`
`title`	TEXT NOT NULL	—	Insight title
`content`	TEXT NOT NULL	—	Full insight text
`summary`	TEXT NOT NULL	—	One-line summary
`bullets`	TEXT	NULL	JSON array of bullet points
`confidence`	REAL NOT NULL	—	Confidence score 0.0–1.0
`source`	TEXT NOT NULL	`'llm'`	Always `llm`
`metadata`	TEXT	NULL	JSON metadata blob
`timestamp`	TEXT NOT NULL	—	ISO 8601 insight timestamp
`created_at`	TEXT NOT NULL	`datetime('now')`	ISO 8601 creation timestamp
`scope`	TEXT NOT NULL	`'session'`	Always `session` currently
`analysis_version`	TEXT NOT NULL	`'1.0.0'`	Analysis pipeline version
`linked_insight_ids`	TEXT	NULL	JSON array of related insight IDs

Indexes: session_id, project_id, type, timestamp DESC

usage_stats

A singleton row (id = 1) tracking cumulative totals across all sessions. Updated by the CLI on each sync.

Column	Type	Default	Description
`id`	INTEGER PRIMARY KEY	—	Always 1 (singleton)
`total_input_tokens`	INTEGER	`0`	Cumulative input tokens
`total_output_tokens`	INTEGER	`0`	Cumulative output tokens
`cache_creation_tokens`	INTEGER	`0`	Cumulative cache write tokens
`cache_read_tokens`	INTEGER	`0`	Cumulative cache read tokens
`estimated_cost_usd`	REAL	`0`	Cumulative cost in USD
`sessions_with_usage`	INTEGER	`0`	Sessions that have token data
`last_updated_at`	TEXT	`datetime('now')`	ISO 8601 last update time

session_facets

Stores per-session friction and effective-pattern analysis extracted by the Reflect pipeline. Added in V3.

Column	Type	Default	Description
`session_id`	TEXT PRIMARY KEY	—	Foreign key to `sessions.id`
`outcome_satisfaction`	TEXT NOT NULL	—	`positive`, `negative`, or `mixed`
`workflow_pattern`	TEXT	NULL	Dominant workflow pattern for the session
`had_course_correction`	INTEGER NOT NULL	`0`	1 if the session involved a significant course correction
`course_correction_reason`	TEXT	NULL	Brief description of the correction
`iteration_count`	INTEGER NOT NULL	`0`	Number of distinct iteration cycles
`friction_points`	TEXT	NULL	JSON array of friction point objects
`effective_patterns`	TEXT	NULL	JSON array of effective pattern objects
`extracted_at`	TEXT NOT NULL	`datetime('now')`	ISO 8601 extraction timestamp
`analysis_version`	TEXT NOT NULL	`'1.0.0'`	Analysis pipeline version

Indexes: outcome_satisfaction, workflow_pattern

reflect_snapshots

Caches LLM-generated weekly synthesis results. Added in V4.

Column	Type	Default	Description
`period`	TEXT NOT NULL	—	ISO week string (e.g., `2026-W10`) or custom period key
`project_id`	TEXT NOT NULL	`'__all__'`	Project scope; `__all__` = all projects
`results_json`	TEXT NOT NULL	—	JSON blob of the full synthesis result
`generated_at`	TEXT NOT NULL	—	ISO 8601 generation timestamp
`window_start`	TEXT	NULL	ISO 8601 start of the analysis window
`window_end`	TEXT NOT NULL	—	ISO 8601 end of the analysis window
`session_count`	INTEGER NOT NULL	—	Number of sessions included in this snapshot
`facet_count`	INTEGER NOT NULL	—	Number of facets included in this snapshot

Primary key: (period, project_id)

schema_version

Tracks applied migration versions. Checked on every startup to determine which migrations to run.

Column	Type	Default	Description
`version`	INTEGER PRIMARY KEY	—	Migration version number (1–7)
`applied_at`	TEXT NOT NULL	`datetime('now')`	ISO 8601 timestamp when the migration was applied

analysis_usage

Tracks LLM API usage and cost per analysis call, keyed by (session_id, analysis_type). Added in V7. Written by the dashboard server during session analysis.

Column	Type	Default	Description
`session_id`	TEXT NOT NULL	—	Foreign key to `sessions.id`
`analysis_type`	TEXT NOT NULL	—	`session`, `prompt_quality`, or `facet`
`provider`	TEXT NOT NULL	—	LLM provider name (e.g., `anthropic`, `openai`)
`model`	TEXT NOT NULL	—	Model ID used for this analysis
`input_tokens`	INTEGER NOT NULL	`0`	Input tokens billed
`output_tokens`	INTEGER NOT NULL	`0`	Output tokens billed
`cache_creation_tokens`	INTEGER NOT NULL	`0`	Cache write tokens (Anthropic prompt caching)
`cache_read_tokens`	INTEGER NOT NULL	`0`	Cache read tokens (Anthropic prompt caching)
`estimated_cost_usd`	REAL NOT NULL	`0`	Estimated cost in USD
`duration_ms`	INTEGER	NULL	Analysis wall-clock duration in milliseconds
`chunk_count`	INTEGER NOT NULL	`1`	Number of message chunks analyzed
`analyzed_at`	TEXT NOT NULL	`datetime('now')`	ISO 8601 analysis timestamp

Primary key: (session_id, analysis_type) — upserted on re-analysis

Index: analyzed_at DESC

Accessing the Database

You can query the database directly with any SQLite client:

sqlite3 ~/.code-insights/data.db
 
.tables
SELECT generated_title, session_character, estimated_cost_usd FROM sessions ORDER BY started_at DESC LIMIT 10;
SELECT SUM(estimated_cost_usd) FROM sessions;
SELECT source_tool, COUNT(*) FROM sessions GROUP BY source_tool;