Documentation · v1.0.0
wawa-note
A free, open-source AI workspace that captures meeting evidence — audio, scans, links, notes — and turns it into a searchable project knowledge store with tasks, decisions, and typed relationships. Agentic AI chat navigates your knowledge like a filesystem.
Overview
wawa-note v1.0.0 is a native iOS app built with Swift 6.0 and SwiftUI. It requires iOS 17+ and runs on iPhone. The project is MIT licensed and fully open source.
The core thesis: meeting evidence becomes reusable project memory, and project memory becomes an explorable graph with tasks, decisions, owners, and connected artifacts — all with evidence provenance tracing back to source material.
There are no Wawa Note servers. The app never sees your data. You bring your own API keys. You choose your AI provider. Your knowledge is portable — export everything, import freely.
Architecture
The app follows a layered architecture with protocol-first boundaries. Every external dependency — AI providers, transcription engines, import/export formats, context sensors — sits behind a Swift protocol.
iOS App (SwiftUI)
  Capture │ Inbox │ Explore │ Chat │ Settings
─────────────────────────────────────────
Domain Layer
  Agent (Shell VFS + Tool Calling) │ Content Pipeline
  Project Models │ Graph │ Calendar │ Search
─────────────────────────────────────────
Provider Abstraction
  OpenAI │ Anthropic │ Gemini │ DeepSeek │ OpenAI-compatible │ Local LLM
─────────────────────────────────────────
Storage
  SwiftData (metadata) │ FileManager (artifacts)
  Keychain (API keys) │ Spotlight (indexing)
The 4-tab navigation is Capture (record, scan, import), Inbox (global search and triage), Explore (project-first workspace browser with Timeline), and Chat (agentic AI chat with tool calling).
No backend. No SaaS. No cloud sync. No accounts.
Project structure
wawa-note/
  App/  WawaNoteApp.swift
  Audio/  Capture, Playback, Session, FileWriter
  Connectivity/  Watch session, RecordingCoordinator
  ContextCapture/  Calendar, Location, Focus, Motion, Battery, AudioRoute
  Domain/
    Agent/  AgentLoop, ShellInterpreter, ShellTool, ToolContext
    Calendar/  CalendarEvent, CalendarSyncService, Timeline, DaySummary
    Models/  KnowledgeItem, Project, TaskItem, Person, GraphEdge, Entity
    Services/  ContentPipeline, Search, Project, Task, Person services
  Ecosystem/
    Export/  MarkdownExporter, JSONExporter, ProjectExport, TaskReminders
    Import/  ImportRouter + 10 format importers
  LocalIntelligence/  EmbeddingService, SemanticSearchService
  Providers/  AIProvider protocol + OpenAI, Anthropic, Gemini, DeepSeek
  Security/  Biometric gate (Face ID)
  Storage/  FileArtifactStore, SecureKeyStore
  Transcription/  AppleSpeechTranscriptionEngine, RemoteTranscriptionEngine
  UI/
    Capture/  Scanner, Recording UI
    Chat/  ChatView, ChatViewModel, AgentStatusBar
    Inbox/  Universal search + triage
    Explore/  Project explorer
    Project/  Detail, Timeline, Graph, TaskBoard, Decisions, Entities
    Calendar/  MonthGrid, DayActivity, OnThisDay
    Knowledge/  Detail view, Connections
    Settings/  Provider picker, config, templates
    Components/  ContentView, CreationSheet, PermissionPrompt
Agentic chat
The chat system uses a Shell Virtual Filesystem approach. Instead of dozens of individual AI tools, the agent has a single run_command tool that executes Unix-like commands: ls, cd, cat, find, grep, mv, rm, head, wc, touch, echo, extract, history, js-eval, help. This replaced 47 individual tool files with a single ShellInterpreter.
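To illustrate the single-tool idea, here is a minimal sketch of one entry point dispatching a few Unix-like commands over an in-memory virtual filesystem. The names `VirtualFS` and `MiniShell` are hypothetical and only `ls`, `cat`, and `grep` are covered; the real ShellInterpreter is not shown in this document and will differ.

```swift
import Foundation

/// Hypothetical in-memory stand-in for the knowledge store: path -> file contents.
struct VirtualFS {
    var files: [String: String]
}

/// Illustrative only: a tiny dispatcher in the spirit of a single `run_command` tool.
struct MiniShell {
    let fs: VirtualFS

    /// Splits the command line on spaces and dispatches on the first word.
    func runCommand(_ line: String) -> String {
        let parts = line.split(separator: " ").map(String.init)
        guard let name = parts.first else { return "" }
        let args = Array(parts.dropFirst())

        switch name {
        case "ls":
            // List every path under the given prefix (default: root).
            let prefix = args.first ?? "/"
            return fs.files.keys
                .filter { $0.hasPrefix(prefix) }
                .sorted()
                .joined(separator: "\n")
        case "cat":
            guard let path = args.first, let body = fs.files[path] else {
                return "cat: no such file"
            }
            return body
        case "grep":
            // grep <pattern> <path>: plain substring match, line by line.
            guard args.count == 2, let body = fs.files[args[1]] else {
                return "grep: usage: grep <pattern> <path>"
            }
            return body.split(separator: "\n")
                .filter { $0.contains(args[0]) }
                .joined(separator: "\n")
        default:
            return "\(name): command not found"
        }
    }
}
```

With this shape, the model only needs to learn one tool schema (a command string in, text out), and adding a command is a new `case` rather than a new tool definition.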
Key features of the agent system:
- Context-aware conversations scoped to projects, items, or global
- Auto/Deep/Fast modes controlling how much the agent iterates
- Voice input via on-device speech recognition or Whisper API
- Markdown rendering in messages via AttributedString
- Choice prompts: numbered options become tappable buttons
- Swipe actions on task/item cards for quick status changes
- Suggestion bar on scroll-up with context-aware chips
- AgentStatusBar: compact collapsible tool call display
Project graph
Every project has a typed graph connecting knowledge items, tasks, decisions, people, and entities. Graph edges carry provenance — each relationship is traceable to a transcript segment, note block, or external event.
Core models in the graph system:
- KnowledgeItem: polymorphic model (meeting, note, journalEntry, webBookmark, image)
- Project: container with flexible frameworks (LLM-defined schemas)
- TaskItem: status, priority, owner tracking
- Person: people mentioned across meetings and projects
- GraphEdge: typed relationship with source provenance
- Entity: systems, organizations, tools extracted from content
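As a rough sketch of what "typed edge with provenance" could look like, the types below make provenance a required field, so an edge cannot be built without pointing at its evidence. `Provenance`, `EdgeKind`, and `SketchEdge` are hypothetical names, and the edge kinds are invented for illustration; the actual GraphEdge model is not shown in this document.

```swift
import Foundation

/// Hypothetical provenance kinds, mirroring the three sources named above.
enum Provenance: Equatable {
    case transcriptSegment(itemID: UUID, startSeconds: Double, endSeconds: Double)
    case noteBlock(itemID: UUID, blockIndex: Int)
    case externalEvent(identifier: String)
}

/// Illustrative relationship types; the real set is not listed in this document.
enum EdgeKind: String {
    case decidedIn, assignedTo, mentions
}

/// A typed edge that cannot be constructed without provenance.
struct SketchEdge {
    let from: UUID
    let to: UUID
    let kind: EdgeKind
    let provenance: Provenance
}

/// Returns every edge that touches `node`, so callers can trace back to evidence.
func edges(touching node: UUID, in graph: [SketchEdge]) -> [SketchEdge] {
    graph.filter { $0.from == node || $0.to == node }
}
```

Making `provenance` non-optional is one way to enforce the rule at compile time rather than by convention.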
Content pipeline
Every item that enters wawa-note goes through a unified pipeline: Extract → Analyze → Detect signals → Ingest. The pipeline runs automatically for each item, with no manual steps.
The pipeline handles:
- Text extraction from audio (transcription), documents (OCR), and imported files
- Analysis via the configured AI provider: structured output with summaries, entities, and decisions
- Signal detection: cross-referencing against existing project knowledge
- Ingestion: persisting extracted entities, graph edges, and metadata into SwiftData
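The four stages above can be sketched as a small, synchronous pipeline. All type names here (`RawItem`, `Analysis`, `Signal`, `Extractor`, `Analyzer`, `SketchPipeline`) are hypothetical, ingestion is reduced to a set union, and the real ContentPipeline is presumably asynchronous and persists into SwiftData.

```swift
import Foundation

/// Hypothetical intermediate types; the app's real models are richer.
struct RawItem { let id: UUID; let payload: String }
struct Analysis { let summary: String; let entities: [String] }
struct Signal { let entity: String; let alreadyKnown: Bool }

/// One protocol per stage keeps each step swappable and testable on its own.
protocol Extractor { func extract(_ item: RawItem) throws -> String }
protocol Analyzer { func analyze(_ text: String) throws -> Analysis }

/// Runs Extract -> Analyze -> Detect signals -> Ingest for a single item.
struct SketchPipeline {
    let extractor: any Extractor
    let analyzer: any Analyzer
    var knownEntities: Set<String>

    mutating func process(_ item: RawItem) throws -> [Signal] {
        let text = try extractor.extract(item)
        let analysis = try analyzer.analyze(text)
        // Signal detection: cross-reference against what the project already knows.
        let signals = analysis.entities.map {
            Signal(entity: $0, alreadyKnown: knownEntities.contains($0))
        }
        // Ingest: here just a set union; the app persists into SwiftData instead.
        knownEntities.formUnion(analysis.entities)
        return signals
    }
}
```

Because each stage is a protocol, a test can plug in a pass-through extractor and a trivial analyzer and exercise the signal-detection logic without audio, OCR, or network calls.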
AI providers
All AI providers implement a common protocol. Provider-specific JSON never leaks past the provider layer. Configuration is done through a JSON file (Providers/ai_config.json) that defines models, presets, context windows, and feature settings.
| Provider | Status | Notes |
|---|---|---|
| OpenAICompatibleProvider | Primary | OpenAI, any /v1/chat/completions endpoint |
| AnthropicProvider | Implemented | Claude models via Anthropic API |
| GeminiProvider | Implemented | Google Gemini models |
| DeepSeek | Implemented | DeepSeek models |
| Local LLM | Partial | Model download infra ready, inference not wired yet |
A single AIConfigService.shared.requestParams(for:model:) call handles model presets, reasoning model detection (temperature → nil for reasoning models), feature config ceilings, and context window limits. Services only own their system/user prompts — never hardcoded parameters.
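A minimal sketch of the kind of resolution such a call might perform is shown below. The function `resolveParams` and the types `ModelPreset`, `FeatureConfig`, and `RequestParams` are assumptions made for illustration; the actual signature of `requestParams(for:model:)` and the schema of ai_config.json are not documented here.

```swift
import Foundation

/// Hypothetical shapes; the real AIConfigService and config schema are not shown here.
struct ModelPreset {
    let isReasoningModel: Bool
    let defaultTemperature: Double
    let contextWindow: Int
}

struct FeatureConfig {
    let maxOutputTokens: Int   // per-feature ceiling
}

struct RequestParams: Equatable {
    let temperature: Double?   // nil means "omit from the request"
    let maxTokens: Int
}

/// Resolves parameters in one place so call sites never hardcode them.
func resolveParams(feature: FeatureConfig, preset: ModelPreset, promptTokens: Int) -> RequestParams {
    // Per the rule above: temperature becomes nil for reasoning models.
    let temperature: Double? = preset.isReasoningModel ? nil : preset.defaultTemperature
    // Output budget is capped by both the feature ceiling and the remaining context.
    let remaining = max(0, preset.contextWindow - promptTokens)
    return RequestParams(temperature: temperature,
                         maxTokens: min(feature.maxOutputTokens, remaining))
}
```

For example, with a 2,000-token feature ceiling and an 8,000-token context window, a 7,500-token prompt leaves a 500-token output budget, and a reasoning preset yields a nil temperature.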
Transcription
Two transcription engines are implemented behind a common protocol:
- AppleSpeechTranscriptionEngine: on-device, works offline, no API cost
- RemoteTranscriptionEngine: Whisper API via any OpenAI-compatible endpoint
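The sketch below shows how two engines might sit behind one protocol, using stubs in place of Apple Speech and the Whisper API. The protocol shape, the stub types, and the selection rule in `makeEngine` are all assumptions; the app's real TranscriptionEngine protocol is not shown in this document and is likely asynchronous.

```swift
import Foundation

struct TranscriptSegment: Equatable {
    let start: TimeInterval
    let text: String
}

/// Hypothetical protocol shape; the app's actual requirements may differ.
protocol SketchTranscriptionEngine {
    var isOnDevice: Bool { get }
    func transcribe(audioAt url: URL) throws -> [TranscriptSegment]
}

/// Stub standing in for an on-device engine.
struct StubLocalEngine: SketchTranscriptionEngine {
    let isOnDevice = true
    func transcribe(audioAt url: URL) throws -> [TranscriptSegment] {
        [TranscriptSegment(start: 0, text: "local: \(url.lastPathComponent)")]
    }
}

/// Stub standing in for a remote engine.
struct StubRemoteEngine: SketchTranscriptionEngine {
    let isOnDevice = false
    func transcribe(audioAt url: URL) throws -> [TranscriptSegment] {
        [TranscriptSegment(start: 0, text: "remote: \(url.lastPathComponent)")]
    }
}

/// Assumed selection rule: use the remote engine only when requested and a key exists.
func makeEngine(preferRemote: Bool, hasAPIKey: Bool) -> any SketchTranscriptionEngine {
    if preferRemote && hasAPIKey {
        return StubRemoteEngine()
    }
    return StubLocalEngine()
}
```

Callers depend only on the protocol, and the `isOnDevice` flag gives the UI what it needs to show whether work is local or remote.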
iOS integrations
| Integration | Direction | Status |
|---|---|---|
| Calendar read + sensor | IN | Implemented |
| Calendar create events | OUT | Implemented |
| Reminders export | OUT | Implemented |
| Share Extension | IN | Implemented |
| Format importers (10) | IN | Implemented |
| Export (MD, JSON, CSV) | OUT | Implemented |
| Context sensors (7) | IN | Implemented |
| Watch Connectivity | BOTH | Implemented |
| Apple Speech transcript | IN | Implemented |
| Live Activities | OUT | Implemented |
| Vision OCR doc scanner | IN | Implemented |
| Core Spotlight index | OUT | Implemented |
| Contacts speaker match | IN | Implemented |
| Face ID biometric gate | INTERNAL | Implemented |
| App Intents / Siri | — | Not yet |
Calendar integration includes MonthGrid, DayActivity, OnThisDay, and a unified view combining EKEvent items with knowledge items on the same timeline. Seven context sensors, including Calendar, AudioRoute, Location, Battery, MotionActivity, and FocusMode, feed the content pipeline.
Import & export
wawa-note is designed to be a place you can enter and leave freely. Everything is portable.
Import (10 formats)
JSON, Markdown, ICS (calendar), SRT (subtitles), PDF, HTML, RTF, Plain Text, and GitHub Issues, plus a Share Extension that accepts content from any iOS app.
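A router over format importers could be sketched as follows, choosing an importer by file extension. The protocol shape, the two example importers, and the extension-based routing rule are assumptions for illustration; the actual ImportRouter and FormatImporter definitions are not shown in this document.

```swift
import Foundation

struct ImportedNote: Equatable {
    let title: String
    let body: String
}

/// Hypothetical protocol shape for one format importer.
protocol SketchFormatImporter {
    var fileExtensions: Set<String> { get }
    func importData(_ data: Data, filename: String) throws -> ImportedNote
}

struct PlainTextImporter: SketchFormatImporter {
    let fileExtensions: Set<String> = ["txt"]
    func importData(_ data: Data, filename: String) throws -> ImportedNote {
        ImportedNote(title: filename, body: String(decoding: data, as: UTF8.self))
    }
}

struct MarkdownImporter: SketchFormatImporter {
    let fileExtensions: Set<String> = ["md", "markdown"]
    func importData(_ data: Data, filename: String) throws -> ImportedNote {
        let text = String(decoding: data, as: UTF8.self)
        // Use the first "# " heading as the title when present.
        let heading = text.split(separator: "\n").first(where: { $0.hasPrefix("# ") })
        let title = heading.map { String($0.dropFirst(2)) } ?? filename
        return ImportedNote(title: title, body: text)
    }
}

enum ImportError: Error { case unsupported(String) }

/// Picks the first importer that claims the file's (lowercased) extension.
struct SketchImportRouter {
    let importers: [any SketchFormatImporter]

    func route(_ data: Data, filename: String) throws -> ImportedNote {
        let ext = URL(fileURLWithPath: filename).pathExtension.lowercased()
        guard let importer = importers.first(where: { $0.fileExtensions.contains(ext) }) else {
            throw ImportError.unsupported(ext)
        }
        return try importer.importData(data, filename: filename)
    }
}
```

Supporting a new format then means adding one importer type to the array, without touching the router or its callers.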
Export
Markdown, JSON, SRT, CSV, Graph JSON. Tasks can be exported to Apple Reminders, and calendar events can be created from meetings.
Use your data anywhere — ChatGPT, Claude, Notion, Obsidian. It's yours.
Security & privacy
There are no Wawa Note servers. The app never sees your data. API keys stay on your device in the iOS Keychain. All permissions are optional — the app works with reduced functionality when permissions are denied.
Key rules
- API keys in Keychain only. Never in SwiftData, UserDefaults, JSON, or logs.
- Face ID biometric gate available for accessing sensitive content.
- Raw audio deletable while keeping transcript and analysis.
- Original transcript always preserved; edits are saved separately.
- Provider config stores only a Keychain identifier, never the key value.
- Logs never include API keys, authorization headers, or full transcripts.
- All processing indicators show whether work is on-device or remote.
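Two of the rules above (config stores only a Keychain identifier; logs never include authorization headers) can be sketched with plain Swift types. `SketchKeyStore`, `InMemoryKeyStore`, `ProviderConfig`, and both helper functions are hypothetical; on device the key store would be backed by the iOS Keychain rather than a dictionary.

```swift
import Foundation

/// Hypothetical protocol; on device this would be backed by the iOS Keychain.
protocol SketchKeyStore {
    func secret(for identifier: String) -> String?
}

/// In-memory stand-in for tests only; never use this for real keys.
struct InMemoryKeyStore: SketchKeyStore {
    let secrets: [String: String]
    func secret(for identifier: String) -> String? { secrets[identifier] }
}

/// Safe to serialize: it carries a reference to the key, not the key itself.
struct ProviderConfig: Codable, Equatable {
    let name: String
    let baseURL: String
    let keychainIdentifier: String
}

/// Builds the Authorization header value at request time, reading the key just in time.
func authorizationHeader(for config: ProviderConfig, keyStore: any SketchKeyStore) -> String? {
    keyStore.secret(for: config.keychainIdentifier).map { "Bearer \($0)" }
}

/// Replaces the Authorization header value before anything reaches a log line.
func redactedForLogging(_ headers: [String: String]) -> [String: String] {
    var result: [String: String] = [:]
    for (name, value) in headers {
        result[name] = name.lowercased() == "authorization" ? "<redacted>" : value
    }
    return result
}
```

Because `ProviderConfig` has no field for the secret, encoding it to JSON or a database cannot leak the key by accident.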
On-device AI
The infrastructure for downloading and managing local AI models is in place (ModelDownloadService and ModelRegistry exist). The planned setup:
| Model | Size | Purpose |
|---|---|---|
| Llama 3.2 1B (Q4_K_M) | ~500 MB | Summarization, task extraction, structured JSON output |
| EmbeddingGemma 300M (Q4_K_M) | ~200 MB | Semantic search embeddings |
Models are downloaded from Hugging Face on first use, verified with SHA256, and stored in Application Support. Because local models sit behind the existing AIProvider protocol, switching between cloud and local inference requires no changes to the rest of the app. Inference via swift-llama (llama.cpp + Metal GPU) is not yet wired.
Coding standards
- Protocol-first boundaries. Every integration behind a protocol: AIProvider, TranscriptionEngine, FormatImporter, ContextSensor.
- Swift 6.0 concurrency: async/await, @MainActor for UI state, @preconcurrency import for unaudited Apple frameworks.
- All AI requests must use AIConfigService.shared.requestParams(for:model:). Never hardcode temperature or maxTokens.
- Provider-specific JSON stays inside the provider. App code uses internal models only.
- Provenance on every graph edge: relationships traceable to source evidence.
- SwiftData: no @Relationship cascade deletes. Manual recursive delete in service layer.
- Keep SwiftUI views thin. Services testable without UI. Dependency injection through initializers.
- 27 unit tests in the test target. Do not claim something works on device until tested there.
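The "manual recursive delete in service layer" and "dependency injection through initializers" rules can be sketched together. `SketchStore` is a hypothetical in-memory stand-in for a SwiftData context, and `DeletionService` is an invented name; the app's actual services are not shown in this document.

```swift
import Foundation

/// Hypothetical in-memory store standing in for a SwiftData context.
final class SketchStore {
    private var childrenByID: [UUID: Set<UUID>] = [:]   // record -> child IDs

    func insert(_ id: UUID, children: Set<UUID> = []) {
        childrenByID[id] = children
    }

    func contains(_ id: UUID) -> Bool { childrenByID[id] != nil }
    func children(of id: UUID) -> Set<UUID> { childrenByID[id] ?? [] }
    func remove(_ id: UUID) { childrenByID[id] = nil }
}

/// Service-layer delete: walks children explicitly instead of relying on cascade rules.
struct DeletionService {
    let store: SketchStore   // injected through the initializer

    /// Deletes `id` and everything reachable beneath it; returns how many records were removed.
    @discardableResult
    func deleteRecursively(_ id: UUID) -> Int {
        guard store.contains(id) else { return 0 }
        let children = store.children(of: id)
        // Remove the parent first so an accidental cycle cannot recurse forever.
        store.remove(id)
        var removed = 1
        for child in children {
            removed += deleteRecursively(child)
        }
        return removed
    }
}
```

Keeping the traversal in a service makes the deletion order explicit and lets it be unit tested without any UI or persistence framework.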
What's next
v1.0.0 is stable and available now. Areas still in progress:
- On-device LLM inference via swift-llama (download infra ready, inference not wired)
- Semantic search UI (EmbeddingService exists, not surfaced yet)
- Cross-reference results persisted as GraphEdges (currently ephemeral DTOs)
- Device validation on iPhone 14 Plus hardware
- App Intents / Siri integration (needs extension target)
Track progress and contribute on GitHub.