Athena
Always-on multi-modal perception and reasoning engine — Apple Silicon optimized.
Athena is a multi-modal perception and reasoning engine designed as a foundation toward functional machine intelligence. It ingests data from any modality — text, audio, video, images — into a unified 512-dimensional feature space, builds a temporal knowledge graph, and answers arbitrary queries through a two-stage GNN + LLM reasoning pipeline.
Core Principles
Universal Embedding
Every input becomes a 512-dimensional L2-normalized vector in a shared space, enabling cross-modal reasoning without modality-specific interfaces.
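As a rough illustration of the shared space, the sketch below projects two encoder outputs of different widths to 512 dimensions and L2-normalizes them, so that cosine similarity reduces to a dot product. The projection weights are random placeholders, the helper names are hypothetical, and the 768-wide "audio" input is an arbitrary stand-in; this is not Athena's actual projector.

```python
import math
import random

EMBED_DIM = 512  # shared space size stated in the text


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def project(features, weights):
    """Apply a linear projection (one weight row per output dimension), then normalize."""
    out = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return l2_normalize(out)


def cosine(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
# Assumed input widths: 384 (the output width of all-MiniLM-L6-v2) and 768 (placeholder).
text_proj = [[rng.gauss(0, 1) for _ in range(384)] for _ in range(EMBED_DIM)]
audio_proj = [[rng.gauss(0, 1) for _ in range(768)] for _ in range(EMBED_DIM)]

text_vec = project([rng.gauss(0, 1) for _ in range(384)], text_proj)
audio_vec = project([rng.gauss(0, 1) for _ in range(768)], audio_proj)

# Both vectors live in the same 512d space, so they can be compared directly.
similarity = cosine(text_vec, audio_vec)
```

Because every vector has unit length, downstream components can compare any two observations without knowing which modality produced them.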
Structure + Semantics
GNN handles graph-structured reasoning; LLM handles semantic interpretation and natural language output. Two stages, one pipeline.
Iterative Learning
Auto-flags novel features via VAE reconstruction error. Human review folds them back into the training loop — continuous open-world learning.
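One way to picture the flagging step: compute a reconstruction error per embedding and compare it against a threshold that tracks recent errors. The sketch below assumes a sliding-window threshold of mean plus `k` standard deviations; the class name, the window size, and the `k` and `warmup` parameters are all illustrative assumptions, and the VAE itself is left out.

```python
from collections import deque
import statistics


def reconstruction_error(x, x_hat):
    """Mean squared error between an embedding and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


class AdaptiveNoveltyGate:
    """Flags errors above mean + k * std of a sliding window of recent errors."""

    def __init__(self, k=3.0, window=200, warmup=10):
        self.k = k
        self.warmup = max(1, warmup)  # need at least one sample for statistics
        self.errors = deque(maxlen=window)

    def threshold(self):
        return statistics.fmean(self.errors) + self.k * statistics.pstdev(self.errors)

    def observe(self, error):
        """Return True if `error` looks novel, then add it to the window."""
        is_novel = len(self.errors) >= self.warmup and error > self.threshold()
        # Flagged errors are also recorded, so the threshold follows gradual drift.
        self.errors.append(error)
        return is_novel


gate = AdaptiveNoveltyGate()
for err in [0.10, 0.11] * 10:   # familiar inputs reconstruct well
    gate.observe(err)
flagged = gate.observe(0.90)    # a poorly reconstructed input
```

Items for which `observe` returns True would be the ones queued for human review in this sketch.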
Modality-Agnostic
New input types follow the same pipeline: raw → encoder → projector (→512d) → graph → novelty → LLM.
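The first two stages of that pipeline (raw → encoder → projector) can be sketched as a small registry, so that adding a modality means registering one encoder/projector pair. `ModalityRegistry` and the toy encoder and projector below are hypothetical stand-ins, not Athena's real components; the graph, novelty, and LLM stages are omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
EMBED_DIM = 512  # shared space size stated in the text


def unit(vec: Vector) -> Vector:
    """L2-normalize a vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class ModalityRegistry:
    """Maps a modality name to its (encoder, projector) pair."""

    def __init__(self) -> None:
        self._stages: Dict[str, Tuple[Callable, Callable]] = {}

    def register(self, modality: str, encoder: Callable, projector: Callable) -> None:
        self._stages[modality] = (encoder, projector)

    def embed(self, modality: str, raw) -> Vector:
        encoder, projector = self._stages[modality]  # KeyError if unregistered
        vec = unit(projector(encoder(raw)))
        if len(vec) != EMBED_DIM:
            raise ValueError(f"projector for {modality!r} returned {len(vec)} dims")
        return vec


# Toy stand-ins, not real models: a byte-histogram "encoder" and a tiling "projector".
def byte_histogram(text: str) -> Vector:
    counts = [0.0] * 256
    for b in text.encode("utf-8"):
        counts[b] += 1.0
    return counts


def tile_to_512(features: Vector) -> Vector:
    return [features[i % len(features)] for i in range(EMBED_DIM)]


registry = ModalityRegistry()
registry.register("text", byte_histogram, tile_to_512)
vec = registry.embed("text", "hello")
```

Everything after `embed` only ever sees a 512d unit vector, which is what keeps the later stages independent of the input type.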
On-Device
The entire stack runs on an Apple M1 with 16 GB of RAM, loading components on demand. No cloud dependency.
Pipeline
Athena captures camera and microphone input on a MacBook Air and streams it to a Mac Mini server over WebSocket.
The server encodes text with all-MiniLM-L6-v2, images with CLIP ViT-B/32, and audio with Whisper, then projects each encoder's output into the shared 512d space. Observations are stored in a C++ ObservationGraph with 11 edge types, backed by a SQLite KnowledgeGraph.
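The status list describes the ObservationGraph as CSR (compressed sparse row). As a layout illustration only, the Python sketch below builds CSR arrays from typed edge triples and filters neighbors by edge type. The real structure is C++ and its edge type names are not given here, so types are plain integers 0 to 10 and every name in the snippet is hypothetical.

```python
from collections import namedtuple

NUM_EDGE_TYPES = 11  # count stated in the text; the type names are not given

# offsets[i]:offsets[i + 1] is the slice of targets/edge_types owned by node i.
CSRGraph = namedtuple("CSRGraph", ["offsets", "targets", "edge_types"])


def build_csr(num_nodes, edges):
    """Build CSR arrays from (source, target, edge_type) triples."""
    offsets = [0] * (num_nodes + 1)
    for src, _dst, etype in edges:
        if not 0 <= etype < NUM_EDGE_TYPES:
            raise ValueError(f"unknown edge type {etype}")
        offsets[src + 1] += 1            # count outgoing edges per node
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]     # prefix sum turns counts into offsets

    targets = [0] * len(edges)
    edge_types = [0] * len(edges)
    cursor = offsets[:-1]                # next free slot per source node (a copy)
    for src, dst, etype in edges:
        slot = cursor[src]
        targets[slot] = dst
        edge_types[slot] = etype
        cursor[src] += 1
    return CSRGraph(offsets, targets, edge_types)


def neighbors(graph, node, edge_type=None):
    """Return a node's out-neighbors, optionally restricted to one edge type."""
    start, end = graph.offsets[node], graph.offsets[node + 1]
    return [
        graph.targets[i]
        for i in range(start, end)
        if edge_type is None or graph.edge_types[i] == edge_type
    ]


graph = build_csr(3, [(0, 1, 0), (0, 2, 3), (1, 2, 0), (2, 0, 10)])
typed = neighbors(graph, 0, edge_type=3)
```

Storing each node's edges contiguously keeps neighbor scans cache-friendly, which is the usual reason to pick CSR for a read-heavy graph.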
A VAE novelty detector flags unfamiliar patterns against an adaptive threshold. A GraphSAGE GNN (2 layers, 11-type edge aggregation) reasons over the graph using query-conditioned attention. A drive-based reward system, updated with REINFORCE, steers the motivation vector.
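To make "11-type edge aggregation" concrete, here is a minimal sketch of one GraphSAGE-style layer that computes a separate mean over neighbors for each edge type, concatenates those means with the node's own features, and applies a linear map plus ReLU. Per-type mean aggregation is an assumption about how the edge types are combined, the function names are hypothetical, and the query-conditioned attention is left out.

```python
def mean_vectors(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_layer(features, typed_neighbors, num_edge_types, weight):
    """One GraphSAGE-style layer with a separate mean aggregate per edge type.

    features:        list of per-node vectors, all of length d
    typed_neighbors: typed_neighbors[node] is a dict {edge_type: [neighbor ids]}
    weight:          matrix with out_dim rows and d * (1 + num_edge_types) columns
    """
    dim = len(features[0])
    out = []
    for node, h in enumerate(features):
        concat = list(h)  # the node's own features come first
        for etype in range(num_edge_types):
            neigh = [features[n] for n in typed_neighbors[node].get(etype, [])]
            concat.extend(mean_vectors(neigh, dim))
        # Linear map followed by ReLU.
        out.append([max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weight])
    return out


# Tiny example with 2 edge types and an identity weight, so the output is the
# concatenated vector itself. Athena's graph would use 11 edge types.
features = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
typed_neighbors = [{0: [1, 2]}, {1: [0]}, {}]
identity = [[1.0 if r == c else 0.0 for c in range(6)] for r in range(6)]
hidden = sage_layer(features, typed_neighbors, 2, identity)
```

Stacking two such layers lets each node see its two-hop neighborhood, which matches the two-layer depth mentioned above.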
Current Status
- Ingest pipelines for HuggingFace datasets, Wikipedia, RSS/Atom, Telegram — done
- Multi-modal encoding (text, audio, video, OCR) — done
- ObservationGraph (CSR, C++) + KnowledgeGraph (SQLite) — done
- SelfStateManager with awake/asleep/training modes — done
- VAE novelty detection + adaptive threshold — done
- GraphSAGE GNN with query-conditioned attention — done
- Drive-based reward system with REINFORCE — done
- WebSocket + REST server (aiohttp) — done