Aaasaasa: A Local AI Stack Built for Scale
Multi‑provider CLI + High‑capacity Web Client for ultra‑long sessions and on‑prem performance.
Is There Anything Like This?
Not really. Existing tools are siloed:
Ollama focuses on local models only; vLLM exposes a single inference server;
vendor CLIs target only their own APIs; GUI apps emphasize convenience over orchestration.
None unify multi‑provider routing, load balancing, and a
high‑capacity ChatGPT Web client tuned for extremely long sessions.
What Makes Aaasaasa AI Different
- Unified, multi‑provider CLI — one tool for Ollama, vLLM, OpenAI/Anthropic (extensible), with profiles and per‑model routing.
- Production‑grade load balancing — strategies like round_robin, least_conn, and power‑of‑two choices (p2c) applied to LLM backends.
- Intelligent failover — automatic fallback from cloud (e.g., API quota hit) to local nodes; optional multi‑key rotation to spread usage across accounts.
- Local‑first performance — pushes work to on‑prem GPUs/CPUs via Ollama/vLLM for low latency and data locality.
- High‑capacity Web Client — a dedicated Chromium launcher for ChatGPT with isolated profile, large JS heap, GPU rasterization, and no background throttling; built for 1000+ message sessions.
- Separation of concerns — CLI for orchestration; Web client for human interaction with maximum stability and memory headroom.
- Brandable & private — Aaasaasa Studio by Aleks: local configs, optional on‑prem gateway, no vendor lock‑in.
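As a concrete illustration of one of the balancing strategies above, here is a minimal sketch of power-of-two-choices (p2c) selection over inference backends. It assumes active connection counts are tracked per backend; the node names and counts are illustrative, not Aaasaasa's actual API.

```python
import random

def pick_backend(backends, active_conns):
    """Power-of-two-choices: sample two backends uniformly at random,
    then route the request to the one with fewer active connections."""
    a, b = random.sample(backends, 2)
    return a if active_conns[a] <= active_conns[b] else b

# Hypothetical inference endpoints (names and counts are illustrative).
backends = ["ollama-node-1", "ollama-node-2", "vllm-node-1"]
conns = {"ollama-node-1": 4, "ollama-node-2": 1, "vllm-node-1": 7}

choice = pick_backend(backends, conns)
```

p2c gives most of the benefit of full least-connections balancing while only inspecting two nodes per request, which keeps routing overhead constant as the cluster grows.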
Advantages Over Other Tools
- One interface, many backends — switch between local clusters and cloud models without changing workflows.
- Resilience by design — quota errors or node outages don’t halt work; Aaasaasa routes around failures automatically.
- Massive session support — the Web client avoids typical browser constraints by using an isolated profile and aggressive performance flags.
- Edge & on‑prem friendly — keep data near you, using 64 GB+ of RAM and local GPUs to accelerate response time.
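The "aggressive performance flags" mentioned above can be sketched as a launcher that assembles a Chromium command line. The switches shown are standard Chromium/V8 flags (isolated profile, enlarged JS heap, GPU rasterization, background throttling disabled); the binary name, profile path, and heap size are illustrative assumptions, not Aaasaasa's shipped configuration.

```python
def build_launch_cmd(binary="chromium", profile_dir="/tmp/aaasaasa-chatgpt"):
    """Assemble a Chromium invocation tuned for very long chat sessions.
    Paths and the heap size are example values."""
    return [
        binary,
        f"--user-data-dir={profile_dir}",          # isolated browser profile
        "--js-flags=--max-old-space-size=8192",    # ~8 GB V8 heap
        "--enable-gpu-rasterization",              # GPU rasterization
        "--disable-background-timer-throttling",   # keep timers full-speed
        "https://chatgpt.com/",
    ]

cmd = build_launch_cmd()
# A real launcher would hand this to subprocess.Popen(cmd).
```

The isolated profile keeps extensions and other tabs from competing for the same renderer memory, which matters most once a conversation grows past a few hundred messages.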
Features No One Else Combines
- CLI + Web synergy — send the same task to local LLMs (CLI) or ChatGPT (Web) with consistent behavior.
- LB algorithms for LLMs — web‑inspired balancing (least‑connections, p2c) applied to inference endpoints.
- Key rotation + fallback chain — gracefully move from one API key/provider to another, then to on‑prem models.
- Ultra‑long chats — specialized ChatGPT launcher tuned for huge conversations that overwhelm normal browsers.
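The key-rotation-plus-fallback behavior above can be outlined as follows. This is a minimal sketch assuming each provider entry carries a name, a callable, and a list of API keys; the stub providers and error type are hypothetical, not Aaasaasa's real interface.

```python
class QuotaError(Exception):
    """Raised by a provider call when an API quota is exhausted."""

def call_with_fallback(prompt, providers):
    """Try each (name, call_fn, keys) entry in order, rotating through
    keys on quota errors before falling through to the next provider.
    A local entry with no keys acts as the final on-prem fallback."""
    for name, call_fn, keys in providers:
        for key in keys or [None]:
            try:
                return name, call_fn(prompt, key)
            except QuotaError:
                continue  # next key, or next provider
    raise RuntimeError("all providers exhausted")

# Demo with stub providers (illustrative; real calls would hit the APIs).
def cloud_stub(prompt, key):
    raise QuotaError("quota hit")        # simulate exhausted cloud quota

def local_stub(prompt, key):
    return f"local answer to: {prompt}"  # on-prem model always responds

chain = [
    ("openai", cloud_stub, ["key-a", "key-b"]),
    ("ollama", local_stub, []),
]
name, answer = call_with_fallback("hello", chain)
```

Here both cloud keys fail, so the request lands on the local model without the caller doing anything special.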
Roadmap
- Aaasaasa Gateway — a lightweight on‑prem router for all LLM traffic (already prototyped).
- Auto‑model selection — choose models by task type, latency budget, or cost.
- Cluster controls — health checks, EWMA latency scoring, hedged requests, and adaptive concurrency.
- Advanced Web automations — optional scripting/injection layer for power‑workflows in the ChatGPT client.
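The EWMA latency scoring on the roadmap could look like this in outline: each node keeps an exponentially weighted moving average of observed latencies, and the router picks the lowest score. Node names, latency samples, and the smoothing factor are illustrative assumptions.

```python
def ewma_update(score, sample, alpha=0.3):
    """Exponentially weighted moving average of observed latency:
    new_score = alpha * sample + (1 - alpha) * old_score.
    Higher alpha reacts faster to recent samples."""
    return alpha * sample + (1 - alpha) * score

# Per-node latency scores in milliseconds (illustrative values).
scores = {"node-a": 120.0, "node-b": 120.0}
scores["node-a"] = ewma_update(scores["node-a"], 80.0)    # fast sample
scores["node-b"] = ewma_update(scores["node-b"], 400.0)   # slow sample

best = min(scores, key=scores.get)  # route the next request here
```

Because the average decays old samples geometrically, a node that recovers from a latency spike regains traffic after a handful of good samples instead of being penalized indefinitely.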
Who Is It For?
Power users, teams, and labs that need reliable, fast, and scalable AI workflows,
combining local models for speed/privacy with cloud models when needed —
without ever getting stuck on quota, memory, or browser limits.
Pricing
Aaasaasa™ is proprietary software, licensed per machine per month.
Starter
$49 / machine / month
- Single-node license
- Local Ollama + vLLM integration
- Basic load balancing
- Community support
Professional
$199 / machine / month
- All Starter features
- Multi-provider profiles (OpenAI, Anthropic…)
- Failover + key rotation
- Priority updates & support
Enterprise
$499 / node / month
- Cluster orchestration
- Advanced load balancing (EWMA, p2c, hedged requests)
- Dedicated support channel
- Custom branding & SLA