A security-hardened fork of Fooocus that runs on weak hardware, won't crash on low VRAM, and tells you exactly what it did. Scales from 16 GB RAM laptops to Apple Silicon M4.
Free. No account. No API key. Powered by Pollinations.ai with NSFW filtering enforced — the same safety principle as Cookie-Fooocus. For the full experience, install locally.
Not more features — better architecture.
Predictive VRAM model estimates memory before a job starts. If budget is tight, it auto-scales: steps first, then resolution, then precision. You get a result, not an OOM error.
estimated = 3.5GB + (megapixels × cost) + (steps × cost)
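The budget check and cascade can be sketched as below. The coefficients, function names, and the step/resolution floors are illustrative stand-ins, not the fork's actual values:

```python
# Illustrative predictive VRAM check with the auto-downscale cascade:
# steps -> resolution -> precision -> reject. All constants are hypothetical.
BASE_GB = 3.5            # fixed model/runtime overhead
GB_PER_MEGAPIXEL = 1.2   # assumed per-megapixel activation cost
GB_PER_STEP = 0.05       # assumed per-step cost

def estimate_vram_gb(megapixels: float, steps: int, half: bool) -> float:
    est = BASE_GB + megapixels * GB_PER_MEGAPIXEL + steps * GB_PER_STEP
    return est * 0.6 if half else est   # assumed saving from reduced precision

def fit_to_budget(megapixels: float, steps: int, budget_gb: float):
    """Downscale until the job fits; return None only when nothing is left to cut."""
    half = False
    while estimate_vram_gb(megapixels, steps, half) > budget_gb:
        if steps > 20:
            steps -= 5                                # 1) cheapest: fewer steps
        elif megapixels > 0.5:
            megapixels = round(megapixels * 0.8, 2)   # 2) lower resolution
        elif not half:
            half = True                               # 3) reduced precision
        else:
            return None                               # 4) reject the job
    return megapixels, steps, half
```

With a generous budget nothing changes; as the budget tightens, steps are trimmed first, and only an impossible budget is rejected.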
Every job has a decision chain — an audit log of what each stage decided about your parameters. No more "something silently reduced my steps."
{"stage": "vram_model", "action": "reduce_steps", "reason": "predicted_vram_high"}
RAW passes your prompt unchanged. BALANCED adds deterministic keyword expansion with no LLM. STANDARD uses the original Fooocus GPT-2 engine. LLM uses Ollama for creative rewrites. Every result shows what mode actually ran and why.
Layer 1 is fast deterministic rules — no ML, always active. Layer 2 is an ML classifier that only runs when Layer 1 passes. Image moderation is post-generation only — no wasted GPU cycles.
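A minimal sketch of the gate ordering, with a placeholder blocklist and a stubbed-out classifier standing in for the real DeBERTa v3 model; only the control flow (Layer 2 runs solely when Layer 1 passes) reflects the description:

```python
import re

# Hypothetical blocklist; the real Layer 1 combines regex, keyword and fuzzy rules.
BLOCKLIST = re.compile(r"\b(forbidden_term_a|forbidden_term_b)\b", re.IGNORECASE)

def layer1_passes(prompt: str) -> bool:
    """Fast deterministic rules — no ML, always active."""
    return not BLOCKLIST.search(prompt)

def layer2_passes(prompt: str) -> bool:
    """Stand-in for the ML classifier (DeBERTa v3 in the real pipeline)."""
    return True  # placeholder verdict

def check(prompt: str) -> dict:
    if not layer1_passes(prompt):
        return {"allowed": False, "stage": "layer1", "reason": "rule_match"}
    if not layer2_passes(prompt):   # only reached when Layer 1 passed
        return {"allowed": False, "stage": "layer2", "reason": "classifier"}
    return {"allowed": True, "stage": "layer2", "reason": "clean"}
```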
p50 and p95 per stage, VRAM peak tracking. The VRAM prediction model learns from actual usage via EWA (exponentially weighted average) smoothing — it gets more accurate over time without oscillating.
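The EWA correction is a one-line update; the alpha value and starting numbers below are illustrative:

```python
# Exponentially weighted average update for the VRAM feedback loop.
# A small alpha means one outlier job barely moves the estimate,
# which is why the model converges without oscillating.
def ewa_update(current: float, observed: float, alpha: float = 0.2) -> float:
    return (1 - alpha) * current + alpha * observed

estimate = 6.0                       # assumed starting prediction, in GB
for observed_gb in (7.0, 7.0, 7.0):  # three jobs actually peaked at 7 GB
    estimate = ewa_update(estimate, observed_gb)
# estimate drifts toward 7.0 rather than jumping there in one step
```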
n8n webhooks are disabled by default. Auto-tune is opt-in. Telemetry collects data but never modifies behaviour unless explicitly enabled. Everything sensitive requires a deliberate config change.
In-memory LRU for speed, SQLite for persistence. Cold starts warm from disk automatically. Cache failures are non-fatal — a write error never fails a generation job.
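The layering can be sketched as below. Class, table, and method names are hypothetical; the two properties taken from the text are that a cold L1 warms itself from disk, and that an L2 write failure never propagates to the caller:

```python
import sqlite3
from collections import OrderedDict

class TwoTierCache:
    """Sketch: L1 in-memory LRU backed by an L2 SQLite table."""

    def __init__(self, path: str = ":memory:", capacity: int = 1024):
        self.lru = OrderedDict()
        self.capacity = capacity
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS cache (k TEXT PRIMARY KEY, v TEXT)")

    def get(self, key):
        if key in self.lru:                 # L1 hit
            self.lru.move_to_end(key)
            return self.lru[key]
        row = self.db.execute("SELECT v FROM cache WHERE k = ?", (key,)).fetchone()
        if row:                             # L2 hit: warm L1 from disk
            self._set_l1(key, row[0])
            return row[0]
        return None

    def set(self, key, value):
        self._set_l1(key, value)
        try:                                # L2 write is best-effort
            self.db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, value))
            self.db.commit()
        except sqlite3.Error:
            pass                            # non-fatal: the job still succeeds

    def _set_l1(self, key, value):
        self.lru[key] = value
        self.lru.move_to_end(key)
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)    # evict least-recently-used
```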
Same UI, same safety, same queue — switch to video output in one click. 8 motion presets. Frame cost cap prevents one request from running 144 sequential inference passes.
GPU topology layer detects all CUDA / Metal devices. Jobs route to the least-loaded GPU. Per-GPU free VRAM and throughput scores updated after every job — no more queue stacking on one device.
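Least-loaded routing reduces to picking the device with the best score; the dict layout and the scoring order (free VRAM first, throughput as tiebreaker) are assumptions:

```python
# Hypothetical routing decision over per-device stats that the topology
# layer refreshes after every job.
def pick_device(devices: list[dict]) -> dict:
    """Route to the device with the most free VRAM, then highest throughput."""
    return max(devices, key=lambda d: (d["free_vram_gb"], d["throughput"]))
```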
Control plane / execution plane split. Register local or remote HTTP workers. Job lease system with automatic reclaim on timeout. Foundation for home lab → production rendering farm scaling.
SQLite prompt cache no longer grows indefinitely. Size-based eviction (soft cap 5k / hard cap 10k rows), age-based pruning (30-day TTL), async VACUUM compaction, and eviction metrics in stats().
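An eviction pass implementing that policy might look like this. The schema is hypothetical, and VACUUM runs synchronously here where the real implementation does it asynchronously:

```python
import sqlite3
import time

SOFT_CAP, HARD_CAP, TTL_DAYS = 5000, 10000, 30  # caps from the description

def evict(db: sqlite3.Connection) -> int:
    """Age-based pruning, then size-based eviction, then compaction."""
    cutoff = time.time() - TTL_DAYS * 86400
    cur = db.execute("DELETE FROM cache WHERE created_at < ?", (cutoff,))
    evicted = cur.rowcount                         # 30-day TTL pruning
    (rows,) = db.execute("SELECT COUNT(*) FROM cache").fetchone()
    if rows > HARD_CAP:                            # hard cap breached:
        cur = db.execute(                          # drop oldest rows until
            "DELETE FROM cache WHERE rowid IN "    # back under the soft cap
            "(SELECT rowid FROM cache ORDER BY created_at LIMIT ?)",
            (rows - SOFT_CAP,))
        evicted += cur.rowcount
    db.commit()
    db.execute("VACUUM")                           # reclaim file space
    return evicted                                 # feeds the stats() metrics
```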
The EWA feedback model can drift after hardware changes. calibrate() runs a benchmark sweep across all resolution/step combinations and reports drift percentage. reset_vram_model() returns it to baseline.
report = governor.calibrate() → {"max_drift_pct": 3.2}
The decision_chain is now surfaced in the UI as a visual timeline. Every stage shows what it decided, what it changed, and why — VRAM adjustments, safety layer decisions, scheduler actions, all auditable in one panel.
Pick the row that matches your machine.
| Hardware | Mode | Speed | Notes |
|---|---|---|---|
| 16 GB RAM, no GPU | 5 — No VRAM | ~10–20 min/image | Minimum viable. Works. |
| 4–8 GB VRAM GPU | 1 or 2 | ~30–90 s/image | Recommended minimum for regular use |
| Apple Silicon 32 GB+ | 6 — MPS | ~40–90 s/image | No VRAM ceiling — unified memory handles large jobs |
| 12 GB+ VRAM GPU | 1 — GPU | ~10–30 s/image | Full feature set including LLM mode + stable video |
```bash
git clone https://github.com/FreddieSparrow/cookiefooocus.git
cd cookiefooocus
bash install_local.sh
bash run.sh
```

Then open http://localhost:7865
Install Ollama for Mode C (creative prompt rewriting). The pipeline falls back to BALANCED automatically if Ollama is not installed.
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4
ollama serve
```
If LLM mode is unavailable, start `ollama serve` or switch to BALANCED — the trace will always say which mode actually ran. Metrics are viewed via `telemetry.dashboard()`. The HMAC secret in `safety_policy.json` and the n8n Code node must be identical. If legitimate content is blocked, raise `nsfw_block_threshold` in `safety_policy.json`. The local UI starts with `bash run.sh`; server mode requires Linux + a CUDA GPU. On Apple Silicon, the device is `mps`.

Everything you need to know — architecture, modules, configuration, and the v2.5 engineering improvements.
Three strict layers: Core (SDXL inference), Orchestration (scheduler, VRAM governor, cache, telemetry), and Policy (safety, rate limits, profiles). UI never calls pipelines directly.
Read more →
RAW — unchanged passthrough.
BALANCED — deterministic keyword expansion, no LLM, always reproducible.
STANDARD — original Fooocus GPT-2 engine.
LLM — Ollama with constrained JSON output, falls back to BALANCED.
Pre-execution VRAM budget check with predictive model and EWA feedback correction. Auto-downscale cascade: steps → resolution → precision → reject. New: calibrate() detects model drift; reset_vram_model() restores baseline.
Read more →
Layer 1: fast deterministic rules (regex, keyword, fuzzy — always on, no ML). Layer 2: ML classifier (DeBERTa v3, runs only when Layer 1 passes). Image moderation post-generation only. Structured SafetyDecision output on every check.
Read more →
Priority queue with starvation prevention (low-priority jobs promoted after 30s). Per-user job limit (max 2 active). Job lifecycle: QUEUED → SCHEDULED → RUNNING → COMPLETE/FAILED/CANCELLED/TIMED_OUT.
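One common way to implement waiting-time promotion — the 30-second threshold comes from the description above, while the scoring function itself is an assumption:

```python
PROMOTE_AFTER_S = 30  # low-priority jobs are promoted after 30s of waiting

def effective_priority(priority: int, enqueued_at: float, now: float) -> int:
    """Lower is served first; every 30s of waiting bumps a job up one level."""
    return priority - int((now - enqueued_at) // PROMOTE_AFTER_S)

def next_job(queue: list[dict], now: float) -> dict:
    """Pick the best effective priority, breaking ties FIFO by enqueue time."""
    return min(queue, key=lambda j: (
        effective_priority(j["priority"], j["enqueued_at"], now),
        j["enqueued_at"],
    ))
```

A low-priority job that has waited long enough ties with a fresh high-priority one and then wins on arrival order, so it can never starve indefinitely.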
Read more →
GPUTopology layer detects all CUDA / Metal devices and routes jobs to the least-loaded device. ControlPlane / WorkerNode protocol enables multi-machine rendering with lease-based job ownership and heartbeat monitoring.
Read more →
L1 in-memory LRU + L2 SQLite. Separate caches for prompt expansions (no TTL) and NSFW scores (TTL 300s). New: size-based eviction (5k/10k row caps), age-based pruning (30 days), async VACUUM compaction, eviction metrics.
Read more →
HMAC-SHA256 signed requests with replay protection (nonce + timestamp). Rate limiting (30 req/60s per IP). Disabled by default in safety_policy.json. Callbacks on: blocked, complete, queue_wait.
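A sketch of the signing/verification handshake. The header names and the canonical string layout are assumptions, but the nonce + timestamp replay checks mirror the description — and whatever format the n8n Code node uses must match byte for byte:

```python
import hashlib
import hmac
import time
import uuid

SECRET = b"shared-secret"   # placeholder; the real secret lives in config
MAX_SKEW_S = 300            # assumed tolerance for clock drift
seen_nonces = set()         # in production this would be bounded/expiring

def sign(body: str) -> dict:
    ts, nonce = str(int(time.time())), uuid.uuid4().hex
    msg = f"{ts}.{nonce}.{body}".encode()  # assumed canonical string
    return {"ts": ts, "nonce": nonce,
            "sig": hmac.new(SECRET, msg, hashlib.sha256).hexdigest()}

def verify(body: str, headers: dict) -> bool:
    if abs(time.time() - int(headers["ts"])) > MAX_SKEW_S:
        return False                           # stale timestamp
    if headers["nonce"] in seen_nonces:
        return False                           # replayed request
    msg = f'{headers["ts"]}.{headers["nonce"]}.{body}'.encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(headers["sig"], expected):
        return False                           # signature mismatch
    seen_nonces.add(headers["nonce"])
    return True
```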
Read more →
Every job produces an ordered audit log of all parameter decisions. New: explainability module formats this as plain text, structured JSON, or an HTML timeline for the Gradio UI — what the system changed and why, fully visible.
Read more →
Four profiles: balanced (default), creative (minimal filtering, LLM), strict (max moderation, public deploy), api_safe (programmatic / n8n). Copy values into safety_policy.json to apply — never auto-applied.
Read more →
Same UI, queue, and safety as image generation. Backends: SVD (img2vid) and AnimateDiff. 8 motion presets: smooth, cinematic, handheld, zoom, orbit, parallax, dolly, drone. Frame cost cap: max 96 frames per job.
Read more →
Complete documentation including all command-line flags, hardware modes, Apple Silicon guide, n8n signing examples, telemetry dashboard output, and full version comparison table.
Open full readme →