Cookie-Fooocus — AI Image Generation

Generate an image right now

Free. No account. No API key. Powered by Pollinations.ai with NSFW filtering enforced — the same safety principle as Cookie-Fooocus. For the full experience, install locally.

🌐

This demo Static site · client-side prompt filter · Pollinations.ai backend (external, uncontrolled)

🍪

Full Cookie-Fooocus Local / server install · VRAM governor · priority queue · decision chain · 4-mode prompt engine · full 2-layer safety

Style Size Safe mode (client-side preview)

🎨

Your image will appear here

Type a prompt above and hit Generate

Built different

Not more features — better architecture.

Full deployment only

⚡

Won't crash on low VRAM

Predictive VRAM model estimates memory before a job starts. If budget is tight, it auto-scales: steps first, then resolution, then precision. You get a result, not an OOM error.

estimated = 3.5GB + (megapixels × cost) + (steps × cost)

Full deployment only

🔍

No silent changes

Every job has a decision chain — an audit log of what each stage decided about your parameters. No more "something silently reduced my steps."

{"stage": "vram_model", "action": "reduce_steps", "reason": "predicted_vram_high"}

🧠

4 prompt modes

RAW passes your prompt unchanged. BALANCED adds deterministic keyword expansion with no LLM. STANDARD uses the original Fooocus GPT-2 engine. LLM uses Ollama for creative rewrites. Every result shows what mode actually ran and why.

🛡️

2-layer safety

Layer 1 is fast deterministic rules — no ML, always active. Layer 2 is an ML classifier that only runs when Layer 1 passes. Image moderation is post-generation only — no wasted GPU cycles.

Full deployment only

📊

Real performance data

p50 and p95 per stage, VRAM peak tracking. The VRAM prediction model learns from actual usage using EWA smoothing — gets more accurate over time without oscillating.

🔌

Safe by default

n8n webhooks are disabled by default. Auto-tune is opt-in. Telemetry collects data but never modifies behaviour unless explicitly enabled. Everything sensitive requires a deliberate config change.

Full deployment only

💾

L1/L2 cache hierarchy

In-memory LRU for speed, SQLite for persistence. Cold starts warm from disk automatically. Cache failures are non-fatal — a write error never fails a generation job.

Full deployment only

🎥

Video pipeline

Same UI, same safety, same queue — switch to video output in one click. 8 motion presets. Frame cost cap prevents one request from running 144 sequential inference passes.

New in v2.5

🖥

Multi-GPU scheduling

GPU topology layer detects all CUDA / Metal devices. Jobs route to the least-loaded GPU. Per-GPU free VRAM and throughput scores updated after every job — no more queue stacking on one device.

New in v2.5

🌐

Distributed worker protocol

Control plane / execution plane split. Register local or remote HTTP workers. Job lease system with automatic reclaim on timeout. Foundation for home lab → production rendering farm scaling.

New in v2.5

🗄

Cache lifecycle hardening

SQLite prompt cache no longer grows indefinitely. Size-based eviction (soft cap 5k / hard cap 10k rows), age-based pruning (30-day TTL), async VACUUM compaction, and eviction metrics in stats().

New in v2.5

🔬

VRAM calibration tool

The EWA feedback model can drift after hardware changes. calibrate() runs a benchmark sweep across all resolution/step combinations and reports drift percentage. reset_vram_model() returns it to baseline.

report = governor.calibrate() → {"max_drift_pct": 3.2}

New in v2.5

🔎

Safety Explainability UI

The decision_chain is now surfaced in the UI as a visual timeline. Every stage shows what it decided, what it changed, and why — VRAM adjustments, safety layer decisions, scheduler actions, all auditable in one panel.

Runs on your hardware

Pick the row that matches your machine.

Hardware	Mode	Speed	Notes
16 GB RAM, no GPU	5 — No VRAM	~10–20 min/image	Minimum viable. Works.
4–8 GB VRAM GPU	1 or 2	~30–90 s/image	Recommended minimum for regular use
Apple Silicon 32 GB+	6 — MPS	~40–90 s/image	No VRAM ceiling — unified memory handles large jobs
12 GB+ VRAM GPU	1 — GPU	~10–30 s/image	Full feature set including LLM mode + stable video

🍏

Apple Silicon tip: Select Mode 6 on first run. Unified memory means large jobs that would OOM a same-spec NVIDIA card will complete. LLM mode needs Ollama. Best results with 32 GB+ but 16 GB works with BALANCED mode.

Install in 3 commands

git clone https://github.com/FreddieSparrow/cookiefooocus.git
cd cookiefooocus
bash install_local.sh
bash run.sh

Then open http://localhost:7865

Download the repo as ZIP from GitHub
Extract and double-click install_local.bat
Double-click run_local.bat
Open http://localhost:7865

⚠️ Server mode requires Linux + CUDA GPU. Not supported on macOS / Apple Silicon.

bash install_server.sh
cp auth.json.example auth.json
# Edit auth.json — change the admin password
bash run_server.sh

Default credentials: admin / changeme123 — change before exposing to any network.

Optional: LLM prompt mode

Install Ollama for Mode C (creative prompt rewriting). Falls back to BALANCED automatically if not installed.

curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4
ollama serve

Common errors, solved

CUDA out of memory

The VRAM governor should prevent this. If it happens, select a lower hardware mode or reduce resolution to 512–768.

Ollama unavailable

Run ollama serve, or switch to BALANCED mode. The trace will always say which mode actually ran.

Job timed out

Generation exceeded 600s. Reduce resolution or steps. Check VRAM with telemetry.dashboard().

Invalid HMAC signature

Secret mismatch between safety_policy.json and the n8n Code node. They must be identical.

Output hidden / blank

NSFW score above block threshold. Lower nsfw_block_threshold in safety_policy.json if the content is legitimate.

--server on macOS / Apple Silicon

Server mode is not supported on macOS or Apple Silicon — it will not start. Use local mode instead: bash run.sh. Server mode requires Linux + CUDA GPU.

Very slow on Apple Silicon

You may be in CPU mode accidentally. Confirm Mode 6 is selected. Check startup logs for mps.

Documentation

Everything you need to know — architecture, modules, configuration, and the v2.5 engineering improvements.

Architecture

Three strict layers: Core (SDXL inference), Orchestration (scheduler, VRAM governor, cache, telemetry), and Policy (safety, rate limits, profiles). UI never calls pipelines directly.

Prompt Engine (4 modes)

RAW — unchanged passthrough.
BALANCED — deterministic keyword expansion, no LLM, always reproducible.
STANDARD — original Fooocus GPT-2 engine.
LLM — Ollama with constrained JSON output, falls back to BALANCED.

VRAM Governor

Pre-execution VRAM budget check with predictive model and EWA feedback correction. Auto-downscale cascade: steps → resolution → precision → reject. New: calibrate() detects model drift; reset_vram_model() restores baseline.

2-Layer Safety

Layer 1: fast deterministic rules (regex, keyword, fuzzy — always on, no ML). Layer 2: ML classifier (DeBERTa v3, runs only when Layer 1 passes). Image moderation post-generation only. Structured SafetyDecision output on every check.

Scheduler & Queue

Priority queue with starvation prevention (low-priority jobs promoted after 30s). Per-user job limit (max 2 active). Job lifecycle: QUEUED → SCHEDULED → RUNNING → COMPLETE/FAILED/CANCELLED/TIMED_OUT.

Multi-GPU & Distributed Queue

GPUTopology layer detects all CUDA / Metal devices and routes jobs to the least-loaded device. ControlPlane / WorkerNode protocol enables multi-machine rendering with lease-based job ownership and heartbeat monitoring.

Cache System

L1 in-memory LRU + L2 SQLite. Separate caches for prompt expansions (no TTL) and NSFW scores (TTL 300s). New: size-based eviction (5k/10k row caps), age-based pruning (30 days), async VACUUM compaction, eviction metrics.

n8n Integration

HMAC-SHA256 signed requests with replay protection (nonce + timestamp). Rate limiting (30 req/60s per IP). Disabled by default in safety_policy.json. Callbacks on: blocked, complete, queue_wait.

Decision Chain & Explainability

Every job produces an ordered audit log of all parameter decisions. New: explainability module formats this as plain text, structured JSON, or an HTML timeline for the Gradio UI — what the system changed and why, fully visible.

Policy Profiles

Four profiles: balanced (default), creative (minimal filtering, LLM), strict (max moderation, public deploy), api_safe (programmatic / n8n). Copy values into safety_policy.json to apply — never auto-applied.

Video Pipeline

Same UI, queue, and safety as image generation. Backends: SVD (img2vid) and AnimateDiff. 8 motion presets: smooth, cinematic, handheld, zoom, orbit, parallax, dolly, drone. Frame cost cap: max 96 frames per job.

Full Readme

Complete documentation including all command-line flags, hardware modes, Apple Silicon guide, n8n signing examples, telemetry dashboard output, and full version comparison table.

Open full readme →

AI images thatactually work

Generate an image right now

Built different

Won't crash on low VRAM

No silent changes

4 prompt modes

2-layer safety

Real performance data

Safe by default

L1/L2 cache hierarchy

Video pipeline

Multi-GPU scheduling

Distributed worker protocol

Cache lifecycle hardening

VRAM calibration tool

Safety Explainability UI

Runs on your hardware

Install in 3 commands

Optional: LLM prompt mode

Common errors, solved

Documentation

Architecture

Prompt Engine (4 modes)

VRAM Governor

2-Layer Safety

Scheduler & Queue

Multi-GPU & Distributed Queue

Cache System

n8n Integration

Decision Chain & Explainability

Policy Profiles

Video Pipeline

Full Readme

AI images that
actually work