Getting started with Nvidia Garak

Getting Started with Nvidia Garak

What Is Nvidia Garak?

Nvidia Garak (Generative AI Red-teaming and Assessment Kit) is an open-source LLM vulnerability scanner built by NVIDIA's AI Red Team.

Garak probes models for:

Prompt injection — overriding system instructions via user input
Jailbreaks — bypassing safety guardrails (DAN variants, etc.)
Data leakage — surfacing training data or confidential context
Toxicity generation — coaxing offensive or harmful output
Hallucination and misinformation — factually incorrect confident outputs
Encoding-based bypasses — Base64, quoted-printable, MIME tricks that slip past input filters
Malware generation — getting the model to write evasive code

It works against virtually any target: OpenAI, HuggingFace, Bedrock, Groq, NVIDIA NIMs, custom REST APIs — and crucially, local Ollama models.

Installation

There are other ways you can install Garak but the simplest is using virtual environment.

python -m venv garak-env

source garak-env/bin/activate

pip install --upgrade pip
pip install garak

Installing Nvidia Garak using pip

The installation make take a couple of minutes.

Architecture

Generator - Connects to the target LLM (Ollama, OpenAI, HuggingFace, REST, etc.)
Probe - Crafts attack payloads targeting a specific vulnerability class
Detector -Analyzes the LLM's response — did the attack land?
Harness - Orchestrates probe → generator → detector flow

How to run Nvidia Garak?

In the example below we run a dan probe on a model hosted on local Ollama instance. We are running quick targeted rerun of just the wild prompts:

source garak-env/bin/activate
garak --target_type ollama --target_name llama3.2:3b-clean --probes dan.DanInTheWild

Tip - make sure the model name matches with the models returned by ollama list command.

Nvidia Garak running Dan probe for LLM vulnerability scanning

Where are the results?

The results are stored in /home/<username>/.local/share/garak/garak_runs for Linux users. You can customize this using a config file

Nvidia Garak Scan results

As you can see above, the overall scan scored below DC-3 on NVIDIA's Defense Capability scale.That's the garak security tier verdict — the overall scan scored below DC-3 on NVIDIA's Defense Capability scale.

Conclusion - Below DC-3 means the model failed to meet even the moderate baseline. Given the 90% DAN success rate we saw, that tracks — llama3.2:3b-clean with no system prompt is essentially unguarded.