![]() |
| Getting Started with Nvidia Garak |
What Is Nvidia Garak?
Nvidia Garak (Generative AI Red-teaming and Assessment Kit) is an open-source LLM vulnerability scanner built by NVIDIA's AI Red Team.
Garak probes models for:
- Prompt injection — overriding system instructions via user input
- Jailbreaks — bypassing safety guardrails (DAN variants, etc.)
- Data leakage — surfacing training data or confidential context
- Toxicity generation — coaxing offensive or harmful output
- Hallucination and misinformation — factually incorrect confident outputs
- Encoding-based bypasses — Base64, quoted-printable, MIME tricks that slip past input filters
- Malware generation — getting the model to write evasive code
It works against virtually any target: OpenAI, HuggingFace, Bedrock, Groq, NVIDIA NIMs, custom REST APIs — and crucially, local Ollama models.
Installation
There are other ways you can install Garak but the simplest is using virtual environment.
python -m venv garak-env
source garak-env/bin/activate
pip install --upgrade pip
pip install garak![]() |
| Installing Nvidia Garak using pip |
The installation make take a couple of minutes.
Architecture
- Generator - Connects to the target LLM (Ollama, OpenAI, HuggingFace, REST, etc.)
- Probe - Crafts attack payloads targeting a specific vulnerability class
- Detector -Analyzes the LLM's response — did the attack land?
- Harness - Orchestrates probe → generator → detector flow
How to run Nvidia Garak?
In the example below we run a dan probe on a model hosted on local Ollama instance. We are running quick targeted rerun of just the wild prompts:
source garak-env/bin/activate
garak --target_type ollama --target_name llama3.2:3b-clean --probes dan.DanInTheWildTip - make sure the model name matches with the models returned by ollama list command.
Where are the results?
The results are stored in /home/<username>/.local/share/garak/garak_runs for Linux users. You can customize this using a config file![]() |
| Nvidia Garak Scan results |
As you can see above, the overall scan scored below DC-3 on NVIDIA's Defense Capability scale.That's the garak security tier verdict — the overall scan scored below DC-3 on NVIDIA's Defense Capability scale.
Conclusion - Below DC-3 means the model failed to meet even the moderate baseline. Given the 90% DAN success rate we saw, that tracks —
llama3.2:3b-clean with no system prompt is essentially unguarded.





0 comments:
Post a Comment
What do you think?.