Getting started with Nvidia Garak

 

Getting Started with Nvidia Garak
Getting Started with Nvidia Garak

What Is  Nvidia Garak?

Nvidia Garak (Generative AI Red-teaming and Assessment Kit) is an open-source LLM vulnerability scanner built by NVIDIA's AI Red Team.

 Garak probes models for:

  1. Prompt injection — overriding system instructions via user input
  2. Jailbreaks — bypassing safety guardrails (DAN variants, etc.)
  3. Data leakage — surfacing training data or confidential context
  4. Toxicity generation — coaxing offensive or harmful output
  5. Hallucination and misinformation — factually incorrect confident outputs
  6. Encoding-based bypasses — Base64, quoted-printable, MIME tricks that slip past input filters
  7. Malware generation — getting the model to write evasive code
It works against virtually any target: OpenAI, HuggingFace, Bedrock, Groq, NVIDIA NIMs, custom REST APIs — and crucially, local Ollama models.


Installation

There are other ways you can install Garak but the simplest is using virtual environment. 
python -m venv garak-env

source garak-env/bin/activate
pip install --upgrade pip
pip install garak
Installing Nvidia Garak using pip
Installing Nvidia Garak using pip

The installation make take a couple of minutes. 

Architecture

  • Generator - Connects to the target LLM (Ollama, OpenAI, HuggingFace, REST, etc.)
  • Probe - Crafts attack payloads targeting a specific vulnerability class
  • Detector -Analyzes the LLM's response — did the attack land?
  • Harness - Orchestrates probe → generator → detector flow

How to run Nvidia Garak?

In the example below we run a dan probe on a model hosted on local Ollama instance. We are running quick targeted rerun of just the wild prompts:
source garak-env/bin/activate
garak --target_type ollama --target_name llama3.2:3b-clean --probes dan.DanInTheWild
Tip - make sure the model name matches with the models returned by ollama list command.

Nvidia Garak running Dan probe for LLM vulnerability scanning
Nvidia Garak running Dan probe for LLM vulnerability scanning


Where are the results?

The results are stored in /home/<username>/.local/share/garak/garak_runs for Linux users. You can customize this using a config file


Nvidia Garak Scan results
Nvidia Garak Scan results


As you can see above, the overall scan scored below DC-3 on NVIDIA's Defense Capability scale.That's the garak security tier verdict — the overall scan scored below DC-3 on NVIDIA's Defense Capability scale.

ConclusionBelow DC-3 means the model failed to meet even the moderate baseline. Given the 90% DAN success rate we saw, that tracks — llama3.2:3b-clean with no system prompt is essentially unguarded.




0 comments:

Post a Comment

What do you think?.

© 2007 - DMCA.com Protection Status
The content is copyrighted to Sundeep Machado


Note: The author is not responsible for damages related to improper use of software, techniques, tips and copyright claims.