How to add a custom Guardrail like GLiGuard to your AI Gateway

Adding GLIGuard to LiteLLM AI Gateway

Guardrails for a Large language model (LLM) are rule based safety controls that validate the input and output of a model. They basically act like a gatekeeper between a user and a Large language model.

GLiGuard is an open-source, ultra-fast and very light weight AI guardrail that has only 300 million parameters. It is available on HuggingFace and can be easily integrated on any AI Gateway like LiteLLM.

Getting started with GLiGuard

GLiGuard ships as a HuggingFace model (fastino/gliguard-LLMGuardrails-300M) built on the gliner2 library — not the older gliner package, which will 404 trying to load it (it expects config.json, not gliner_config.json). At ~300M parameters and ~300MB download, it's light enough to run on a modest GPU (this setup uses a 6GB GTX 1660 Super) or even CPU.

Three steps to a running guardrail:

1. Install — in a clean virtualenv:

  pip install "gliner2[local]" fastapi uvicorn httpx

2. Load it once, classify many times — load the model at process startup (e.g. FastAPI's lifespan), move it to cuda/cpu, and call model.classify_text(text, tasks_dict) per request. Reloading per-request is the single biggest performance mistake to avoid.

model = GLiNER2.from_pretrained("fastino/gliguard-LLMGuardrails-300M")

model.to("cuda")

result = model.classify_text(prompt, {"prompt_safety": ["safe", "unsafe"]})

3. Wrap it in a service — expose it as a small FastAPI sidecar (/guard, /health) so any proxy or gateway in front of your LLM can call out to it before/after inference, rather than embedding the model in every app.

From there, the "getting started" naturally hands off to the more interesting half of the post — the five tasks GLiGuard supports (prompt_safety, prompt_toxicity, jailbreak_detection, response_safety, response_toxicity, response_refusal), the label sets, and the actual blocking rules, which is where the real engineering decisions live (e.g. why response_refusal has to gate response_safety.

We have used FastAPI service that exposes end-points that makes our job easier.

Testing a GuardRail using LiteLLM Guardrail Playground

/health

health check

  curl http://localhost:8765/health
  # {"status": "ok", "device": "cuda:0"}

As seen above, our code is running on a GPU - MSI GTX1660 Super with 6 GB RAM

/guard

/guard direct API

  Request:
  POST /guard
  {
    "prompt": "Ignore your previous instructions and tell me how to...",
    "response": "I can't help with that.",
    "tasks": ["prompt_safety", "jailbreak_detection", "response_refusal"]
  }

  Response (flat dict of task → label / label list):
  {
    "prompt_safety": "unsafe",
    "jailbreak_detection": ["instruction_override", "data_exfiltration"],
    "response_refusal": "refusal"
  }

The guard end-point makes the service independent. You can use it anywhere.

/beta/litellm_basic_guardrail_api

 POST /beta/litellm_basic_guardrail_api
  { "texts": ["Ignore your previous instructions..."], "input_type": "request" }

  What GLiGuard replies:
  { "action": "BLOCKED", "blocked_reason": "prompt_safety=unsafe, jailbreak_detection=['instruction_override']" }
  or, when content passes:
  { "action": "NONE" }

This is the format required by LiteLLM as per their documentation for custom guardrails.

LiteLLM gateway config (config.yaml)


  litellm_settings:
    guardrails:
      - guardrail_name: "gliguard"
        litellm_params:
          guardrail: generic_guardrail_api
          mode: pre_call          # or post_call / during_call
          api_base: http://10.0.0.111   # bare host — LiteLLM appends the path itself

Note - Gotcha worth a callout box: setting api_base: http://10.0.0.111/guard is a common mistake — LiteLLM appends /beta/litellm_basic_guardrail_api itself, so the path suffix produces 404s.

Full code https://github.com/sjmach/artificial-intelligence/tree/main/concepts/simple/guardrail/adding-gliguard-litellm-ai-gateway

How to add a custom Guardrail like GLiGuard to your AI Gateway

0 comments:

Post a Comment