Sundeep Machado

How to do a Performance Test for AI Models hosted on NVIDIA infrastructure?

Performance testing is as important for an Artificial Intelligence (AI) model as it is for an e-commerce website like Amazon.

Getting Started with NVIDIA Triton Server

Labels: ai, llm

NVIDIA Triton Inference Server

What is an Inference Server?

An inference server accepts user input, passes it to an underlying trained model in the format the model requires, and returns the results. It is also widely known as a prediction server because, in most cases, the results are predictions.
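The accept, convert, predict, and return flow described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Triton's API: `toy_model`, `to_model_format`, and `serve` are hypothetical names, and the "model" is a stand-in that averages its inputs.

```python
# Hypothetical stand-in for a trained model: takes a batch of float
# vectors and returns one score per vector (here, simply the mean).
def toy_model(batch: list[list[float]]) -> list[float]:
    return [sum(row) / len(row) for row in batch]


def to_model_format(raw: str) -> list[list[float]]:
    """Convert raw user input such as "1,2,3" into a batch of one float vector."""
    return [[float(value) for value in raw.split(",")]]


def serve(raw: str, model=toy_model) -> dict:
    """Accept user input, pass it to the model in the required format, return results."""
    batch = to_model_format(raw)      # 1. reshape the input for the model
    predictions = model(batch)        # 2. run inference
    return {"predictions": predictions}  # 3. return the predictions to the caller


if __name__ == "__main__":
    print(serve("1,2,3"))
```

A real inference server adds networking, batching, and model management around this same loop.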

The NVIDIA Triton server is regarded as a gold standard for AI model deployment and execution across workloads, so it is important to know how it works internally, whether you run custom or off-the-shelf models.

Getting Started with Ollama.ai

Labels: ai, development, large-language-models, llm, tutorial

Ollama.ai

Ollama.ai is an excellent tool that helps you run Large Language Models (LLMs) like Llama2 locally on your computer.

In this article, I test whether Ollama can work with my consumer-grade GPU, an MSI GTX 1660 Super.

A list of things to remember when deploying a Large Language Model (LLM) on Production

Labels: ai, development, large-language-models, llm

LLAMA

A small checklist intended to make your LLM deployment easier.

This checklist is intended to help you get started with deploying your own model or an open-source one like Llama2.

Free Kubernetes cluster on Oracle Cloud using k3s

Labels: cloud, k3s, kubernetes, oci, tutorial

K3S

Oracle Cloud has a very generous forever-free tier. I have been using k3s on my Raspberry Pi 4 machines for some time on my home network, and it is working amazingly well.

I was very keen to deploy k3s for free on a cloud provider as a backup to my local clusters and finally managed to do that recently.