Self-Hosting
Host your Quack suite on your own infrastructure
Backend API
Prerequisites
Whichever installation method you choose, you'll need at least the following installed:
- Docker (and Docker Compose, which ships with recent Docker releases; install it separately if you're on an older version)
- An NVIDIA GPU and the NVIDIA Container Toolkit

We recommend at least 5 GB of VRAM on your GPU for a good performance/latency balance. Please note that by default, this will run your LLM locally (available offline), but if you don't have a GPU, you can use online LLM providers (currently supported: Groq, OpenAI).
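Before going further, you can verify that Docker can actually reach your GPU through the NVIDIA Container Toolkit. This is a standard sanity check, not a Quack-specific step (the CUDA image tag is only an example; any recent tag works):

```shell
# Should print your GPU details via nvidia-smi from inside a container
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```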
60 seconds setup ⏱️
This method is the easiest way to try the suite out locally. Follow the instructions over here.
Production setup
This method is better suited to an isolated/remote environment where you don't necessarily want Git to interfere. With the default configuration, you'll need to own a domain name (e.g. "mydomain.com") and point a DNS record for your API subdomain at your server, so that the HTTPS certificate can be issued.
Create a `.env` file
Save this example file and name it `.env`.
Edit your environment variables
Edit your `.env` and replace the following values:
- `SUPERADMIN_GH_PAT`: your GitHub Personal Access Token, used to authenticate you as the admin. Head over to your Developer settings on GitHub, click "Generate new token", pick a name and an expiration, and confirm with "Generate token" (no extra permissions needed, i.e. read-only).
- Pick secure passwords for `POSTGRES_PASSWORD`, `SUPERADMIN_PWD` and `GF_ADMIN_PWD`.
- `ACME_EMAIL`: the email linked to your certificate for HTTPS.
- `POSTGRES_HOST` & `POSTGRES_PORT`: the host and port of your remote PostgreSQL database service.
- `BACKEND_HOST`: the subdomain where your users will access your API (e.g. "api.mydomain.com").
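For reference, a filled-in `.env` could look like the sketch below. The variable names come from this guide, but every value is a placeholder and the linked example file remains the source of truth. Secure passwords can be generated with e.g. `openssl rand -hex 32`:

```shell
# .env — placeholder values, replace with your own
SUPERADMIN_GH_PAT=ghp_xxxxxxxxxxxxxxxxxxxx
SUPERADMIN_PWD=change-me
POSTGRES_PASSWORD=change-me
GF_ADMIN_PWD=change-me
ACME_EMAIL=admin@mydomain.com
POSTGRES_HOST=db.mydomain.com
POSTGRES_PORT=5432
BACKEND_HOST=api.mydomain.com
```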
If you want to edit other aspects, check the environment variable descriptions.
Define a Docker orchestration
Save this docker compose configuration and name it `docker-compose.yml`.
You can comment out the `deploy` section of the ollama service (see the excerpt below) if you wish to run the LLM on your CPU instead.
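For context, the relevant part of the ollama service typically looks like this. It's an illustrative excerpt, not the full file — the linked `docker-compose.yml` is authoritative — but the `deploy` block is what reserves the GPU:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama   # persist downloaded model weights
    # Comment out the whole `deploy` section below to run on CPU instead
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  ollama:
```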
Setting the certificate access permission
Before starting our Docker setup, we need to make sure your ACME certificate file has the permissions Traefik expects (it must be readable by its owner only):
```shell
touch acme.json
sudo chmod 600 acme.json
```
Run the service
You should now have two files (`.env`, `docker-compose.yml`) in your folder. Time to start the services:

```shell
docker compose up -d --wait ollama backend traefik
```
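You can then check that everything came up correctly with standard Docker Compose commands (nothing Quack-specific here):

```shell
docker compose ps                 # the three services should be running/healthy
docker compose logs -f backend    # follow the backend logs
```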
Using your API
Bravo, you now have a fully running service! Start your VSCode extension, open the command palette and look for "Quack Companion: Set API endpoint", then paste the URL of your API endpoint.
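To confirm the endpoint is reachable over HTTPS before wiring up the extension, a quick probe is enough (here `api.mydomain.com` stands in for whatever you set as `BACKEND_HOST`):

```shell
# -I fetches headers only: any HTTP response proves DNS, TLS and routing work
curl -I https://api.mydomain.com
```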
Additional options
There are additional options to customize your service; here are a few:
Database hosting
Instead of hosting your PostgreSQL database yourself, you can use a hosting service like Supabase. You only need to replace the values of `POSTGRES_HOST` & `POSTGRES_PORT` in your `.env` file.
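For instance, with a managed instance the relevant `.env` lines might become the following (the host is a placeholder; use the connection details your provider gives you):

```shell
# .env — point the backend at a managed PostgreSQL instance
POSTGRES_HOST=db.your-project-ref.supabase.co
POSTGRES_PORT=5432
```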
LLM selection
We use Ollama to serve LLMs. Edit `OLLAMA_MODEL` to use other models from the hub.
For a good performance/latency balance, we recommend one of the following models: `dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M`, `deepseek-coder:6.7b-instruct-q4_K_M`.
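Switching models is a one-line change in your `.env`, followed by recreating the affected services (which service consumes `OLLAMA_MODEL` depends on the compose file, so restarting both is the safe option):

```shell
# .env
OLLAMA_MODEL=deepseek-coder:6.7b-instruct-q4_K_M
```

Then apply it with `docker compose up -d --wait ollama backend`.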
Please don't pick hardware or models that are oversized for your needs: you'll preserve both your hardware's life expectancy and your energy bill 💚
Application performance monitoring
When you start running your service under a high workload, you might want to monitor its performance. Your Docker Compose file includes additional services, using Prometheus & Grafana, that give you a proper APM dashboard.
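Assuming the compose file names these services `prometheus` and `grafana` (check yours, as the names may differ), you can bring them up alongside the rest:

```shell
docker compose up -d --wait prometheus grafana
```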