👋
In my last post I mentioned I was experimenting with running AI models locally, and I now have a working setup!
I'm still using Ollama to run the actual models (the tool is pretty nice to use). But now I'm using their official Docker image instead of building my own.
docker pull ollama/ollama
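If you just want to kick the tires before wiring it into a stack, the image can also be run on its own. A quick sketch; the volume path and port simply mirror what I use in the full stack below:

# Run Ollama standalone, persisting models to the host and exposing the API on 11434
docker run -d --name ollama \
  -v /data/apps/ai/ollama/:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama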
But the real game-changer was deploying an instance of open-webui. This is an incredible open-source project that integrates with Ollama and wraps a familiar and friendly UI around it.
Demo from the open-webui GitHub repository
So how did I run this?
Since I use Portainer to manage all the docker containers I run on my home server, I added a simple stack to run both together:
version: "2"
services:
ui:
image: ghcr.io/open-webui/open-webui:main
restart: unless-stopped
depends_on:
- ollama
volumes:
- /data/apps/ai/ui/:/app/backend/data
environment:
- OLLAMA_BASE_URL=http://ollama:11434
ports:
- 11300:8080
labels:
- traefik.enable=true
- traefik.http.services.ollama-ui.loadbalancer.server.port=8080
- traefik.docker.network=traefik_public
- traefik.http.routers.ollama-ui-web.rule=Host(`ai.domain.tld`)
- traefik.http.routers.ollama-ui-web.entrypoints=web
- traefik.http.routers.ollama-ui-web.middlewares=ollama-ui-redirect-web-secure
- traefik.http.routers.ollama-ui.tls=true
- traefik.http.routers.ollama-ui.rule=Host(`ai.domain.tld`)
- traefik.http.routers.ollama-ui.entrypoints=websecure
- traefik.http.middlewares.ollama-ui-redirect-web-secure.redirectscheme.scheme=https
networks:
- default
- traefik
ollama:
image: ollama/ollama
restart: unless-stopped
volumes:
- /data/apps/ai/ollama/:/root/.ollama
ports:
- 11434:11434
environment:
- OLLAMA_HOST:"0.0.0.0:11434"
networks:
traefik:
external:
name: "traefik_public"
(I use Traefik as a reverse proxy for my containers, so the labels and network configuration are related to that.)
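If you're not using Portainer, the same stack can be brought up directly with Docker Compose instead. A minimal sketch, assuming the file above is saved as docker-compose.yml in the current directory:

# Start both containers in the background, then follow their logs
docker compose up -d
docker compose logs -f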
From there I can connect to the open-webui container and create an admin account. It can access the backing ollama to actually answer prompts 🙌
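A quick way to confirm the backend is reachable is to hit Ollama's HTTP API directly. A sketch, assuming the port mapping from the stack above, so Ollama is exposed on the host at 11434:

# List the models Ollama currently has downloaded
curl http://localhost:11434/api/tags

# Send a one-off test prompt (assumes a model such as llama3.1:8b has already been pulled)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'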
Like I said before, I'm not too worried about performance; I don't have the hardware for that. But it actually performs a lot better than I expected.
I'm running this on a Minisforum NAB7 with an Intel Core i7-12700H and 32GB of RAM. Since it doesn't have a dedicated GPU, everything is running on CPU only.
But still, once the models are loaded into memory, the responses are pretty quick. It probably isn't enough for multiple users, and it certainly isn't enough for the larger models, but it's definitely usable.
For reference, the models I've been using are llama3.1:8b and gemma2:9b.
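The models themselves aren't bundled with the image, so they have to be pulled first, either from the open-webui settings or by exec-ing into the ollama container. A sketch, assuming the container ends up named ollama (it may differ depending on how Portainer names the stack, so check docker ps):

# Download the models into the volume mounted at /root/.ollama
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama pull gemma2:9b

# Optional: run a prompt with --verbose to see load time and tokens/sec
docker exec -it ollama ollama run llama3.1:8b --verbose "What is a reverse proxy?"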
It's a cool little project for experimenting with AI and LLMs. I'm not quite sure where I'm going to take it from here; maybe I'll share that in a future post!
See you tomorrow-ish 👋