Investigation Report: kiwix-serve Docker & zim-llm Setup
Date: 2026-05-14
Investigator: AI Assistant
Part 1: zim-llm Setup (COMPLETE)
Repository
- GitHub: https://github.com/rouralberto/zim-llm
- README: https://raw.githubusercontent.com/rouralberto/zim-llm/master/README.md
What is zim-llm?
zim-llm processes ZIM files (compressed archives of Wikipedia and other offline content) and builds a vector database from them for Retrieval-Augmented Generation (RAG) with Large Language Models. The result is an offline, queryable knowledge base.
Exact Setup Commands
1. Clone the Repository (as regular user)
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
2. Install Dependencies (as regular user)
# Run the setup script
./setup.sh
# Or manually install with pip
pip install -r requirements.txt
3. Download ZIM Files (as regular user)
# Create library directory
mkdir -p zim_library
# Download ZIM files from Kiwix Library
# Options:
# - https://library.kiwix.org/
# - https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/
# Example: download a ZIM file into the library directory using wget
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
cd ..
# Or copy files you have already downloaded (run from the zim-llm directory)
cp ~/Downloads/*.zim ./zim_library/
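Because full-size ZIM files are very large, it is worth ruling out a truncated or corrupted download before starting a long build. The sketch below streams a file through SHA-256 so the result can be compared with a known checksum. It assumes you obtain the expected digest yourself from wherever the file was published; the names `sha256_of` and `matches_checksum` are illustrative and not part of zim-llm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_bytes=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so multi-gigabyte ZIM files never sit fully in memory.
        for block in iter(lambda: handle.read(chunk_bytes), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A call such as `matches_checksum("zim_library/example.zim", expected)` returns False for a file that was cut off mid-download, where `example.zim` and `expected` are placeholders.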
4. Configure (as regular user)
Create a config.json file:
{
"zim_library_path": "./zim_library",
"embedding_model": "all-MiniLM-L6-v2",
"vector_db_type": "chroma",
"chunk_size": 1000,
"chunk_overlap": 200,
"persist_directory": "./vector_db",
"collection_name": "zim_articles",
"llm_provider": "docker_model_runner",
"llm_model": "ai/smollm3:Q4_K_M",
"max_articles_per_zim": null
}
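A malformed config.json is easier to catch before a multi-hour build than during it. The following is a minimal sketch of a pre-flight check, using only the keys shown in the example above. The expected types and the rule that chunk_overlap must be smaller than chunk_size are assumptions made for illustration; zim-llm itself may validate differently or not at all.

```python
import json
from pathlib import Path

# Keys taken from the config.json example; the expected types are assumptions.
EXPECTED_TYPES = {
    "zim_library_path": str,
    "embedding_model": str,
    "vector_db_type": str,
    "chunk_size": int,
    "chunk_overlap": int,
    "persist_directory": str,
    "collection_name": str,
    "llm_provider": str,
    "llm_model": str,
}


def check_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected) or isinstance(config[key], bool):
            problems.append(f"{key} should be {expected.__name__}")
    limit = config.get("max_articles_per_zim")
    if limit is not None and (not isinstance(limit, int) or isinstance(limit, bool) or limit <= 0):
        problems.append("max_articles_per_zim should be null or a positive integer")
    if config.get("vector_db_type") not in (None, "chroma", "faiss"):
        problems.append("vector_db_type should be 'chroma' or 'faiss'")
    size, overlap = config.get("chunk_size"), config.get("chunk_overlap")
    if isinstance(size, int) and isinstance(overlap, int) and overlap >= size:
        problems.append("chunk_overlap should be smaller than chunk_size")
    return problems


def load_config(path="config.json"):
    """Read and parse the JSON config file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Running `check_config(load_config())` from the zim-llm directory prints nothing alarming when it returns an empty list.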
5. Build Vector Database (as regular user)
# Build from all ZIM files in library
python zim_rag.py build
# Or build from specific ZIM file
python zim_rag.py build --zim-file "wikipedia_en_medicine_maxi_2023-12.zim"
# Limit articles per ZIM file for faster processing
python zim_rag.py build --limit 1000
# Force rebuild
python zim_rag.py build --force
Note: The build only needs to run once per library, but it can take a very long time depending on ZIM file size. Large ZIM files (2 GB+) may take several hours.
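The chunk_size and chunk_overlap values in config.json control how article text is split before embedding, which directly affects build time and database size. As a rough illustration only, the sketch below uses a plain character-based sliding window. zim-llm's actual splitter is not shown in this report and may work differently, for example by splitting on sentence or paragraph boundaries.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

With the default values, a 2,500-character article yields three chunks starting at offsets 0, 800, and 1,600, so lowering chunk_size or raising chunk_overlap increases the number of embeddings to compute.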
6. Set Up Docker Model Runner (requires Docker)
# Docker Model Runner is a Docker feature driven by the "docker model" CLI;
# models are pulled with "docker model pull", not "docker pull"
# Pull the model named in config.json
docker model pull ai/smollm3:Q4_K_M
7. Run Queries (as regular user)
# Simple semantic search
python zim_rag.py query "What are treatments for PTSD?"
# Full RAG with LLM generation
python zim_rag.py rag-query "Explain the latest developments in military medicine"
# List all ZIM files in library
python zim_rag.py list-zim
# Get system information
python zim_rag.py info
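When running several queries from a script, building the argument list explicitly avoids shell quoting problems with questions that contain quotes or spaces. The sketch below only constructs and runs commands documented in this report (query with --k, and rag-query). The helper names, and the assumption that zim_rag.py sits in the current directory, are illustrative.

```python
import subprocess
import sys


def query_command(question, k=5, script="zim_rag.py"):
    """Build argv for a semantic search; options precede the question."""
    return [sys.executable, script, "query", "--k", str(k), question]


def rag_query_command(question, script="zim_rag.py"):
    """Build argv for a full RAG query."""
    return [sys.executable, script, "rag-query", question]


def run(argv):
    """Run a command without a shell and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run(query_command("What are treatments for PTSD?", k=3))` would return whatever the query command prints, and raises `subprocess.CalledProcessError` if the script exits with an error.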
Available Commands
# Build vector database
python zim_rag.py build [OPTIONS]
--zim-file TEXT Specific ZIM file to process
--limit INTEGER Limit number of articles per ZIM file
--force Force rebuild even if vector DB exists
# Query commands
python zim_rag.py query [OPTIONS] QUESTION
--k INTEGER Number of documents to retrieve [default: 5]
python zim_rag.py rag-query QUESTION
# Library management
python zim_rag.py list-zim
python zim_rag.py info
# Export articles
python zim_rag.py export [OPTIONS]
--zim-file TEXT Specific ZIM file to export
--output TEXT Output file [default: zim_articles.json]
--limit INTEGER Limit number of articles per ZIM file
Configuration Options
Embedding Models:
- all-MiniLM-L6-v2 (fast, good quality) - default
- all-mpnet-base-v2 (higher quality, slower)
- paraphrase-multilingual-MiniLM-L12-v2 (multilingual support)
Vector Database Types:
- chroma - ChromaDB (recommended, persistent, metadata-rich)
- faiss - FAISS (faster search, less metadata)
LLM Configuration:
- Uses Docker Model Runner with the ai/smollm3:Q4_K_M model
System Requirements
- RAM: 4GB minimum, 8GB+ recommended
- Storage: 2-3x the size of your ZIM file for the vector database
- GPU: Optional, but recommended for faster embedding generation
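The storage guideline above can be turned into a quick pre-flight check. The sketch below multiplies the total size of the ZIM files by the 2-3x factor quoted in this report and compares the upper bound with the free disk space. The factor is the report's rule of thumb rather than a measured value, and the function names are illustrative.

```python
import shutil
from pathlib import Path


def estimate_vector_db_bytes(zim_bytes, low=2.0, high=3.0):
    """Return the (low, high) estimated vector database size in bytes."""
    return (int(zim_bytes * low), int(zim_bytes * high))


def library_size_bytes(library_dir):
    """Sum the sizes of all .zim files directly inside library_dir."""
    return sum(p.stat().st_size for p in Path(library_dir).glob("*.zim"))


def has_room_for_build(library_dir, target_dir="."):
    """True if free space at target_dir covers the upper size estimate."""
    _, upper = estimate_vector_db_bytes(library_size_bytes(library_dir))
    return shutil.disk_usage(target_dir).free >= upper
```

For a 2 GiB ZIM file the estimate comes out at 4 to 6 GiB, so `has_room_for_build("zim_library")` is a cheap check to run before `python zim_rag.py build`.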
Part 2: kiwix-serve Docker Deployment (COMPLETE)
What is kiwix-serve?
kiwix-serve is an HTTP server for serving ZIM files (offline Wikipedia and other wiki content). It's part of the Kiwix project.
Repository
- GitHub: https://github.com/kiwix/kiwix-tools (kiwix-serve is built from this repository)
- Kiwix Website: https://kiwix.org
- Kiwix Wiki: https://wiki.kiwix.org
Official Docker Image
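Note: This investigation found no published kiwix/kiwix-serve image on Docker Hub. The Kiwix project does publish a container image on the GitHub Container Registry as ghcr.io/kiwix/kiwix-serve, so check there before building your own. If you still need a custom build, there are three options: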
Option A: Build from Source (Recommended)
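# 1. Install build dependencies (requires sudo)
# Note: the distribution's libkiwix-dev must be recent enough for the kiwix-tools version you check out
sudo apt-get update
sudo apt-get install -y \
git \
g++ \
meson \
ninja-build \
pkg-config \
libkiwix-dev \
libdocopt-dev \
libmicrohttpd-dev
# 2. Clone the repository (as regular user)
# kiwix-serve is part of kiwix-tools, which builds with Meson rather than CMake
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools
# 3. Build (as regular user), then install (requires sudo)
meson setup build
ninja -C build
sudo ninja -C build install
# 4. Run kiwix-serve directly (as regular user)
# --library tells kiwix-serve that the argument is a library XML file, not a ZIM file
kiwix-serve --port=8080 --library /path/to/library.xml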
Option B: Use Community Docker Image (if available)
# Search for community images
docker search kiwix
# Example (if a community image exists):
docker pull someuser/kiwix-serve
Option C: Run kiwix-serve in Docker using libkiwix
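# Dockerfile
FROM ubuntu:22.04
# Install build tools and dependencies
# Note: the packaged libkiwix-dev must match the kiwix-tools version being built
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
ca-certificates \
git \
g++ \
meson \
ninja-build \
pkg-config \
libkiwix-dev \
libdocopt-dev \
libmicrohttpd-dev \
&& rm -rf /var/lib/apt/lists/*
# Clone and build kiwix-tools, which provides kiwix-serve
RUN git clone https://github.com/kiwix/kiwix-tools.git /tmp/kiwix-tools \
&& cd /tmp/kiwix-tools \
&& meson setup build \
&& ninja -C build \
&& ninja -C build install
EXPOSE 8080
# --library is required when the argument is a library XML file
CMD ["kiwix-serve", "--port=8080", "--library", "/data/library.xml"]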
Build and run:
# Build the image (requires Docker)
docker build -t kiwix-serve .
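# Run with ZIM files and library.xml (requires Docker)
# The mounted host directory must contain library.xml, and the book paths in it
# must resolve inside the container (for example, relative to /data)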
docker run -d \
-p 8080:8080 \
-v /path/to/zim/files:/data \
--name kiwix-server \
kiwix-serve
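After `docker run -d` returns, the server inside the container may still be starting. As a hedged sketch, the helper below polls a URL until any HTTP response arrives or a timeout expires. The port 8080 comes from the example above, while the function name and the timeout values are illustrative choices.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=30.0, interval_s=1.0):
    """Poll url until it answers with any HTTP response or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            pass  # not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_http("http://localhost:8080/")` right after starting the container gives a simple yes/no answer on whether the web interface came up. If it returns False, `docker logs kiwix-server` is the next place to look.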
Download ZIM Files (as regular user)
# Create directory for ZIM files
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim
# Download from official Kiwix library
# Main source: https://download.kiwix.org/zim/
# Wikipedia: https://download.kiwix.org/zim/wikipedia/
# Medicine: https://download.kiwix.org/zim/medical/
# Example downloads:
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
wget https://download.kiwix.org/zim/medical/wikimed_en_medicine_maxi_2023-12.zim
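# Or use a download helper such as zimfetchdownloader, if you have one installed
# (this tool is not part of the standard Kiwix tooling; check its own documentation for options)
zimfetchdownloader --url "https://library.kiwix.org"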
Create library.xml (as regular user)
# Relative book paths are resolved against the directory containing library.xml,
# so with the library at ~/kiwix/library.xml the ZIM files under ~/kiwix/zim need a zim/ prefix
cat > ~/kiwix/library.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
<book path="zim/wikipedia_en_all_maxi_2024-01.zim"
title="English Wikipedia"
language="eng"
creator="Wikipedia"
publisher="Kiwix"
date="2024-01"
description="English Wikipedia"
flavor="maxi"
tags="wikipedia;english" />
<book path="zim/wikimed_en_medicine_maxi_2023-12.zim"
title="Medical Wikipedia"
language="eng"
creator="Wikipedia"
publisher="Kiwix"
date="2023-12"
description="Medical content from Wikipedia"
flavor="maxi"
tags="wikipedia;medicine;health" />
</library>
EOF
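A hand-written library.xml is easy to get subtly wrong, most often through a book path that does not point at an actual file. The sketch below parses the library and lists such paths. It assumes relative book paths are resolved against the directory that contains library.xml; the function name is illustrative and not part of any Kiwix tool.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_books(library_xml):
    """Return the book paths in library_xml that do not point at an existing file."""
    library_path = Path(library_xml).expanduser()
    base_dir = library_path.parent
    missing = []
    for book in ET.parse(library_path).getroot().iter("book"):
        raw = book.get("path")
        if not raw:
            missing.append("<book without path attribute>")
            continue
        candidate = Path(raw)
        if not candidate.is_absolute():
            # Assumption: relative paths are resolved against the library file's directory.
            candidate = base_dir / candidate
        if not candidate.is_file():
            missing.append(raw)
    return missing
```

An empty list from `missing_books("~/kiwix/library.xml")` means every listed ZIM file was found; anything else names the entries to fix before starting the server.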
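Alternative: Generate library.xml automatically (recommended, since each book's id and metadata are read from the ZIM file itself):
# Using kiwix-manage from kiwix-tools (if installed)
# Adds every ZIM file to the library, creating library.xml if it does not exist
for zim in /path/to/zim/files/*.zim; do
  kiwix-manage library.xml add "$zim"
done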
Configuration Options for kiwix-serve
kiwix-serve [OPTIONS] ZIM_FILE...
kiwix-serve --library [OPTIONS] LIBRARY_FILE
Options:
--library Treat the argument as a library XML file instead of a ZIM file
--port=PORT Port to listen on (default: 80)
--address=ADDRESS IP address to bind to (default: all addresses)
--daemon Run as daemon (background process)
--threads=NUM Number of threads to use (default: 4)
--urlRootLocation=PATH URL prefix under which content is served
--help Show help message
# Run kiwix-serve --help for the authoritative list on your installed version
Example: Full kiwix-serve Setup
# 1. Install build dependencies (requires sudo)
sudo apt-get update
sudo apt-get install -y \
git \
g++ \
meson \
ninja-build \
pkg-config \
libkiwix-dev \
libdocopt-dev \
libmicrohttpd-dev \
libzim-dev
# 2. Clone and build (as regular user); kiwix-serve lives in kiwix-tools and builds with Meson
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools
meson setup build
ninja -C build
sudo ninja -C build install
# 3. Download ZIM files (as regular user)
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
# 4. Create library.xml (as regular user)
# Book paths are relative to the directory containing library.xml
cat > ~/kiwix/library.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
<book path="zim/wikipedia_en_all_maxi_2024-01.zim" title="English Wikipedia" />
</library>
EOF
# 5. Run kiwix-serve (as regular user)
kiwix-serve --port=8080 --daemon --library ~/kiwix/library.xml
# 6. Access at http://localhost:8080
Part 3: Docker Prerequisites
Check if Docker is Installed
# Check Docker version
docker --version
# Check if Docker daemon is running
systemctl status docker
# Check if you can run Docker without sudo
docker run hello-world
Install Docker (requires sudo)
Ubuntu/Debian:
# Update package index
sudo apt-get update
# Install prerequisites
sudo apt-get install -y \
ca-certificates \
curl \
gnupg \
lsb-release
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add Docker repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
Fedora/RHEL:
# Add Docker repository
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager \
--add-repo https://download.docker.com/linux/fedora/docker-ce.repo
# Install Docker Engine
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
Add User to Docker Group (requires sudo)
# Add current user to docker group
sudo usermod -aG docker $USER
# Verify group membership
groups $USER
# IMPORTANT: Log out and log back in for the change to take effect
# Or start a new shell with the group already applied:
newgrp docker
# Verify Docker works without sudo
docker run hello-world
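The log-out requirement exists because group membership recorded in the system database is only picked up by new login sessions. The sketch below, which assumes a Unix-like system, separates the two questions: whether the user is listed in the group, and whether the current process already carries it. Both function names are illustrative.

```python
import grp
import os
import pwd


def user_in_group(username, group_name="docker"):
    """True if username belongs to group_name as a primary or supplementary group."""
    try:
        group = grp.getgrnam(group_name)
        user = pwd.getpwnam(username)
    except KeyError:
        return False  # the group or the user does not exist on this system
    return user.pw_gid == group.gr_gid or username in group.gr_mem


def current_process_has_group(group_name="docker"):
    """True if the running process already carries the group, as it would after a fresh login."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False
    return gid == os.getegid() or gid in os.getgroups()
```

If `user_in_group` is True but `current_process_has_group` is False, the usermod step worked and only the re-login (or `newgrp docker`) is still outstanding.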
Verify Docker Installation
# Check Docker version
docker --version
docker compose version
# Test Docker installation
docker run hello-world
# Check Docker service status
systemctl status docker
# List Docker images
docker images
# List running containers
docker ps
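The manual checks above can also be scripted, which is handy at the start of a setup script. This is a minimal sketch that reports whether the docker CLI is on the PATH and whether `docker info` succeeds; the function name and the 20-second timeout are illustrative choices.

```python
import shutil
import subprocess


def docker_status():
    """Report whether the docker CLI exists and whether the daemon answers."""
    status = {"cli_installed": shutil.which("docker") is not None, "daemon_reachable": False}
    if not status["cli_installed"]:
        return status
    try:
        # "docker info" exits non-zero when the daemon is down or access is denied.
        result = subprocess.run(["docker", "info"], capture_output=True, text=True, timeout=20)
        status["daemon_reachable"] = result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        pass
    return status
```

A result with cli_installed true and daemon_reachable false usually points at a stopped service or a missing docker group membership.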
Part 4: Complete Workflow Example
Scenario: Setting up zim-llm with kiwix-serve
Step 1: Install Docker (requires sudo)
# Follow the Docker installation steps in Part 3
# Add your user to the docker group, then log out and back in
Step 2: Install zim-llm (as regular user)
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
./setup.sh
Step 3: Download ZIM Files (as regular user)
mkdir -p zim_library
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
cd ..
Step 4: Build Vector Database (as regular user)
python zim_rag.py build --limit 1000 # Limit for faster testing
Step 5: Run Queries (as regular user)
python zim_rag.py query "What is machine learning?"
python zim_rag.py rag-query "Explain neural networks"
Step 6: (Optional) Run kiwix-serve for Web Interface (as regular user)
# If kiwix-serve is installed, it can serve the ZIM files directly without a library.xml
kiwix-serve --port=8080 zim_library/*.zim
# Access at http://localhost:8080
Links and Resources
Official Documentation
- Kiwix Website: https://kiwix.org
- Kiwix Wiki: https://wiki.kiwix.org
- Kiwix GitHub: https://github.com/kiwix
- ZIM File Downloads: https://download.kiwix.org/zim/
- Kiwix Library Browser: https://library.kiwix.org
Repositories
- zim-llm: https://github.com/rouralberto/zim-llm
- kiwix-serve (part of kiwix-tools): https://github.com/kiwix/kiwix-tools
- libkiwix: https://github.com/kiwix/libkiwix
Related Tools
- Docker Model Runner: https://docs.docker.com/ai/model-runner/
- ChromaDB: https://www.trychroma.com
- FAISS: https://github.com/facebookresearch/faiss
Summary of Privilege Requirements
| Task | Privilege Level | Command Example |
|---|---|---|
| Install Docker | sudo | sudo apt-get install docker-ce |
| Add user to docker group | sudo | sudo usermod -aG docker $USER |
| Clone git repositories | User | git clone https://github.com/... |
| Install Python packages | User | pip install -r requirements.txt |
| Download ZIM files | User | wget https://... |
| Build vector database | User | python zim_rag.py build |
| Run queries | User | python zim_rag.py query "..." |
| Build kiwix-serve from source | sudo for deps and install, user for build | sudo apt-get install libkiwix-dev, then meson setup build && ninja -C build |
| Run kiwix-serve | User | kiwix-serve --port=8080 --library library.xml |
| Pull Docker images | User (if in docker group) | docker pull image:tag |
Troubleshooting
Common Issues
1. "Permission denied" when running Docker commands
- Solution: Add your user to the docker group, then log out and back in
sudo usermod -aG docker $USER
2. zim-llm build takes too long
- Solution: Use the --limit flag to process fewer articles
python zim_rag.py build --limit 100
3. Out of memory during build
- Solution: Use smaller ZIM files or increase RAM
- Consider using FAISS instead of ChromaDB
4. kiwix-serve won't compile
- Solution: Ensure all dependencies are installed
sudo apt-get install libkiwix-dev libdocopt-dev libmicrohttpd-dev libzim-dev meson ninja-build pkg-config g++
5. Cannot find ZIM files
- Solution: Check https://download.kiwix.org/zim/ for available files
- Use smaller files for testing first