Investigation Report: kiwix-serve Docker & zim-llm Setup

Date: 2026-05-14
Investigator: AI Assistant


Part 1: zim-llm Setup (COMPLETE)

Repository

https://github.com/rouralberto/zim-llm

What is zim-llm?

zim-llm processes ZIM files (compressed archives of Wikipedia and other offline content) into a vector database for Retrieval-Augmented Generation (RAG) with Large Language Models. The result is an offline, queryable knowledge base.

Exact Setup Commands

1. Clone the Repository (as regular user)

git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm

2. Install Dependencies (as regular user)

# Run the setup script
./setup.sh

# Or manually install with pip
pip install -r requirements.txt

3. Download ZIM Files (as regular user)

# Create library directory
mkdir -p zim_library

# Download ZIM files from Kiwix Library
# Options:
# - https://library.kiwix.org/
# - https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/

# Example: Download a ZIM file using wget
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim

# Or copy from downloads (the current directory is already zim_library)
cp ~/Downloads/*.zim .

4. Configure (as regular user)

Create a config.json file:

{
  "zim_library_path": "./zim_library",
  "embedding_model": "all-MiniLM-L6-v2",
  "vector_db_type": "chroma",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "persist_directory": "./vector_db",
  "collection_name": "zim_articles",
  "llm_provider": "docker_model_runner",
  "llm_model": "ai/smollm3:Q4_K_M",
  "max_articles_per_zim": null
}
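Because the build step can run for hours, it may be worth sanity-checking config.json first. The following is a minimal sketch and not part of zim-llm: validate_config is a hypothetical helper, and the expected keys and allowed vector_db_type values are simply the ones shown in this report.

```python
import json

# Keys copied from the example config.json above (an assumption: zim-llm
# itself may accept additional keys or supply defaults for missing ones).
EXPECTED_KEYS = {
    "zim_library_path", "embedding_model", "vector_db_type",
    "chunk_size", "chunk_overlap", "persist_directory",
    "collection_name", "llm_provider", "llm_model", "max_articles_per_zim",
}
# Values listed under "Vector Database Types" in this report.
VECTOR_DB_TYPES = ("chroma", "faiss")


def validate_config(text):
    """Return a list of problems found in a config.json string (empty if none)."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(cfg, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    missing = sorted(EXPECTED_KEYS - cfg.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if "vector_db_type" in cfg and cfg["vector_db_type"] not in VECTOR_DB_TYPES:
        problems.append(f"unknown vector_db_type: {cfg['vector_db_type']!r}")
    size, overlap = cfg.get("chunk_size"), cfg.get("chunk_overlap")
    if isinstance(size, int) and isinstance(overlap, int) and overlap >= size:
        problems.append("chunk_overlap must be smaller than chunk_size")
    return problems
```

Calling validate_config(open("config.json").read()) before python zim_rag.py build would surface these problems up front.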

5. Build Vector Database (as regular user)

# Build from all ZIM files in library
python zim_rag.py build

# Or build from specific ZIM file
python zim_rag.py build --zim-file "wikipedia_en_medicine_maxi_2023-12.zim"

# Limit articles per ZIM file for faster processing
python zim_rag.py build --limit 1000

# Force rebuild
python zim_rag.py build --force

Note: The build only needs to run once per set of ZIM files, but it can take a very long time depending on their size. Large ZIM files (2 GB or more) may take several hours.
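Build time grows with the number of chunks that have to be embedded. As a rough illustration, the sketch below assumes chunk_size and chunk_overlap are measured in characters and applied as a simple sliding window. zim-llm's actual splitter may work differently, and split_into_chunks is a hypothetical name.

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into overlapping character windows.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

Under these assumptions, a 2,600-character article yields three chunks with the default settings, so a larger overlap increases the number of embeddings to compute.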

6. Set Up Docker Model Runner (requires Docker)

# Pull the model referenced in config.json with Docker Model Runner
docker model pull ai/smollm3:Q4_K_M

# Confirm the model is available locally
docker model list

7. Run Queries (as regular user)

# Simple semantic search
python zim_rag.py query "What are treatments for PTSD?"

# Full RAG with LLM generation
python zim_rag.py rag-query "Explain the latest developments in military medicine"

# List all ZIM files in library
python zim_rag.py list-zim

# Get system information
python zim_rag.py info

Available Commands

# Build vector database
python zim_rag.py build [OPTIONS]
  --zim-file TEXT    Specific ZIM file to process
  --limit INTEGER    Limit number of articles per ZIM file
  --force            Force rebuild even if vector DB exists

# Query commands
python zim_rag.py query [OPTIONS] QUESTION
  --k INTEGER        Number of documents to retrieve [default: 5]

python zim_rag.py rag-query QUESTION

# Library management
python zim_rag.py list-zim
python zim_rag.py info

# Export articles
python zim_rag.py export [OPTIONS]
  --zim-file TEXT    Specific ZIM file to export
  --output TEXT      Output file [default: zim_articles.json]
  --limit INTEGER    Limit number of articles per ZIM file
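When these commands are driven from a script, building the argument list in Python avoids shell-quoting problems with questions that contain quotes. This is a sketch only: build_command and query_command are hypothetical helpers, and the flags are the ones listed above, assumed to match the installed version of zim_rag.py.

```python
import subprocess


def build_command(zim_file=None, limit=None, force=False):
    """Return the argv list for `python zim_rag.py build` with optional flags."""
    argv = ["python", "zim_rag.py", "build"]
    if zim_file is not None:
        argv += ["--zim-file", zim_file]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if force:
        argv.append("--force")
    return argv


def query_command(question, k=5):
    """Return the argv list for a semantic search retrieving k documents."""
    return ["python", "zim_rag.py", "query", "--k", str(k), question]


def run(argv):
    """Run a command from the zim-llm checkout and raise if it fails."""
    subprocess.run(argv, check=True)
```

For example, run(build_command(limit=1000)) corresponds to python zim_rag.py build --limit 1000.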

Configuration Options

Embedding Models:

  • all-MiniLM-L6-v2 (fast, good quality) - default
  • all-mpnet-base-v2 (higher quality, slower)
  • paraphrase-multilingual-MiniLM-L12-v2 (multilingual support)

Vector Database Types:

  • chroma - ChromaDB (recommended, persistent, metadata-rich)
  • faiss - FAISS (faster search, less metadata)

LLM Configuration:

  • Uses Docker Model Runner with ai/smollm3:Q4_K_M model

System Requirements

  • RAM: 4GB minimum, 8GB+ recommended
  • Storage: 2-3x the size of your ZIM file for the vector database
  • GPU: Optional, but recommended for faster embedding generation
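The storage guideline above can be turned into a quick pre-flight check. The sketch below is illustrative only: the function names are hypothetical, and the 2x to 3x multipliers are the rule of thumb from this report, not measured values.

```python
import shutil
from pathlib import Path


def total_zim_bytes(library_dir):
    """Sum the sizes of all .zim files directly inside library_dir."""
    return sum(p.stat().st_size for p in Path(library_dir).glob("*.zim"))


def vector_db_estimate(zim_bytes, low=2, high=3):
    """Return the (low, high) estimate for the vector database size in bytes."""
    return zim_bytes * low, zim_bytes * high


def has_room(zim_bytes, free_bytes, factor=3):
    """True if free_bytes covers the pessimistic (factor x) estimate."""
    return free_bytes >= zim_bytes * factor


def check_library(library_dir="zim_library", target_dir="."):
    """Compare the estimate for library_dir with free space on target_dir."""
    zim_bytes = total_zim_bytes(library_dir)
    free_bytes = shutil.disk_usage(target_dir).free
    return has_room(zim_bytes, free_bytes)
```

Running check_library() from the zim-llm checkout before a build gives an early warning when the disk is likely too small.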

Part 2: kiwix-serve Docker Deployment (COMPLETE)

What is kiwix-serve?

kiwix-serve is an HTTP server for serving ZIM files (offline Wikipedia and other wiki content). It's part of the Kiwix project.

Repository

https://github.com/kiwix/kiwix-tools (kiwix-serve is distributed as part of kiwix-tools)

Official Docker Image

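Note: This investigation did not find a published kiwix/kiwix-serve image on Docker Hub. The Kiwix project publishes its container image on the GitHub Container Registry instead, as ghcr.io/kiwix/kiwix-serve, so check there first. If that image does not suit your setup, you have three options:

Option A: Build from Source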

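# 1. Clone the repository (as regular user)
# kiwix-serve is part of the kiwix-tools repository
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools

# 2. Install build dependencies (requires sudo)
# kiwix-tools builds with Meson and Ninja; the exact dependency list
# depends on the release, so check the repository README
sudo apt-get update
sudo apt-get install -y \
    g++ \
    meson \
    ninja-build \
    pkg-config \
    libkiwix-dev \
    libmicrohttpd-dev

# Build (as regular user), then install (requires sudo)
meson setup build
ninja -C build
sudo ninja -C build install

# 3. Run kiwix-serve directly (as regular user)
# --library marks the argument as an XML library rather than a ZIM file
kiwix-serve --port=8080 --library /path/to/library.xml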

Option B: Use Community Docker Image (if available)

# Search for community images
docker search kiwix

# Example (if a community image exists):
docker pull someuser/kiwix-serve

Option C: Run kiwix-serve in Your Own Docker Image

# Create a Dockerfile
FROM ubuntu:22.04

# Install kiwix-serve from the distribution's kiwix-tools package
# (assumes the package is available for this Ubuntu release)
RUN apt-get update && apt-get install -y --no-install-recommends \
    kiwix-tools \
    && rm -rf /var/lib/apt/lists/*

EXPOSE 8080

# --library marks the argument as an XML library rather than a ZIM file
CMD ["kiwix-serve", "--port=8080", "--library", "/data/library.xml"]

Build and run:

# Build the image (requires Docker)
docker build -t kiwix-serve .

# Run with ZIM files and library.xml (requires Docker)
docker run -d \
  -p 8080:8080 \
  -v /path/to/zim/files:/data \
  --name kiwix-server \
  kiwix-serve

Download ZIM Files (as regular user)

# Create directory for ZIM files
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim

# Download from official Kiwix library
# Main source: https://download.kiwix.org/zim/
# Wikipedia: https://download.kiwix.org/zim/wikipedia/
# Medicine: https://download.kiwix.org/zim/medical/

# Example downloads:
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
wget https://download.kiwix.org/zim/medical/wikimed_en_medicine_maxi_2023-12.zim

# Or use zimfetchdownloader (if installed)
zimfetchdownloader --url "https://library.kiwix.org"

Create library.xml (as regular user)

# Place library.xml next to the ZIM files so the relative paths below resolve
cat > ~/kiwix/zim/library.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book path="wikipedia_en_all_maxi_2024-01.zim" 
        title="English Wikipedia" 
        language="eng" 
        creator="Wikipedia" 
        publisher="Kiwix" 
        date="2024-01" 
        description="English Wikipedia" 
        flavor="maxi" 
        tags="wikipedia;english" />
  
  <book path="wikimed_en_medicine_maxi_2023-12.zim" 
        title="Medical Wikipedia" 
        language="eng" 
        creator="Wikipedia" 
        publisher="Kiwix" 
        date="2023-12" 
        description="Medical content from Wikipedia" 
        flavor="maxi" 
        tags="wikipedia;medicine;health" />
</library>
EOF
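With more than a couple of ZIM files, generating the file from a list is less error-prone than editing XML by hand. The sketch below mirrors the hand-written layout above. build_library_xml is a hypothetical helper, and a library produced by Kiwix's own tooling may carry additional attributes that this sketch does not add.

```python
import xml.etree.ElementTree as ET


def build_library_xml(books):
    """Render a list of attribute dicts as a <library> document.

    Each dict becomes one <book> element; attribute values are escaped by
    ElementTree, so titles containing & or quotes are safe.
    """
    root = ET.Element("library")
    for book in books:
        ET.SubElement(root, "book", {key: str(value) for key, value in book.items()})
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_library_xml([
        {"path": "wikipedia_en_all_maxi_2024-01.zim",
         "title": "English Wikipedia", "language": "eng"},
    ]))
```

The returned string can be written to library.xml in the same directory as the ZIM files.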

Alternative: Generate library.xml automatically:

# Using kiwix-manage from kiwix-tools (if installed)
# Each call adds one ZIM file to library.xml, creating the file if needed
for zim in /path/to/zim/files/*.zim; do
    kiwix-manage library.xml add "$zim"
done

Configuration Options for kiwix-serve

kiwix-serve [OPTIONS] ZIM_FILE...
kiwix-serve --library [OPTIONS] LIBRARY_FILE

Options:
  --port=PORT              Port to listen on (default: 80)
  --address=ADDRESS        IP address to bind to (default: all interfaces)
  --library                Treat the argument as an XML library file
  --daemon                 Run as daemon (background process)
  --threads=NUM            Number of threads to use (default: 4)
  --urlRootLocation=PATH   URL prefix under which content is served
  --nosearchbar            Hide the search bar
  --blockexternal          Block external links
  --verbose                Print additional debugging output
  --help                   Show help message

Example: Full kiwix-serve Setup

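# 1. Install build dependencies (requires sudo)
# kiwix-tools builds with Meson and Ninja; the exact dependency list
# depends on the release, so check the repository README
sudo apt-get update
sudo apt-get install -y \
    g++ \
    meson \
    ninja-build \
    pkg-config \
    libkiwix-dev \
    libmicrohttpd-dev \
    libzim-dev

# 2. Clone and build (as regular user); kiwix-serve is part of kiwix-tools
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools
meson setup build
ninja -C build
sudo ninja -C build install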

# 3. Download ZIM files (as regular user)
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim

# 4. Create library.xml next to the ZIM file (as regular user)
# Relative book paths are resolved against the location of library.xml
cat > ~/kiwix/zim/library.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book path="wikipedia_en_all_maxi_2024-01.zim" title="English Wikipedia" />
</library>
EOF

# 5. Run kiwix-serve (as regular user)
kiwix-serve --port=8080 --daemon --library ~/kiwix/zim/library.xml

# 6. Access at http://localhost:8080
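Because --daemon returns immediately, a script that continues after step 5 may want to wait until the server actually answers. The sketch below polls the URL with the standard library. wait_for_http is a hypothetical helper, and the fetch parameter exists only so the probe can be replaced in tests.

```python
import time
import urllib.request


def _default_fetch(url):
    """Return the HTTP status for url; raises OSError if it cannot connect."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_for_http(url, timeout=30.0, interval=1.0, fetch=None):
    """Poll url until it answers with a 2xx/3xx status or timeout elapses.

    Returns True once the server responds, False if it never does.
    """
    if fetch is None:
        fetch = _default_fetch
    deadline = time.monotonic() + timeout
    while True:
        try:
            if 200 <= fetch(url) < 400:
                return True
        except OSError:
            # Connection refused, timeouts, and urllib's URLError/HTTPError
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    ready = wait_for_http("http://localhost:8080")
    print("kiwix-serve is up" if ready else "kiwix-serve did not respond")
```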

Part 3: Docker Prerequisites

Check if Docker is Installed

# Check Docker version
docker --version

# Check if Docker daemon is running
systemctl status docker

# Check if you can run Docker without sudo
docker run hello-world
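The same three checks can be combined into one scripted diagnosis. The sketch below is an assumption-laden illustration: docker_status and classify_docker are hypothetical helpers, and matching "permission denied" in the error output is a heuristic based on the message Docker typically prints when the user is not in the docker group.

```python
import shutil
import subprocess


def classify_docker(docker_path, returncode, stderr):
    """Map the result of `docker info` to a short status string."""
    if docker_path is None:
        return "not installed"
    if returncode == 0:
        return "ok"
    if "permission denied" in stderr.lower():
        return "permission denied"  # likely missing docker group membership
    return "daemon unreachable"


def docker_status():
    """Locate the docker CLI and ask the daemon for its info."""
    path = shutil.which("docker")
    if path is None:
        return classify_docker(None, 1, "")
    result = subprocess.run([path, "info"], capture_output=True, text=True)
    return classify_docker(path, result.returncode, result.stderr)


if __name__ == "__main__":
    print(docker_status())
```

A result of "permission denied" points to the docker group step later in this part, while "daemon unreachable" suggests checking systemctl status docker.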

Install Docker (requires sudo)

Ubuntu/Debian:

# Update package index
sudo apt-get update

# Install prerequisites
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker

Fedora/RHEL:

# Add Docker repository
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager \
    --add-repo https://download.docker.com/linux/fedora/docker-ce.repo

# Install Docker Engine
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker

Add User to Docker Group (requires sudo)

# Add current user to docker group
sudo usermod -aG docker $USER

# Verify group membership
groups $USER

# IMPORTANT: Logout and login again for changes to take effect
# Or run:
newgrp docker

# Verify Docker works without sudo
docker run hello-world

Verify Docker Installation

# Check Docker version
docker --version
docker compose version

# Test Docker installation
docker run hello-world

# Check Docker service status
systemctl status docker

# List Docker images
docker images

# List running containers
docker ps

Part 4: Complete Workflow Example

Scenario: Setting up zim-llm with kiwix-serve

Step 1: Install Docker (requires sudo)

# Follow the Docker installation steps in Part 3
# Add user to docker group and logout/login

Step 2: Install zim-llm (as regular user)

git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
./setup.sh

Step 3: Download ZIM Files (as regular user)

mkdir -p zim_library
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
cd ..

Step 4: Build Vector Database (as regular user)

python zim_rag.py build --limit 1000  # Limit for faster testing

Step 5: Run Queries (as regular user)

python zim_rag.py query "What is machine learning?"
python zim_rag.py rag-query "Explain neural networks"

Step 6: (Optional) Run kiwix-serve for Web Interface (as regular user)

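# If kiwix-serve is installed, serve the downloaded ZIM file directly
# (a single ZIM file does not need a library.xml)
kiwix-serve --port=8080 zim_library/wikipedia_en_all_maxi_2024-01.zim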

# Access at http://localhost:8080

Official Documentation

Repositories


Summary of Privilege Requirements

| Task | Privilege Level | Command Example |
|------|-----------------|-----------------|
| Install Docker | sudo | sudo apt-get install docker-ce |
| Add user to docker group | sudo | sudo usermod -aG docker $USER |
| Clone git repositories | User | git clone https://github.com/... |
| Install Python packages | User | pip install -r requirements.txt |
| Download ZIM files | User | wget https://... |
| Build vector database | User | python zim_rag.py build |
| Run queries | User | python zim_rag.py query "..." |
| Build kiwix-serve from source | sudo for dependencies, then user | sudo apt-get install libkiwix-dev, then build as regular user |
| Run kiwix-serve | User | kiwix-serve --port=8080 --library library.xml |
| Pull Docker images | User (if in docker group) | docker pull image:tag |

Troubleshooting

Common Issues

1. "Permission denied" when running Docker commands

  • Solution: Add user to docker group and logout/login
  • sudo usermod -aG docker $USER

2. zim-llm build takes too long

  • Solution: Use --limit flag to process fewer articles
  • python zim_rag.py build --limit 100

3. Out of memory during build

  • Solution: Use smaller ZIM files or increase RAM
  • Consider using FAISS instead of ChromaDB

4. kiwix-serve won't compile

  • Solution: Ensure all dependencies are installed
  • sudo apt-get install libkiwix-dev libmicrohttpd-dev libzim-dev meson ninja-build pkg-config g++

5. Cannot find ZIM files