# Investigation Report: kiwix-serve Docker & zim-llm Setup

**Date:** 2026-05-14
**Investigator:** AI Assistant

---

## Part 1: zim-llm Setup (COMPLETE)

### Repository
- **GitHub:** https://github.com/rouralberto/zim-llm
- **README:** https://raw.githubusercontent.com/rouralberto/zim-llm/master/README.md

### What is zim-llm?
zim-llm processes ZIM files (compressed archives of Wikipedia and other offline content) into a vector database for Retrieval-Augmented Generation (RAG) with Large Language Models. The result is an offline knowledge base that can be queried without network access.

### Exact Setup Commands

#### 1. Clone the Repository (as regular user)
```bash
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
```

#### 2. Install Dependencies (as regular user)
```bash
# Run the setup script
./setup.sh

# Or manually install with pip
pip install -r requirements.txt
```

#### 3. Download ZIM Files (as regular user)
```bash
# Create library directory
mkdir -p zim_library

# Download ZIM files from Kiwix Library
# Options:
# - https://library.kiwix.org/
# - https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/

# Example: Download a ZIM file using wget
# (file names carry a release date; check the listing for the current name)
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim

# Or copy from downloads (the shell is already inside zim_library here)
cp ~/Downloads/*.zim .
```

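A failed download can leave an HTML error page or an empty file with a `.zim` extension. The sketch below is a quick sanity check, assuming only that every ZIM file begins with the format's magic number (72173914, stored little-endian). The helper names `looks_like_zim` and `check_library` are illustrative and not part of zim-llm, and a truncated download will still pass because only the first four bytes are inspected.

```python
import struct
from pathlib import Path

# Magic number at the start of a ZIM file (0x044D495A, little-endian on disk).
ZIM_MAGIC = 72173914


def looks_like_zim(path):
    """Return True if the file starts with the ZIM magic number."""
    with open(path, "rb") as fh:
        header = fh.read(4)
    if len(header) < 4:
        return False
    (magic,) = struct.unpack("<I", header)  # little-endian unsigned 32-bit
    return magic == ZIM_MAGIC


def check_library(directory):
    """Map each *.zim file name in `directory` to True (plausible) or False."""
    return {p.name: looks_like_zim(p) for p in sorted(Path(directory).glob("*.zim"))}


if __name__ == "__main__":
    for name, ok in check_library("zim_library").items():
        print(f"{name}: {'ok' if ok else 'NOT a ZIM file'}")
```

Running it from the zim-llm directory prints one line per file in `zim_library`.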
#### 4. Configure (as regular user)
Create a `config.json` file:
```json
{
  "zim_library_path": "./zim_library",
  "embedding_model": "all-MiniLM-L6-v2",
  "vector_db_type": "chroma",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "persist_directory": "./vector_db",
  "collection_name": "zim_articles",
  "llm_provider": "docker_model_runner",
  "llm_model": "ai/smollm3:Q4_K_M",
  "max_articles_per_zim": null
}
```

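Because a long build will only fail late if the configuration is wrong, it can help to check `config.json` first. The following is a minimal sketch: the key names come from the example above, but treating all of them as required, and the specific rules (overlap smaller than chunk size, database type `chroma` or `faiss`), are assumptions rather than zim-llm's own validation logic.

```python
import json

# Keys taken from the example config.json; requiring all of them is an assumption.
EXPECTED_KEYS = {
    "zim_library_path", "embedding_model", "vector_db_type", "chunk_size",
    "chunk_overlap", "persist_directory", "collection_name", "llm_provider",
    "llm_model", "max_articles_per_zim",
}
KNOWN_DB_TYPES = {"chroma", "faiss"}


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    missing = sorted(EXPECTED_KEYS - cfg.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if cfg.get("vector_db_type") not in KNOWN_DB_TYPES:
        problems.append("vector_db_type should be 'chroma' or 'faiss'")
    size, overlap = cfg.get("chunk_size"), cfg.get("chunk_overlap")
    if not isinstance(size, int) or size <= 0:
        problems.append("chunk_size should be a positive integer")
    elif not isinstance(overlap, int) or not 0 <= overlap < size:
        problems.append("chunk_overlap should satisfy 0 <= overlap < chunk_size")
    limit = cfg.get("max_articles_per_zim")
    if limit is not None and (not isinstance(limit, int) or limit <= 0):
        problems.append("max_articles_per_zim should be null or a positive integer")
    return problems


def load_config(path="config.json"):
    """Read the JSON configuration file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    for problem in validate_config(load_config()):
        print("config problem:", problem)
```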
#### 5. Build Vector Database (as regular user)
```bash
# Build from all ZIM files in library
python zim_rag.py build

# Or build from specific ZIM file
python zim_rag.py build --zim-file "wikipedia_en_medicine_maxi_2023-12.zim"

# Limit articles per ZIM file for faster processing
python zim_rag.py build --limit 1000

# Force rebuild
python zim_rag.py build --force
```

**Note:** The vector database only needs to be built once, but that first build can take a long time depending on the ZIM file size. Large ZIM files (2GB+) may take several hours.

#### 6. Setup Docker Model Runner (requires Docker)
```bash
# Docker Model Runner pulls models with the `docker model` subcommand,
# not with `docker pull`. Pull the model named in config.json:
docker model pull ai/smollm3:Q4_K_M

# Confirm the model is available locally
docker model list
```

#### 7. Run Queries (as regular user)
```bash
# Simple semantic search
python zim_rag.py query "What are treatments for PTSD?"

# Full RAG with LLM generation
python zim_rag.py rag-query "Explain the latest developments in military medicine"

# List all ZIM files in library
python zim_rag.py list-zim

# Get system information
python zim_rag.py info
```

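To make "semantic search" concrete, the sketch below ranks documents by cosine similarity between a query vector and stored document vectors, then keeps the top `k`. It is a toy illustration only: the three-dimensional vectors are made up, whereas a real run would use embeddings from the configured model, and zim-llm's actual retrieval goes through its vector database rather than this loop.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=5):
    """Return the titles of the k documents most similar to the query vector."""
    scored = [(cosine(query_vec, vec), title) for title, vec in docs.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical embeddings; real ones have hundreds of dimensions.
DOCS = {
    "PTSD treatment": [0.9, 0.1, 0.0],
    "Field surgery": [0.2, 0.9, 0.1],
    "Neural network": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], DOCS, k=2))
```

The `k` parameter plays the same role as the `--k` option of `zim_rag.py query`: it caps how many documents are returned.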
### Available Commands
```bash
# Build vector database
python zim_rag.py build [OPTIONS]
  --zim-file TEXT    Specific ZIM file to process
  --limit INTEGER    Limit number of articles per ZIM file
  --force            Force rebuild even if vector DB exists

# Query commands
python zim_rag.py query [OPTIONS] QUESTION
  --k INTEGER        Number of documents to retrieve [default: 5]

python zim_rag.py rag-query QUESTION

# Library management
python zim_rag.py list-zim
python zim_rag.py info

# Export articles
python zim_rag.py export [OPTIONS]
  --zim-file TEXT    Specific ZIM file to export
  --output TEXT      Output file [default: zim_articles.json]
  --limit INTEGER    Limit number of articles per ZIM file
```

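When these commands are driven from another script, building the argument list explicitly avoids shell-quoting mistakes in questions that contain spaces or punctuation. The sketch below uses only the flags listed above. The helper names `build_command` and `query_command` are hypothetical, and it assumes `zim_rag.py` is in the current directory.

```python
import shlex


def build_command(zim_file=None, limit=None, force=False, script="zim_rag.py"):
    """Assemble the argv list for `python zim_rag.py build`."""
    cmd = ["python", script, "build"]
    if zim_file is not None:
        cmd += ["--zim-file", zim_file]
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        cmd += ["--limit", str(limit)]
    if force:
        cmd.append("--force")
    return cmd


def query_command(question, k=5, script="zim_rag.py"):
    """Assemble the argv list for `python zim_rag.py query`."""
    return ["python", script, "query", "--k", str(k), question]


if __name__ == "__main__":
    # Print the commands instead of running them; pass the lists to
    # subprocess.run(cmd, check=True) to execute.
    print(shlex.join(build_command(limit=1000)))
    print(shlex.join(query_command("What is machine learning?", k=3)))
```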
### Configuration Options

**Embedding Models:**
- `all-MiniLM-L6-v2` (fast, good quality) - default
- `all-mpnet-base-v2` (higher quality, slower)
- `paraphrase-multilingual-MiniLM-L12-v2` (multilingual support)

**Vector Database Types:**
- `chroma` - ChromaDB (recommended, persistent, metadata-rich)
- `faiss` - FAISS (faster search, less metadata)

**LLM Configuration:**
- Uses Docker Model Runner with `ai/smollm3:Q4_K_M` model

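The `chunk_size` and `chunk_overlap` settings from `config.json` control how each article is cut into pieces before embedding. As a rough illustration, the sketch below uses a fixed character window: with a size of 1000 and an overlap of 200, each chunk starts 800 characters after the previous one. This is a simplification for intuition, not zim-llm's actual splitter, which may prefer to break on paragraph or sentence boundaries.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows where consecutive windows share an overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    article = "x" * 2500
    print([len(c) for c in chunk_text(article)])
```

A larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed and store.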
### System Requirements
- **RAM:** 4GB minimum, 8GB+ recommended
- **Storage:** 2-3x the size of your ZIM file for the vector database
- **GPU:** Optional, but recommended for faster embedding generation

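The storage guideline can be turned into a quick pre-flight check. The sketch below applies the 2-3x multiplier from the list above to a ZIM file's size and compares the upper bound against free disk space. The function names are illustrative, and the multiplier is the report's rule of thumb rather than a measured figure.

```python
import os
import shutil


def vector_db_estimate(zim_bytes, low=2.0, high=3.0):
    """Return the (low, high) estimated vector database size in bytes."""
    return int(zim_bytes * low), int(zim_bytes * high)


def has_room(zim_path, target_dir="."):
    """True if target_dir has free space for the upper-bound estimate."""
    _, worst_case = vector_db_estimate(os.path.getsize(zim_path))
    return shutil.disk_usage(target_dir).free >= worst_case


if __name__ == "__main__":
    gib = 1024 ** 3
    low, high = vector_db_estimate(10 * gib)
    print(f"10 GiB ZIM file: plan for {low // gib}-{high // gib} GiB of vector data")
```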
---

## Part 2: kiwix-serve Docker Deployment (COMPLETE)

### What is kiwix-serve?
kiwix-serve is an HTTP server that serves ZIM files (offline Wikipedia and other wiki content) to any web browser. It is part of the Kiwix project and ships in the kiwix-tools package.

### Repository
- **GitHub:** https://github.com/kiwix/kiwix-tools (kiwix-serve is built from this repository)
- **Kiwix Website:** https://kiwix.org
- **Kiwix Wiki:** https://wiki.kiwix.org

### Official Docker Image
**Note:** Kiwix publishes its container images on the GitHub Container Registry (for example `ghcr.io/kiwix/kiwix-serve`) rather than as a maintained `kiwix/kiwix-serve` image on Docker Hub. Check the kiwix-tools repository for the current image name and tags before relying on it. If a prebuilt image does not suit your environment, you have three options:

#### Option A: Build from Source (Recommended)
```bash
# 1. Clone the repository (as regular user)
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools

# 2. Build the binary (requires build dependencies)
# Install dependencies (requires sudo). Package names assume Debian/Ubuntu;
# the packaged libkiwix must be recent enough for the checked-out source.
sudo apt-get update
sudo apt-get install -y \
    g++ \
    meson \
    ninja-build \
    pkg-config \
    libkiwix-dev \
    libzim-dev \
    libmicrohttpd-dev \
    libdocopt-dev

# Build (as regular user); kiwix-tools builds with Meson, not CMake
meson setup build
ninja -C build
sudo ninja -C build install

# 3. Run kiwix-serve directly (as regular user)
kiwix-serve --port=8080 --library /path/to/library.xml
```

#### Option B: Use Community Docker Image (if available)
```bash
# Search for community images
docker search kiwix

# Example (if a community image exists; "someuser" is a placeholder):
docker pull someuser/kiwix-serve
```

#### Option C: Run kiwix-serve in Docker using libkiwix
```dockerfile
# Create a Dockerfile
FROM ubuntu:22.04

# Install build tools and dependencies (package names assume Ubuntu;
# the packaged libkiwix must be recent enough for the checked-out source)
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    ca-certificates git g++ meson ninja-build pkg-config \
    libkiwix-dev libzim-dev libmicrohttpd-dev libdocopt-dev \
    && rm -rf /var/lib/apt/lists/*

# Clone and build
RUN git clone https://github.com/kiwix/kiwix-tools.git /tmp/kiwix-tools \
    && cd /tmp/kiwix-tools \
    && meson setup build \
    && ninja -C build \
    && ninja -C build install

EXPOSE 8080

CMD ["kiwix-serve", "--port=8080", "--library", "/data/library.xml"]
```

Build and run:
```bash
# Build the image (requires Docker)
docker build -t kiwix-serve .

# Run with ZIM files and library.xml (requires Docker).
# The mounted directory must contain library.xml alongside the ZIM files.
docker run -d \
    -p 8080:8080 \
    -v /path/to/zim/files:/data \
    --name kiwix-server \
    kiwix-serve
```

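A container started with `docker run -d` returns immediately, before the server inside it is necessarily ready. The sketch below polls a URL until it answers or a deadline passes. It uses only the standard library; the function name `wait_for_http` and the default URL `http://localhost:8080/` (matching the port mapping above) are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=30.0, interval=1.0):
    """Poll `url` until it returns a 2xx/3xx response; False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 400:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused, HTTP error status, or timeout: retry
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    ready = wait_for_http("http://localhost:8080/")
    print("server is up" if ready else "server did not respond in time")
```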
### Download ZIM Files (as regular user)
```bash
# Create directory for ZIM files
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim

# Download from official Kiwix library
# Main source: https://download.kiwix.org/zim/
# Wikipedia: https://download.kiwix.org/zim/wikipedia/
# Medicine: https://download.kiwix.org/zim/medical/

# Example downloads (file names carry a release date that changes over time;
# check the directory listing for the current names):
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
wget https://download.kiwix.org/zim/medical/wikimed_en_medicine_maxi_2023-12.zim

# Resume an interrupted download by re-running wget with -c
wget -c https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
```

### Create library.xml (as regular user)
Relative `path` values are resolved against the directory that contains `library.xml`, so with the library file in `~/kiwix` and the ZIM files in `~/kiwix/zim`, each path starts with `zim/`.
```bash
cat > ~/kiwix/library.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book path="zim/wikipedia_en_all_maxi_2024-01.zim"
        title="English Wikipedia"
        language="eng"
        creator="Wikipedia"
        publisher="Kiwix"
        date="2024-01"
        description="English Wikipedia"
        flavor="maxi"
        tags="wikipedia;english" />

  <book path="zim/wikimed_en_medicine_maxi_2023-12.zim"
        title="Medical Wikipedia"
        language="eng"
        creator="Wikipedia"
        publisher="Kiwix"
        date="2023-12"
        description="Medical content from Wikipedia"
        flavor="maxi"
        tags="wikipedia;medicine;health" />
</library>
EOF
```

**Alternative:** Generate library.xml automatically:
```bash
# Using kiwix-manage from kiwix-tools (if installed).
# It reads each file's own metadata and adds one <book> entry per ZIM file.
for zim in /path/to/zim/files/*.zim; do
    kiwix-manage library.xml add "$zim"
done
```

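If the Kiwix tooling is not installed, a file with the same shape as the hand-written example can be produced programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` and mirrors only the attributes shown in that example. It does not read metadata from the ZIM files themselves, so the official tooling remains the more reliable route; the function names and sample entries are illustrative.

```python
import xml.etree.ElementTree as ET


def build_library(books):
    """Build a <library> tree with one <book> element per dict in `books`."""
    root = ET.Element("library")
    for book in books:
        ET.SubElement(root, "book", {key: str(value) for key, value in book.items()})
    return ET.ElementTree(root)


def write_library(books, path="library.xml"):
    """Write the library to `path` with an XML declaration and indentation."""
    tree = build_library(books)
    ET.indent(tree)  # available in Python 3.9 and later
    tree.write(path, encoding="UTF-8", xml_declaration=True)


# Sample entries copied from the hand-written example (abbreviated).
BOOKS = [
    {"path": "zim/wikipedia_en_all_maxi_2024-01.zim", "title": "English Wikipedia",
     "language": "eng", "date": "2024-01", "tags": "wikipedia;english"},
    {"path": "zim/wikimed_en_medicine_maxi_2023-12.zim", "title": "Medical Wikipedia",
     "language": "eng", "date": "2023-12", "tags": "wikipedia;medicine;health"},
]

if __name__ == "__main__":
    write_library(BOOKS)
```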
### Configuration Options for kiwix-serve
```bash
kiwix-serve [OPTIONS] ZIM_FILE...
kiwix-serve --library [OPTIONS] LIBRARY_FILE

Options (run `kiwix-serve --help` for the authoritative list in your version):
  --library               Treat the argument as an XML library file
  --port=PORT             Port to listen on (default: 80)
  --address=ADDRESS       IP address to bind to (default: all interfaces)
  --daemon                Run as daemon (background process)
  --threads=NUM           Number of threads to use (default: 4)
  --urlRootLocation=PATH  Root URL path
  --nosearchbar           Hide the built-in search bar
  --verbose               Print debug output
  --help                  Show help message
```

### Example: Full kiwix-serve Setup
```bash
# 1. Install build dependencies (requires sudo; package names assume Debian/Ubuntu)
sudo apt-get update
sudo apt-get install -y \
    g++ \
    meson \
    ninja-build \
    pkg-config \
    libkiwix-dev \
    libmicrohttpd-dev \
    libzim-dev \
    libdocopt-dev

# 2. Clone and build (as regular user)
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools
meson setup build
ninja -C build
sudo ninja -C build install

# 3. Download ZIM files (as regular user)
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim

# 4. Create library.xml (as regular user); paths are relative to library.xml
cat > ~/kiwix/library.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book path="zim/wikipedia_en_all_maxi_2024-01.zim" title="English Wikipedia" />
</library>
EOF

# 5. Run kiwix-serve (as regular user)
kiwix-serve --port=8080 --daemon --library ~/kiwix/library.xml

# 6. Access at http://localhost:8080
```

---

## Part 3: Docker Prerequisites

### Check if Docker is Installed
```bash
# Check Docker version
docker --version

# Check if Docker daemon is running
systemctl status docker

# Check if you can run Docker without sudo
docker run hello-world
```

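The same checks can be scripted so that a setup tool fails early with a clear message. The sketch below distinguishes "the `docker` binary is missing" from "the daemon is not reachable" by looking for the executable on `PATH` and then running `docker info`. The status strings and the function name are arbitrary choices for this example, and it assumes `docker info` exits with a non-zero status when it cannot reach the daemon.

```python
import shutil
import subprocess


def docker_status():
    """Return 'missing', 'daemon-unreachable', or 'ok'."""
    if shutil.which("docker") is None:
        return "missing"
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=20,
        )
    except (subprocess.TimeoutExpired, OSError):
        return "daemon-unreachable"
    return "ok" if result.returncode == 0 else "daemon-unreachable"


if __name__ == "__main__":
    print("docker:", docker_status())
```

A result of `daemon-unreachable` can also mean the current user lacks permission to talk to the Docker socket.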
### Install Docker (requires sudo)

**Ubuntu/Debian:**
```bash
# Update package index
sudo apt-get update

# Install prerequisites
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
# (on Debian, replace "ubuntu" with "debian" in both URLs below)
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
```

**Fedora/RHEL:**
```bash
# Add Docker repository (on RHEL, replace "fedora" with "rhel" in the URL)
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager \
    --add-repo https://download.docker.com/linux/fedora/docker-ce.repo

# Install Docker Engine
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
```

### Add User to Docker Group (requires sudo)
```bash
# Add current user to docker group
sudo usermod -aG docker $USER

# Verify group membership
groups $USER

# IMPORTANT: Logout and login again for changes to take effect
# Or run:
newgrp docker

# Verify Docker works without sudo
docker run hello-world
```

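Group changes only apply to new login sessions, which is a common source of confusion after running `usermod`. The sketch below checks whether the current process actually carries a given group, which is what determines access to the Docker socket. It relies on the Unix-only `grp` module; the function name `in_group` is illustrative.

```python
import grp
import os


def in_group(name):
    """True if the current process has the named group among its group IDs."""
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    if in_group("docker"):
        print("This session is in the docker group.")
    else:
        print("Not in the docker group yet; log out and back in after usermod.")
```

If `groups $USER` lists `docker` but this check reports otherwise, the session predates the change and needs a fresh login or `newgrp docker`.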
### Verify Docker Installation
```bash
# Check Docker version
docker --version
# The Compose plugin installed above is invoked as a docker subcommand
docker compose version

# Test Docker installation
docker run hello-world

# Check Docker service status
systemctl status docker

# List Docker images
docker images

# List running containers
docker ps
```

---

## Part 4: Complete Workflow Example

### Scenario: Setting up zim-llm with kiwix-serve

#### Step 1: Install Docker (requires sudo)
```bash
# Follow the Docker installation steps in Part 3
# Add user to docker group and logout/login
```

#### Step 2: Install zim-llm (as regular user)
```bash
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
./setup.sh
```

#### Step 3: Download ZIM Files (as regular user)
```bash
mkdir -p zim_library
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
cd ..
```

#### Step 4: Build Vector Database (as regular user)
```bash
python zim_rag.py build --limit 1000  # Limit for faster testing
```

#### Step 5: Run Queries (as regular user)
```bash
python zim_rag.py query "What is machine learning?"
python zim_rag.py rag-query "Explain neural networks"
```

#### Step 6: (Optional) Run kiwix-serve for Web Interface (as regular user)
```bash
# If you built kiwix-serve from source.
# No library.xml was created in this workflow, so serve the ZIM file directly.
kiwix-serve --port=8080 zim_library/wikipedia_en_all_maxi_2024-01.zim

# Access at http://localhost:8080
```

---

## Links and Resources

### Official Documentation
- **Kiwix Website:** https://kiwix.org
- **Kiwix Wiki:** https://wiki.kiwix.org
- **Kiwix GitHub:** https://github.com/kiwix
- **ZIM File Downloads:** https://download.kiwix.org/zim/
- **Kiwix Library Browser:** https://library.kiwix.org

### Repositories
- **zim-llm:** https://github.com/rouralberto/zim-llm
- **kiwix-tools (includes kiwix-serve):** https://github.com/kiwix/kiwix-tools
- **libkiwix:** https://github.com/kiwix/libkiwix

### Related Tools
- **Docker Model Runner:** https://docs.docker.com/ai/model-runner/
- **ChromaDB:** https://www.trychroma.com
- **FAISS:** https://github.com/facebookresearch/faiss

---

## Summary of Privilege Requirements

| Task | Privilege Level | Command Example |
|------|----------------|-----------------|
| Install Docker | sudo | `sudo apt-get install docker-ce` |
| Add user to docker group | sudo | `sudo usermod -aG docker $USER` |
| Clone git repositories | User | `git clone https://github.com/...` |
| Install Python packages | User | `pip install -r requirements.txt` |
| Download ZIM files | User | `wget https://...` |
| Build vector database | User | `python zim_rag.py build` |
| Run queries | User | `python zim_rag.py query "..."` |
| Build kiwix-serve from source | sudo for deps, then user | `sudo apt-get install libkiwix-dev` then `meson setup build && ninja -C build` |
| Run kiwix-serve | User | `kiwix-serve --port=8080 --library library.xml` |
| Pull Docker images | User (if in docker group) | `docker pull image:tag` |

---

## Troubleshooting

### Common Issues

**1. "Permission denied" when running Docker commands**
- Solution: Add user to docker group and logout/login
- `sudo usermod -aG docker $USER`

**2. zim-llm build takes too long**
- Solution: Use `--limit` flag to process fewer articles
- `python zim_rag.py build --limit 100`

**3. Out of memory during build**
- Solution: Use smaller ZIM files or increase RAM
- Consider using FAISS instead of ChromaDB

**4. kiwix-serve won't compile**
- Solution: Ensure all build tools and dependencies are installed
- `sudo apt-get install g++ meson ninja-build pkg-config libkiwix-dev libmicrohttpd-dev libzim-dev libdocopt-dev`

**5. Cannot find ZIM files**
- Solution: Check https://download.kiwix.org/zim/ for available files
- Use smaller files for testing first