# Investigation Report: kiwix-serve Docker & zim-llm Setup

**Date:** 2026-05-14

**Investigator:** AI Assistant

---

## Part 1: zim-llm Setup (COMPLETE)

### Repository

- **GitHub:** https://github.com/rouralberto/zim-llm
- **README:** https://raw.githubusercontent.com/rouralberto/zim-llm/master/README.md

### What is zim-llm?

A system that processes ZIM files (compressed archives of Wikipedia and other offline content) into a vector database for Retrieval-Augmented Generation (RAG) with Large Language Models. The result is an offline, queryable knowledge base.

### Exact Setup Commands

#### 1. Clone the Repository (as regular user)

```bash
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
```

#### 2. Install Dependencies (as regular user)

```bash
# Run the setup script
./setup.sh

# Or manually install with pip
pip install -r requirements.txt
```

#### 3. Download ZIM Files (as regular user)

```bash
# Create library directory
mkdir -p zim_library

# Download ZIM files from Kiwix Library
# Options:
# - https://library.kiwix.org/
# - https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/

# Example: Download a ZIM file using wget, then return to the repository root
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
cd ..

# Or copy from downloads
cp ~/Downloads/*.zim ./zim_library/
```

#### 4. Configure (as regular user)

Create a `config.json` file:

```json
{
  "zim_library_path": "./zim_library",
  "embedding_model": "all-MiniLM-L6-v2",
  "vector_db_type": "chroma",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "persist_directory": "./vector_db",
  "collection_name": "zim_articles",
  "llm_provider": "docker_model_runner",
  "llm_model": "ai/smollm3:Q4_K_M",
  "max_articles_per_zim": null
}
```

#### 5. Build Vector Database (as regular user)

```bash
# Build from all ZIM files in library
python zim_rag.py build

# Or build from specific ZIM file
python zim_rag.py build --zim-file "wikipedia_en_medicine_maxi_2023-12.zim"

# Limit articles per ZIM file for faster processing
python zim_rag.py build --limit 1000

# Force rebuild
python zim_rag.py build --force
```

**Note:** The build only needs to run once per library, and its duration depends on the ZIM file size. Large ZIM files (2GB+) may take several hours.

#### 6. Setup Docker Model Runner (requires Docker)

Docker Model Runner is a Docker feature, so models are pulled with the `docker model` subcommand rather than `docker pull`:

```bash
# Pull the model named in config.json (requires Docker with Model Runner enabled)
docker model pull ai/smollm3:Q4_K_M

# Confirm the model is available locally
docker model list
```

#### 7. Run Queries (as regular user)

```bash
# Simple semantic search
python zim_rag.py query "What are treatments for PTSD?"

# Full RAG with LLM generation
python zim_rag.py rag-query "Explain the latest developments in military medicine"

# List all ZIM files in library
python zim_rag.py list-zim

# Get system information
python zim_rag.py info
```

### Available Commands

```bash
# Build vector database
python zim_rag.py build [OPTIONS]
  --zim-file TEXT   Specific ZIM file to process
  --limit INTEGER   Limit number of articles per ZIM file
  --force           Force rebuild even if vector DB exists

# Query commands
python zim_rag.py query [OPTIONS] QUESTION
  --k INTEGER       Number of documents to retrieve [default: 5]
python zim_rag.py rag-query QUESTION

# Library management
python zim_rag.py list-zim
python zim_rag.py info

# Export articles
python zim_rag.py export [OPTIONS]
  --zim-file TEXT   Specific ZIM file to export
  --output TEXT     Output file [default: zim_articles.json]
  --limit INTEGER   Limit number of articles per ZIM file
```

### Configuration Options

**Embedding Models:**

- `all-MiniLM-L6-v2` (fast, good quality) - default
- `all-mpnet-base-v2` (higher quality, slower)
- `paraphrase-multilingual-MiniLM-L12-v2` (multilingual support)

**Vector Database
Types:**

- `chroma` - ChromaDB (recommended, persistent, metadata-rich)
- `faiss` - FAISS (faster search, less metadata)

**LLM Configuration:**

- Uses Docker Model Runner with `ai/smollm3:Q4_K_M` model

### System Requirements

- **RAM:** 4GB minimum, 8GB+ recommended
- **Storage:** 2-3x the size of your ZIM file for the vector database
- **GPU:** Optional, but recommended for faster embedding generation

---

## Part 2: kiwix-serve Docker Deployment (COMPLETE)

### What is kiwix-serve?

kiwix-serve is an HTTP server for ZIM files (offline Wikipedia and other wiki content). It is part of the Kiwix project and ships with the kiwix-tools command-line suite.

### Repository

- **GitHub:** https://github.com/kiwix/kiwix-tools (kiwix-serve is built from this repository)
- **Kiwix Website:** https://kiwix.org
- **Kiwix Wiki:** https://wiki.kiwix.org

### Official Docker Image

**Note:** The Kiwix project publishes its kiwix-serve image on the GitHub Container Registry as `ghcr.io/kiwix/kiwix-serve`, not on Docker Hub. You have three options:

#### Option A: Build from Source (Recommended)

```bash
# 1. Clone the repository (as regular user)
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools

# 2. Build the binaries (requires build dependencies)
# Install dependencies (requires sudo); the exact list depends on the
# kiwix-tools version, so check the repository README
sudo apt-get update
sudo apt-get install -y \
    g++ \
    meson \
    ninja-build \
    pkg-config \
    libkiwix-dev \
    libmicrohttpd-dev

# Build with Meson and Ninja (as regular user), then install (requires sudo)
# If Meson reports that libkiwix is too old, check out a matching release tag
meson setup build
ninja -C build
sudo ninja -C build install

# 3. Run kiwix-serve directly (as regular user)
kiwix-serve --port=8080 --library /path/to/library.xml
```

#### Option B: Use the Published Docker Image

```bash
# Pull the image published by the Kiwix project (requires Docker)
docker pull ghcr.io/kiwix/kiwix-serve

# Serve every ZIM file in a host directory on port 8080
docker run -d -p 8080:8080 -v /path/to/zim/files:/data ghcr.io/kiwix/kiwix-serve '*.zim'
```

#### Option C: Build Your Own Image

```dockerfile
# Create a Dockerfile
FROM ubuntu:22.04

# Install kiwix-serve from the distribution's kiwix-tools package
RUN apt-get update && apt-get install -y --no-install-recommends \
    kiwix-tools \
    && rm -rf /var/lib/apt/lists/*

EXPOSE 8080

# --library tells kiwix-serve that the argument is a library XML file
CMD ["kiwix-serve", "--port=8080", "--library", "/data/library.xml"]
```

Build and run:

```bash
# Build the image (requires Docker)
docker build -t kiwix-serve .

# Run with ZIM files and library.xml (requires Docker)
docker run -d \
    -p 8080:8080 \
    -v /path/to/zim/files:/data \
    --name kiwix-server \
    kiwix-serve
```

### Download ZIM Files (as regular user)

```bash
# Create directory for ZIM files
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim

# Download from official Kiwix library
# Main source: https://download.kiwix.org/zim/
# Wikipedia: https://download.kiwix.org/zim/wikipedia/
# Medicine: https://download.kiwix.org/zim/medical/
# File names include the dump date; check the directory listing for current names

# Example downloads:
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
wget https://download.kiwix.org/zim/medical/wikimed_en_medicine_maxi_2023-12.zim
```

### Create library.xml (as regular user)

```bash
cat > ~/kiwix/library.xml << 'EOF'
EOF
```

**Alternative:** Generate library.xml automatically:

```bash
# Using kiwix-manage from kiwix-tools (if installed)
for zim in /path/to/zim/files/*.zim; do
    kiwix-manage library.xml add "$zim"
done
```

### Configuration Options for kiwix-serve

```bash
kiwix-serve [OPTIONS] [ZIM_FILE | LIBRARY_FILE]
Options:
  --library               Treat the argument as a library XML file
  --port=PORT             Port to listen on (default: 80)
  --address=ADDRESS       IP address to bind to (default: all addresses)
  --daemon                Run as daemon (background process)
  --threads=NUM           Number of threads to use (default: 4)
  --urlRootLocation=PATH  URL prefix the server is published under
  --help                  Show help message
```

### Example: Full kiwix-serve Setup

```bash
# 1. Install build dependencies (requires sudo)
sudo apt-get update
sudo apt-get install -y \
    g++ \
    meson \
    ninja-build \
    pkg-config \
    libkiwix-dev \
    libmicrohttpd-dev \
    libzim-dev

# 2. Clone and build (as regular user; the install step requires sudo)
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools
meson setup build
ninja -C build
sudo ninja -C build install

# 3. Download ZIM files (as regular user)
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim

# 4. Create library.xml (as regular user)
cat > ~/kiwix/library.xml << 'EOF'
EOF

# 5. Run kiwix-serve (as regular user)
kiwix-serve --port=8080 --daemon --library ~/kiwix/library.xml

# 6. Access at http://localhost:8080
```

---

## Part 3: Docker Prerequisites

### Check if Docker is Installed

```bash
# Check Docker version
docker --version

# Check if Docker daemon is running
systemctl status docker

# Check if you can run Docker without sudo
docker run hello-world
```

### Install Docker (requires sudo)

**Ubuntu/Debian:**

```bash
# Update package index
sudo apt-get update

# Install prerequisites
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
# On Debian, replace "ubuntu" with "debian" in both download.docker.com URLs
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
```

**Fedora/RHEL:**

```bash
# Add Docker repository
# On RHEL, use https://download.docker.com/linux/rhel/docker-ce.repo instead
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager \
    --add-repo https://download.docker.com/linux/fedora/docker-ce.repo

# Install Docker Engine
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
```

### Add User to Docker Group (requires sudo)

```bash
# Add current user to docker group
sudo usermod -aG docker $USER

# Verify group membership
groups $USER

# IMPORTANT: Logout and login again for changes to take effect
# Or run: newgrp docker

# Verify Docker works without sudo
docker run hello-world
```

### Verify Docker Installation

```bash
# Check Docker version
docker --version
# Check the Compose plugin installed above (docker-compose-plugin)
docker compose version

# Test Docker installation
docker run hello-world

# Check Docker service status
systemctl status docker

# List Docker images
docker images

# List running containers
docker ps
```

---

## Part 4: Complete Workflow Example

### Scenario: Setting up zim-llm with kiwix-serve

#### Step 1: Install Docker (requires sudo)

```bash
# Follow the Docker installation steps in Part 3
# Add user to docker group and logout/login
```

#### Step 2: Install zim-llm (as regular user)

```bash
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
./setup.sh
```

#### Step 3: Download ZIM Files (as regular user)

```bash
mkdir -p zim_library
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
cd ..
```

#### Step 4: Build Vector Database (as regular user)

```bash
python zim_rag.py build --limit 1000  # Limit for faster testing
```

#### Step 5: Run Queries (as regular user)

```bash
python zim_rag.py query "What is machine learning?"
python zim_rag.py rag-query "Explain neural networks"
```

#### Step 6: (Optional) Run kiwix-serve for Web Interface (as regular user)

```bash
# If you built kiwix-serve from source; ZIM files can be served directly,
# so no library.xml is needed for this step
kiwix-serve --port=8080 zim_library/*.zim

# Access at http://localhost:8080
```

---

## Links and Resources

### Official Documentation

- **Kiwix Website:** https://kiwix.org
- **Kiwix Wiki:** https://wiki.kiwix.org
- **Kiwix GitHub:** https://github.com/kiwix
- **ZIM File Downloads:** https://download.kiwix.org/zim/
- **Kiwix Library Browser:** https://library.kiwix.org

### Repositories

- **zim-llm:** https://github.com/rouralberto/zim-llm
- **kiwix-serve (kiwix-tools):** https://github.com/kiwix/kiwix-tools
- **libkiwix:** https://github.com/kiwix/libkiwix

### Related Tools

- **Docker Model Runner:** https://docs.docker.com/ai/model-runner/
- **ChromaDB:** https://www.trychroma.com
- **FAISS:** https://github.com/facebookresearch/faiss

---

## Summary of Privilege Requirements

| Task | Privilege Level | Command Example |
|------|----------------|-----------------|
| Install Docker | sudo | `sudo apt-get install docker-ce` |
| Add user to docker group | sudo | `sudo usermod -aG docker $USER` |
| Clone git repositories | User | `git clone https://github.com/...` |
| Install Python packages | User | `pip install -r requirements.txt` |
| Download ZIM files | User | `wget https://...` |
| Build vector database | User | `python zim_rag.py build` |
| Run queries | User | `python zim_rag.py query "..."` |
| Build kiwix-serve from source | sudo for deps and install, user for the build | `sudo apt-get install libkiwix-dev` then `meson setup build && ninja -C build` |
| Run kiwix-serve | User | `kiwix-serve --port=8080 --library library.xml` |
| Pull Docker images | User (if in docker group) | `docker pull image:tag` |

---

## Troubleshooting

### Common Issues

**1. "Permission denied" when running Docker commands**

- Solution: Add user to docker group and logout/login
- `sudo usermod -aG docker $USER`

**2. zim-llm build takes too long**

- Solution: Use `--limit` flag to process fewer articles
- `python zim_rag.py build --limit 100`

**3. Out of memory during build**

- Solution: Use smaller ZIM files or increase RAM
- Consider using FAISS instead of ChromaDB

**4. kiwix-serve won't compile**

- Solution: Ensure all dependencies are installed; kiwix-tools builds with Meson and Ninja, not CMake
- `sudo apt-get install libkiwix-dev libmicrohttpd-dev libzim-dev meson ninja-build pkg-config g++`

**5. Cannot find ZIM files**

- Solution: Check https://download.kiwix.org/zim/ for available files
- Use smaller files for testing first
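
### Pre-flight Disk Check (sketch)

The storage guideline under System Requirements (a vector database of 2-3x the ZIM size) suggests checking free disk space before starting a long build. The following is a minimal sketch of such a check and is not part of zim-llm: the function names `estimate_vector_db_bytes` and `has_room_for_build` are hypothetical, the default factor of 3.0 simply takes the upper end of the 2-3x estimate, and the paths mirror the `config.json` example.

```python
import shutil
from pathlib import Path


def estimate_vector_db_bytes(zim_bytes: int, factor: float = 3.0) -> int:
    """Estimate vector database size from total ZIM size (2-3x rule of thumb)."""
    return int(zim_bytes * factor)


def has_room_for_build(zim_dir: str, target_dir: str, factor: float = 3.0):
    """Return (needed_bytes, free_bytes, ok) for a build stored under target_dir."""
    # Sum the sizes of all ZIM files directly inside the library directory.
    zim_total = sum(p.stat().st_size for p in Path(zim_dir).glob("*.zim"))
    needed = estimate_vector_db_bytes(zim_total, factor)
    # Free space on the filesystem holding target_dir (which must already exist).
    free = shutil.disk_usage(target_dir).free
    return needed, free, free >= needed


if __name__ == "__main__":
    # Assumed paths: zim_library_path from config.json, and "." as the
    # filesystem that will hold persist_directory.
    needed, free, ok = has_room_for_build("./zim_library", ".")
    gib = 1024 ** 3
    print(f"Estimated need: {needed / gib:.1f} GiB, free: {free / gib:.1f} GiB")
    print("OK to build" if ok else "Not enough space; try a smaller ZIM file or --limit")
```

Run it from the zim-llm directory before `python zim_rag.py build`. The estimate is only as good as the 2-3x rule of thumb, so treat a narrow pass as a warning rather than a guarantee.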