# Investigation Report: kiwix-serve Docker & zim-llm Setup

**Date:** 2026-05-14
**Investigator:** AI Assistant

---

## Part 1: zim-llm Setup (COMPLETE)

### Repository
- **GitHub:** https://github.com/rouralberto/zim-llm
- **README:** https://raw.githubusercontent.com/rouralberto/zim-llm/master/README.md

### What is zim-llm?
A system that processes ZIM files (compressed archives of Wikipedia and other offline content) into a vector database for Retrieval-Augmented Generation (RAG) with Large Language Models, giving you an offline knowledge base.

### Exact Setup Commands

#### 1. Clone the Repository (as regular user)
```bash
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
```

#### 2. Install Dependencies (as regular user)
```bash
# Run the setup script
./setup.sh

# Or manually install with pip
pip install -r requirements.txt
```

#### 3. Download ZIM Files (as regular user)
```bash
# Create library directory
mkdir -p zim_library

# Download ZIM files from Kiwix Library
# Options:
# - https://library.kiwix.org/
# - https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/

# Example: Download a ZIM file using wget
# (file names are dated and rotate; check the directory listing for the current one)
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim

# Or copy from downloads (we are already inside zim_library)
cp ~/Downloads/*.zim .
```

#### 4. Configure (as regular user)
Create a `config.json` file:
```json
{
  "zim_library_path": "./zim_library",
  "embedding_model": "all-MiniLM-L6-v2",
  "vector_db_type": "chroma",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "persist_directory": "./vector_db",
  "collection_name": "zim_articles",
  "llm_provider": "docker_model_runner",
  "llm_model": "ai/smollm3:Q4_K_M",
  "max_articles_per_zim": null
}
```
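
A typo in `config.json` tends to surface only after a long build has started, so it can help to check the file first. The following is a minimal sketch, not zim-llm's own loader: the key names come from the example above, while `validate_config`, `load_config`, and the specific checks are assumptions made for illustration.

```python
import json
from pathlib import Path

# Keys taken from the config.json example; the checks themselves are assumptions.
REQUIRED_KEYS = {
    "zim_library_path", "embedding_model", "vector_db_type",
    "chunk_size", "chunk_overlap", "persist_directory",
    "collection_name", "llm_provider", "llm_model",
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]
    if config.get("vector_db_type") not in ("chroma", "faiss"):
        problems.append("vector_db_type should be 'chroma' or 'faiss'")
    size, overlap = config.get("chunk_size"), config.get("chunk_overlap")
    if isinstance(size, int) and isinstance(overlap, int) and not 0 <= overlap < size:
        problems.append("chunk_overlap must be >= 0 and smaller than chunk_size")
    limit = config.get("max_articles_per_zim")
    if limit is not None and (not isinstance(limit, int) or limit <= 0):
        problems.append("max_articles_per_zim must be null or a positive integer")
    return problems


def load_config(path: str = "config.json") -> dict:
    """Read the JSON file and raise ValueError if validate_config reports problems."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config()` from the zim-llm directory before `python zim_rag.py build` would catch, for example, an overlap larger than the chunk size.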

#### 5. Build Vector Database (as regular user)
```bash
# Build from all ZIM files in library
python zim_rag.py build

# Or build from specific ZIM file
python zim_rag.py build --zim-file "wikipedia_en_medicine_maxi_2023-12.zim"

# Limit articles per ZIM file for faster processing
python zim_rag.py build --limit 1000

# Force rebuild
python zim_rag.py build --force
```

**Note:** The vector database only needs to be built once, but the initial build can take a very long time depending on ZIM file size. Large ZIM files (2 GB and up) may take several hours.
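
Much of the build time goes into embedding text chunks, and the number of chunks follows from `chunk_size` and `chunk_overlap` in `config.json`. The sketch below shows one common way such settings are applied, a sliding character window. It is an illustration only: zim-llm may split text differently (for example on sentence or paragraph boundaries), and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the default settings each window advances 800 characters, so a 2,500-character article yields three chunks. A smaller overlap means fewer chunks to embed, at the cost of less shared context between neighbouring chunks.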

#### 6. Set Up Docker Model Runner (requires Docker)
```bash
# Docker Model Runner is a Docker feature driven by the `docker model` CLI
# Check that it is available and running
docker model status

# Pull the model named in config.json
docker model pull ai/smollm3:Q4_K_M
```

#### 7. Run Queries (as regular user)
```bash
# Simple semantic search
python zim_rag.py query "What are treatments for PTSD?"

# Full RAG with LLM generation
python zim_rag.py rag-query "Explain the latest developments in military medicine"

# List all ZIM files in library
python zim_rag.py list-zim

# Get system information
python zim_rag.py info
```

### Available Commands
```bash
# Build vector database
python zim_rag.py build [OPTIONS]
  --zim-file TEXT    Specific ZIM file to process
  --limit INTEGER    Limit number of articles per ZIM file
  --force            Force rebuild even if vector DB exists

# Query commands
python zim_rag.py query [OPTIONS] QUESTION
  --k INTEGER        Number of documents to retrieve [default: 5]

python zim_rag.py rag-query QUESTION

# Library management
python zim_rag.py list-zim
python zim_rag.py info

# Export articles
python zim_rag.py export [OPTIONS]
  --zim-file TEXT    Specific ZIM file to export
  --output TEXT      Output file [default: zim_articles.json]
  --limit INTEGER    Limit number of articles per ZIM file
```
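
To call these commands from another script, a thin wrapper around the CLI is often enough. The sketch below uses only the `query`, `rag-query`, and `--k` names listed above. `build_query_argv` and `run_query` are hypothetical helpers, and the code assumes `zim_rag.py` sits in the directory passed as `cwd`.

```python
import subprocess
import sys


def build_query_argv(question: str, k: int = 5, rag: bool = False) -> list[str]:
    """Assemble the argument list for `zim_rag.py query` or `zim_rag.py rag-query`."""
    if rag:
        # rag-query is listed without options, so none are passed.
        return [sys.executable, "zim_rag.py", "rag-query", question]
    if k < 1:
        raise ValueError("k must be at least 1")
    return [sys.executable, "zim_rag.py", "query", "--k", str(k), question]


def run_query(question: str, k: int = 5, rag: bool = False, cwd: str = ".") -> str:
    """Run the CLI inside the zim-llm checkout at cwd and return its standard output."""
    result = subprocess.run(
        build_query_argv(question, k, rag),
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the question as a separate list element, rather than building a shell string, avoids quoting problems when the question contains quotes or special characters.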

### Configuration Options

**Embedding Models:**
- `all-MiniLM-L6-v2` (fast, good quality) - default
- `all-mpnet-base-v2` (higher quality, slower)
- `paraphrase-multilingual-MiniLM-L12-v2` (multilingual support)

**Vector Database Types:**
- `chroma` - ChromaDB (recommended, persistent, metadata-rich)
- `faiss` - FAISS (faster search, less metadata)

**LLM Configuration:**
- Uses Docker Model Runner with `ai/smollm3:Q4_K_M` model

### System Requirements
- **RAM:** 4GB minimum, 8GB+ recommended
- **Storage:** 2-3x the size of your ZIM file for the vector database
- **GPU:** Optional, but recommended for faster embedding generation
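
The storage guideline can be turned into a quick pre-flight check before starting a long build. This is a rough sketch that applies the 2-3x rule of thumb from the list above; `vector_db_size_range` and `has_room_for_vector_db` are hypothetical names, and real usage depends on the embedding model and chunk settings.

```python
import shutil


def vector_db_size_range(zim_bytes: int) -> tuple[int, int]:
    """Apply the 2-3x rule of thumb to a ZIM size given in bytes."""
    return 2 * zim_bytes, 3 * zim_bytes


def has_room_for_vector_db(zim_bytes: int, target_dir: str = ".") -> bool:
    """True if the filesystem holding target_dir has free space for the upper (3x) estimate."""
    _, upper = vector_db_size_range(zim_bytes)
    return shutil.disk_usage(target_dir).free >= upper
```

For a 10 GB ZIM file this asks for 20 to 30 GB of free space in the directory that will hold `persist_directory`.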

---

## Part 2: kiwix-serve Docker Deployment (COMPLETE)

### What is kiwix-serve?
kiwix-serve is an HTTP server that serves ZIM files (offline Wikipedia and other wiki content) to web browsers. It is part of the Kiwix project and is distributed with kiwix-tools.

### Repository
- **GitHub:** https://github.com/kiwix/kiwix-tools (kiwix-serve is built from this repository)
- **Kiwix Website:** https://kiwix.org
- **Kiwix Wiki:** https://wiki.kiwix.org

### Official Docker Image
**Note:** The Kiwix project publishes a kiwix-serve image on the GitHub Container Registry as `ghcr.io/kiwix/kiwix-serve`. You have three options:

#### Option A: Build from Source
```bash
# 1. Clone the repository (as regular user)
# kiwix-serve is built from the kiwix-tools repository
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools

# 2. Install build dependencies (requires sudo)
sudo apt-get update
sudo apt-get install -y \
    meson \
    ninja-build \
    pkg-config \
    g++ \
    libkiwix-dev \
    libmicrohttpd-dev

# Build (as regular user); kiwix-tools uses Meson and Ninja, not CMake
# If the packaged libkiwix is too old for the current source,
# `sudo apt-get install kiwix-tools` provides a prebuilt kiwix-serve instead
meson setup build
ninja -C build
sudo ninja -C build install

# 3. Run kiwix-serve directly (as regular user)
kiwix-serve --library --port=8080 /path/to/library.xml
```

#### Option B: Use the Published Docker Image
```bash
# Pull the image published by the Kiwix project (requires Docker)
docker pull ghcr.io/kiwix/kiwix-serve

# Serve every ZIM file in a host directory (container port as in the image README; verify for your tag)
docker run -d -p 8080:8080 -v /path/to/zim/files:/data \
    --name kiwix-server ghcr.io/kiwix/kiwix-serve '*.zim'
```

#### Option C: Build Your Own Docker Image
```dockerfile
# Create a Dockerfile
FROM ubuntu:22.04

# Install kiwix-serve from the distribution's kiwix-tools package
RUN apt-get update && apt-get install -y --no-install-recommends \
    kiwix-tools \
    && rm -rf /var/lib/apt/lists/*

EXPOSE 8080

# --library tells kiwix-serve that the argument is an XML library, not a ZIM file
CMD ["kiwix-serve", "--port=8080", "--library", "/data/library.xml"]
```

Build and run:
```bash
# Build the image (requires Docker)
docker build -t kiwix-serve .

# Run with ZIM files and library.xml (requires Docker)
docker run -d \
    -p 8080:8080 \
    -v /path/to/zim/files:/data \
    --name kiwix-server \
    kiwix-serve
```

### Download ZIM Files (as regular user)
```bash
# Create directory for ZIM files
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim

# Download from official Kiwix library
# Main source: https://download.kiwix.org/zim/
# Wikipedia: https://download.kiwix.org/zim/wikipedia/
# Medicine: published in the wikipedia/ directory as wikipedia_en_medicine_*

# Example downloads (file names are dated and rotate; check the directory listing for current ones):
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_medicine_maxi_2023-12.zim

# Resume an interrupted download
wget -c https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
```

### Create library.xml (as regular user)
```bash
# Paths are relative to the directory that holds library.xml (~/kiwix)
cat > ~/kiwix/library.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book path="zim/wikipedia_en_all_maxi_2024-01.zim"
        title="English Wikipedia"
        language="eng"
        creator="Wikipedia"
        publisher="Kiwix"
        date="2024-01"
        description="English Wikipedia"
        flavor="maxi"
        tags="wikipedia;english" />

  <book path="zim/wikipedia_en_medicine_maxi_2023-12.zim"
        title="Medical Wikipedia"
        language="eng"
        creator="Wikipedia"
        publisher="Kiwix"
        date="2023-12"
        description="Medical content from Wikipedia"
        flavor="maxi"
        tags="wikipedia;medicine;health" />
</library>
EOF
```
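
A hand-written library file fails quietly when a `path` attribute does not resolve, because relative paths are interpreted against the directory that contains `library.xml`. The sketch below lists entries whose file cannot be found. `missing_books` is a hypothetical helper, and it checks only the `path` attribute, not whether the ZIM file itself is valid.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_books(library_file: str) -> list[str]:
    """Return the `path` attributes in library_file that do not point at an existing file."""
    library_path = Path(library_file).expanduser()
    root = ET.parse(str(library_path)).getroot()
    missing = []
    for book in root.iter("book"):
        raw = book.get("path", "")
        # Relative paths are resolved against the directory holding library.xml.
        candidate = Path(raw) if Path(raw).is_absolute() else library_path.parent / raw
        if not raw or not candidate.is_file():
            missing.append(raw)
    return missing
```

Running `missing_books("~/kiwix/library.xml")` before starting the server should return an empty list; any entries it does return are paths to correct.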

**Alternative:** Generate library.xml automatically. `kiwix-manage` reads each book's ID and metadata from the ZIM file itself, so this is usually more reliable than writing the XML by hand:
```bash
# Using kiwix-manage from kiwix-tools (if installed); run once per ZIM file
for zim in /path/to/zim/files/*.zim; do
    kiwix-manage library.xml add "$zim"
done

# Inspect the result
kiwix-manage library.xml show
```

### Configuration Options for kiwix-serve
```bash
kiwix-serve [OPTIONS] ZIM_FILE...
kiwix-serve --library [OPTIONS] LIBRARY_FILE

Options:
  --library                 Treat the argument as an XML library file instead of ZIM files
  --port=PORT               Port to listen on (default: 80)
  --address=ADDRESS         IP address to listen on (default: all interfaces)
  --daemon                  Run as daemon (background process)
  --threads=NUM             Number of threads to use (default: 4)
  --urlRootLocation=PATH    Root URL path the content is served under
  --verbose                 Print debug output
  --help                    Show help message

# Options vary between releases; `kiwix-serve --help` is authoritative for your version
```

### Example: Full kiwix-serve Setup
```bash
# 1. Install build dependencies (requires sudo)
sudo apt-get update
sudo apt-get install -y \
    meson \
    ninja-build \
    pkg-config \
    g++ \
    libkiwix-dev \
    libmicrohttpd-dev \
    libzim-dev

# 2. Clone and build (as regular user)
# kiwix-serve lives in the kiwix-tools repository and builds with Meson
git clone https://github.com/kiwix/kiwix-tools.git
cd kiwix-tools
meson setup build
ninja -C build
sudo ninja -C build install

# 3. Download ZIM files (as regular user)
mkdir -p ~/kiwix/zim
cd ~/kiwix/zim
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim

# 4. Create library.xml (as regular user)
kiwix-manage ~/kiwix/library.xml add ~/kiwix/zim/wikipedia_en_all_maxi_2024-01.zim

# 5. Run kiwix-serve (as regular user)
kiwix-serve --library --port=8080 --daemon ~/kiwix/library.xml

# 6. Access at http://localhost:8080
```
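
Because `--daemon` returns immediately, a script that continues after the last step cannot assume the server is already answering. The sketch below polls the URL from step 6 until any HTTP response arrives. `wait_for_http` is a hypothetical helper, and the defaults (30 second timeout, 0.5 second interval) are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str = "http://localhost:8080",
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until any HTTP response arrives or timeout seconds elapse."""
    # The target is local, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval + 1):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status; it is up.
            return True
        except (urllib.error.URLError, OSError):
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A `False` result after the timeout usually points at a wrong port, a library path that failed to load, or a port already in use.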

---

## Part 3: Docker Prerequisites

### Check if Docker is Installed
```bash
# Check Docker version
docker --version

# Check if Docker daemon is running
systemctl status docker

# Check if you can run Docker without sudo
docker run hello-world
```
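
The same checks can be scripted so that a setup script fails early with a clear message. This is a minimal sketch: `docker_status` is a hypothetical helper, and it treats a non-zero exit from `docker info` as "daemon not reachable", which also covers the case where the current user lacks permission to talk to the daemon.

```python
import shutil
import subprocess


def docker_status() -> dict:
    """Report whether the docker CLI is on PATH and whether the daemon answers."""
    cli = shutil.which("docker")
    status = {"cli_path": cli, "daemon_reachable": False}
    if cli is None:
        return status
    try:
        # `docker info` exits non-zero when the daemon is down or permission is denied.
        result = subprocess.run([cli, "info"], capture_output=True, text=True, timeout=20)
        status["daemon_reachable"] = result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        pass
    return status
```

If `cli_path` is `None`, Docker still has to be installed; if only `daemon_reachable` is `False`, check the service status and the docker group membership.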

### Install Docker (requires sudo)

**Ubuntu/Debian:**
```bash
# Update package index
sudo apt-get update

# Install prerequisites
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
# (on Debian, replace "ubuntu" with "debian" in the two download.docker.com URLs below)
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker repository
echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
    https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
```

**Fedora/RHEL:**
```bash
# Add Docker repository
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager \
    --add-repo https://download.docker.com/linux/fedora/docker-ce.repo

# Install Docker Engine
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
```

### Add User to Docker Group (requires sudo)
```bash
# Add current user to docker group
sudo usermod -aG docker $USER

# Verify group membership
groups $USER

# IMPORTANT: Logout and login again for changes to take effect
# Or run:
newgrp docker

# Verify Docker works without sudo
docker run hello-world
```

### Verify Docker Installation
```bash
# Check Docker version
docker --version
# Compose is installed as a CLI plugin (docker-compose-plugin), so it is invoked as a subcommand
docker compose version

# Test Docker installation
docker run hello-world

# Check Docker service status
systemctl status docker

# List Docker images
docker images

# List running containers
docker ps
```

---

## Part 4: Complete Workflow Example

### Scenario: Setting up zim-llm with kiwix-serve

#### Step 1: Install Docker (requires sudo)
```bash
# Follow the Docker installation steps in Part 3
# Add user to docker group and logout/login
```

#### Step 2: Install zim-llm (as regular user)
```bash
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
./setup.sh
```

#### Step 3: Download ZIM Files (as regular user)
```bash
mkdir -p zim_library
cd zim_library
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
cd ..
```

#### Step 4: Build Vector Database (as regular user)
```bash
python zim_rag.py build --limit 1000  # Limit for faster testing
```

#### Step 5: Run Queries (as regular user)
```bash
python zim_rag.py query "What is machine learning?"
python zim_rag.py rag-query "Explain neural networks"
```

#### Step 6: (Optional) Run kiwix-serve for Web Interface (as regular user)
```bash
# If kiwix-serve is installed, serve the downloaded ZIM files directly
# (no library.xml is needed when ZIM files are passed as arguments)
kiwix-serve --port=8080 zim_library/*.zim

# Access at http://localhost:8080
```

---

## Links and Resources

### Official Documentation
- **Kiwix Website:** https://kiwix.org
- **Kiwix Wiki:** https://wiki.kiwix.org
- **Kiwix GitHub:** https://github.com/kiwix
- **ZIM File Downloads:** https://download.kiwix.org/zim/
- **Kiwix Library Browser:** https://library.kiwix.org

### Repositories
- **zim-llm:** https://github.com/rouralberto/zim-llm
- **kiwix-serve (part of kiwix-tools):** https://github.com/kiwix/kiwix-tools
- **libkiwix:** https://github.com/kiwix/libkiwix

### Related Tools
- **Docker Model Runner:** https://docs.docker.com/ai/model-runner/
- **ChromaDB:** https://www.trychroma.com
- **FAISS:** https://github.com/facebookresearch/faiss

---

## Summary of Privilege Requirements

| Task | Privilege Level | Command Example |
|------|----------------|-----------------|
| Install Docker | sudo | `sudo apt-get install docker-ce` |
| Add user to docker group | sudo | `sudo usermod -aG docker $USER` |
| Clone git repositories | User | `git clone https://github.com/...` |
| Install Python packages | User | `pip install -r requirements.txt` |
| Download ZIM files | User | `wget https://...` |
| Build vector database | User | `python zim_rag.py build` |
| Run queries | User | `python zim_rag.py query "..."` |
| Build kiwix-serve from source | sudo for deps and install, user for the build | `sudo apt-get install libkiwix-dev` then `meson setup build && ninja -C build` |
| Run kiwix-serve | User | `kiwix-serve --library --port=8080 library.xml` |
| Pull Docker images | User (if in docker group) | `docker pull image:tag` |

---

## Troubleshooting

### Common Issues

**1. "Permission denied" when running Docker commands**
- Solution: Add user to docker group and logout/login
- `sudo usermod -aG docker $USER`

**2. zim-llm build takes too long**
- Solution: Use `--limit` flag to process fewer articles
- `python zim_rag.py build --limit 100`

**3. Out of memory during build**
- Solution: Use smaller ZIM files or increase RAM
- Consider using FAISS instead of ChromaDB

**4. kiwix-serve won't compile**
- Solution: Ensure all dependencies are installed
- `sudo apt-get install meson ninja-build pkg-config g++ libkiwix-dev libmicrohttpd-dev libzim-dev`

**5. Cannot find ZIM files**
- Solution: Check https://download.kiwix.org/zim/ for available files
- Use smaller files for testing first