# Pilot Toolkit Web: Deploy Workflow

This document describes the workflow for deploying changes from the development machine to the SOL cluster. Keep it handy.

## Quick reference: from source change to deployment

1. **Edit source** on the dev machine (`~/pilot-toolkit-web`).
2. **Commit and push** to git:

   ```bash
   git add -A
   git commit -m "Description of change"
   git push
   ```

3. **SSH to sol0** and pull:

   ```bash
   ssh sol0
   cd ~/pilot-toolkit-web
   git pull
   ```

4. **Rebuild the image** (still on sol0, in the repo directory):

   ```bash
   docker build -t git.bennu.duckdns.org/jshackney/pilot-toolkit-web:latest .
   ```

5. **Push to the registry:**

   ```bash
   docker push git.bennu.duckdns.org/jshackney/pilot-toolkit-web:latest
   ```

   If it says `denied:`, refresh the login with `docker login git.bennu.duckdns.org`.

6. **Trigger a rolling update:**

   ```bash
   sudo docker service update --force --with-registry-auth \
     --image git.bennu.duckdns.org/jshackney/pilot-toolkit-web:latest \
     ptk_ptk
   ```

7. **Verify:**

   ```bash
   sudo docker service ps ptk_ptk
   ```

   Tasks should reach `Running` and stay there for more than ~2 minutes, which is longer than the ~95 seconds a failing healthcheck needs to kill them.

## Mental model

Three states must stay in sync:

1. **Source**: your code on disk and in git
2. **Image**: the built artifact in local Docker (created by `docker build`)
3. **Registry**: the uploaded image in Forgejo (uploaded by `docker push`)

Editing source changes #1 only. You must `build` to update #2 and `push` to update #3. Swarm deploys from #3, not from git.

A real rebuild of changed source produces some layers that say `Pushed`. If `docker push` reports `Layer already exists` for *every* layer, the registry image did not change: usually the rebuild was skipped or ran against source that was never pulled.
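The layer check from the last paragraph is easy to script. This is a minimal sketch, assuming the plain-text log that `docker push` prints when its output is piped rather than drawn on a terminal (one `<layer-id>: <status>` line per layer). The function name `push_uploaded_new_layers` is hypothetical and not part of Docker.

```bash
#!/usr/bin/env bash
# Sketch: tell from `docker push` output whether anything new reached the registry.
# ASSUMPTION: plain (non-TTY) output, one "<layer-id>: <status>" line per layer.

# Hypothetical helper: reads push output on stdin; succeeds if at least one
# layer line ends in "Pushed", fails if none do (e.g. every layer reported
# "Layer already exists").
push_uploaded_new_layers() {
  grep -qE '^[0-9a-f]+: Pushed[[:space:]]*$'
}

# Demo with canned output, so it runs without Docker or a registry:
sample='5f70bf18a086: Layer already exists
a3ed95caeb02: Layer already exists'

if printf '%s\n' "$sample" | push_uploaded_new_layers; then
  echo "new layers pushed"
else
  echo "no new layers: was the image rebuilt from the pulled source?"
fi
```

On sol0 one way to use it is `docker push git.bennu.duckdns.org/jshackney/pilot-toolkit-web:latest | tee /tmp/ptk-push.log` followed by `push_uploaded_new_layers < /tmp/ptk-push.log` (the log path is arbitrary). Piping through `tee` keeps the output on screen while giving the helper a copy to inspect.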
## When things go wrong

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `docker push` says `denied:` | Stale login session | `docker login git.bennu.duckdns.org` |
| `docker push` says HTTP 500 | Forgejo registry hiccup | Retry 2 to 3 times; layers that already uploaded are skipped, so each retry makes progress |
| Update says "could not be accessed on a registry" | Worker nodes can't authenticate to the registry | Use the `--with-registry-auth` flag |
| Tasks die at exactly 95 seconds | Healthcheck failing | Run the healthcheck command manually inside a running container |
| Tasks die immediately | Container itself crashes | `sudo docker service logs ptk_ptk` |
| Service runs but URL gives 404 | Traefik routing issue | Check that the Traefik `Host()` rule in the stack file's labels matches the service URL |
| Service runs but URL gives 503 | Traefik can't reach the container | Check that the container is attached to `cluster-net` |
| `docker service ps` shows old timestamps | Looking at task history, not current tasks | Add `--filter desired-state=running` |

## Verification commands

```bash
# What image is the service currently configured to run?
sudo docker service inspect ptk_ptk --pretty | grep -A1 Image

# What healthcheck is baked into the locally built image?
docker inspect git.bennu.duckdns.org/jshackney/pilot-toolkit-web:latest \
  --format '{{json .Config.Healthcheck}}'

# Show only currently running tasks (not history)
sudo docker service ps ptk_ptk --filter desired-state=running

# Follow the service logs (last 50 lines, then stream)
sudo docker service logs -f --tail 50 ptk_ptk
```

## Stack management

```bash
# List running services
sudo docker service ls

# Tear down the stack entirely
sudo docker stack rm ptk

# Redeploy from scratch
sudo docker stack deploy --with-registry-auth -c /root/ptk-stack.yml ptk

# Force a restart without changes
sudo docker service update --force ptk_ptk
```

## Healthcheck notes

The healthcheck must use `127.0.0.1`, not `localhost`.
BusyBox `wget` in the Alpine image resolves `localhost` to the IPv6 loopback address (`::1`) first, but nginx listens only on IPv4, so the probe fails. Using the explicit IPv4 address sidesteps the resolution issue entirely.

The healthcheck timing is `interval=30s timeout=3s start-period=5s retries=3`. A failing healthcheck therefore needs about 95 seconds (three consecutive failed probes, 30 seconds apart, plus start-up) to mark the container unhealthy, at which point Swarm replaces the task. Tasks dying at the 95-second mark are the signature of a broken healthcheck.

## Cluster context

| Component | Value |
| --- | --- |
| Cluster | SOL (4× Raspberry Pi 4, ARM64) |
| Manager | sol0 |
| Registry image | `git.bennu.duckdns.org/jshackney/pilot-toolkit-web` |
| Service | `ptk_ptk` (in stack `ptk`) |
| URL | `https://ptk.bennu.duckdns.org` |
| Stack file (on sol0) | `/root/ptk-stack.yml` |
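The verify rule of thumb (tasks `Running` and still up past ~2 minutes, i.e. past the ~95-second healthcheck window) can also be checked mechanically for `ptk_ptk`. This is a sketch under stated assumptions, not existing tooling: the names `tasks_past_healthcheck_window` and `verify_ptk` are hypothetical, and it assumes the `CurrentState` column of `docker service ps` reads like `Running 3 minutes ago`. A task whose age is still given in seconds or as `about a minute` has not yet outlived the healthcheck window, so it counts as not yet stable.

```bash
#!/usr/bin/env bash
# Sketch: decide whether every live task of a Swarm service has outlived the
# ~95 s healthcheck failure window. Reads CurrentState lines on stdin, e.g.
#   sudo docker service ps ptk_ptk --filter desired-state=running \
#     --format '{{.CurrentState}}'
# ASSUMPTION: lines look like "Running 3 minutes ago" / "Running 40 seconds ago".

# Hypothetical helper: succeeds only if there is at least one task and every
# task is Running with an age of two minutes or more.
tasks_past_healthcheck_window() {
  local line seen=0
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ -z "$line" ]] && continue
    seen=1
    case "$line" in
      Running*second*|Running*[Aa]"bout a minute"*)
        return 1 ;;  # still inside the window where a bad healthcheck kills it
      Running*)
        ;;           # up for 2+ minutes: survived the healthcheck window
      *)
        return 1 ;;  # Starting, Preparing, Failed, Rejected, ...
    esac
  done
  (( seen ))         # fail on empty input (no running tasks at all)
}

# Hypothetical wrapper for sol0 (defined, not called here): wait past the
# 2-minute mark, then check the live tasks of ptk_ptk.
verify_ptk() {
  sleep 130
  sudo docker service ps ptk_ptk --filter desired-state=running \
    --format '{{.CurrentState}}' | tasks_past_healthcheck_window
}

# Demo with canned input, so it runs without Docker:
printf '%s\n' 'Running 3 minutes ago' 'Running 40 seconds ago' \
  | tasks_past_healthcheck_window && echo "stable" || echo "not stable yet"
```

After the rolling update, `verify_ptk && echo stable` on sol0 should pass only once every task has been up for two minutes. A task killed by a broken healthcheck is replaced by a fresh one whose age restarts at zero, so a crash loop never passes the check.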