diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 0978217f..751de3c0 100755 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -3,7 +3,7 @@ ## Instructions for Claude - **When the user says "remember" something**: Always update this file (`.claude/CLAUDE.md`) with the information so it persists across sessions - **When discovering new patterns or versions**: Add them to the appropriate section below -- **Use `/update-knowledge` command**: Or edit this file directly to add learnings +- **Skills available**: Check `.claude/skills/` directory for specialized workflows (e.g., `setup-project.md` for deploying new services) ## Execution Environment (CRITICAL) - **Prefer running commands directly first** - only use remote executor as fallback if local execution fails @@ -19,33 +19,98 @@ - **helm**: chart operations - needs cluster access - **docker**: container operations on remote hosts - **ssh**: connections to infrastructure nodes +- **python/pip**: ALL Python and pip commands must run via remote executor - **Any command interacting with**: Proxmox, Kubernetes cluster, NFS server, other infrastructure ### Remote Command Execution (FALLBACK) -For commands that fail locally, use the file-based relay: +For commands that fail locally, use the file-based relay. Uses a **shared executor** at `~/.claude/` on the remote VM. + +**IMPORTANT: Always use multi-session mode** - create a session at the start of each conversation. + +#### Shared Executor Architecture +The executor lives at `~/.claude/` on the remote VM (wizard@10.0.10.10) and serves all projects: +- `~/.claude/remote-executor.sh` - The shared command executor +- `~/.claude/session-exec.sh` - Shared session management +- `~/.claude/sessions/` → symlink to project sessions (or shared sessions directory) + +Each session includes a `workdir.txt` specifying which project directory to use. + +#### Multi-Session Mode (REQUIRED) +Each Claude session gets isolated command execution: -**To execute a remote command:** ```bash -# 1. Write command -echo "your-command-here" > /Volumes/wizard/code/infra/.claude/cmd_input.txt -# 2. Wait and check status -sleep 1 && cat /Volumes/wizard/code/infra/.claude/cmd_status.txt -# 3. Read output (when status is "done:*") -cat /Volumes/wizard/code/infra/.claude/cmd_output.txt +# 1. Create a session (once per Claude session) +SESSION_ID=$(.claude/session-exec.sh) + +# 2. Write command to your session +echo "your-command-here" > .claude/sessions/$SESSION_ID/cmd_input.txt + +# 3. Wait and check status +sleep 1 && cat .claude/sessions/$SESSION_ID/cmd_status.txt + +# 4. Read output (when status is "done:*") +cat .claude/sessions/$SESSION_ID/cmd_output.txt + +# 5. Cleanup when done (optional - auto-cleaned after 24h) +.claude/session-exec.sh $SESSION_ID cleanup ``` **Status values:** `ready` | `running` | `done:N` (N = exit code) -**Requires user to start executor in another terminal:** +**Requires user to start shared executor in another terminal:** ```bash -.claude/remote-executor.sh wizard@10.0.10.10 /home/wizard/code/infra +# On wizard@10.0.10.10: +~/.claude/remote-executor.sh ``` +**Session helper commands:** +- `.claude/session-exec.sh` - Create new session (returns session ID) +- `.claude/session-exec.sh status` - Check session status +- `.claude/session-exec.sh cleanup` - Remove a session +- `.claude/session-exec.sh _ list` - List all active sessions + --- ## Overview Terraform-based infrastructure repository managing a home Kubernetes cluster on Proxmox VMs. Uses git-crypt for secrets encryption. 
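+
+Since secrets are git-crypt encrypted, a fresh clone must be unlocked before Terraform can read them. A minimal sketch (the key path is an assumption — use wherever the symmetric key is actually stored):
+
+```bash
+# Unlock git-crypt with the symmetric key (hypothetical path)
+git-crypt unlock ~/keys/infra-git-crypt.key
+
+# Verify: files under secrets/ should now report as decrypted
+git-crypt status | grep secrets/ | head
+```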
+## Static File Paths (NEVER CHANGE) +- **Main config**: `terraform.tfvars` - All secrets, DNS, Cloudflare config, WireGuard peers +- **Root terraform**: `main.tf` - Proxmox provider, VM templates, kubernetes_cluster module +- **K8s services**: `modules/kubernetes/main.tf` - All service module definitions +- **Secrets**: `secrets/` - git-crypt encrypted TLS certs and keys + +## Network Topology (Static IPs) +``` +┌─────────────────────────────────────────────────────────────────┐ +│ 10.0.10.0/24 - Management Network │ +├─────────────────────────────────────────────────────────────────┤ +│ 10.0.10.10 - Wizard (main server / remote executor host) │ +│ 10.0.10.15 - NFS Server (TrueNAS) - /mnt/main/* │ +└─────────────────────────────────────────────────────────────────┘ + +┌─────────────────────────────────────────────────────────────────┐ +│ 10.0.20.0/24 - Kubernetes Network │ +├─────────────────────────────────────────────────────────────────┤ +│ 10.0.20.1 - pfSense Gateway │ +│ 10.0.20.10 - Docker Registry VM (MAC: DE:AD:BE:EF:22:22) │ +│ 10.0.20.100 - k8s-master │ +│ 10.0.20.101 - Technitium DNS │ +│ 10.0.20.102 - MetalLB IP Pool Start │ +│ 10.0.20.200 - MetalLB IP Pool End │ +└─────────────────────────────────────────────────────────────────┘ + +┌─────────────────────────────────────────────────────────────────┐ +│ 192.168.1.0/24 - Physical Network │ +├─────────────────────────────────────────────────────────────────┤ +│ 192.168.1.127 - Proxmox Hypervisor │ +└─────────────────────────────────────────────────────────────────┘ +``` + +## Domains +- **Public**: `viktorbarzin.me` (Cloudflare-managed) +- **Internal**: `viktorbarzin.lan` (Technitium DNS) + ## Directory Structure - `main.tf` - Main Terraform entry point, imports all modules - `modules/kubernetes/` - Kubernetes service deployments (one folder per service) @@ -77,6 +142,11 @@ volume { ``` Only use PV/PVC when the Helm chart requires `existingClaim` (like the Nextcloud Helm chart). +### Adding NFS Exports +To add a new NFS exported directory (on the remote VM via executor): +1. Edit `nfs_directories.txt` - add the new directory path, keep the list sorted +2. Run `nfs_exports.sh` to create the NFS export + ### Factory Pattern (for multi-user services) Used when a service needs one instance per user. Structure: ``` @@ -92,6 +162,33 @@ To add a new user: 2. Add Cloudflare route in tfvars 3. Add module block in main.tf calling factory +### Init Container Pattern (for database migrations) +Use when a service needs to run database migrations before starting: +```hcl +init_container { + name = "migration" + image = "service-image:tag" + command = ["sh", "-c", "migration-command"] + + dynamic "env" { + for_each = local.common_env + content { + name = env.value.name + value = env.value.value + } + } +} +``` +Example: AFFiNE runs `node ./scripts/self-host-predeploy.js` in init container. 
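+
+Kubernetes starts the main container only after every init container exits 0, so a failed migration leaves the pod stuck in `Init:Error` / `Init:CrashLoopBackOff`. To debug, read the init container's logs by name (pod name below is hypothetical):
+
+```bash
+# STATUS shows Init:0/1 while the migration is still running
+kubectl get pod affine-5d9c7b8f4-x2k1q
+
+# Logs come from the init container by name, not the main container
+kubectl logs affine-5d9c7b8f4-x2k1q -c migration
+```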
+ +### SMTP/Email Configuration +When configuring services to use the mailserver: +- **Use public hostname**: `mail.viktorbarzin.me` (for TLS cert validation) +- **Do NOT use**: `mailserver.mailserver.svc.cluster.local` (TLS cert mismatch) +- **Port**: 587 (STARTTLS) +- **Credentials**: Use existing accounts from `mailserver_accounts` in tfvars +- **Common email**: `info@viktorbarzin.me` for service notifications + ## Common Variables - `tls_secret_name` - TLS certificate secret name - `tier` - Deployment tier label @@ -100,6 +197,7 @@ To add a new user: ## Service Versions (as of 2025-01) - Immich: v2.4.1 - Freedify: latest (music streaming, factory pattern) +- AFFiNE: stable (visual canvas, uses PostgreSQL + Redis) ## Useful Commands ```bash @@ -124,26 +222,137 @@ Top-level modules in `main.tf`: - `module.docker-registry-vm` - Docker registry VM - `module.kubernetes_cluster` - Main K8s cluster (contains all services) -### Kubernetes Services (under module.kubernetes_cluster.module.*) -Core (tier 0-1): -- `metallb`, `dbaas`, `technitium`, `nginx-ingress`, `crowdsec`, `cloudflared` -- `redis`, `metrics-server`, `authentik`, `nvidia`, `vaultwarden`, `reverse-proxy` -- `wireguard`, `headscale`, `xray`, `monitoring` +--- -GPU (tier 2): -- `immich`, `frigate`, `ollama`, `ebook2audiobook` +## Complete Service Catalog -Edge/Aux (tier 3-4): -- `blog`, `drone`, `hackmd`, `mailserver`, `privatebin`, `shadowsocks` -- `city-guesser`, `echo`, `url`, `webhook_handler`, `excalidraw`, `travel_blog` -- `dashy`, `send`, `ytdlp`, `uptime-kuma`, `calibre`, `audiobookshelf` -- `paperless-ngx`, `jsoncrack`, `servarr`, `ntfy`, `cyberchef`, `diun` -- `meshcentral`, `nextcloud`, `homepage`, `matrix`, `linkwarden`, `actualbudget` -- `owntracks`, `dawarich`, `changedetection`, `tandoor`, `n8n`, `real-estate-crawler` -- `tor-proxy`, `onlyoffice`, `forgejo`, `freshrss`, `navidrome`, `networking-toolbox` -- `tuya-bridge`, `stirling-pdf`, `isponsorblocktv`, `rybbit`, `wealthfolio` -- `kyverno`, `speedtest`, `freedify`, `netbox`, `f1-stream`, `kms`, `k8s-dashboard` -- `descheduler`, `reloader`, `infra-maintenance` +### DEFCON Level 1 (Critical - Network & Auth) +| Service | Description | Tier | +|---------|-------------|------| +| wireguard | VPN server | core | +| technitium | DNS server (10.0.20.101) | core | +| headscale | Tailscale control server | core | +| nginx-ingress | Ingress controller | core | +| xray | Proxy/tunnel | core | +| authentik | Identity provider (SSO) | core | +| cloudflared | Cloudflare tunnel | core | +| authelia | Auth middleware | core | +| monitoring | Prometheus/Grafana stack | core | + +### DEFCON Level 2 (Storage & Security) +| Service | Description | Tier | +|---------|-------------|------| +| vaultwarden | Bitwarden-compatible password manager | cluster | +| redis | Shared Redis at `redis.redis.svc.cluster.local` | cluster | +| immich | Photo management (GPU) | gpu | +| nvidia | GPU device plugin | gpu | +| metrics-server | K8s metrics | cluster | +| uptime-kuma | Status monitoring | cluster | +| crowdsec | Security/WAF | cluster | +| kyverno | Policy engine | cluster | + +### DEFCON Level 3 (Admin) +| Service | Description | Tier | +|---------|-------------|------| +| k8s-dashboard | Kubernetes dashboard | edge | +| reverse-proxy | Generic reverse proxy | edge | + +### DEFCON Level 4 (Active Use) +| Service | Description | Tier | +|---------|-------------|------| +| mailserver | Email (docker-mailserver) | edge | +| shadowsocks | Proxy | edge | +| webhook_handler | Webhook processing | 
edge | +| tuya-bridge | Smart home bridge | edge | +| dawarich | Location history | edge | +| owntracks | Location tracking | edge | +| nextcloud | File sync/share | edge | +| calibre | E-book management | edge | +| onlyoffice | Document editing | edge | +| f1-stream | F1 streaming | edge | +| rybbit | Analytics | edge | +| isponsorblocktv | SponsorBlock for TV | edge | +| actualbudget | Budgeting (factory pattern) | aux | + +### DEFCON Level 5 (Optional) +| Service | Description | Tier | +|---------|-------------|------| +| blog | Personal blog | aux | +| descheduler | Pod descheduler | aux | +| drone | CI/CD | aux | +| hackmd | Collaborative markdown | aux | +| kms | Key management | aux | +| privatebin | Encrypted pastebin | aux | +| vault | HashiCorp Vault | aux | +| reloader | ConfigMap/Secret reloader | aux | +| city-guesser | Game | aux | +| echo | Echo server | aux | +| url | URL shortener | aux | +| excalidraw | Whiteboard | aux | +| travel_blog | Travel blog | aux | +| dashy | Dashboard | aux | +| send | Firefox Send | aux | +| ytdlp | YouTube downloader | aux | +| wealthfolio | Finance tracking | aux | +| audiobookshelf | Audiobook server | aux | +| paperless-ngx | Document management | aux | +| jsoncrack | JSON visualizer | aux | +| servarr | Media automation (Sonarr/Radarr/etc) | aux | +| ntfy | Push notifications | aux | +| cyberchef | Data transformation | aux | +| diun | Docker image update notifier | aux | +| meshcentral | Remote management | aux | +| homepage | Dashboard/startpage | aux | +| matrix | Matrix chat server | aux | +| linkwarden | Bookmark manager | aux | +| changedetection | Web change detection | aux | +| tandoor | Recipe manager | aux | +| n8n | Workflow automation | aux | +| real-estate-crawler | Property crawler | aux | +| tor-proxy | Tor proxy | aux | +| forgejo | Git forge | aux | +| freshrss | RSS reader | aux | +| navidrome | Music streaming | aux | +| networking-toolbox | Network tools | aux | +| stirling-pdf | PDF tools | aux | +| speedtest | Speed testing | aux | +| freedify | Music streaming (factory pattern) | aux | +| netbox | Network documentation | aux | +| infra-maintenance | Maintenance jobs | aux | +| ollama | LLM server (GPU) | gpu | +| frigate | NVR/camera (GPU) | gpu | +| ebook2audiobook | E-book to audio (GPU) | gpu | +| affine | Visual canvas/whiteboard (PostgreSQL + Redis) | aux | + +--- + +## Cloudflare Domains + +### Proxied (CDN + WAF enabled) +``` +blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send, +audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden, +changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser, +travel, netbox +``` + +### Non-Proxied (Direct DNS) +``` +mail, wg, headscale, immich, calibre, vaultwarden, drone, +mailserver-antispam, mailserver-admin, webhook, uptime, +owntracks, dawarich, tuya, meshcentral, nextcloud, actualbudget, +onlyoffice, forgejo, freshrss, navidrome, ollama, openwebui, +isponsorblocktv, speedtest, freedify, rybbit, paperless, +servarr, prowlarr, bazarr, radarr, sonarr, flaresolverr, +jellyfin, jellyseerr, tdarr, affine +``` + +### Special Subdomains +- `*.viktor.actualbudget` - Actualbudget factory instances +- `*.freedify` - Freedify factory instances +- `mailserver.*` - Mail server components (antispam, admin) + +--- ## CI/CD - Drone CI (`.drone.yml`) for automated deployments @@ -152,10 +361,19 @@ Edge/Aux (tier 3-4): - **After committing, run `git push origin master`** to sync changes ## Infrastructure -- Proxmox hypervisor for VMs +- Proxmox hypervisor for 
VMs (192.168.1.127)
- Kubernetes cluster with GPU node (5 nodes: k8s-master + k8s-node1-4, running v1.34.2)
- NFS server at 10.0.10.15 for storage
- Redis shared service at `redis.redis.svc.cluster.local`
+- Docker registry at 10.0.20.10
+
+### GPU Node (k8s-node1)
+- **Taint**: `nvidia.com/gpu=true:NoSchedule` - Only GPU workloads can run here
+- **Label**: `gpu=true`
+- GPU workloads must have both:
+  - `node_selector = { "gpu": "true" }`
+  - `toleration { key = "nvidia.com/gpu", operator = "Equal", value = "true", effect = "NoSchedule" }`
+- Taint is applied via `null_resource.gpu_node_taint` in `modules/kubernetes/nvidia/main.tf`

## Git Operations (IMPORTANT)
- **Git is slow** on this repo due to many files - commands can take 30+ seconds
@@ -170,10 +388,72 @@ Edge/Aux (tier 3-4):
- Groups: "R730 Host", "Nvidia Tesla T4 GPU", "Power", "Cluster"
- kube-state-metrics provides: `kube_deployment_*`, `kube_statefulset_*`, `kube_daemonset_*`

-## DEFCON Levels
-Services are organized by criticality in `modules/kubernetes/main.tf`:
-- Level 1 (Critical): wireguard, technitium, headscale, nginx-ingress, xray, authentik, cloudflare, monitoring
-- Level 2 (Storage): vaultwarden, redis, immich, nvidia, metrics-server, uptime-kuma, crowdsec, kyverno
-- Level 3 (Admin): k8s-dashboard, reverse-proxy
-- Level 4 (Active): mailserver, shadowsocks, dawarich, nextcloud, calibre, actualbudget, etc.
-- Level 5 (Optional): blog, drone, hackmd, ollama, servarr, paperless-ngx, etc.
+## Tier System
+- **0-core**: Critical infrastructure (ingress, DNS, VPN, auth)
+- **1-cluster**: Cluster services (Redis, metrics, security)
+- **2-gpu**: GPU workloads (Immich, Ollama, Frigate)
+- **3-edge**: User-facing services
+- **4-aux**: Optional/auxiliary services
+
+---
+
+## User Preferences
+
+### Calendar
+- **Default calendar**: Nextcloud (always use unless otherwise specified)
+- **Nextcloud URL**: `https://nextcloud.viktorbarzin.me`
+- **CalDAV endpoint**: `https://nextcloud.viktorbarzin.me/remote.php/dav/calendars/<user>/<calendar>/`
+
+### Home Assistant
+- **Default smart home**: Home Assistant (always use for smart home control)
+- **HA URL**: `https://ha-london.viktorbarzin.me`
+- **Script**: `.claude/home-assistant.py`
+- **Aliases**: "ha" or "HA" = Home Assistant
+
+### Remote Executor
+- **Always use multi-session mode** - never use legacy single-file mode
+- Create a session at the start of each conversation with `.claude/session-exec.sh`
+- Use session files at `.claude/sessions/$SESSION_ID/` for all remote commands
+
+### Development
+- **Frontend framework**: Svelte (user is learning it, so use Svelte for all new web apps)
+
+---
+
+## Skills & Workflows
+
+Skills are specialized workflows for common tasks. Located in `.claude/skills/`.
+ +### Available Skills + +**setup-project** (`.claude/skills/setup-project.md`) +- Deploy new self-hosted services from GitHub repos +- Automated workflow: Docker image → Terraform module → Deploy +- Handles database setup, ingress, DNS configuration +- **When to use**: User provides GitHub URL or wants to deploy a new service +- **Example**: "Deploy [GitHub repo] to the cluster" + +**setup-remote-executor** (`.claude/skills/setup-remote-executor.md`) +- Set up shared remote executor in new projects +- Creates session-exec.sh wrapper for the shared executor +- **When to use**: Adding Claude Code support to a new project +- **Example**: "Set up remote executor for this project" + +--- + +## Service-Specific Notes + +### AFFiNE (Visual Canvas) +- **Image**: `ghcr.io/toeverything/affine:stable` +- **Port**: 3010 +- **Requires**: PostgreSQL + Redis +- **Migration**: Init container runs `node ./scripts/self-host-predeploy.js` +- **Storage**: NFS at `/mnt/main/affine` mounted to `/root/.affine/storage` and `/root/.affine/config` +- **Key env vars**: + - `AFFINE_SERVER_EXTERNAL_URL` - Public URL (e.g., `https://affine.viktorbarzin.me`) + - `AFFINE_SERVER_HTTPS` - Set to `true` behind TLS ingress + - `DATABASE_URL` - PostgreSQL connection string + - `REDIS_SERVER_HOST` - Redis hostname + - `MAILER_*` - SMTP configuration for email invites +- **Local-first**: Data stored in browser by default; syncs to server when user creates account +- **Docs**: https://docs.affine.pro/self-host-affine diff --git a/.claude/calendar-query.py b/.claude/calendar-query.py new file mode 100644 index 00000000..4f985c10 --- /dev/null +++ b/.claude/calendar-query.py @@ -0,0 +1,391 @@ +#!/usr/bin/env python3 +""" +Nextcloud CalDAV Calendar Script +Queries and creates calendar events. +""" + +import argparse +import json +import os +import sys +import uuid +from datetime import datetime, timedelta +from urllib.parse import urljoin + +try: + import caldav + from icalendar import Calendar, Event, vText +except ImportError: + print("ERROR: Required packages not installed. 
Run:") + print(" pip install caldav icalendar") + sys.exit(1) + +# Configuration from environment variables +NEXTCLOUD_URL = os.environ.get("NEXTCLOUD_URL", "https://nextcloud.viktorbarzin.me") +CALDAV_URL = f"{NEXTCLOUD_URL}/remote.php/dav" +USERNAME = os.environ.get("NEXTCLOUD_USER") +APP_PASSWORD = os.environ.get("NEXTCLOUD_APP_PASSWORD") + +if not USERNAME or not APP_PASSWORD: + print("ERROR: NEXTCLOUD_USER and NEXTCLOUD_APP_PASSWORD environment variables must be set.") + print("These should be set when activating the Claude venv (~/.venvs/claude)") + sys.exit(1) + + +def get_client(): + """Create CalDAV client connection.""" + return caldav.DAVClient( + url=CALDAV_URL, + username=USERNAME, + password=APP_PASSWORD + ) + + +def list_calendars(): + """List all available calendars.""" + client = get_client() + principal = client.principal() + calendars = principal.calendars() + + result = [] + for cal in calendars: + result.append({ + "name": cal.name, + "url": str(cal.url) + }) + return result + + +def get_events(calendar_name=None, start_date=None, end_date=None, days=7): + """Get events from calendar(s) within a date range.""" + client = get_client() + principal = client.principal() + calendars = principal.calendars() + + if start_date is None: + start_date = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0) + if end_date is None: + end_date = start_date + timedelta(days=days) + + all_events = [] + + for cal in calendars: + if calendar_name and cal.name.lower() != calendar_name.lower(): + continue + + try: + events = cal.search(start=start_date, end=end_date, expand=True) + + for event in events: + try: + ical = Calendar.from_ical(event.data) + for component in ical.walk(): + if component.name == "VEVENT": + event_data = { + "calendar": cal.name, + "summary": str(component.get("summary", "No title")), + "start": None, + "end": None, + "location": str(component.get("location", "")) or None, + "description": str(component.get("description", "")) or None, + "all_day": False + } + + dtstart = component.get("dtstart") + dtend = component.get("dtend") + + if dtstart: + dt = dtstart.dt + if hasattr(dt, 'hour'): + event_data["start"] = dt.strftime("%Y-%m-%d %H:%M") + else: + event_data["start"] = dt.strftime("%Y-%m-%d") + event_data["all_day"] = True + + if dtend: + dt = dtend.dt + if hasattr(dt, 'hour'): + event_data["end"] = dt.strftime("%Y-%m-%d %H:%M") + else: + event_data["end"] = dt.strftime("%Y-%m-%d") + + all_events.append(event_data) + except Exception as e: + pass # Skip malformed events + + except Exception as e: + print(f"Warning: Could not fetch from {cal.name}: {e}", file=sys.stderr) + + # Sort by start date + all_events.sort(key=lambda x: x["start"] or "") + return all_events + + +def create_event(summary, start_time, end_time=None, calendar_name="Personal", + location=None, description=None, all_day=False): + """Create a new calendar event.""" + client = get_client() + principal = client.principal() + calendars = principal.calendars() + + # Find the target calendar + target_cal = None + for cal in calendars: + if cal.name.lower() == calendar_name.lower(): + target_cal = cal + break + + if not target_cal: + # Try partial match + for cal in calendars: + if calendar_name.lower() in cal.name.lower(): + target_cal = cal + break + + if not target_cal: + raise ValueError(f"Calendar '{calendar_name}' not found. 
Available: {[c.name for c in calendars]}") + + # Create the event + cal = Calendar() + cal.add('prodid', '-//Claude Calendar Script//viktorbarzin.me//') + cal.add('version', '2.0') + + event = Event() + event.add('summary', summary) + event.add('uid', str(uuid.uuid4())) + event.add('dtstamp', datetime.now()) + + if all_day: + event.add('dtstart', start_time.date()) + if end_time: + event.add('dtend', end_time.date()) + else: + event.add('dtend', (start_time + timedelta(days=1)).date()) + else: + event.add('dtstart', start_time) + if end_time: + event.add('dtend', end_time) + else: + # Default to 1 hour duration + event.add('dtend', start_time + timedelta(hours=1)) + + if location: + event.add('location', location) + if description: + event.add('description', description) + + cal.add_component(event) + + # Save to calendar + target_cal.save_event(cal.to_ical().decode('utf-8')) + + return { + "status": "created", + "summary": summary, + "calendar": target_cal.name, + "start": start_time.strftime("%Y-%m-%d %H:%M") if not all_day else start_time.strftime("%Y-%m-%d"), + "end": end_time.strftime("%Y-%m-%d %H:%M") if end_time and not all_day else None + } + + +def format_events(events, output_format="text"): + """Format events for display.""" + if output_format == "json": + return json.dumps(events, indent=2) + + if not events: + return "No events found." + + lines = [] + current_date = None + + for event in events: + event_date = event["start"][:10] if event["start"] else "Unknown" + + if event_date != current_date: + current_date = event_date + try: + dt = datetime.strptime(event_date, "%Y-%m-%d") + lines.append(f"\n## {dt.strftime('%A, %B %d, %Y')}") + except: + lines.append(f"\n## {event_date}") + + time_str = "" + if not event["all_day"] and event["start"]: + time_str = event["start"][11:16] + if event["end"]: + time_str += f" - {event['end'][11:16]}" + else: + time_str = "All day" + + line = f"- **{event['summary']}** ({time_str})" + if event["location"]: + line += f" @ {event['location']}" + if event["calendar"] != "personal": + line += f" [{event['calendar']}]" + lines.append(line) + + if event["description"]: + # Truncate long descriptions + desc = event["description"][:200] + if len(event["description"]) > 200: + desc += "..." 
+ lines.append(f" {desc}") + + return "\n".join(lines) + + +def parse_date_arg(date_str): + """Parse flexible date arguments.""" + today = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0) + + if date_str == "today": + return today, today + timedelta(days=1) + elif date_str == "tomorrow": + return today + timedelta(days=1), today + timedelta(days=2) + elif date_str == "week" or date_str == "this week": + # Start from today, go to end of week (Sunday) + days_until_sunday = 6 - today.weekday() + return today, today + timedelta(days=days_until_sunday + 1) + elif date_str == "next week": + days_until_next_monday = 7 - today.weekday() + start = today + timedelta(days=days_until_next_monday) + return start, start + timedelta(days=7) + elif date_str == "month" or date_str == "this month": + return today, today + timedelta(days=30) + else: + # Try to parse as a date + try: + dt = datetime.strptime(date_str, "%Y-%m-%d") + return dt, dt + timedelta(days=1) + except: + return today, today + timedelta(days=7) + + +def parse_datetime(dt_str): + """Parse flexible datetime strings.""" + today = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0) + + # Handle relative dates with time + if dt_str.startswith("today "): + time_part = dt_str.replace("today ", "") + try: + t = datetime.strptime(time_part, "%H:%M") + return today.replace(hour=t.hour, minute=t.minute) + except: + pass + + if dt_str.startswith("tomorrow "): + time_part = dt_str.replace("tomorrow ", "") + try: + t = datetime.strptime(time_part, "%H:%M") + return (today + timedelta(days=1)).replace(hour=t.hour, minute=t.minute) + except: + pass + + # Try full datetime format + for fmt in ["%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]: + try: + return datetime.strptime(dt_str, fmt) + except: + continue + + # Try date only + try: + return datetime.strptime(dt_str, "%Y-%m-%d") + except: + pass + + raise ValueError(f"Could not parse datetime: {dt_str}. 
Use 'YYYY-MM-DD HH:MM' or 'tomorrow HH:MM'") + + +def main(): + parser = argparse.ArgumentParser(description="Query and manage Nextcloud Calendar") + parser.add_argument("command", choices=["list", "events", "today", "tomorrow", "week", "month", "create"], + help="Command to run") + parser.add_argument("--calendar", "-c", default="Personal", help="Calendar name (default: Personal)") + parser.add_argument("--days", "-d", type=int, default=7, help="Number of days to fetch") + parser.add_argument("--json", action="store_true", help="Output as JSON") + parser.add_argument("--date", help="Specific date (YYYY-MM-DD) or relative (today, tomorrow, week, month)") + # Create event options + parser.add_argument("--title", "-t", help="Event title (for create)") + parser.add_argument("--start", "-s", help="Start time: 'YYYY-MM-DD HH:MM' or 'tomorrow 10:00'") + parser.add_argument("--end", "-e", help="End time: 'YYYY-MM-DD HH:MM' (optional, defaults to +1 hour)") + parser.add_argument("--location", "-l", help="Event location") + parser.add_argument("--description", help="Event description") + parser.add_argument("--all-day", action="store_true", help="Create all-day event") + + args = parser.parse_args() + output_format = "json" if args.json else "text" + + try: + if args.command == "list": + calendars = list_calendars() + if output_format == "json": + print(json.dumps(calendars, indent=2)) + else: + print("Available calendars:") + for cal in calendars: + print(f" - {cal['name']}") + + elif args.command == "events": + if args.date: + start, end = parse_date_arg(args.date) + else: + start = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0) + end = start + timedelta(days=args.days) + + events = get_events( + calendar_name=args.calendar, + start_date=start, + end_date=end + ) + print(format_events(events, output_format)) + + elif args.command in ["today", "tomorrow", "week", "month"]: + start, end = parse_date_arg(args.command) + events = get_events( + calendar_name=args.calendar, + start_date=start, + end_date=end + ) + print(format_events(events, output_format)) + + elif args.command == "create": + if not args.title: + print("ERROR: --title is required for create command", file=sys.stderr) + sys.exit(1) + if not args.start: + print("ERROR: --start is required for create command", file=sys.stderr) + sys.exit(1) + + # Parse start time + start_time = parse_datetime(args.start) + end_time = parse_datetime(args.end) if args.end else None + + result = create_event( + summary=args.title, + start_time=start_time, + end_time=end_time, + calendar_name=args.calendar, + location=args.location, + description=args.description, + all_day=args.all_day + ) + + if output_format == "json": + print(json.dumps(result, indent=2)) + else: + print(f"Event created: {result['summary']}") + print(f" Calendar: {result['calendar']}") + print(f" Start: {result['start']}") + if result['end']: + print(f" End: {result['end']}") + + except Exception as e: + print(f"Error: {e}", file=sys.stderr) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/.claude/home-assistant.py b/.claude/home-assistant.py new file mode 100644 index 00000000..3da35fd8 --- /dev/null +++ b/.claude/home-assistant.py @@ -0,0 +1,373 @@ +#!/usr/bin/env python3 +""" +Home Assistant API Script +Control and query Home Assistant entities. +""" + +import argparse +import json +import os +import sys +from urllib.parse import urljoin + +try: + import requests +except ImportError: + print("ERROR: Required package not installed. 
Run:") + print(" pip install requests") + sys.exit(1) + +# Configuration from environment variables +HA_URL = os.environ.get("HOME_ASSISTANT_URL", "").rstrip("/") +HA_TOKEN = os.environ.get("HOME_ASSISTANT_TOKEN") + +if not HA_URL or not HA_TOKEN: + print("ERROR: HOME_ASSISTANT_URL and HOME_ASSISTANT_TOKEN environment variables must be set.") + print("These should be set when activating the Claude venv (~/.venvs/claude)") + sys.exit(1) + +HEADERS = { + "Authorization": f"Bearer {HA_TOKEN}", + "Content-Type": "application/json", +} + + +def api_get(endpoint): + """Make GET request to HA API.""" + url = f"{HA_URL}/api/{endpoint}" + response = requests.get(url, headers=HEADERS, timeout=30) + response.raise_for_status() + return response.json() + + +def api_post(endpoint, data=None): + """Make POST request to HA API.""" + url = f"{HA_URL}/api/{endpoint}" + response = requests.post(url, headers=HEADERS, json=data or {}, timeout=30) + response.raise_for_status() + return response.json() if response.text else {} + + +def get_states(): + """Get all entity states.""" + return api_get("states") + + +def get_state(entity_id): + """Get state of a specific entity.""" + return api_get(f"states/{entity_id}") + + +def get_services(): + """Get all available services.""" + return api_get("services") + + +def call_service(domain, service, entity_id=None, data=None): + """Call a Home Assistant service.""" + payload = data or {} + if entity_id: + payload["entity_id"] = entity_id + return api_post(f"services/{domain}/{service}", payload) + + +def list_entities(domain_filter=None, area_filter=None): + """List all entities, optionally filtered by domain or area.""" + states = get_states() + entities = [] + + for state in states: + entity_id = state["entity_id"] + domain = entity_id.split(".")[0] + + if domain_filter and domain != domain_filter: + continue + + entities.append({ + "entity_id": entity_id, + "state": state["state"], + "friendly_name": state["attributes"].get("friendly_name", entity_id), + "domain": domain, + }) + + # Sort by domain, then entity_id + entities.sort(key=lambda x: (x["domain"], x["entity_id"])) + return entities + + +def turn_on(entity_id): + """Turn on an entity.""" + domain = entity_id.split(".")[0] + return call_service(domain, "turn_on", entity_id) + + +def turn_off(entity_id): + """Turn off an entity.""" + domain = entity_id.split(".")[0] + return call_service(domain, "turn_off", entity_id) + + +def toggle(entity_id): + """Toggle an entity.""" + domain = entity_id.split(".")[0] + return call_service(domain, "toggle", entity_id) + + +def set_value(entity_id, value): + """Set value for input entities (input_number, input_text, etc.).""" + domain = entity_id.split(".")[0] + + if domain == "input_number": + return call_service(domain, "set_value", entity_id, {"value": float(value)}) + elif domain == "input_text": + return call_service(domain, "set_value", entity_id, {"value": str(value)}) + elif domain == "input_boolean": + if value.lower() in ("true", "on", "1", "yes"): + return turn_on(entity_id) + else: + return turn_off(entity_id) + elif domain == "input_select": + return call_service(domain, "select_option", entity_id, {"option": str(value)}) + elif domain == "light": + # Assume value is brightness percentage + return call_service(domain, "turn_on", entity_id, {"brightness_pct": int(value)}) + elif domain == "climate": + return call_service(domain, "set_temperature", entity_id, {"temperature": float(value)}) + elif domain == "cover": + return call_service(domain, 
"set_cover_position", entity_id, {"position": int(value)}) + else: + print(f"Warning: set_value not implemented for domain '{domain}'", file=sys.stderr) + return {} + + +def run_script(script_id): + """Run a script.""" + if not script_id.startswith("script."): + script_id = f"script.{script_id}" + return call_service("script", "turn_on", script_id) + + +def run_scene(scene_id): + """Activate a scene.""" + if not scene_id.startswith("scene."): + scene_id = f"scene.{scene_id}" + return call_service("scene", "turn_on", scene_id) + + +def send_notification(message, title=None, target="notify"): + """Send a notification.""" + data = {"message": message} + if title: + data["title"] = title + return call_service("notify", target, data=data) + + +def format_entities(entities, output_format="text"): + """Format entities for display.""" + if output_format == "json": + return json.dumps(entities, indent=2) + + if not entities: + return "No entities found." + + lines = [] + current_domain = None + + for entity in entities: + if entity["domain"] != current_domain: + current_domain = entity["domain"] + lines.append(f"\n## {current_domain}") + + state = entity["state"] + name = entity["friendly_name"] + eid = entity["entity_id"] + + # Color-code common states + if state in ("on", "home", "open", "playing"): + state_display = f"[ON] {state}" + elif state in ("off", "away", "closed", "idle", "paused"): + state_display = f"[--] {state}" + elif state == "unavailable": + state_display = "[??] unavailable" + else: + state_display = state + + lines.append(f"- {name}: {state_display}") + lines.append(f" `{eid}`") + + return "\n".join(lines) + + +def search_entities(query): + """Search entities by name or ID.""" + query = query.lower() + states = get_states() + matches = [] + + for state in states: + entity_id = state["entity_id"] + friendly_name = state["attributes"].get("friendly_name", "").lower() + + if query in entity_id.lower() or query in friendly_name: + matches.append({ + "entity_id": entity_id, + "state": state["state"], + "friendly_name": state["attributes"].get("friendly_name", entity_id), + "domain": entity_id.split(".")[0], + }) + + matches.sort(key=lambda x: (x["domain"], x["entity_id"])) + return matches + + +def main(): + parser = argparse.ArgumentParser(description="Control Home Assistant") + subparsers = parser.add_subparsers(dest="command", help="Command to run") + + # List command + list_parser = subparsers.add_parser("list", help="List entities") + list_parser.add_argument("--domain", "-d", help="Filter by domain (light, switch, sensor, etc.)") + list_parser.add_argument("--json", action="store_true", help="Output as JSON") + + # Search command + search_parser = subparsers.add_parser("search", help="Search entities") + search_parser.add_argument("query", help="Search query") + search_parser.add_argument("--json", action="store_true", help="Output as JSON") + + # State command + state_parser = subparsers.add_parser("state", help="Get entity state") + state_parser.add_argument("entity_id", help="Entity ID") + state_parser.add_argument("--json", action="store_true", help="Output as JSON") + + # On command + on_parser = subparsers.add_parser("on", help="Turn on entity") + on_parser.add_argument("entity_id", help="Entity ID") + + # Off command + off_parser = subparsers.add_parser("off", help="Turn off entity") + off_parser.add_argument("entity_id", help="Entity ID") + + # Toggle command + toggle_parser = subparsers.add_parser("toggle", help="Toggle entity") + 
toggle_parser.add_argument("entity_id", help="Entity ID") + + # Set command + set_parser = subparsers.add_parser("set", help="Set entity value") + set_parser.add_argument("entity_id", help="Entity ID") + set_parser.add_argument("value", help="Value to set") + + # Script command + script_parser = subparsers.add_parser("script", help="Run a script") + script_parser.add_argument("script_id", help="Script ID (with or without 'script.' prefix)") + + # Scene command + scene_parser = subparsers.add_parser("scene", help="Activate a scene") + scene_parser.add_argument("scene_id", help="Scene ID (with or without 'scene.' prefix)") + + # Service command + service_parser = subparsers.add_parser("service", help="Call a service") + service_parser.add_argument("domain", help="Service domain") + service_parser.add_argument("service", help="Service name") + service_parser.add_argument("--entity", "-e", help="Entity ID") + service_parser.add_argument("--data", "-d", help="JSON data") + + # Services list command + services_parser = subparsers.add_parser("services", help="List available services") + services_parser.add_argument("--domain", "-d", help="Filter by domain") + services_parser.add_argument("--json", action="store_true", help="Output as JSON") + + # Notify command + notify_parser = subparsers.add_parser("notify", help="Send notification") + notify_parser.add_argument("message", help="Notification message") + notify_parser.add_argument("--title", "-t", help="Notification title") + notify_parser.add_argument("--target", default="notify", help="Notification target (default: notify)") + + args = parser.parse_args() + + if not args.command: + parser.print_help() + sys.exit(1) + + try: + if args.command == "list": + entities = list_entities(domain_filter=args.domain) + output_format = "json" if args.json else "text" + print(format_entities(entities, output_format)) + + elif args.command == "search": + entities = search_entities(args.query) + output_format = "json" if args.json else "text" + print(format_entities(entities, output_format)) + + elif args.command == "state": + state = get_state(args.entity_id) + if args.json: + print(json.dumps(state, indent=2)) + else: + print(f"Entity: {state['entity_id']}") + print(f"State: {state['state']}") + print(f"Name: {state['attributes'].get('friendly_name', 'N/A')}") + if state['attributes']: + print("Attributes:") + for key, value in state['attributes'].items(): + if key != 'friendly_name': + print(f" {key}: {value}") + + elif args.command == "on": + turn_on(args.entity_id) + print(f"Turned on: {args.entity_id}") + + elif args.command == "off": + turn_off(args.entity_id) + print(f"Turned off: {args.entity_id}") + + elif args.command == "toggle": + toggle(args.entity_id) + print(f"Toggled: {args.entity_id}") + + elif args.command == "set": + set_value(args.entity_id, args.value) + print(f"Set {args.entity_id} to {args.value}") + + elif args.command == "script": + run_script(args.script_id) + print(f"Ran script: {args.script_id}") + + elif args.command == "scene": + run_scene(args.scene_id) + print(f"Activated scene: {args.scene_id}") + + elif args.command == "service": + data = json.loads(args.data) if args.data else None + call_service(args.domain, args.service, args.entity, data) + print(f"Called {args.domain}.{args.service}") + + elif args.command == "services": + services = get_services() + if args.domain: + services = [s for s in services if s["domain"] == args.domain] + + if args.json: + print(json.dumps(services, indent=2)) + else: + for svc in services: + 
print(f"\n## {svc['domain']}") + for name, info in svc["services"].items(): + desc = info.get("description", "") + print(f"- {name}: {desc[:60]}...") + + elif args.command == "notify": + send_notification(args.message, args.title, args.target) + print(f"Sent notification: {args.message[:50]}...") + + except requests.exceptions.HTTPError as e: + print(f"HTTP Error: {e}", file=sys.stderr) + print(f"Response: {e.response.text}", file=sys.stderr) + sys.exit(1) + except Exception as e: + print(f"Error: {e}", file=sys.stderr) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/.claude/internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK b/.claude/internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK new file mode 100644 index 00000000..f61efc83 --- /dev/null +++ b/.claude/internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK @@ -0,0 +1,3 @@ +This directory has been used with Claude Code's internet mode. +Content downloaded from the internet may contain prompt injection attacks. +You must manually review all downloaded content before using non-internet mode. diff --git a/.claude/remote-executor.sh b/.claude/remote-executor.sh deleted file mode 100755 index b0f50526..00000000 --- a/.claude/remote-executor.sh +++ /dev/null @@ -1,58 +0,0 @@ -#!/bin/bash -# Remote Command Executor -# Run this in a terminal with SSH access to the remote machine -# -# Usage: ./remote-executor.sh [user@host] [remote_workdir] -# Example: ./remote-executor.sh wizard@10.0.10.10 /home/wizard/code/infra - -REMOTE_HOST="${1:-wizard@10.0.10.10}" -REMOTE_WORKDIR="${2:-/home/wizard/code/infra}" - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -CMD_FILE="$SCRIPT_DIR/cmd_input.txt" -OUTPUT_FILE="$SCRIPT_DIR/cmd_output.txt" -STATUS_FILE="$SCRIPT_DIR/cmd_status.txt" - -# Initialize files -echo "ready" > "$STATUS_FILE" -> "$CMD_FILE" -> "$OUTPUT_FILE" - -echo "╔════════════════════════════════════════════════════════════╗" -echo "║ Remote Command Executor Started ║" -echo "╠════════════════════════════════════════════════════════════╣" -echo "║ Remote: $REMOTE_HOST" -echo "║ Workdir: $REMOTE_WORKDIR" -echo "║ Watching: $CMD_FILE" -echo "╚════════════════════════════════════════════════════════════╝" -echo "" -echo "Waiting for commands..." - -# Watch for new commands -while true; do - # Check if there's a command to execute - if [ -s "$CMD_FILE" ]; then - CMD=$(cat "$CMD_FILE") - - # Clear the command file immediately - > "$CMD_FILE" - - # Update status - echo "running" > "$STATUS_FILE" - echo "[$(date '+%H:%M:%S')] Executing: $CMD" - - # Execute on remote and capture output - ssh "$REMOTE_HOST" "cd $REMOTE_WORKDIR && $CMD" > "$OUTPUT_FILE" 2>&1 - EXIT_CODE=$? - - # Append exit code to output - echo "" >> "$OUTPUT_FILE" - echo "---EXIT_CODE:$EXIT_CODE---" >> "$OUTPUT_FILE" - - # Update status - echo "done:$EXIT_CODE" > "$STATUS_FILE" - echo "[$(date '+%H:%M:%S')] Done (exit: $EXIT_CODE)" - fi - - sleep 0.2 -done diff --git a/.claude/session_id.txt b/.claude/session_id.txt new file mode 100644 index 00000000..7340af50 --- /dev/null +++ b/.claude/session_id.txt @@ -0,0 +1 @@ +1769818605-3230-4984 diff --git a/.claude/skills/bluestacks-burp-interception/SKILL.md b/.claude/skills/bluestacks-burp-interception/SKILL.md new file mode 100644 index 00000000..b9837dee --- /dev/null +++ b/.claude/skills/bluestacks-burp-interception/SKILL.md @@ -0,0 +1,175 @@ +--- +name: bluestacks-burp-interception +description: | + Intercept Android app HTTPS traffic using BlueStacks and Burp Suite on macOS. 
+  Use when: (1) Need to analyze Android app API calls, (2) App ignores HTTP proxy,
+  (3) App uses SSL pinning that blocks interception, (4) Need to install Burp CA
+  as system certificate. Covers ADB setup, proxy configuration, Zygisk SSL unpinning,
+  and Magisk trustusercerts module for system CA installation.
+author: Claude Code
+version: 1.0.0
+date: 2026-01-24
+---
+
+# BlueStacks + Burp Suite HTTPS Traffic Interception
+
+## Problem
+You want to intercept HTTPS traffic from an Android app running in BlueStacks to analyze
+API calls, but the app either ignores the proxy or uses SSL certificate pinning.
+
+## Context / Trigger Conditions
+- Running BlueStacks on macOS with Burp Suite
+- App traffic not appearing in Burp Suite
+- App crashes or refuses to connect when proxy is set
+- Need to bypass SSL pinning for security testing/research
+
+## Prerequisites
+- BlueStacks with Magisk (kitsune variant) and root enabled
+- Zygisk-SSL-Unpinning module installed
+- trustusercerts Magisk module installed
+- Android SDK installed (for ADB)
+- Burp Suite running on port 8080
+
+## Solution
+
+### Step 1: Connect ADB to BlueStacks
+
+```bash
+# ADB location on macOS (Android SDK)
+ADB=~/Library/Android/sdk/platform-tools/adb
+
+# Connect to BlueStacks
+$ADB connect localhost:5555
+
+# Verify connection
+$ADB devices
+# Should show: emulator-5554 or localhost:5555
+```
+
+Note: BlueStacks runs **arm64-v8a** (not x86 as you might expect).
+
+### Step 2: Set HTTP Proxy
+
+Use your Mac's WiFi IP address (not 10.0.2.2 or localhost):
+
+```bash
+# Get Mac WiFi IP
+IP=$(ipconfig getifaddr en0)
+
+# Set proxy (Burp default port 8080)
+$ADB shell settings put global http_proxy ${IP}:8080
+
+# Verify
+$ADB shell settings get global http_proxy
+
+# Disable proxy when done
+$ADB shell settings put global http_proxy :0
+```
+
+### Step 3: Configure SSL Unpinning for Target App
+
+```bash
+# Find app package name
+$ADB shell pm list packages | grep <app-name>
+
+# Edit config
+$ADB shell "su -c 'cat > /data/local/tmp/zyg.ssl/config.json << EOF
+{
+  \"targets\": [
+    {
+      \"pkg_name\" : \"com.example.app\",
+      \"enable\": true,
+      \"start_safe\": true,
+      \"start_delay\": 1000
+    }
+  ]
+}
+EOF'"
+
+# Restart the app
+$ADB shell am force-stop com.example.app
+$ADB shell monkey -p com.example.app -c android.intent.category.LAUNCHER 1
+
+# Verify SSL unpinning is active
+$ADB shell "logcat -d | grep -i ZygiskSSL | tail -10"
+# Should show: "App detected: com.example.app" and "[*] SSL UNPINNING [#]"
+```
+
+### Step 4: Install Burp CA as System Certificate
+
+```bash
+# Download Burp CA cert
+curl -x http://127.0.0.1:8080 http://burp/cert -o /tmp/burp-cert.der
+
+# Convert to PEM
+openssl x509 -inform DER -in /tmp/burp-cert.der -out /tmp/burp-cert.pem
+
+# Get hash for Android cert store naming
+HASH=$(openssl x509 -inform PEM -subject_hash_old -in /tmp/burp-cert.pem | head -1)
+cp /tmp/burp-cert.pem /tmp/${HASH}.0
+
+# Push to device
+$ADB push /tmp/${HASH}.0 /sdcard/
+
+# Install via trustusercerts Magisk module
+$ADB shell "su -c 'cp /sdcard/${HASH}.0 /data/adb/modules/trustusercerts/system/etc/security/cacerts/'"
+$ADB shell "su -c 'chmod 644 /data/adb/modules/trustusercerts/system/etc/security/cacerts/${HASH}.0'"
+
+# Reboot required for Magisk overlay
+$ADB shell "su -c 'reboot'"
+
+# After reboot, verify cert is in system store
+$ADB shell "su -c 'ls /system/etc/security/cacerts/${HASH}.0'"
+```
+
+### Step 5: Test Interception
+
+1. Re-enable proxy after reboot: `$ADB shell settings put global http_proxy ${IP}:8080`
+2. Launch target app
+3. Check Burp Suite → Proxy → HTTP history for requests
+
+## Verification
+
+- Proxy set: `adb shell settings get global http_proxy` returns `<IP>:8080`
+- SSL unpinning active: `logcat | grep ZygiskSSL` shows "SSL UNPINNING"
+- Burp CA installed: `ls /system/etc/security/cacerts/<HASH>.0` exists
+- Traffic visible in Burp Suite HTTP history
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| No traffic in Burp | Proxy not set | Check `settings get global http_proxy` |
+| App shows SSL error | Cert not installed | Verify cert in system store, reboot |
+| SSL unpinning not working | Config not loaded | Force-stop app, check config.json syntax |
+| ADB connection refused | BlueStacks ADB disabled | Enable in BlueStacks Settings → Advanced |
+| Wrong cert hash | Using wrong openssl flag | Use `subject_hash_old` not `subject_hash` |
+
+## Notes
+
+- BlueStacks runs arm64-v8a, so Zygisk modules need arm64 support
+- The trustusercerts module copies certs at boot via Magisk overlay
+- System partition is read-only; use Magisk modules instead of direct mounting
+- Burp cert hash is typically `9a5ba575` but verify for your instance
+- Some apps may use additional protections (root detection, Frida detection)
+
+## Quick Reference
+
+```bash
+# Set proxy
+adb shell settings put global http_proxy <IP>:8080
+
+# Disable proxy
+adb shell settings put global http_proxy :0
+
+# Check SSL unpinning logs
+adb shell "logcat -d | grep -i ZygiskSSL"
+
+# Force restart app
+adb shell am force-stop <package> && adb shell monkey -p <package> -c android.intent.category.LAUNCHER 1
+```
+
+## References
+- [Zygisk-SSL-Unpinning](https://github.com/m0szy/Zygisk-SSL-Unpinning)
+- [MagiskTrustUserCerts](https://github.com/NVISOsecurity/MagiskTrustUserCerts)
+- [Burp Suite Documentation](https://portswigger.net/burp/documentation)
diff --git a/.claude/skills/claudeception b/.claude/skills/claudeception
new file mode 160000
index 00000000..7d7f5915
--- /dev/null
+++ b/.claude/skills/claudeception
@@ -0,0 +1 @@
+Subproject commit 7d7f5915f90db26e1a3fc52db6ac2e68d6d705a2
diff --git a/.claude/skills/fastapi-svelte-gpu-webui/SKILL.md b/.claude/skills/fastapi-svelte-gpu-webui/SKILL.md
new file mode 100644
index 00000000..1a223169
--- /dev/null
+++ b/.claude/skills/fastapi-svelte-gpu-webui/SKILL.md
@@ -0,0 +1,310 @@
+---
+name: fastapi-svelte-gpu-webui
+description: |
+  Pattern for building web UIs for GPU-based CLI tools. Use when:
+  (1) Wrapping a command-line tool with a web interface, (2) Building job queue
+  systems for long-running GPU tasks, (3) Creating file upload/download workflows,
+  (4) Need real-time progress updates via WebSocket, (5) Deploying to Kubernetes
+  with GPU scheduling. Covers FastAPI backend, Svelte 5 frontend, NFS storage,
+  and Terraform deployment.
+author: Claude Code
+version: 1.0.0
+date: 2025-01-31
+---
+
+# FastAPI + Svelte GPU WebUI Pattern
+
+## Problem
+Many powerful tools are command-line only, making them inaccessible to non-technical
+users. Building a web UI requires handling file uploads, job queuing, progress tracking,
+and GPU resource scheduling.
+
+## Context / Trigger Conditions
+- You have a CLI tool that does heavy processing (ML inference, media conversion, etc.)
+
+- Want to add a web interface for easier access
+- Need to track long-running job progress
+- Deploying to Kubernetes with GPU nodes
+- Files need to persist across pod restarts (NFS storage)
+
+## Solution Overview
+
+### Directory Structure
+```
+project-web/
+├── backend/
+│   ├── main.py              # FastAPI app
+│   ├── api/
+│   │   ├── __init__.py
+│   │   └── routes.py        # REST endpoints
+│   ├── services/
+│   │   ├── __init__.py
+│   │   └── converter.py     # CLI wrapper + job manager
+│   ├── models/
+│   │   ├── __init__.py
+│   │   └── schemas.py       # Pydantic models
+│   └── requirements.txt
+├── frontend/
+│   ├── src/
+│   │   ├── App.svelte
+│   │   ├── lib/
+│   │   │   ├── FileUpload.svelte
+│   │   │   ├── JobsList.svelte
+│   │   │   └── ProgressBar.svelte
+│   │   └── stores/
+│   │       └── jobs.js
+│   ├── package.json
+│   └── vite.config.js
+├── Dockerfile
+└── README.md
+```
+
+### Backend: Job Manager Pattern
+```python
+# services/converter.py
+import asyncio
+import uuid
+from dataclasses import dataclass
+from datetime import datetime
+from pathlib import Path
+from typing import Optional, Callable
+
+# @dataclass provides the keyword constructor that create_job() relies on;
+# output_file/error default to None until the job finishes.
+@dataclass
+class Job:
+    id: str
+    filename: str
+    status: str  # pending, processing, completed, failed
+    progress: float
+    created_at: datetime
+    output_file: Optional[str] = None
+    error: Optional[str] = None
+
+class JobManager:
+    def __init__(self, storage_path: str = "/mnt"):
+        self.storage_path = Path(storage_path)
+        self.jobs: dict[str, Job] = {}
+        self.progress_callbacks: dict[str, list[Callable]] = {}
+
+    def create_job(self, filename: str, **options) -> Job:
+        job_id = str(uuid.uuid4())
+        job = Job(
+            id=job_id,
+            filename=filename,
+            status="pending",
+            progress=0.0,
+            created_at=datetime.now(),
+            **options
+        )
+        self.jobs[job_id] = job
+        return job
+
+    def update_progress(self, job_id: str, progress: float):
+        # Assumed helper (referenced by run_conversion below): record progress
+        # and fan out to any registered callbacks (e.g. WebSocket pushes).
+        self.jobs[job_id].progress = progress
+        for callback in self.progress_callbacks.get(job_id, []):
+            callback(progress)
+
+    async def run_conversion(self, job_id: str):
+        job = self.jobs[job_id]
+        job.status = "processing"
+
+        input_path = self.storage_path / "uploads" / job.filename
+        output_dir = self.storage_path / "outputs" / job_id
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        # Build command for CLI tool
+        cmd = [
+            "/path/to/cli-tool",
+            str(input_path),
+            "-o", str(output_dir),
+            # Add other options...
+ ] + + # Run with output capture for progress parsing + process = await asyncio.create_subprocess_exec( + *cmd, + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + + # Parse output for progress updates + async def read_output(stream): + while True: + line = await stream.readline() + if not line: + break + line_str = line.decode().strip() + # Parse progress from CLI output + if "%" in line_str: + # Extract and update progress + self.update_progress(job_id, parsed_progress) + + await asyncio.gather( + read_output(process.stdout), + read_output(process.stderr) + ) + + returncode = await process.wait() + + if returncode == 0: + output_files = list(output_dir.glob("*.m4b")) + if output_files: + job.output_file = output_files[0].name + job.status = "completed" + else: + job.status = "failed" + job.error = f"Exit code {returncode}" + +job_manager = JobManager() +``` + +### Backend: API Routes +```python +# api/routes.py +from fastapi import APIRouter, UploadFile, File, HTTPException +from fastapi.responses import FileResponse +from pathlib import Path +import shutil +import asyncio + +router = APIRouter(prefix="/api") + +@router.post("/upload") +async def upload_file(file: UploadFile = File(...)): + upload_dir = Path("/mnt/uploads") + upload_dir.mkdir(parents=True, exist_ok=True) + file_path = upload_dir / file.filename + + with file_path.open("wb") as buffer: + shutil.copyfileobj(file.file, buffer) + + return {"filename": file.filename, "size": file_path.stat().st_size} + +@router.post("/jobs") +async def create_job(request: JobCreate): + job = job_manager.create_job(filename=request.filename, ...) + asyncio.create_task(job_manager.run_conversion(job.id)) + return job + +@router.get("/jobs") +async def list_jobs(): + return job_manager.get_all_jobs() + +@router.get("/jobs/{job_id}/download") +async def download_job(job_id: str): + job = job_manager.get_job(job_id) + if not job or job.status != "completed": + raise HTTPException(404) + output_path = Path("/mnt/outputs") / job_id / job.output_file + return FileResponse(output_path, filename=job.output_file) +``` + +### Frontend: Svelte 5 Components +```svelte + + + +
+<script>
+  // Minimal upload state + handler (assumed wiring; adjust to your API)
+  let dragOver = $state(false);
+
+  async function handleUpload(file) {
+    const form = new FormData();
+    form.append('file', file);
+    await fetch('/api/upload', { method: 'POST', body: form });  // endpoint from routes above
+    dragOver = false;
+  }
+</script>
+
+<div class:drag-over={dragOver}
+  ondragover={(e) => { e.preventDefault(); dragOver = true; }}
+  ondragleave={() => dragOver = false}
+  ondrop={(e) => { e.preventDefault(); handleUpload(e.dataTransfer.files[0]); }}>
+  Drop file here
+</div>
+``` + +### Dockerfile +```dockerfile +FROM python:3.12-slim + +# Install Node for frontend build +RUN apt-get update && apt-get install -y nodejs npm + +# Build frontend +COPY frontend/ /app/frontend/ +WORKDIR /app/frontend +RUN npm install && npm run build + +# Install backend +COPY backend/ /app/backend/ +WORKDIR /app/backend +RUN pip install -r requirements.txt + +# Serve static files from FastAPI +EXPOSE 8000 +CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] +``` + +### Terraform Deployment (GPU) +```hcl +resource "kubernetes_deployment" "myapp" { + spec { + template { + spec { + node_selector = { "gpu" : "true" } + + toleration { + key = "nvidia.com/gpu" + operator = "Equal" + value = "true" + effect = "NoSchedule" + } + + container { + image = "myregistry/myapp@sha256:..." + name = "myapp" + + resources { + limits = { "nvidia.com/gpu" = "1" } + } + + volume_mount { + name = "data" + mount_path = "/mnt" + } + } + + volume { + name = "data" + nfs { + server = "10.0.10.15" + path = "/mnt/main/myapp" + } + } + } + } + } +} +``` + +## Verification +1. Upload a file via the UI +2. Start a conversion job +3. Watch progress update in real-time +4. Download the completed file +5. Verify files persist across pod restarts + +## Notes +- Use image digest for reliable deployments (see `k8s-docker-registry-cache-bypass` skill) +- NFS storage persists across pod restarts +- GPU node taints require matching tolerations +- Consider adding job persistence (database) for production use +- WebSocket can provide smoother progress updates than polling + +## See Also +- `k8s-docker-registry-cache-bypass` - Fixing image cache issues +- `k8s-gpu-no-nvidia-devices` - GPU device troubleshooting +- `python-filename-sanitization` - Secure file handling diff --git a/.claude/skills/home-assistant/SKILL.md b/.claude/skills/home-assistant/SKILL.md new file mode 100644 index 00000000..b20e06f3 --- /dev/null +++ b/.claude/skills/home-assistant/SKILL.md @@ -0,0 +1,191 @@ +--- +name: home-assistant +description: | + Control Home Assistant smart home devices and automations. Use when: + (1) User asks to turn on/off lights, switches, or devices, + (2) User asks about the state of sensors, devices, or entities, + (3) User says "turn on the lights", "set temperature", "lock the door", + (4) User asks to run a scene or script, + (5) User asks "what devices are on?" or "is the door locked?", + (6) User mentions smart home, IoT, or home automation. + Always use Home Assistant for smart home control. +author: Claude Code +version: 1.0.0 +date: 2025-01-25 +--- + +# Home Assistant Control + +## Problem +Need to control smart home devices, check sensor states, or run automations via Home Assistant. + +## Context / Trigger Conditions +- User asks to control lights, switches, covers, climate, etc. 
+- User asks about device states ("is the light on?", "what's the temperature?") +- User wants to run a scene or script +- User mentions turning things on/off +- User asks about smart home devices + +## Prerequisites +- Remote executor must be running (Python commands require remote execution) +- The `~/.venvs/claude` virtualenv must have `requests` package installed +- Environment variables `HOME_ASSISTANT_URL` and `HOME_ASSISTANT_TOKEN` must be set in the venv activation script + +## Solution + +### Script Location +``` +/home/wizard/code/infra/.claude/home-assistant.py +``` + +### Execution Pattern (CRITICAL) +Always use the remote executor with venv activation to get environment variables: + +```bash +source ~/.venvs/claude/bin/activate && cd /home/wizard/code/infra && python .claude/home-assistant.py [command] [options] +``` + +### Available Commands + +#### List Entities +```bash +# List all entities +python .claude/home-assistant.py list + +# List by domain +python .claude/home-assistant.py list --domain light +python .claude/home-assistant.py list --domain switch +python .claude/home-assistant.py list --domain sensor +python .claude/home-assistant.py list --domain climate +python .claude/home-assistant.py list --domain cover + +# JSON output +python .claude/home-assistant.py list --json +``` + +#### Search Entities +```bash +# Search by name or ID +python .claude/home-assistant.py search "living room" +python .claude/home-assistant.py search "temperature" +python .claude/home-assistant.py search "door" +``` + +#### Get Entity State +```bash +python .claude/home-assistant.py state light.living_room +python .claude/home-assistant.py state sensor.temperature +python .claude/home-assistant.py state --json light.living_room +``` + +#### Control Entities +```bash +# Turn on/off +python .claude/home-assistant.py on light.living_room +python .claude/home-assistant.py off switch.tv +python .claude/home-assistant.py toggle light.bedroom + +# Set values +python .claude/home-assistant.py set light.living_room 75 # brightness % +python .claude/home-assistant.py set climate.thermostat 22 # temperature +python .claude/home-assistant.py set cover.blinds 50 # position % +python .claude/home-assistant.py set input_number.volume 80 # numeric value +python .claude/home-assistant.py set input_boolean.away_mode on # boolean +python .claude/home-assistant.py set input_select.mode "Night" # select option +``` + +#### Run Scenes and Scripts +```bash +# Activate a scene +python .claude/home-assistant.py scene movie_night +python .claude/home-assistant.py scene scene.good_morning + +# Run a script +python .claude/home-assistant.py script bedtime_routine +python .claude/home-assistant.py script script.welcome_home +``` + +#### Call Any Service +```bash +# Generic service call +python .claude/home-assistant.py service light turn_on --entity light.kitchen --data '{"brightness": 255}' +python .claude/home-assistant.py service climate set_hvac_mode --entity climate.living_room --data '{"hvac_mode": "heat"}' +python .claude/home-assistant.py service media_player play_media --entity media_player.tv --data '{"media_content_id": "...", "media_content_type": "video"}' +``` + +#### List Services +```bash +# List all available services +python .claude/home-assistant.py services + +# Filter by domain +python .claude/home-assistant.py services --domain light +python .claude/home-assistant.py services --domain climate +``` + +#### Send Notifications +```bash +python .claude/home-assistant.py notify "Door left open!" 
+python .claude/home-assistant.py notify "Motion detected" --title "Security Alert" +python .claude/home-assistant.py notify "Hello" --target notify.mobile_app +``` + +## Complete Example via Remote Executor + +To turn on the living room light: + +1. Write command to remote executor: +```bash +# Using Write tool to write to cmd_input.txt: +source ~/.venvs/claude/bin/activate && cd /home/wizard/code/infra && python .claude/home-assistant.py on light.living_room +``` + +2. Wait and check status: +```bash +sleep 2 && cat /System/Volumes/Data/mnt/wizard/code/infra/.claude/cmd_status.txt +``` + +3. Read output: +```bash +cat /System/Volumes/Data/mnt/wizard/code/infra/.claude/cmd_output.txt +``` + +## Common Entity Domains + +| Domain | Description | Common Actions | +|--------|-------------|----------------| +| `light` | Lights | on, off, toggle, set brightness | +| `switch` | Switches | on, off, toggle | +| `sensor` | Sensors | state (read-only) | +| `binary_sensor` | Binary sensors | state (read-only) | +| `climate` | Thermostats | set temperature, set mode | +| `cover` | Blinds/covers | open, close, set position | +| `lock` | Locks | lock, unlock | +| `media_player` | Media devices | play, pause, volume | +| `input_boolean` | Helper toggles | on, off | +| `input_number` | Helper numbers | set value | +| `input_select` | Helper dropdowns | select option | +| `script` | Scripts | run | +| `scene` | Scenes | activate | +| `automation` | Automations | trigger, on, off | + +## Verification +- Commands print confirmation message on success +- Use `state` command to verify entity changed +- Exit code 0 = success, 1 = error + +## Common Errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `HOME_ASSISTANT_URL and HOME_ASSISTANT_TOKEN must be set` | Didn't source venv activation | Use `source ~/.venvs/claude/bin/activate && python ...` | +| `404 Not Found` | Entity doesn't exist | Use `search` command to find correct entity ID | +| `401 Unauthorized` | Token invalid/expired | Generate new long-lived token in HA | +| `Connection refused` | HA not reachable | Check URL and network connectivity | + +## Notes + +1. **Entity IDs are case-sensitive** - use `search` to find exact IDs +2. **Token must have sufficient permissions** - ensure token has access to all entities +3. **Some entities require specific data** - use `services` command to see required fields +4. **HA URL**: `https://ha-london.viktorbarzin.me` diff --git a/.claude/skills/k8s-docker-registry-cache-bypass/SKILL.md b/.claude/skills/k8s-docker-registry-cache-bypass/SKILL.md new file mode 100644 index 00000000..85a761f4 --- /dev/null +++ b/.claude/skills/k8s-docker-registry-cache-bypass/SKILL.md @@ -0,0 +1,110 @@ +--- +name: k8s-docker-registry-cache-bypass +description: | + Fix for Kubernetes pods running old Docker images despite pushing new versions. + Use when: (1) kubectl shows correct image tag but container runs old code, + (2) Local registry mirror caches stale images, (3) imagePullPolicy: Always + doesn't force fresh pulls, (4) containerd config has mirror that intercepts pulls. + Solution: Use image digest instead of tag to bypass cache entirely. +author: Claude Code +version: 1.0.0 +date: 2025-01-31 +--- + +# Kubernetes Docker Registry Cache Bypass + +## Problem +Kubernetes pods continue running old Docker images even after pushing new versions with +the same tag (e.g., `:latest`). This happens when a local registry mirror caches images +and serves stale versions, ignoring `imagePullPolicy: Always`. 
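+
+A quick way to confirm the mismatch (a sketch - substitute your own image, pod, and namespace):
+
+```bash
+# Digest of the image you just pushed, as seen by the Docker client
+docker inspect --format '{{index .RepoDigests 0}}' viktorbarzin/myimage:latest
+
+# Digest the pod is actually running (imageID contains the resolved digest)
+kubectl get pod <pod> -n <namespace> -o jsonpath='{.status.containerStatuses[0].imageID}'
+
+# If the two digests differ, a cache or mirror served a stale image
+```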
+
+## Context / Trigger Conditions
+- Pod is running but application code is outdated
+- `docker push` succeeded with new layers
+- `kubectl describe pod` shows correct image tag
+- Cluster has a local registry mirror configured (e.g., in containerd config)
+- `imagePullPolicy: Always` doesn't fix the issue
+- Nodes configured with registry mirrors at `/etc/containerd/certs.d/` or similar
+
+## Solution
+
+### 1. Get the image digest after pushing
+```bash
+docker push viktorbarzin/myimage:latest
+# Output includes: latest: digest: sha256:abc123... size: 856
+```
+
+### 2. Use digest instead of tag in deployment
+```hcl
+# Terraform
+container {
+  # Use digest to bypass local registry cache
+  image             = "docker.io/viktorbarzin/myimage@sha256:abc123..."
+  image_pull_policy = "Always"
+  name              = "myimage"
+}
+```
+
+```yaml
+# Kubernetes YAML
+containers:
+  - name: myimage
+    image: docker.io/viktorbarzin/myimage@sha256:abc123...
+    imagePullPolicy: Always
+```
+
+### 3. Apply and restart
+```bash
+terraform apply -target=module.kubernetes_cluster.module.myservice
+kubectl rollout restart deployment/myservice -n mynamespace
+```
+
+## Why This Works
+- Registry mirrors match by tag, not digest
+- When you specify a digest, the node must fetch that exact manifest
+- The mirror may not have the digest cached, forcing a pull from upstream
+- Even if cached, the digest guarantees the exact image version
+
+## Verification
+```bash
+# Check the pod is using the new image
+kubectl get pod -n mynamespace -o jsonpath='{.items[*].spec.containers[*].image}'
+
+# Verify application behavior reflects new code
+kubectl exec -n mynamespace deploy/myservice -- <command>
+```
+
+## Example
+
+Before (problematic):
+```hcl
+image = "docker.io/viktorbarzin/audiblez-web:latest"
+```
+
+After (fixed):
+```hcl
+image = "docker.io/viktorbarzin/audiblez-web@sha256:4d0e2c839555e2229bc91a0b1273569bac88529e8b3c3cadad3c3cf9d865fa29"
+```
+
+## Notes
+- You must update the digest each time you push a new image
+- Consider automating digest extraction in CI/CD pipelines
+- This is a workaround; ideally fix the registry mirror configuration
+- To find your registry mirror config: `cat /etc/containerd/config.toml` on nodes
+- Common mirror locations: `/etc/containerd/certs.d/docker.io/hosts.toml`
+
+## Diagnosing Registry Mirror Issues
+```bash
+# On a k8s node, check containerd config
+cat /etc/containerd/config.toml | grep -A5 mirrors
+
+# Check if mirror is intercepting
+crictl pull docker.io/library/alpine:latest --debug 2>&1 | grep -i mirror
+
+# List cached images on node
+crictl images | grep myimage
+```
+
+## References
+- [Kubernetes imagePullPolicy documentation](https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy)
+- [containerd registry configuration](https://github.com/containerd/containerd/blob/main/docs/hosts.md)
diff --git a/.claude/skills/k8s-gpu-no-nvidia-devices/SKILL.md b/.claude/skills/k8s-gpu-no-nvidia-devices/SKILL.md
new file mode 100644
index 00000000..1d31b060
--- /dev/null
+++ b/.claude/skills/k8s-gpu-no-nvidia-devices/SKILL.md
@@ -0,0 +1,147 @@
+---
+name: k8s-gpu-no-nvidia-devices
+description: |
+  Fix for Kubernetes GPU pods showing "CUDA not supported" or no /dev/nvidia* devices
+  despite nvidia.com/gpu resource allocation. Use when: (1) container runs but torch.cuda.is_available()
+  returns False, (2) ls /dev/nvidia* shows "no matches found", (3) nvidia-smi fails inside pod
+  but works on host, (4) PyTorch/TensorFlow falls back to CPU despite GPU allocation.
+  Covers NVIDIA device plugin, time-slicing, and container runtime issues.
+author: Claude Code
+version: 1.0.0
+date: 2026-01-27
+---
+
+# Kubernetes GPU Pod - No NVIDIA Devices Found
+
+## Problem
+
+A Kubernetes pod requests GPU resources (`nvidia.com/gpu: 1`) and schedules on a GPU node,
+but inside the container there are no NVIDIA devices visible. The application falls back
+to CPU with messages like "CUDA not supported by the Torch installed!" despite running
+in a CUDA-enabled container image.
+
+## Context / Trigger Conditions
+
+- Pod shows `Running` status and is on a node with `gpu=true` label
+- `kubectl describe pod` shows GPU limit/request is satisfied
+- Inside container: `ls /dev/nvidia*` returns "no matches found"
+- Inside container: `nvidia-smi` fails or command not found
+- Application logs show: "CUDA not supported", "Switching to CPU", "torch.cuda.is_available() = False"
+- On the host node: `nvidia-smi` works fine
+
+## Solution
+
+### Step 1: Verify GPU Availability
+
+Check if other pods are consuming the GPU:
+
+```bash
+# List all pods using GPU resources
+kubectl get pods -A -o json | jq -r '.items[] | select(.spec.containers[].resources.limits."nvidia.com/gpu" != null) | "\(.metadata.namespace)/\(.metadata.name)"'
+
+# Check NVIDIA device plugin pods
+kubectl get pods -n nvidia -l app=nvidia-device-plugin
+kubectl logs -n nvidia -l app=nvidia-device-plugin --tail=50
+```
+
+### Step 2: Free GPU Resources
+
+If another workload is using the GPU, unload it:
+
+```bash
+# For Ollama specifically
+kubectl exec -n ollama deployment/ollama -- ollama stop <model>
+
+# Or scale down the conflicting deployment
+kubectl scale deployment/<name> -n <namespace> --replicas=0
+```
+
+### Step 3: Restart the Affected Pod
+
+After freeing GPU resources, restart the pod to get fresh device allocation:
+
+```bash
+kubectl rollout restart deployment/<name> -n <namespace>
+
+# Or delete the pod directly
+kubectl delete pod <pod> -n <namespace>
+```
+
+### Step 4: Verify GPU Access
+
+```bash
+# Check devices are now visible
+kubectl exec -n <namespace> deployment/<name> -- ls -la /dev/nvidia*
+
+# Test nvidia-smi
+kubectl exec -n <namespace> deployment/<name> -- nvidia-smi
+
+# Test PyTorch CUDA
+kubectl exec -n <namespace> deployment/<name> -- python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
+```
+
+## Verification
+
+After restart, you should see:
+
+```
+/dev/nvidia0
+/dev/nvidiactl
+/dev/nvidia-uvm
+/dev/nvidia-uvm-tools
+```
+
+And `nvidia-smi` should show the GPU with your container process.
+
+## Example
+
+```bash
+# Problem: ebook2audiobook shows "CUDA not supported"
+$ kubectl exec -n ebook2audiobook deployment/ebook2audiobook -- ls /dev/nvidia*
+zsh:1: no matches found: /dev/nvidia*
+
+# Solution: Unload Ollama model holding the GPU
+$ kubectl exec -n ollama deployment/ollama -- ollama ps
+NAME           SIZE     PROCESSOR
+qwen2.5:14b    10 GB    33%/67% CPU/GPU
+
+$ kubectl exec -n ollama deployment/ollama -- ollama stop qwen2.5:14b
+
+# Restart the affected pod
+$ kubectl rollout restart deployment/ebook2audiobook -n ebook2audiobook
+
+# Verify
+$ kubectl exec -n ebook2audiobook deployment/ebook2audiobook -- nvidia-smi
+# Should now show the Tesla T4 GPU
+```
+
+## Notes
+
+- **GPU Time-Slicing**: If using NVIDIA GPU time-slicing (configured in GPU Operator),
+  multiple pods can share a GPU. However, device injection still requires proper timing.
+
+- **Pod Scheduling Order**: Pods that start while GPU is fully allocated may not get
+  devices injected even after GPU becomes available - a restart is required.
+
+- **Container Runtime**: The NVIDIA Container Toolkit must be properly configured.
+  Issues can arise from:
+  - cgroup driver mismatch (systemd vs cgroupfs)
+  - Container updates causing device loss
+  - SELinux blocking device access
+
+- **Image Compatibility**: The container image must have CUDA libraries matching the
+  driver version. Check with `nvidia-smi` on host for driver version.
+
+- **This Cluster**: Uses NVIDIA GPU Operator with time-slicing (20 replicas per GPU).
+  GPU node is `k8s-node1` with Tesla T4.
+
+## See Also
+
+- Check GPU Operator status: `kubectl get pods -n nvidia`
+- View time-slicing config: `kubectl get configmap -n nvidia time-slicing-config -o yaml`
+
+## References
+
+- [NVIDIA Container Toolkit Troubleshooting](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html)
+- [Kubernetes GPU Device Plugin](https://github.com/NVIDIA/k8s-device-plugin)
+- [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html)
diff --git a/.claude/skills/k8s-nfs-mount-troubleshooting/SKILL.md b/.claude/skills/k8s-nfs-mount-troubleshooting/SKILL.md
new file mode 100644
index 00000000..da4764dc
--- /dev/null
+++ b/.claude/skills/k8s-nfs-mount-troubleshooting/SKILL.md
@@ -0,0 +1,96 @@
+---
+name: k8s-nfs-mount-troubleshooting
+description: |
+  Debug Kubernetes NFS volume mount failures. Use when: (1) Pod stuck in ContainerCreating
+  for extended time, (2) kubectl describe shows "MountVolume.SetUp failed" with NFS errors,
+  (3) Error message shows "Protocol not supported" or "mount.nfs: access denied",
+  (4) NFS volume defined in pod spec but container won't start. Common root cause is
+  missing NFS export on the server, not a protocol issue.
+author: Claude Code
+version: 1.0.0
+date: 2026-01-28
+---
+
+# Kubernetes NFS Mount Troubleshooting
+
+## Problem
+Pods with NFS volumes get stuck in `ContainerCreating` state indefinitely. The error
+messages from `kubectl describe pod` can be misleading, showing protocol or permission
+errors when the actual issue is the NFS export doesn't exist.
+
+## Context / Trigger Conditions
+- Pod status shows `ContainerCreating` for more than 1-2 minutes
+- `kubectl describe pod` shows events like:
+  - `MountVolume.SetUp failed for volume "data" : mount failed: exit status 32`
+  - `mount.nfs: Protocol not supported`
+  - `mount.nfs: access denied by server`
+- Pod spec includes an NFS volume mount
+- Other pods on the same node work fine
+
+## Solution
+
+### Step 1: Identify the NFS path
+```bash
+kubectl describe pod <pod> -n <namespace> | grep -A5 "Volumes:"
+```
+Look for the NFS server and path (e.g., `10.0.10.15:/mnt/main/myservice`)
+
+### Step 2: Verify the export exists on NFS server
+SSH to the NFS server and check:
+```bash
+ssh root@<nfs-server> "ls -la /mnt/main/myservice"
+```
+
+### Step 3: If directory doesn't exist, create it
+```bash
+ssh root@<nfs-server> "mkdir -p /mnt/main/myservice && chmod 777 /mnt/main/myservice"
+```
+
+### Step 4: Add to NFS exports (TrueNAS specific)
+For TrueNAS, add the path to the NFS share configuration:
+1. Add directory to `scripts/nfs_directories.txt`
+2. Run `scripts/nfs_exports.sh` to update the share via API
+
+### Step 5: Restart the pod
+```bash
+kubectl delete pod -n <namespace> -l app=<app>
+```
+The deployment will create a new pod that should now mount successfully.
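+
+The diagnostic steps above can also be scripted. A sketch, assuming `jq` is installed and the NFS server accepts root SSH (as elsewhere in this cluster):
+
+```bash
+# Extract every NFS server:path pair from the pod spec
+kubectl get pod <pod> -n <namespace> -o json \
+  | jq -r '.spec.volumes[] | select(.nfs) | "\(.nfs.server):\(.nfs.path)"'
+
+# Check that each exported path actually exists on the server
+ssh root@10.0.10.15 'test -d /mnt/main/myservice && echo OK || echo MISSING'
+```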
+
+## Verification
+```bash
+kubectl get pods -n <namespace>
+# Should show 1/1 Running instead of 0/1 ContainerCreating
+
+kubectl exec -n <namespace> <pod> -- ls -la /app/data
+# Should show the mounted directory contents
+```
+
+## Example
+**Symptom:**
+```
+Events:
+  Warning  FailedMount  55s (x13 over 11m)  kubelet  MountVolume.SetUp failed for volume "data" : mount failed: exit status 32
+  Mounting command: mount
+  Mounting arguments: -t nfs 10.0.10.15:/mnt/main/resume /var/lib/kubelet/pods/.../data
+  Output: mount.nfs: Protocol not supported
+```
+
+**Root Cause:** The directory `/mnt/main/resume` didn't exist on the TrueNAS server.
+
+**Fix:**
+```bash
+ssh root@10.0.10.15 'mkdir -p /mnt/main/resume && chmod 777 /mnt/main/resume'
+# Then add to NFS exports and restart pod
+```
+
+## Notes
+- The "Protocol not supported" error is misleading - it often means the export path doesn't exist
+- Always check the NFS server first before investigating protocol/firewall issues
+- For TrueNAS, the NFS share must be updated via API/UI after creating new directories
+- NFSv3 vs NFSv4 issues are rare in modern setups; missing paths are more common
+- Check that the NFS client packages are installed on Kubernetes nodes if this is a new cluster
+
+## See Also
+- TrueNAS NFS configuration documentation
+- Kubernetes NFS volume documentation
diff --git a/.claude/skills/nextcloud-calendar/SKILL.md b/.claude/skills/nextcloud-calendar/SKILL.md
new file mode 100644
index 00000000..5560046e
--- /dev/null
+++ b/.claude/skills/nextcloud-calendar/SKILL.md
@@ -0,0 +1,142 @@
+---
+name: nextcloud-calendar
+description: |
+  Create, list, and query calendar events in Nextcloud via CalDAV. Use when:
+  (1) User asks to create a calendar event, (2) User asks what's on their calendar,
+  (3) User says "add to calendar" or "schedule", (4) User asks about upcoming events.
+  Always use Nextcloud calendar unless user specifies otherwise.
+author: Claude Code
+version: 1.0.0
+date: 2025-01-25
+---
+
+# Nextcloud Calendar Management
+
+## Problem
+Need to create, query, or manage calendar events in the user's Nextcloud calendar.
+
+## Context / Trigger Conditions
+- User asks to create/add a calendar event
+- User asks "what's on my calendar?"
or similar +- User mentions scheduling something +- User says "remind me" with a date (create calendar event) +- Default calendar is always Nextcloud unless otherwise specified + +## Prerequisites +- Remote executor must be running (Python commands require remote execution) +- The `~/.venvs/claude` virtualenv must have `caldav` and `icalendar` packages installed +- Environment variables `NEXTCLOUD_USER` and `NEXTCLOUD_APP_PASSWORD` must be set in the venv activation script + +## Solution + +### Script Location +``` +/home/wizard/code/infra/.claude/calendar-query.py +``` + +### Execution Pattern (CRITICAL) +Always use the remote executor with venv activation to get environment variables: + +```bash +source ~/.venvs/claude/bin/activate && cd /home/wizard/code/infra && python .claude/calendar-query.py [command] [options] +``` + +### Available Commands + +#### List Calendars +```bash +python .claude/calendar-query.py list +``` + +#### Query Events +```bash +# Today's events +python .claude/calendar-query.py today + +# Tomorrow's events +python .claude/calendar-query.py tomorrow + +# This week +python .claude/calendar-query.py week + +# This month +python .claude/calendar-query.py month + +# Custom date range +python .claude/calendar-query.py events --days 14 +python .claude/calendar-query.py events --date 2026-04-10 + +# From specific calendar +python .claude/calendar-query.py today --calendar "Work" +``` + +#### Create Events +```bash +# All-day event (single day) +python .claude/calendar-query.py create --title "Doctor appointment" --start "2026-03-15" --all-day + +# All-day event (multi-day) - end date is EXCLUSIVE +# For April 10-13, use end date April 14 +python .claude/calendar-query.py create --title "Vacation" --start "2026-04-10" --end "2026-04-14" --all-day + +# Timed event +python .claude/calendar-query.py create --title "Meeting" --start "2026-03-15 14:00" --end "2026-03-15 15:00" + +# With location and description +python .claude/calendar-query.py create --title "Lunch" --start "tomorrow 12:00" --location "Cafe" --description "Team lunch" + +# Relative dates work +python .claude/calendar-query.py create --title "Call" --start "today 16:00" +python .claude/calendar-query.py create --title "Review" --start "tomorrow 10:00" +``` + +### Output Formats +```bash +# JSON output (for parsing) +python .claude/calendar-query.py today --json + +# Text output (default, human-readable) +python .claude/calendar-query.py week +``` + +## Complete Example via Remote Executor + +To create an event "Team offsite" from March 20-22, 2026: + +1. Write command to remote executor: +```bash +echo 'source ~/.venvs/claude/bin/activate && cd /home/wizard/code/infra && python .claude/calendar-query.py create --title "Team offsite" --start "2026-03-20" --end "2026-03-23" --all-day' > /System/Volumes/Data/mnt/wizard/code/infra/.claude/cmd_input.txt +``` + +2. Wait and check status: +```bash +sleep 3 && cat /System/Volumes/Data/mnt/wizard/code/infra/.claude/cmd_status.txt +``` + +3. Read output: +```bash +cat /System/Volumes/Data/mnt/wizard/code/infra/.claude/cmd_output.txt +``` + +## Important Notes + +1. **End dates are exclusive** for all-day events (CalDAV standard). To create an event spanning April 10-13, set end to April 14. + +2. **Must source venv activation** - Using `~/.venvs/claude/bin/python` directly won't work because environment variables (`NEXTCLOUD_USER`, `NEXTCLOUD_APP_PASSWORD`) are set in the activation script. + +3. 
**No delete/update commands** - The script currently only supports create and query. To modify events, user must do it manually in Nextcloud. + +4. **Default calendar** is "Personal" - use `--calendar` flag for others. + +## Verification +- For queries: Output shows formatted event list +- For creates: Output shows "Event created: [title]" with calendar name and start date +- Exit code 0 = success, 1 = error (check output for details) + +## Common Errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `NEXTCLOUD_USER and NEXTCLOUD_APP_PASSWORD must be set` | Didn't source venv activation | Use `source ~/.venvs/claude/bin/activate && python ...` | +| `Required packages not installed` | caldav/icalendar missing | Run `~/.venvs/claude/bin/pip install caldav icalendar` | +| `Calendar 'X' not found` | Wrong calendar name | Run `list` command to see available calendars | diff --git a/.claude/skills/python-filename-sanitization/SKILL.md b/.claude/skills/python-filename-sanitization/SKILL.md new file mode 100644 index 00000000..c422a735 --- /dev/null +++ b/.claude/skills/python-filename-sanitization/SKILL.md @@ -0,0 +1,182 @@ +--- +name: python-filename-sanitization +description: | + Secure filename sanitization pattern for Python web applications. Use when: + (1) Accepting user-provided filenames for file operations, (2) Building file + rename/upload functionality, (3) Preventing path traversal attacks (../../../etc/passwd), + (4) Preventing shell injection through filenames, (5) FastAPI/Flask file handling. + Provides regex-based whitelist approach with pathlib for safe file operations. +author: Claude Code +version: 1.0.0 +date: 2025-01-31 +--- + +# Python Filename Sanitization + +## Problem +User-provided filenames can contain malicious characters that enable path traversal +attacks, shell injection, or filesystem corruption. Direct use of user input in +file paths is a security vulnerability. + +## Context / Trigger Conditions +- Building file upload, rename, or download functionality +- User can specify filenames via API or form input +- Files are stored on server filesystem +- Need to prevent: `../`, shell metacharacters, null bytes, etc. + +## Solution + +### Complete Sanitization Function +```python +import re +from pathlib import Path + +def sanitize_filename(filename: str, max_length: int = 200) -> str: + """ + Sanitize a filename to prevent path traversal and shell injection. + Only allows alphanumeric characters, spaces, hyphens, underscores, + parentheses, and dots. 
+    """
+    if not filename:
+        raise ValueError("Filename cannot be empty")
+
+    # Remove any path components (prevent path traversal)
+    filename = Path(filename).name
+
+    # Only allow safe characters: alphanumeric, space, hyphen, underscore, parentheses, dot
+    # This regex removes anything that isn't in the allowed set
+    safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)
+
+    # Collapse multiple spaces/dots
+    safe_filename = re.sub(r'\s+', ' ', safe_filename)
+    safe_filename = re.sub(r'\.+', '.', safe_filename)
+
+    # Strip leading/trailing whitespace and dots
+    safe_filename = safe_filename.strip(' .')
+
+    # Limit length
+    if len(safe_filename) > max_length:
+        safe_filename = safe_filename[:max_length]
+
+    if not safe_filename:
+        raise ValueError("Filename contains no valid characters")
+
+    return safe_filename
+```
+
+### FastAPI Integration Example
+```python
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel
+from pathlib import Path
+
+# sanitize_filename as defined above
+
+router = APIRouter()
+
+class RenameRequest(BaseModel):
+    new_name: str
+
+@router.patch("/files/{file_id}/rename")
+async def rename_file(file_id: str, request: RenameRequest):
+    """Rename a file with sanitized input."""
+    file_dir = Path("/data/files") / file_id
+
+    if not file_dir.exists():
+        raise HTTPException(status_code=404, detail="File not found")
+
+    # Find existing file
+    files = list(file_dir.glob("*"))
+    if not files:
+        raise HTTPException(status_code=404, detail="No file found")
+
+    current_file = files[0]
+    current_extension = current_file.suffix
+
+    # Sanitize the new name
+    try:
+        safe_name = sanitize_filename(request.new_name)
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+    # Preserve original extension
+    if not safe_name.lower().endswith(current_extension.lower()):
+        safe_name = safe_name + current_extension
+
+    # Create new path (same directory, new filename)
+    new_file = file_dir / safe_name
+
+    # Check for conflicts
+    if new_file.exists() and new_file != current_file:
+        raise HTTPException(status_code=400, detail="A file with that name already exists")
+
+    # Rename using pathlib (no shell commands!)
+    current_file.rename(new_file)
+
+    return {"status": "renamed", "new_filename": safe_name}
+```
+
+## Key Security Principles
+
+### 1. Whitelist, Don't Blacklist
+```python
+# BAD: Trying to block dangerous characters
+filename = filename.replace('../', '').replace('\x00', '')
+
+# GOOD: Only allow known-safe characters
+safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)
+```
+
+### 2. Use pathlib, Not Shell Commands
+```python
+# BAD: Shell command (vulnerable to injection)
+os.system(f'mv "{old_path}" "{new_path}"')
+
+# GOOD: Pure Python (no shell)
+old_path.rename(new_path)
+```
+
+### 3. Extract Basename First
+```python
+# BAD: User could submit "../../../etc/passwd"
+filename = user_input
+
+# GOOD: Extract just the filename part
+filename = Path(user_input).name
+```
+
+### 4. Validate After Sanitization
+```python
+# Ensure something remains after sanitization
+if not safe_filename:
+    raise ValueError("Filename contains no valid characters")
+```
+
+## Verification
+```python
+# Test cases that should be handled safely
+assert sanitize_filename("normal.txt") == "normal.txt"
+assert sanitize_filename("../../../etc/passwd") == "passwd"  # path components stripped first
+assert sanitize_filename("file; rm -rf /") == "file rm -rf"
+assert sanitize_filename(" spaces .txt") == "spaces .txt"  # runs of spaces collapse to one
+assert sanitize_filename("$(whoami).txt") == "(whoami).txt"  # parentheses are whitelisted
+
+# Test cases that should raise errors
+try:
+    sanitize_filename("")  # Should raise ValueError
+except ValueError:
+    pass
+
+try:
+    sanitize_filename("$#@!")  # Should raise ValueError (no valid chars)
+except ValueError:
+    pass
+```
+
+## Notes
+- This is intentionally restrictive; expand the regex if you need Unicode support
+- For Unicode filenames, consider `unicodedata.normalize('NFKD', ...)` first
+- Max length of 200 is conservative; filesystem limits vary (255 bytes typical)
+- Always preserve file extensions when renaming to avoid breaking file associations
+- Consider adding a UUID prefix for guaranteed uniqueness in upload scenarios
+
+## References
+- [OWASP Path Traversal](https://owasp.org/www-community/attacks/Path_Traversal)
+- [CWE-22: Path Traversal](https://cwe.mitre.org/data/definitions/22.html)
+- [Python pathlib documentation](https://docs.python.org/3/library/pathlib.html)
diff --git a/.claude/skills/setup-project.md b/.claude/skills/setup-project.md
new file mode 100644
index 00000000..92dbfc64
--- /dev/null
+++ b/.claude/skills/setup-project.md
@@ -0,0 +1,388 @@
+# Setup Project Skill
+
+**Purpose**: Deploy a new self-hosted service to the Kubernetes cluster from a GitHub repository.
+
+**When to use**: User provides a GitHub URL or project name and wants to deploy it to the cluster.
+
+## Workflow
+
+### 1. Research Phase
+
+**Input**: GitHub repository URL or project name
+
+**Actions**:
+- Visit the GitHub repository
+- Check the README for:
+  - Official Docker image (Docker Hub, ghcr.io, etc.)
+  - docker-compose.yml file
+  - Self-hosting documentation
+  - Required dependencies (PostgreSQL, MySQL, Redis, etc.)
+  - Environment variables needed
+  - Default ports
+  - Storage requirements
+
+**Find Docker Image Priority**:
+1. Check official documentation for recommended image
+2. Look in docker-compose.yml for `image:` directive
+3. Check GitHub Container Registry: `ghcr.io/<org>/<repo>`
+4. Check Docker Hub: `<user>/<image>`
+5. Check releases page for container images
+6. Last resort: Build from Dockerfile (avoid if possible)
+
+**Extract Configuration**:
+- Container port (default port the app listens on)
+- Environment variables (DATABASE_URL, REDIS_HOST, SMTP, etc.)
+- Volume mounts (what data needs persistence)
+- Dependencies (database type, cache, etc.)
+
+### 2. Database Setup (if needed)
+
+**If project requires PostgreSQL**:
+- User provides database credentials or use pattern: `<service>` user with secure password
+- Database will be created in shared `postgresql.dbaas.svc.cluster.local`
+- Connection string format: `postgresql://<user>:<password>@postgresql.dbaas.svc.cluster.local:5432/<dbname>`
+
+**If project requires MySQL**:
+- User provides database credentials
+- Database in shared `mysql.dbaas.svc.cluster.local`
+- Connection string format: `mysql://<user>:<password>@mysql.dbaas.svc.cluster.local:3306/<dbname>`
+
+**If project requires Redis**:
+- Use shared Redis: `redis.redis.svc.cluster.local:6379`
+- No password required
+
+**IMPORTANT**: Never create databases yourself - always ask user for credentials to use.
+
+### 3. Terraform Module Creation
+
+**Create module directory**:
+```bash
+mkdir -p modules/kubernetes/<service>/
+```
+
+**Create `modules/kubernetes/<service>/main.tf`**:
+
+```hcl
+variable "tls_secret_name" {}
+variable "tier" { type = string }
+variable "postgresql_password" {} # Only if needed
+# Add other variables as needed (smtp_password, api_keys, etc.)
+
+resource "kubernetes_namespace" "<service>" {
+  metadata {
+    name = "<service>"
+  }
+}
+
+module "tls_secret" {
+  source          = "../setup_tls_secret"
+  namespace       = kubernetes_namespace.<service>.metadata[0].name
+  tls_secret_name = var.tls_secret_name
+}
+
+# If database migrations needed, add init_container
+resource "kubernetes_deployment" "<service>" {
+  metadata {
+    name      = "<service>"
+    namespace = kubernetes_namespace.<service>.metadata[0].name
+    labels = {
+      app  = "<service>"
+      tier = var.tier
+    }
+  }
+  spec {
+    replicas = 1
+    selector {
+      match_labels = {
+        app = "<service>"
+      }
+    }
+    template {
+      metadata {
+        labels = {
+          app = "<service>"
+        }
+      }
+      spec {
+        # Init container for migrations (if needed)
+        # init_container { ... }
+
+        container {
+          name  = "<service>"
+          image = "<image>:<tag>"
+
+          port {
+            container_port = <port>
+          }
+
+          # Environment variables
+          env {
+            name  = "DATABASE_URL"
+            value = "postgresql://<user>:${var.postgresql_password}@postgresql.dbaas.svc.cluster.local:5432/<dbname>"
+          }
+          # Add other env vars as needed
+
+          # Volume mounts for persistent data
+          volume_mount {
+            name       = "data"
+            mount_path = "<mount-path>"
+            sub_path   = "<sub-path>"
+          }
+
+          resources {
+            requests = {
+              memory = "256Mi"
+              cpu    = "100m"
+            }
+            limits = {
+              memory = "2Gi"
+              cpu    = "1"
+            }
+          }
+
+          # Health checks (if endpoints exist)
+          liveness_probe {
+            http_get {
+              path = "/health" # or /healthz, /, etc.
+              port = <port>
+            }
+            initial_delay_seconds = 60
+            period_seconds        = 30
+          }
+        }
+
+        # NFS volume for persistence
+        volume {
+          name = "data"
+          nfs {
+            server = "10.0.10.15"
+            path   = "/mnt/main/<service>"
+          }
+        }
+      }
+    }
+  }
+}
+
+resource "kubernetes_service" "<service>" {
+  metadata {
+    name      = "<service>"
+    namespace = kubernetes_namespace.<service>.metadata[0].name
+    labels = {
+      app = "<service>"
+    }
+  }
+
+  spec {
+    selector = {
+      app = "<service>"
+    }
+    port {
+      name        = "http"
+      port        = 80
+      target_port = <port>
+    }
+  }
+}
+
+module "ingress" {
+  source          = "../ingress_factory"
+  namespace       = kubernetes_namespace.<service>.metadata[0].name
+  name            = "<service>"
+  tls_secret_name = var.tls_secret_name
+  # Add extra_annotations if needed (proxy-body-size, timeouts, etc.)
+}
+```
+
+### 4. Update Main Terraform Files
+
+**Add to `modules/kubernetes/main.tf`**:
+
+1. Add variable declarations at top:
+```hcl
+variable "<service>_postgresql_password" { type = string }
+```
+
+2. Add to appropriate DEFCON level (ask user which level, default to 5):
+```hcl
+5 : [
+  ...,
+  "<service>"
+]
+```
+
+3. Add module block at bottom:
+```hcl
+module "<service>" {
+  source   = "./<service>"
+  for_each = contains(local.active_modules, "<service>") ? { <service> = true } : {}
+  tls_secret_name     = var.tls_secret_name
+  postgresql_password = var.<service>_postgresql_password
+  tier                = local.tiers.aux # or appropriate tier
+
+  depends_on = [null_resource.core_services]
+}
+```
+
+**Add to `main.tf`**:
+
+1. Add variable:
+```hcl
+variable "<service>_postgresql_password" { type = string }
+```
+
+2. Pass to kubernetes_cluster module:
+```hcl
+module "kubernetes_cluster" {
+  ...
+  <service>_postgresql_password = var.<service>_postgresql_password
+}
+```
+
+**Update `terraform.tfvars`**:
+
+1. Add password/credentials:
+```hcl
+<service>_postgresql_password = "<secure-password>"
+```
+
+2. Add to Cloudflare DNS (ask user if proxied or non-proxied):
+```hcl
+cloudflare_non_proxied_names = [
+  ...,
+  "<service>"
+]
+```
+
+### 5. Email/SMTP Configuration (if needed)
+
+If service needs to send emails:
+```hcl
+env {
+  name  = "MAILER_HOST"
+  value = "mailserver.viktorbarzin.me" # Public hostname for TLS
+}
+env {
+  name  = "MAILER_PORT"
+  value = "587"
+}
+env {
+  name  = "MAILER_USER"
+  value = "info@viktorbarzin.me"
+}
+env {
+  name  = "MAILER_PASSWORD"
+  value = var.mailserver_accounts["info@viktorbarzin.me"] # Pass from module
+}
+```
+
+Add to module call:
+```hcl
+smtp_password = var.mailserver_accounts["info@viktorbarzin.me"]
+```
+
+### 6. Apply Terraform
+
+```bash
+# Via remote executor
+terraform init
+terraform apply -target=module.kubernetes_cluster.module.<service> -auto-approve
+```
+
+### 7. Verification
+
+```bash
+kubectl get pods -n <namespace>
+kubectl logs -n <namespace> -l app=<service> --tail=50
+```
+
+Test URL: `https://<service>.viktorbarzin.me`
+
+### 8. Commit Changes
+
+```bash
+git add modules/kubernetes/<service>/ main.tf modules/kubernetes/main.tf terraform.tfvars
+git commit -m "Add <service> deployment
+
+- Deploy <image> as <service>
+- Uses <dependencies>
+- Ingress at <service>.viktorbarzin.me
+
+[ci skip]"
+```
+
+## Common Patterns
+
+### Init Container for Migrations
+```hcl
+init_container {
+  name    = "migration"
+  image   = "<image>"
+  command = ["sh", "-c", "<migration-command>"]
+
+  # Same env vars and volumes as main container
+}
+```
+
+### Dynamic Environment Variables
+```hcl
+locals {
+  common_env = [
+    { name = "VAR1", value = "value1" },
+    { name = "VAR2", value = "value2" },
+  ]
+}
+
+dynamic "env" {
+  for_each = local.common_env
+  content {
+    name  = env.value.name
+    value = env.value.value
+  }
+}
+```
+
+### External URL Configuration
+Many apps need their public URL configured:
+```hcl
+env {
+  name  = "APP_URL" # or PUBLIC_URL, EXTERNAL_URL, etc.
+  value = "https://<service>.viktorbarzin.me"
+}
+env {
+  name  = "HTTPS" # or ENABLE_HTTPS, etc.
+  value = "true"
+}
+```
+
+## Checklist
+
+- [ ] Find official Docker image or docker-compose
+- [ ] Identify dependencies (DB, Redis, etc.)
+- [ ] Ask user for database credentials (never create yourself)
+- [ ] Create `modules/kubernetes/<service>/main.tf`
+- [ ] Update `modules/kubernetes/main.tf` (variables, DEFCON level, module block)
+- [ ] Update `main.tf` (variable, pass to module)
+- [ ] Update `terraform.tfvars` (password, Cloudflare DNS)
+- [ ] Run `terraform init` and `terraform apply`
+- [ ] Verify pods are running
- [ ] Test the URL
+- [ ] Commit changes with `[ci skip]`
+
+## Questions to Ask User
+
+1. What DEFCON level should this service be in? (Default: 5)
+2. Should Cloudflare proxy this domain? (Default: no, add to non_proxied_names)
+3. Does this need email/SMTP? (Configure if yes)
+4. What database credentials should I use? (Never create yourself)
+5. What tier? (core/cluster/gpu/edge/aux - default: aux)
+
+## Notes
+
+- **Always use official documentation** as the source of truth
+- **Prefer stable/latest tags** over specific versions for self-hosted
+- **Use shared infrastructure**: PostgreSQL at `postgresql.dbaas.svc.cluster.local`, Redis at `redis.redis.svc.cluster.local`
+- **NFS storage**: Always at `10.0.10.15:/mnt/main/<service>`
+- **Email**: Use `mailserver.viktorbarzin.me` (public hostname) not internal service name
+- **Resource limits**: Start conservative, can increase if needed
+- **Health checks**: Only add if the app has health endpoints
diff --git a/.claude/skills/setup-remote-executor.md b/.claude/skills/setup-remote-executor.md
new file mode 100644
index 00000000..f826c7cc
--- /dev/null
+++ b/.claude/skills/setup-remote-executor.md
@@ -0,0 +1,102 @@
+# Setup Shared Remote Executor
+
+Skill for setting up Claude Code's shared remote executor in new projects.
+
+## When to Use
+- When adding Claude Code support to a new project
+- When the user says "set up remote executor for this project"
+- When working on a new project that needs remote command execution
+
+## Prerequisites
+- Shared executor already deployed at `~/.claude/` on wizard@10.0.10.10
+- Project accessible via NFS from both macOS and the remote VM
+
+## Setup Steps
+
+### 1. Create .claude Directory
+```bash
+mkdir -p .claude/sessions
+```
+
+### 2. Create session-exec.sh Wrapper
+Create `.claude/session-exec.sh` with the following content (adjust PROJECT_ROOT):
+
+```bash
+#!/bin/bash
+# Project-Local Session Helper - Wrapper for shared executor
+
+set -euo pipefail
+
+SHARED_SESSION_EXEC="/home/wizard/.claude/session-exec.sh"
+PROJECT_ROOT="/home/wizard/path/to/project" # UPDATE THIS
+
+if [ -f "$SHARED_SESSION_EXEC" ]; then
+  if [ "${1:-}" = "create" ] || [ -z "${1:-}" ]; then
+    "$SHARED_SESSION_EXEC" create "$PROJECT_ROOT"
+  else
+    "$SHARED_SESSION_EXEC" "$@"
+  fi
+else
+  SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+  SESSIONS_DIR="$SCRIPT_DIR/sessions"
+  SESSION_ID="${1:-$(date +%s)-$$-$RANDOM}"
+  ACTION="${2:-create}"
+  SESSION_DIR="$SESSIONS_DIR/$SESSION_ID"
+
+  case "$ACTION" in
+    create|init|"")
+      mkdir -p "$SESSION_DIR"
+      echo "ready" > "$SESSION_DIR/cmd_status.txt"
+      echo "$PROJECT_ROOT" > "$SESSION_DIR/workdir.txt"
+      > "$SESSION_DIR/cmd_input.txt"
+      > "$SESSION_DIR/cmd_output.txt"
+      echo "$SESSION_ID"
+      ;;
+    cleanup|remove|delete)
+      [ -d "$SESSION_DIR" ] && rm -rf "$SESSION_DIR"
+      ;;
+    status)
+      [ -d "$SESSION_DIR" ] && cat "$SESSION_DIR/cmd_status.txt"
+      ;;
+    list)
+      [ -d "$SESSIONS_DIR" ] && ls -1 "$SESSIONS_DIR" 2>/dev/null
+      ;;
+  esac
+fi
+```
+
+Make executable: `chmod +x .claude/session-exec.sh`
+
+### 3. Link Sessions Directory (on remote VM)
+Run on the remote VM to add project sessions to the shared executor:
+
+```bash
+# Option A: Symlink project sessions (if using project-local sessions)
+ln -sfn /path/to/project/.claude/sessions ~/.claude/sessions
+
+# Option B: Use shared sessions (all projects share one directory)
+# Just ensure ~/.claude/sessions exists
+```
+
+### 4. Create CLAUDE.md
+Add execution instructions to `.claude/CLAUDE.md`:
+
+```markdown
+## Remote Command Execution
+Uses shared executor at `~/.claude/` on wizard@10.0.10.10.
+
+### Usage
+\```bash
+SESSION_ID=$(.claude/session-exec.sh)
+echo "command" > .claude/sessions/$SESSION_ID/cmd_input.txt
+sleep 1 && cat .claude/sessions/$SESSION_ID/cmd_status.txt
+cat .claude/sessions/$SESSION_ID/cmd_output.txt
+\```
+
+Start executor: `~/.claude/remote-executor.sh` (on remote VM)
+```
+
+## Shared Executor Location
+- Scripts: `~/.claude/remote-executor.sh`, `~/.claude/session-exec.sh`
+- Sessions: `~/.claude/sessions/`
+- Remote VM: wizard@10.0.10.10
diff --git a/.claude/skills/terraform-state-identity-mismatch/SKILL.md b/.claude/skills/terraform-state-identity-mismatch/SKILL.md
new file mode 100644
index 00000000..16c204d4
--- /dev/null
+++ b/.claude/skills/terraform-state-identity-mismatch/SKILL.md
@@ -0,0 +1,97 @@
+---
+name: terraform-state-identity-mismatch
+description: |
+  Fix Terraform "Unexpected Identity Change" errors during plan/apply. Use when:
+  (1) Terraform fails with "the Terraform Provider unexpectedly returned a different
+  identity", (2) State refresh shows identity mismatch between stored and current values,
+  (3) Resource was created but terraform apply timed out, leaving state inconsistent.
+  Solution involves removing and reimporting the affected resource.
+author: Claude Code
+version: 1.0.0
+date: 2026-01-28
+---
+
+# Terraform State Identity Mismatch Fix
+
+## Problem
+Terraform fails during plan or apply with an "Unexpected Identity Change" error,
+indicating the stored state identity doesn't match what the provider returns when
+reading the resource.
+
+## Context / Trigger Conditions
+- Error message contains: "Unexpected Identity Change: During the read operation,
+  the Terraform Provider unexpectedly returned a different identity"
+- Often occurs after a terraform apply times out mid-creation
+- Resource exists in the cluster/cloud but state is corrupted
+- Common with Kubernetes provider after deployment rollout timeouts
+
+## Solution
+
+### Step 1: Identify the affected resource
+The error message includes the resource address:
+```
+with module.kubernetes_cluster.module.resume["resume"].kubernetes_deployment.resume
+```
+
+### Step 2: Remove from state
+```bash
+terraform state rm 'module.kubernetes_cluster.module.resume["resume"].kubernetes_deployment.resume'
+```
+Note: Use single quotes around the address to handle brackets properly.
+
+### Step 3: Import the resource back
+```bash
+terraform import 'module.kubernetes_cluster.module.resume["resume"].kubernetes_deployment.resume' <namespace>/<deployment-name>
+```
+For Kubernetes deployments, the import ID is `namespace/deployment-name`.
+
+### Step 4: Verify with plan
+```bash
+terraform plan -target=<resource-address>
+```
+Should show minimal or no changes if import was successful.
+
+### Step 5: Apply to sync any drift
+```bash
+terraform apply -target=<resource-address>
+```
+
+## Verification
+- `terraform plan` runs without identity errors
+- `terraform apply` completes successfully
+- Resource still exists and functions correctly
+
+## Example
+**Error:**
+```
+Error: Unexpected Identity Change
+
+Current Identity: cty.ObjectVal(map[string]cty.Value{"api_version":cty.NullVal...})
+New Identity: cty.ObjectVal(map[string]cty.Value{"api_version":cty.StringVal("apps/v1")...})
+
+with module.kubernetes_cluster.module.resume["resume"].kubernetes_deployment.resume
+```
+
+**Fix:**
+```bash
+terraform state rm 'module.kubernetes_cluster.module.resume["resume"].kubernetes_deployment.resume'
+# Output: Removed ... Successfully removed 1 resource instance(s).
+ +terraform import 'module.kubernetes_cluster.module.resume["resume"].kubernetes_deployment.resume' resume/resume +# Output: Import successful! + +terraform apply -target=module.kubernetes_cluster.module.resume -auto-approve +# Output: Apply complete! Resources: 0 added, 1 changed, 0 destroyed. +``` + +## Notes +- This is a provider bug, not user error - consider reporting to provider maintainers +- The resource continues to work fine; only the terraform state is affected +- Always verify the resource exists before importing (don't import non-existent resources) +- For Kubernetes resources, import IDs are typically `namespace/name` +- For AWS resources, import IDs vary by resource type (check provider docs) +- Consider adding `-lock=false` if state locking causes issues during recovery + +## See Also +- Terraform state management documentation +- Kubernetes provider import documentation
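+
+## Scripted Recovery (Sketch)
+
+The remove/import/verify cycle can be wrapped in a few lines of bash. A minimal sketch, assuming a Kubernetes deployment whose import ID is `<namespace>/<name>`:
+
+```bash
+#!/bin/bash
+set -euo pipefail
+
+# Resource address and import ID from the example above
+ADDR='module.kubernetes_cluster.module.resume["resume"].kubernetes_deployment.resume'
+IMPORT_ID='resume/resume' # namespace/deployment-name
+
+terraform state rm "$ADDR"
+terraform import "$ADDR" "$IMPORT_ID"
+terraform plan -target="$ADDR" # should show little or no drift
+```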