infra/.claude/skills/python-filename-sanitization/SKILL.md
2026-02-06 20:10:02 +00:00

5.7 KiB

name description author version date
python-filename-sanitization Secure filename sanitization pattern for Python web applications. Use when: (1) Accepting user-provided filenames for file operations, (2) Building file rename/upload functionality, (3) Preventing path traversal attacks (../../../etc/passwd), (4) Preventing shell injection through filenames, (5) FastAPI/Flask file handling. Provides regex-based whitelist approach with pathlib for safe file operations. Claude Code 1.0.0 2025-01-31

Python Filename Sanitization

Problem

User-provided filenames can contain malicious characters that enable path traversal attacks, shell injection, or filesystem corruption. Direct use of user input in file paths is a security vulnerability.

Context / Trigger Conditions

  • Building file upload, rename, or download functionality
  • User can specify filenames via API or form input
  • Files are stored on server filesystem
  • Need to prevent: ../, shell metacharacters, null bytes, etc.

Solution

Complete Sanitization Function

import re
from pathlib import Path

def sanitize_filename(filename: str, max_length: int = 200) -> str:
    """
    Sanitize a filename to prevent path traversal and shell injection.
    Only allows alphanumeric characters, spaces, hyphens, underscores,
    parentheses, and dots.
    """
    if not filename:
        raise ValueError("Filename cannot be empty")

    # Remove any path components (prevent path traversal)
    filename = Path(filename).name

    # Only allow safe characters: alphanumeric, space, hyphen, underscore, parentheses, dot
    # This regex removes anything that isn't in the allowed set
    safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)

    # Collapse multiple spaces/dots
    safe_filename = re.sub(r'\s+', ' ', safe_filename)
    safe_filename = re.sub(r'\.+', '.', safe_filename)

    # Strip leading/trailing whitespace and dots
    safe_filename = safe_filename.strip(' .')

    # Limit length
    if len(safe_filename) > max_length:
        safe_filename = safe_filename[:max_length]

    if not safe_filename:
        raise ValueError("Filename contains no valid characters")

    return safe_filename

FastAPI Integration Example

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from pathlib import Path

class RenameRequest(BaseModel):
    new_name: str

@router.patch("/files/{file_id}/rename")
async def rename_file(file_id: str, request: RenameRequest):
    """Rename a file with sanitized input."""
    file_dir = Path("/data/files") / file_id

    if not file_dir.exists():
        raise HTTPException(status_code=404, detail="File not found")

    # Find existing file
    files = list(file_dir.glob("*"))
    if not files:
        raise HTTPException(status_code=404, detail="No file found")

    current_file = files[0]
    current_extension = current_file.suffix

    # Sanitize the new name
    try:
        safe_name = sanitize_filename(request.new_name)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))

    # Preserve original extension
    if not safe_name.lower().endswith(current_extension.lower()):
        safe_name = safe_name + current_extension

    # Create new path (same directory, new filename)
    new_file = file_dir / safe_name

    # Check for conflicts
    if new_file.exists() and new_file != current_file:
        raise HTTPException(status_code=400, detail="A file with that name already exists")

    # Rename using pathlib (no shell commands!)
    current_file.rename(new_file)

    return {"status": "renamed", "new_filename": safe_name}

Key Security Principles

1. Whitelist, Don't Blacklist

# BAD: Trying to block dangerous characters
filename = filename.replace('../', '').replace('\x00', '')

# GOOD: Only allow known-safe characters
safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)

2. Use pathlib, Not Shell Commands

# BAD: Shell command (vulnerable to injection)
os.system(f'mv "{old_path}" "{new_path}"')

# GOOD: Pure Python (no shell)
old_path.rename(new_path)

3. Extract Basename First

# BAD: User could submit "../../../etc/passwd"
filename = user_input

# GOOD: Extract just the filename part
filename = Path(user_input).name

4. Validate After Sanitization

# Ensure something remains after sanitization
if not safe_filename:
    raise ValueError("Filename contains no valid characters")

Verification

# Test cases that should be handled safely
assert sanitize_filename("normal.txt") == "normal.txt"
assert sanitize_filename("../../../etc/passwd") == "etcpasswd"
assert sanitize_filename("file; rm -rf /") == "file rm -rf"
assert sanitize_filename("  spaces  .txt") == "spaces.txt"
assert sanitize_filename("$(whoami).txt") == "whoami.txt"

# Test cases that should raise errors
try:
    sanitize_filename("")  # Should raise ValueError
except ValueError:
    pass

try:
    sanitize_filename("$#@!")  # Should raise ValueError (no valid chars)
except ValueError:
    pass

Notes

  • This is intentionally restrictive; expand the regex if you need Unicode support
  • For Unicode filenames, consider unicodedata.normalize('NFKD', ...) first
  • Max length of 200 is conservative; filesystem limits vary (255 bytes typical)
  • Always preserve file extensions when renaming to avoid breaking file associations
  • Consider adding a UUID prefix for guaranteed uniqueness in upload scenarios

References