182 lines
5.7 KiB
Markdown
182 lines
5.7 KiB
Markdown
---
|
|
name: python-filename-sanitization
|
|
description: |
|
|
Secure filename sanitization pattern for Python web applications. Use when:
|
|
(1) Accepting user-provided filenames for file operations, (2) Building file
|
|
rename/upload functionality, (3) Preventing path traversal attacks (../../../etc/passwd),
|
|
(4) Preventing shell injection through filenames, (5) FastAPI/Flask file handling.
|
|
Provides regex-based whitelist approach with pathlib for safe file operations.
|
|
author: Claude Code
|
|
version: 1.0.0
|
|
date: 2025-01-31
|
|
---
|
|
|
|
# Python Filename Sanitization
|
|
|
|
## Problem
|
|
User-provided filenames can contain malicious characters that enable path traversal
|
|
attacks, shell injection, or filesystem corruption. Direct use of user input in
|
|
file paths is a security vulnerability.
|
|
|
|
## Context / Trigger Conditions
|
|
- Building file upload, rename, or download functionality
|
|
- User can specify filenames via API or form input
|
|
- Files are stored on server filesystem
|
|
- Need to prevent: `../`, shell metacharacters, null bytes, etc.
|
|
|
|
## Solution
|
|
|
|
### Complete Sanitization Function
|
|
```python
|
|
import re
|
|
from pathlib import Path
|
|
|
|
def sanitize_filename(filename: str, max_length: int = 200) -> str:
|
|
"""
|
|
Sanitize a filename to prevent path traversal and shell injection.
|
|
Only allows alphanumeric characters, spaces, hyphens, underscores,
|
|
parentheses, and dots.
|
|
"""
|
|
if not filename:
|
|
raise ValueError("Filename cannot be empty")
|
|
|
|
# Remove any path components (prevent path traversal)
|
|
filename = Path(filename).name
|
|
|
|
# Only allow safe characters: alphanumeric, space, hyphen, underscore, parentheses, dot
|
|
# This regex removes anything that isn't in the allowed set
|
|
safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)
|
|
|
|
# Collapse multiple spaces/dots
|
|
safe_filename = re.sub(r'\s+', ' ', safe_filename)
|
|
safe_filename = re.sub(r'\.+', '.', safe_filename)
|
|
|
|
# Strip leading/trailing whitespace and dots
|
|
safe_filename = safe_filename.strip(' .')
|
|
|
|
# Limit length
|
|
if len(safe_filename) > max_length:
|
|
safe_filename = safe_filename[:max_length]
|
|
|
|
if not safe_filename:
|
|
raise ValueError("Filename contains no valid characters")
|
|
|
|
return safe_filename
|
|
```
|
|
|
|
### FastAPI Integration Example
|
|
```python
|
|
from fastapi import APIRouter, HTTPException
|
|
from pydantic import BaseModel
|
|
from pathlib import Path
|
|
|
|
class RenameRequest(BaseModel):
|
|
new_name: str
|
|
|
|
@router.patch("/files/{file_id}/rename")
|
|
async def rename_file(file_id: str, request: RenameRequest):
|
|
"""Rename a file with sanitized input."""
|
|
file_dir = Path("/data/files") / file_id
|
|
|
|
if not file_dir.exists():
|
|
raise HTTPException(status_code=404, detail="File not found")
|
|
|
|
# Find existing file
|
|
files = list(file_dir.glob("*"))
|
|
if not files:
|
|
raise HTTPException(status_code=404, detail="No file found")
|
|
|
|
current_file = files[0]
|
|
current_extension = current_file.suffix
|
|
|
|
# Sanitize the new name
|
|
try:
|
|
safe_name = sanitize_filename(request.new_name)
|
|
except ValueError as e:
|
|
raise HTTPException(status_code=400, detail=str(e))
|
|
|
|
# Preserve original extension
|
|
if not safe_name.lower().endswith(current_extension.lower()):
|
|
safe_name = safe_name + current_extension
|
|
|
|
# Create new path (same directory, new filename)
|
|
new_file = file_dir / safe_name
|
|
|
|
# Check for conflicts
|
|
if new_file.exists() and new_file != current_file:
|
|
raise HTTPException(status_code=400, detail="A file with that name already exists")
|
|
|
|
# Rename using pathlib (no shell commands!)
|
|
current_file.rename(new_file)
|
|
|
|
return {"status": "renamed", "new_filename": safe_name}
|
|
```
|
|
|
|
## Key Security Principles
|
|
|
|
### 1. Whitelist, Don't Blacklist
|
|
```python
|
|
# BAD: Trying to block dangerous characters
|
|
filename = filename.replace('../', '').replace('\x00', '')
|
|
|
|
# GOOD: Only allow known-safe characters
|
|
safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)
|
|
```
|
|
|
|
### 2. Use pathlib, Not Shell Commands
|
|
```python
|
|
# BAD: Shell command (vulnerable to injection)
|
|
os.system(f'mv "{old_path}" "{new_path}"')
|
|
|
|
# GOOD: Pure Python (no shell)
|
|
old_path.rename(new_path)
|
|
```
|
|
|
|
### 3. Extract Basename First
|
|
```python
|
|
# BAD: User could submit "../../../etc/passwd"
|
|
filename = user_input
|
|
|
|
# GOOD: Extract just the filename part
|
|
filename = Path(user_input).name
|
|
```
|
|
|
|
### 4. Validate After Sanitization
|
|
```python
|
|
# Ensure something remains after sanitization
|
|
if not safe_filename:
|
|
raise ValueError("Filename contains no valid characters")
|
|
```
|
|
|
|
## Verification
|
|
```python
|
|
# Test cases that should be handled safely
|
|
assert sanitize_filename("normal.txt") == "normal.txt"
|
|
assert sanitize_filename("../../../etc/passwd") == "etcpasswd"
|
|
assert sanitize_filename("file; rm -rf /") == "file rm -rf"
|
|
assert sanitize_filename(" spaces .txt") == "spaces.txt"
|
|
assert sanitize_filename("$(whoami).txt") == "whoami.txt"
|
|
|
|
# Test cases that should raise errors
|
|
try:
|
|
sanitize_filename("") # Should raise ValueError
|
|
except ValueError:
|
|
pass
|
|
|
|
try:
|
|
sanitize_filename("$#@!") # Should raise ValueError (no valid chars)
|
|
except ValueError:
|
|
pass
|
|
```
|
|
|
|
## Notes
|
|
- This is intentionally restrictive; expand the regex if you need Unicode support
|
|
- For Unicode filenames, consider `unicodedata.normalize('NFKD', ...)` first
|
|
- Max length of 200 is conservative; filesystem limits vary (255 bytes typical)
|
|
- Always preserve file extensions when renaming to avoid breaking file associations
|
|
- Consider adding a UUID prefix for guaranteed uniqueness in upload scenarios
|
|
|
|
## References
|
|
- [OWASP Path Traversal](https://owasp.org/www-community/attacks/Path_Traversal)
|
|
- [CWE-22: Path Traversal](https://cwe.mitre.org/data/definitions/22.html)
|
|
- [Python pathlib documentation](https://docs.python.org/3/library/pathlib.html)
|