Symlink Attack in Cassandra
How Symlink Attack Manifests in Cassandra
Cassandra's distributed architecture relies heavily on filesystem operations for data persistence, including commit logs, SSTables, and snapshots. A symlink attack exploits this by manipulating symbolic links to redirect Cassandra's file operations to unintended locations. This typically occurs when an application layer (often an API) accepts user-controlled file paths without proper validation and passes them to Cassandra's administrative operations.
Attack Vector via API Endpoints: Consider a REST API endpoint designed to restore a database snapshot. A vulnerable implementation might pass a user-supplied path parameter directly to Cassandra's nodetool refresh or to programmatic snapshot restoration APIs. An attacker could provide a traversal path like /var/lib/cassandra/data/keyspace1/../../../../../etc/passwd (five .. segments are needed to climb from keyspace1 to the filesystem root) or, more insidiously, replace a legitimate snapshot directory with a symlink pointing to a sensitive system file. When Cassandra attempts to read the snapshot, it follows the symlink and may expose sensitive data (e.g., /etc/shadow) in error messages or logs, or even overwrite critical system files if the operation involves writes.
Cassandra-Specific Code Paths: The danger surfaces in several Cassandra subsystems:
- Snapshot Operations: org.apache.cassandra.service.StorageService methods like loadNewSSTables or restoreSnapshot that take filesystem paths. If these paths are attacker-controlled, symlinks can cause arbitrary file read/write.
- Commit Log Recovery: During startup, Cassandra replays commit logs from commitlog_directory. A symlink here could redirect writes to arbitrary locations, potentially filling up critical partitions or overwriting files.
- SSTable Loading: When adding a new data directory or during repair, Cassandra scans directories for SSTables. A symlink within a data directory could cause Cassandra to read files outside the intended data path, leading to data corruption or information disclosure.
Realistic Attack Scenario: An API endpoint POST /api/v1/snapshots/restore accepts JSON: { "keyspace": "user_data", "snapshot_name": "backup_2024" }. The backend constructs a path: String snapshotPath = "/var/lib/cassandra/data/" + keyspace + "/snapshots/" + snapshotName; and calls StorageService.restoreSnapshot(keyspace, snapshotPath). An attacker can abuse this in two ways. With plain traversal, submitting snapshot_name as ../../../../../../etc/cron.d/malicious climbs the six directory levels from the snapshots directory to the root, so the constructed path resolves to /etc/cron.d/malicious. With a symlink, an attacker who can write to the snapshots directory first places a link named backup_2024 that points to /etc/cron.d/malicious and then submits the innocuous-looking name backup_2024, which passes any check for .. sequences. In either case Cassandra might parse the target as an SSTable (causing errors) or, in a write scenario, overwrite the cron file. Even without prior placement, if the API allows specifying an absolute path (e.g., via a misconfigured snapshot_path parameter), direct symlink targeting is possible.
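The effect of the string concatenation in this scenario can be reproduced without touching a real filesystem. The following is a minimal sketch, not the backend's actual code: the class SnapshotPathDemo and its methods are hypothetical names, and the base path is taken from the scenario. It uses Path.normalize(), which collapses .. segments lexically and does not resolve symlinks. The number of .. segments an attacker needs depends on the depth of the base path; with the path shown here, six are needed to reach the root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SnapshotPathDemo {
    // Base directory from the scenario; an assumption, adjust to data_file_directories.
    static final Path DATA_ROOT = Paths.get("/var/lib/cassandra/data");

    // Mirrors the vulnerable string concatenation (hypothetical backend code).
    static Path naivePath(String keyspace, String snapshotName) {
        return Paths.get("/var/lib/cassandra/data/" + keyspace + "/snapshots/" + snapshotName);
    }

    // Lexical check only: normalize() collapses ".." but never follows symlinks.
    static boolean escapesDataRoot(Path candidate) {
        return !candidate.normalize().startsWith(DATA_ROOT);
    }

    public static void main(String[] args) {
        Path benign = naivePath("user_data", "backup_2024");
        Path hostile = naivePath("user_data", "../../../../../../etc/cron.d/malicious");
        System.out.println(benign.normalize() + " escapes: " + escapesDataRoot(benign));
        System.out.println(hostile.normalize() + " escapes: " + escapesDataRoot(hostile));
    }
}
```

Because the check is purely lexical, it catches the traversal payload but would not notice a symlink named backup_2024 planted inside the snapshots directory; that case requires resolving the real path on disk.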
Cassandra-Specific Detection
Detecting symlink vulnerabilities in Cassandra requires examining both the API layer and Cassandra's operational behavior. The core issue is a lack of canonical path validation before filesystem operations. middleBrick's security scan includes an Input Validation check that actively probes API endpoints for path traversal and symlink handling weaknesses. During a scan, middleBrick tests endpoints that accept file paths or identifiers that influence storage operations (e.g., snapshot names, data directory parameters, import/export paths).
Scanning with middleBrick: Submit the target API URL to middleBrick. The scanner will:
- Identify endpoints that accept path-like parameters (e.g., via OpenAPI path parameters or query strings named path, file, directory).
- Send payloads containing symlink indicators (e.g., ../ sequences, absolute paths like /tmp/malicious_link) and observe responses.
- Analyze error messages, HTTP status codes, and response bodies for signs of filesystem interaction (e.g., NoSuchFileException, InvalidSSTableException, or unexpected data leakage).
- Cross-reference findings with the OpenAPI spec to map vulnerable parameters to Cassandra operations (e.g., a parameter named snapshot_name linked to a restoreSnapshot action in the spec).
Manual Detection Indicators: Look for API responses that reveal internal filesystem paths, such as:
- File /var/lib/cassandra/data/keyspace1/../../../../../etc/passwd not found
- Invalid SSTable magic number at /tmp/uploaded_file (indicating a non-SSTable file was read)
- Unexpected file creation in system directories (check /tmp or /etc after API calls).
Cassandra Log Review: Examine system.log and debug.log for IOException or SecurityException when processing API-triggered operations. Logs showing access to paths outside cassandra.yaml configured directories (data_file_directories, commitlog_directory) are strong indicators.
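The log review described above can be partly automated. The sketch below is one possible heuristic, not a Cassandra or middleBrick feature: the class LogPathAuditor, the regular expression, and the two allowed roots are assumptions for illustration, and the real roots should be read from cassandra.yaml. It extracts anything that looks like an absolute path from a log line, normalizes it, and reports paths that fall outside the configured directories.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogPathAuditor {
    // Assumed values; use data_file_directories and commitlog_directory from cassandra.yaml.
    static final List<Path> ALLOWED_ROOTS = List.of(
            Paths.get("/var/lib/cassandra/data"),
            Paths.get("/var/lib/cassandra/commitlog"));

    // Rough heuristic: a "/" not preceded by a word character or dot,
    // followed by path-like characters. Expect some false positives (e.g., URLs).
    private static final Pattern ABS_PATH = Pattern.compile("(?<![\\w.])/[\\w./-]+");

    // Returns the normalized absolute paths in the line that lie outside ALLOWED_ROOTS.
    static List<String> suspiciousPaths(String logLine) {
        List<String> hits = new ArrayList<>();
        Matcher m = ABS_PATH.matcher(logLine);
        while (m.find()) {
            Path p = Paths.get(m.group()).normalize();
            boolean allowed = ALLOWED_ROOTS.stream().anyMatch(p::startsWith);
            if (!allowed) {
                hits.add(p.toString());
            }
        }
        return hits;
    }
}
```

Feeding each line of system.log through suspiciousPaths and alerting on non-empty results gives a coarse first filter; paths that legitimately appear in logs (JVM install directories, configuration files) would need to be added to the allowed roots.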
Cassandra-Specific Remediation
Remediation focuses on strict path validation at the application layer and hardening Cassandra's filesystem permissions. Cassandra itself does not provide built-in symlink protection for all operations; thus, the application must ensure paths are canonical and within allowed directories.
1. Application-Level Path Canonicalization: Before passing any user-supplied path to Cassandra's Java driver or nodetool commands, resolve it to its canonical path and verify it resides within an approved base directory. Example in Java:
import java.io.IOException;
import java.nio.file.*;

public class CassandraPathValidator {
    private static final Path ALLOWED_BASE = Paths.get("/var/lib/cassandra/data");

    public static Path validateAndResolve(String userSuppliedPath) throws SecurityException {
        try {
            // toRealPath() with no options follows symlinks and removes "." and "..".
            // Passing LinkOption.NOFOLLOW_LINKS would leave symlinks unresolved.
            Path resolved = Paths.get(userSuppliedPath).toRealPath();
            // Resolve the base the same way so both sides of the comparison are canonical.
            Path base = ALLOWED_BASE.toRealPath();
            // Ensure the resolved path is within the allowed base directory
            if (!resolved.startsWith(base)) {
                throw new SecurityException("Path outside allowed data directory: " + resolved);
            }
            return resolved;
        } catch (IOException e) {
            // Thrown when the path does not exist or a link target cannot be read
            throw new SecurityException("Invalid path or symlink target inaccessible", e);
        }
    }
}

// Usage in API handler:
String userSnapshot = request.getParameter("snapshot_name");
Path safePath = CassandraPathValidator.validateAndResolve("/var/lib/cassandra/data/keyspace1/snapshots/" + userSnapshot);
// Pass safePath to Cassandra admin API
2. Restrict Filesystem Permissions: Ensure Cassandra's data directories (data_file_directories, commitlog_directory) are owned by the cassandra user and have strict permissions (750 or 700). Prevent other users from creating symlinks within these directories. Example on Linux:
sudo chown -R cassandra:cassandra /var/lib/cassandra
sudo chmod -R 750 /var/lib/cassandra
# Ensure no world-writable directories
find /var/lib/cassandra -type d -perm -o=w -exec chmod o-w {} \;
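Tightening permissions prevents new links from being planted, but it does not reveal links that already exist. As a complement, the following sketch walks a directory tree and lists every symbolic link in it. SymlinkAudit is a hypothetical helper, and the default root is the data path used in this article; treat both as assumptions. Files.walk does not follow links unless FileVisitOption.FOLLOW_LINKS is passed, so the traversal itself stays inside the tree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SymlinkAudit {
    // Walks root without following links and returns every symbolic link found.
    static List<Path> findSymlinks(Path root) throws IOException {
        try (Stream<Path> entries = Files.walk(root)) {
            return entries.filter(Files::isSymbolicLink)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default root is an assumption; pass the real data directory as an argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        for (Path link : findSymlinks(root)) {
            System.out.println(link + " -> " + Files.readSymbolicLink(link));
        }
    }
}
```

Any link reported here deserves a manual look: some deployments use symlinks deliberately (for example to relocate a directory to another volume), so the output is a review list rather than a list of confirmed attacks.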
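3. Disable Symlink Following in Cassandra (if possible): Cassandra does not offer a global switch to disable symlink following. Setting disk_failure_policy: stop in cassandra.yaml halts the node on filesystem errors, which reduces the impact of corrupted reads but does not prevent a link from being followed. For prevention, rely on OS-level protections: on Linux, keep the fs.protected_symlinks sysctl enabled (it restricts following links in world-writable sticky directories such as /tmp), and on kernels that support it (5.10 and later) consider mounting the data volume with the nosymfollow option. Mount options such as noatime and nodiratime only control access-time updates and have no effect on symlink resolution. Chroot jails for the Cassandra process are a further option, though complex in distributed setups.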
4. API Design Hardening: Avoid accepting raw filesystem paths in APIs. Instead, use abstract identifiers (e.g., snapshot IDs) and map them to internal paths server-side. If paths are necessary, whitelist allowed characters and reject any containing .. or symlink indicators.
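One way to apply this is to accept only short identifiers and build the path on the server. The sketch below is illustrative: SnapshotLocator, the 64-character limit, and the character allow-list are assumptions, and the directory layout mirrors the simplified keyspace/snapshots/name path used in this article's scenario.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

final class SnapshotLocator {
    // Assumed base directory; take the real value from cassandra.yaml.
    private static final Path DATA_ROOT = Paths.get("/var/lib/cassandra/data");
    // Allow-list: letters, digits, underscore, hyphen; 1 to 64 characters.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    // Maps client-supplied identifiers to a server-side path, rejecting anything path-like.
    static Path locate(String keyspace, String snapshotId) {
        requireSafe("keyspace", keyspace);
        requireSafe("snapshot id", snapshotId);
        Path p = DATA_ROOT.resolve(keyspace).resolve("snapshots").resolve(snapshotId).normalize();
        // Defense in depth: the allow-list already excludes "/" and ".".
        if (!p.startsWith(DATA_ROOT)) {
            throw new IllegalArgumentException("Resolved path escapes data root");
        }
        return p;
    }

    private static void requireSafe(String label, String value) {
        if (value == null || !SAFE_NAME.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid " + label);
        }
    }
}
```

The allow-list blocks traversal strings and absolute paths, but it cannot tell whether the resulting directory is itself a symlink, so the returned path should still go through a real-path check before being handed to Cassandra.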
5. Monitoring and Alerts: Use middleBrick's Pro plan with continuous monitoring to track Input Validation scores over time. Set up GitHub Action gates to fail builds if new endpoints with risky path parameters are introduced. Integrate alerts (Slack/Teams) for score drops indicating potential regression.