📄 skill · Active · Promising · critical
BoxPwnr
by 0ca
A modular framework for benchmarking LLMs and agentic strategies on security challenges across HackTheBox, TryHackMe, PortSwigger Labs, Cybench, picoCTF and more.
Supported Platforms
🤖 Claude Code · 🧠 Codex
Quality Breakdown
87 / 200
Content Signals
Repo Health
Multi-platform bonus: +5 pts if the tool supports 2+ platforms. The score is derived from 12 structural signals, not stars or popularity.
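The multi-platform bonus rule can be illustrated with a minimal sketch. The function name, the constants, and the choice to count distinct platform names case-insensitively are assumptions made for illustration. The listing does not publish its scoring code, and the 12 structural signals themselves are not modeled here.

```python
# Hypothetical sketch of the multi-platform bonus rule stated in the listing.
# Names and the de-duplication behaviour are assumptions, not the site's code.
MULTI_PLATFORM_BONUS = 5      # points awarded, per the listing
MIN_PLATFORMS_FOR_BONUS = 2   # "2+ platforms", per the listing


def multi_platform_bonus(platforms: list[str]) -> int:
    """Return the bonus points for a tool given its supported platforms."""
    # Assumption: duplicates and blank entries do not count toward the threshold.
    distinct = {p.strip().lower() for p in platforms if p.strip()}
    return MULTI_PLATFORM_BONUS if len(distinct) >= MIN_PLATFORMS_FOR_BONUS else 0


if __name__ == "__main__":
    print(multi_platform_bonus(["Claude Code", "Codex"]))
    print(multi_platform_bonus(["Codex"]))
```

With the two platforms listed for this tool, the sketch returns the 5-point bonus. How that bonus combines with the other signals to reach the displayed total is not stated in the listing.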
Trust & Verification
critical
Manual security review required. Use with extreme caution.
Active
Updated within the last 90 days. Actively maintained.
Risk Assessment
- Contains a multi-agent orchestration system (.claude/agents directory with implement-new-platform.md)
- Autonomous LLM-based executor framework with multiple solver types (single_loop, claude_code, hacksynth, external)
- SSH executor capability (src/boxpwnr/executors/ssh/) enabling remote command execution
- Docker executor capability (src/boxpwnr/executors/docker/) for containerized command execution
- PTY manager that handles interactive terminal sessions without explicit user approval gates
- LLM orchestrator core (src/boxpwnr/core/orchestrator.py) coordinating autonomous agent behavior