Computer Science > Cryptography and Security

arXiv:2310.11409 (cs)

[Submitted on 17 Oct 2023 (v1), last revised 15 Oct 2025 (this version, v6)]

Title:LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks

Authors:Andreas Happe, Aaron Kaplan, Juergen Cito

Abstract:Penetration-testing is crucial for identifying system vulnerabilities, with privilege-escalation being a critical subtask to gain elevated access to protected resources. Large Language Models (LLMs) present new avenues for automating these security practices by emulating human behavior. However, a comprehensive understanding of LLMs' efficacy and limitations in performing autonomous Linux privilege-escalation attacks remains under-explored. To address this gap, we introduce hackingBuddyGPT, a fully automated LLM-driven prototype designed for autonomous Linux privilege-escalation. We curated a novel, publicly available Linux privilege-escalation benchmark, enabling controlled and reproducible evaluation.
Our empirical analysis assesses the quantitative success rates and qualitative operational behaviors of various LLMs -- GPT-3.5-Turbo, GPT-4-Turbo, and Llama3 -- against baselines of human professional pen-testers and traditional automated tools. We investigate the impact of context management strategies, different context sizes, and various high-level guidance mechanisms on LLM performance.
Results show that GPT-4-Turbo demonstrates high efficacy, successfully exploiting 33-83% of vulnerabilities, a performance comparable to human pen-testers (75%). In contrast, local models like Llama3 exhibited limited success (0-33%), and GPT-3.5-Turbo achieved moderate rates (16-50%). We show that both high-level guidance and state-management through LLM-driven reflection significantly boost LLM success rates.
Qualitative analysis reveals both LLMs' strengths and weaknesses in generating valid commands and highlights challenges in common-sense reasoning, error handling, and multi-step exploitation, particularly with temporal dependencies. Cost analysis indicates that GPT-4-Turbo can achieve human-comparable performance at competitive costs, especially with optimized context management.

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2310.11409 [cs.CR]
	(or arXiv:2310.11409v6 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2310.11409

Submission history

From: Andreas Happe
[v1] Tue, 17 Oct 2023 17:15:41 UTC (261 KB)
[v2] Mon, 23 Oct 2023 16:48:02 UTC (261 KB)
[v3] Tue, 19 Mar 2024 14:23:07 UTC (262 KB)
[v4] Thu, 1 Aug 2024 06:42:27 UTC (292 KB)
[v5] Tue, 18 Feb 2025 12:53:47 UTC (514 KB)
[v6] Wed, 15 Oct 2025 10:14:34 UTC (416 KB)
