Apache Tika XXE Vulnerability (CVE-2025-66516) – Critical PDF Parsing Exploit
A severe flaw has been discovered in Apache Tika, the widely adopted framework for document parsing and content extraction. Tracked as CVE-2025-66516 with a CVSS score of 10.0, the issue enables XML External Entity (XXE) attacks through specially crafted PDF files.
This new advisory replaces CVE-2025-54988. Although the earlier notice pointed to the PDF parser component, deeper investigation showed the underlying problem resides in tika-core. As a result, updating just the PDF module does not mitigate the risk; you need to upgrade the core library to ensure proper protection.
What is Apache Tika?
Apache Tika is an open-source toolkit used for detecting and extracting metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
It is a staple in document processing workflows, search indexing, and content analysis applications across industries like finance, legal, government, and media.
What are the affected modules?
Because the flaw sits in the core library, this vulnerability affects a broad range of versions. If you are using any of the following, you are at risk:
- Apache Tika Core: org.apache.tika:tika-core versions 1.13 through 3.2.1
- Apache Tika Parsers: org.apache.tika:tika-parsers versions 1.13 up to, but not including, 2.0.0. In 1.x releases, the PDFParser was bundled in this module.
- Apache Tika PDF Parser Module: org.apache.tika:tika-parser-pdf-module versions 2.0.0 through 3.2.1
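The ranges above can be turned into a small check, for example inside a build audit script. The following is a minimal sketch, not an official tool: the class name TikaVersionCheck and its methods are hypothetical, and the parser only understands plain numeric versions, ignoring qualifiers such as -SNAPSHOT.

```java
class TikaVersionCheck {

    /** Parses up to three leading numeric components; qualifiers such as "-SNAPSHOT" are ignored. */
    static int[] parse(String version) {
        String numeric = version.trim().split("-", 2)[0];
        String[] parts = numeric.split("\\.");
        int[] out = new int[3]; // missing components default to 0, so "1.13" becomes 1.13.0
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** Compares two versions component by component. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    /** Returns true when the artifact and version fall inside the ranges listed in the advisory. */
    static boolean isAffected(String artifactId, String version) {
        switch (artifactId) {
            case "tika-core":
                return compare(version, "1.13") >= 0 && compare(version, "3.2.1") <= 0;
            case "tika-parsers":
                // Affected from 1.13 up to, but not including, 2.0.0
                return compare(version, "1.13") >= 0 && compare(version, "2.0.0") < 0;
            case "tika-parser-pdf-module":
                return compare(version, "2.0.0") >= 0 && compare(version, "3.2.1") <= 0;
            default:
                return false;
        }
    }
}
```

For instance, TikaVersionCheck.isAffected("tika-core", "2.9.0") returns true, while the same call with "3.2.2" returns false.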
How to identify if you are vulnerable
Check your build configuration files (Maven pom.xml or Gradle build.gradle) for these vulnerable dependency definitions.
Vulnerable Maven Configuration:
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>2.9.0</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-pdf-module</artifactId>
  <version>2.9.0</version>
</dependency>
Vulnerable Code Implementation
The vulnerability is triggered by standard usage of the library. If your application parses untrusted PDF files using the standard AutoDetectParser or Tika facade, it is susceptible to the XXE attack embedded in the XFA data.
import org.apache.tika.Tika;
import java.io.File;

public class DocumentProcessor {
    public void processUpload(File uploadedFile) {
        try {
            // Instantiating Tika (uses default configuration including vulnerable parsers)
            Tika tika = new Tika();
            // The vulnerability is triggered here when parsing a malicious PDF
            // No special configuration is required to be vulnerable
            String content = tika.parseToString(uploadedFile);
            System.out.println("Extracted content: " + content);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
What use cases are affected by CVE-2025-66516?
This vulnerability is triggered whenever Tika processes a PDF that includes a maliciously crafted XFA (XML Forms Architecture) form. An attacker only needs your system to parse the PDF; no authentication or user interaction is required.
The impact can be significant – attackers may read sensitive files on the host (including regulated data), force resource exhaustion that leads to service disruption, or even use the XXE flaw to perform server-side request forgery (SSRF).
Any application that lets users upload documents for automated analysis, indexing, or preview generation is at risk.
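To make the flaw class concrete, the sketch below shows the usual JDK-level defense against XXE: an XML parser configured to refuse any DOCTYPE declaration, which is where external entities are defined. This is an illustration of the general technique, not Tika's actual fix, and it does not replace upgrading; the class name HardenedXmlParsing and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class HardenedXmlParsing {

    /** Builds a DOM parser that rejects DOCTYPE declarations and therefore external entities. */
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse any document that carries a DOCTYPE, the place where entities are declared
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    /** Returns true if the XML parses under the hardened settings, false if the parser rejects it. */
    static boolean parsesCleanly(String xml) {
        try {
            newHardenedBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // rejected, for example because a DOCTYPE was present
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With these settings, ordinary XML still parses, while a payload that declares an entity such as SYSTEM "file:///etc/passwd" is rejected before the entity can be resolved.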
How to prevent it? (Mitigations & Workarounds)
If an immediate upgrade to tika-core 3.2.2 is not feasible, you must implement strict defense-in-depth measures to neutralize the attack vector.
Disable the PDF Parser (Configuration Change)
The most effective workaround is to disable Tika’s PDF parsing capability entirely. This prevents the application from processing the malicious XFA content. You can do this by configuring a tika-config.xml file to exclude the PDF parser.
Input Validation & Sanitization
- Reject XFA: If possible, use a lightweight pre-processor (like pdfid.py or qpdf) to scan PDFs before passing them to Tika. Reject any PDF that contains the string /AcroForm or a reference to XFA forms.
- WAF Rules: Deploy Web Application Firewall rules to detect and block XML payloads that attempt to define external entities, although such payloads are difficult to detect when embedded deep within a binary PDF.
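The XFA rejection rule above can also be approximated in Java when an external tool is not available. This is a coarse sketch under stated assumptions: the class name XfaPreFilter is hypothetical, the whole file is read into memory, and a raw byte scan will miss markers hidden inside compressed object streams or written with hex-escaped names, so it reduces exposure rather than guaranteeing safety.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class XfaPreFilter {

    // Name tokens that indicate an interactive form and possibly XFA content
    private static final String[] MARKERS = {"/XFA", "/AcroForm"};

    /** Returns true if the raw bytes contain any of the form markers. */
    static boolean looksLikeForm(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content survives the conversion
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        for (String marker : MARKERS) {
            if (raw.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Convenience overload that reads the file fully before scanning it. */
    static boolean looksLikeForm(Path file) throws IOException {
        return looksLikeForm(Files.readAllBytes(file));
    }
}
```

An upload handler could call XfaPreFilter.looksLikeForm(path) and reject or quarantine the file before it ever reaches Tika.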
Securing Your Application with Upwind
While patching is the primary defense, ensuring your runtime environment is secure is critical. Upwind protects your applications from the active exploitation of CVE-2025-66516 through:
- Exploit Detection: Upwind has added dedicated runtime detections to identify and block malicious inputs attempting to trigger the XXE flaw in Tika.
- Real-Time Behavioral Monitoring: Our sensor technology detects the specific anomalies associated with this attack, such as unauthorized file access or unexpected outbound network connections, and stops them before data is exfiltrated.
- Deep Dependency Visibility: We automatically map your entire application, including nested Java dependencies like tika-core and tika-parsers, to instantly highlight vulnerable workloads across your infrastructure.
For assistance in identifying vulnerable components or to learn more about our runtime protections, reach out to [email protected].
