Understanding YAML Deserialization Vulnerabilities and Mitigation

Chapter 1: Introduction to YAML Deserialization

Deserialization attacks are increasingly prevalent in programming languages like Java, Python, and Ruby. These vulnerabilities arise when data streams are deserialized without adequate validation, potentially allowing the execution of remote code. In this article, we will explore a specific deserialization technique within the context of YAML.

Before we delve into YAML deserialization, let's clarify the concepts of serialization and deserialization.

Section 1.1: What is Serialization?

Consider an online game where your character has various attributes such as username, avatar, clothing, rank, and weapons. How are these attributes communicated and stored on the server? The answer lies in serialization.

Defining Serialization

Serialization is the process of converting an object into a byte stream or a flat structure. This byte stream, often referred to as a simplified version of the object, can be transmitted over networks or saved in files, databases, and more.

Deserialization, the counterpart to serialization, is the process of transforming these byte streams back into their original object form.
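
As a minimal sketch of this round trip, the snippet below serializes a hypothetical game character to bytes and restores it. It uses Python's standard json module rather than YAML, and the field names are assumptions made for illustration only.

```python
import json

# Hypothetical character record; the field names are illustrative only.
character = {
    "username": "player_one",
    "rank": 7,
    "weapons": ["sword", "bow"],
}

# Serialization: in-memory object -> flat byte stream.
payload = json.dumps(character).encode("utf-8")

# Deserialization: byte stream -> equivalent in-memory object.
restored = json.loads(payload.decode("utf-8"))

print(type(payload).__name__)  # bytes
print(restored == character)   # True
```

The byte string in payload is what would travel over the network or be written to a file or database.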

Potential Vulnerabilities

The vulnerability arises when a web server accepts a serialized value without validating it before deserializing. If users can manipulate the serialized value, it could lead to unexpected behaviors during deserialization.
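
One way to picture the missing validation step is sketched below. The parse_character helper, the allowed field names, and the rank limit are all hypothetical; the point is only that the server checks the deserialized value against what it expects before trusting it.

```python
import json

ALLOWED_FIELDS = {"username", "rank"}  # assumed schema for this sketch
MAX_RANK = 100                         # assumed upper bound


def parse_character(raw: str) -> dict:
    """Deserialize a client-supplied record, then validate it before use."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    unexpected = set(data) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")

    if not isinstance(data.get("username"), str):
        raise ValueError("username must be a string")

    rank = data.get("rank")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(rank, bool) or not isinstance(rank, int):
        raise ValueError("rank must be an integer")
    if not 0 <= rank <= MAX_RANK:
        raise ValueError("rank out of range")

    return data
```

A well-formed record such as '{"username": "player_one", "rank": 7}' is returned unchanged, while a tampered one that adds "is_admin": true raises ValueError instead of reaching application logic.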

Section 1.2: What is YAML?

YAML, a recursive acronym for "YAML Ain't Markup Language" (it originally stood for "Yet Another Markup Language"), is described by Wikipedia as "a human-readable data-serialization language." It is commonly used for configuration files and in scenarios where data needs to be stored or transmitted. YAML uses Python-style indentation for nesting and also offers a compact flow style that uses [] for lists and {} for maps.

Unlike a programming language, YAML only describes data, and the same structure can often be written in more than one way, for example in indented block style or in compact flow style.

Example of YAML Serialization

Consider the following un-serialized data:

{
    'name': 'Manish',
    'age': 12,
    'skills': ['programming', 'soft skills']
}

Upon serialization, it would appear as:

name: Manish
age: 12
skills:
- programming
- soft skills
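
The conversion can be reproduced with PyYAML, as in the sketch below. It assumes PyYAML 5.1 or later, because the sort_keys argument used to preserve key order is not available in older releases. The flow-style string is an illustration of the compact [] and {} notation mentioned earlier.

```python
import yaml  # PyYAML, assumed to be version 5.1 or later

data = {
    "name": "Manish",
    "age": 12,
    "skills": ["programming", "soft skills"],
}

# Block style: one key per line, list items introduced by "- ".
# sort_keys=False keeps the insertion order of the dictionary.
block_text = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)
print(block_text)

# Flow style: the same data written compactly with {} and [].
flow_text = "{name: Manish, age: 12, skills: [programming, soft skills]}"

# Both notations deserialize to the same Python object.
print(yaml.safe_load(flow_text) == yaml.safe_load(block_text) == data)  # True
```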

Understanding Vulnerabilities in YAML

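While this example is not inherently vulnerable, problems arise when a loader does more than rebuild plain data: language-specific tags in a YAML document can instruct it to construct arbitrary objects or call functions during deserialization. Libraries like PyYAML and ruamel.yaml are widely used and are prone to deserialization attacks when their unsafe loading methods are applied to untrusted input.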

Serialization Methods

PyYAML's serialization functions include:

dump()
dump_all()
safe_dump()
safe_dump_all()

The corresponding deserialization functions are:

load()
load_all()
full_load()
full_load_all()
unsafe_load()
unsafe_load_all()
safe_load()
safe_load_all()
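
The difference between the plain and safe_ variants can be seen on the serialization side with a short sketch. The Character class is a hypothetical application type introduced only for this illustration.

```python
import yaml  # PyYAML


class Character:
    """Hypothetical application class, used only for illustration."""

    def __init__(self, name):
        self.name = name


hero = Character("Manish")

# dump() can represent arbitrary Python objects by writing a
# Python-specific tag into the document.
tagged = yaml.dump(hero)
print(tagged)

# safe_dump() handles only standard types and refuses anything else.
try:
    yaml.safe_dump(hero)
    safe_dump_refused = False
except yaml.representer.RepresenterError:
    safe_dump_refused = True

print(safe_dump_refused)  # True
```

The tag written by dump() is exactly the kind of instruction that an unsafe loader will later act on, which is why the safe_ functions decline to produce or consume it.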

Chapter 2: Creating Payloads

To illustrate the creation of payloads, we utilize the __reduce__() method, which functions with both PyYAML and ruamel.yaml, assuming the backend operates on a Unix-based system.

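import subprocess

import yaml

class Payload(object):
    # __reduce__ tells the serializer how to rebuild this object:
    # call subprocess.Popen with the single argument 'ls'.
    def __reduce__(self):
        return (subprocess.Popen, ('ls',))

# Serialize the payload object to YAML text.
serialized_data = yaml.dump(Payload())
print(serialized_data)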

The serialized output will be:

!!python/object/apply:subprocess.Popen
- ls

Payload Explanation

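In this example, we build a payload from the __reduce__() method and the subprocess module. __reduce__() returns a callable together with its arguments, and the serializer records both in the YAML document. Here the callable is the subprocess.Popen class, whose constructor spawns a new process running ls, which lists files in the current directory.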

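When a vulnerable web application passes this payload to an unsafe loader, deserializing the document also executes the command. Here data is the attacker-supplied YAML text:

# PyYAML < 5.1: load() uses the unsafe loader by default.
deserialized_data = yaml.load(data)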

The command executes successfully, revealing the contents of the current directory.
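
The same mechanism can be observed without spawning a process. The sketch below swaps in a benign stand-in, a document that asks the loader to call os.getcwd(), and assumes PyYAML 5.1 or later, where unsafe_load() is available. It is an illustration of the load-time call, not the article's payload.

```python
import os

import yaml  # PyYAML 5.1+ assumed, since unsafe_load() was added in 5.1

# Benign stand-in for the payload: the tag names os.getcwd and the
# empty sequence [] supplies no arguments.
document = "!!python/object/apply:os.getcwd []"

# The unsafe loader resolves the tag to a callable and invokes it
# while the document is being parsed.
result = yaml.unsafe_load(document)

print(result == os.getcwd())  # True
```

The value returned by the loader is the return value of the function call, which shows that the document controlled what code ran during deserialization.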

Remediation Strategies

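This vulnerability was assigned CVE-2017-18342. PyYAML 5.1 addressed it by making load() without an explicit Loader argument emit a warning and fall back to a more restricted loader. Make sure your PyYAML version is 5.1 or later (6.0 was current at the time of writing, and from 6.0 on load() requires the Loader argument), and parse untrusted input with safe_load(), which constructs only standard YAML types.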

Whether the payload above still works depends on the PyYAML version in use. Unsafe loading remains available in patched versions, but it has to be requested explicitly.

Note: To reproduce the behavior, either use PyYAML version < 5.1 or, on 5.1 and later, select an unsafe loader yourself. The load() method will fail with the above payload unless you pass Loader=yaml.Loader (or Loader=yaml.UnsafeLoader), or call unsafe_load(data) instead. Note that unsafe_load() does not accept a Loader argument.
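
For code that handles untrusted input, the remediation can be sketched as a small wrapper around safe_load(). The load_untrusted helper name is hypothetical, and the payload string is the serialized output shown earlier. safe_load() never executes it because it has no constructor for Python-specific tags.

```python
import yaml  # PyYAML

PAYLOAD = "!!python/object/apply:subprocess.Popen\n- ls\n"


def load_untrusted(text):
    """Parse YAML from an untrusted source using the safe loader only."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Covers syntax errors as well as tags the safe loader rejects.
        raise ValueError(f"rejected YAML document: {exc}") from exc


# Plain data loads normally.
print(load_untrusted("name: Manish\nage: 12\n"))  # {'name': 'Manish', 'age': 12}

# The payload is rejected before anything is constructed or called.
try:
    load_untrusted(PAYLOAD)
    payload_rejected = False
except ValueError:
    payload_rejected = True

print(payload_rejected)  # True
```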

Conclusion

Deserialization vulnerabilities can lead to severe repercussions, including remote code execution or privilege escalation. Therefore, it is crucial to sanitize and validate any incoming data before deserializing it.
