trailofbits.python.pickles-in-pandas.pickles-in-pandas

Author
unknown
License
CC-BY-NC-SA-4.0
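Pandas functions that rely on pickle, such as read_pickle and to_pickle, can lead to arbitrary code execution if an attacker controls the pickled data. Consider inspecting pickle files with fickling or switching to a safer serialization method.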
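The risk comes from how unpickling works: an object's __reduce__ method tells the loader which callable to invoke, and with which arguments, when the bytes are loaded. The minimal sketch below uses the standard pickle module, with a harmless eval of an arithmetic expression standing in for attacker code; the Payload class is hypothetical. pandas.read_pickle is built on pickle, so handing it the same bytes would trigger the same call.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair in the byte stream;
        # the *loader* calls eval("2 ** 10") when the bytes are unpickled.
        # A real attacker could substitute something like os.system.
        return (eval, ("2 ** 10",))


# Attacker side: serialize the payload, e.g. into a file named data.pkl.
malicious_bytes = pickle.dumps(Payload())

# Victim side: merely "loading data" executes the embedded call.
result = pickle.loads(malicious_bytes)
print(result)  # → 1024
```

Note that the Payload class does not need to exist on the victim's side: the byte stream only references builtins.eval.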
Definition
rules:
  - id: pickles-in-pandas
    message: Functions reliant on pickle can result in arbitrary code execution.
      Consider using fickling or switching to a safer serialization method
    languages:
      - python
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-502: Deserialization of Untrusted Data"
      subcategory:
        - vuln
      confidence: MEDIUM
      likelihood: MEDIUM
      impact: HIGH
      technology:
        - pandas
      description: Potential arbitrary code execution from `Pandas` functions reliant
        on pickling
      references:
        - https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/
      license: CC-BY-NC-SA-4.0
    patterns:
      - pattern-either:
          - pattern: pandas.read_pickle(...)
          - pattern: pandas.to_pickle(...)
          - patterns:
              - pattern-inside: |
                  import pandas
                  ...
              - pattern: $SMTH.to_pickle(...)
      - pattern-not: pandas.read_pickle("...")
      - pattern-not: pandas.to_pickle(..., "...")
      - pattern-not: $SMTH.to_pickle("...")
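To make the patterns above concrete, here is a rough approximation written with Python's standard ast module. It is not Semgrep and is only a sketch under simplifying assumptions: it treats any call to an attribute named read_pickle or to_pickle as a pandas call (the real rule resolves the pandas import, including aliases such as pd), and it considers a call safe exactly when its last positional argument is a string literal and no keyword arguments are passed, which loosely mirrors the three pattern-not clauses. The names find_pickle_calls and is_literal_path_call are hypothetical.

```python
import ast

# Attribute names the approximation looks for (no import resolution).
PICKLE_METHODS = {"read_pickle", "to_pickle"}


def is_literal_path_call(call: ast.Call) -> bool:
    """Loosely mirror the rule's pattern-not clauses: the path (last
    positional argument) is a plain string literal, with no keywords."""
    if call.keywords or not call.args:
        return False
    last = call.args[-1]
    return isinstance(last, ast.Constant) and isinstance(last.value, str)


def find_pickle_calls(source: str) -> list[int]:
    """Return sorted line numbers of read_pickle/to_pickle calls whose
    path is not a string literal."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in PICKLE_METHODS
            and not is_literal_path_call(node)
        ):
            flagged.append(node.lineno)
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(flagged)


sample = '''
import pandas as pd
df = pd.DataFrame({"a": [1]})
pd.to_pickle(df, "./dummy.pkl")
df.to_pickle("./dummy.pkl")
def test(data, file):
    pd.to_pickle(data, file)
    data.to_pickle(file)
    return pd.read_pickle(file)
'''
print(find_pickle_calls(sample))  # → [7, 8, 9]
```

As in the rule's own test file, calls with hard-coded string paths pass, while calls whose path arrives through a variable are reported.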
Examples
pickles-in-pandas.py
import pandas as pd

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

# ok: pickles-in-pandas
pd.to_pickle(original_df, "./dummy.pkl")

# ok: pickles-in-pandas
original_df.to_pickle("./dummy.pkl")

# ok: pickles-in-pandas
unpickled_df = pd.read_pickle("./dummy.pkl")

# ok: pickles-in-pandas
def invalid_func(another = original_df.to_pickle("./dummy.pkl")):
    return another

# ok: pickles-in-pandas
uncsved_df = pd.read_csv("./dummy.pkl")

def test(data, file):
    # ruleid: pickles-in-pandas
    pd.to_pickle(data, file)

    # ruleid: pickles-in-pandas
    data.to_pickle(file)

    # ruleid: pickles-in-pandas
    unpickled_df = pd.read_pickle(file)

    # ruleid: pickles-in-pandas
    def invalid_func(another = original_df.to_pickle(file)):
        return another

    # ok: pickles-in-pandas
    uncsved_df = pd.read_csv(file)
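One way to follow the rule's advice, assuming the data is an ordinary table, is to round-trip the DataFrame through JSON instead of pickle: parsing JSON does not invoke arbitrary callables the way unpickling can. This sketch uses an in-memory io.StringIO buffer in place of a file path, and orient="split" is chosen so that the index and columns are stored explicitly. JSON does not preserve every pandas dtype, so check that it suits your data.

```python
import io

import pandas as pd

# Same frame as in the example file above.
original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

# Serialize to JSON text instead of pickle bytes.
buffer = io.StringIO()
original_df.to_json(buffer, orient="split")

# Rewind and parse the JSON back into a DataFrame.
buffer.seek(0)
restored_df = pd.read_json(buffer, orient="split")

print(restored_df["foo"].tolist())  # → [0, 1, 2, 3, 4]
```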
Short Link: https://sg.run/bXQW