trailofbits.python.pickles-in-pandas.pickles-in-pandas

Pandas functions that rely on pickle can result in arbitrary code execution when they load data from an untrusted source. Consider using fickling or switching to a safer serialization method.
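
To make the risk concrete, the following is a minimal sketch using only the standard-library `pickle` module, which `pandas.read_pickle` builds on. The `Payload` class is hypothetical and not part of the rule; it uses `eval` on a harmless arithmetic string as a stand-in for whatever an attacker might choose to run.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle stores the pair (callable, args); unpickling calls callable(*args).
        # A harmless arithmetic expression stands in for attacker-controlled code.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes calls eval("6 * 7") instead of rebuilding a Payload object.
result = pickle.loads(blob)
```

After this runs, `result` holds the value returned by the callable stored in the pickle, not a `Payload` instance, which is why a pickle file from an untrusted source should be treated like untrusted code.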

Definition

rules:
  - id: pickles-in-pandas
    message: Functions reliant on pickle can result in arbitrary code execution.
      Consider using fickling or switching to a safer serialization method
    languages:
      - python
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-502: Deserialization of Untrusted Data"
      subcategory:
        - vuln
      confidence: MEDIUM
      likelihood: MEDIUM
      impact: HIGH
      technology:
        - pandas
      description: Potential arbitrary code execution from `Pandas` functions reliant
        on pickling
      references:
        - https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/
      license: AGPL-3.0 license
      vulnerability_class:
        - "Insecure Deserialization "
    patterns:
      - pattern-either:
          - pattern: pandas.read_pickle(...)
          - pattern: pandas.to_pickle(...)
          - patterns:
              - pattern-inside: |
                  import pandas
                  ...
              - pattern: $SMTH.to_pickle(...)
      - pattern-not: pandas.read_pickle("...")
      - pattern-not: pandas.to_pickle(..., "...")
      - pattern-not: $SMTH.to_pickle("...")
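
The intent of the patterns above is to flag `read_pickle` and `to_pickle` calls unless the path is a string literal. The sketch below approximates that idea with Python's `ast` module. It is not how Semgrep evaluates the rule: the helper names are hypothetical, it ignores the `import pandas` requirement and keyword arguments, and it skips any call whose last positional argument is a string literal, which is looser than the `pattern-not` clauses.

```python
import ast

PICKLE_METHODS = {"read_pickle", "to_pickle"}


def is_string_literal(node):
    """Return True when the AST node is a plain string constant."""
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def flagged_lines(source):
    """Return sorted line numbers of pickle calls without a literal path."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in PICKLE_METHODS:
            continue
        # Loose stand-in for the pattern-not clauses: skip literal paths.
        if node.args and is_string_literal(node.args[-1]):
            continue
        hits.append(node.lineno)
    return sorted(hits)


sample = '''import pandas as pd
pd.read_pickle("./dummy.pkl")
pd.read_pickle(path)
df.to_pickle("./dummy.pkl")
df.to_pickle(path)
pd.to_pickle(df, path)
pd.read_csv(path)
'''

flagged = flagged_lines(sample)
```

In this sample, the calls that take a variable path are reported, while the literal-path calls and `read_csv` are left alone.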

Examples

pickles-in-pandas.py

import pandas as pd 

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})  

# ok: pickles-in-pandas
pd.to_pickle(original_df, "./dummy.pkl")

# ok: pickles-in-pandas
original_df.to_pickle("./dummy.pkl")

# ok: pickles-in-pandas
unpickled_df = pd.read_pickle("./dummy.pkl")  

# ok: pickles-in-pandas
def invalid_func(another = original_df.to_pickle("./dummy.pkl")):
  return another

# ok: pickles-in-pandas
uncsved_df = pd.read_csv("./dummy.pkl")  


def test(data, file):
  # ruleid: pickles-in-pandas
  pd.to_pickle(data, file)

  # ruleid: pickles-in-pandas
  data.to_pickle(file)

  # ruleid: pickles-in-pandas
  unpickled_df = pd.read_pickle(file)  

  # ruleid: pickles-in-pandas
  def invalid_func(another = original_df.to_pickle(file)):
    return another

  # ok: pickles-in-pandas
  uncsved_df = pd.read_csv(file)
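
The rule message suggests switching to a safer serialization method. One possible sketch, assuming the data is plain columns of numbers like the DataFrame in the test file, stores it as JSON with the standard library only. The helper names `save_columns` and `load_columns` are hypothetical and are not pandas APIs.

```python
import json
import os
import tempfile

# Column-oriented data mirroring the DataFrame built in the test file.
columns = {"foo": list(range(5)), "bar": list(range(5, 10))}


def save_columns(data, path):
    """Write a dict of column lists to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)


def load_columns(path):
    """Read a dict of column lists back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dummy.json")
    save_columns(columns, path)
    restored = load_columns(path)
```

Unlike unpickling, `json.load` only builds dicts, lists, strings, numbers, booleans, and `None`, so loading a file cannot invoke a callable chosen by whoever wrote it. The trade-off is that pandas-specific details such as dtypes and index metadata are not preserved by this plain format.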