trailofbits.python.lxml-in-pandas.lxml-in-pandas

trailofbits
Author: unknown

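Found usage of the `lxml` library through `pandas.read_html`. The rule fires either implicitly, on a call with only the `io` argument (by default pandas tries `lxml` first), or when a `flavor` value other than `"bs4"` or `"html5lib"` is passed directly or through a `**kwargs` dict. `lxml` is vulnerable to attacks such as XML external entity (XXE) attacks.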

Definition

rules:
  - id: lxml-in-pandas
    message: Found usage of the `$FLAVOR` library, which is vulnerable to attacks
      such as XML external entity (XXE) attacks
    languages:
      - python
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-611: Improper Restriction of XML External Entity Reference"
      subcategory:
        - vuln
      confidence: HIGH
      likelihood: MEDIUM
      impact: MEDIUM
      technology:
        - pandas
      description: Potential XXE attacks from loading `lxml` in pandas
      references:
        - https://lxml.de/FAQ.html
      license: AGPL-3.0 license
      vulnerability_class:
        - XML Injection
    pattern-either:
      - patterns:
          - pattern: pandas.read_html($IO)
          - pattern-not: pandas.read_html(**$KWARGS)
      - patterns:
          - metavariable-pattern:
              metavariable: $FLAVOR
              patterns:
                - pattern: ...
                - pattern-not: |
                    "bs4"
                - pattern-not: |
                    "html5lib"
          - pattern-either:
              - pattern: pandas.read_html(..., flavor=$FLAVOR, ...)
              - patterns:
                  - pattern-inside: |
                      $KWARGS = {..., "flavor": $FLAVOR, ...}
                      ...
                  - pattern: |
                      pandas.read_html(**$KWARGS)
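As a rough illustration of what the two `pattern-either` branches above match, the sketch below approximates them with Python's standard `ast` module. The function `flagged_lines` and its helpers are hypothetical and are not part of Semgrep. The sketch also makes simplifying assumptions:

- pandas is referenced only as `pandas` or `pd`.
- Only module-level dict literals assigned to a plain name are tracked for the `**kwargs` branch.
- Names are matched without regard to statement order or scope.

```python
import ast

# Hypothetical approximation of the rule's pattern-either branches.
# Assumptions: pandas is imported as `pandas` or `pd`; only module-level
# dict literals are tracked for the **kwargs branch; order/scope is ignored.
SAFE_FLAVORS = {"bs4", "html5lib"}
PANDAS_NAMES = {"pandas", "pd"}


def _is_read_html(call: ast.Call) -> bool:
    """True for calls shaped like pandas.read_html(...) or pd.read_html(...)."""
    func = call.func
    return (
        isinstance(func, ast.Attribute)
        and func.attr == "read_html"
        and isinstance(func.value, ast.Name)
        and func.value.id in PANDAS_NAMES
    )


def _flavor_is_unsafe(node: ast.expr) -> bool:
    """Mirror the metavariable-pattern: anything except the literals "bs4"/"html5lib"."""
    return not (isinstance(node, ast.Constant) and node.value in SAFE_FLAVORS)


def flagged_lines(source: str) -> list[int]:
    tree = ast.parse(source)

    # Remember module-level `name = {..., "flavor": X, ...}` assignments.
    dict_flavors = {}
    for stmt in tree.body:
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Dict)
        ):
            for key, value in zip(stmt.value.keys, stmt.value.values):
                if isinstance(key, ast.Constant) and key.value == "flavor":
                    dict_flavors[stmt.targets[0].id] = value

    hits = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and _is_read_html(node)):
            continue
        # Branch 1: exactly one positional argument and nothing else
        # (so `read_html(**kwargs)` is excluded, like the pattern-not).
        if len(node.args) == 1 and not node.keywords:
            hits.append(node.lineno)
            continue
        for kw in node.keywords:
            # Branch 2a: an explicit flavor=... keyword argument.
            if kw.arg == "flavor" and _flavor_is_unsafe(kw.value):
                hits.append(node.lineno)
                break
            # Branch 2b: **name, where name was assigned a dict with "flavor".
            if kw.arg is None and isinstance(kw.value, ast.Name):
                flavor = dict_flavors.get(kw.value.id)
                if flavor is not None and _flavor_is_unsafe(flavor):
                    hits.append(node.lineno)
                    break
    return sorted(hits)
```

Mirroring the first branch as written, a call such as `pd.read_html(url, match="Total")` appears to match neither branch. It passes another keyword but no `flavor`, even though pandas would still try `lxml` first for it.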

Examples

lxml-in-pandas.py

import pandas as pd
import pandas

touch = 1
touch2 = 2 
touch3 = 3

# ruleid: lxml-in-pandas
pd.read_html(touch)

# ruleid: lxml-in-pandas
pandas.read_html(touch, flavor="lxml")

kwargs = {"io": touch, "flavor":"lxml"}
# ruleid: lxml-in-pandas
pd.read_html(**kwargs)

# ok: lxml-in-pandas
pandas.read_html(touch, flavor="bs4")

# ok: lxml-in-pandas
pd.read_html(touch, flavor="html5lib")

kwargs2 = {"io": touch, "flavor":"html5lib"}
# ok: lxml-in-pandas
pd.read_html(**kwargs2)

def test(**kwargs):
    # ruleid: lxml-in-pandas
    pd.read_html(**kwargs)
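The example file follows the annotation convention used by Semgrep's `--test` mode:

- A `# ruleid: <rule-id>` comment marks the following line as one the rule should report.
- A `# ok: <rule-id>` comment marks a line it should not report.

The helper below, `expected_lines`, is hypothetical and not part of Semgrep. It is a minimal sketch that collects those expected line numbers. It assumes each annotation is a full-line comment naming a single rule ID and applies to the line immediately after it, as in the file above.

```python
import re

# Hypothetical helper (not part of Semgrep): map "ruleid"/"ok" annotations
# to the line numbers they refer to. Assumption: one rule ID per comment,
# and a full-line annotation applies to the line immediately after it.
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_lines(source: str, rule_id: str) -> dict:
    result = {"ruleid": [], "ok": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        # Only count comments that stand alone on their line (indentation allowed).
        if match and match.group(2) == rule_id and not line[: match.start()].strip():
            result[match.group(1)].append(lineno + 1)
    return result
```

Comparing these lists with the lines a rule actually reports is, in outline, what a rule test checks.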