generic.unicode.security.bidi.contains-bidirectional-characters


This code contains bidirectional (bidi) characters. While this is useful for support of right-to-left languages such as Arabic or Hebrew, it can also be used to trick language parsers into executing code in a manner that is different from how it is displayed in code editing and review tools. If this is not what you were expecting, please review this code in an editor that can reveal hidden Unicode characters.
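
If no such editor is at hand, a few lines of Python can make the hidden characters visible. This is a minimal sketch, not part of Semgrep: the names `BIDI_NAMES` and `reveal_bidi` are hypothetical, and the table covers only the nine code points this rule matches.

```python
# Hypothetical helper, not part of Semgrep: maps each bidi control
# character matched by this rule to its conventional abbreviation.
BIDI_NAMES = {
    "\u202a": "LRE",
    "\u202b": "RLE",
    "\u202c": "PDF",
    "\u202d": "LRO",
    "\u202e": "RLO",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}


def reveal_bidi(text):
    # type: (str) -> str
    """Replace each bidi control character with a visible <U+XXXX NAME> tag."""
    return "".join(
        "<U+{:04X} {}>".format(ord(ch), BIDI_NAMES[ch]) if ch in BIDI_NAMES else ch
        for ch in text
    )


# The string below hides a right-to-left override before the quote.
print(reveal_bidi("access = \u202e'user'"))  # access = <U+202E RLO>'user'
```

Printing a suspicious line through such a helper shows exactly where the direction changes, regardless of how the terminal or review tool renders the original text.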

Definition

rules:
  - id: contains-bidirectional-characters
    patterns:
      - pattern-either:
          - pattern-regex: ‪
          - pattern-regex: ‫
          - pattern-regex: ‭
          - pattern-regex: ‮
          - pattern-regex: ⁦
          - pattern-regex: ⁧
          - pattern-regex: ⁨
          - pattern-regex: ‬
          - pattern-regex: ⁩
    message: This code contains bidirectional (bidi) characters. While this is
      useful for support of right-to-left languages such as Arabic or Hebrew, it
      can also be used to trick language parsers into executing code in a manner
      that is different from how it is displayed in code editing and review
      tools. If this is not what you were expecting, please review this code in
      an editor that can reveal hidden Unicode characters.
    metadata:
      cwe:
        - "CWE-94: Improper Control of Generation of Code ('Code Injection')"
      category: security
      technology:
        - unicode
      references:
        - https://trojansource.codes/
      confidence: LOW
      owasp:
        - A03:2021 - Injection
      cwe2022-top25: true
      subcategory:
        - audit
      likelihood: LOW
      impact: HIGH
      license: Commons Clause License Condition v1.0[LGPL-2.1-only]
      vulnerability_class:
        - Code Injection
    languages:
      - bash
      - c
      - csharp
      - go
      - java
      - javascript
      - json
      - kotlin
      - lua
      - ocaml
      - php
      - python
      - ruby
      - rust
      - scala
      - sh
      - typescript
      - yaml
    severity: WARNING
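
Because the regex patterns in the rule are the literal, invisible characters, the rule text itself is hard to read. The nine code points fall into two contiguous ranges (U+202A to U+202E and U+2066 to U+2069), so the same check can be sketched outside Semgrep with escaped code points. The following is an illustration only, not how Semgrep evaluates the rule; `BIDI_RE` and `find_bidi` are hypothetical names.

```python
import re

# The nine code points matched by the rule, written as escapes so that
# they stay visible: U+202A..U+202E and U+2066..U+2069.
BIDI_RE = re.compile("[\u202a-\u202e\u2066-\u2069]")


def find_bidi(text):
    # type: (str) -> list
    """Return (line, column, code point) for every bidi control character.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in BIDI_RE.finditer(line):
            code_point = "U+{:04X}".format(ord(match.group()))
            hits.append((line_no, match.start() + 1, code_point))
    return hits


sample = "ok = True\nx = \u202e1\n"
for hit in find_bidi(sample):
    print(hit)  # (2, 5, 'U+202E')
```

Reporting positions rather than a plain yes/no makes it easier to locate the offending character in an editor that does not display it.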

Examples

bidi.py

# -*- coding: utf-8 -*-
from types import SimpleNamespace

# A useful lookup table for the bidirectional (bidi) characters
bidi = SimpleNamespace(
    **{
        "LRE": "\u202a",  # left-to-right embedding (LRE)
        "RLE": "\u202b",  # right-to-left embedding (RLE)
        "LRO": "\u202d",  # left-to-right override (LRO)
        "RLO": "\u202e",  # right-to-left override (RLO)
        "LRI": "\u2066",  # left-to-right isolate (LRI)
        "RLI": "\u2067",  # right-to-left isolate (RLI)
        "FSI": "\u2068",  # first strong isolate (FSI)
        "PDF": "\u202c",  # pop directional formatting (PDF)
        "PDI": "\u2069",  # pop directional isolate (PDI)
    }
)

# Currently all bidi characters are forbidden
FORBIDDEN_BIDI_CHARACTERS = list(vars(bidi).values())


def test_forbidden_bidi_characters():
    # type: () -> None
    assert has_bidi("hello world") is False

    # A correct Unicode rendering of this string displays as: d e f a b c
    assert (
        has_bidi(
            "{}{}a b c{} {}d e f{}{}".format(
                bidi.RLI, bidi.LRI, bidi.PDI, bidi.LRI, bidi.PDI, bidi.PDI
            )
        )
        is True
    )

    # The same string as above, but containing the literal Unicode bidi
    # characters instead of references to the lookup table.

    # If this shows up as "d e f a b c" in your code review without being blocked
    # or flagged, then it indicates that you may be vulnerable to Trojan Source
    # attacks (https://trojansource.codes/).
    # ruleid: contains-bidirectional-characters
    assert has_bidi("⁧⁦a b c⁩ ⁦d e f⁩⁩") is True


def has_bidi(testable_string):
    # type: (str) -> bool
    return any(char in testable_string for char in FORBIDDEN_BIDI_CHARACTERS)
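

As an alternative to the hard-coded lookup table, the same nine characters can be recognised by their Unicode bidirectional class. This is a sketch under assumptions, not the rule's method: `BIDI_CONTROL_CLASSES` and `has_bidi_by_class` are hypothetical names, and the approach assumes a Python 3 interpreter whose `unicodedata` tables include the isolate characters (added in Unicode 6.3).

```python
import unicodedata

# Bidirectional classes of the nine explicit formatting characters.
# Ordinary right-to-left letters have class "R" or "AL" and are not listed.
BIDI_CONTROL_CLASSES = frozenset(
    ["LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"]
)


def has_bidi_by_class(testable_string):
    # type: (str) -> bool
    """Return True if any character is an explicit bidi formatting control."""
    return any(
        unicodedata.bidirectional(char) in BIDI_CONTROL_CLASSES
        for char in testable_string
    )


print(has_bidi_by_class("hello world"))      # False
print(has_bidi_by_class("\u05d0\u05d1"))     # False: plain Hebrew letters
print(has_bidi_by_class("\u2067abc\u2069"))  # True: RLI ... PDI
```

Checking by class keeps legitimate right-to-left text, such as Hebrew or Arabic string literals, from being flagged, while still catching the explicit embedding, override, and isolate controls.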