generic.unicode.security.bidi.contains-bidirectional-characters
semgrep
Author
unknown
This code contains bidirectional (bidi) characters. While this is useful for support of right-to-left languages such as Arabic or Hebrew, it can also be used to trick language parsers into executing code in a manner that is different from how it is displayed in code editing and review tools. If this is not what you were expecting, please review this code in an editor that can reveal hidden Unicode characters.
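One way to review flagged code without a specialised editor is to replace each bidi control character with a visible escape before reading it. The following is a minimal sketch, not part of the rule: `reveal_bidi` and `BIDI_CONTROLS` are hypothetical names, and the set of nine characters is assumed to match the lookup table in the example file on this page.

```python
# Hypothetical helper for manual review; not part of the Semgrep rule.
# The nine characters below are assumed to match the lookup table in bidi.py.
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"


def reveal_bidi(text):
    # type: (str) -> str
    """Replace each bidi control character with a visible \\uXXXX escape."""
    return "".join(
        "\\u{:04x}".format(ord(ch)) if ch in BIDI_CONTROLS else ch
        for ch in text
    )


# prints: access = \u202e'user'
print(reveal_bidi("access = \u202e'user'"))
```

Text without bidi control characters passes through unchanged, so the helper can be applied to whole files before they are shown in a diff or review tool.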
Definition
rules:
  - id: contains-bidirectional-characters
    patterns:
      - pattern-either:
          - pattern-regex:
          - pattern-regex:
          - pattern-regex:
          - pattern-regex:
          - pattern-regex:
          - pattern-regex:
          - pattern-regex:
          - pattern-regex:
          - pattern-regex:
    message: This code contains bidirectional (bidi) characters. While this is
      useful for support of right-to-left languages such as Arabic or Hebrew, it
      can also be used to trick language parsers into executing code in a manner
      that is different from how it is displayed in code editing and review
      tools. If this is not what you were expecting, please review this code in
      an editor that can reveal hidden Unicode characters.
    metadata:
      cwe:
        - "CWE-94: Improper Control of Generation of Code ('Code Injection')"
      category: security
      technology:
        - unicode
      references:
        - https://trojansource.codes/
      confidence: LOW
      owasp:
        - A03:2021 - Injection
      cwe2022-top25: true
      subcategory:
        - audit
      likelihood: LOW
      impact: HIGH
      license: Commons Clause License Condition v1.0[LGPL-2.1-only]
      vulnerability_class:
        - Code Injection
    languages:
      - bash
      - c
      - csharp
      - go
      - java
      - javascript
      - json
      - kotlin
      - lua
      - ocaml
      - php
      - python
      - ruby
      - rust
      - scala
      - sh
      - typescript
      - yaml
    severity: WARNING
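The rule lists nine `pattern-regex` alternatives under `pattern-either`, so a match on any one of them produces a finding. The sketch below approximates that behaviour outside Semgrep with a single character class. It assumes each alternative corresponds to one of the nine bidi control characters from the example file on this page; the exact regexes in the published rule may differ, and `BIDI_REGEX` and `find_bidi` are hypothetical names.

```python
import re

# Assumption: one alternative per bidi control character, mirroring the nine
# pattern-regex entries. U+202A..U+202E covers LRE, RLE, PDF, LRO, RLO and
# U+2066..U+2069 covers LRI, RLI, FSI, PDI.
BIDI_REGEX = re.compile("[\u202a-\u202e\u2066-\u2069]")


def find_bidi(source):
    # type: (str) -> list
    """Return (line, column, code point) for every bidi control character."""
    findings = []
    for line_number, line in enumerate(source.split("\n"), start=1):
        for match in BIDI_REGEX.finditer(line):
            code_point = "U+{:04X}".format(ord(match.group()))
            # Columns are 1-based to match typical editor conventions.
            findings.append((line_number, match.start() + 1, code_point))
    return findings


sample = "x = 1\ny = '\u2067\u2066abc\u2069'\n"
for finding in find_bidi(sample):
    print(finding)
```

Reporting the line, column, and code point gives a reviewer enough detail to locate each character in an editor that does not display it.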
Examples
bidi.py
# -*- coding: utf-8 -*-
from types import SimpleNamespace

# A useful lookup table for the bidirectional (bidi) characters
bidi = SimpleNamespace(
    **{
        "LRE": "\u202a",  # left-to-right embedding (LRE)
        "RLE": "\u202b",  # right-to-left embedding (RLE)
        "LRO": "\u202d",  # left-to-right override (LRO)
        "RLO": "\u202e",  # right-to-left override (RLO)
        "LRI": "\u2066",  # left-to-right isolate (LRI)
        "RLI": "\u2067",  # right-to-left isolate (RLI)
        "FSI": "\u2068",  # first strong isolate (FSI)
        "PDF": "\u202c",  # pop directional formatting (PDF)
        "PDI": "\u2069",  # pop directional isolate (PDI)
    }
)

# Currently all bidi characters are forbidden
FORBIDDEN_BIDI_CHARACTERS = list(vars(bidi).values())


def test_forbidden_bidi_characters():
    # type: () -> None
    assert has_bidi("hello world") is False

    # Proper unicode display of this string should be: d e f a b c
    assert (
        has_bidi(
            "{}{}a b c{} {}d e f{}{}".format(
                bidi.RLI, bidi.LRI, bidi.PDI, bidi.LRI, bidi.PDI, bidi.PDI
            )
        )
        is True
    )

    # The same string as above but with the unicode bidi characters instead of a
    # mapping to them
    # If this shows up as "d e f a b c" in your code review without being blocked
    # or flagged, then it indicates that you may be vulnerable to Trojan Source
    # attacks (see https://trojansource.codes/).
    # ruleid: contains-bidirectional-characters
    assert has_bidi("a b c d e f") is True


def has_bidi(testable_string):
    # type: (str) -> bool
    # True if any forbidden bidi control character occurs in the string
    return any(char in testable_string for char in FORBIDDEN_BIDI_CHARACTERS)
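As an alternative to a hand-maintained lookup table, the standard library's `unicodedata` module can classify characters by their bidi class. The following is a sketch under that assumption rather than part of the rule's test file; `describe_bidi` and `BIDI_CONTROL_CLASSES` are hypothetical names.

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters, as reported
# by unicodedata.bidirectional(). These nine class names are assumed to cover
# the same characters as the lookup table in bidi.py.
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}


def describe_bidi(text):
    # type: (str) -> list
    """List (index, bidi class, Unicode name) for each bidi control character."""
    return [
        (index, unicodedata.bidirectional(ch), unicodedata.name(ch, "UNKNOWN"))
        for index, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


for entry in describe_bidi("a\u202eb"):
    print(entry)
```

Because the output includes the character's Unicode name, it can double as a human-readable report when a string fails the `has_bidi` check.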
Short Link: https://sg.run/nK4r