trailofbits.python.numpy-in-torch-datasets.numpy-in-torch-datasets

Author
unknown
Using the NumPy RNG inside a Torch dataset can cause several data-loading problems, including identical augmentations across DataLoader workers. Use the random number generators built into Python and PyTorch instead.
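As a minimal sketch of the recommended alternative, the dataset below draws its random values from PyTorch's own generator rather than `numpy.random`. The class name `TorchRandomDataset` and the value range are hypothetical and chosen only for illustration. It relies on the documented DataLoader behavior of giving each worker process a distinct PyTorch seed.

```python
import torch
from torch.utils.data import Dataset


class TorchRandomDataset(Dataset):
    """Hypothetical dataset that samples with PyTorch's RNG instead of NumPy's."""

    def __len__(self):
        return 1000

    def __getitem__(self, index):
        # torch.randint uses PyTorch's global generator, which the DataLoader
        # seeds differently in each worker process.
        # The upper bound (1000) is exclusive.
        return torch.randint(0, 1000, (3,))
```

Indexing the dataset, for example `TorchRandomDataset()[0]`, returns a tensor of three integers in the range 0 to 999.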
Definition
rules:
  - id: numpy-in-torch-datasets
    message: "Using the NumPy RNG inside of a Torch dataset can lead to a number of
      issues with loading data, including identical augmentations. Instead, use
      the random number generators built into Python and PyTorch "
    languages:
      - python
    severity: WARNING
    metadata:
      category: security
      cwe: "CWE-330: Use of Insufficiently Random Values"
      subcategory:
        - audit
      confidence: HIGH
      likelihood: MEDIUM
      impact: LOW
      references:
        - https://tanelp.github.io/posts/a-bug-that-plagues-thousands-of-open-source-ml-projects
      license: CC-BY-NC-SA-4.0
    patterns:
      - pattern: |
          class $X(torch.utils.data.Dataset):
            ...
            def __getitem__(...):
              ...
              numpy.random.randint(...)
              ...
Examples
numpy-in-torch-datasets.py
import numpy as np
from torch.utils.data import Dataset
from tob.strangelib import Dataset as DatasetStrange

# ruleid: numpy-in-torch-datasets
class RandomDataset(Dataset):
    def __getitem__(self, index):
        return np.random.randint(0, 1000, 3)

    def __len__(self):
        return 1000

# ruleid: numpy-in-torch-datasets
class AnotherRandomDataset(Dataset):
    def __len__(self):
        return 1000

    def __getitem__(self, index):
        print("Hello World")
        x = np.random.randint(0, 1000, 3)
        return x

# ruleid: numpy-in-torch-datasets
class AnotherRandomDatasetOther(Dataset):
    def __len__(self):
        return 1000

    def __getitem__(self, index):
        print("Hello World")
        x = numpy.random.randint(0, 1000, 3)
        return x

# ok: numpy-in-torch-datasets
class NotTorchDataset(DatasetStrange):
    def __len__(self):
        return 1000

    def __getitem__(self, index):
        print("Hello World")
        x = numpy.random.randint(0, 1000, 3)
        return x

# ok: numpy-in-torch-datasets
class YetAnotherRandomDataset(Dataset):
    def __len__(self):
        return 1000

    def __getitem__(self, index):
        return index
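For comparison, here is one way the flagged `RandomDataset` example could be rewritten so the rule no longer matches, this time using Python's built-in `random` module. This is only a sketch: the class name `StdlibRandomDataset` is hypothetical, and returning a plain list instead of a NumPy array is an assumption about what downstream code accepts.

```python
import random

from torch.utils.data import Dataset


class StdlibRandomDataset(Dataset):
    """Hypothetical rewrite of RandomDataset that avoids numpy.random."""

    def __len__(self):
        return 1000

    def __getitem__(self, index):
        # random.randrange(0, 1000) excludes 1000, matching the half-open
        # range of np.random.randint(0, 1000).
        return [random.randrange(0, 1000) for _ in range(3)]
```

The default collate function of a DataLoader converts lists of Python integers into tensors, so batches built from this dataset can still be consumed as tensors.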
Short Link: https://sg.run/yPpP