[GSK-2280] Feature: Added Number-to-Words Transformation #1615

Merged
Changes from 14 commits
26 commits
47d1a73
Added Numbers to Words Transformation
sagar118 Nov 17, 2023
bae694a
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
mattbit Nov 17, 2023
1cd39ee
Merge remote-tracking branch 'origin/main' into GSK-1567-add-number-t…
kevinmessiaen Dec 14, 2023
64a1f66
Updated TextNumberToWordTransformation to use row by row language ins…
kevinmessiaen Dec 14, 2023
0cb8fec
Update tests/scan/test_text_transformations.py
kevinmessiaen Dec 14, 2023
2208b5e
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
mattbit Dec 14, 2023
f153538
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Dec 18, 2023
c480246
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Dec 19, 2023
7884690
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
mattbit Dec 19, 2023
9d9976f
Fixing lockfile and dependencies issues
Hartorn Dec 19, 2023
4691156
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Dec 21, 2023
bfb6c98
Fixed tests
kevinmessiaen Dec 21, 2023
f9dc4ca
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Dec 21, 2023
c2db118
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Dec 22, 2023
7131491
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
mattbit Dec 22, 2023
f3c71ef
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Jan 1, 2024
866d8be
Wrong modifications
kevinmessiaen Jan 1, 2024
1379407
Removed useless constructor
kevinmessiaen Jan 1, 2024
2c76855
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
andreybavt Jan 2, 2024
19e7996
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
andreybavt Jan 4, 2024
607122d
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Jan 26, 2024
345c81f
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Jan 29, 2024
6af109c
Updated PDM lockfile
kevinmessiaen Jan 29, 2024
27a100b
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Jan 30, 2024
300b297
Merge branch 'main' into GSK-1567-add-number-to-word-transformation
kevinmessiaen Jan 31, 2024
1a52ffb
Limited scipy version to prevent nan issue with test data drift
kevinmessiaen Jan 31, 2024
2 changes: 2 additions & 0 deletions giskard/scanner/robustness/text_perturbation_detector.py
@@ -30,6 +30,7 @@ def _get_default_transformations(self, model: BaseModel, dataset: Dataset) -> Se
            TextTitleCase,
            TextTypoTransformation,
            TextUppercase,
            TextNumberToWordTransformation,
        )

        return [
@@ -38,4 +39,5 @@ def _get_default_transformations(self, model: BaseModel, dataset: Dataset) -> Se
            TextTitleCase,
            TextTypoTransformation,
            TextPunctuationRemovalTransformation,
            TextNumberToWordTransformation,
        ]
16 changes: 16 additions & 0 deletions giskard/scanner/robustness/text_transformations.py
@@ -5,6 +5,7 @@

import numpy as np
import pandas as pd
from num2words import num2words

from ...core.core import DatasetProcessFunctionMeta
from ...datasets import Dataset
@@ -209,6 +210,21 @@ def _switch(self, word, language):
        return None


class TextNumberToWordTransformation(TextLanguageBasedTransformation):
    name = "Transform numbers to words"

    def __init__(self, column):
        super().__init__(column)

    def _load_dictionaries(self):
        # Regex to match numbers in text
        self._regex = re.compile(r"(?<!\d/)(?<!\d\.)\b\d+(?:\.\d+)?\b(?!(?:\.\d+)?@|\d?/?\d)")

    def make_perturbation(self, row):
        # Replace numbers with words
        return self._regex.sub(lambda x: num2words(x.group(), lang=row["language__gsk__meta"]), row[self.column])


class TextReligionTransformation(TextLanguageBasedTransformation):
    name = "Switch Religion"
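
The regex in `_load_dictionaries` is dense, so a small standalone sketch may help show which tokens it is meant to pick up. The pattern below is copied verbatim from the diff; `replace_numbers` and `mark` are hypothetical helpers introduced only for illustration and are not part of the PR. `mark` stands in for `num2words` so the snippet runs with the standard library alone.

```python
import re
from typing import Callable

# Pattern copied verbatim from TextNumberToWordTransformation._load_dictionaries.
NUMBER_RE = re.compile(r"(?<!\d/)(?<!\d\.)\b\d+(?:\.\d+)?\b(?!(?:\.\d+)?@|\d?/?\d)")


def replace_numbers(text: str, convert: Callable[[str], str]) -> str:
    """Replace every number matched by NUMBER_RE with convert(number).

    Hypothetical helper: mirrors the shape of make_perturbation in the diff,
    but takes the converter as an argument instead of calling num2words.
    """
    return NUMBER_RE.sub(lambda m: convert(m.group()), text)


def mark(number: str) -> str:
    # Hypothetical stand-in for num2words: wraps the match so it is easy to spot.
    return f"<{number}>"


# Plain integers and decimals are matched as whole tokens.
print(replace_numbers("I have 3 apples and 12.50 dollars", mark))
# I have <3> apples and <12.50> dollars

# Slash-separated dates are left untouched by the lookbehind/lookahead guards.
print(replace_numbers("Due on 12/05/2023", mark))
# Due on 12/05/2023

# Digits directly before an "@" (as in an email address) are skipped.
print(replace_numbers("Mail john.42@example.com", mark))
# Mail john.42@example.com
```

In the PR itself the converter is `num2words(x.group(), lang=row["language__gsk__meta"])`, so the replacement text depends on the language detected for each row.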
