veil.entity_detectors.regex

class veil.entity_detectors.regex.RegexEntityDetector(cfg)

Bases: BaseEntityDetector[RegexEntityType]

Entity detector based purely on regular expressions.

Uses optimized regular-expression patterns to detect sensitive identifiers, optionally validating checksums to improve precision.
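To illustrate what regex matching plus checksum validation means in practice, the sketch below finds candidate Spanish DNI/NIE numbers and IBANs, then keeps only those whose check characters are correct. It is a standalone sketch, not Veil's implementation. The patterns and the helper names (dni_nie_is_valid, iban_is_valid, find_valid) are hypothetical. The checks themselves are the standard ones. The DNI control letter is the number modulo 23, looked up in a fixed 23-letter table (an NIE's leading X/Y/Z becomes 0/1/2 first). An IBAN is valid when its rearranged numeric form leaves remainder 1 modulo 97 (ISO 13616).

```python
import re
from typing import List, Tuple

# Hypothetical, simplified patterns used only for this illustration.
DNI_NIE_RE = re.compile(r"\b(?:\d{8}|[XYZ]\d{7})[A-Z]\b")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")

# Standard table for the DNI/NIE control letter (index = number mod 23).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def dni_nie_is_valid(value: str) -> bool:
    """Check the control letter of a Spanish DNI or NIE."""
    head, letter = value[:-1], value[-1]
    if head[:1] in ("X", "Y", "Z"):
        # NIE prefix: X -> 0, Y -> 1, Z -> 2 before the mod-23 check.
        head = str("XYZ".index(head[0])) + head[1:]
    if len(head) != 8 or not head.isdigit():
        return False
    return DNI_LETTERS[int(head) % 23] == letter


def iban_is_valid(value: str) -> bool:
    """ISO 13616 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), require remainder 1 mod 97."""
    iban = value.replace(" ", "").upper()
    if len(iban) < 5 or not (iban.isascii() and iban.isalnum()):
        return False
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_valid(text: str) -> List[Tuple[str, str]]:
    """Return (label, match) pairs whose regex matched AND whose checksum passes."""
    checks = (
        ("DNI/NIE", DNI_NIE_RE, dni_nie_is_valid),
        ("IBAN", IBAN_RE, iban_is_valid),
    )
    hits = []
    for label, regex, is_valid in checks:
        for match in regex.finditer(text):
            if is_valid(match.group()):
                hits.append((label, match.group()))
    return hits
```

For example, in `find_valid("DNI 12345678Z, typo 12345678A, IBAN GB82WEST12345698765432")` both DNI-shaped strings match the regex, but only 12345678Z survives the control-letter check. This is the precision gain that checksum validation buys over pattern matching alone.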

Parameters:

cfg (RegexEntityDetectorConfig) – Detector configuration

ENTITY_TYPES: Set[E] = {RegexEntityType.DNI, RegexEntityType.CIF, RegexEntityType.NIE, RegexEntityType.NSS, RegexEntityType.EMAIL, RegexEntityType.PHONE, RegexEntityType.IBAN, RegexEntityType.IPV4, RegexEntityType.IPV6}
detect_entities(doc)

Detect entities in the text using regex patterns.

Parameters:

doc (Document) – Document to analyze

Returns:

List of detected entities, one Span per match

Return type:

List[Span]
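The following is a minimal sketch of regex-based detection that returns spans. It assumes each match becomes a span carrying character offsets and a label. SpanSketch is a hypothetical stand-in for Veil's Span, whose actual fields are not documented here. The two patterns are simplified and illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SpanSketch:
    """Hypothetical stand-in for Veil's Span: character offsets plus a label."""
    start: int
    end: int
    label: str
    text: str


# Simplified, illustrative patterns keyed by entity label.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IPV4": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
}


def detect_spans(text: str) -> List[SpanSketch]:
    """Run every pattern over the text and return spans ordered by position."""
    spans = [
        SpanSketch(m.start(), m.end(), label, m.group())
        for label, regex in PATTERNS.items()
        for m in regex.finditer(text)
    ]
    # Earlier spans first; for equal starts, the longer span comes first.
    return sorted(spans, key=lambda s: (s.start, -s.end))
```

Sorting by start offset, with longer spans first on ties, gives downstream code a predictable order when matches from different patterns overlap.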

get_pattern_info(entity_type)

Get pattern information for an entity type.

Parameters:

entity_type (str) – Name of the entity type to look up

Returns:

Pattern information

Return type:

dict
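The return value above is documented only as a dict. The sketch below shows one plausible shape for such a lookup, built from a hypothetical registry of compiled patterns. The keys entity_type, pattern and flags, the error behaviour and the simplified patterns are all assumptions, not Veil's documented schema. Only re.Pattern.pattern and re.Pattern.flags are standard-library attributes.

```python
import re
from typing import Any, Dict

# Hypothetical registry: entity-type name -> compiled (simplified) pattern.
PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", re.IGNORECASE),
    "PHONE": re.compile(r"(?:\+34\s?)?[6789]\d{8}"),
}


def pattern_info(entity_type: str) -> Dict[str, Any]:
    """Return a descriptive dict for one entity type (keys are illustrative)."""
    try:
        regex = PATTERNS[entity_type]
    except KeyError:
        raise ValueError(
            f"unknown entity type {entity_type!r}; expected one of {sorted(PATTERNS)}"
        ) from None
    # Expose the source pattern and its compile flags for inspection.
    return {"entity_type": entity_type, "pattern": regex.pattern, "flags": regex.flags}
```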

test_pattern(entity_type, test_text)

Test whether a text matches the pattern for a given entity type.

Parameters:
  • entity_type (str) – Entity type to test

  • test_text (str) – Text to test

Returns:

True if test_text matches the pattern for entity_type

Return type:

bool
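The description above does not say whether a match means the whole of test_text or any substring of it. The sketch below contrasts the two readings using Pattern.fullmatch and Pattern.search. The IPv4 registry and the helper names matches_whole and contains_match are hypothetical and are not part of Veil.

```python
import re
from typing import Dict

_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Hypothetical registry with a single simplified pattern (no \b anchors).
PATTERNS: Dict[str, re.Pattern] = {
    "IPV4": re.compile(rf"(?:{_OCTET}\.){{3}}{_OCTET}"),
}


def matches_whole(entity_type: str, test_text: str) -> bool:
    """True only if the entire text is a single match (Pattern.fullmatch)."""
    return PATTERNS[entity_type].fullmatch(test_text) is not None


def contains_match(entity_type: str, test_text: str) -> bool:
    """True if any substring of the text matches (Pattern.search)."""
    return PATTERNS[entity_type].search(test_text) is not None
```

The two readings disagree on inputs such as "host 10.0.0.1". Without word boundaries, search also accepts "256.1.1.1", because the substring "56.1.1.1" is a valid address. This is one reason detection patterns are usually anchored with \b.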