veil.entity_detectors

class veil.entity_detectors.EntityDetectorRegistry[source]

Bases: BaseRegistry

Concrete registry for entity detector implementations.

classmethod get_key_from_str(key_str)[source]

Convert a CLI/YAML string into an EntityDetectorType.

Parameters:

key_str (str)

Return type:

EntityDetectorType
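As an illustration of this kind of string-to-enum lookup, here is a minimal sketch. The DetectorKind enum, its members, and the key_from_str helper are hypothetical stand-ins and are not taken from veil; the real EntityDetectorType members and normalization rules may differ.

```python
from enum import Enum


class DetectorKind(Enum):
    """Hypothetical stand-in for EntityDetectorType; real members may differ."""

    REGEX = "regex"
    SPACY = "spacy"
    GLINER = "gliner"


def key_from_str(key_str: str) -> DetectorKind:
    """Map a CLI/YAML string to an enum member.

    Assumption: matching ignores case and surrounding whitespace.
    """
    normalized = key_str.strip().lower()
    try:
        return DetectorKind(normalized)
    except ValueError:
        valid = ", ".join(member.value for member in DetectorKind)
        raise ValueError(
            f"Unknown detector {key_str!r}; expected one of: {valid}"
        ) from None
```

Raising a ValueError that lists the accepted values tends to make configuration mistakes easier to diagnose than a bare KeyError.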

class veil.entity_detectors.GlinerEntityDetector(config)[source]

Bases: BaseEntityDetector[GlinerEntityType]

Adapter over GLiNER entity detection models with support for chunking long documents.

Parameters:

config (GlinerEntityDetectorConfig)

ENTITY_TYPES: Set[E] = {GlinerEntityType.NAME, GlinerEntityType.COMPANY, GlinerEntityType.ADDRESS}

detect_entities(doc)[source]

Detect entities in the document using GLiNER model with chunking support.

Long documents are automatically split into overlapping chunks so that each chunk fits within the model’s maximum sequence length.

Parameters:

doc (Document) – Document to process

Returns:

List of Span objects with detected entities

Return type:

List[Span]
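The overlapping-chunk approach can be sketched as follows. This is not the library's implementation: iter_chunks and merge_spans are hypothetical helpers, chunking here is by characters (the real detector may well chunk by tokens), and spans are plain (start, end, label) tuples rather than Span objects.

```python
from typing import Iterable, Iterator, List, Tuple

SpanTuple = Tuple[int, int, str]  # (start, end, label); stand-in for Span


def iter_chunks(
    text: str, chunk_size: int = 1000, overlap: int = 100
) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs; consecutive chunks share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        yield start, text[start : start + chunk_size]
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step


def merge_spans(
    chunk_results: Iterable[Tuple[int, List[SpanTuple]]]
) -> List[SpanTuple]:
    """Shift chunk-local spans to document offsets and drop exact duplicates
    produced when an entity falls inside an overlap region."""
    seen = set()
    merged: List[SpanTuple] = []
    for offset, spans in chunk_results:
        for start, end, label in spans:
            span = (start + offset, end + offset, label)
            if span not in seen:
                seen.add(span)
                merged.append(span)
    return sorted(merged)
```

The overlap exists so that an entity straddling a chunk boundary still appears whole in at least one chunk; deduplication is then needed because entities inside the overlap are detected twice.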

class veil.entity_detectors.HostedMaskerApiEntityDetector(cfg)[source]

Bases: MaskerApiEntityDetector

Entity detector backed by a hosted Masker API. Wraps MaskerApiEntityDetector with a fixed set of entity types.

Parameters:

cfg (HostedMaskerApiEntityDetectorConfig)

ENTITY_TYPES: Set[EntityTypeBase] = {HostedMaskerApiEntityType.NAME, HostedMaskerApiEntityType.ADDRESS, HostedMaskerApiEntityType.COMPANY}

class veil.entity_detectors.RegexEntityDetector(cfg)[source]

Bases: BaseEntityDetector[RegexEntityType]

Pure regular-expression-based entity detector.

Uses regex patterns to detect sensitive identifiers, with optional checksum validation for higher precision.

Parameters:

cfg (RegexEntityDetectorConfig)

ENTITY_TYPES: Set[E] = {RegexEntityType.DNI, RegexEntityType.CIF, RegexEntityType.NIE, RegexEntityType.NSS, RegexEntityType.EMAIL, RegexEntityType.PHONE, RegexEntityType.IBAN, RegexEntityType.IPV4, RegexEntityType.IPV6}

detect_entities(doc)[source]

Detect entities in the text using regex patterns.

Parameters:

doc (Document) – Document to analyze

Returns:

List of detected entities

Return type:

List[Span]
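To show how a regex match can be combined with checksum validation, here is a minimal sketch for the Spanish DNI (eight digits plus a control letter derived from the number modulo 23). The pattern, the find_dnis helper, and the tuple-based span format are assumptions for illustration; the patterns and Span objects actually used by RegexEntityDetector may differ.

```python
import re
from typing import List, Tuple

# Assumed pattern: eight digits followed by one letter, on word boundaries.
DNI_PATTERN = re.compile(r"\b(\d{8})([A-Za-z])\b")

# Control-letter table for the DNI: the letter at index (number mod 23).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def find_dnis(text: str, validate_checksum: bool = True) -> List[Tuple[int, int, str]]:
    """Return (start, end, "DNI") tuples for candidate DNIs in `text`.

    With validate_checksum=True, matches whose control letter does not
    agree with the number are discarded.
    """
    spans = []
    for match in DNI_PATTERN.finditer(text):
        number, letter = match.group(1), match.group(2).upper()
        if validate_checksum and DNI_LETTERS[int(number) % 23] != letter:
            continue  # right shape, wrong control letter
        spans.append((match.start(), match.end(), "DNI"))
    return spans
```

Checksum validation trades a little recall for precision: arbitrary eight-digit codes followed by a letter no longer count as identifiers unless the control letter is consistent.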

get_pattern_info(entity_type)[source]

Get pattern information for an entity type.

Parameters:

entity_type (str) – Entity type

Returns:

Pattern information

Return type:

dict

test_pattern(entity_type, test_text)[source]

Test whether a text matches a specific pattern.

Parameters:
  • entity_type (str) – Entity type to test

  • test_text (str) – Text to test

Returns:

True if the text matches the pattern

Return type:

bool

class veil.entity_detectors.SpacyEntityDetector(config)[source]

Bases: BaseEntityDetector[SpacyEntityType]

Adapter over spaCy NER models.

Parameters:

config (SpacyEntityDetectorConfig)

ENTITY_TYPES: Set[E] = {SpacyEntityType.NAME, SpacyEntityType.COMPANY, SpacyEntityType.ADDRESS}

detect_entities(doc)[source]

Detect entities in the document.

Parameters:

doc (Document)

Return type:

List[Span]