veil.entity_detectors
- class veil.entity_detectors.EntityDetectorRegistry
Bases:
BaseRegistry
Concrete registry for entity detector implementations.
- class veil.entity_detectors.GlinerEntityDetector(config)
Bases:
BaseEntityDetector[GlinerEntityType]
Adapter over GLiNER entity detection models, with support for chunking long documents.
- Parameters:
config (GlinerEntityDetectorConfig)
- ENTITY_TYPES: Set[E] = {GlinerEntityType.NAME, GlinerEntityType.COMPANY, GlinerEntityType.ADDRESS}
- detect_entities(doc)
Detect entities in the document using a GLiNER model, with chunking support.
Long documents are automatically split into overlapping chunks so that each chunk fits within the model's maximum sequence length.
- Parameters:
doc (Document) – Document to process
- Returns:
List of Span objects with detected entities
- Return type:
List[Span]
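The overlapping-chunk strategy described above can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: the helper name `split_into_chunks`, the character-based window, and the `chunk_size` and `overlap` parameters are all hypothetical, and the actual detector may well chunk by tokens instead.

```python
from typing import List, Tuple


def split_into_chunks(
    text: str, chunk_size: int = 1000, overlap: int = 100
) -> List[Tuple[int, str]]:
    """Split text into overlapping character windows.

    Returns (start_offset, chunk_text) pairs so that spans detected inside
    a chunk can be shifted back to document coordinates.
    Hypothetical helper for illustration; not part of veil.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks: List[Tuple[int, str]] = []
    step = chunk_size - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

A span found at local offsets `(s, e)` in a chunk that starts at `start` maps to `(start + s, start + e)` in the document. Entities that fall inside an overlap region can be reported twice, so a real implementation would need to deduplicate the merged spans.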
- class veil.entity_detectors.HostedMaskerApiEntityDetector(cfg)
Bases:
MaskerApiEntityDetector
Anonymizer backed by a hosted Masker API. Wrapper around MaskerApiEntityDetector with specific entity types.
- Parameters:
- ENTITY_TYPES: Set[EntityTypeBase] = {HostedMaskerApiEntityType.NAME, HostedMaskerApiEntityType.ADDRESS, HostedMaskerApiEntityType.COMPANY}
- chunk_char_limit: int
- chunk_on_truncation: bool
- frequency_penalty: float
- headers: Dict[str, str]
- max_tokens: int
- model: str | None
- presence_penalty: float
- retries: int
- retry_backoff_base: float
- retry_on_truncation: bool
- system_prompt: str | None
- temperature: float
- timeout: float
- top_k: int
- top_p: float
- truncation_min_fraction: float
- class veil.entity_detectors.RegexEntityDetector(cfg)
Bases:
BaseEntityDetector[RegexEntityType]
Pure regular-expression-based anonymizer.
Uses optimized regex patterns to detect sensitive identifiers with optional checksum validation for higher precision.
- Parameters:
- ENTITY_TYPES: Set[E] = {RegexEntityType.DNI, RegexEntityType.CIF, RegexEntityType.NIE, RegexEntityType.NSS, RegexEntityType.EMAIL, RegexEntityType.PHONE, RegexEntityType.IBAN, RegexEntityType.IPV4, RegexEntityType.IPV6}
- detect_entities(doc)
Detect entities in the text using regex patterns.
- Parameters:
doc (Document) – Document to analyze
- Returns:
List of detected entities
- Return type:
List[Span]
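To illustrate how a regex match can be combined with checksum validation for higher precision, here is a minimal sketch for the Spanish DNI. The pattern and the helper `find_valid_dnis` are assumptions made for this example and are not taken from the library; only the control-letter rule (number modulo 23 indexing into a fixed letter table) is the standard DNI check.

```python
import re
from typing import List, Tuple

# Standard DNI control-letter table: letter = DNI_LETTERS[number % 23].
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"

# Illustrative pattern (an assumption): 8 digits followed by one letter.
DNI_PATTERN = re.compile(r"\b(\d{8})([A-Za-z])\b")


def find_valid_dnis(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for DNI candidates whose control
    letter passes the modulo-23 check. Hypothetical helper, not part of veil.
    """
    matches: List[Tuple[int, int, str]] = []
    for m in DNI_PATTERN.finditer(text):
        number = int(m.group(1))
        letter = m.group(2).upper()
        # Keep the candidate only if its letter matches the expected one.
        if DNI_LETTERS[number % 23] == letter:
            matches.append((m.start(), m.end(), m.group(0)))
    return matches
```

For example, `find_valid_dnis("DNI 12345678Z and 12345678A")` keeps only `12345678Z`, because 12345678 mod 23 is 14 and the letter at index 14 of the table is `Z`. The second candidate matches the pattern but fails the checksum, which is the kind of false positive the validation step filters out.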
- class veil.entity_detectors.SpacyEntityDetector(config)
Bases:
BaseEntityDetector[SpacyEntityType]
Adapter over spaCy NER models.
- Parameters:
config (SpacyEntityDetectorConfig)
- ENTITY_TYPES: Set[E] = {SpacyEntityType.NAME, SpacyEntityType.COMPANY, SpacyEntityType.ADDRESS}