Implementing pipeline components
This chapter describes how to extend Veil by implementing new pipeline components, with a special focus on entity detectors. You will see what input each stage receives, what output it produces, and how it fits into the overall flow.
See also:
Architecture: Basic Architecture
Declaring pipelines: Declaring pipelines
API Reference: veil.pipeline, veil.entity_detectors, veil.entity_resolvers, veil.masker, veil.config
Flow and Interfaces by Component
At a high level, the pipeline (see Pipeline.process in veil.pipeline) executes the following steps, given an input Document:
1. Entity detectors → Lists of Span per detector.
2. Entity resolvers (optional) → Adjust/merge Spans and can assign consistent ids.
3. Overlap resolution → Final selection of Span according to priorities/hierarchy.
4. Masking → MaskResult with masked_text and entities.
5. Evaluation (optional) → Metrics with ground_truth if available.
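The flow above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Veil's implementation: Document, Span, and MaskResult are simplified stand-in dataclasses, while run_pipeline, find_alice, and the "[TYPE]" placeholder format are hypothetical names chosen for the example. Overlap resolution is reduced to "keep spans that do not overlap ones already kept", and evaluation is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


# Simplified stand-ins for illustration only; the real classes live in veil.core.
@dataclass(frozen=True)
class Document:
    text: str
    doc_id: str = "doc-0"


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    entity_type: str
    id: Optional[int] = None


@dataclass(frozen=True)
class MaskResult:
    masked_text: str
    entities: List[Span]


Detector = Callable[[Document], List[Span]]
Resolver = Callable[[Document, List[Span]], List[Span]]


def run_pipeline(
    doc: Document,
    detectors: Sequence[Detector],
    resolvers: Sequence[Resolver] = (),
) -> MaskResult:
    # 1) Each detector contributes its own list of spans.
    spans: List[Span] = []
    for detect in detectors:
        spans.extend(detect(doc))
    # 2) Optional resolvers may adjust spans or assign ids.
    for resolve in resolvers:
        spans = resolve(doc, spans)
    # 3) Naive overlap resolution (assumption): visit spans by start offset,
    #    longer first on ties, and keep those not overlapping a kept span.
    selected: List[Span] = []
    for s in sorted(spans, key=lambda s: (s.start, -(s.end - s.start))):
        if all(s.start >= k.end or s.end <= k.start for k in selected):
            selected.append(s)
    # 4) Masking: replace right to left so earlier offsets stay valid.
    text = doc.text
    for s in sorted(selected, key=lambda s: s.start, reverse=True):
        text = text[: s.start] + f"[{s.entity_type}]" + text[s.end :]
    return MaskResult(masked_text=text, entities=selected)


def find_alice(doc: Document) -> List[Span]:
    # Toy detector: flags the first occurrence of the literal "Alice".
    i = doc.text.find("Alice")
    return [Span(i, i + 5, "NAME")] if i >= 0 else []


result = run_pipeline(Document("Hi Alice!"), [find_alice])
print(result.masked_text)  # Hi [NAME]!
```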
1) Entity Detectors (main extension)
- Base class to inherit from: veil.core.base_entity_detector.BaseEntityDetector in veil.entity_detectors.
- Associated config: subclasses of veil.config.entity_detectors.BaseEntityDetectorConfig in veil.config.
- Registry: veil.entity_detectors.registry.EntityDetectorRegistry in veil.entity_detectors maps EntityDetectorType → concrete class.
Implementation contract:
- Define ENTITY_TYPES: Set[EntityTypeBase] or a specific EntityTypeBase (enum) with the types supported by the detector. Examples of types: RegexEntityType, GlinerEntityType (see veil.entity_detectors).
- Implement detect_entities(self, doc: Document) -> List[Span] returning spans with start, end, entity_type, and, if applicable, confidence.
- Use the config passed in the constructor (a subclass of BaseEntityDetectorConfig) for its own parameters. Common available fields: priority (by canonical type) and hierarchy_position (global precedence). See veil.config.
Inputs and outputs:
- Input: Document (doc.text, doc.doc_id).
- Output: List[Span] detected by the detector.
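As an illustration of this contract, here is a minimal sketch of a regex-based email detector. So that it runs without Veil installed, Document and Span are simplified stand-ins, and EmailEntityType and EmailDetector are hypothetical names; a real detector would subclass EntityTypeBase and BaseEntityDetector instead, and would receive its config in the constructor.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


@dataclass(frozen=True)
class Document:  # stand-in for veil.core.document.Document
    text: str
    doc_id: str = "doc-0"


@dataclass(frozen=True)
class Span:  # stand-in for veil.core.span.Span
    start: int
    end: int
    entity_type: Enum
    confidence: Optional[float] = None


class EmailEntityType(Enum):  # hypothetical; a real type would subclass EntityTypeBase
    EMAIL = 1


class EmailDetector:  # hypothetical; a real detector would subclass BaseEntityDetector
    ENTITY_TYPES: Set[EmailEntityType] = {EmailEntityType.EMAIL}
    # Deliberately simple pattern; not a full RFC 5322 address matcher.
    _PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    def detect_entities(self, doc: Document) -> List[Span]:
        # A regex match has no natural score, so confidence is left as None.
        return [
            Span(start=m.start(), end=m.end(), entity_type=EmailEntityType.EMAIL)
            for m in self._PATTERN.finditer(doc.text)
        ]


doc = Document(text="Contact bob@example.com today.")
spans = EmailDetector().detect_entities(doc)
print([doc.text[s.start : s.end] for s in spans])  # ['bob@example.com']
```

Offsets are character positions into doc.text with an exclusive end, which is what makes slicing doc.text[s.start:s.end] recover the matched text; whether Veil's Span uses the same convention is an assumption here.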
Automatic integration into the pipeline:
The pipeline instantiates detectors from the entity_detectors list in PipelineConfig using the registry. The order in the list defines the execution order. See Pipeline.__init__ and Pipeline.process in veil.pipeline.
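The registry lookup can be pictured with a small stand-alone sketch. Everything in it is hypothetical: DetectorType and Registry stand in for EntityDetectorType and EntityDetectorRegistry, the create() helper and the dict-based configs are inventions for the example, and the real registry API may differ. The point is only that each config's type selects a class, and that detectors are built in list order.

```python
from enum import Enum
from typing import Any, Dict, List


class DetectorType(Enum):  # hypothetical stand-in for EntityDetectorType
    REGEX = "regex"
    GLINER = "gliner"


class Registry:  # hypothetical stand-in for EntityDetectorRegistry
    _classes: Dict[DetectorType, type] = {}

    @classmethod
    def register(cls, detector_type: DetectorType, detector_cls: type) -> None:
        cls._classes[detector_type] = detector_cls

    @classmethod
    def create(cls, config: Dict[str, Any]) -> Any:
        # Look up the concrete class by the config's type and instantiate it.
        return cls._classes[config["type"]](config)


class RegexDetector:
    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config


class GlinerDetector:
    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config


Registry.register(DetectorType.REGEX, RegexDetector)
Registry.register(DetectorType.GLINER, GlinerDetector)

# The order of this list is the order in which detectors are built and run.
configs: List[Dict[str, Any]] = [
    {"type": DetectorType.GLINER},
    {"type": DetectorType.REGEX},
]
detectors = [Registry.create(c) for c in configs]
order = [type(d).__name__ for d in detectors]
print(order)  # ['GlinerDetector', 'RegexDetector']
```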
Quick steps to add a new detector
1. Define the entity types (subclass of EntityTypeBase) and, if applicable, their aliases().
2. Implement the detector class inheriting from BaseEntityDetector, define ENTITY_TYPES and detect_entities().
3. Create the Config class inheriting from BaseEntityDetectorConfig, add your parameters and get_type().
4. Register the class in EntityDetectorRegistry under an EntityDetectorType.
5. Add the block to your PipelineConfig.entity_detectors YAML with type: and your parameters.
6. Adjust priority by type and hierarchy_position for the desired overlap behavior.
Important note on polymorphic types (type):
In addition to registering the class, import your new configuration class in veil/config/__init__.py to expose it to the configuration system, and if you create a new detector module/package, also expose it in veil/entity_detectors/__init__.py. This allows selecting it by type from YAML/CLI.
Best practices for detectors:
- Fill in confidence when a natural score exists; it helps to break ties in overlaps.
- Maintain ENTITY_TYPES and EntityTypeBase.aliases() if you need to normalize alternative names (the pipeline uses EntityTypeBase.global_alias_map() for canonical names).
- Use priority by type and hierarchy_position to guide selection in conflicts (see overlap section).
Complete example (skeleton)
# 1) Types
from veil.core.base_entity_type import EntityTypeBase


class MyEntityType(EntityTypeBase):
    NAME = 1


# 2) Detector
from typing import List, Set

from veil.core.base_entity_detector import BaseEntityDetector
from veil.core.document import Document
from veil.core.span import Span


class MyDetector(BaseEntityDetector[MyEntityType]):
    ENTITY_TYPES: Set[MyEntityType] = {MyEntityType.NAME}

    def __init__(self, config):
        super().__init__(config)

    def detect_entities(self, doc: Document) -> List[Span]:
        return []


# 3) Config
from dataclasses import field

from veil.config.core.frozen_dataclass import frozen_dataclass
from veil.config.entity_detectors import BaseEntityDetectorConfig
from veil.core.enums.entity_detector_type import EntityDetectorType


@frozen_dataclass
class MyDetectorConfig(BaseEntityDetectorConfig):
    my_threshold: float = field(default=0.5)

    @classmethod
    def get_type(cls):
        return EntityDetectorType.GLINER  # or a new type if added


# 4) Registry
from veil.entity_detectors.registry import EntityDetectorRegistry

EntityDetectorRegistry.register(EntityDetectorType.GLINER, MyDetector)

# 5) (polymorphic) Expose Config and detector in __init__.py
# - Add `from .entity_detectors import MyDetectorConfig` in `veil/config/__init__.py`
# - Add `from .my_detector import MyDetector` in `veil/entity_detectors/__init__.py` (if applicable)
And in YAML:
entity_detectors:
  - type: gliner  # or your new type
    my_threshold: 0.6
    priority:
      NAME: 0
    hierarchy_position: 0
2) Entity Resolvers (optional)
- Practical interface: any class with resolve(self, doc: Document, spans: List[Span], entity_cache: Optional[Dict[str, Dict[int, Set[str]]]] = None) -> List[Span] works (see EmbeddingsEntityResolver in veil.entity_resolvers).
- Config: inherit from BaseEntityResolverConfig and implement get_type().
- Input: the document and the combined list of spans detected so far (accumulated from all detectors); optionally an entity_cache by type.
- Output: the same list of spans, potentially with assigned ids or with deduplications/merges applied.
Simple example: assign sequential IDs by type
from typing import Dict, List, Optional, Set

from veil.core.document import Document
from veil.core.span import Span


class SimpleIdResolver:
    def __init__(self, config):
        self.config = config

    def resolve(
        self,
        doc: Document,
        spans: List[Span],
        entity_cache: Optional[Dict[str, Dict[int, Set[str]]]] = None,
    ) -> List[Span]:
        next_id_per_type: Dict[str, int] = {}
        out: List[Span] = []
        for s in spans:
            et = getattr(s.entity_type, "name", "")
            if getattr(s, "id", None) is None:
                nid = next_id_per_type.get(et, 1)
                next_id_per_type[et] = nid + 1
                out.append(
                    Span(
                        start=s.start,
                        end=s.end,
                        entity_type=s.entity_type,
                        id=nid,
                        replacement=s.replacement,
                        confidence=s.confidence,
                    )
                )
            else:
                out.append(s)
        return out
3) Overlap Resolution (OverlapResolver)
- Class: veil.overlap_resolver.OverlapResolver in veil.pipeline and configuration in veil.config.
- Inputs: per detector, its list of spans and a priority map by canonical type; also a global hierarchy_position per detector. iou_threshold parameter in OverlapResolverConfig.
- Output: List[Span] selected, resolving conflicts between spans of the same type.
It is not usually necessary to implement a new one; adjust priority and hierarchy_position from the detectors’ configuration.
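To make the role of iou_threshold, priority, and hierarchy_position more concrete, here is a minimal sketch of IoU-based conflict resolution. It is not the OverlapResolver algorithm: Candidate, iou, and select are hypothetical names, and the rule that a lower (priority, hierarchy_position) pair wins is an assumption made for the example. Check the veil.config documentation for the actual ordering.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:  # hypothetical: a span plus its detector's ranking values
    start: int
    end: int
    entity_type: str
    priority: int            # from the detector's per-type priority map
    hierarchy_position: int  # from the detector's config


def iou(a: Candidate, b: Candidate) -> float:
    """Intersection over union of two character ranges [start, end)."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def select(cands: List[Candidate], iou_threshold: float) -> List[Candidate]:
    # ASSUMPTION: lower (priority, hierarchy_position) wins; Veil may differ.
    ranked = sorted(cands, key=lambda c: (c.priority, c.hierarchy_position))
    kept: List[Candidate] = []
    for c in ranked:
        # A candidate conflicts with a kept span when their IoU reaches the threshold.
        if all(iou(c, k) < iou_threshold for k in kept):
            kept.append(c)
    return sorted(kept, key=lambda c: c.start)


a = Candidate(0, 10, "NAME", priority=0, hierarchy_position=0)
b = Candidate(2, 10, "NAME", priority=1, hierarchy_position=1)
c = Candidate(20, 25, "NAME", priority=1, hierarchy_position=1)

print(iou(a, b))                               # 0.8
selected = select([b, c, a], iou_threshold=0.5)
print([(s.start, s.end) for s in selected])    # [(0, 10), (20, 25)]
```

Here a and b overlap with IoU 0.8, so only the better-ranked a survives, while c overlaps nothing and is kept. All candidates in the example share one entity type; the sketch does not compare types.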
4) Masker
- Class: veil.masker.Masker in veil.masker. It receives a Document and spans and returns a MaskResult.
- Mask types: controlled by MaskerConfig.method (see veil.config).
You will generally not need to extend it to add detectors; its interface is stable.
5) Evaluation (Evaluator)
- Orchestration class: veil.evaluator.Evaluator in veil.pipeline; configuration in veil.config.
- Main input: document, mask_result (final spans), component_spans (spans per detector), a map of supported types per component, and metric_store.
- Output: a dictionary with metric variants (exact, iou@THRESH, etc.) which is attached to MaskResult.evaluation.
For most extensions (new detectors), you do not need to touch the evaluation.
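For intuition about the difference between the exact and iou@THRESH variants, the following sketch computes span-level precision and recall under both matching rules. It is an illustration only: precision_recall, the tuple representation of spans, and the greedy one-to-one matching are assumptions for the example, not the Evaluator's actual logic or output format.

```python
from typing import List, Tuple

SpanT = Tuple[int, int]  # (start, end) with an exclusive end


def iou(a: SpanT, b: SpanT) -> float:
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def precision_recall(
    pred: List[SpanT], truth: List[SpanT], threshold: float = 1.0
) -> Tuple[float, float]:
    """threshold=1.0 requires an exact match; lower values give iou@threshold."""
    unmatched = list(truth)
    true_positives = 0
    for p in pred:
        # Greedy matching: each ground-truth span can be matched at most once.
        hit = next((t for t in unmatched if iou(p, t) >= threshold), None)
        if hit is not None:
            unmatched.remove(hit)
            true_positives += 1
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


truth = [(0, 5), (10, 20)]
pred = [(0, 5), (12, 20)]

exact = precision_recall(pred, truth, threshold=1.0)
relaxed = precision_recall(pred, truth, threshold=0.5)
print(exact)    # (0.5, 0.5)
print(relaxed)  # (1.0, 1.0)
```

The second predicted span misses the first two characters of its ground-truth span (IoU 0.8), so it counts as an error under exact matching but as a hit at a 0.5 threshold.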