whatever2sbom is built around a small pipeline and a registry of pluggable pieces. Understanding this is the first step before extending it.
The pipeline
Every run builds and executes an `SbomPipeline` (`pipeline.py`):
Collector → Enricher (0..n) → Formatter → Validator (1..n)
- Collector: gathers raw package data from one ecosystem (e.g. the local `dpkg` database) and returns a list of `PackageRecord` objects.
- Enrichers: each takes the list of `PackageRecord`s and returns an updated list, adding information the collector didn't have (hashes, licenses, …). Enrichers run in order and are independent of each other.
- Formatter: converts the enriched `PackageRecord`s into the final output document (a dict that is serialized to JSON), e.g. a CycloneDX 1.6 BOM.
- Validators: check the formatted document. Schema validation always runs and is fatal (a `ValidationError` aborts the run before anything is written).
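```python
# Excerpt from pipeline.py (attribute setup and imports omitted)
class SbomPipeline:
    def run(self) -> dict:
        # 1. Gather raw PackageRecords from one ecosystem
        packages = self.collector.collect()
        # 2. Each enricher returns an updated list; they run in order
        for enricher in self.enrichers:
            packages = enricher.enrich(packages)
        # 3. Turn the records into the output document (a dict)
        bom = self.formatter.format(packages)
        # 4. Any validator error raises ValidationError, aborting the run
        for validator in self.validators:
            errors = validator.validate(bom)
            if errors:
                raise ValidationError(errors)
        return bom
```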
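To see the data flow end to end, here is a minimal, self-contained sketch that wires toy stand-ins into a pipeline using the same `run` logic. Everything apart from `run` is hypothetical: the `Record` type, `StaticCollector`, `LicenseLookupEnricher`, `MiniFormatter`, `NonEmptyValidator`, the local `ValidationError`, and the constructor are illustrations, not whatever2sbom's real classes.

```python
from dataclasses import dataclass, field, replace


class ValidationError(Exception):
    """Hypothetical stand-in for whatever2sbom's ValidationError."""


@dataclass
class Record:
    # Trimmed-down stand-in for PackageRecord
    name: str
    version: str
    licenses: list = field(default_factory=list)


class StaticCollector:
    # Returns fixed records instead of reading a real package database
    def __init__(self, records):
        self.records = records

    def collect(self):
        return list(self.records)


class LicenseLookupEnricher:
    # Fills in licenses from a lookup table and returns a new list
    def __init__(self, table):
        self.table = table

    def enrich(self, packages):
        return [replace(p, licenses=self.table.get(p.name, p.licenses)) for p in packages]


class MiniFormatter:
    # Emits a tiny CycloneDX-shaped dict
    def format(self, packages):
        return {
            "bomFormat": "CycloneDX",
            "specVersion": "1.6",
            "components": [
                {
                    "name": p.name,
                    "version": p.version,
                    "licenses": [{"license": {"id": lic}} for lic in p.licenses],
                }
                for p in packages
            ],
        }


class NonEmptyValidator:
    # Returns a list of error messages; an empty list means "valid"
    def validate(self, bom):
        return [] if bom["components"] else ["BOM has no components"]


class SbomPipeline:
    def __init__(self, collector, enrichers, formatter, validators):
        self.collector = collector
        self.enrichers = enrichers
        self.formatter = formatter
        self.validators = validators

    def run(self) -> dict:
        packages = self.collector.collect()
        for enricher in self.enrichers:
            packages = enricher.enrich(packages)
        bom = self.formatter.format(packages)
        for validator in self.validators:
            errors = validator.validate(bom)
            if errors:
                raise ValidationError(errors)
        return bom


pipeline = SbomPipeline(
    collector=StaticCollector([Record("bash", "5.2.15")]),
    enrichers=[LicenseLookupEnricher({"bash": ["GPL-3.0-or-later"]})],
    formatter=MiniFormatter(),
    validators=[NonEmptyValidator()],
)
bom = pipeline.run()
```

Because every stage only sees the previous stage's output, any stage can be swapped without touching the others; an empty collector, for instance, makes the validator raise before a document would be written.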
The registry
`registry.py` is the single source of truth for what's available:
- Systems (`register_system`): one per ecosystem, keyed by name (`"dpkg"`, `"pip"`). Selected via `--system`.
- Formatters (`register_formatter`): keyed by `(schema, spec_version)`, e.g. `("cyclonedx", "1.6")`. Selected via `--schema`/`--spec-version`.
- Validators (`register_validator`): keyed the same way as formatters, and run alongside the matching formatter.
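A registry like this can be as small as a few dictionaries plus registration helpers. The sketch below assumes decorator-style helpers and module-level dicts; the real `register_system`, `register_formatter`, and `register_validator` in `registry.py` may take different arguments, `validators_for` is a hypothetical helper, and the registered classes are empty placeholders.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical module-level tables standing in for registry.py's storage
SYSTEMS: Dict[str, type] = {}
FORMATTERS: Dict[Tuple[str, str], type] = {}
VALIDATORS: Dict[Tuple[str, str], List[type]] = {}


def register_system(name: str) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        if name in SYSTEMS:
            raise ValueError(f"system {name!r} is already registered")
        SYSTEMS[name] = cls
        return cls
    return decorator


def register_formatter(schema: str, spec_version: str) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        FORMATTERS[(schema, spec_version)] = cls
        return cls
    return decorator


def register_validator(schema: str, spec_version: str) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        # Several validators may share one (schema, spec_version) key
        VALIDATORS.setdefault((schema, spec_version), []).append(cls)
        return cls
    return decorator


def validators_for(schema: str, spec_version: str) -> list:
    # Instantiate every validator registered for the chosen formatter
    return [cls() for cls in VALIDATORS.get((schema, spec_version), [])]


# Empty placeholder plugins, registered the way real ones might be
@register_system("pip")
class PipSystem:
    pass


@register_formatter("cyclonedx", "1.6")
class CycloneDXFormatter:
    pass


@register_validator("cyclonedx", "1.6")
class CycloneDXSchemaValidator:
    pass


@register_validator("cyclonedx", "1.6")
class BsiTr03183Validator:
    pass
```

Keying validators by the same tuple as formatters means picking a formatter automatically picks the validators that understand its output.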
`cli.py` builds its argparse choices (`--system`, `--schema`, the available `--spec-version` values, the default file extension, …) entirely from what's registered; nothing is hardcoded. Registering a new plugin therefore makes it selectable on the command line automatically.
A SystemPlugin ties it together
A `SystemPlugin` is the unit of "an ecosystem you can scan". It has four jobs:
- declare any CLI options it needs (`add_arguments`)
- build its `Collector` from the parsed args (`make_collector`)
- build its ordered list of `Enricher`s from the parsed args (`make_enrichers`)
- declare the default CycloneDX component type for `metadata.component` (`default_product_type`)
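A base class covering those four jobs might look like the following sketch. The method names come from the list above; the signatures, the use of `abc`, and treating `default_product_type` as a method are assumptions, and `DirectorySystem`/`DirectoryCollector` are hypothetical examples. The `"operating-system"` default matches the base value that `DpkgSystem` inherits.

```python
import argparse
from abc import ABC, abstractmethod


class SystemPlugin(ABC):
    # Job 1: declare CLI options; the default adds none
    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        pass

    # Job 2: every system must build a collector
    @abstractmethod
    def make_collector(self, args: argparse.Namespace):
        ...

    # Job 3: ordered enrichers; the default is an empty list
    def make_enrichers(self, args: argparse.Namespace) -> list:
        return []

    # Job 4: CycloneDX type for metadata.component
    def default_product_type(self) -> str:
        return "operating-system"


class DirectoryCollector:
    # Hypothetical collector that would scan a directory tree
    def __init__(self, root: str):
        self.root = root


class DirectorySystem(SystemPlugin):
    # Hypothetical plugin: one extra option, no enrichers
    def add_arguments(self, parser):
        parser.add_argument("--root", default=".")

    def make_collector(self, args):
        return DirectoryCollector(args.root)
```

Only `make_collector` is abstract here, since a system without a collector has nothing to scan; the other three jobs have sensible defaults.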
The built-in `DpkgSystem` registers `--distro`, `--no-apt-cache`, and `--no-licenses`, builds a `DpkgCollector`, conditionally adds `AptCacheEnricher` and `CopyrightEnricher`, and inherits the base `default_product_type` of `"operating-system"`.
`PipSystem` registers `--venv-dir` and `--project-dir`, builds a `PipCollector` (which resolves licenses and the dependency graph itself, so it needs no enrichers), and overrides `default_product_type` with `"application"`.
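The two built-in systems can be sketched as follows. The flag names, the two enrichers, and the product types come from the text; the placeholder classes, the `--distro` default of `"debian"`, and the assumption that `--no-apt-cache` and `--no-licenses` each switch off one enricher (apt cache and copyright respectively) are illustrative guesses, not taken from the source.

```python
import argparse


class SystemPlugin:
    # Minimal base: no enrichers, "operating-system" product type
    def make_enrichers(self, args):
        return []

    def default_product_type(self):
        return "operating-system"


# Placeholder stand-ins for the real collectors and enrichers
class DpkgCollector:
    def __init__(self, distro):
        self.distro = distro


class AptCacheEnricher:
    pass


class CopyrightEnricher:
    pass


class PipCollector:
    def __init__(self, venv_dir, project_dir):
        self.venv_dir = venv_dir
        self.project_dir = project_dir


class DpkgSystem(SystemPlugin):
    def add_arguments(self, parser):
        parser.add_argument("--distro", default="debian")  # default value assumed
        parser.add_argument("--no-apt-cache", action="store_true")
        parser.add_argument("--no-licenses", action="store_true")

    def make_collector(self, args):
        return DpkgCollector(distro=args.distro)

    def make_enrichers(self, args):
        enrichers = []
        if not args.no_apt_cache:
            enrichers.append(AptCacheEnricher())
        if not args.no_licenses:
            enrichers.append(CopyrightEnricher())
        return enrichers
    # default_product_type is inherited: "operating-system"


class PipSystem(SystemPlugin):
    def add_arguments(self, parser):
        parser.add_argument("--venv-dir")
        parser.add_argument("--project-dir")

    def make_collector(self, args):
        # PipCollector resolves licenses and dependencies itself,
        # so make_enrichers keeps the base class's empty list
        return PipCollector(args.venv_dir, args.project_dir)

    def default_product_type(self):
        return "application"
```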
The PackageRecord model
`PackageRecord` is the uniform representation of "one package" that flows between collectors, enrichers, and formatters. It's a single dataclass covering identity, provenance, hashes, licenses, dependency-graph references, and CycloneDX-specific classification (`component_type`, `scope`, `bsi_executable`/`bsi_archive`/`bsi_structured`).
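A dataclass along these lines captures the categories listed above. Only `purl`, `bom_ref`, `component_type`, `scope`, the three `bsi_*` flags, and `extra_properties` are named on this page; every other field name, all types, and all defaults are assumptions. Because enrichers return updated lists, `dataclasses.replace` is one convenient way to produce a modified copy without mutating the original record.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


@dataclass
class PackageRecord:
    # Identity (hypothetical field names)
    name: str
    version: str
    # Coordinates computed by the collector, emitted verbatim by formatters
    purl: str = ""
    bom_ref: str = ""
    # Provenance (hypothetical field names)
    supplier: Optional[str] = None
    download_url: Optional[str] = None
    # Hashes: algorithm -> hex digest
    hashes: Dict[str, str] = field(default_factory=dict)
    licenses: List[str] = field(default_factory=list)
    # Dependency graph: bom_refs of direct dependencies
    depends_on: List[str] = field(default_factory=list)
    # CycloneDX-specific classification
    component_type: str = "library"
    scope: Optional[str] = None
    bsi_executable: Optional[bool] = None
    bsi_archive: Optional[bool] = None
    bsi_structured: Optional[bool] = None
    # Ecosystem-specific (name, value) pairs
    extra_properties: List[Tuple[str, str]] = field(default_factory=list)


rec = PackageRecord("requests", "2.31.0", purl="pkg:pypi/requests@2.31.0",
                    bom_ref="pkg:pypi/requests@2.31.0")
# An enricher could return an updated copy like this
enriched = replace(rec, licenses=["Apache-2.0"])
```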
Two fields are worth calling out because they're computed by the collector, not the formatter:
- `purl`: the matchable coordinate a vulnerability scanner keys on (for `dpkg`: the source package with `arch=source`; for `pip`: `pkg:pypi/<name>@<version>` with the name PEP 503-normalized).
- `bom_ref`: a unique dependency-graph node id (for `dpkg`: the per-binary coordinate including `arch`; for `pip`: the same PURL, since each installed distribution is unique).
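The two rules can be illustrated with small helper functions. `pip_purl` applies the PEP 503 normalization rule (runs of `-`, `_`, and `.` collapse to a single `-`, then the name is lowercased). `deb_purl_and_bom_ref` and its parameters are hypothetical: it assumes the `--distro` value becomes the PURL namespace and that source and binary share a version, and it skips the percent-encoding a complete PURL library would apply to special characters in versions.

```python
import re


def pep503_normalize(name: str) -> str:
    # PEP 503: collapse runs of "-", "_" and "." into "-" and lowercase
    return re.sub(r"[-_.]+", "-", name).lower()


def pip_purl(name: str, version: str) -> str:
    # For pip, purl and bom_ref are the same string
    return f"pkg:pypi/{pep503_normalize(name)}@{version}"


def deb_purl_and_bom_ref(distro, binary, source, version, arch):
    # Hypothetical helper; assumes source and binary versions match
    # and omits percent-encoding of special characters
    purl = f"pkg:deb/{distro}/{source}@{version}?arch=source"     # matchable, per source package
    bom_ref = f"pkg:deb/{distro}/{binary}@{version}?arch={arch}"  # unique, per binary
    return purl, bom_ref
```

Two binaries built from the same source package would share one `purl` but get distinct `bom_ref`s, which is why the dependency graph needs the second field.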
Formatters emit `purl`/`bom_ref` verbatim and never construct PURLs themselves. This keeps ecosystem-specific PURL rules (which differ considerably between deb, npm, pip, …) out of the formatter, so the same `CycloneDXFormatter` works for every system.
`extra_properties` is the escape hatch for ecosystem-specific metadata that doesn't fit anywhere else (e.g. `("dpkg:section", "libs")`); formatters emit these verbatim as CycloneDX properties.
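A formatter that follows these rules only copies values. The sketch below maps a trimmed-down record (a hypothetical stand-in for `PackageRecord`) onto CycloneDX 1.6 JSON fields (`bom-ref`, `purl`, `properties`, `dependencies`). It is an illustration, not `CycloneDXFormatter`'s actual code, and the fixed `"library"` component type is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rec:
    # Trimmed-down stand-in for PackageRecord
    name: str
    version: str
    purl: str
    bom_ref: str
    depends_on: List[str] = field(default_factory=list)
    extra_properties: List[Tuple[str, str]] = field(default_factory=list)


def to_component(rec: Rec) -> dict:
    component = {
        "type": "library",
        "bom-ref": rec.bom_ref,  # copied verbatim, never rebuilt here
        "name": rec.name,
        "version": rec.version,
        "purl": rec.purl,        # copied verbatim
    }
    if rec.extra_properties:
        # Ecosystem-specific pairs become CycloneDX name/value properties
        component["properties"] = [{"name": k, "value": v} for k, v in rec.extra_properties]
    return component


def format_bom(records: List[Rec]) -> dict:
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "version": 1,
        "components": [to_component(r) for r in records],
        "dependencies": [{"ref": r.bom_ref, "dependsOn": r.depends_on} for r in records],
    }
```

Since nothing in `format_bom` knows about deb or PyPI naming, adding a new ecosystem only requires its collector to fill in `purl` and `bom_ref` correctly.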
Where things live
| Concept | Base class | Built-in implementations |
|---|---|---|
| System plugin | `systems/base.py::SystemPlugin` | `systems/dpkg.py::DpkgSystem`, `systems/pip.py::PipSystem` |
| Collector | `collectors/base.py::Collector` | `collectors/dpkg.py::DpkgCollector`, `collectors/pip.py::PipCollector` |
| Enricher | `enrichers/base.py::Enricher` | `enrichers/apt_cache.py`, `enrichers/copyright.py` |
| Formatter | `formatters/base.py::Formatter` | `formatters/cyclonedx16.py::CycloneDXFormatter` |
| Validator | `validators/base.py::Validator` | `validators/jsonschema_validator.py::CycloneDXSchemaValidator`, `validators/bsi_tr03183.py::BsiTr03183Validator` |
Continue to Extending whatever2sbom for step-by-step guides and code examples for each of these.