This guide walks through the three ways whatever2sbom is meant to be extended: a new system (scan a different ecosystem), a new output schema (emit something other than CycloneDX), and a new spec version of an existing schema. If you haven't already, read Architecture first — it explains the pipeline and the registry these examples plug into.
All built-in implementations live under src/whatever2sbom/ and are good references: dpkg.py
for a system, cyclonedx16.py for a formatter, jsonschema_validator.py and bsi_tr03183.py for
validators.
Adding a new system
A system is "an ecosystem you can pass to --system" — for example pip, npm, or a container
image scanner. It's made of three pieces: a Collector, zero or more Enrichers, and a
SystemPlugin that wires them together and declares any extra CLI flags.
1. Write the collector
The collector's only job is to produce a list of PackageRecord
objects. Populate as many fields as your ecosystem naturally provides — leave the rest at their
defaults (None / empty list).
```python
# src/whatever2sbom/collectors/pip.py
import json
import subprocess

from whatever2sbom.collectors.base import Collector
from whatever2sbom.models import PackageRecord


class PipCollector(Collector):
    name = "pip"

    def collect(self) -> list[PackageRecord]:
        # "pip list --format=json" prints a JSON array of {"name", "version"} objects.
        raw = subprocess.run(
            ["pip", "list", "--format=json"],
            capture_output=True, text=True, check=True,
        ).stdout

        packages = []
        for entry in json.loads(raw):
            # purl spec: PyPI names are lowercased and "_" is replaced with "-".
            purl_name = entry["name"].lower().replace("_", "-")
            pkg_purl = f"pkg:pypi/{purl_name}@{entry['version']}"
            packages.append(PackageRecord(
                name=entry["name"],
                version=entry["version"],
                purl=pkg_purl,
                bom_ref=pkg_purl,  # no separate source package, so reuse the purl
                component_type="library",
            ))
        return packages
```
A few conventions worth following:
- purl and bom_ref are the collector's responsibility (see Architecture); formatters emit them verbatim. If your ecosystem has no separate "source package" concept, purl and bom_ref can be the same.
- component_type and scope drive the CycloneDX type/scope fields. Map them from whatever classification your ecosystem provides; default to "library"/"required" if there isn't one.
- Anything that doesn't map to an existing PackageRecord field goes in extra_properties as (name, value) tuples, e.g. [("pip:editable", "true")].
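To make these conventions concrete, here is a self-contained sketch that turns one pip list --format=json entry into the keyword arguments a PackageRecord would receive. It returns a plain dict so it runs without whatever2sbom installed. The helper names pypi_purl and record_kwargs are hypothetical, and the editable check assumes pip reports editable installs through an editable_project_location key (recent pip versions do).

```python
from urllib.parse import quote


def pypi_purl(name: str, version: str) -> str:
    """Build a pkg:pypi purl (hypothetical helper, not part of whatever2sbom)."""
    # purl spec: PyPI names are case-insensitive and treat "_" like "-".
    purl_name = name.lower().replace("_", "-")
    # Percent-encode the version so local versions such as "2.0.0+cpu" stay valid.
    return f"pkg:pypi/{purl_name}@{quote(version, safe='')}"


def record_kwargs(entry: dict) -> dict:
    """Map one pip list entry to PackageRecord field values (sketch)."""
    purl = pypi_purl(entry["name"], entry["version"])
    extra = []
    # Assumption: pip marks editable installs with "editable_project_location".
    if entry.get("editable_project_location"):
        extra.append(("pip:editable", "true"))
    return {
        "name": entry["name"],          # display name exactly as pip reports it
        "version": entry["version"],
        "purl": purl,
        "bom_ref": purl,                # no separate source package: reuse the purl
        "component_type": "library",    # pip has no finer classification
        "scope": "required",            # documented default when nothing better exists
        "extra_properties": extra,      # (name, value) tuples
    }


kwargs = record_kwargs({"name": "Typing_Extensions", "version": "4.12.2"})
print(kwargs["purl"])  # pkg:pypi/typing-extensions@4.12.2
```

In a real collector these values would be passed straight through, e.g. PackageRecord(**record_kwargs(entry)), assuming PackageRecord accepts these field names as keyword arguments.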
2. Write enrichers (optional)
Enrichers add data the collector doesn't have, without changing the collector. They're useful for expensive or optional steps (network calls, extra subprocess invocations, file reads) that a user might want to skip with a flag.
```python
# src/whatever2sbom/enrichers/pip_licenses.py
from whatever2sbom.enrichers.base import Enricher
from whatever2sbom.models import PackageRecord


class PipLicenseEnricher(Enricher):
    name = "pip-licenses"

    def enrich(self, packages: list[PackageRecord]) -> list[PackageRecord]:
        for pkg in packages:
            license_id = _read_license_metadata(pkg.name, pkg.version)
            if license_id:
                pkg.licenses.append(license_id)
        return packages


def _read_license_metadata(name: str, version: str) -> str | None:
    ...
```
Enrichers run in the order returned by make_enrichers. Each one receives the list returned by the
previous enricher and may mutate those records in place or return a replacement list, so an
enricher that depends on data from an earlier one (e.g. "only fetch hashes for packages that don't
have one yet") just needs to come after it in that list.
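The ordering rule can be sketched as a plain loop. PackageRecord below is a hypothetical stand-in dataclass (including its hashes field), and both enrichers are toys that fake their data sources, so the example runs on its own; it only shows how each enricher sees the previous one's output.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PackageRecord:  # hypothetical stand-in for whatever2sbom.models.PackageRecord
    name: str
    version: str
    hashes: list[str] = field(default_factory=list)


class SeedHashEnricher:
    """Pretend lockfile reader: already knows hashes for some packages."""
    known = {"requests": "sha256:aaaa"}

    def enrich(self, packages):
        for pkg in packages:
            if pkg.name in self.known:
                pkg.hashes.append(self.known[pkg.name])
        return packages


class FetchMissingHashEnricher:
    """Only 'fetches' hashes for packages that don't have one yet."""

    def __init__(self):
        self.fetched = []  # records which packages would have hit the network

    def enrich(self, packages):
        for pkg in packages:
            if not pkg.hashes:
                self.fetched.append(pkg.name)
                pkg.hashes.append(f"sha256:fetched-{pkg.name}")
        return packages


def run_enrichers(packages, enrichers):
    # Each enricher receives whatever the previous one returned.
    for enricher in enrichers:
        packages = enricher.enrich(packages)
    return packages


fetcher = FetchMissingHashEnricher()
pkgs = run_enrichers(
    [PackageRecord("requests", "2.32.3"), PackageRecord("idna", "3.7")],
    [SeedHashEnricher(), fetcher],
)
print(fetcher.fetched)  # ['idna']
```

Swapping the two enrichers would make the fetcher run first and "fetch" both packages, which is exactly the dependency the ordering rule protects.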
3. Write the system plugin
The plugin ties the collector and enrichers together and declares any CLI options under its own
argument group. CLI flags are received as argparse.Namespace; use getattr(args, "...", default)
for anything optional so the plugin still works if its add_arguments wasn't called (e.g. in
tests).
```python
# src/whatever2sbom/systems/pip.py
import argparse

from whatever2sbom.collectors.pip import PipCollector
from whatever2sbom.enrichers.pip_licenses import PipLicenseEnricher
from whatever2sbom.systems.base import SystemPlugin


class PipSystem(SystemPlugin):
    name = "pip"
    description = "Installed Python packages (pip list)"

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        grp = parser.add_argument_group("pip system options")
        grp.add_argument(
            "--no-licenses",
            action="store_true",
            help="Skip license metadata lookup",
        )

    def make_collector(self, args: argparse.Namespace) -> PipCollector:
        return PipCollector()

    def make_enrichers(self, args: argparse.Namespace) -> list:
        # getattr with a default keeps this working when add_arguments wasn't called.
        if getattr(args, "no_licenses", False):
            return []
        return [PipLicenseEnricher()]
```
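Two argparse details make the getattr pattern work: a flag like --no-licenses is stored under the attribute no_licenses, and a Namespace built without add_arguments (as in a unit test) simply lacks that attribute. A standard-library-only check, where make_enrichers is a stand-in that returns enricher names instead of objects:

```python
import argparse


def add_arguments(parser: argparse.ArgumentParser) -> None:
    grp = parser.add_argument_group("pip system options")
    grp.add_argument("--no-licenses", action="store_true",
                     help="Skip license metadata lookup")


def make_enrichers(args: argparse.Namespace) -> list[str]:
    # Stand-in for PipSystem.make_enrichers: returns names, not enricher objects.
    if getattr(args, "no_licenses", False):
        return []
    return ["pip-licenses"]


parser = argparse.ArgumentParser(prog="example")
add_arguments(parser)

print(make_enrichers(parser.parse_args(["--no-licenses"])))  # []
print(make_enrichers(parser.parse_args([])))                  # ['pip-licenses']
# A bare Namespace has no no_licenses attribute at all;
# getattr falls back to False instead of raising AttributeError.
print(make_enrichers(argparse.Namespace()))                   # ['pip-licenses']
```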
4. Register it
Import the plugin and register it at the bottom of registry.py:
```python
from whatever2sbom.systems.pip import PipSystem

register_system(PipSystem())
```
--system pip is now available, --help lists it automatically (system_names() is read from
the registry), and the "pip system options" argument group appears in --help whenever it's
registered.
5. Test it
Add a test under tests/ following the existing dpkg tests as a template. Typically that means:
- a unit test for the collector against fixed/mocked subprocess output
- a unit test for each enricher
- an end-to-end test that runs the pipeline with --system pip against fixture data and asserts that the output validates against the schema
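A sketch of the first kind of test, using unittest.mock to replace subprocess.run with canned pip output. To keep it self-contained, collect_pip is a trimmed-down, hypothetical stand-in for PipCollector.collect that returns (name, purl) pairs; a real test would patch subprocess.run the same way and call PipCollector().collect().

```python
import json
import subprocess
from unittest import mock

FAKE_PIP_OUTPUT = json.dumps([
    {"name": "Requests", "version": "2.32.3"},
    {"name": "typing_extensions", "version": "4.12.2"},
])


def collect_pip() -> list[tuple[str, str]]:
    """Stand-in for PipCollector.collect: returns (name, purl) pairs."""
    raw = subprocess.run(
        ["pip", "list", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        (e["name"], f"pkg:pypi/{e['name'].lower().replace('_', '-')}@{e['version']}")
        for e in json.loads(raw)
    ]


def test_collect_pip_uses_fixed_output():
    fake = subprocess.CompletedProcess(args=[], returncode=0,
                                       stdout=FAKE_PIP_OUTPUT, stderr="")
    # Patch the module attribute so no real pip process is started.
    with mock.patch("subprocess.run", return_value=fake) as run:
        result = collect_pip()
    run.assert_called_once_with(["pip", "list", "--format=json"],
                                capture_output=True, text=True, check=True)
    assert result == [
        ("Requests", "pkg:pypi/requests@2.32.3"),
        ("typing_extensions", "pkg:pypi/typing-extensions@4.12.2"),
    ]


test_collect_pip_uses_fixed_output()  # pytest would discover this on its own
```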
Adding a new output schema
A schema is "what --schema/--spec-version selects" — a Formatter that turns
list[PackageRecord] into a document, plus a Validator that checks that document.
1. Write the formatter
```python
# src/whatever2sbom/formatters/spdx23.py
import re

from whatever2sbom.formatters.base import Formatter
from whatever2sbom.models import PackageRecord


class SpdxFormatter(Formatter):
    schema_name = "spdx"
    spec_version = "2.3"
    output_extension = "spdx.json"
    name = "spdx-2.3"

    def __init__(self, product_name: str | None = None, **_ignored) -> None:
        self.product_name = product_name

    def format(self, packages: list[PackageRecord]) -> dict:
        return {
            "spdxVersion": "SPDX-2.3",
            "name": self.product_name or "system",
            "packages": [_to_spdx_package(pkg) for pkg in packages],
            # ...
        }


def _to_spdx_package(pkg: PackageRecord) -> dict:
    # SPDX identifiers may only contain letters, digits, "." and "-",
    # so replace anything else (e.g. "+", ":", "~" in distro versions).
    raw_id = f"{pkg.name}-{pkg.version}"
    return {
        "name": pkg.name,
        "versionInfo": pkg.version,
        "SPDXID": "SPDXRef-" + re.sub(r"[^A-Za-z0-9.-]", "-", raw_id),
        # ...
    }
```
schema_name, spec_version, and output_extension are class attributes the registry and CLI
read directly — they're the single source of truth for --schema spdx --spec-version 2.3 and the
default output filename (sbom_<timestamp>.spdx.json).
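Because these are plain class attributes, the CLI can derive the default filename without instantiating the formatter. A hypothetical sketch: default_output_name is an illustrative helper, and the timestamp format is an assumption, since only the sbom_<timestamp>.<extension> shape is documented here.

```python
from __future__ import annotations

from datetime import datetime, timezone


class SpdxFormatter:  # stand-in carrying only the class attributes the CLI reads
    schema_name = "spdx"
    spec_version = "2.3"
    output_extension = "spdx.json"


def default_output_name(formatter_cls: type, now: datetime | None = None) -> str:
    now = now or datetime.now(timezone.utc)
    # Assumed UTC timestamp format; the real CLI may format it differently.
    return f"sbom_{now:%Y%m%dT%H%M%SZ}.{formatter_cls.output_extension}"


print(default_output_name(SpdxFormatter, datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)))
# sbom_20240501T120000Z.spdx.json
```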
get_formatter() inspects your __init__ signature and only forwards the kwargs it actually
declares, so you can accept just the product-metadata fields your format needs (here, only
product_name) and ignore the rest via **_ignored.
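The signature-based forwarding can be sketched with inspect.signature. This follows the rule as described (forward only the names the constructor declares, so **_ignored acts as a safety net rather than a receiver); forward_kwargs and the product_version/supplier metadata keys are hypothetical, not the real get_formatter internals.

```python
import inspect


def forward_kwargs(cls: type, available: dict) -> dict:
    """Keep only the entries of `available` that cls.__init__ declares by name."""
    declared = {
        p.name
        for p in inspect.signature(cls.__init__).parameters.values()
        # Named parameters only; a **kwargs catch-all does not count as a declaration.
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in available.items() if k in declared}


class SpdxFormatter:  # stand-in with the same constructor as the example above
    def __init__(self, product_name=None, **_ignored):
        self.product_name = product_name


product_metadata = {"product_name": "acme-appliance", "product_version": "1.2", "supplier": "ACME"}
kwargs = forward_kwargs(SpdxFormatter, product_metadata)
print(kwargs)  # {'product_name': 'acme-appliance'}
formatter = SpdxFormatter(**kwargs)
```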
2. Write the validator
```python
# src/whatever2sbom/validators/spdx_validator.py
from whatever2sbom.validators.base import Validator


class SpdxValidator(Validator):
    schema_name = "spdx"
    spec_version = "2.3"
    name = "spdx-2.3-jsonschema"

    def validate(self, bom: dict) -> list[str]:
        errors = []
        if "spdxVersion" not in bom:
            errors.append("missing required field: spdxVersion")
        return errors
```
Return a list of human-readable error strings; an empty list means the document is valid. A
non-empty list raises ValidationError in the pipeline and aborts the run before anything is
written — schema validation is always fatal.
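The fatal-validation rule amounts to validating before the output file is opened. A hypothetical sketch of that step: ValidationError, write_sbom, and the copied SpdxValidator are stand-ins, not the pipeline's actual code.

```python
import json
import tempfile
from pathlib import Path


class ValidationError(Exception):
    """Stand-in for the pipeline's ValidationError."""


class SpdxValidator:  # same check as the example validator above
    def validate(self, bom: dict) -> list[str]:
        errors = []
        if "spdxVersion" not in bom:
            errors.append("missing required field: spdxVersion")
        return errors


def write_sbom(bom: dict, validator, out_path: Path) -> None:
    errors = validator.validate(bom)
    if errors:
        # Abort before touching the filesystem, so no partial SBOM is left behind.
        raise ValidationError("; ".join(errors))
    out_path.write_text(json.dumps(bom, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "sbom.spdx.json"
    try:
        write_sbom({"name": "system"}, SpdxValidator(), out)
    except ValidationError as exc:
        print(exc)        # missing required field: spdxVersion
    print(out.exists())   # False
```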
If you're validating against a JSON Schema, follow jsonschema_validator.py's pattern: bundle the
schema file under src/whatever2sbom/schema/, list it in
pyproject.toml's [tool.setuptools.package-data], and load it relative to __file__ so it works
from an installed wheel with no network access.
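A minimal sketch of loading a bundled schema relative to __file__, cached so each schema file is read once. load_bundled_schema and its base_dir parameter are hypothetical (base_dir exists only so the example can run against a temporary directory); jsonschema_validator.py may be organized differently.

```python
from __future__ import annotations

import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_bundled_schema(relative_path: str, base_dir: Path | None = None) -> dict:
    """Read a schema shipped as package data; never touches the network."""
    if base_dir is None:
        # Resolve from this module's location, not the CWD, so it works from a wheel.
        base_dir = Path(__file__).resolve().parent / "schema"
    with (base_dir / relative_path).open(encoding="utf-8") as fh:
        return json.load(fh)


# Exercised against a temporary directory instead of the installed package:
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cdx").mkdir()
    (root / "cdx" / "bom-1.6.schema.json").write_text('{"title": "demo"}', encoding="utf-8")
    schema = load_bundled_schema("cdx/bom-1.6.schema.json", root)
    print(schema)  # {'title': 'demo'}
```

Since the cache hands every caller the same dict, callers should treat the loaded schema as read-only.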
3. Register both
At the bottom of registry.py, import both classes and register them:
```python
from whatever2sbom.formatters.spdx23 import SpdxFormatter
from whatever2sbom.validators.spdx_validator import SpdxValidator

register_formatter(SpdxFormatter.schema_name, SpdxFormatter.spec_version, SpdxFormatter)
register_validator(SpdxValidator.schema_name, SpdxValidator.spec_version, SpdxValidator)
```
--schema spdx --spec-version 2.3 is now selectable, and --schema spdx alone works once 2.3
is the only (or first) registered version for spdx.
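How --schema and --spec-version might resolve against the registry, as a hypothetical sketch: register_formatter and spec_versions_for mirror names used in this guide, but the dict layout and resolve_formatter are assumptions. Python dicts keep insertion order, which is what makes "first registered version" a well-defined default.

```python
from __future__ import annotations

_FORMATTERS: dict[tuple[str, str], type] = {}


def register_formatter(schema: str, version: str, cls: type) -> None:
    _FORMATTERS[(schema, version)] = cls


def spec_versions_for(schema: str) -> list[str]:
    # Registration order is preserved because dicts are insertion-ordered.
    return [v for (s, v) in _FORMATTERS if s == schema]


def resolve_formatter(schema: str, version: str | None = None) -> type:
    versions = spec_versions_for(schema)
    if not versions:
        raise KeyError(f"unknown schema: {schema}")
    # --schema alone falls back to the first registered version.
    return _FORMATTERS[(schema, version or versions[0])]


# Empty stand-in classes; only their identity matters here.
class CycloneDXFormatter: ...
class CycloneDXFormatter17: ...
class SpdxFormatter: ...


register_formatter("cyclonedx", "1.6", CycloneDXFormatter)
register_formatter("cyclonedx", "1.7", CycloneDXFormatter17)
register_formatter("spdx", "2.3", SpdxFormatter)

print(spec_versions_for("cyclonedx"))     # ['1.6', '1.7']
print(resolve_formatter("spdx").__name__)  # SpdxFormatter
```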
Adding a new spec version of an existing schema
For a new CycloneDX release (e.g. 1.7) that mostly extends 1.6:
- Drop the new bom-1.7.schema.json (and an updated spdx.schema.json, if it changed) into src/whatever2sbom/schema/cdx/.
- Subclass the formatter and validator with spec_version = "1.7", overriding only what actually changed:
```python
# CycloneDXFormatter and CycloneDXSchemaValidator are the built-in classes used for 1.6.
class CycloneDXFormatter17(CycloneDXFormatter):
    spec_version = "1.7"

    def format(self, packages: list[PackageRecord]) -> dict:
        bom = super().format(packages)
        bom["specVersion"] = "1.7"
        # ... add/adjust whatever changed in 1.7
        return bom


class CycloneDXSchemaValidator17(CycloneDXSchemaValidator):
    spec_version = "1.7"
```
CycloneDXSchemaValidator already derives its schema path from spec_version
(schema/cdx/bom-<spec_version>.schema.json), so a bare subclass with just the new
spec_version, like the one above, is often enough on the validator side.
- Register both at the bottom of registry.py:
```python
register_formatter("cyclonedx", "1.7", CycloneDXFormatter17)
register_validator("cyclonedx", "1.7", CycloneDXSchemaValidator17)
```
--spec-version 1.7 now appears automatically in --help (spec_versions_for("cyclonedx") reads
the registry), alongside the existing 1.6.
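Why a bare validator subclass can be enough: if the base class builds the schema path from self.spec_version at call time, Python's attribute lookup picks up the subclass's override automatically. A self-contained sketch with a stand-in base class; schema_path is a hypothetical property, and the real CycloneDXSchemaValidator internals may differ.

```python
from pathlib import PurePosixPath


class CycloneDXSchemaValidator:  # stand-in for the built-in validator
    schema_name = "cyclonedx"
    spec_version = "1.6"

    @property
    def schema_path(self) -> PurePosixPath:
        # self.spec_version resolves to the subclass's class attribute when overridden.
        return PurePosixPath("schema/cdx") / f"bom-{self.spec_version}.schema.json"


class CycloneDXSchemaValidator17(CycloneDXSchemaValidator):
    spec_version = "1.7"  # the only override needed


print(CycloneDXSchemaValidator().schema_path)    # schema/cdx/bom-1.6.schema.json
print(CycloneDXSchemaValidator17().schema_path)  # schema/cdx/bom-1.7.schema.json
```

This only works because the path is computed per instance; a module-level constant built from "1.6" at import time would not follow the subclass.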
Checklist
Whatever you're adding, the registry is the only place new pieces need to be wired in —
everything else (--help choices, defaults, output extensions) is derived from it:
- [ ] Implement the base class (Collector, Enricher, SystemPlugin, Formatter, or Validator) in the matching package.
- [ ] Register it at the bottom of registry.py.
- [ ] Bundle any reference data (schemas, license lists, …) under src/whatever2sbom/schema/ and list it in pyproject.toml's package-data; runtime stays offline, no exceptions.
- [ ] Add tests under tests/.
- [ ] Document new CLI options in CLI reference and new systems/schemas in Systems and schemas.