
Core Concepts: Datasets & Collections

Log events in the Bronto Logging Platform are organized and stored in datasets and collections.

Datasets

A Dataset (called "log" in the current API) is the fundamental unit of storage. It is a container for log events that share a similar format and origin, such as:
  • application-logs
  • nginx-access
Key characteristics:
  • All events within the same dataset are always queried together.
  • The dataset must be specified at ingestion time by configuring the agent or integration.
  • Dataset names must be unique within a collection.
To learn how to configure datasets during ingestion, see the agent setup documentation.
Note
Bronto supports up to 5,000 unique datasets per organization.
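The dataset rules above can be illustrated with a small in-memory model. This is a hypothetical sketch, not part of any Bronto API: it assumes a dataset is identified by the pair (collection, dataset name), so a name can repeat across collections but not within one. It also counts only new pairs against the 5,000-dataset organization limit.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

MAX_DATASETS_PER_ORG = 5_000  # limit stated in the Note above


@dataclass
class Organization:
    """Hypothetical model: a dataset is identified by (collection, dataset name)."""

    datasets: Set[Tuple[str, str]] = field(default_factory=set)

    def resolve_dataset(self, collection: str, name: str) -> Tuple[str, str]:
        """Return the dataset key, registering it if this pair is new."""
        key = (collection, name)
        if key not in self.datasets:
            # Only a previously unseen (collection, name) pair counts toward the limit.
            if len(self.datasets) >= MAX_DATASETS_PER_ORG:
                raise RuntimeError("organization dataset limit reached")
            self.datasets.add(key)
        return key


org = Organization()
org.resolve_dataset("production", "nginx-access")
org.resolve_dataset("staging", "nginx-access")     # same name, different collection: a second dataset
org.resolve_dataset("production", "nginx-access")  # existing pair: no new dataset
print(len(org.datasets))  # 2
```

Sending more events to an existing dataset in this model creates nothing new. That is why the limit applies to unique datasets rather than to ingestion volume.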

Collections

A Collection (called "logset" in the current API) is a logical grouping of datasets. Think of a collection as a folder: it helps you organize related datasets and distinguish between datasets that share the same name but come from different environments. Key points:
  • Collections improve discoverability and organization.
  • The collection can be specified using the x-bronto-collection HTTP header (or equivalent).
  • Collection configuration is described in the agent setup documentation.
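For senders that talk to Bronto over HTTP directly rather than through an agent, setting the collection means adding one header to each request. The sketch below uses only Python's standard library. It builds the request but does not send it. The endpoint URL is a placeholder, and the body format and authentication are omitted assumptions; take the real values from the agent setup documentation.

```python
import urllib.request

# Placeholder endpoint (assumption): substitute the ingestion URL from the agent setup documentation.
INGEST_URL = "https://ingest.example.com/"


def build_ingest_request(body: bytes, collection: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that targets a specific collection."""
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Header named in this section; selects the collection explicitly.
            "x-bronto-collection": collection,
        },
        method="POST",
    )


req = build_ingest_request(b'{"message": "GET /health 200"}', "production")
# urllib stores header names with only the first letter capitalized.
print(req.get_method(), req.get_header("X-bronto-collection"))  # POST production
```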

Tags

Tags are key-value pairs assigned to datasets to categorize them across your system, for example:
  • team:backend
  • env:production
  • type:gc
  • service:catalog
Tags make it easier to:
  • Select datasets in the UI
  • Query multiple datasets together
  • Build precise dashboards and monitors at scale
Proper tagging is essential when managing thousands of datasets.
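Selecting datasets by tag works like a filter over each dataset's key-value pairs. The sketch below is a hypothetical in-memory illustration of that idea. The dataset names and the `select_datasets` helper are invented for the example and are not a Bronto API. A dataset matches when it carries every requested tag.

```python
from typing import Dict, List, Tuple

# Hypothetical dataset inventory: (dataset name, tags) pairs.
DATASETS: List[Tuple[str, Dict[str, str]]] = [
    ("catalog-app", {"team": "backend", "env": "production", "service": "catalog"}),
    ("catalog-gc", {"team": "backend", "env": "production", "service": "catalog", "type": "gc"}),
    ("checkout-app", {"team": "payments", "env": "staging", "service": "checkout"}),
]


def select_datasets(datasets: List[Tuple[str, Dict[str, str]]], **required: str) -> List[str]:
    """Return names of datasets whose tags match every required key:value pair."""
    return [
        name
        for name, tags in datasets
        if all(tags.get(key) == value for key, value in required.items())
    ]


print(select_datasets(DATASETS, env="production", service="catalog"))  # ['catalog-app', 'catalog-gc']
print(select_datasets(DATASETS, type="gc"))  # ['catalog-gc']
```

Because the filter uses tags rather than names, a dataset added later with the same tags is picked up by the same selection without changing any dashboard or monitor.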

Assigning Tags

Tags can be assigned in several ways:
  • Via UI: from the Logs screen
  • Via API: programmatically using the tags endpoint
  • At ingestion: by specifying the x-bronto-tags HTTP header (or equivalent)
See the agent setup documentation for ingestion examples.
Note
Each dataset can have up to 10 tags.
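When tags are set at ingestion, it helps to validate them before they reach the x-bronto-tags header. The sketch below enforces the 10-tag limit. The exact header syntax is an assumption here: it uses comma-separated key:value pairs, mirroring the notation used on this page. Confirm the real format in the agent setup documentation. `format_tags_header` is a hypothetical helper.

```python
from typing import Dict

MAX_TAGS_PER_DATASET = 10  # limit from the Note above


def format_tags_header(tags: Dict[str, str]) -> str:
    """Serialize tags as comma-separated 'key:value' pairs (assumed syntax)."""
    if len(tags) > MAX_TAGS_PER_DATASET:
        raise ValueError(f"{len(tags)} tags given; at most {MAX_TAGS_PER_DATASET} allowed")
    for key, value in tags.items():
        # Reject empty parts and characters that would make the assumed format ambiguous.
        if not key or not value or ":" in key or "," in key or "," in value:
            raise ValueError(f"unsupported tag {key!r}: {value!r}")
    return ",".join(f"{key}:{value}" for key, value in tags.items())


headers = {"x-bronto-tags": format_tags_header({"team": "backend", "env": "production"})}
print(headers["x-bronto-tags"])  # team:backend,env:production
```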
Tags can be used simply as labels to add metadata and context, or as part of a more advanced data organization strategy using partition tags.

Partition Tags

Partition Tags are a special subset of tags defined at the organization level. They allow Bronto to automatically organize your data in a consistent and scalable way. Partition tags are especially useful for:
  • Large environments
  • Structured infrastructures
  • Reducing agent configuration complexity
  • Enforcing consistent collection and dataset placement

How Partition Tags Work

  • Bronto uses partition tags to determine how incoming data is grouped into collections and datasets.
  • If a collection header is explicitly set, it takes precedence and overrides partition tag logic.
  • To fully leverage partition tags, omit the collection header during ingestion.
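The precedence rule can be sketched as a small resolution function. This is illustrative only. The set of partition tags is a hypothetical organization setting. The naming scheme, which joins the partition tag values with "/", is an assumption made for the example and is not necessarily how Bronto names collections. What the sketch does reflect from this page is the order of the checks: an explicit collection header wins, and partition tags apply only when it is absent.

```python
from typing import Dict, Optional

# Hypothetical organization-level setting; the real one is configured in Bronto.
PARTITION_TAGS = ("environment", "region", "service")


def resolve_collection(headers: Dict[str, str], tags: Dict[str, str]) -> Optional[str]:
    """Pick a collection for incoming data (illustrative naming scheme)."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    explicit = normalized.get("x-bronto-collection")
    if explicit:
        return explicit  # an explicit header overrides partition tag logic
    # Otherwise derive a name from whichever partition tag values are present.
    values = [tags[key] for key in PARTITION_TAGS if key in tags]
    return "/".join(values) if values else None


tags = {"environment": "production", "region": "eu", "service": "catalog", "team": "team-a"}
print(resolve_collection({}, tags))  # production/eu/catalog
print(resolve_collection({"X-Bronto-Collection": "legacy"}, tags))  # legacy
```

Note that the team tag has no effect on placement because it is not a partition tag.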

Choosing Your Partition Tags

Partition tags define your data hierarchy, so they should be chosen carefully—ideally during the initial account configuration.

Characteristics of Good Partition Tags

Partition tags should:
  • Be low-cardinality
  • Represent attributes that are intrinsically tied to the data source
  • Remain stable over time
Good examples:
  • environment
  • region
  • datacenter
  • account
  • cluster
  • service
  • product
Bad examples:
  • team
    Teams and responsibilities change frequently. Team ownership is valuable metadata, but not suitable for defining data hierarchy.
  • High-cardinality or volatile attributes such as:
    • hostname
    • instance_id
    • pod_id
    • ip_address
These attributes change often and can lead to excessive dataset creation. They also do not make sense as dataset tags and should instead be included at the log event level.
Warning
While partition tags can be modified at any time, changing them on live data is a sensitive operation. Updates may cause new datasets to be created for existing streams, potentially breaking dashboards, monitors, and saved queries.
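Before committing to partition tags, you can check candidate keys against a sample of real events. The sketch below counts distinct values per key and flags keys above a threshold. The threshold of 20 is an arbitrary assumption, and the sample events are synthetic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence


def tag_cardinality(events: Iterable[Mapping[str, str]], keys: Sequence[str]) -> Dict[str, int]:
    """Count distinct values observed for each candidate key."""
    seen = defaultdict(set)
    for event in events:
        for key in keys:
            if key in event:
                seen[key].add(event[key])
    return {key: len(seen[key]) for key in keys}


def high_cardinality_keys(
    events: Iterable[Mapping[str, str]], keys: Sequence[str], max_distinct: int = 20
) -> List[str]:
    """Return candidate keys whose distinct-value count exceeds the threshold."""
    counts = tag_cardinality(events, keys)
    return sorted(key for key, n in counts.items() if n > max_distinct)


# Synthetic sample: two environments, one region, a unique hostname per event.
sample = [
    {"environment": "production" if i % 2 else "staging", "region": "eu", "hostname": f"web-{i}"}
    for i in range(100)
]
print(tag_cardinality(sample, ["environment", "region", "hostname"]))
# {'environment': 2, 'region': 1, 'hostname': 100}
print(high_cardinality_keys(sample, ["environment", "region", "hostname"]))
# ['hostname']
```

A cardinality check only catches the high-cardinality case. Volatile but low-cardinality attributes such as team pass it, so they still need the judgment described above.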

Example: Designing Partition Tags

Context

Your infrastructure includes:
  • Environments: staging, production
  • Regions: eu, us
  • Services: catalog, checkout, payment, cart
  • Teams: team-a, team-b
Each service produces multiple log streams, such as:
  • Syslog
  • Application logs
  • Garbage collection logs
Recommended partition tags:
  • environment
  • region
  • service
Why this works:
  • These descriptors are stable and directly tied to where and what generates logs.
  • They clearly identify the source of the data.
  • They scale well as your infrastructure grows.
In this setup:
  • The dataset name should represent the log stream (e.g., application, gc, syslog).
  • Collections and datasets are automatically and consistently organized.
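It is worth estimating how many datasets a partition design produces before rolling it out. The sketch below assumes that each combination of partition tag values forms one group and that each log stream becomes one dataset within that group. It enumerates the layout for the infrastructure in this example.

```python
from itertools import product

environments = ["staging", "production"]
regions = ["eu", "us"]
services = ["catalog", "checkout", "payment", "cart"]
streams = ["syslog", "application", "gc"]  # dataset names represent the log stream

# One entry per combination of partition tag values (team is deliberately absent).
layout = {
    (env, region, service): streams
    for env, region, service in product(environments, regions, services)
}
total_datasets = sum(len(names) for names in layout.values())
print(len(layout), total_datasets)  # 16 48
```

The result, 2 × 2 × 4 = 16 partitions with 3 streams each (48 datasets), stays far below the 5,000-dataset limit. The count grows only when an environment, region, or service is genuinely added.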

Why Not Use team as a Partition Tag?

Team ownership changes over time. If a service moves from team-a to team-b:
  • The partition tag combination changes
  • New datasets are created for the same service
  • Old datasets stop receiving data
  • Existing dashboards, monitors, and queries may break or return incomplete results
Best practice:
Use team as a regular tag, not a partition tag. This allows you to update ownership metadata without disrupting your data organization or historical visibility.
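The effect of a team change can be shown by comparing the tuple of partition tag values before and after the move. In this sketch, `partition_key` is a hypothetical helper. If team is part of that tuple, the service's data lands somewhere new. If team is a regular tag, placement is unchanged.

```python
from typing import Dict, Optional, Sequence, Tuple


def partition_key(tags: Dict[str, str], partition_tags: Sequence[str]) -> Tuple[Optional[str], ...]:
    """Values of the partition tags, in order; this tuple decides where data lands."""
    return tuple(tags.get(key) for key in partition_tags)


before = {"environment": "production", "region": "eu", "service": "catalog", "team": "team-a"}
after = {**before, "team": "team-b"}  # the service moves to another team

with_team = ("environment", "region", "service", "team")
without_team = ("environment", "region", "service")

print(partition_key(before, with_team) == partition_key(after, with_team))  # False: new datasets
print(partition_key(before, without_team) == partition_key(after, without_team))  # True: unchanged
```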