An Enterprise Data Governance Glossary Operators Can Use

Outcome focus: Created a shared vocabulary and term-entry contract that helps governance, data engineering, analytics, security, and business teams align definitions before certifying data products.

The first enterprise glossary failed because it defined words and left decisions untouched.

The definitions were reasonable. Business Intelligence meant turning raw data into information for decisions. Data Steward meant the business role responsible for metadata and data quality. Metadata meant data about data. Data retention meant how long data was kept. Nobody argued much with the words.

Then the customer dashboard shipped.

Sales used one definition of active customer. Finance used another. Support wanted canceled customers included while service obligations were still open. Security wanted a sensitive field masked. Analytics wanted drill-through access to transaction-level detail. Legal asked whether the retention schedule allowed the historical table to exist in the first place.

The glossary had entries. It did not have operating force.

That is the standard I use now: a governance glossary is useful only if it changes how a data product is reviewed, approved, accessed, monitored, and retired. A definition should name the thing, but it should also tell the team who owns it, where it appears, how it is measured, and what decision it affects.

This post is a practical glossary for enterprise data governance. It is not a legal taxonomy, and it is not a replacement for frameworks like EDM Council DCAM, NIST CSF 2.0, NIST Privacy Framework, or NIST SP 800-53. It is the operating language I want in the room when business, analytics, platform, security, privacy, and governance teams have to make the same data product mean the same thing.

How to Use the Glossary

A glossary entry should answer more than "what does this term mean?"

It should answer:

Who owns the term?
Which data products use it?
Which source is authoritative?
Which quality rules prove it is fit for use?
Which privacy or security controls apply?
Which downstream decisions break if the meaning changes?

A glossary term becomes useful when it connects definition, rule, product, control, decision, and evidence.

The tradeoff is precision over speed. It is faster to let every team define terms locally. It is also how enterprises end up with five revenue numbers, three customer counts, and a privacy review that discovers sensitive data after the dashboard is already trusted.

The better path is slower at the definition boundary and faster everywhere downstream.

Business and Analytics Terms

These terms describe how data becomes decision support.

Term	Working definition	Review question
Business Intelligence (BI)	Tools, processes, and practices that turn raw or modeled data into information for decision-making.	Which decision or operating rhythm does this BI asset support?
Dashboard	A monitoring surface that shows current or frequently refreshed status.	Is the dashboard for live monitoring, or is it being used as a static report?
Scorecard	A performance snapshot against targets, goals, or thresholds.	Are targets approved, current, and owned?
Report	A structured analytical output, often with more detail, context, and interpretation than a dashboard.	Is the report exploratory, official, regulatory, or operational?
Drill down	Navigation from a summary level into lower levels within the same hierarchy, such as year to quarter to month.	Does the hierarchy match the governed dimensional model?
Drill through	Navigation from one analytical view into a related detail page or dataset filtered to a selected context.	Is the detail view governed at the same classification and access level?
Metric	A quantifiable measure of activity, quality, risk, cost, or outcome.	Is the calculation documented and reproducible?
Key Performance Indicator (KPI)	A metric tied to a strategic or operational target.	Who owns the target, tolerance, and interpretation?
OLAP	Online Analytical Processing: multidimensional analysis across measures, dimensions, hierarchies, and aggregations.	Are dimensions, grains, and aggregations consistent across tools?
Data mining	Statistical or computational analysis to discover patterns, associations, clusters, anomalies, or predictive signals.	Is the discovered pattern approved for action or only exploration?
Semantic layer	A governed business-facing model of metrics, dimensions, relationships, and calculations.	Is the semantic layer the source for official BI definitions?
Self-service analytics	A model where business users explore governed data with approved tools and guardrails.	Which datasets are certified for self-service use?

Microsoft's Power BI drillthrough documentation is a useful reminder that navigation features carry governance implications. Drill-through often moves a user from summary to detail. That can change privacy risk, row-level access needs, and interpretation.
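
The review question for drill-through can be turned into a check that runs before the navigation is published. The following is a minimal sketch, assuming a four-level classification ladder; the level names and the helpers `detail_escalates` and `can_drill_through` are hypothetical and not taken from Power BI or any other tool.

```python
# Hypothetical classification ladder, ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]
RANK = {name: position for position, name in enumerate(LEVELS)}


def detail_escalates(summary_level: str, detail_level: str) -> bool:
    """True when the detail view is more sensitive than the summary view."""
    return RANK[detail_level] > RANK[summary_level]


def can_drill_through(user_clearance: str, detail_level: str) -> bool:
    """Allow navigation only when the user is cleared for the detail view."""
    return RANK[user_clearance] >= RANK[detail_level]
```

A summary page classified `internal` that drills into a `confidential` transaction table is flagged by `detail_escalates`, which is the cue to review row-level access before the dashboard ships.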

Governance Roles and Structures

These terms describe who can define, approve, implement, and escalate.

Term	Working definition	Review question
Data Governance (DG)	The operating system for managing, improving, protecting, and using data as an enterprise asset.	Which decision rights, artifacts, and cadences make the governance real?
Governance framework	The roles, policies, processes, standards, controls, and measures that define how governance runs.	Is the framework used in release gates or only documented?
Governance maturity model	A staged way to assess how repeatable, measured, and embedded governance practices are.	What evidence moves a domain from one level to the next?
Data Owner	Senior business leader accountable for domain data outcomes, risk, source-of-truth decisions, and policy approval.	Can this person approve tradeoffs when functions disagree?
Data Steward	Business-facing authority for meaning, quality rules, classifications, usage guidance, and issue triage.	Has the steward translated definitions into testable rules?
Data Custodian	Technical role that implements controls, pipelines, access, metadata capture, lineage, and operational reliability.	Are approved governance rules automated and observable?
Business SME	Functional expert who validates process reality, edge cases, and fitness for use.	Which process or report does the SME represent?
Data Steward Working Group	Domain or cross-domain forum where stewards coordinate definitions, rules, issues, and changes.	Does it resolve issues or only discuss them?
Executive Sponsor	Senior leader who provides funding, advocacy, escalation, and strategic priority for the governance program.	What decision can the sponsor unblock?
Governance co-chair	Leader responsible for running governance forums, managing agendas, tracking decisions, and linking working groups to executives.	Are decisions captured with owners and due dates?
EDGC	Enterprise Data Governance Committee: escalation body for cross-domain policy, exceptions, and unresolved risk.	Which issues qualify for escalation?

I wrote a companion role model in Data Governance Roles Need Decision Rights. The short version here is simple: owners decide, stewards define, custodians implement, SMEs validate, and the committee escalates enterprise risk.

Governance Concepts

These terms describe the rules of the operating model.

Term	Working definition	Review question
Data governance standard	A mandatory rule or practice adopted by governed domains.	How is compliance measured and enforced?
Policy	A formal statement of required behavior, risk posture, or control expectation.	Who approved it, and what happens when it is violated?
Standard	A specific required implementation pattern or minimum bar.	Is it testable?
Procedure	Step-by-step process for executing a policy or standard.	Who follows it, and how is evidence captured?
Control	A safeguard or process that reduces risk or enforces policy.	Is the control preventive, detective, or corrective?
Exception	Approved deviation from policy, standard, or threshold.	Who accepted the risk, for how long, and with what compensating control?
Waiver	Temporary permission to proceed despite unmet criteria.	What expiry date and remediation plan exist?
Data domain	Business area with related concepts, processes, data products, and ownership.	Are domain boundaries clear enough to assign accountability?
Data product	Governed dataset, view, API, feature set, or analytical asset designed for consumption.	What contract proves it is fit for use?
Source of truth	The authoritative source used to resolve conflicts for a defined data element or domain.	Is authority scoped by purpose and time?
Golden record	Consolidated best representation of an entity, often produced through matching, survivorship, and merge rules.	Which survivorship rules choose winning values?
Data contract	Explicit agreement for schema, semantics, quality, freshness, ownership, and change behavior.	Does breaking the contract block release?
Certification	Governance approval that a data asset meets defined quality, metadata, access, lineage, and ownership standards.	What evidence supports the certified label?

The failure mode is treating "source of truth" as a universal title. It is rarely universal. A billing system may be authoritative for invoice status. A CRM may be authoritative for account owner. A support platform may be authoritative for service obligations. A governed domain needs scoped authority, not slogans.
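
Scoped authority can be recorded per data element rather than per system. The following is a minimal sketch built on the three examples above; the element names, the `AUTHORITY` mapping, and the `resolve` helper are hypothetical, and a real registry would also scope authority by purpose and time.

```python
# Hypothetical registry: each data element names the system that wins conflicts.
AUTHORITY = {
    "invoice_status": "billing",
    "account_owner": "crm",
    "service_obligation": "support",
}


def resolve(element: str, values_by_system: dict) -> str:
    """Return the value reported by the authoritative system for one element."""
    if element not in AUTHORITY:
        raise LookupError(f"no authoritative source registered for {element!r}")
    source = AUTHORITY[element]
    if source not in values_by_system:
        raise LookupError(f"{source!r} supplied no value for {element!r}")
    return values_by_system[source]
```

When the CRM says an invoice is open and billing says it is paid, `resolve("invoice_status", {"crm": "open", "billing": "paid"})` returns the billing value, and an unregistered element fails loudly instead of defaulting to whichever system answered first.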

Data Management Concepts

These terms describe how data is created, moved, described, and made reusable.

Term	Working definition	Review question
ETL	Extract, Transform, Load: data is transformed before loading into the target.	Where are transformation rules versioned and tested?
ELT	Extract, Load, Transform: data is loaded first, then transformed inside the target platform.	Which layers are raw, curated, and certified?
MDM	Master Data Management: rules, processes, and systems that maintain authoritative shared entities such as customer, product, supplier, or employee.	Which domain entity needs a golden record, and why?
Reference data	Shared code sets, classifications, hierarchies, and lookup values used across systems.	Who approves changes to shared codes and hierarchies?
Metadata	Data about data: meaning, structure, ownership, lineage, classification, quality, use, and context.	Is the metadata complete enough to support trust and impact analysis?
Business metadata	Business definitions, policies, owners, usage constraints, classifications, and context.	Can a business user understand and use it?
Technical metadata	Schemas, data types, jobs, tables, columns, partitions, code, lineage, and operational properties.	Can an engineer trace and operate it?
Operational metadata	Runtime information such as freshness, failures, volume, latency, cost, and usage.	Can the team see whether the asset is healthy?
Metadata management	Processes and tools for maintaining metadata quality, access, lineage, and discoverability.	Who keeps metadata current after release?
Data dictionary	Technical inventory of fields, attributes, formats, constraints, and definitions.	Is it synchronized with actual schemas?
Data catalog	Searchable inventory of data assets, metadata, ownership, classifications, lineage, and usage signals.	Can consumers find certified assets and understand restrictions?
Business glossary	Approved vocabulary of business terms, definitions, owners, and relationships.	Are glossary terms linked to physical data assets?
Data lineage	Record of where data originates, how it moves, how it transforms, and where it is consumed.	Can the team trace impact before changing a field?
Data profiling	Analysis of data structure, values, distributions, patterns, nulls, duplicates, and anomalies.	Did profiling produce rules or only observations?

Microsoft Purview's data governance glossary and lineage documentation are useful examples of how catalog terms, classifications, assets, and lineage become a connected operating surface.

The most common mistake is separating glossary and catalog work. A glossary without asset links is vocabulary. A catalog without business terms is inventory. Governance needs both.
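
Both gaps are easy to detect once glossary and catalog exports sit side by side. The following is a minimal sketch, assuming each export can be reduced to a plain mapping; the `linkage_gaps` function and the input shapes are hypothetical and do not reflect any specific catalog API.

```python
def linkage_gaps(
    term_assets: dict[str, list[str]],
    asset_terms: dict[str, list[str]],
) -> tuple[list[str], list[str]]:
    """Find glossary terms with no linked asset and assets with no business term.

    term_assets maps a glossary term to the assets linked to it.
    asset_terms maps a catalog asset to the business terms tagged on it.
    """
    unlinked_terms = sorted(t for t, assets in term_assets.items() if not assets)
    untagged_assets = sorted(a for a, terms in asset_terms.items() if not terms)
    return unlinked_terms, untagged_assets
```

Running this on a schedule gives the steward working group two concrete backlogs: vocabulary that points at nothing, and inventory that nobody has named.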

Security, Privacy, and Compliance Terms

These terms describe how data is protected and constrained.

Term	Working definition	Review question
Information security	Policies, controls, and practices that protect confidentiality, integrity, and availability.	Which control objective applies to this asset?
Privacy	Rules and practices for responsible collection, use, sharing, retention, and disposal of personal or sensitive data.	What purpose, lawful basis, notice, and minimization constraints apply?
Data security	Technical and administrative protection for data assets, including access control, encryption, masking, monitoring, and incident response.	What protects the data at rest, in transit, and in use?
Data classification	Categorization of data by sensitivity, confidentiality, regulatory obligation, or handling requirement.	Is classification applied at dataset and field level?
Sensitive data	Data requiring additional protection because disclosure, misuse, or alteration creates harm or legal risk.	Which fields require masking, approval, or special handling?
PII	Personally identifiable information: data that identifies or can reasonably identify a person.	Is this field direct, indirect, derived, or linkable?
PHI	Protected health information: individually identifiable health information held or transmitted by a HIPAA covered entity or business associate.	Is the organization a covered entity, business associate, or outside HIPAA scope?
Data Loss Prevention (DLP)	Processes and technologies that detect, classify, monitor, and prevent unauthorized exposure or exfiltration.	Are DLP findings routed to owners and remediated?
Data masking	Obscuring sensitive values while preserving some operational utility.	Is masking irreversible, format-preserving, dynamic, or only display-level?
Tokenization	Replacing sensitive values with tokens managed by a protected mapping service.	Who can re-identify, and under what approval?
Anonymization	Transformation intended to prevent identification of individuals.	Has re-identification risk been assessed for the actual context?
De-identification	Removal or alteration of identifiers to reduce privacy risk, often under a specific regulatory or analytical framework.	Which method and evidence prove the data is de-identified enough for the use?
Pseudonymization	Replacing identifiers while retaining a way to relink with additional information.	Is the key separated and controlled?
RBAC	Role-Based Access Control: granting permissions through assigned roles.	Do roles map to business need and least privilege?
Audit logging	Records of access, modification, administrative action, and policy events.	Are logs complete enough to reconstruct what happened?
Audit trail	End-to-end evidence chain showing who did what, when, and under which approval.	Can audit evidence survive staff turnover and tool migration?
Data ethics	Principles for fair, accountable, transparent, and responsible data collection and use.	Could the use be legal but still unacceptable?
Regulatory compliance	Meeting applicable legal, contractual, and industry obligations.	Which regulation or control actually applies?
Data residency	Requirement that data be stored in a defined geography.	Which storage, backup, and replication locations are in scope?
Data sovereignty	Legal or jurisdictional control over data based on where it is located, processed, or accessed.	Which laws govern processing and access?

For security and privacy terms, links to current primary sources matter. NIST CSF 2.0 provides a general cybersecurity risk management framework. NIST SP 800-53 organizes detailed control families such as access control, audit and accountability, incident response, PII processing and transparency, and system and information integrity. NIST's RBAC project gives historical and technical context for role-based access control.

For de-identification, Google Sensitive Data Protection documents transformations such as masking, and HHS explains HIPAA de-identification through Expert Determination and Safe Harbor. For GDPR-oriented teams, the European Commission's GDPR principles are a cleaner anchor than secondhand summaries.

Lifecycle Management Terms

These terms describe how data quality and data lifespan are governed.

Term	Working definition	Review question
Data quality	Fitness of data for intended use across dimensions such as accuracy, completeness, consistency, timeliness, validity, uniqueness, and reliability.	Which quality dimensions are critical for this decision?
Accuracy	Data correctly represents the real-world object, event, or measurement.	What source or process verifies correctness?
Completeness	Required values or records are present.	Which missing values block use?
Consistency	Values do not conflict across systems, records, or time.	Which conflicts need survivorship rules?
Timeliness	Data is current enough for its use.	What freshness SLA matters?
Validity	Values conform to allowed formats, ranges, codes, and rules.	Which checks are automated?
Uniqueness	Entities or records are not duplicated beyond accepted rules.	What match and merge logic applies?
Reliability	Data can be depended on repeatedly under expected conditions.	What monitoring proves stability?
Data retention	Policy defining how long data is stored for business, legal, risk, or operational needs.	Which event starts the retention clock?
Archival	Moving inactive data to long-term storage while preserving access, integrity, and policy compliance.	Who can retrieve archived data and why?
Disposal or disposition	Secure deletion, destruction, anonymization, or transfer at the end of the lifecycle.	What evidence proves disposal happened?
Legal hold	Suspension of normal disposal because of litigation, investigation, or regulatory need.	Which datasets and backups are included?
Data minimization	Collecting, storing, and processing only what is necessary for the defined purpose.	Which fields can be removed without harming the purpose?
Purpose limitation	Using data only for specified, legitimate, and compatible purposes.	Is the secondary use approved?
Storage limitation	Keeping identifiable personal data no longer than necessary for the purpose, subject to legitimate exceptions.	Which retention rule prevents indefinite storage?

Lifecycle terms are where privacy, cost, and analytics collide. Analysts often want history. Legal may need retention. Privacy may require minimization and deletion. Platform teams want storage cost under control. The glossary should not pretend those tensions disappear. It should name the decision owner and the evidence required.
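
The interaction between retention and legal hold is small enough to write down as a rule. The following is a minimal sketch, assuming the retention clock start is already agreed and that disposal becomes due on the expiry date itself; the helpers `add_years` and `disposal_due` are hypothetical, and the inclusive boundary is a policy choice, not a legal statement.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Shift a date forward by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Only Feb 29 can fail here, when the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


def disposal_due(clock_start: date, retention_years: int, today: date, legal_hold: bool) -> bool:
    """A legal hold suspends disposal; otherwise disposal is due once retention expires."""
    if legal_hold:
        return False
    return today >= add_years(clock_start, retention_years)
```

The function still needs a named decision owner behind each argument: someone chooses the clock start, someone sets the period, and someone is accountable for the hold flag being current.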

Term Entry Contract

The glossary becomes operational when each governed term has a minimum contract.

governance-glossary-entry.yaml

term: "active customer"
domain: "customer"
status: "approved"
owner: "customer data owner"
steward: "customer data steward"
definition: "A customer with an active billable contract during the reporting period."
business_context: "Used for executive reporting, retention analysis, and revenue operations."
authoritative_source: "crm.account_contract_status"
related_terms:
  - "billable customer"
  - "service obligation"
  - "churned customer"
quality_rules:
  - name: "contract status is populated"
    threshold: ">= 99.5 percent"
  - name: "reporting period has one status per account"
    threshold: "100 percent for certified reporting"
security_and_privacy:
  classification: "confidential"
  pii_fields:
    - "account_contact_email"
  access_model: "RBAC with steward approval for detail-level drill-through"
lineage:
  source: "crm"
  certified_asset: "customer_profile_certified"
  consumers:
    - "executive_customer_scorecard"
    - "retention_model_features"
change_control:
  breaking_change_requires:
    - "steward review"
    - "owner approval"
    - "consumer migration note"
retention:
  rule: "retain certified monthly snapshots for 7 years unless legal hold applies"
evidence:
  - "definition approval record"
  - "quality dashboard"
  - "lineage registration"
  - "access review log"

This looks heavier than a plain definition because plain definitions are not enough for enterprise use. A term that drives reporting, access, AI features, compliance, or customer action needs ownership, source, rules, controls, lineage, and change behavior.
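
The `quality_rules` thresholds in the entry are written for people, but they can also be evaluated by a pipeline. The following is a minimal sketch, assuming thresholds follow the two shapes used above (an optional comparison operator, a number, and the word "percent"); the `meets_threshold` function is hypothetical, and treating a bare percentage as a minimum is my assumption.

```python
import operator
import re

_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Optional operator, then a number, then the word "percent"; trailing text is ignored.
_THRESHOLD = re.compile(r"^\s*(>=|<=|==|>|<)?\s*(\d+(?:\.\d+)?)\s*percent")


def meets_threshold(observed_percent: float, threshold: str) -> bool:
    """Compare an observed percentage with a threshold string from the entry."""
    match = _THRESHOLD.match(threshold)
    if match is None:
        raise ValueError(f"unrecognized threshold: {threshold!r}")
    # Assumption: a threshold without an operator is a minimum.
    symbol = match.group(1) or ">="
    return _OPERATORS[symbol](observed_percent, float(match.group(2)))
```

With this in place, `meets_threshold(99.7, ">= 99.5 percent")` passes and `meets_threshold(99.9, "100 percent for certified reporting")` fails, so a certified label can depend on the rule rather than on someone remembering to look at the quality dashboard.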

What I Would Add Before Calling It Complete

The starter glossary above covers the core vocabulary from business analytics, roles, governance, data management, security, privacy, compliance, and lifecycle management. Before using it as an enterprise standard, I would add a few organization-specific columns:

Column	Why it matters
Domain	Prevents global terms from hiding local ownership
Owner	Names final accountability
Steward	Names the person who maintains meaning and rules
System or asset links	Connects words to real tables, dashboards, APIs, and models
Classification	Connects meaning to access and handling
Quality rules	Turns definitions into tests
Consumers	Shows blast radius when a term changes
Change history	Preserves decisions and reversals
Status	Separates proposed, approved, deprecated, and retired terms

The status field is underrated. Proposed terms should not be treated like approved terms. Deprecated terms should not vanish quietly. Retired terms should leave a pointer to their replacement or the reason they stopped being valid.
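
Those lifecycle rules can be enforced wherever terms are stored. The following is a minimal sketch, assuming a strictly linear lifecycle (proposed, approved, deprecated, retired); the `ALLOWED` table, the `transition` function, and the `replaced_by` and `retirement_reason` fields are hypothetical, and an organization may well allow other paths, such as withdrawing a proposal.

```python
# Hypothetical lifecycle: each status lists the statuses it may move to.
ALLOWED = {
    "proposed": {"approved"},
    "approved": {"deprecated"},
    "deprecated": {"retired"},
    "retired": set(),
}


def transition(entry: dict, new_status: str, *, replacement=None, reason=None) -> dict:
    """Return a copy of the entry in its new status, or raise if the move is invalid."""
    current = entry["status"]
    if new_status not in ALLOWED[current]:
        raise ValueError(f"cannot move a term from {current!r} to {new_status!r}")
    if new_status == "retired" and not (replacement or reason):
        raise ValueError("a retired term needs a replacement pointer or a reason")
    updated = dict(entry, status=new_status)
    if replacement:
        updated["replaced_by"] = replacement
    if reason:
        updated["retirement_reason"] = reason
    return updated
```

The function returns a new dictionary instead of mutating the entry, so the previous state can be appended to the change history before the update is saved.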

The Tradeoff

Glossaries slow people down at the start.

Someone has to decide whether "customer" means buyer, account, household, patient, subscriber, merchant, tenant, or legal entity. Someone has to separate dashboard convenience from source-of-truth authority. Someone has to say that a field cannot be used for a new purpose until privacy and access have been reviewed.

That friction is the cost of enterprise meaning.

The alternative is local speed and global confusion. Every team moves quickly until a metric dispute reaches leadership, a model trains on an unstable feature, a regulatory request takes weeks, or a data quality incident reveals that nobody knows which definition was official.

I would rather pay the definition cost once and attach it to artifacts than keep paying the trust tax forever.

The Review Checklist

Use this checklist when adding or approving a glossary term.

Check	Pass condition
Meaning	Definition is specific enough to distinguish from adjacent terms
Owner	Accountable owner and steward are named
Source	Authoritative source is scoped and linked
Assets	Physical datasets, reports, models, or APIs are connected
Rules	Quality rules and thresholds exist for governed use
Controls	Classification, access, masking, and retention are documented
Lineage	Upstream source and downstream consumers are known
Change	Breaking-change approval and notification path exists
Evidence	Approval record, tests, and catalog metadata are available
Status	Term lifecycle state is clear

That is the difference between a glossary and a governance asset.
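
Most of the checklist can be pre-screened automatically before a human review. The following is a minimal sketch, assuming entries are loaded into a dictionary shaped like the term entry contract shown earlier; the `review` and `failed_checks` functions are hypothetical, and they only test that fields are present. Whether a definition is specific enough to distinguish it from adjacent terms still needs a steward's judgment.

```python
VALID_STATUSES = {"proposed", "approved", "deprecated", "retired"}


def review(entry: dict) -> dict[str, bool]:
    """Map each checklist row to a presence check on the entry."""
    lineage = entry.get("lineage", {})
    controls = entry.get("security_and_privacy", {})
    change = entry.get("change_control", {})
    return {
        "Meaning": bool(entry.get("definition")),
        "Owner": bool(entry.get("owner")) and bool(entry.get("steward")),
        "Source": bool(entry.get("authoritative_source")),
        "Assets": bool(lineage.get("certified_asset")),
        "Rules": bool(entry.get("quality_rules")),
        "Controls": bool(controls.get("classification")) and bool(entry.get("retention")),
        "Lineage": bool(lineage.get("source")) and bool(lineage.get("consumers")),
        "Change": bool(change.get("breaking_change_requires")),
        "Evidence": bool(entry.get("evidence")),
        "Status": entry.get("status") in VALID_STATUSES,
    }


def failed_checks(entry: dict) -> list[str]:
    """List the checklist rows that an entry does not yet satisfy."""
    return [check for check, passed in review(entry).items() if not passed]
```

An empty result from `failed_checks` means the entry is ready for review, not that it is approved.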

A glossary is not successful because it has many entries. It is successful when the next dashboard, data product, AI feature, access request, and incident review all use the same language without reopening the same argument.