Outcome focus: Created a shared vocabulary and term-entry contract that helps governance, data engineering, analytics, security, and business teams align definitions before certifying data products.
The first enterprise glossary failed because it defined words and left decisions untouched.
The definitions were reasonable. Business Intelligence meant turning raw data into information for decisions. Data Steward meant the business role responsible for metadata and data quality. Metadata meant data about data. Data retention meant how long data was kept. Nobody argued much with the words.
Then the customer dashboard shipped.
Sales used one definition of active customer. Finance used another. Support wanted canceled customers included while service obligations were still open. Security wanted a sensitive field masked. Analytics wanted drill-through access to transaction-level detail. Legal asked whether the retention schedule allowed the historical table to exist in the first place.
The glossary had entries. It did not have operating force.
That is the standard I use now: a governance glossary is useful only if it changes how a data product is reviewed, approved, accessed, monitored, and retired. A definition should name the thing, but it should also tell the team who owns it, where it appears, how it is measured, and what decision it affects.
This post is a practical glossary for enterprise data governance. It is not a legal taxonomy, and it is not a replacement for frameworks like EDM Council DCAM, NIST CSF 2.0, NIST Privacy Framework, or NIST SP 800-53. It is the operating language I want in the room when business, analytics, platform, security, privacy, and governance teams have to make the same data product mean the same thing.
How to Use the Glossary
A glossary entry should answer more than "what does this term mean?"
It should answer:
- Who owns the term?
- Which data products use it?
- Which source is authoritative?
- Which quality rules prove it is fit for use?
- Which privacy or security controls apply?
- Which downstream decisions break if the meaning changes?
The tradeoff is precision over speed. It is faster to let every team define terms locally. It is also how enterprises end up with five revenue numbers, three customer counts, and a privacy review that discovers sensitive data after the dashboard is already trusted.
The better path is slower at the definition boundary and faster everywhere downstream.
Business and Analytics Terms
These terms describe how data becomes decision support.
| Term | Working definition | Review question |
|---|---|---|
| Business Intelligence (BI) | Tools, processes, and practices that turn raw or modeled data into information for decision-making. | Which decision or operating rhythm does this BI asset support? |
| Dashboard | A monitoring surface that shows current or frequently refreshed status. | Is the dashboard for live monitoring, or is it being used as a static report? |
| Scorecard | A performance snapshot against targets, goals, or thresholds. | Are targets approved, current, and owned? |
| Report | A structured analytical output, often with more detail, context, and interpretation than a dashboard. | Is the report exploratory, official, regulatory, or operational? |
| Drill down | Navigation from a summary level into lower levels within the same hierarchy, such as year to quarter to month. | Does the hierarchy match the governed dimensional model? |
| Drill through | Navigation from one analytical view into a related detail page or dataset filtered to a selected context. | Is the detail view governed at the same classification and access level? |
| Metric | A quantifiable measure of activity, quality, risk, cost, or outcome. | Is the calculation documented and reproducible? |
| Key Performance Indicator (KPI) | A metric tied to a strategic or operational target. | Who owns the target, tolerance, and interpretation? |
| OLAP | Online Analytical Processing: multidimensional analysis across measures, dimensions, hierarchies, and aggregations. | Are dimensions, grains, and aggregations consistent across tools? |
| Data mining | Statistical or computational analysis to discover patterns, associations, clusters, anomalies, or predictive signals. | Is the discovered pattern approved for action or only exploration? |
| Semantic layer | A governed business-facing model of metrics, dimensions, relationships, and calculations. | Is the semantic layer the source for official BI definitions? |
| Self-service analytics | A model where business users explore governed data with approved tools and guardrails. | Which datasets are certified for self-service use? |
Microsoft's Power BI drillthrough documentation is a useful reminder that navigation features carry governance implications. Drill-through often moves a user from summary to detail. That can change privacy risk, row-level access needs, and interpretation.
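One way to make that review question testable is to compare the detail view's classification against the summary view and the user's clearance before the navigation is allowed. The sketch below is illustrative only: the classification ladder, the `View` type, the view names, and `can_drill_through` are hypothetical and are not Power BI features.

```python
from dataclasses import dataclass

# Hypothetical classification ladder, lowest to highest sensitivity.
LEVELS = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class View:
    name: str
    classification: str       # one of LEVELS
    row_level_security: bool  # True if the view filters rows per user


def can_drill_through(user_clearance: str, summary: View, detail: View) -> tuple[bool, str]:
    """Decide whether a user may navigate from a summary view to a detail view."""
    user_rank = LEVELS.index(user_clearance)
    detail_rank = LEVELS.index(detail.classification)
    if detail_rank > user_rank:
        return False, f"{detail.name} is {detail.classification}; user is cleared for {user_clearance}"
    # Detail that is more sensitive than the summary must filter rows per user.
    if detail_rank > LEVELS.index(summary.classification) and not detail.row_level_security:
        return False, f"{detail.name} is more sensitive than {summary.name} and has no row-level security"
    return True, "allowed"


scorecard = View("executive_customer_scorecard", "internal", row_level_security=False)
transactions = View("customer_transactions_detail", "confidential", row_level_security=True)

print(can_drill_through("internal", scorecard, transactions))
print(can_drill_through("confidential", scorecard, transactions))
```

The point of the sketch is that the summary and the detail are reviewed as two assets, not one. A user who can see the scorecard is not automatically cleared for the rows behind it.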
Governance Roles and Structures
These terms describe who can define, approve, implement, and escalate.
| Term | Working definition | Review question |
|---|---|---|
| Data Governance (DG) | The operating system for managing, improving, protecting, and using data as an enterprise asset. | Which decision rights, artifacts, and cadences make the governance real? |
| Governance framework | The roles, policies, processes, standards, controls, and measures that define how governance runs. | Is the framework used in release gates or only documented? |
| Governance maturity model | A staged way to assess how repeatable, measured, and embedded governance practices are. | What evidence moves a domain from one level to the next? |
| Data Owner | Senior business leader accountable for domain data outcomes, risk, source-of-truth decisions, and policy approval. | Can this person approve tradeoffs when functions disagree? |
| Data Steward | Business-facing authority for meaning, quality rules, classifications, usage guidance, and issue triage. | Has the steward translated definitions into testable rules? |
| Data Custodian | Technical role that implements controls, pipelines, access, metadata capture, lineage, and operational reliability. | Are approved governance rules automated and observable? |
| Business SME | Functional expert who validates process reality, edge cases, and fitness for use. | Which process or report does the SME represent? |
| Data Steward Working Group | Domain or cross-domain forum where stewards coordinate definitions, rules, issues, and changes. | Does it resolve issues or only discuss them? |
| Executive Sponsor | Senior leader who provides funding, advocacy, escalation, and strategic priority for the governance program. | What decision can the sponsor unblock? |
| Governance co-chair | Leader responsible for running governance forums, managing agendas, tracking decisions, and linking working groups to executives. | Are decisions captured with owners and due dates? |
| EDGC | Enterprise Data Governance Committee: escalation body for cross-domain policy, exceptions, and unresolved risk. | Which issues qualify for escalation? |
I wrote a companion role model in "Data Governance Roles Need Decision Rights." The short version here is simple: owners decide, stewards define, custodians implement, SMEs validate, and the committee escalates enterprise risk.
Governance Concepts
These terms describe the rules of the operating model.
| Term | Working definition | Review question |
|---|---|---|
| Data governance standard | A mandatory rule or practice adopted by governed domains. | How is compliance measured and enforced? |
| Policy | A formal statement of required behavior, risk posture, or control expectation. | Who approved it, and what happens when it is violated? |
| Standard | A specific required implementation pattern or minimum bar. | Is it testable? |
| Procedure | Step-by-step process for executing a policy or standard. | Who follows it, and how is evidence captured? |
| Control | A safeguard or process that reduces risk or enforces policy. | Is the control preventive, detective, or corrective? |
| Exception | Approved deviation from policy, standard, or threshold. | Who accepted the risk, for how long, and with what compensating control? |
| Waiver | Temporary permission to proceed despite unmet criteria. | What expiry date and remediation plan exist? |
| Data domain | Business area with related concepts, processes, data products, and ownership. | Are domain boundaries clear enough to assign accountability? |
| Data product | Governed dataset, view, API, feature set, or analytical asset designed for consumption. | What contract proves it is fit for use? |
| Source of truth | The authoritative source used to resolve conflicts for a defined data element or domain. | Is authority scoped by purpose and time? |
| Golden record | Consolidated best representation of an entity, often produced through matching, survivorship, and merge rules. | Which survivorship rules choose winning values? |
| Data contract | Explicit agreement for schema, semantics, quality, freshness, ownership, and change behavior. | Does breaking the contract block release? |
| Certification | Governance approval that a data asset meets defined quality, metadata, access, lineage, and ownership standards. | What evidence supports the certified label? |
The failure mode is treating "source of truth" as a universal title. It is rarely universal. A billing system may be authoritative for invoice status. A CRM may be authoritative for account owner. A support platform may be authoritative for service obligations. A governed domain needs scoped authority, not slogans.
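Scoped authority can be written down as data rather than argued in meetings. The following is a minimal sketch that uses the three examples above. The `Authority` record, the purpose labels, and the effective dates are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Authority:
    element: str                     # governed data element, e.g. "invoice_status"
    system: str                      # system that is authoritative for it
    purpose: str                     # use for which the authority applies
    valid_from: date
    valid_to: Optional[date] = None  # None means still in force


# Hypothetical registry; purposes and dates are placeholders.
REGISTRY = [
    Authority("invoice_status", "billing", "financial reporting", date(2020, 1, 1)),
    Authority("account_owner", "crm", "sales operations", date(2020, 1, 1)),
    Authority("service_obligation", "support", "customer support", date(2020, 1, 1)),
]


def authoritative_system(element: str, purpose: str, as_of: date) -> Optional[str]:
    """Return the authoritative system for an element, scoped by purpose and date."""
    for a in REGISTRY:
        in_window = a.valid_from <= as_of and (a.valid_to is None or as_of <= a.valid_to)
        if a.element == element and a.purpose == purpose and in_window:
            return a.system
    return None  # no scoped authority recorded: escalate rather than guess


print(authoritative_system("invoice_status", "financial reporting", date(2024, 6, 30)))
print(authoritative_system("invoice_status", "sales operations", date(2024, 6, 30)))
```

Returning `None` for an unregistered purpose is deliberate. It forces the question "who is authoritative for this use?" instead of silently reusing an answer that was approved for a different one.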
Data Management Concepts
These terms describe how data is created, moved, described, and made reusable.
| Term | Working definition | Review question |
|---|---|---|
| ETL | Extract, Transform, Load: data is transformed before loading into the target. | Where are transformation rules versioned and tested? |
| ELT | Extract, Load, Transform: data is loaded first, then transformed inside the target platform. | Which layers are raw, curated, and certified? |
| MDM | Master Data Management: rules, processes, and systems that maintain authoritative shared entities such as customer, product, supplier, or employee. | Which domain entity needs a golden record, and why? |
| Reference data | Shared code sets, classifications, hierarchies, and lookup values used across systems. | Who approves changes to shared codes and hierarchies? |
| Metadata | Data about data: meaning, structure, ownership, lineage, classification, quality, use, and context. | Is the metadata complete enough to support trust and impact analysis? |
| Business metadata | Business definitions, policies, owners, usage constraints, classifications, and context. | Can a business user understand and use it? |
| Technical metadata | Schemas, data types, jobs, tables, columns, partitions, code, lineage, and operational properties. | Can an engineer trace and operate it? |
| Operational metadata | Runtime information such as freshness, failures, volume, latency, cost, and usage. | Can the team see whether the asset is healthy? |
| Metadata management | Processes and tools for maintaining metadata quality, access, lineage, and discoverability. | Who keeps metadata current after release? |
| Data dictionary | Technical inventory of fields, attributes, formats, constraints, and definitions. | Is it synchronized with actual schemas? |
| Data catalog | Searchable inventory of data assets, metadata, ownership, classifications, lineage, and usage signals. | Can consumers find certified assets and understand restrictions? |
| Business glossary | Approved vocabulary of business terms, definitions, owners, and relationships. | Are glossary terms linked to physical data assets? |
| Data lineage | Record of where data originates, how it moves, how it transforms, and where it is consumed. | Can the team trace impact before changing a field? |
| Data profiling | Analysis of data structure, values, distributions, patterns, nulls, duplicates, and anomalies. | Did profiling produce rules or only observations? |
Microsoft Purview's data governance glossary and lineage documentation are useful examples of how catalog terms, classifications, assets, and lineage become a connected operating surface.
The most common mistake is separating glossary and catalog work. A glossary without asset links is vocabulary. A catalog without business terms is inventory. Governance needs both.
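Both gaps can be checked mechanically once glossary and catalog exports exist. The sketch below assumes two simple mappings, one per tool. The term and asset names and the export shape are hypothetical, and real catalog APIs will differ.

```python
# Hypothetical glossary export: term -> catalog assets linked to it.
glossary_links = {
    "active customer": ["customer_profile_certified"],
    "churned customer": [],
}

# Hypothetical catalog export: asset -> glossary terms tagged on it.
catalog_terms = {
    "customer_profile_certified": ["active customer"],
    "orders_raw": [],
}


def unlinked_terms(links: dict[str, list[str]]) -> list[str]:
    """Terms with no asset links: vocabulary only."""
    return sorted(term for term, assets in links.items() if not assets)


def untagged_assets(terms_by_asset: dict[str, list[str]]) -> list[str]:
    """Assets with no business terms: inventory only."""
    return sorted(asset for asset, terms in terms_by_asset.items() if not terms)


def dangling_links(links: dict[str, list[str]],
                   terms_by_asset: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Links from a term to an asset the catalog does not know about."""
    return sorted(
        (term, asset)
        for term, assets in links.items()
        for asset in assets
        if asset not in terms_by_asset
    )


print("unlinked terms:", unlinked_terms(glossary_links))
print("untagged assets:", untagged_assets(catalog_terms))
print("dangling links:", dangling_links(glossary_links, catalog_terms))
```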
Security, Privacy, and Compliance Terms
These terms describe how data is protected and constrained.
| Term | Working definition | Review question |
|---|---|---|
| Information security | Policies, controls, and practices that protect confidentiality, integrity, and availability. | Which control objective applies to this asset? |
| Privacy | Rules and practices for responsible collection, use, sharing, retention, and disposal of personal or sensitive data. | What purpose, lawful basis, notice, and minimization constraints apply? |
| Data security | Technical and administrative protection for data assets, including access control, encryption, masking, monitoring, and incident response. | What protects the data at rest, in transit, and in use? |
| Data classification | Categorization of data by sensitivity, confidentiality, regulatory obligation, or handling requirement. | Is classification applied at dataset and field level? |
| Sensitive data | Data requiring additional protection because disclosure, misuse, or alteration creates harm or legal risk. | Which fields require masking, approval, or special handling? |
| PII | Personally identifiable information: data that identifies or can reasonably identify a person. | Is this field direct, indirect, derived, or linkable? |
| PHI | Protected health information: individually identifiable health information as defined and regulated under HIPAA. | Is the organization a covered entity, business associate, or outside HIPAA scope? |
| Data Loss Prevention (DLP) | Processes and technologies that detect, classify, monitor, and prevent unauthorized exposure or exfiltration. | Are DLP findings routed to owners and remediated? |
| Data masking | Obscuring sensitive values while preserving some operational utility. | Is masking irreversible, format-preserving, dynamic, or only display-level? |
| Tokenization | Replacing sensitive values with tokens managed by a protected mapping service. | Who can re-identify, and under what approval? |
| Anonymization | Transformation intended to prevent identification of individuals. | Has re-identification risk been assessed for the actual context? |
| De-identification | Removal or alteration of identifiers to reduce privacy risk, often under a specific regulatory or analytical framework. | Which method and evidence prove the data is de-identified enough for the use? |
| Pseudonymization | Replacing identifiers while retaining a way to relink with additional information. | Is the key separated and controlled? |
| RBAC | Role-Based Access Control: granting permissions through assigned roles. | Do roles map to business need and least privilege? |
| Audit logging | Records of access, modification, administrative action, and policy events. | Are logs complete enough to reconstruct what happened? |
| Audit trail | End-to-end evidence chain showing who did what, when, and under which approval. | Can audit evidence survive staff turnover and tool migration? |
| Data ethics | Principles for fair, accountable, transparent, and responsible data collection and use. | Could the use be legal but still unacceptable? |
| Regulatory compliance | Meeting applicable legal, contractual, and industry obligations. | Which regulation or control actually applies? |
| Data residency | Requirement that data be stored in a defined geography. | Which storage, backup, and replication locations are in scope? |
| Data sovereignty | Legal or jurisdictional control over data based on where it is located, processed, or accessed. | Which laws govern processing and access? |
For security and privacy terms, linking to current primary sources matters. NIST CSF 2.0 provides a general cyber risk management frame. NIST SP 800-53 organizes detailed control families such as access control, audit and accountability, incident response, PII processing and transparency, and system and information integrity. NIST's RBAC project gives historical and technical context for role-based access control.
For de-identification, Google Sensitive Data Protection documents transformations such as masking, and HHS explains HIPAA de-identification through Expert Determination and Safe Harbor. For GDPR-oriented teams, the European Commission's GDPR principles are a cleaner anchor than secondhand summaries.
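The differences between masking, pseudonymization, and tokenization in the table are easier to review side by side. This is a minimal standard-library sketch and not a production design. `mask_email`, `pseudonymize`, and `TokenVault` are hypothetical names, and the in-memory vault stands in for what would be a separately secured service.

```python
import hashlib
import hmac
import secrets


def mask_email(email: str) -> str:
    """Display-level masking: keep the first character and the domain."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}{'*' * max(len(local) - 1, 0)}@{domain}"


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): a stable pseudonym.

    Anyone holding the key can recompute the pseudonym for a known
    identifier and relink records, so the key must be stored separately.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """In-memory stand-in for a protected token mapping service."""

    def __init__(self) -> None:
        self._by_value: dict[str, str] = {}
        self._by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so joins on the token still work.
        if value not in self._by_value:
            token = secrets.token_hex(8)
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token: str, approved: bool) -> str:
        if not approved:
            raise PermissionError("re-identification requires approval")
        return self._by_token[token]


email = "jane.doe@example.com"
vault = TokenVault()
print(mask_email(email))
print(pseudonymize(email, key=b"demo-key-store-separately")[:16], "...")
print(vault.tokenize(email))
```

None of these three is anonymization. Each leaves a path back to the person, through the visible remainder, the key, or the vault, which is why the review questions ask who can re-identify and under what approval.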
Lifecycle Management Terms
These terms describe how data quality and data lifespan are governed.
| Term | Working definition | Review question |
|---|---|---|
| Data quality | Fitness of data for intended use across dimensions such as accuracy, completeness, consistency, timeliness, validity, uniqueness, and reliability. | Which quality dimensions are critical for this decision? |
| Accuracy | Data correctly represents the real-world object, event, or measurement. | What source or process verifies correctness? |
| Completeness | Required values or records are present. | Which missing values block use? |
| Consistency | Values do not conflict across systems, records, or time. | Which conflicts need survivorship rules? |
| Timeliness | Data is current enough for its use. | What freshness SLA matters? |
| Validity | Values conform to allowed formats, ranges, codes, and rules. | Which checks are automated? |
| Uniqueness | Entities or records are not duplicated beyond accepted rules. | What match and merge logic applies? |
| Reliability | Data can be depended on repeatedly under expected conditions. | What monitoring proves stability? |
| Data retention | Policy defining how long data is stored for business, legal, risk, or operational needs. | Which clock starts retention? |
| Archival | Moving inactive data to long-term storage while preserving access, integrity, and policy compliance. | Who can retrieve archived data and why? |
| Disposal or disposition | Secure deletion, destruction, anonymization, or transfer at the end of the lifecycle. | What evidence proves disposal happened? |
| Legal hold | Suspension of normal disposal because of litigation, investigation, or regulatory need. | Which datasets and backups are included? |
| Data minimization | Collecting, storing, and processing only what is necessary for the defined purpose. | Which fields can be removed without harming the purpose? |
| Purpose limitation | Using data only for specified, legitimate, and compatible purposes. | Is the secondary use approved? |
| Storage limitation | Keeping identifiable personal data no longer than necessary for the purpose, subject to legitimate exceptions. | Which retention rule prevents indefinite storage? |
Lifecycle terms are where privacy, cost, and analytics collide. Analysts often want history. Legal may need retention. Privacy may require minimization and deletion. Platform teams want storage cost under control. The glossary should not pretend those tensions disappear. It should name the decision owner and the evidence required.
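A retention rule becomes reviewable once the clock, the period, and the legal-hold override are explicit. The sketch below is one hypothetical way to encode that decision. `RetentionRule`, `disposal_decision`, and the three outcome labels are assumptions, and actual retention logic should come from the approved schedule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    years: int
    clock_start: str  # which event starts the clock, e.g. "snapshot_date"


def disposal_decision(rule: RetentionRule, clock_start_date: date,
                      today: date, legal_hold: bool) -> str:
    """Return 'hold', 'retain', or 'dispose' for one record or snapshot."""
    if legal_hold:
        return "hold"  # legal hold suspends normal disposal
    target_year = clock_start_date.year + rule.years
    try:
        expires = clock_start_date.replace(year=target_year)
    except ValueError:
        # Feb 29 start date landing in a non-leap year: expire on Feb 28.
        expires = clock_start_date.replace(year=target_year, day=28)
    return "dispose" if today >= expires else "retain"


rule = RetentionRule(years=7, clock_start="snapshot_date")
print(disposal_decision(rule, date(2017, 1, 31), date(2024, 2, 1), legal_hold=False))
print(disposal_decision(rule, date(2017, 1, 31), date(2024, 2, 1), legal_hold=True))
```

Keeping `clock_start` as a named field answers the table's review question directly: the rule states which event starts retention instead of leaving it to whoever writes the purge job.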
Term Entry Contract
The glossary becomes operational when each governed term has a minimum contract.

    term: "active customer"
    domain: "customer"
    status: "approved"
    owner: "customer data owner"
    steward: "customer data steward"
    definition: "A customer with an active billable contract during the reporting period."
    business_context: "Used for executive reporting, retention analysis, and revenue operations."
    authoritative_source: "crm.account_contract_status"
    related_terms:
      - "billable customer"
      - "service obligation"
      - "churned customer"
    quality_rules:
      - name: "contract status is populated"
        threshold: ">= 99.5 percent"
      - name: "reporting period has one status per account"
        threshold: "100 percent for certified reporting"
    security_and_privacy:
      classification: "confidential"
      pii_fields:
        - "account_contact_email"
      access_model: "RBAC with steward approval for detail-level drill-through"
    lineage:
      source: "crm"
      certified_asset: "customer_profile_certified"
      consumers:
        - "executive_customer_scorecard"
        - "retention_model_features"
    change_control:
      breaking_change_requires:
        - "steward review"
        - "owner approval"
        - "consumer migration note"
    retention:
      rule: "retain certified monthly snapshots for 7 years unless legal hold applies"
    evidence:
      - "definition approval record"
      - "quality dashboard"
      - "lineage registration"
      - "access review log"

This looks heavier than a plain definition because plain definitions are not enough for enterprise use. A term that drives reporting, access, AI features, compliance, or customer action needs ownership, source, rules, controls, lineage, and change behavior.
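The quality rules in a contract like this only carry weight if something evaluates them. The sketch below parses the threshold strings from the example and checks a measured completeness value against them. The parser, the helper names, and the sample rows are hypothetical. Reading a bare percentage as "at least" is an assumption the steward would need to confirm.

```python
import re

# Matches strings such as ">= 99.5 percent" or "100 percent for certified reporting".
THRESHOLD = re.compile(r"^\s*(>=|>)?\s*(\d+(?:\.\d+)?)\s*percent")


def parse_threshold(text: str) -> tuple[str, float]:
    """Parse a threshold string into (operator, value)."""
    match = THRESHOLD.match(text)
    if match is None:
        raise ValueError(f"unrecognized threshold: {text!r}")
    # Assumption: a bare percentage means "at least that much".
    return match.group(1) or ">=", float(match.group(2))


def passes(measured_percent: float, threshold_text: str) -> bool:
    """Compare a measured percentage with a contract threshold."""
    op, value = parse_threshold(threshold_text)
    return measured_percent > value if op == ">" else measured_percent >= value


def percent_populated(rows: list[dict], field: str) -> float:
    """Completeness: share of rows where the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return 100.0 * filled / len(rows)


# Hypothetical sample of the authoritative source.
rows = [
    {"account_id": 1, "contract_status": "active"},
    {"account_id": 2, "contract_status": ""},
    {"account_id": 3, "contract_status": "canceled"},
    {"account_id": 4, "contract_status": "active"},
]

measured = percent_populated(rows, "contract_status")
print(f"{measured:.1f} percent populated;",
      "pass" if passes(measured, ">= 99.5 percent") else "fail")
```

Storing thresholds as structured values (an operator and a number) rather than free text would remove the need for the parser. The sketch keeps the strings only to match the example entry.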
What I Would Add Before Calling It Complete
The starter glossary above covers the core vocabulary from business analytics, roles, governance, data management, security, privacy, compliance, and lifecycle management. Before using it as an enterprise standard, I would add a few organization-specific columns:
| Column | Why it matters |
|---|---|
| Domain | Prevents global terms from hiding local ownership |
| Owner | Names final accountability |
| Steward | Names the person who maintains meaning and rules |
| System or asset links | Connects words to real tables, dashboards, APIs, and models |
| Classification | Connects meaning to access and handling |
| Quality rules | Turns definitions into tests |
| Consumers | Shows blast radius when a term changes |
| Change history | Preserves decisions and reversals |
| Status | Separates proposed, approved, deprecated, and retired terms |
The status field is underrated. Proposed terms should not be treated like approved terms. Deprecated terms should not vanish quietly. Retired terms should leave a pointer to their replacement or the reason they stopped being valid.
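Those three rules fit in a small state machine. The post names the four states but not the permitted moves between them, so the transition map below is an assumption, as are the `Term` fields and function names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed transition map: a strictly forward lifecycle.
ALLOWED = {
    "proposed": {"approved"},
    "approved": {"deprecated"},
    "deprecated": {"retired"},
    "retired": set(),
}


@dataclass
class Term:
    name: str
    status: str = "proposed"
    replaced_by: Optional[str] = None
    retirement_reason: Optional[str] = None


def transition(term: Term, new_status: str) -> None:
    """Move a term to a new status, enforcing the lifecycle rules."""
    if new_status not in ALLOWED[term.status]:
        raise ValueError(f"{term.name}: {term.status} -> {new_status} is not allowed")
    # Retired terms must leave a pointer to a replacement or a reason.
    if new_status == "retired" and not (term.replaced_by or term.retirement_reason):
        raise ValueError(f"{term.name}: retirement needs a replacement or a reason")
    term.status = new_status


def usable_in_certified_assets(term: Term) -> bool:
    """Only approved terms back certified assets; proposed terms do not."""
    return term.status == "approved"


term = Term("active account")
transition(term, "approved")
transition(term, "deprecated")
term.replaced_by = "active customer"
transition(term, "retired")
print(term)
```

An organization that allows rejected proposals or reinstated terms would add those edges to `ALLOWED`. What matters is that the moves are listed somewhere reviewable.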
The Tradeoff
Glossaries slow people down at the start.
Someone has to decide whether "customer" means buyer, account, household, patient, subscriber, merchant, tenant, or legal entity. Someone has to separate dashboard convenience from source-of-truth authority. Someone has to say that a field cannot be used for a new purpose until privacy and access have been reviewed.
That friction is the cost of enterprise meaning.
The alternative is local speed and global confusion. Every team moves quickly until a metric dispute reaches leadership, a model trains on an unstable feature, a regulatory request takes weeks, or a data quality incident reveals that nobody knows which definition was official.
I would rather pay the definition cost once and attach it to artifacts than keep paying the trust tax forever.
The Review Checklist
Use this checklist when adding or approving a glossary term.
| Check | Pass condition |
|---|---|
| Meaning | Definition is specific enough to distinguish from adjacent terms |
| Owner | Accountable owner and steward are named |
| Source | Authoritative source is scoped and linked |
| Assets | Physical datasets, reports, models, or APIs are connected |
| Rules | Quality rules and thresholds exist for governed use |
| Controls | Classification, access, masking, and retention are documented |
| Lineage | Upstream source and downstream consumers are known |
| Change | Breaking-change approval and notification path exists |
| Evidence | Approval record, tests, and catalog metadata are available |
| Status | Term lifecycle state is clear |
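The checklist can also run as a pre-approval gate over a term entry. The sketch below treats an entry as a plain dictionary with the same keys as the example contract. The predicates are rough stand-ins for each pass condition, and the word-count proxy for "Meaning" in particular is an assumption that cannot replace steward review.

```python
STATUSES = {"proposed", "approved", "deprecated", "retired"}

# Each checklist row maps to a predicate over a term entry (a plain dict).
CHECKS = {
    "Meaning": lambda t: len(t.get("definition", "").split()) >= 5,  # crude proxy
    "Owner": lambda t: bool(t.get("owner")) and bool(t.get("steward")),
    "Source": lambda t: bool(t.get("authoritative_source")),
    "Assets": lambda t: bool(t.get("lineage", {}).get("certified_asset")),
    "Rules": lambda t: bool(t.get("quality_rules"))
    and all("threshold" in rule for rule in t["quality_rules"]),
    "Controls": lambda t: bool(t.get("security_and_privacy", {}).get("classification"))
    and bool(t.get("retention")),
    "Lineage": lambda t: bool(t.get("lineage", {}).get("source"))
    and bool(t.get("lineage", {}).get("consumers")),
    "Change": lambda t: bool(t.get("change_control", {}).get("breaking_change_requires")),
    "Evidence": lambda t: bool(t.get("evidence")),
    "Status": lambda t: t.get("status") in STATUSES,
}


def failed_checks(term: dict) -> list[str]:
    """Return the checklist rows the entry does not yet satisfy."""
    return [name for name, check in CHECKS.items() if not check(term)]


# Hypothetical draft entry with only a definition and a status.
draft = {"term": "customer", "definition": "A buyer.", "status": "proposed"}
print(failed_checks(draft))
```

An entry that returns an empty list has met the checklist. Anything else goes back to the steward with the failing rows named.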
That is the difference between a glossary and a governance asset.
A glossary is not successful because it has many entries. It is successful when the next dashboard, data product, AI feature, access request, and incident review all use the same language without reopening the same argument.