Privacy ranks
The CSV export of the Privacy Audit Report includes a Privacy rank for each column.
Privacy rank is the column-level privacy score in that report.
Privacy rank helps you understand how strongly a column is protected based on the generator and generator configuration applied to that column.
Privacy rank is shown as an integer from 1 to 6:
1: Highest privacy protection
2–3: Strong protection, with some possible structural or value-level leakage
4–5: Protected, but with increased linkability or leakage risk
6: Lowest privacy protection, such as original data or direct copy
Privacy rank is independent of whether a column is marked as PII or non-PII. A non-PII column still receives a rank, and a protected PII column can have a weaker or stronger rank depending on how it is protected.
Where Privacy rank appears
Privacy rank is included in the CSV export of the Privacy Audit Report from:
Main Hub
Job summary
Jobs panel
Each exported row includes one additional column:
Privacy rank (integer, 1–6): Indicates the privacy protection strength for the column. Lower values mean stronger protection.
Columns in excluded tables have an empty Privacy rank because no generated output is produced for them.
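When reading the export in a script, the Privacy rank cell is therefore either an integer from 1 to 6 or empty. The following is a minimal sketch of how such a cell could be parsed in Python. The function name parse_privacy_rank is hypothetical and not part of the product.

```python
from typing import Optional


def parse_privacy_rank(cell: str) -> Optional[int]:
    """Parse one Privacy rank cell from the CSV export.

    Returns None for an empty cell (for example, a column in an
    excluded table) and an int from 1 to 6 otherwise.
    """
    text = cell.strip()
    if not text:
        return None  # empty rank: no generated output for this column
    rank = int(text)  # raises ValueError for non-numeric text
    if not 1 <= rank <= 6:
        raise ValueError(f"Privacy rank out of range 1-6: {rank}")
    return rank
```

Keeping empty ranks as None, rather than mapping them to 0 or 6, avoids treating excluded columns as either very strongly or very weakly protected.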
What Privacy rank tells you
Privacy rank answers the question:
“How strong is the privacy protection for this column?”
This is different from the existing column privacy status, which answers:
“Is this column protected, unprotected, or non-PII?”
For example, two columns can both have the status Protected, but one can still be safer than the other because of the generator configuration.
Use Privacy rank to:
Identify columns with weaker protection
Filter or sort the report by privacy strength
Compare privacy posture between jobs
Apply internal policy thresholds, such as “no PII column should have a rank higher than 3”
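As an illustration of comparing privacy posture between jobs, the sketch below takes the ranks from two exports and lists the columns whose rank became weaker (a higher number) in the later job. It assumes the ranks have already been loaded into dictionaries keyed by table and column name. That key shape and the function name weakened_columns are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical key: (table name, column name).
ColumnKey = Tuple[str, str]
Ranks = Dict[ColumnKey, Optional[int]]


def weakened_columns(before: Ranks, after: Ranks) -> List[Tuple[ColumnKey, int, int]]:
    """Return (column, old rank, new rank) where the rank number increased.

    Columns with an empty rank (None) in either job, or missing from
    the later job, are skipped because they cannot be compared.
    """
    changes = []
    for key, old_rank in before.items():
        new_rank = after.get(key)
        if old_rank is None or new_rank is None:
            continue
        if new_rank > old_rank:  # higher number means weaker protection
            changes.append((key, old_rank, new_rank))
    # Largest drop in protection first.
    return sorted(changes, key=lambda c: c[2] - c[1], reverse=True)
```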
How Privacy rank is determined
Privacy rank is calculated at column level using the assigned generator and its configuration.
The ranking considers factors such as:
Data independence
Whether the generator creates data independently from the source column or uses source values.
1:1 link risk
Whether source values can be linked directly to generated values.
Consistent mapping
Whether the same source value always maps to the same output value. This can improve consistency but lowers privacy.
Completeness of transformation
Whether all values are transformed or whether some original values may remain.
Structural leakage
Whether the output keeps patterns such as frequencies, cardinality, or structure from the original data.
Privacy rank scale
Rank 1 — Highest privacy
Rank 1 indicates the strongest protection.
Typical examples include:
Data-free, irreversible generators
Mock generators without consistent mapping
These generators do not rely on the original source values.
Rank 2 — Strong privacy with limited structural leakage
Rank 2 indicates strong protection, but the generated data may still reflect some structure of the original data.
Typical examples include:
AI synthesize generators, except categorical AI synthesize
These generators use the source data in a way that cannot be reversed, but the overall shape or structure of the generated data may still reveal limited information about the input data.
Rank 3 — Transformed data with possible value-level leakage
Rank 3 indicates that data is transformed, but the output may still reveal whether certain values existed in the original data.
Typical examples include:
Shuffle
Categorical AI synthesize
Mask generators without consistent mapping
These methods provide protection, but may retain some properties of the source column.
Rank 4 — Consistent transformation with linkability risk
Rank 4 indicates that data is transformed securely, but consistent values may introduce linkability risk.
Typical examples include:
Mock generators with consistent mapping enabled
Mask generators with consistent mapping enabled
Hash generators
Consistent mapping means the same source value is always transformed into the same destination value. This can be useful for preserving relationships across tables or runs, but it may allow someone with knowledge of the source data or value frequencies to link source and generated values.
Rank 5 — Potentially weak protection
Rank 5 indicates that data may be only partially protected, or that the strength of the protection may be limited.
Typical examples include:
Duplicate with PII text processor
These configurations may reduce exposure but can still carry meaningful privacy risk depending on the data and use case.
Rank 6 — Lowest privacy
Rank 6 indicates the lowest level of protection.
Typical examples include:
Original data
Duplicate generators
Missing generator for a sensitive column
Rank 6 usually means the original data is preserved or directly copied.
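The typical examples listed for each rank can be summarized as a lookup table. The sketch below only transcribes those examples for reference. It is not the product's ranking logic, and the generator labels (such as "ai_synthesize") and the function name example_rank are hypothetical identifiers chosen for this illustration.

```python
from typing import Dict, Tuple

# (generator label, consistent mapping enabled) -> rank from the examples above.
# Labels are illustrative only and do not correspond to product identifiers.
EXAMPLE_RANKS: Dict[Tuple[str, bool], int] = {
    ("mock", False): 1,
    ("ai_synthesize", False): 2,              # non-categorical AI synthesize
    ("shuffle", False): 3,
    ("categorical_ai_synthesize", False): 3,
    ("mask", False): 3,
    ("mock", True): 4,
    ("mask", True): 4,
    ("hash", False): 4,
    ("duplicate_with_pii_text_processor", False): 5,
    ("duplicate", False): 6,
    ("original", False): 6,                   # original data, no generator applied
}


def example_rank(generator: str, consistent_mapping: bool = False) -> int:
    """Look up the rank listed for a generator in the examples above.

    Raises KeyError for combinations that the examples do not cover.
    """
    return EXAMPLE_RANKS[(generator, consistent_mapping)]
```

For instance, example_rank("mock") returns 1, while example_rank("mock", consistent_mapping=True) returns 4, which mirrors how enabling consistent mapping moves a mock generator from rank 1 to rank 4.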
Relationship with column privacy status
Privacy rank and column privacy status are related, but they are not the same.
Protected: Can have rank 1–5, depending on generator and configuration.
Unprotected: Often has rank 6, especially when PII is present and no protective generator is applied.
Non-PII: Still receives a rank based on generator and configuration.
A column marked Protected is not always equally protected. Use Privacy rank to understand the strength of the protection.
Excluded tables
If a table is excluded from generation output, Privacy rank will be empty in the export.
This prevents excluded columns from being compared with columns that are actually generated. An excluded table does not represent a protection method applied to output data, because the data is not included in the generated output.
Using Privacy rank in reviews
You can use Privacy rank to filter and sort the Privacy Audit Report.
Common workflows include:
Filter for Privacy rank 5 or 6 to find weakly protected columns
Review all PII columns with Privacy rank greater than 3
Compare Privacy ranks between job runs
Use the export as evidence for compliance or governance checks
For example, a reviewer may define an internal policy that no PII column should have a Privacy rank higher than 3. The CSV export can then be filtered to identify columns that do not meet that threshold.
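A threshold check of this kind could be scripted roughly as follows. This is a sketch under stated assumptions: the Privacy rank header matches the column name described above, while the "Privacy status" header and the rule that every status other than Non-PII counts as PII are assumptions that should be adjusted to the actual export.

```python
import csv
import io
from typing import Dict, List

RANK_COLUMN = "Privacy rank"      # column name as described for the export
STATUS_COLUMN = "Privacy status"  # hypothetical header for the column privacy status
MAX_PII_RANK = 3                  # example internal policy threshold


def policy_violations(csv_text: str, max_rank: int = MAX_PII_RANK) -> List[Dict[str, str]]:
    """Return the rows for PII columns whose Privacy rank exceeds max_rank."""
    violations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rank_text = (row.get(RANK_COLUMN) or "").strip()
        if not rank_text:
            continue  # empty rank, for example a column in an excluded table
        # Assumption: any status other than "Non-PII" marks a PII column.
        is_pii = (row.get(STATUS_COLUMN) or "").strip() != "Non-PII"
        if is_pii and int(rank_text) > max_rank:
            violations.append(row)
    return violations
```

To check a real export, read the file contents into a string and pass it to policy_violations. Rows with an empty Privacy rank are skipped rather than reported, in line with how excluded tables are handled in the export.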
Considerations
Privacy rank is based on the generator and generator configuration only.
It does not consider:
Whether the column is sensitive or non-sensitive
Dataset-level privacy risk
Detailed analysis of structured subfields such as JSON fields