Cleaning Headers
This article explains how to clean and simplify headers and labels during import so that variables created from PDFs, Excel or CSV have clear, usable names.
You will learn how to:
Deal with multi line and messy headers.
Remove repeated header rows.
Strip out footnotes from labels.
Make row labels meaningful for variables and reports.
Why header cleaning matters
Imported headers often come directly from external reports and may contain:
Long descriptive text that is not helpful as a variable name.
Footnote markers, references or symbols.
Repeated boilerplate such as
For the year ended.Inconsistent wording across similar items.
If you use these raw labels for variables, your model becomes harder to search and understand.
Cleaning headers at import time gives you:
Short, consistent names for variables and categories.
Cleaner reports and dashboards.
Easier reuse of imported variables across branches and scenarios.
Common header issues
Typical issues include:
Multi line headers where Excel or PDF wraps text into several lines in the same cell.
Header rows inside the table repeated on each page of a report.
Footnotes in labels such as
Revenue*with* see note 3.Overly long labels copied from disclosure language.
All of these can be addressed during the mapping and cleaning steps.
Simplifying row labels
For each row label that will become a variable or category name, aim for:
A short, descriptive phrase.
Consistent wording across similar items.
No footnote markers or extraneous punctuation.
Examples:
Revenue from contracts with customers→RevenueCost of sales→COGSSelling and distribution expenses→Opex - SellingProperty, plant and equipment→Assets - Plant
You can keep the original wording in a note if it is needed for audit purposes.
Removing repeated header rows
When importing multi page tables from PDFs, repeated header rows can be pulled in as data rows. You should:
Identify rows that are clearly headers (for example they repeat the column headings on each page).
Mark them as non-data rows or delete them in the cleaning step.
Ensure that only genuine data rows remain to be mapped to variables.
This prevents duplicate or meaningless variables being created.
Handling footnotes and annotations
Footnote markers such as *, †, or (1) often appear in row labels or column headings. You should:
Remove footnote markers from the label text.
If the footnote content is important, copy it into a note on the variable after import.
Avoid encoding footnote detail directly into the variable name.
This keeps variable names clean while preserving context where needed.
Column headers and period labels
Column headers that represent periods (for example 2023, 2024) are handled by the date detection logic rather than header cleaning. You usually only need to clean:
Column labels for non-date columns, such as
Actual,Budget,Variance.Columns that will be imported as drivers or metadata.
Give these columns short, descriptive names if you plan to use them as drivers or filters in the model.
Practical tips
Related articles
Last updated