Cleaning Headers

This article explains how to clean and simplify headers and labels during import so that variables created from PDFs, Excel or CSV have clear, usable names.

You will learn how to:

  • Deal with multi line and messy headers.

  • Remove repeated header rows.

  • Strip out footnotes from labels.

  • Make row labels meaningful for variables and reports.


Why header cleaning matters

Imported headers often come directly from external reports and may contain:

  • Long descriptive text that is not helpful as a variable name.

  • Footnote markers, references or symbols.

  • Repeated boilerplate such as For the year ended.

  • Inconsistent wording across similar items.

If you use these raw labels for variables, your model becomes harder to search and understand.

Cleaning headers at import time gives you:

  • Short, consistent names for variables and categories.

  • Cleaner reports and dashboards.

  • Easier reuse of imported variables across branches and scenarios.


Common header issues

Typical issues include:

  • Multi line headers where Excel or PDF wraps text into several lines in the same cell.

  • Header rows inside the table repeated on each page of a report.

  • Footnotes in labels such as Revenue* with * see note 3.

  • Overly long labels copied from disclosure language.

All of these can be addressed during the mapping and cleaning steps.


Simplifying row labels

For each row label that will become a variable or category name, aim for:

  • A short, descriptive phrase.

  • Consistent wording across similar items.

  • No footnote markers or extraneous punctuation.

Examples:

  • Revenue from contracts with customersRevenue

  • Cost of salesCOGS

  • Selling and distribution expensesOpex - Selling

  • Property, plant and equipmentAssets - Plant

You can keep the original wording in a note if it is needed for audit purposes.


Removing repeated header rows

When importing multi page tables from PDFs, repeated header rows can be pulled in as data rows. You should:

  • Identify rows that are clearly headers (for example they repeat the column headings on each page).

  • Mark them as non-data rows or delete them in the cleaning step.

  • Ensure that only genuine data rows remain to be mapped to variables.

This prevents duplicate or meaningless variables being created.


Handling footnotes and annotations

Footnote markers such as *, , or (1) often appear in row labels or column headings. You should:

  • Remove footnote markers from the label text.

  • If the footnote content is important, copy it into a note on the variable after import.

  • Avoid encoding footnote detail directly into the variable name.

This keeps variable names clean while preserving context where needed.


Column headers and period labels

Column headers that represent periods (for example 2023, 2024) are handled by the date detection logic rather than header cleaning. You usually only need to clean:

  • Column labels for non-date columns, such as Actual, Budget, Variance.

  • Columns that will be imported as drivers or metadata.

Give these columns short, descriptive names if you plan to use them as drivers or filters in the model.


Practical tips

circle-info
  • Spend a few minutes cleaning headers on import rather than living with noisy names for the life of the model.

  • Use a consistent naming pattern where you have similar lines across multiple imports.

  • Keep a separate reference to the original labels if your workflow requires auditability.


Last updated