Fixing Merged Cells

This article explains how to deal with merged cells and missing labels in tables imported from PDFs, Excel or CSV files.

You will learn how to:

  • Recognise when merged cells are causing problems.

  • Fill down or propagate labels so each row is clearly identified.

  • Split compound rows into multiple variables where needed.

  • Avoid incorrect mappings caused by partially labelled data.

Why merged cells are an issue

In original reports, merged cells are often used to make tables look cleaner, for example:

  • A single label spanning several rows of detail.

  • Group headings without repeated text for each line.

  • Blank cells where the reader is expected to infer the label.

When such tables are extracted, the merged cells may become:

  • True blank cells with no label.

  • Labels only on the first row of a group.

  • Misaligned or mis associated labels.

If you create variables directly from these rows, you may end up with:

  • Multiple variables sharing the same name but representing different things.

  • Rows that look unlabeled or ambiguous.

  • Totals or subtotals being mistaken for individual items.

Identifying problematic merged cells

Warning signs include:

  • Blocks of rows with blank labels where you know there should be detail.

  • A heading like Operating expenses with several unlabeled rows underneath.

  • Rows that have numbers but no distinguishing text.

  • Variables being created with duplicated or generic names only.

Always scan the row labels after table detection to make sure that each data row has a meaningful label.

Filling down labels

A common pattern is that a group label should apply to several subsequent rows. To handle this, you can:

  • Use a fill down operation that copies a label from a merged cell into all the rows beneath it until the next non blank label appears.

  • Append a suffix or detail from the original second column if needed to distinguish rows.

For example:

  • Original table:

    • Operating expenses (merged)

      • Row 1: blank label, Marketing numbers.

      • Row 2: blank label, Distribution numbers.

  • After cleaning:

    • Opex - Marketing

    • Opex - Distribution.

This gives each row its own descriptive label while keeping the group context.

Splitting compound rows

Sometimes a single row in a table will represent multiple concepts that you want to separate in the model, for example:

  • Sales and marketing as a combined line.

  • Rent and outgoings.

  • Other income that mixes material items.

In these cases, no amount of merged cell fixing will solve the modelling need. Instead you can:

  • Import the combined line as a single variable initially.

  • Later split it into multiple variables by editing the Data Library entry and variable logic.

  • Or adjust the source data in Excel before import to separate the components.

Fixing merged cells is about alignment and labelling, not about decomposing combined accounting lines.

Avoiding incorrect mappings

If merged cells are not cleaned up, it becomes easy to:

  • Assign the wrong variable type or category to a row.

  • Assume that an unlabeled row is a total, not a detail.

  • Map a blank label row as a separate variable accidentally.

To avoid this:

  • Ensure every row you plan to import has a clear label before mapping.

  • Delete or skip rows that you cannot interpret confidently.

  • Where in doubt, go back to the source document to see how the table was originally intended.

circle-info

Practical tips:

  • Use header cleaning and merged cell fixing together before doing variable mapping.

  • If a table is too messy, consider exporting from the source system in Excel and cleaning there.

  • Focus on the lines that matter most for your modelling, rather than trying to import every small note.

Last updated