PDF Import Overview
This article explains how the PDF import pipeline works in Model Reef and how to go from static PDF documents to a compiled, working model.
You will learn how to:
Upload and select PDF files.
Let Model Reef detect tables, dates and units.
Map rows to variable types, categories and branches.
Review and clean imported data before compiling.
Understand how imports create Data Library entries and variables.
Model Reef aims to get you from PDF to a working three-statement model as quickly as possible, so you can focus on assumptions rather than manual data entry.
When to use the PDF import
Use the PDF import when your source data is primarily in:
Annual reports and financial statements.
Investor decks and information memoranda.
Bank or lender covenant packs.
Management reporting PDFs exported from another system.
Static tables sent by clients or counterparties.
If you have access to the same data in Excel or CSV, it is usually cleaner to import from those formats instead. The PDF import is ideal when that is not available.
What the PDF import does
At a high level, the PDF import pipeline:
Ingests one or more PDF files you upload.
Detects tables and extracts tabular data from the pages.
Identifies date columns, units and numeric patterns.
Proposes mappings from rows to variable types, categories and branches.
Creates Data Library series for each mapped line.
Creates variables that reference those series and assigns timing logic.
Compiles the model and updates P&L, Balance Sheet, Cashflow and the Cashflow Waterfall.
You stay in control of the mapping and cleaning steps so that the imported model matches your intent.
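To make the flow concrete, here is a toy end-to-end pass over a single already-extracted table. Every name, structure and detection rule in it is invented for illustration and is not Model Reef's API; each real stage is covered in its own section below.

```python
# Toy end-to-end pass over one already-extracted table. Every name and
# rule here is invented for illustration; this is not Model Reef's API.
table = {
    "header": ["Line", "FY22", "FY23"],                  # detected header row
    "rows": [["Revenue (in $000s)", "1,200", "1,450"]],  # one data row
}

def parse_number(cell: str) -> float:
    return float(cell.replace(",", ""))

label = table["rows"][0][0]
scale = 1_000 if "$000" in label else 1   # crude unit detection
periods = table["header"][1:]             # crude date detection

# One Data Library series per mapped line, stored in base units.
series = {p: parse_number(c) * scale
          for p, c in zip(periods, table["rows"][0][1:])}

# A variable referencing that series, ready for compilation.
variable = {"type": "Revenue", "branch": "Group", "series": series}
print(variable)
```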
Uploading PDFs
In the import interface:
Choose Import from PDF.
Drag and drop one or more PDF files, or browse to select them.
Wait for Model Reef to process the files. It will display a list of detected tables and pages.
Best practices:
Prefer original digital PDFs over scanned images where possible.
If a PDF is a scan, ensure it has been OCR-processed first (a quick check for a text layer is sketched after this list).
Group related statements and tables into the same import batch if they belong in the same model.
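If you are unsure whether a scanned PDF carries a text layer, you can check before uploading. The sketch below uses the open-source pdfplumber library (not part of Model Reef, and the file name is a placeholder); a page that yields no extractable text usually still needs OCR:

```python
import pdfplumber

def has_text_layer(path: str) -> bool:
    """Return True if any page exposes extractable text.

    Scans without OCR typically yield nothing from extract_text(),
    which would leave the importer nothing to read.
    """
    with pdfplumber.open(path) as pdf:
        return any((page.extract_text() or "").strip() for page in pdf.pages)

print(has_text_layer("annual_report.pdf"))  # placeholder file name
```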
Table detection
Once the PDFs are uploaded, Model Reef:
Scans each page for tabular regions.
Identifies header rows, row labels and data blocks.
Splits separate tables on the same page into distinct candidates.
Flags low confidence tables for extra review.
You can step through each detected table to:
Confirm that it is a real data table you care about.
Skip tables that are decorative or not relevant.
Merge or split tables if the detection boundaries need adjusting.
See Table Detection Rules for more detail on how tables are identified and how to get better detection results.
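Model Reef's detection rules are internal, but you can get a feel for what "tabular regions" means with an open-source extractor. A rough sketch using pdfplumber's default table settings (an assumption; real statements often need tuned settings):

```python
import pdfplumber

def candidate_tables(path: str):
    """Yield (page_number, table) for each tabular region pdfplumber finds.

    A table comes back as a list of rows, each row a list of cell
    strings (or None), roughly matching the 'data block' idea above.
    """
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                yield page.page_number, table

for page_no, table in candidate_tables("annual_report.pdf"):  # placeholder
    header, *rows = table
    print(f"page {page_no}: {len(rows)} data rows, header: {header}")
```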
Date and period detection
For each accepted table, Model Reef attempts to:
Recognise date columns and columns that represent periods.
Interpret formats such as 2022, FY23, 2024E, Q1 2025, or Dec 24.
Map those columns to your model timeline at the appropriate frequency.
You can override:
Which columns are treated as periods.
The frequency (yearly, quarterly, monthly) used for each table.
The alignment between table periods and model start date.
See Date Column Detection for detailed behaviour and edge cases.
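To give a feel for the parsing problem, here is a self-contained sketch that normalises the label formats listed above. The conventions it uses (for example, two-digit years mapping to 20xx) are assumptions for the example, not the documented rules:

```python
import re

MONTHS = {m: i + 1 for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}

def parse_period(label: str):
    """Normalise a header label to (year, sub_period, kind), or None.

    Covers the formats above: 2022, FY23, 2024E, Q1 2025, Dec 24.
    An assumed convention maps two-digit years to 20xx.
    """
    s = label.strip()
    if m := re.fullmatch(r"(\d{4})E?", s):                # 2022 / 2024E
        return int(m.group(1)), None, "year"
    if m := re.fullmatch(r"FY(\d{2,4})", s, re.I):        # FY23
        y = int(m.group(1))
        return (2000 + y if y < 100 else y), None, "year"
    if m := re.fullmatch(r"Q([1-4])\s*(\d{4})", s, re.I): # Q1 2025
        return int(m.group(2)), int(m.group(1)), "quarter"
    if m := re.fullmatch(r"([A-Za-z]{3})\s+(\d{2})", s):  # Dec 24
        month = MONTHS.get(m.group(1).lower())
        if month:
            return 2000 + int(m.group(2)), month, "month"
    return None

for label in ["2022", "FY23", "2024E", "Q1 2025", "Dec 24", "Total"]:
    print(label, "->", parse_period(label))
```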
Unit detection and scaling
The importer then looks for:
Unit hints in headers (for example in $000s, in millions, %).
Patterns in the numbers themselves.
It proposes a unit scaling for each table or column, such as:
1 (actual units).
1,000 (thousands).
1,000,000 (millions).
You can confirm or change the scaling. Model Reef applies the scaling once on import so that internal values are stored in consistent base units.
See Unit Detection for more detail.
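The header-hint approach can be illustrated in a few lines. The hint-to-multiplier table below is invented for the example and is much cruder than the importer's actual rules:

```python
import re

# Invented hint -> multiplier table; the documented behaviour lives in
# the Unit Detection article.
UNIT_HINTS = [
    (r"\$?000s?\b|\bthousands?\b", 1_000),
    (r"\bmillions?\b|\$m\b", 1_000_000),
]

def proposed_scale(header: str) -> int:
    """Propose a multiplier from a table or column header."""
    text = header.lower()
    for pattern, scale in UNIT_HINTS:
        if re.search(pattern, text):
            return scale
    return 1  # no hint: assume actual units

def to_base_units(values: list[float], header: str) -> list[float]:
    """Apply the scaling once so stored values share one base unit."""
    scale = proposed_scale(header)
    return [v * scale for v in values]

print(to_base_units([1.2, 3.4], "Revenue (in $000s)"))  # [1200.0, 3400.0]
```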
Row level mapping
For each row in each table, you map the line to model concepts, including:
Variable type such as Revenue, COGS, Opex, Staff, Asset, Liability, Equity, Tax or Dividend.
Category and sub-category for reporting.
Branch to indicate which entity, division or store the line belongs to.
Units and frequency to confirm interpretation.
Optional notes or tags for future reference.
Model Reef will often propose sensible defaults based on row labels and context, but you can change any mapping before proceeding.
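Conceptually, the proposed defaults are a classification of row labels. A toy version of that step (the keyword table is invented, and the real proposals use more context than labels alone):

```python
# Invented keyword table, most specific entries first; the real
# proposals use richer context than row labels alone.
TYPE_KEYWORDS = {
    "COGS":     ["cost of goods", "cost of sales", "cogs"],
    "Staff":    ["salaries", "wages", "staff"],
    "Revenue":  ["revenue", "sales", "turnover"],
    "Opex":     ["operating expense", "overheads", "admin"],
    "Tax":      ["tax"],
    "Dividend": ["dividend"],
}

def propose_variable_type(row_label: str, default: str = "Opex") -> str:
    """Propose a variable type from a row label, falling back to a default."""
    label = row_label.lower()
    for var_type, keywords in TYPE_KEYWORDS.items():
        if any(k in label for k in keywords):
            return var_type
    return default

for label in ["Net sales", "Cost of goods sold", "Salaries and wages"]:
    print(label, "->", propose_variable_type(label))
```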
See:
Mapping Variable Types
Mapping Categories
Mapping Subcategories
Mapping Branches
Mapping Units and Frequency
Cleaning headers and merged cells
Before you finalise the import, you can tidy up structural issues such as:
Messy or multi line headers.
Repeated header rows inside tables.
Merged cells where a label spans multiple rows or columns.
Footnotes embedded into header or row labels.
The cleaning tools let you:
Rename and simplify row labels.
Split combined labels into clearer names.
Fill down merged cells so that each row has its own label.
Remove or ignore non data rows.
See Cleaning Headers and Fixing Merged Cells for detailed workflows.
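"Fill down" is the same operation you may know from spreadsheets. A minimal sketch with pandas, assuming a table extracted into a DataFrame where a merged label left blanks beneath it:

```python
import pandas as pd

# An extracted table where a merged label cell left blanks beneath it.
df = pd.DataFrame({
    "Label": ["Revenue", None, None, "Opex", None],
    "Item":  ["Stores", "Online", "Wholesale", "Rent", "Marketing"],
    "FY23":  [120, 45, 30, 18, 9],
})

# Fill down so every row carries its own label, then build a clean name.
df["Label"] = df["Label"].ffill()
df["Line"] = df["Label"] + " - " + df["Item"]
print(df[["Line", "FY23"]])
```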
Creating Data Library entries and variables
When you complete the mapping and cleaning steps, Model Reef:
Creates a Data Library entry for each mapped line, containing the imported time series with the chosen units and frequency.
Creates a variable referencing that Data Library entry, with the selected variable type, category, sub-category and branch.
Applies default timing and behaviour based on the variable type, which you can refine later.
Imported variables immediately start contributing to:
P&L.
Balance Sheet.
Cashflow Statement.
Cashflow Waterfall.
Dashboards, reports and valuation outputs.
You can open any imported variable to adjust its timing, drivers or additional logic.
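Schematically, each mapped line yields two linked records: a series and a variable that references it. The field names below are illustrative only, not Model Reef's schema:

```python
from dataclasses import dataclass, field

@dataclass
class DataLibraryEntry:
    """One imported time series, stored in base units at one frequency."""
    name: str
    frequency: str                    # "yearly" | "quarterly" | "monthly"
    unit: str
    values: dict[str, float] = field(default_factory=dict)  # period -> value

@dataclass
class Variable:
    """A model variable that references a Data Library series."""
    source: DataLibraryEntry
    variable_type: str                # e.g. "Revenue", "Opex"
    category: str
    sub_category: str
    branch: str                       # entity, division or store

series = DataLibraryEntry("Store revenue", "yearly", "$",
                          {"FY22": 1_200_000, "FY23": 1_450_000})
var = Variable(series, "Revenue", "Trading income", "Retail", "Store 01")
print(var.variable_type, "->", sum(var.source.values.values()))
```

Default timing and behaviour then hang off the variable type, which is why confirming each type during mapping matters.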
Reviewing the compiled model
After the import runs, you should:
Check the P&L for each branch and at group level to confirm structure.
Inspect the Balance Sheet to ensure assets, liabilities and equity look sensible.
Review the Cashflow Statement and Waterfall to confirm cash behaviour.
Compare imported historicals with known values for a sample of lines (a simple check is sketched at the end of this section).
If something looks off, you can either:
Adjust the affected variables directly.
Rerun the import for specific tables with improved mappings.
Use the Data Library to correct imported series in place.
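For that sample comparison of imported historicals against known values, a simple relative-tolerance check is often enough. A self-contained sketch with invented numbers:

```python
def reconcile(imported: dict, expected: dict, tolerance: float = 0.005):
    """Flag lines where imported values drift from known values.

    tolerance is a relative threshold (0.5% here) to absorb rounding
    introduced by unit scaling on import.
    """
    for line, value in expected.items():
        got = imported.get(line)
        if got is None:
            print(f"MISSING  {line}")
        elif abs(got - value) > tolerance * abs(value):
            print(f"MISMATCH {line}: imported {got}, expected {value}")

reconcile(
    imported={"Revenue FY23": 1_450_000, "Opex FY23": 312_500},
    expected={"Revenue FY23": 1_450_000, "Opex FY23": 310_000,
              "Tax FY23": 41_000},
)
```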
Tips for successful PDF imports
Use the cleanest PDF source you can get.
Start with a small subset of critical tables before importing everything.
Spend time on mapping and cleaning once, then reuse the model going forward.
Combine PDF import with manual variables for items that are not present in the source data.