Detect Table Node (Shapes > Grouping) – axes4

What it does

This node automatically detects the table structure based on shapes that define the table grid, applies this structure to the tagged table content, and generates complete tables populated with that content.

Table detection is based on four types of histograms that analyze how much of an area is occupied by shapes (i.e., the percentage of area coverage) in both the grid and the content. These histograms are used to identify rows and columns:

A histogram across the table width to detect structure columns
A histogram across the table height to detect structure rows
A histogram across the table width to detect content columns
A histogram across the table height to detect content rows

For example, if a table grid consists of vertical lines, the histogram for structure columns will show strong peaks at positions where these lines occupy a large portion of the table height.
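The histogram idea can be illustrated with a short Python sketch. It is not the node's actual implementation: the rectangle format, the function name column_histogram and the simplification that each strip accumulates the heights of all shapes touching it are assumptions made for illustration only.

```python
import math
from typing import List, Tuple

# Hypothetical shape format: (left, top, right, bottom) in points.
Rect = Tuple[float, float, float, float]

def column_histogram(shapes: List[Rect], table_left: float,
                     table_right: float, strip_width: float) -> List[float]:
    """Per vertical strip, sum the heights of all shapes that touch the strip.

    Simplification (assumption): overlapping shapes are counted more than once.
    """
    strip_count = max(1, math.ceil((table_right - table_left) / strip_width))
    histogram = [0.0] * strip_count
    for left, top, right, bottom in shapes:
        for i in range(strip_count):
            strip_left = table_left + i * strip_width
            strip_right = strip_left + strip_width
            # The shape contributes only if it horizontally overlaps the strip.
            if min(right, strip_right) > max(left, strip_left):
                histogram[i] += bottom - top
    return histogram

# A 102 x 40 point grid: three vertical lines plus a top and a bottom border.
grid = [
    (0.0, 0.0, 1.0, 40.0),
    (50.0, 0.0, 51.0, 40.0),
    (101.0, 0.0, 102.0, 40.0),
    (0.0, 0.0, 102.0, 1.0),
    (0.0, 39.0, 102.0, 40.0),
]
histogram = column_histogram(grid, 0.0, 102.0, 3.0)
# Strips whose value exceeds half the table height are peaks.
peaks = [i for i, value in enumerate(histogram) if value > 40.0 * 0.5]
```

In this sketch, the strips containing a vertical line stand out clearly, while the horizontal borders add only a small value to every strip.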

Tip

Use the Set Table Attributes node to add table summaries, header scopes, and cell spans.

Use it for

Use the Detect Table node for tables with simple grid layouts. For tables with complex grid layouts that cannot be reliably detected by the Detect Table node, use the Tabulate node instead.

Recommended workflow:

For simple table layouts:
1. Use the Group Spatially node to group grid elements and content.
2. Pass the result to the Detect Table node.
  → This approach is faster and more efficient.
For complex table layouts on which detection fails:
→ Use the Tabulate node as a fallback solution.

How to use it

Drag and drop the node from the Node Library into your template:
Node Library > Folder Shapes > Folder Grouping
Connect the node with other nodes in the Data Flow of your template. Connect the table content to the Content input port and the table structure to the Structures input port.
Specify the settings in the Node Properties task pane.

Node Input

Content: Connect a node containing only the table content (typically shape trees with tagged table text), excluding any shapes that form the table grid.
The content should be grouped into a container per table. If you used the Group Spatially node, connect its Content Groups output.
Within each group, the table content should already be properly structured into lines and paragraphs and correctly tagged. The order is not important, as the node will automatically sort the content into the appropriate cells based on the provided table structures.
Structures: Connect a node containing the table structure. This usually consists of path shapes (e.g., lines or rectangles) that visually define the table grid.
The structures should be grouped into a container per table. If you used the Group Spatially node, connect its Geometry Groups output.
These shapes do not need to be ordered or arranged into rows and columns; they only need to be grouped per table.

Node Output

Tables: Outputs shape trees that represent complete tables.

Node Properties

Note

In most cases, the node’s default values are sufficient for accurate table detection.

Node Name

You can assign a custom name to the node to help identify its purpose within your template.

Column Raster Size

Defines the resolution used to calculate column histograms. The table width is divided into vertical strips of the specified size (in points), and one histogram value is computed per strip.

A finer raster reduces performance, whereas a raster that is too coarse makes table detection inaccurate.

Example:

3.000

The table width is divided into vertical strips of 3 points.

Row Raster Size

Defines the resolution used to calculate row histograms. The table height is divided into horizontal strips of the specified size (in points), and one histogram value is computed per strip.

A finer raster reduces performance, whereas a raster that is too coarse makes table detection inaccurate.

Example:

1.500

The table height is divided into horizontal strips of 1.5 points.

Min. Raster Column Width

Defines the minimum column width as a multiple of the specified Column Raster Size. This prevents the node from detecting too many columns, for example when column lines in the table are drawn as double lines.

Example:

With a value of 4 and a Column Raster Size of 3 points, the minimum column width is 4 × 3 points = 12 points.

Min. Raster Row Height

Defines the minimum row height as a multiple of the specified Row Raster Size. This prevents the node from detecting too many rows, for example when row lines in the table are drawn as double lines.

Example:

With a value of 4 and a Row Raster Size of 1.5 points, the minimum row height is 4 × 1.5 points = 6 points.
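The arithmetic behind the two minimum-size settings, and one plausible way a minimum size could suppress double lines, can be sketched in Python. The function names and the "keep the first cut, drop any cut closer than the minimum" strategy are assumptions for illustration; the documentation does not specify how the node resolves cuts that are too close together.

```python
from typing import List

def min_size(raster_size: float, min_raster_count: int) -> float:
    """Minimum column width or row height in points."""
    return raster_size * min_raster_count

def drop_close_cuts(cuts: List[float], min_distance: float) -> List[float]:
    """Keep a cut only if it is at least min_distance away from the last kept cut.

    Hypothetical strategy, not confirmed for the actual node.
    """
    kept: List[float] = []
    for cut in sorted(cuts):
        if not kept or cut - kept[-1] >= min_distance:
            kept.append(cut)
    return kept

min_column_width = min_size(3.0, 4)   # 4 strips of 3 points
min_row_height = min_size(1.5, 4)     # 4 strips of 1.5 points

# Double lines at 0/2 and 50/52 collapse to a single cut each.
column_cuts = drop_close_cuts([0.0, 2.0, 50.0, 52.0, 100.0], min_column_width)
```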

Structures Column Threshold Expression

The structures column histogram spans the full width and height of the table. The table is divided into vertical strips (based on the Column Raster Size), and the area of each strip that is covered by structure shapes is accumulated.

The higher the coverage within a strip, the higher the histogram value. Peaks in the histogram typically indicate vertical grid lines (columns), as these occupy a large portion of the strip area.

This expression defines the minimum histogram value required for a strip to be recognized as a column line. At all positions where this value is exceeded, a so-called column structure cut is set, marking a potential column boundary.

Available variables:

bbox: the table bounding box. Properties: width, height, left, right, top and bottom.

Expected return type: double

Example:

bbox.height * 0.5

If the histogram value of a strip exceeds half the table height, the strip is considered a column line.
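As a rough Python illustration of how such a threshold selects column structure cuts, the sketch below marks every strip whose histogram value exceeds the threshold. The function name, the sample histogram values and the choice to report the strip centre as the cut position are assumptions, not documented behaviour of the node.

```python
from typing import List

def structure_cuts(histogram: List[float], threshold: float,
                   table_left: float, strip_width: float) -> List[float]:
    """Return the centre position of every strip whose value exceeds the threshold."""
    return [
        table_left + (i + 0.5) * strip_width
        for i, value in enumerate(histogram)
        if value > threshold
    ]

bbox_height = 40.0
# Hypothetical histogram: three strips contain a vertical grid line.
sample_histogram = [42.0, 2.0, 2.0, 42.0, 2.0, 42.0]
cuts = structure_cuts(sample_histogram, bbox_height * 0.5, 0.0, 3.0)
```

Lowering the factor below 0.5 would also accept shorter lines, for example lines that span only part of the table height.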

Structures Row Threshold Expression

The structures row histogram spans the full height and width of the table. The table is divided into horizontal strips (based on the Row Raster Size), and the area of each strip that is covered by structure shapes is accumulated.

The higher the coverage within a strip, the higher the histogram value. Peaks in the histogram typically indicate horizontal grid lines (rows).

This expression defines the minimum histogram value required for a strip to be recognized as a row line. At all positions where this value is exceeded, a so-called row structure cut is set, marking a potential row boundary.

Available variables:

bbox: the table bounding box. Properties: width, height, left, right, top and bottom.

Expected return type: double

Example:

bbox.width * 0.5

If the histogram value exceeds half the table width, the strip is considered a row line.

Content Column Threshold Expression

The content column histogram spans the full width and height of the table. The table is divided into vertical strips (based on the Column Raster Size), and the area of each strip that is covered by content shapes is accumulated.

The more content a strip contains, the higher its histogram value. To detect columns based on content, gaps between content are analyzed. These appear as strips with low or zero coverage. Valleys in the histogram—especially values close to zero—typically indicate regions without content and may mark column boundaries.

This expression defines the maximum histogram value for a strip to be considered a valley and therefore interpreted as a column boundary. At all positions where this value is not exceeded, a so-called column content cut is set, marking a potential column boundary.

Available variables:

bbox: the table bounding box. Properties: width, height, left, right, top and bottom.

Expected return type: double

Example:

0.0

If a strip contains no content, its value is zero, indicating a column boundary.
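The valley logic can be sketched in Python as follows. The sketch groups neighbouring strips at or below the threshold into one gap and places a single cut in the middle of that gap. Both the grouping and the mid-gap position are assumptions for illustration, and the names are hypothetical.

```python
from typing import List, Optional

def content_cuts(histogram: List[float], threshold: float,
                 table_left: float, strip_width: float) -> List[float]:
    """Place one cut in the middle of each run of strips at or below the threshold."""
    cuts: List[float] = []
    run_start: Optional[int] = None
    # The infinite sentinel value closes a run that reaches the last strip.
    for i, value in enumerate(histogram + [float("inf")]):
        if value <= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run covers strips run_start .. i - 1.
            cuts.append(table_left + (run_start + i) / 2 * strip_width)
            run_start = None
    return cuts

# Hypothetical content histogram: two columns of text, a gap, more text, a gap.
sample_histogram = [5.0, 5.0, 0.0, 0.0, 7.0, 7.0, 0.0, 3.0]
gaps = content_cuts(sample_histogram, 0.0, 0.0, 3.0)
```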

Content Row Threshold Expression

The content row histogram spans the full width and height of the table. The table is divided into horizontal strips (based on the Row Raster Size), and the area of each strip that is covered by content shapes is accumulated.

The more content a strip contains, the higher its histogram value. To detect rows based on content, gaps between content are analyzed. These appear as strips with low or zero coverage. Valleys in the histogram, especially values close to zero, typically indicate regions without content and may mark row boundaries.

This expression defines the maximum histogram value for a strip to be considered a valley and therefore interpreted as a row boundary. At all positions where this value is not exceeded, a so-called row content cut is set, marking a potential row boundary.

Available variables:

bbox: the table bounding box. Properties: width, height, left, right, top and bottom.

Expected return type: double

Example:

0.0

If a strip contains no content, its value is zero, indicating a row boundary.

Column Cut Filter Expression

This expression is executed for each column structure cut and column content cut identified by the Structures Column Threshold Expression and the Content Column Threshold Expression. It determines which cuts should be used as actual column boundaries in the final table.

In most cases, it is sufficient to use only structure cuts—that is, cuts derived from the table grid shapes. Content cuts can be useful in cases where clear column lines are missing.

Available variables:

isStructureCut: true if the current cut is a structure cut
structureCuts: the sequence of structure cuts
structureCutIndex: the zero-based index of the current structure cut
isContentCut: true if the current cut is a content cut
contentCuts: the sequence of content cuts
contentCutIndex: the zero-based index of the current content cut

Expected return type: double

Example 1: Use only structure cuts as column boundaries

isStructureCut ? structureCuts[structureCutIndex] : null

If the current column cut is a structure cut, it is used as a column boundary. Otherwise, the value is set to null, meaning content cuts are ignored and filtered out.

Example 2: Use only content cuts as column boundaries

isContentCut ? contentCuts[contentCutIndex] : null

If the current column cut is a content cut, it is used as a column boundary. Otherwise, the value is set to null, meaning structure cuts are ignored and filtered out.
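The filter semantics of both examples can be mimicked in Python to experiment with different strategies outside the product. This is only a model: the CutContext class, the apply_cut_filter function and the third strategy (content cuts as a fallback when no structure cuts exist) are hypothetical and not part of the node. The field names mirror the variables listed above in snake_case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class CutContext:
    """Hypothetical stand-in for the variables available to the expression."""
    is_structure_cut: bool
    structure_cuts: List[float]
    structure_cut_index: Optional[int]
    is_content_cut: bool
    content_cuts: List[float]
    content_cut_index: Optional[int]

def apply_cut_filter(structure_cuts: List[float], content_cuts: List[float],
                     expression: Callable[[CutContext], Optional[float]]) -> List[float]:
    """Evaluate the expression once per cut and keep every non-null result."""
    contexts = [CutContext(True, structure_cuts, i, False, content_cuts, None)
                for i in range(len(structure_cuts))]
    contexts += [CutContext(False, structure_cuts, None, True, content_cuts, i)
                 for i in range(len(content_cuts))]
    results = [expression(context) for context in contexts]
    return sorted(value for value in results if value is not None)

def structure_only(c: CutContext) -> Optional[float]:
    return c.structure_cuts[c.structure_cut_index] if c.is_structure_cut else None

def content_only(c: CutContext) -> Optional[float]:
    return c.content_cuts[c.content_cut_index] if c.is_content_cut else None

def content_as_fallback(c: CutContext) -> Optional[float]:
    # Hypothetical strategy: use content cuts only if no grid lines were found.
    if c.is_structure_cut:
        return c.structure_cuts[c.structure_cut_index]
    return c.content_cuts[c.content_cut_index] if not c.structure_cuts else None

grid_cuts = [0.0, 50.0, 100.0]
gap_cuts = [25.0, 75.0]
from_structure = apply_cut_filter(grid_cuts, gap_cuts, structure_only)
from_content = apply_cut_filter(grid_cuts, gap_cuts, content_only)
fallback = apply_cut_filter([], gap_cuts, content_as_fallback)
```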

Row Cut Filter Expression

This expression is executed for each row structure cut and row content cut identified by the Structures Row Threshold Expression and the Content Row Threshold Expression. It determines which cuts should be used as actual row boundaries in the final table.

In most cases, it is sufficient to use only structure cuts—that is, cuts derived from the table grid shapes. Content cuts can be useful in cases where clear row lines are missing.

Available variables:

isStructureCut: true if the current cut is a structure cut
structureCuts: the sequence of structure cuts
structureCutIndex: the zero-based index of the current structure cut
isContentCut: true if the current cut is a content cut
contentCuts: the sequence of content cuts
contentCutIndex: the zero-based index of the current content cut

Expected return type: double

Example 1: Use only structure cuts as row boundaries

isStructureCut ? structureCuts[structureCutIndex] : null

If the current row cut is a structure cut, it is used as a row boundary. Otherwise, the value is set to null, meaning content cuts are ignored and filtered out.

Example 2: Use only content cuts as row boundaries

isContentCut ? contentCuts[contentCutIndex] : null

If the current row cut is a content cut, it is used as a row boundary. Otherwise, the value is set to null, meaning structure cuts are ignored and filtered out.

Table Name Expression

Assign names to table shape trees for easier reference in later processing. Use static strings or expressions.

Available variables:

firstTable: refers to the first table in the sequence of identified tables.
lastTable: refers to the last table in the sequence of identified tables.
tableIndex: index of the table in the sequence of identified tables.
tableCount: total number of tables in the sequence of identified tables.

Expected return type: string

Example:

"Table"

Assigns the name "Table" to all table shape trees.

Row Name Expression

Assign names to row shape trees for easier reference in later processing. Use static strings or expressions.

Available variables:

firstRow: refers to the first row in a single table
lastRow: refers to the last row in a single table
rowIndex: index of the table row in a single table
rowCount: total number of rows in a single table

Expected return type: string

Example:

"TR"

Assigns the name "TR" to all row shape trees.

Cell Name Expression

Assign names to cell shape trees for easier reference in later processing. Use static strings or expressions.

Available variables:

firstRow: refers to all cells in the first row in a single table
lastRow: refers to all cells in the last row in a single table
rowIndex: index of the table row in a single table
rowCount: total number of rows in a single table
firstCol: refers to all cells in the first column in a single table
lastCol: refers to all cells in the last column in a single table
colCount: total number of columns in a single table
colIndex: index of the table column in a single table

Expected return type: string

Examples:

rowIndex = 0 ? "TH" : "TD"

Assigns the name "TH" to cells in the first row, and "TD" to all others.

rowIndex <= 1 ? "TH" : "TD"

Assigns the name "TH" to cells in the first and second row, and "TD" to all others.
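To preview which names such an expression produces for a whole table, the naming logic can be replayed in Python. This is a sketch under the assumption that rowIndex and colIndex are zero-based; the function names and the second variant (header cells in the first row and the first column) are illustrative only.

```python
from typing import Callable, List

def name_cells(row_count: int, col_count: int,
               expression: Callable[[int, int], str]) -> List[List[str]]:
    """Evaluate a naming expression for every cell of a row_count x col_count table."""
    return [[expression(row_index, col_index) for col_index in range(col_count)]
            for row_index in range(row_count)]

def header_row(row_index: int, col_index: int) -> str:
    # Mirrors the first example: header cells in the first row only.
    return "TH" if row_index == 0 else "TD"

def header_row_and_column(row_index: int, col_index: int) -> str:
    # Illustrative variant: header cells in the first row and the first column.
    return "TH" if row_index == 0 or col_index == 0 else "TD"

simple = name_cells(3, 2, header_row)
with_row_headers = name_cells(3, 2, header_row_and_column)
```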