What it does
This node automatically detects the table structure based on shapes that define the table grid, applies this structure to the tagged table content, and generates complete tables populated with that content.
Table detection is based on four types of histograms that analyze how much of an area is occupied by shapes (i.e., the percentage of area coverage) in both the grid and the content. These histograms are used to identify rows and columns:
- A histogram across the table width to detect structure columns
- A histogram across the table height to detect structure rows
- A histogram across the table width to detect content columns
- A histogram across the table height to detect content rows
For example, if a table grid consists of vertical lines, the histogram for structure columns will show strong peaks at positions where these lines occupy a large portion of the table height.
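The idea behind the structure column histogram can be sketched in Python. This is a simplified illustration, not the node's actual implementation: the `Rect` type, the function name, and the choice to sum the vertical extent of every shape that touches a strip are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Rect:
    # Hypothetical bounding box; y grows downward, so top < bottom.
    left: float
    top: float
    right: float
    bottom: float


def structure_column_histogram(shapes, table, strip_width):
    """Sum, per vertical strip, the height covered by shapes touching that strip."""
    strip_count = math.ceil((table.right - table.left) / strip_width)
    histogram = [0.0] * strip_count
    for i in range(strip_count):
        strip_left = table.left + i * strip_width
        strip_right = strip_left + strip_width
        for shape in shapes:
            # A shape contributes only if it overlaps the strip horizontally.
            if shape.left < strip_right and shape.right > strip_left:
                covered = min(shape.bottom, table.bottom) - max(shape.top, table.top)
                histogram[i] += max(0.0, covered)
    return histogram


table = Rect(0, 0, 12, 10)
grid = [
    Rect(0, 0, 0.5, 10),     # left border, full table height
    Rect(4, 0, 5, 2),        # short stroke, not a column line
    Rect(6.5, 0, 7, 10),     # inner vertical line
    Rect(11.5, 0, 12, 10),   # right border
]
histogram = structure_column_histogram(grid, table, strip_width=3)
print(histogram)
```

Strips that contain a full-height line reach the table height, while the strip with the short stroke stays far below it.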
Tip
Use the Set Table Attributes Node to add table summaries, header scopes and cell spans.
Use it for
Use the Detect Table node for tables with simple grid layouts. For tables with complex grid layouts that cannot be reliably detected by the Detect Table node, use the Tabulate node instead.
Recommended workflow:
- For simple table layouts:
  - Use the Group Spatially node to group grid elements and content.
  - Pass the result to the Detect Table node.
  → This approach is faster and more efficient.
- For complex table layouts that detection fails on:
  → Use the Tabulate node as a fallback solution.
How to use it
- Drag and drop the node from the Node Library into your template:
  Node Library > Folder Shapes > Folder Grouping
- Connect the node with other nodes in the Data Flow of your template. Connect the table content to the Content input port and the table structure to the Structures input port.
- Specify the settings in the Node Properties task pane.
Node Input
- Content: Connect a node containing only the table content (typically shape trees with tagged table text), excluding any shapes that form the table grid.
  The content should be grouped into one container per table. If you used the Group Spatially node, connect its Content Groups output.
  Within each group, the table content should already be properly structured into lines and paragraphs and correctly tagged. The order is not important, as the node automatically sorts the content into the appropriate cells based on the provided table structures.
- Structures: Connect a node containing the table structure. This usually consists of path shapes (e.g., lines or rectangles) that visually define the table grid.
  The structures should be grouped into one container per table. If you used the Group Spatially node, connect its Geometry Groups output.
  These shapes do not need to be ordered or arranged into rows and columns; they only need to be grouped per table.
Node Output
Tables: Outputs shape trees that represent complete tables.
Node Properties
Note
In most cases, the node’s default values are sufficient for accurate table detection.
Node Name
You can assign a custom name to the node to help identify its purpose within your template.
Column Raster Size
Defines the resolution used to calculate column histograms. The table width is divided into vertical strips of the specified size (in points), and one histogram value is computed per strip.
A finer raster reduces performance, whereas a raster that is too coarse prevents accurate table detection.
Example:
3.000
The table width is divided into vertical strips of 3 points.
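Row Raster Size
Defines the resolution used to calculate row histograms. The table height is divided into horizontal strips of the specified size (in points), and one histogram value is computed per strip.
A finer raster reduces performance, whereas a raster that is too coarse prevents accurate table detection.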
Example:
1.500
The table height is divided into horizontal strips of 1.5 points.
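Min. Raster Column Width
Defines the minimum column width as a multiple of the Column Raster Size, that is, as a number of raster strips.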
Example:
4
If the Column Raster Size is set to 3 points, the minimum column width is calculated as 4 × 3 points, resulting in 12 points.
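The arithmetic behind the raster settings can be checked with a few lines of Python. This is only a sketch: the function names and the table width of 480 points are hypothetical, and rounding the strip count up is an assumption about how a partial strip at the table edge would be handled.

```python
import math


def strip_count(extent, raster_size):
    """Number of strips needed to cover the extent (a partial strip counts as one)."""
    return math.ceil(extent / raster_size)


def min_size_in_points(min_raster_units, raster_size):
    """Convert a minimum size given in raster strips to points."""
    return min_raster_units * raster_size


column_raster_size = 3.0   # points, as in the Column Raster Size example
table_width = 480.0        # points, hypothetical table width

strips = strip_count(table_width, column_raster_size)
min_column_width = min_size_in_points(4, column_raster_size)
print(strips, min_column_width)
```

With these values the table is covered by 160 vertical strips, and the minimum column width of 4 strips corresponds to 12 points.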
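Min. Raster Row Height
Defines the minimum row height as a multiple of the Row Raster Size, that is, as a number of raster strips.
Example:
5
If the Row Raster Size is set to 1.5 points, the minimum row height is calculated as 5 × 1.5 points, resulting in 7.5 points.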
Structures Column Threshold Expression
The structures column histogram spans the full width and height of the table. The table is divided into vertical strips (based on the Column Raster Size), and the area of each strip that is covered by structure shapes is accumulated.
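The higher the coverage within a strip, the higher the histogram value. Peaks in the histogram typically indicate vertical grid lines (columns), as these occupy a large portion of the strip area.
This expression defines the minimum histogram value required for a strip to be recognized as a column line. At all positions where this value is exceeded, a so-called column structure cut is set, marking a potential column boundary.
Available variables:
- bbox: the table bounding box. Properties: width, height, left, right, top, and bottom.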
Expected return type: double
Example:
bbox.height * 0.5
If the histogram value exceeds half the table height, the strip is considered a column line.
Structures Row Threshold Expression
The structures row histogram spans the full height and width of the table. The table is divided into horizontal strips (based on the Row Raster Size), and the area of each strip that is covered by structure shapes is accumulated.
The higher the coverage within a strip, the higher the histogram value. Peaks in the histogram typically indicate horizontal grid lines (rows).
This expression defines the minimum histogram value required for a strip to be recognized as a row line. At all positions where this value is exceeded, a so-called row structure cut is set, marking a potential row boundary.
Available variables:
- bbox: the table bounding box. Properties: width, height, left, right, top, and bottom.
Expected return type: double
Example:
bbox.width * 0.5
If the histogram value exceeds half the table width, the strip is considered a row line.
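The thresholding step can be sketched as follows. The function name, the sample histogram, and the choice to report each cut at the start of its strip are assumptions made for illustration; the node's internal representation of a cut may differ.

```python
def structure_cuts(histogram, threshold, origin, raster_size):
    """Return the positions (in points) of all strips whose value exceeds the threshold."""
    return [
        origin + index * raster_size
        for index, value in enumerate(histogram)
        if value > threshold
    ]


bbox_top = 0.0          # hypothetical table bounding box
bbox_width = 200.0
row_raster_size = 1.5   # points per horizontal strip

# Hypothetical row histogram: one value per horizontal strip.
row_histogram = [200.0, 0.0, 30.0, 200.0, 10.0, 200.0]

threshold = bbox_width * 0.5   # same rule as the example expression
row_cuts = structure_cuts(row_histogram, threshold, bbox_top, row_raster_size)
print(row_cuts)
```

Only the three strips that span the full table width exceed the threshold of 100, so three row structure cuts are set.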
Content Column Threshold Expression
The content column histogram spans the full width and height of the table. The table is divided into vertical strips (based on the Column Raster Size), and the area of each strip that is covered by content shapes is accumulated.
The more content a strip contains, the higher its histogram value. To detect columns based on content, gaps between content are analyzed. These appear as strips with low or zero coverage. Valleys in the histogram—especially values close to zero—typically indicate regions without content and may mark column boundaries.
This expression defines the maximum histogram value for a strip to be considered a valley and therefore interpreted as a column boundary. At all positions where this value is not exceeded, a so-called column content cut is set, marking a potential column boundary.
Available variables:
- bbox: the table bounding box. Properties: width, height, left, right, top, and bottom.
Expected return type: double
Example:
0.0
If a strip contains no content, its value is zero, indicating a column boundary.
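Content cuts work the other way around: they are set where the histogram does not exceed the threshold. The sketch below illustrates this with a hypothetical histogram; grouping adjacent empty strips into runs is an extra step added here only to make the gaps between columns visible, not documented behavior of the node.

```python
def content_cut_indices(histogram, threshold):
    """Indices of strips whose value does not exceed the threshold (valleys)."""
    return [index for index, value in enumerate(histogram) if value <= threshold]


def group_runs(indices):
    """Merge consecutive indices into (first, last) runs."""
    runs = []
    for index in indices:
        if runs and index == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], index)   # extend the current run
        else:
            runs.append((index, index))       # start a new run
    return runs


# Hypothetical content column histogram: one value per vertical strip.
column_histogram = [40.0, 35.0, 0.0, 0.0, 50.0, 0.0, 20.0]

cut_indices = content_cut_indices(column_histogram, threshold=0.0)
gaps = group_runs(cut_indices)
print(cut_indices, gaps)
```

Here strips 2, 3, and 5 are empty, which yields two gaps between three blocks of content.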
Content Row Threshold Expression
The content row histogram spans the full width and height of the table. The table is divided into horizontal strips (based on the Row Raster Size), and the area of each strip that is covered by content shapes is accumulated.
The more content a strip contains, the higher its histogram value. To detect rows based on content, gaps between content are analyzed. These appear as strips with low or zero coverage. Valleys in the histogram, especially values close to zero, typically indicate regions without content and may mark row boundaries.
This expression defines the maximum histogram value for a strip to be considered a valley and therefore interpreted as a row boundary. At all positions where this value is not exceeded, a so-called row content cut is set, marking a potential row boundary.
Available variables:
- bbox: the table bounding box. Properties: width, height, left, right, top, and bottom.
Expected return type: double
Example:
0.0
If a strip contains no content, its value is zero, indicating a row boundary.
Column Cut Filter Expression
This expression is executed for each column structure cut and column content cut identified by the Structures Column Threshold Expression and the Content Column Threshold Expression. It determines which cuts should be used as actual column boundaries in the final table.
In most cases, it is sufficient to use only structure cuts—that is, cuts derived from the table grid shapes. Content cuts can be useful in cases where clear column lines are missing.
Available variables:
- isStructureCut: true if the current cut is a structure cut
- structureCuts: the sequence of structure cuts
- structureCutIndex: the zero-based index of the current structure cut
- isContentCut: true if the current cut is a content cut
- contentCuts: the sequence of content cuts
- contentCutIndex: the zero-based index of the current content cut
Expected return type: double, or null to discard the cut
Example 1: Use only structure cuts as column boundaries
isStructureCut ? structureCuts[structureCutIndex] : null
If the current column cut is a structure cut, it is used as a column boundary. Otherwise, the value is set to null, meaning content cuts are ignored and filtered out.
Example 2: Use only content cuts as column boundaries
isContentCut ? contentCuts[contentCutIndex] : null
If the current column cut is a content cut, it is used as a column boundary. Otherwise, the value is set to null, meaning structure cuts are ignored and filtered out.
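The effect of the two filter expressions can be simulated in Python. This is a sketch under assumptions: `CutContext`, `apply_cut_filter`, and the snake_case field names are hypothetical stand-ins for the expression variables, an unused index is modeled as `None`, and the cut positions are invented sample data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class CutContext:
    # Stand-ins for the expression variables listed above.
    is_structure_cut: bool
    structure_cuts: Sequence[float]
    structure_cut_index: Optional[int]
    is_content_cut: bool
    content_cuts: Sequence[float]
    content_cut_index: Optional[int]


def only_structure_cuts(ctx: CutContext) -> Optional[float]:
    """Python equivalent of Example 1."""
    return ctx.structure_cuts[ctx.structure_cut_index] if ctx.is_structure_cut else None


def only_content_cuts(ctx: CutContext) -> Optional[float]:
    """Python equivalent of Example 2."""
    return ctx.content_cuts[ctx.content_cut_index] if ctx.is_content_cut else None


def apply_cut_filter(
    structure_cuts: Sequence[float],
    content_cuts: Sequence[float],
    cut_filter: Callable[[CutContext], Optional[float]],
) -> List[float]:
    """Evaluate the filter once per cut and keep every non-null result."""
    contexts = [
        CutContext(True, structure_cuts, i, False, content_cuts, None)
        for i in range(len(structure_cuts))
    ]
    contexts += [
        CutContext(False, structure_cuts, None, True, content_cuts, i)
        for i in range(len(content_cuts))
    ]
    results = (cut_filter(ctx) for ctx in contexts)
    return sorted(r for r in results if r is not None)


structure = [0.0, 120.0, 240.0]   # hypothetical cut positions in points
content = [60.0, 118.5, 180.0]

from_structure = apply_cut_filter(structure, content, only_structure_cuts)
from_content = apply_cut_filter(structure, content, only_content_cuts)
print(from_structure, from_content)
```

Each filter is called once per cut; returning `None` removes the cut, and every other return value becomes a column boundary.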
Row Cut Filter Expression
This expression is executed for each row structure cut and row content cut identified by the Structures Row Threshold Expression and the Content Row Threshold Expression. It determines which cuts should be used as actual row boundaries in the final table.
In most cases, it is sufficient to use only structure cuts—that is, cuts derived from the table grid shapes. Content cuts can be useful in cases where clear row lines are missing.
Available variables:
- isStructureCut: true if the current cut is a structure cut
- structureCuts: the sequence of structure cuts
- structureCutIndex: the zero-based index of the current structure cut
- isContentCut: true if the current cut is a content cut
- contentCuts: the sequence of content cuts
- contentCutIndex: the zero-based index of the current content cut
Expected return type: double, or null to discard the cut
Example 1: Use only structure cuts as row boundaries
isStructureCut ? structureCuts[structureCutIndex] : null
If the current row cut is a structure cut, it is used as a row boundary. Otherwise, the value is set to null, meaning content cuts are ignored and filtered out.
Example 2: Use only content cuts as row boundaries
isContentCut ? contentCuts[contentCutIndex] : null
If the current row cut is a content cut, it is used as a row boundary. Otherwise, the value is set to null, meaning structure cuts are ignored and filtered out.
Table Name Expression
Assign names to table shape trees for easier reference in later processing. Use static strings or expressions.
Available variables:
- firstTable: refers to the first table in the sequence of identified tables
- lastTable: refers to the last table in the sequence of identified tables
- tableIndex: index of the table in the sequence of identified tables
- tableCount: total number of tables in the sequence of identified tables
Expected return type: string
Example:
"Table"
Assigns the name "Table" to all table shape trees.
Row Name Expression
Assign names to row shape trees for easier reference in later processing. Use static strings or expressions.
Available variables:
- firstRow: refers to the first row in a single table
- lastRow: refers to the last row in a single table
- rowIndex: index of the table row in a single table
- rowCount: total number of rows in a single table
Expected return type: string
Example:
"TR"
Assigns the name "TR" to all row shape trees.
Cell Name Expression
Assign names to cell shape trees for easier reference in later processing. Use static strings or expressions.
Available variables:
- firstRow: refers to all cells in the first row in a single table
- lastRow: refers to all cells in the last row in a single table
- rowIndex: index of the table row in a single table
- rowCount: total number of rows in a single table
- firstCol: refers to all cells in the first column in a single table
- lastCol: refers to all cells in the last column in a single table
- colIndex: index of the table column in a single table
- colCount: total number of columns in a single table
Expected return type: string
Examples:
rowIndex = 0 ? "TH" : "TD"
Assigns the name "TH" to cells in the first row, and "TD" to all others.
rowIndex <= 1 ? "TH" : "TD"
Assigns the name "TH" to cells in the first and second row, and "TD" to all others.
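How a cell name expression is applied across a table can be sketched in Python. The sketch mirrors only the first example and assumes that the expression is evaluated once per cell with a zero-based rowIndex; `cell_name` and `name_cells` are hypothetical helper names.

```python
def cell_name(row_index, col_index):
    """Python equivalent of the first example: header cells in the first row."""
    return "TH" if row_index == 0 else "TD"


def name_cells(row_count, col_count):
    """Evaluate the naming rule once per cell, row by row."""
    return [
        [cell_name(row, col) for col in range(col_count)]
        for row in range(row_count)
    ]


names = name_cells(row_count=3, col_count=2)
for row in names:
    print(row)
```

For a table with three rows and two columns, both cells of the first row are named "TH" and the remaining four cells are named "TD".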