The TabularData class (from System.Data.Tabular) provides a high-performance way to work with large, column-oriented datasets. It is ideal for data analysis, feature engineering, and high-throughput processing.
TabularData supports JIT-compiled expressions for adding columns and calculating aggregates. This is particularly useful for tasks like normalization (scaling values to a [0, 1] range) or applying activation functions (such as ReLU) across millions of rows.
```pascal
uses System.Data, System.Data.Tabular;

var data := new TabularData;
data.AddColumn('Signal', [0.5, 1.2, 3.4, 2.1, 0.8]);

// 1. Define the range for normalization (usually pre-calculated or known)
var sMin := 0.5;
var sMax := 3.4;
var sRange := sMax - sMin;

// 2. Apply min-max scaling via a JIT expression: (Signal - Min) / Range
data.EvaluateNewColumn('Normalized', [ '"Signal"', sMin, '-', sRange, '/' ]);

PrintLn(data.ExportToSeparated(['Signal', 'Normalized']));
```

Output:

```
Signal,Normalized
0.5,0
1.2,0.241379310344828
3.4,1
2.1,0.551724137931035
0.8,0.103448275862069
```
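The normalized values are plain min-max arithmetic, which can be cross-checked in a few lines of Python (the column values are copied from the example; everything else here is illustrative):

```python
# Min-max scaling: (x - min) / (max - min), mapping values into [0, 1].
signal = [0.5, 1.2, 3.4, 2.1, 0.8]

s_min = min(signal)             # 0.5
s_range = max(signal) - s_min   # 2.9

normalized = [(x - s_min) / s_range for x in signal]
print(normalized)  # the minimum maps to 0.0, the maximum to 1.0
```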
Calculations in TabularData use Reverse Polish Notation (RPN), a stack-based logic where operators follow their operands. This allows the JIT engine to evaluate expressions with extreme efficiency without needing complex parenthetical parsing.
Think of a stack where values are pushed. An operator then "pops" the required number of values, performs its calculation, and "pushes" the result back.
| Expression | RPN Array | Logic |
|---|---|---|
| `a + b` | `['"a"', '"b"', '+']` | Push `a`, push `b`, add them |
| `(a + b) * c` | `['"a"', '"b"', '+', '"c"', '*']` | Add `a` and `b` first, then multiply by `c` |
| `max(0, a + b)` | `['"a"', '"b"', 'relu']` | `relu` is a fused binary operator |
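The push/pop mechanics can be sketched with a minimal RPN evaluator in Python. This is a didactic model, not the JIT engine: column references are simplified to entries in a plain dict, and only a handful of operators are shown.

```python
def eval_rpn(tokens, variables):
    """Evaluate an RPN token list against a dict of variable values."""
    stack = []
    ops = {
        '+': lambda a, b: a + b,
        '-': lambda a, b: a - b,
        '*': lambda a, b: a * b,
        '/': lambda a, b: a / b,
        'relu': lambda a, b: max(0.0, a + b),  # fused: max(0, a + b)
    }
    for tok in tokens:
        if tok in ops:
            b = stack.pop()                   # operator pops its operands...
            a = stack.pop()
            stack.append(ops[tok](a, b))      # ...and pushes the result
        elif isinstance(tok, str):
            stack.append(variables[tok.strip('"')])  # quoted column reference
        else:
            stack.append(float(tok))          # numeric literal
    return stack.pop()

# (a + b) * c with a=2, b=3, c=4  ->  20.0
print(eval_rpn(['"a"', '"b"', '+', '"c"', '*'], {'a': 2.0, 'b': 3.0, 'c': 4.0}))
```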
```pascal
uses System.Data, System.Data.Tabular;

var data := new TabularData;
data.AddColumn('A', [10.0, 20.0]);
data.AddColumn('B', [5.0, 2.0]);

// Weighted average: (A * 0.7) + (B * 0.3)
data.EvaluateNewColumn('Weighted', [ '"A"', 0.7, '*', '"B"', 0.3, '*', '+' ]);

// Error metric (SMAPE): abs(A-B) / (abs(A)+abs(B))
data.EvaluateNewColumn('Error', [ '"A"', '"B"', 'smape' ]);

PrintLn(data.ExportToSeparated(['A', 'B', 'Weighted', 'Error']));
```

Output:

```
A,B,Weighted,Error
10,5,8.5,0.333333333333333
20,2,14.6,0.818181818181818
```
Because TabularData is optimized for batch processing and supports fused activation functions (like relu), it is highly effective for running pre-trained neural network inference across large datasets. This is commonly used for classification and prediction tasks where thousands of records must be scored in a single pass.
The relu opcode is designed for exactly this: it is a fused binary operator that computes max(0, a + b). It pops two values (the sum and the bias), adds them, and pushes the result, clamped to a minimum of zero.
This snippet demonstrates a simplified inference pipeline using features from the Titanic dataset (Sex, PClass).
```pascal
uses System.Data, System.Data.Tabular;

var model := new TabularData;

// Inputs: features [Sex, PClass]
// (0.0 = Female, 1.0 = Male | 1.0 = 1st Class, 0.0 = 3rd Class)
model.AddColumn('Sex', [0.0, 1.0]);
model.AddColumn('PClass', [1.0, 0.0]);

// Hidden layer: ReLU(Sex * -1.5 + PClass * 1.2 + 0.5)
// 'relu' pops the top two values (sum and bias). Here: (Sex*-1.5 + PClass*1.2) and 0.5
model.EvaluateNewColumn('H1', [ '"Sex"', -1.5, '*', '"PClass"', 1.2, '*', '+', 0.5, 'relu' ]);

// Output layer (Prob): Sigmoid(H1 * 1.5 - 0.5)
model.EvaluateNewColumn('Logit', [ '"H1"', 1.5, '*', 0.5, '-' ]);
model.EvaluateNewColumn('Prob', [ 1, 1, '"Logit"', -1, '*', 'exp', '+', '/' ]);

PrintLn(model.ExportToSeparated(['Sex', 'PClass', 'Prob']));
```

Output:

```
Sex,PClass,Prob
0,1,0.885947618720209
1,0,0.377540668798145
```
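The probabilities can be reproduced with ordinary floating-point math. This Python sketch mirrors the same two-layer computation (weights and biases copied from the example; the function names are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(sex, pclass):
    h1 = max(0.0, sex * -1.5 + pclass * 1.2 + 0.5)  # hidden layer (fused relu)
    logit = h1 * 1.5 - 0.5                          # output layer
    return sigmoid(logit)

print(predict(0.0, 1.0))  # female, 1st class -> ~0.8859
print(predict(1.0, 0.0))  # male, 3rd class  -> ~0.3775
```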
Complex architectures like Residual Networks (ResNets) can be implemented using stack manipulation opcodes. These allow you to reuse values (like a previous layer's output) without recalculating them or reading from memory.
- `dup0`: duplicates the top value on the stack.
- `dup1`: duplicates the value below the top.

In the example below, we calculate a "base feature", duplicate it to use in a non-linear branch, and then add it back (the residual link) before the final activation.
The output below includes trailing commas. This occurs because ColumnStrings returns the full internal capacity of the column (defaulting to 8 for JIT optimization), even if fewer rows are active.
```pascal
uses System.Data, System.Data.Tabular;

var model := new TabularData;
model.AddColumn('X', [0.5, -0.2]);

// Residual logic with Base = X*1.5 + 0.5 : Sigmoid(ReLU(Base) + Base - 0.2)
model.EvaluateNewColumn('Res', [
   1, 1,                      // Sigmoid pre-load
   '"X"', 1.5, '*', 0.5, '+', // Base calculation
   'dup0',                    // Keep for residual. Stack: [1, 1, base, base]
   0, 'relu',                 // Transform top copy. Stack: [1, 1, base, H1]
   '+',                       // Add base back (residual). Stack: [1, 1, base+H1]
   0.2, '-',                  // Final bias
   -1, '*', 'exp', '+', '/'   // Sigmoid
]);

PrintLn('Result: ' + model.ColumnStrings('Res').Join(', '));
```

Output:

```
Result: 0.908877038985144, 0.549833997312478, , , , , ,
```
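As a sanity check, the residual formula can be evaluated directly in Python (constants copied from the example; `dup0` simply lets the stack reuse `base` without recomputing it, which the local variable models here):

```python
import math

def residual_block(x):
    base = x * 1.5 + 0.5               # base feature (computed once, then duplicated)
    h1 = max(0.0, base + 0.0)          # 'relu' applied to the duplicated copy
    z = h1 + base - 0.2                # residual link + final bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

print(residual_block(0.5))   # ~0.9089
print(residual_block(-0.2))  # ~0.5498
```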
For maximum performance, you can "fold" multiple layers into a single RPN expression. This avoids the overhead of writing intermediate results back to column memory, keeping all neuron activations on the CPU stack during the JIT-compiled loop.
```pascal
uses System.Data, System.Data.Tabular;

var data := new TabularData;
data.AddColumn('X', [0.5, -0.2]);

// Calculate a 2-layer result in one pass: Sigmoid(ReLU(X*1.5 + 0.5) * 0.8 - 0.2)
// 'relu' pops two values (sum + bias) and pushes max(0, sum+bias)
data.EvaluateNewColumn('Result', [
   1, 1,                      // Sigmoid pre-load
   '"X"', 1.5, '*', 0.5, 'relu', 0.8, '*', 0.2, '-',
   -1, '*', 'exp', '+', '/'   // Sigmoid final
]);

PrintLn('Result: ' + data.ColumnStrings('Result').Join(', '));
```

Output:

```
Result: 0.689974481127613, 0.490001333120035, , , , , ,
```
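Folding changes only where intermediate values live (the CPU stack versus column memory), not the result. A quick Python check of the same arithmetic, computed layer by layer versus as one folded expression (function names illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layered(x):
    h = max(0.0, x * 1.5 + 0.5)    # layer 1 written to an intermediate
    return sigmoid(h * 0.8 - 0.2)  # layer 2 reads it back

def folded(x):
    # Same computation as a single expression (no intermediate storage)
    return sigmoid(max(0.0, x * 1.5 + 0.5) * 0.8 - 0.2)

for x in (0.5, -0.2):
    assert layered(x) == folded(x)
print(folded(0.5))  # ~0.6900
```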
In a web environment, you can load a dataset once and share it across all requests using LockAndShare. This drastically reduces memory usage and startup time for data-heavy applications.
```pascal
uses System.Data.Tabular;

var sharedData := TabularData.ConnectToShared('AppWideData');
if sharedData = nil then begin
   // Load data and share it
   // ...
   var data := new TabularData;
   // ...
   data.LockAndShare('AppWideData');
   PrintLn('Data loaded and shared.');
end else begin
   PrintLn('Connected to shared data.');
end;
```

Output:

```
Connected to shared data.
```
For a complete list of aggregation functions and sharing options, see the reference documentation: