Tabular Data

The TabularData class (from System.Data.Tabular) provides a high-performance way to work with large, column-oriented datasets. It is ideal for data analysis, feature engineering, and high-throughput processing.

High-Speed Data Transformation

TabularData supports JIT-compiled expressions for adding columns and calculating aggregates. This is particularly useful for tasks such as normalization (scaling values to a [0, 1] range) or applying activation functions (such as ReLU) across millions of rows.

uses System.Data, System.Data.Tabular;

var data := new TabularData;
data.AddColumn('Signal', [0.5, 1.2, 3.4, 2.1, 0.8]);

// 1. Define range for normalization (usually pre-calculated or known)
var sMin := 0.5;
var sMax := 3.4;
var sRange := sMax - sMin;

// 2. Apply Min-Max Scaling via JIT expression: (Signal - Min) / Range
data.EvaluateNewColumn('Normalized', [ '"Signal"', sMin, '-', sRange, '/' ]);

PrintLn(data.ExportToSeparated(['Signal', 'Normalized']));
Result
Signal,Normalized
0.5,0
1.2,0.241379310344828
3.4,1
2.1,0.551724137931035
0.8,0.103448275862069
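The normalized values can be checked by hand. This illustrative Python snippet (plain arithmetic, not the TabularData API) reproduces the Normalized column from the same min-max formula:

```python
# Illustrative check of min-max scaling: (x - min) / (max - min).
# Mirrors the RPN expression ['"Signal"', sMin, '-', sRange, '/'].
signal = [0.5, 1.2, 3.4, 2.1, 0.8]
s_min, s_max = min(signal), max(signal)
s_range = s_max - s_min

normalized = [(x - s_min) / s_range for x in signal]
print(normalized)
```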

Understanding RPN Expressions

Calculations in TabularData use Reverse Polish Notation (RPN), a stack-based logic where operators follow their operands. This allows the JIT engine to evaluate expressions with extreme efficiency without needing complex parenthetical parsing.

How it Works

Think of a stack where values are pushed. An operator then "pops" the required number of values, performs its calculation, and "pushes" the result back.

Expression       RPN Array                            Logic
a + b            ['"a"', '"b"', '+']                  Push a, push b, add them
(a + b) * c      ['"a"', '"b"', '+', '"c"', '*']      Add a and b first, then multiply by c
max(0, a + b)    ['"a"', '"b"', 'relu']               relu is a fused binary operator
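The stack mechanics can be sketched in a few lines of Python. This is an illustration of the evaluation model only, not the JIT engine; column references are passed in here as plain values:

```python
# Minimal RPN evaluator illustrating the stack model described above.
# 'relu' follows the fused-binary convention: pop two, push max(0, a + b).
def eval_rpn(tokens):
    stack = []
    ops = {
        '+': lambda a, b: a + b,
        '-': lambda a, b: a - b,
        '*': lambda a, b: a * b,
        '/': lambda a, b: a / b,
        'relu': lambda a, b: max(0.0, a + b),
    }
    for tok in tokens:
        if tok in ops:
            b = stack.pop()              # top of stack
            a = stack.pop()              # value below it
            stack.append(ops[tok](a, b))
        elif tok == 'dup0':
            stack.append(stack[-1])      # duplicate top value
        elif tok == 'dup1':
            stack.append(stack[-2])      # duplicate value below top
        else:
            stack.append(float(tok))     # operand: push onto the stack
    return stack[0]

a, b, c = 2.0, 3.0, 4.0
print(eval_rpn([a, b, '+', c, '*']))   # (a + b) * c = 20.0
print(eval_rpn([a, -6.0, 'relu']))     # max(0, 2 - 6) = 0.0
```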

Practical Examples

uses System.Data, System.Data.Tabular;

var data := new TabularData;
data.AddColumn('A', [10.0, 20.0]);
data.AddColumn('B', [5.0, 2.0]);

// Weighted Average: (A * 0.7) + (B * 0.3)
data.EvaluateNewColumn('Weighted', [ '"A"', 0.7, '*', '"B"', 0.3, '*', '+' ]);

// Error Metric (SMAPE): abs(A-B) / (abs(A)+abs(B))
data.EvaluateNewColumn('Error', [ '"A"', '"B"', 'smape' ]);

PrintLn(data.ExportToSeparated(['A', 'B', 'Weighted', 'Error']));
Result
A,B,Weighted,Error
10,5,8.5,0.333333333333333
20,2,14.6,0.818181818181818
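Both derived columns can be verified by direct computation. This illustrative Python snippet (independent of TabularData) mirrors the two expressions:

```python
# Check the Weighted and Error columns by direct computation.
rows = [(10.0, 5.0), (20.0, 2.0)]

weighted = [a * 0.7 + b * 0.3 for a, b in rows]            # (A*0.7) + (B*0.3)
errors = [abs(a - b) / (abs(a) + abs(b)) for a, b in rows] # SMAPE-style metric
print(weighted)
print(errors)
```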

Neural Network Inference

Because TabularData is optimized for batch processing and supports fused activation functions (like relu), it is highly effective for running pre-trained neural network inference across large datasets. This is commonly used for classification and prediction tasks where thousands of records must be scored in a single pass.

The relu opcode is designed for exactly this: it is a fused binary operator that computes max(0, a + b). It pops two values (the sum and the bias), adds them, and pushes the result clamped to a minimum of zero.

Example: Titanic Survival Prediction

This snippet demonstrates a simplified inference pipeline using features from the Titanic dataset (Sex, PClass).

uses System.Data, System.Data.Tabular;

var model := new TabularData;
// Inputs: Features [Sex, PClass]
// (0.0 = Female, 1.0 = Male | 1.0 = 1st Class, 0.0 = 3rd Class)
model.AddColumn('Sex',    [0.0, 1.0]); 
model.AddColumn('PClass', [1.0, 0.0]); 

// Hidden Layer: ReLU(Sex * -1.5 + PClass * 1.2 + 0.5)
// 'relu' pops the top two values (sum and bias). Here: (Sex*-1.5 + PClass*1.2) and 0.5
model.EvaluateNewColumn('H1', [ '"Sex"', -1.5, '*', '"PClass"', 1.2, '*', '+', 0.5, 'relu' ]);

// Output Layer (Prob): Sigmoid(H1 * 1.5 - 0.5)
model.EvaluateNewColumn('Logit', [ '"H1"', 1.5, '*', 0.5, '-' ]);
model.EvaluateNewColumn('Prob',  [ 1, 1, '"Logit"', -1, '*', 'exp', '+', '/' ]);

PrintLn(model.ExportToSeparated(['Sex', 'PClass', 'Prob']));
Result
Sex,PClass,Prob
0,1,0.885947618720209
1,0,0.377540668798145
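The probabilities can be verified outside the engine. This illustrative Python version (plain math, not the TabularData API) mirrors the three RPN expressions above:

```python
import math

# Mirror of the RPN pipeline:
#   H1    = ReLU(Sex*-1.5 + PClass*1.2 + 0.5)
#   Logit = H1*1.5 - 0.5
#   Prob  = 1 / (1 + exp(-Logit))
def predict(sex, pclass):
    h1 = max(0.0, sex * -1.5 + pclass * 1.2 + 0.5)
    logit = h1 * 1.5 - 0.5
    return 1.0 / (1.0 + math.exp(-logit))

print(predict(0.0, 1.0))  # female, 1st class
print(predict(1.0, 0.0))  # male, 3rd class
```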

Advanced: Residual Links & Stack Manipulation

Complex architectures like Residual Networks (ResNets) can be implemented using stack manipulation opcodes. These allow you to reuse values (like a previous layer's output) without recalculating them or reading from memory.

  • dup0: Duplicates the top value on the stack.
  • dup1: Duplicates the value below the top.

In the example below, we calculate a "base feature," duplicate it to use in a non-linear branch, and then add it back (the residual link) before the final activation.

The output below includes trailing commas. This occurs because ColumnStrings returns the full internal capacity of the column (defaulting to 8 for JIT optimization), even if fewer rows are active.

uses System.Data, System.Data.Tabular;

var model := new TabularData;
model.AddColumn('X', [0.5, -0.2]);

// Deep Residual Logic: Sigmoid(Base + ReLU(Base) - 0.2), where Base = X*1.5 + 0.5
model.EvaluateNewColumn('Res', [
  1, 1,               // Sigmoid pre-load
    '"X"', 1.5, '*', 0.5, '+', // Base calculation
    'dup0',           // Keep for residual. Stack: [1, 1, base, base]
    0, 'relu',        // Transform top copy. Stack: [1, 1, base, H1]
    '+',              // Add base back (Residual). Stack: [1, 1, base+H1]
    0.2, '-',         // Final bias
  -1, '*', 'exp', '+', '/'  // Sigmoid
]);

PrintLn('Result: ' + model.ColumnStrings('Res').Join(', '));
Result
Result: 0.908877038985144, 0.549833997312478, , , , , ,
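The residual computation can be cross-checked with plain math. This illustrative Python snippet (not the TabularData API) follows the same stack program, where dup0 is what lets the engine reuse the base value without recomputing it:

```python
import math

# Mirror of the residual RPN program:
#   base   = X*1.5 + 0.5
#   result = Sigmoid(base + ReLU(base) - 0.2)
def residual(x):
    base = x * 1.5 + 0.5
    h1 = max(0.0, base)   # the dup0 copy, pushed through relu
    return 1.0 / (1.0 + math.exp(-(base + h1 - 0.2)))

print(residual(0.5))
print(residual(-0.2))
```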

Folded Inference Pipelines

For maximum performance, you can "fold" multiple layers into a single RPN expression. This avoids the overhead of writing intermediate results back to column memory, keeping all neuron activations on the CPU stack during the JIT-compiled loop.

uses System.Data, System.Data.Tabular;

var data := new TabularData;
data.AddColumn('X', [0.5, -0.2]);

// Calculate a 2-layer result in one pass: Sigmoid(ReLU(X*1.5 + 0.5) * 0.8 - 0.2)
// relu pops two values (sum + bias) and pushes max(0, sum+bias)
data.EvaluateNewColumn('Result', [
  1, 1,               // Sigmoid pre-load
    '"X"', 1.5, '*', 0.5, 'relu', 0.8, '*', 0.2, '-', 
  -1, '*', 'exp', '+', '/'  // Sigmoid final
]);

PrintLn('Result: ' + data.ColumnStrings('Result').Join(', '));
Result
Result: 0.689974481127613, 0.490001333120035, , , , , ,
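As a sanity check, the folded expression collapses to ordinary math. This illustrative Python version (independent of the engine) reproduces both rows:

```python
import math

# Mirror of the folded expression: Sigmoid(ReLU(X*1.5 + 0.5) * 0.8 - 0.2).
def folded(x):
    h1 = max(0.0, x * 1.5 + 0.5)   # relu pops (X*1.5, 0.5), pushes max(0, sum)
    return 1.0 / (1.0 + math.exp(-(h1 * 0.8 - 0.2)))

print(folded(0.5))
print(folded(-0.2))
```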

Sharing Data (Web)

In a web environment, you can load a dataset once and share it across all requests using LockAndShare. This drastically reduces memory usage and startup time for data-heavy applications.

uses System.Data.Tabular;

var sharedData := TabularData.ConnectToShared('AppWideData');
if sharedData = nil then begin
  // Load data and share it
  // ...
  var data := new TabularData;
  // ...
  data.LockAndShare('AppWideData');
  PrintLn('Data loaded and shared.');
end else begin
  PrintLn('Connected to shared data.');
end;
Result
Connected to shared data.

Related Reference

For a complete list of aggregation functions and sharing options, see the reference documentation.
