PINE LIBRARY

RandomForestLibrary

RandomForestLibrary is a self-contained Random Forest library for Pine Script v6 that other Pine developers can [icode]import[/icode] and use to build their own machine learning indicators and strategies.

What Makes This Different

Random Forest is one of the most widely used ensemble methods in applied machine learning. Until now, Pine Script developers who wanted one had two options: call out to an external Python / ONNX pipeline, or hand-roll a single decision tree inline. This library closes that gap with a complete Random Forest implementation (CART trees, bootstrap aggregation, Gini / MSE splitting, out-of-bag scoring, weighted feature importance) that is available after a single [icode]import[/icode] statement.

The API intentionally mirrors scikit-learn's [icode]RandomForestClassifier[/icode] and [icode]RandomForestRegressor[/icode] (init → fit → predict → evaluate), so practitioners already familiar with scikit-learn can translate existing logic directly.

What This Library Provides

  • Binary classification: [icode]fit(X, y)[/icode], [icode]predict[/icode], [icode]predict_proba[/icode], [icode]predict_batch[/icode], [icode]oob_score[/icode]
  • Multi-output regression: [icode]fit_regressor(X, Y)[/icode], [icode]predict_multi[/icode], [icode]predict_multi_per_tree[/icode], [icode]oob_r2[/icode], [icode]oob_residual_std[/icode]
  • Weighted Gini / MSE feature importance: [icode]feature_importance()[/icode]
  • Deterministic Park-Miller RNG for reproducible forests


Exported Types

  • Forest — the ensemble model. Holds all trees, hyperparameters, training data references, OOB accumulators, and feature importances.
  • Tree — a single decision tree with its node array, max depth, leaf count, and split-failure count.
  • Node — a single node storing feature index, threshold, children indices, leaf label / probability, Gini impurity (or MSE in regressor mode), sample count, and a per-horizon output array for regression.
  • RNG — a Park-Miller linear congruential generator with a=48271, m=2^31-1. Deterministic given the same seed.
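
For readers curious about what a generator of this kind does internally, here is a minimal sketch of one Park-Miller step using the constants listed above. The [icode]DemoRng[/icode] type and its [icode]next_int[/icode] / [icode]next_float[/icode] methods are hypothetical names chosen for illustration; the library's actual fields and method names may differ.

[code]//@version=6
indicator("Park-Miller RNG sketch")

// Hypothetical stand-in for the library's RNG type (names are assumptions).
type DemoRng
    int state = 1

// Advance the generator: state <- (a * state) mod m, with a = 48271 and m = 2^31 - 1.
// The product stays below 2^47, so it fits comfortably in a Pine int.
method next_int(DemoRng this) =>
    this.state := this.state * 48271 % 2147483647
    this.state

// Map the next state to a float strictly between 0 and 1.
method next_float(DemoRng this) =>
    this.next_int() / 2147483647.0

if barstate.islast
    DemoRng rng = DemoRng.new(42)
    int first = rng.next_int()
    float u = rng.next_float()
    label.new(bar_index, close, "first=" + str.tostring(first) + " u=" + str.tostring(u, "#.####"))
[/code]

With seed 42 the first state is 42 × 48271 = 2027382. A state of 0 would stay at 0 forever, so a generator of this kind needs a seed between 1 and 2^31 - 2.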


Exported Methods

Initialization and training
  • [icode]init(n_estimators, max_depth, max_features, min_samples_leaf, n_threshold_candidates, seed)[/icode] — configure hyperparameters. [icode]max_features=0[/icode] auto-selects [icode]ceil(sqrt(n_features))[/icode] for classification and [icode]ceil(n_features/3)[/icode] for regression.
  • [icode]fit(X, y)[/icode] — train classifier on a feature matrix [icode]X[/icode] (rows = samples, columns = features) and integer label array [icode]y[/icode] (values 0 or 1).
  • [icode]fit_regressor(X, Y)[/icode] — train multi-output regressor. [icode]Y[/icode] is a matrix whose columns are separate regression horizons / targets.


Inference
  • [icode]predict(sample)[/icode] — classify a single sample via soft voting (threshold 0.5).
  • [icode]predict_proba(sample)[/icode] — average class-1 probability across all trees.
  • [icode]predict_batch(X)[/icode] — classify every row of a matrix.
  • [icode]predict_multi(sample)[/icode] — regressor output: averaged per-horizon predictions.
  • [icode]predict_multi_per_tree(sample)[/icode] — per-tree, per-horizon predictions for custom uncertainty analysis.
  • [icode]tree_predict[/icode], [icode]tree_predict_proba[/icode], [icode]tree_predict_multi[/icode] — single-tree inference for advanced use.
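
Soft voting, as described for [icode]predict[/icode] and [icode]predict_proba[/icode], averages the per-tree probabilities before thresholding. The sketch below assumes the per-tree class-1 probabilities have already been collected in an array. The [icode]soft_vote[/icode] helper is hypothetical, and treating a mean of exactly 0.5 as class 1 is an assumption.

[code]//@version=6
indicator("Soft voting sketch")

// Average the per-tree class-1 probabilities, then threshold the mean at 0.5.
soft_vote(array<float> tree_probs) =>
    float p = tree_probs.avg()
    int cls = p >= 0.5 ? 1 : 0
    [p, cls]

if barstate.islast
    // One confident tree outweighs two mildly negative ones.
    [p, cls] = soft_vote(array.from(0.95, 0.45, 0.45))
    label.new(bar_index, close, "p=" + str.tostring(p, "#.###") + " class=" + str.tostring(cls))
[/code]

Here the mean probability is about 0.617, so the soft vote returns class 1. A hard majority vote over the same three trees would return class 0, because two of them are below 0.5.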


Evaluation
  • [icode]oob_score()[/icode] — classification out-of-bag accuracy (0.0 to 1.0), computed by soft voting on samples not selected in each tree's bootstrap.
  • [icode]oob_r2()[/icode] — regression out-of-bag R^2, averaged across horizons.
  • [icode]oob_residual_std()[/icode] — per-horizon standard deviation of OOB residuals. Useful for prediction interval construction (Wager, Hastie, and Efron 2014).
  • [icode]feature_importance()[/icode] — normalized weighted Gini (or MSE) decrease per feature, averaged across trees. Sums to approximately 1.0.
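
A possible regression workflow that combines the methods above is sketched here. It uses only method names listed in this description, but the return types are assumptions ([icode]array<float>[/icode] for [icode]predict_multi[/icode] and [icode]oob_residual_std[/icode], [icode]float[/icode] for [icode]oob_r2[/icode]), and the features, horizons, and 200-bar window are illustrative choices.

[code]//@version=6
indicator("RF regressor sketch")
import ShigemiQuant/RandomForestLibrary/2 as RF

// Example features (illustrative, not prescribed by the library).
float rsi_val = ta.rsi(close, 14)
float atr_pct = ta.atr(14) / close * 100

var RF.Forest model = RF.Forest.new().init(n_estimators = 10, max_depth = 4, seed = 42)

if barstate.islast and bar_index > 300
    matrix<float> X = matrix.new<float>()
    matrix<float> Y = matrix.new<float>()
    // Row for offset i: features from i bars ago, targets = returns realized 1 and 5 bars later.
    for i = 5 to 204
        X.add_row(X.rows(), array.from(rsi_val[i], atr_pct[i]))
        Y.add_row(Y.rows(), array.from(close[i - 1] / close[i] - 1, close[i - 5] / close[i] - 1))
    model.fit_regressor(X, Y)

    // Assumed return types: see the note above this block.
    array<float> pred = model.predict_multi(array.from(rsi_val, atr_pct))
    array<float> resid_sd = model.oob_residual_std()
    float r2 = model.oob_r2()

    // Rough band of +/- 2 OOB residual standard deviations around the 1-bar prediction.
    float band_lo = pred.get(0) - 2 * resid_sd.get(0)
    float band_hi = pred.get(0) + 2 * resid_sd.get(0)
    label.new(bar_index, close, "r2=" + str.tostring(r2, "#.##") + " 1-bar: [" + str.tostring(band_lo, "#.####") + ", " + str.tostring(band_hi, "#.####") + "]")
[/code]

The two-sigma band is a heuristic that treats OOB residuals as roughly normal; it is not a calibrated interval.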


How It Works

Tree construction (CART, iterative, level-by-level)

Each tree is built top-down, one depth level at a time, using complete binary tree indexing ([icode]left = 2i+1[/icode], [icode]right = 2i+2[/icode]). At every internal node:

  • A random subset of features of size [icode]max_features[/icode] is drawn without replacement.
  • For each feature, [icode]n_threshold_candidates[/icode] thresholds are sampled uniformly between the feature's min and max on the samples at that node.
  • For classification, the split minimizing weighted Gini impurity is chosen. For regression, the split minimizing weighted MSE (summed over all horizons) is chosen.
  • A node becomes a leaf when it is pure (classification), too small ([icode]n < 2 * min_samples_leaf[/icode]), at max depth, or when no valid split exists.
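
The classification split criterion can be sketched for one feature and one candidate threshold. This is an illustration rather than the library's code: the [icode]gini[/icode] and [icode]split_gini[/icode] helpers are hypothetical, and sending samples with a value at or below the threshold to the left child is an assumption.

[code]//@version=6
indicator("Gini split sketch")

// Gini impurity of a binary node holding n samples, n1 of them labelled 1.
gini(int n, int n1) =>
    float p = n == 0 ? 0.0 : float(n1) / n
    2.0 * p * (1.0 - p)

// Weighted Gini impurity after splitting at `threshold` (value <= threshold goes left).
split_gini(array<float> feature, array<int> labels, float threshold) =>
    int n_left = 0
    int pos_left = 0
    int n_right = 0
    int pos_right = 0
    for [i, v] in feature
        if v <= threshold
            n_left += 1
            pos_left += labels.get(i)
        else
            n_right += 1
            pos_right += labels.get(i)
    int n = n_left + n_right
    (n_left * gini(n_left, pos_left) + n_right * gini(n_right, pos_right)) / n

if barstate.islast
    array<float> feature = array.from(10.0, 20.0, 30.0, 40.0)
    array<int> labels = array.from(0, 0, 1, 1)
    float good = split_gini(feature, labels, 25.0)
    float poor = split_gini(feature, labels, 15.0)
    label.new(bar_index, close, "t=25: " + str.tostring(good, "#.###") + "  t=15: " + str.tostring(poor, "#.###"))
[/code]

The threshold 25 separates the labels perfectly and scores 0, while 15 leaves a mixed right child and scores about 0.333. A split search of this kind keeps the candidate with the lowest score across all sampled features and thresholds.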


Bootstrap aggregation and OOB

Each tree is trained on a bootstrap sample (same size as the training set, drawn with replacement). Samples not drawn for a given tree form its out-of-bag set, which on average covers about 36.8% of the training rows. These samples are used to estimate generalization performance ([icode]oob_score[/icode] / [icode]oob_r2[/icode]) and residual variance ([icode]oob_residual_std[/icode]) without a separate holdout set.
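
A minimal sketch of drawing one bootstrap sample and identifying its out-of-bag rows follows. The [icode]next_state[/icode] helper stands in for the library's RNG, and mapping a state to a row index with a modulo is an assumption made for brevity.

[code]//@version=6
indicator("Bootstrap / OOB sketch")

// One Park-Miller step (a = 48271, m = 2^31 - 1), standing in for the library's RNG.
next_state(int s) =>
    s * 48271 % 2147483647

if barstate.islast
    int n = 200      // number of training rows
    int state = 42   // seed
    array<bool> in_bag = array.new<bool>(n, false)
    // Draw n row indices with replacement and mark each drawn row as in-bag.
    for k = 1 to n
        state := next_state(state)
        in_bag.set(state % n, true)
    // Rows that were never drawn form this tree's out-of-bag set.
    int n_oob = 0
    for flag in in_bag
        if not flag
            n_oob += 1
    label.new(bar_index, close, "OOB rows: " + str.tostring(n_oob) + " of " + str.tostring(n))
[/code]

Because the seed fixes the whole index sequence, rerunning the script reproduces the same in-bag and out-of-bag rows.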

Feature importance

Each split records its weighted impurity decrease ([icode]n_node * impurity_node - n_left * impurity_left - n_right * impurity_right[/icode]). Per-tree importances are normalized to sum to 1, then averaged across trees — matching scikit-learn's definition.
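
The arithmetic can be checked on a toy forest with two features and two trees. The [icode]split_gain[/icode] and [icode]normalize[/icode] helpers are hypothetical and only illustrate the formula; the sample counts and impurities are made-up inputs.

[code]//@version=6
indicator("Impurity-decrease importance sketch")

// Weighted impurity decrease of one split.
split_gain(float n_node, float imp_node, float n_left, float imp_left, float n_right, float imp_right) =>
    n_node * imp_node - n_left * imp_left - n_right * imp_right

// Scale an array so that it sums to 1 (left unchanged if the total is zero).
normalize(array<float> raw) =>
    float total = raw.sum()
    array<float> out = raw.copy()
    if total > 0
        for [i, v] in raw
            out.set(i, v / total)
    out

if barstate.islast
    // Tree A: the root splits on feature 0, its left child splits on feature 1.
    array<float> tree_a = array.from(split_gain(100, 0.5, 60, 0.3, 40, 0.2), split_gain(60, 0.3, 30, 0.1, 30, 0.1))
    // Tree B: raw per-feature gains given directly for brevity.
    array<float> tree_b = array.from(10.0, 10.0)
    array<float> imp_a = normalize(tree_a)
    array<float> imp_b = normalize(tree_b)
    // Forest importance: mean of the per-tree normalized importances.
    float imp0 = (imp_a.get(0) + imp_b.get(0)) / 2
    float imp1 = (imp_a.get(1) + imp_b.get(1)) / 2
    label.new(bar_index, close, "f0=" + str.tostring(imp0, "#.###") + " f1=" + str.tostring(imp1, "#.###"))
[/code]

Tree A's raw gains are 24 and 12, which normalize to 2/3 and 1/3. Averaged with tree B's 0.5 and 0.5, the forest importances are about 0.583 and 0.417, which still sum to 1.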

Quick Start

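[code]//@version=6
indicator("My RF Indicator")
import ShigemiQuant/RandomForestLibrary/2 as RF

// 1. Features on every bar (illustrative choices; any numeric float features work)
float rsi_val = ta.rsi(close, 14)
float atr_pct = ta.atr(14) / close * 100
float cci_val = ta.cci(close, 20)
[di_plus, di_minus, adx_val] = ta.dmi(14, 14)

// 2. Initialize once
var RF.Forest model = RF.Forest.new().init(
     n_estimators = 10,
     max_depth = 4,
     seed = 42)

// Train and predict only on the last bar, once enough history exists
if barstate.islast and bar_index > 300
    // 3. Build X (rows = samples) and y (1 if the following bar closed higher)
    matrix<float> X = matrix.new<float>()
    array<int> y = array.new<int>()
    for i = 1 to 200
        X.add_row(X.rows(), array.from(rsi_val[i], atr_pct[i], cci_val[i], adx_val[i]))
        y.push(close[i - 1] > close[i] ? 1 : 0)
    model.fit(X, y)

    // 4. Predict on the current bar
    array<float> sample = array.from(rsi_val, atr_pct, cci_val, adx_val)
    float prob = model.predict_proba(sample)

    // 5. Evaluate
    float oob = model.oob_score()
    label.new(bar_index, close, "prob=" + str.tostring(prob, "#.##") + " oob=" + str.tostring(oob, "#.##"))
[/code]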

Compatibility Notes

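  • scikit-learn parity: same init → fit → predict / predict_proba workflow, OOB uses soft voting, importances use weighted Gini decrease. The [icode]max_features[/icode] defaults differ slightly: scikit-learn's classifier takes the floor of sqrt(n_features) where this library takes the ceiling, and current scikit-learn regressors default to all features rather than n_features/3.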
  • Determinism: given identical [icode]seed[/icode], training set, and hyperparameters, the resulting forest and all predictions are bit-identical across reruns.
  • Binary classification only in [icode]fit()[/icode]: labels must be 0 or 1. Multi-class is not yet supported.
  • Numeric features only: all columns of [icode]X[/icode] must be [icode]float[/icode].


Limitations

  • This is a machine-learning library, not a trading signal. Indicators built with it make no guarantee of profit, do not predict the future, and depend entirely on the quality of the features, labels, and hyperparameters that the caller supplies.
  • Regression via [icode]fit_regressor()[/icode] supports multi-output targets, but every target must be a numeric float value.
  • TradingView's runtime budget limits forest size. A reasonable starting point is [icode]n_estimators[/icode] between 5 and 20 with [icode]max_depth[/icode] between 3 and 6. Each tree can hold at most [icode]2^(max_depth+1) - 1[/icode] nodes: depth 6 allows up to 127 nodes per tree, so 15 trees can hold up to 1,905 nodes in total.
  • Large training sets combined with deep trees (thousands of bars × depth 6) can hit Pine Script's loop iteration caps. Start small and scale up while watching compile / runtime warnings.
  • OOB metrics ([icode]oob_score[/icode], [icode]oob_r2[/icode], [icode]oob_residual_std[/icode]) are valid only when each sample is out-of-bag in at least one tree. For very small training sets or very few estimators, some samples may never be OOB and those metrics will be biased or undefined.
  • Overfitting is the caller's responsibility. The library exposes standard controls ([icode]max_depth[/icode], [icode]min_samples_leaf[/icode], [icode]max_features[/icode], [icode]n_estimators[/icode]) but applies no automatic regularization. Trees that are too deep on a noisy training window will memorize noise.
  • Features must be stationary enough to generalize. Raw price levels or unnormalized indicators that drift with the market will cause training-test distribution shift. Prefer bounded or ratio-based features (RSI, ATR%, percentile ranks).
  • Training happens on the chart's own bar history. There is no external data upload; the library cannot import pre-trained models, and the forest must be rebuilt whenever the script recomputes. Designs that rely on very large historical context may conflict with Pine Script's bar-history window.
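
Two of the limits above, the node budget and OOB coverage, can be estimated before training. The sketch below is plain arithmetic with hypothetical helper names; it does not call the library.

[code]//@version=6
indicator("RF sizing sketch")

// Upper bound on nodes in one tree with complete binary indexing: 2^(max_depth+1) - 1.
max_nodes_per_tree(int max_depth) =>
    int(math.pow(2, max_depth + 1)) - 1

// Probability that a given row is in the bag of every tree, so it never gets an OOB prediction.
p_never_oob(int n_samples, int n_estimators) =>
    float p_in_bag = 1.0 - math.pow(1.0 - 1.0 / n_samples, n_samples)
    math.pow(p_in_bag, n_estimators)

if barstate.islast
    int per_tree = max_nodes_per_tree(6)  // 127
    int total = 15 * per_tree             // 1905
    float p_miss = p_never_oob(200, 10)
    label.new(bar_index, close, "nodes/tree=" + str.tostring(per_tree) + " total=" + str.tostring(total) + " P(never OOB)=" + str.tostring(p_miss, "#.####"))
[/code]

With 200 rows and 10 trees, roughly 1% of rows are expected never to be out-of-bag; with only 5 trees the figure is closer to 10%, which is why OOB metrics become unreliable for very small forests.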


References

  • Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.
  • Wager, S., Hastie, T., and Efron, B. (2014). Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife. Journal of Machine Learning Research, 15, 1625–1651.


Disclaimer

This library is an educational and research tool. It does not constitute financial advice. All trading decisions based on code built with this library are the sole responsibility of the user. Past model performance does not guarantee future results.