Where the Edge Actually Lives

Someone asked the obvious question. After we published our VWAP study showing that mean reversion generates over 150,000 Bonferroni-significant results with short signal edge of 0.89 percentage points, the first response from readers was not skepticism. It was not a methodological objection. It was: "Which parameters? Which assets? Show me the table."

Fair enough. The original paper documented that VWAP mean reversion works and that crossovers do not. It showed the aggregate statistics and the strategy comparison. What it did not do was lay out the full map of where precisely the edge concentrates, which assets carry it, what parameter ranges survive the strictest correction, and how the profitable zone looks when you decompose it by direction, timeframe, and deviation threshold.

This Idea does that. We extracted every single configuration from the 5.8 million tests that survives Bonferroni correction and shows positive edge, then organized them into a parameter atlas. The result is 101,632 profitable configurations. This is not a theoretical argument. It is a coordinate system.


1. What this paper contains and what it does not

This is a companion piece to our main VWAP study. It does not repeat the methodology, the statistical framework, or the strategy definitions. Those are documented in the original paper. Readers unfamiliar with the test design should consult that paper first.

What this paper provides is the decomposition. Every profitable, Bonferroni-significant VWAP configuration is catalogued by asset, direction, strategy, timeframe, VWAP lookback period, deviation multiplier, holding period, edge magnitude, and p-value.

The figures referenced throughout are generated directly from this dataset. No aggregation was performed that is not documented. No configurations were excluded except leveraged and volatility products, and those exclusions are explained.


2. The profitable universe at a glance

From 5,833,435 tested configurations with a Bonferroni threshold of 8.57 × 10^-9, 101,632 survive correction and show positive edge. Of these, 23,444 are profitable long signals and 78,188 are profitable short signals. The short side outnumbers the long side by roughly 3.3 to 1.

Mean reversion accounts for 92,811 of the 101,632 configurations, or 91 percent. Distance percentile accounts for 6,419 and bounce for 2,341. The remaining strategies (reversal, breakout, and volume confirmation) together account for 61 configurations. This confirms the original finding with granular precision: VWAP predictive content is overwhelmingly a mean reversion phenomenon, and the other strategy types contribute little beyond noise.
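
The arithmetic behind these counts can be checked in a few lines. The sketch below assumes a family-wise alpha of 0.05, which is not stated here but reproduces the quoted threshold; all counts are copied from the two paragraphs above.

```python
# Counts quoted in section 2. ALPHA is an ASSUMPTION (not stated in the text);
# 0.05 is the value that reproduces the quoted threshold.
ALPHA = 0.05
N_TESTS = 5_833_435

# Bonferroni: divide the family-wise alpha by the number of tests.
threshold = ALPHA / N_TESTS

by_direction = {"long": 23_444, "short": 78_188}
by_strategy = {
    "mean_reversion": 92_811,
    "distance_percentile": 6_419,
    "bounce": 2_341,
    "other": 61,  # reversal, breakout, volume confirmation combined
}

# Both breakdowns should add up to the same profitable universe.
total = sum(by_direction.values())
assert total == sum(by_strategy.values())

short_to_long = by_direction["short"] / by_direction["long"]
shares = {name: count / total for name, count in by_strategy.items()}

print(f"threshold    = {threshold:.3g}")
print(f"short : long = {short_to_long:.2f} : 1")
for name, share in shares.items():
    print(f"{name:<20} {share:.1%}")
```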

FIGURE 1: Top 20 tradeable patterns ranked by composite score. Horizontal bar chart showing median edge per pattern, color-coded by direction (green for Long, red for Short). Labels include config count. Excludes leveraged and volatility products.


3. Where the parameters cluster

We computed the interquartile range across all profitable mean reversion configurations in the tradeable universe, excluding leveraged and volatility products. The parameter core zone is:

VWAP lookback period: median 71 bars, interquartile range 54 to 86. The profitable region starts at roughly 50 bars and extends to the maximum tested value of 100. Short lookback periods below 30 produce almost no Bonferroni-significant results. This makes intuitive sense. VWAP calculated over a very short window is noisy and carries little information about where the institutional volume actually transacted. Longer windows accumulate more volume data and produce a more stable estimate of equilibrium price.

Deviation threshold: median 2.75 standard deviations, interquartile range 1.75 to 3.75. Signals that fire at less than 1.5 standard deviations from VWAP do not survive Bonferroni correction in sufficient numbers. Signals at more than 4 standard deviations survive but are extremely rare. The sweet spot sits between 2 and 4 standard deviations, where the deviation is large enough to create meaningful reversion pressure but not so large that signals never fire.

Holding period: median 50 bars, interquartile range 25 to 75. This is the finding that most directly impacts implementation. The profitable configurations are not quick trades. On daily data they are multi-week positions. Holding periods below 10 bars produce almost no surviving results. The longest tested period of 90 bars is well represented in the profitable set. VWAP mean reversion is a slow strategy, and trying to accelerate it destroys the edge.
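
For readers who want to compute a comparable core zone on their own result set, a minimal sketch follows. The quartile method (`inclusive`) is an assumption, since the study does not say which convention it used, and the sample values are made up purely to exercise the function.

```python
from statistics import median, quantiles


def core_zone(values):
    """Return (q1, median, q3) for one parameter across surviving configurations."""
    # method="inclusive" is an ASSUMPTION; the study's quartile convention is not stated.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median(values), q3


# HYPOTHETICAL lookback periods of a handful of surviving configurations,
# invented for illustration (not taken from the study's dataset).
lookbacks = [50, 54, 60, 71, 80, 86, 100]

q1, med, q3 = core_zone(lookbacks)
print(f"lookback: median {med}, interquartile range {q1} to {q3}")
```

The same function applies unchanged to the deviation threshold and holding period columns.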

FIGURE 2: Parameter sweet spot heatmap. VWAP lookback period on x-axis, holding period on y-axis, colored by median edge across all tradeable assets. Green indicates positive edge, red indicates negative. The profitable region concentrates in the upper-right quadrant: long VWAP periods combined with long holding periods.

Figure 2 makes the parameter structure visible. The heatmap shows a clear gradient from red in the lower-left corner, where short VWAP periods and short holding periods produce negative edge, to green in the upper-right corner, where long VWAP periods and long holding periods concentrate the positive results. The transition is not abrupt. There is no cliff. The edge builds gradually as both parameters increase, which is favorable for implementation because it means the strategy is not fragile to small parameter changes. Moving from VWAP period 60 to 70 or from holding period 40 to 50 does not dramatically alter the outcome.
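
The aggregation behind a heatmap of this kind can be sketched as a group-by on the two parameters followed by a median. The field names (`vwap_period`, `holding_period`, `edge_pp`) and the sample rows are hypothetical; the study's actual column names are not given here.

```python
from collections import defaultdict
from statistics import median


def edge_grid(configs):
    """Median edge for every (vwap_period, holding_period) cell."""
    cells = defaultdict(list)
    for cfg in configs:
        cells[(cfg["vwap_period"], cfg["holding_period"])].append(cfg["edge_pp"])
    return {cell: median(edges) for cell, edges in cells.items()}


# HYPOTHETICAL rows, invented only to show the shape of the input.
rows = [
    {"vwap_period": 20, "holding_period": 10, "edge_pp": -0.4},
    {"vwap_period": 20, "holding_period": 10, "edge_pp": -0.2},
    {"vwap_period": 70, "holding_period": 50, "edge_pp": 1.5},
    {"vwap_period": 70, "holding_period": 50, "edge_pp": 1.9},
    {"vwap_period": 70, "holding_period": 50, "edge_pp": 2.1},
]

grid = edge_grid(rows)
for (period, hold), edge in sorted(grid.items()):
    print(f"vwap_period={period:>3} holding_period={hold:>3} median_edge={edge:+.2f}")
```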


4. The asset map

4.1 US large cap equities

FIGURE 3: Short Mean Reversion edge by asset on daily data. Horizontal bars showing median edge, yellow diamonds showing best single configuration. Labels include configuration count per asset.

Four of the largest US equity ETFs tell a consistent story. QQQ shows 5,676 profitable short mean reversion configurations on daily data with median edge of +2.32 percentage points and a best single configuration reaching +11.43 percentage points. VOO contributes the largest count at 9,772 configurations with median edge of +1.69 percentage points. VTI follows with 6,122 configurations at +1.63 percentage points median. SPY shows 3,098 configurations at +1.66 percentage points median.

The parameter ranges across these four assets overlap substantially. VWAP periods run from the mid-teens to 100, deviation thresholds from 1.5 to 6.0 standard deviations, holding periods from single digits to 90 bars. This overlap matters. It suggests that a single parameter set chosen from the intersection of these ranges can show Bonferroni-significant positive edge on all four ETFs simultaneously. The edge is not asset-specific. It is a property of the US large cap equity market as measured by several closely related instruments.

DIA stands out within the US equity group. Its 516 short configurations show a higher median edge of +3.94 percentage points, more than double the SPY median. The Dow Jones ETF is less liquid than SPY or QQQ, and its price-weighted construction gives heavier weight to high-priced stocks, both of which may contribute to larger temporary deviations from VWAP and therefore stronger reversion.

4.2 The long side

FIGURE 4: Long versus Short edge comparison for assets with Bonferroni-significant results on both sides. Grouped horizontal bars, green for Long and red for Short. Daily Mean Reversion only.

Long mean reversion tells a different story. IWM on daily data shows 345 profitable long configurations with median edge of +3.76 percentage points and a best single result of +15.56 percentage points. VOO shows 2,268 long configurations at +2.32 percentage points median. DIA contributes 371 at +1.94 percentage points. EEM, the emerging markets ETF, produces only 24 configurations but at a striking +6.90 percentage points median.

The long side has higher edge per trade but fewer configurations and tighter parameter requirements. Long signals require deviation thresholds between 0.2 and 4.0 standard deviations, compared to 0.5 to 6.0 on the short side. Holding periods concentrate between 20 and 90 days, with the longest holds producing the strongest results. The asymmetry is consistent with market microstructure: downward price spikes below VWAP tend to be sharper but shorter-lived than upward drifts above VWAP, which means the long reversion signal fires in more extreme conditions and captures a larger move when it does.

Figure 4 makes the directional asymmetry visible. For every asset that has significant results on both sides, the chart compares the median long and short edge. IWM and VOO show long edge exceeding short edge, but with far fewer supporting configurations. DIA is the exception on the figures quoted here: its long median of +1.94 percentage points sits below its short median of +3.94.

4.3 International and emerging markets

EFA (developed international) shows 509 short configurations on daily data at +1.50 percentage points median, concentrated at VWAP periods 10 to 100 and deviation thresholds 2.5 to 5.8. VWO (emerging markets) shows 257 short configurations at +2.23 percentage points median. EEM shows 107 short configurations at +1.61 percentage points. These are smaller sample sizes than US large caps, which is expected given the shorter data histories for some international ETFs, but the edges are comparable in magnitude and the patterns are directionally consistent.

4.4 Commodities

Commodities behave differently from equities in one critical respect: the profitable configurations appear on 15-minute data, not daily. SLV (silver) shows 224 short configurations on 15-minute data at +3.47 percentage points median. UNG (natural gas) shows 912 short configurations at +4.07 percentage points median, the highest short-side median edge quoted here for a non-leveraged product. USO (oil) shows 1,381 long configurations at +1.20 percentage points median with the lowest p-value in the entire dataset at 7.4 × 10^-39.

The shift to intraday data is notable because the main VWAP study showed that equity edge concentrates on daily and 4-hour timeframes while 15-minute data shows almost nothing for equities. Commodities reverse this pattern. The most likely explanation is that commodity ETF volume patterns differ structurally from equity ETF patterns, with institutional flow concentrated in shorter intraday windows that create and resolve VWAP deviations on a faster timescale.

4.5 Fixed income

AGG and BND show moderate short mean reversion edge on 4-hour data: +0.81 and +0.76 percentage points median respectively. Configuration counts are 390 and 135. The effects are real but thin, both in magnitude and in the number of surviving configurations. Bond ETF VWAP deviations are smaller in absolute terms because price volatility is lower, which compresses the available edge.


5. The deviation curve

FIGURE 5: Edge by deviation threshold for short mean reversion across all tradeable assets and timeframes. Line chart with deviation on x-axis and median edge on y-axis. Marker size proportional to configuration count.

Figure 5 answers a question that matters for position sizing: how does the size of the VWAP deviation relate to the subsequent edge? The relationship is not linear. At very low deviations (below 1.0 standard deviations), edge is near zero or slightly negative. Between 1.0 and 2.0 standard deviations, edge turns positive but remains modest. Between 2.0 and 4.0 standard deviations, edge increases more steeply. Above 4.0 standard deviations, edge continues to increase but configuration counts drop, meaning signals become very rare.

The practical interpretation is that position sizing should scale with deviation magnitude. A 2-standard-deviation signal carries meaningfully less edge than a 3-standard-deviation signal, and a 3-standard-deviation signal carries less than a 4-standard-deviation signal. A flat position size across all deviation levels would underweight the strongest signals and overweight the weakest.

The curve also confirms that there is no upper deviation level at which mean reversion stops working. Even at the extreme tails, the edge is positive. Price that has moved 5 or 6 standard deviations from VWAP still reverts. It just happens so rarely that the statistical power is limited.
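
One way to turn a deviation curve into a sizing rule is to interpolate the curve and weight each signal by its interpolated edge. The sketch below is illustrative only: the curve points are hypothetical stand-ins for Figure 5, and the cap at the last point is a design assumption, not something the study prescribes.

```python
from bisect import bisect_right

# HYPOTHETICAL (deviation in SD, median edge in pp) points standing in for Figure 5.
# Replace with values read from the actual curve.
EDGE_CURVE = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.5), (4.0, 3.0)]


def expected_edge(deviation):
    """Piecewise-linear interpolation of EDGE_CURVE, clamped at both ends."""
    xs = [x for x, _ in EDGE_CURVE]
    ys = [y for _, y in EDGE_CURVE]
    if deviation <= xs[0]:
        return ys[0]
    if deviation >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, deviation)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (deviation - x0) / (x1 - x0)


def position_weight(deviation, max_weight=1.0):
    """Weight proportional to interpolated edge, capped at max_weight."""
    top_edge = EDGE_CURVE[-1][1]
    return max_weight * max(expected_edge(deviation), 0.0) / top_edge


for dev in (1.5, 2.0, 3.0, 3.5, 5.0):
    print(f"{dev:.1f} SD -> weight {position_weight(dev):.2f}")
```

Clamping above the last curve point keeps the weight from growing in the region where signals are rare and the edge estimate rests on few configurations.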


6. What the numbers say about implementation

The parameter map suggests a specific implementation framework that the data supports.

Asset universe: US equity ETFs on daily data form the core: the large-cap and broad-market funds SPY, QQQ, VOO, VTI, and DIA, plus the small-cap IWM. International ETFs (EFA, EEM, VWO) add diversification. Commodity ETFs on 15-minute data are a separate implementation with different infrastructure requirements.

Direction: Short signals outnumber long signals by 3.3 to 1 and produce more broadly supported edge. A short-only implementation captures the majority of the available opportunity. Adding the long side increases edge per trade but reduces signal frequency and requires tighter parameter constraints.

VWAP period: 50 to 85 bars captures the core of the profitable zone. Going below 40 rapidly loses significance. Going above 85 still works but the marginal improvement is small.

Deviation threshold: 2.0 to 4.0 standard deviations is the productive range. Below 2.0, many signals do not survive Bonferroni correction. Above 4.0, signals become too rare for consistent portfolio application.

Holding period: 25 to 75 bars. The edge builds with holding time and does not show signs of reversal within the tested range. Shorter holds sacrifice edge. Longer holds (up to 90 bars) continue to work but tie up capital.

Position sizing: Scale with deviation magnitude. A signal at 3.5 standard deviations should receive a larger allocation than one at 2.0 standard deviations. The deviation curve in Figure 5 provides the empirical basis for the scaling function.
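
The framework above can be written down as a simple screening rule. This is a sketch of the short-only daily core, using the ranges listed in this section; the `Signal` type, its field names, and the `"1D"` timeframe label are hypothetical conventions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical container for one candidate signal."""
    symbol: str
    timeframe: str        # "1D" is an assumed label for daily bars
    direction: str        # "short" or "long"
    vwap_period: int      # lookback in bars
    deviation: float      # distance from VWAP in standard deviations
    holding_period: int   # bars


# Core daily universe named in the asset-universe item above.
CORE_SYMBOLS = {"SPY", "QQQ", "VOO", "VTI", "IWM", "DIA"}


def in_core_zone(sig: Signal) -> bool:
    """True if the signal falls inside the short-only daily core ranges of section 6."""
    return (
        sig.symbol in CORE_SYMBOLS
        and sig.timeframe == "1D"
        and sig.direction == "short"
        and 50 <= sig.vwap_period <= 85
        and 2.0 <= sig.deviation <= 4.0
        and 25 <= sig.holding_period <= 75
    )


candidate = Signal("QQQ", "1D", "short", vwap_period=71, deviation=2.75, holding_period=50)
print(in_core_zone(candidate))
```

International and commodity ETFs, and the long side, would need their own rules with the different timeframes and tighter ranges described earlier.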

FIGURE 6: Summary table of the top 20 tradeable configurations. Columns: asset, direction, strategy, timeframe, configuration count, median edge, best edge, best p-value.


7. What this map does not show

Several caveats apply to this parameter atlas.

First, the configurations listed are in-sample results. The Bonferroni correction is extremely strict, reducing the false discovery risk to near zero, but the specific edge magnitudes are point estimates that will differ out of sample. The parameter zones are more reliable than any individual parameter combination within them.

Second, signal frequency is not reported per configuration. A configuration with +3 percentage points edge that fires once per year is less useful than one with +1.5 percentage points edge that fires monthly. Signal counts were recorded for each configuration and inform the ranking, but the summary figures show edge magnitude only.

Third, the exclusion of leveraged and volatility products removes the highest absolute edges from the atlas. UVXY short mean reversion on 4-hour data shows 9,088 configurations with median edge of +21.70 percentage points. VXX shows 6,495 configurations at +8.97 percentage points median. These numbers are real but reflect the structural decay of these instruments (contango, daily rebalancing drag) rather than pure mean reversion dynamics. A trader who understands these products can exploit the edge, but the mechanism is fundamentally different from equity VWAP mean reversion.

Fourth, transaction costs vary by asset and timeframe. The 0.10 to 0.15 percentage point round-trip estimate from the main study applies to liquid equity ETFs on daily data. Commodity ETFs on 15-minute data face wider effective spreads and higher market impact. The raw edge figures in this atlas are gross, before costs.

Fifth, correlations between signals are not modeled. If QQQ, VOO, VTI, and SPY all fire short mean reversion signals simultaneously, the four positions are effectively one concentrated bet on US large cap reversion. Portfolio-level risk management must account for this correlation.

Sixth, and this matters for anyone attempting to replicate these results on a charting platform: this study was conducted using historical price data from TwelveData and Tiingo APIs, not from TradingView's built-in data feed. Results will differ when backtesting the same parameters in TradingView's Pine Script strategy tester, and there are concrete reasons for this.

The data sources differ. TwelveData and Tiingo provide adjusted close prices that account for dividends and splits retroactively across the entire history. TradingView uses its own data vendors whose adjustment methodology, adjustment dates, and handling of corporate actions may differ. A split-adjusted close on one platform can diverge from the other by fractions of a percent, and those fractions compound over thousands of bars in a rolling VWAP calculation.

VWAP calculation specifics differ. Our study computes rolling VWAP over variable lookback windows using the standard cumulative price-times-volume formula applied bar by bar. TradingView's built-in ta.vwap() function resets at the session boundary by default, which produces a fundamentally different indicator than a rolling 71-bar VWAP. The anchored VWAP in Pine Script can approximate a rolling calculation, but the implementation details, particularly how partial bars, pre-market volume, and session boundaries are handled, introduce differences.

Volume data differs. Volume figures are not standardized across data providers. TwelveData may report consolidated volume while TradingView shows exchange-specific volume or vice versa. Since VWAP weights price by volume, different volume feeds produce different VWAP values even from identical price series. The deviation bands, which depend on the standard deviation of the price-minus-VWAP series, amplify this difference.

Bar timing and alignment differ. API data returns bars aligned to calendar timestamps. TradingView aligns bars to exchange sessions with configurable extended hours settings. A daily bar that includes pre-market and after-hours trading contains different price extremes and different volume than one restricted to regular hours. For intraday timeframes, the difference is more pronounced: a 15-minute bar starting at 9:30 versus 9:31 can produce materially different OHLCV values.

The backtesting engine differs. Our study measures forward returns from bar close to bar close with no execution model. TradingView's strategy tester uses its own fill logic, which can execute at the open of the next bar, at close, or at a specified price, and applies its own slippage and commission model. The same signal on the same data with the same parameters will report different P&L depending on the assumed execution price.

None of this means the results are wrong on either platform. It means they measure slightly different things. The statistical patterns documented in this study are properties of the price-volume relationship in the assets tested, measured with a specific methodology. Replicating the exact edge figures in a different environment requires matching the data source, the VWAP calculation, the volume feed, and the execution assumptions. Approximate replication using TradingView will show the same directional patterns, mean reversion working and crossovers failing, but the specific magnitudes and the precise parameter boundaries will shift.
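
To make the replication targets concrete, here is a minimal sketch of the three pieces described in this section: a rolling VWAP over a fixed lookback, the deviation expressed in standard deviations of the price-minus-VWAP series, and a close-to-close forward return with no execution model. Several details are assumptions, because the text does not pin them down: one price per bar is used as the VWAP input, the standard deviation is taken over the same lookback window, and the population standard deviation is used.

```python
from statistics import pstdev


def rolling_vwap(prices, volumes, window):
    """Volume-weighted average price over the trailing `window` bars (None until filled)."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
            continue
        p = prices[i + 1 - window : i + 1]
        v = volumes[i + 1 - window : i + 1]
        total_volume = sum(v)
        if total_volume == 0:
            out.append(None)
            continue
        out.append(sum(pi * vi for pi, vi in zip(p, v)) / total_volume)
    return out


def deviation_in_sd(prices, vwap, window):
    """(price - VWAP) divided by the trailing std of that spread.

    ASSUMPTION: same window as the VWAP and population std; the study may differ.
    """
    spread = [None if w is None else p - w for p, w in zip(prices, vwap)]
    out = []
    for i in range(len(spread)):
        recent = spread[max(0, i + 1 - window) : i + 1]
        if len(recent) < window or any(s is None for s in recent):
            out.append(None)
            continue
        sd = pstdev(recent)
        out.append(spread[i] / sd if sd > 0 else None)
    return out


def forward_return_pct(closes, i, hold):
    """Close-to-close return in percent from bar i to bar i + hold."""
    if i + hold >= len(closes):
        return None
    return (closes[i + hold] / closes[i] - 1.0) * 100.0


# Toy data, invented for illustration only.
prices = [10.0, 11.0, 13.0, 12.0]
volumes = [100, 100, 100, 100]
vwap = rolling_vwap(prices, volumes, window=2)
dev = deviation_in_sd(prices, vwap, window=2)
print(vwap, dev, forward_return_pct(prices, 0, 2))
```

A session-anchored VWAP would reset the sums at each session boundary instead of sliding the window, which is why the two indicators diverge.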


8. Conclusion

Someone asked for the parameter table. Here it is: 101,632 configurations, 27 assets, six figures. The data shows that VWAP mean reversion is not a single strategy with a single set of parameters. It is a statistical regularity that spans a large region of parameter space, concentrated on the short side of daily data in US equity ETFs, with complementary signals in international equities, commodity ETFs on intraday data, and leveraged products on 4-hour data.

The practical value of this atlas lies in what it reveals about robustness. A strategy that works at one parameter combination and fails at every neighbor is fragile and likely overfit. A strategy that works across thousands of configurations spanning VWAP periods from 14 to 100, deviation thresholds from 1.5 to 6.0, and holding periods from 4 to 90 days is structurally different. The parameter zone is wide. The transition from positive to negative edge is gradual. Small changes in parameters do not destroy the result.

QQQ, VOO, VTI, and SPY each produce thousands of Bonferroni-significant configurations with median edge roughly 11 to 15 times the upper transaction cost estimate of 0.15 percentage points. That is not a lucky finding in one corner of the data. That is a map.
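
The cost multiple can be reproduced from numbers already quoted: the median short edges from section 4.1 and the 0.10 to 0.15 percentage point round-trip cost range from the fourth caveat. The sketch also works through the frequency comparison from the second caveat; the per-year sum is a deliberate simplification that ignores compounding and overlapping positions.

```python
# Median short-edge figures quoted in section 4.1 (percentage points, gross).
MEDIAN_EDGE_PP = {"QQQ": 2.32, "VOO": 1.69, "VTI": 1.63, "SPY": 1.66}

# Round-trip cost range for liquid equity ETFs on daily data (fourth caveat).
COST_LOW_PP, COST_HIGH_PP = 0.10, 0.15


def edge_to_cost(edge_pp, cost_pp):
    """How many times the gross edge covers the round-trip cost."""
    return edge_pp / cost_pp


def gross_per_year(edge_pp, signals_per_year):
    """Simple sum of gross edge over a year (no compounding, no overlap handling)."""
    return edge_pp * signals_per_year


ratios_high_cost = {s: edge_to_cost(e, COST_HIGH_PP) for s, e in MEDIAN_EDGE_PP.items()}
ratios_low_cost = {s: edge_to_cost(e, COST_LOW_PP) for s, e in MEDIAN_EDGE_PP.items()}

for symbol in MEDIAN_EDGE_PP:
    print(f"{symbol}: {ratios_high_cost[symbol]:.1f}x to {ratios_low_cost[symbol]:.1f}x cost")

# Second caveat: +3 pp once per year versus +1.5 pp monthly.
rare = gross_per_year(3.0, 1)
frequent = gross_per_year(1.5, 12)
print(f"rare: {rare:.1f} pp/year, frequent: {frequent:.1f} pp/year")
```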
