I'm starting to use the research environment to test investment ideas and, while the tool is clear, I am not sure what would be a correct process to implement and test the validity of an idea.

For instance, in the notebook code below I test a (very) simple scikit-learn regressor to predict a stock's price change in the next minute given the price changes of the last 600. This is what I did:

  1. Create the history dataset from 10 randomly picked S&P 500 stocks (survivorship bias to be fixed)
  2. Calculate the price change for the stocks and prepare the features and target
  3. Train a simple MLPRegressor model
  4. Compare actual and predicted results via the score and also a scatter plot, to see visually whether they correlate

Is this a correct way to proceed to validate an algorithm? In this case, should I just move on to a new algo, given that this one has a negative score? Thank you in advance for any tips you may share!

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor

qb = QuantBook()
symbols = ["KO", "SYK", "SYF", "ILMN", "NBL", "CAH", "ISRG", "FCX", "LVS", "TFC"]
for s in symbols:
    qb.AddEquity(s)

lookback = 600
datapoints = 1000

# Minute-resolution close history, long enough for all rolling windows
history = qb.History(qb.Securities.Keys, datapoints + lookback + 2, Resolution.Minute)
close = history["close"].unstack("time")
returns = close / close.shift(1, axis=1) - 1

# Build the rolling-window dataset: each row holds `lookback` past returns
# (features) followed by the next-minute return (target)
features, target = None, None
for i in range(datapoints):
    data = returns.iloc[:, i:i + lookback + 1].dropna().values
    features = data[:, :-1] if features is None else np.vstack((features, data[:, :-1]))
    target = data[:, -1:] if target is None else np.vstack((target, data[:, -1:]))

print(f"Features {len(features)}")
print(f"Target {len(target)}")

# Chronological split: the last 20% of samples are held out for testing
test_samples = int(len(features) * 0.2)
x, x_test = features[:-test_samples], features[-test_samples:]
y, y_test = target[:-test_samples], target[-test_samples:]
print(f"Train points: {len(x)}\tTest points: {len(x_test)}")

model = MLPRegressor(hidden_layer_sizes=(1024, 1024), max_iter=1000)
model.fit(x, y.ravel())  # MLPRegressor expects a 1-D target

score = model.score(x_test, y_test)  # R^2 on the held-out set
print(f"Score {score:.3f}")

y_pred = model.predict(x_test)
plt.scatter(y_pred, y_test)
plt.title("Actual vs Predicted Return")
plt.xlabel("Predicted Return")  # x-axis is the model's predictions
plt.ylabel("Actual Return")
plt.grid()
plt.show()
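On the negative score: as I understand it, MLPRegressor.score reports R², so a negative value means the model predicts worse than simply always predicting the mean return. A minimal sketch of that constant-baseline check, using synthetic minute returns as a stand-in for the qb.History data (QuantBook isn't available outside the research environment):

```python
import numpy as np
from sklearn.metrics import r2_score

# Synthetic minute returns as a stand-in for the real history data
rng = np.random.default_rng(0)
y_train = rng.normal(0.0, 0.001, size=800)
y_test = rng.normal(0.0, 0.001, size=200)

# Constant baseline: always predict the training-set mean return.
# Any constant prediction yields R^2 <= 0 on the test set, so a model
# with a negative score is doing no better than this.
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_score = r2_score(y_test, baseline_pred)
print(f"Baseline R^2: {baseline_score:.4f}")
```

So a score below this baseline suggests the model found no usable signal in the features, at least with this train/test split.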