Hello,

I'm starting to use the research environment to test investment ideas and, while the tool is clear, I am not sure what the correct process is for implementing an idea and testing its validity.

For instance, in the notebook code below I test a (very) simple scikit-learn regressor that predicts a stock's price change over the next minute given the price changes of the last 60. This is what I did:

  1. Create the history dataset from 10 randomly picked S&P 500 stocks (survivorship bias to be fixed)
  2. Calculate the price change for the stocks and prepare the features and target
  3. Train a simple MLPRegressor model
  4. Check actual and predicted results via score and also a scatter plot to see visually if they correlate

Is this a correct way to validate an algorithm? In this case, should I just move on to a new algo, given that this one has a negative score?

Thank you in advance for any tips you may share!
Francesco

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

qb = QuantBook()
symbols = ["KO", "SYK", "SYF", "ILMN", "NBL", "CAH", "ISRG", "FCX", "LVS", "TFC"]
for s in symbols:
    qb.AddEquity(s)
lookback = 600
datapoints = 1000
history = qb.History(qb.Securities.Keys, datapoints+lookback+2, Resolution.Minute)
close = history["close"].unstack("time")
returns = (close/close.shift(1, axis=1)-1)
features, target = None, None
for i in range(datapoints):
    # One window per stock: lookback returns as features, the next return as target
    data = returns.iloc[:,i:i+lookback+1].dropna().values
    features = data[:, :-1] if features is None else np.vstack((features, data[:, :-1]))
    target = data[:, -1:] if target is None else np.vstack((target, data[:, -1:]))
print(f"Features {len(features)}")
print(f"Target {len(target)}")

test_samples = int(len(features)*0.2)
x, x_test = features[:-test_samples], features[-test_samples:]
y, y_test = target[:-test_samples], target[-test_samples:]
model = MLPRegressor(hidden_layer_sizes=(1024, 1024), max_iter=1000)
print(f"Train points: {len(x)}\tTest points {len(x_test)}")
model.fit(x, y.ravel())  # flatten the (n, 1) target to the 1-D shape scikit-learn expects
score = model.score(x_test, y_test)
print(f"Score {score:.3f}")

y_pred = model.predict(x_test)
plt.scatter(y_test, y_pred)  # x-axis: actual, y-axis: predicted, matching the axis labels
plt.title('Actual vs Predicted Return')
plt.xlabel("Actual Return")
plt.ylabel("Predicted Return")
plt.grid()
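
For context on the negative score: for scikit-learn regressors, `score` returns the coefficient of determination R², which is 0 for a model that always predicts the mean of the test targets and negative for anything worse than that. Below is a minimal sketch of how I read that number, using synthetic noise in place of the real returns. The helper `r2_manual` and all the data here are made up for illustration and are not part of the notebook above.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the predictions
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Synthetic stand-in for one-minute returns (pure noise, no signal; an assumption)
y_true = rng.normal(0.0, 1e-3, size=2000)

# Baseline 1: always predict the mean of the test targets
mean_pred = np.full_like(y_true, y_true.mean())
# Baseline 2: always predict a zero return
zero_pred = np.zeros_like(y_true)
# A "model" whose predictions are unrelated to the targets
noise_pred = rng.normal(0.0, 1e-3, size=2000)

r2_mean = r2_manual(y_true, mean_pred)
r2_zero = r2_manual(y_true, zero_pred)
r2_noise = r2_manual(y_true, noise_pred)

print(f"Mean baseline  R^2: {r2_mean:.4f}")
print(f"Zero baseline  R^2: {r2_zero:.4f}")
print(f"Unrelated pred R^2: {r2_noise:.4f}")
```

Comparing the model's score with these trivial baselines on the same test set would at least show whether the network is merely uninformative (score close to 0) or actively adding noise (score well below 0).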




https://drive.google.com/uc?export=download&id=1T4hejKQ85K-2OuG1FePJBOTvoS1Uwnnv