
Machine Learning: Runtime Error: Algorithm took longer than 10 minutes on a single time loop

Bit of a newbie, so apologies if I'm asking a silly question. I have a live deployment of a crypto machine learning algo, and it throws an error when I trigger the training function:

 

Runtime Error: Algorithm took longer than 10 minutes on a single time loop. CurrentTimeStepElapsed: 0.0 minutes
Stack Trace: System.TimeoutException: Algorithm took longer than 10 minutes on a single time loop. CurrentTimeStepElapsed: 0.0 minutes
  at QuantConnect.Isolator.MonitorTask (System.Threading.Tasks.Task task, System.TimeSpan timeSpan, System.Func`1[TResult] withinCustomLimits, System.Int64 memoryCap, System.Int32 sleepIntervalMillis) [0x002d3] in :0
  at QuantConnect.Isolator.ExecuteWithTimeLimit (System.TimeSpan timeSpan, System.Func`1[TResult] withinCustomLimits, System.Action codeBlock, System.Int64 memoryCap, System.Int32 sleepIntervalMillis, QuantConnect.Util.WorkerThread workerThread) [0x00092] in :0
  at QuantConnect.Lean.Engine.Engine.Run (QuantConnect.Packets.AlgorithmNodePacket job, QuantConnect.Lean.Engine.AlgorithmManager manager, System.String assemblyPath, QuantConnect.Util.WorkerThread workerThread) [0x009f0] in :0

User: 108607, Project: 5001223, Algorithm: L-886773484792a4b50dd08fc926320cd6

The code in question triggers at 3 AM:

def NeuralNetworkTraining(self):
    '''Train the Neural Network and save the model in the ObjectStore'''
    symbols = list(self.modelBySymbol.keys())

    if len(symbols) == 0:
        self.Debug("no contracts found")
        return

    for symbol in symbols:
        try:
            # Hourly historical data is used to train the machine learning model
            history = self.History(symbol, (self.lookback + self.timesteps), Resolution.Hour)
            self.Debug(history)
        except Exception as e:
            # Without this continue, a failed request would leave `history`
            # undefined (or stale) for the rest of the loop body
            self.Debug("Failed to receive history: {0}".format(e))
            continue

        if 'open' in history and 'close' in history and 'high' in history and 'low' in history:
            history = np.column_stack((history['open'], history['close'], history['high'], history['low']))
        else:
            self.Debug("History for {0} is missing OHLC columns".format(symbol))
            continue

        if len(history) < self.lookback:
            self.Debug("Error while collecting the training data")
            continue

        self.Debug("Start Training for symbol {0}".format(symbol))

        # First convert the data into a 3D array with shape
        # (training samples, self.timesteps timesteps, 4 features)
        x_train = []
        y_train = []
        for i in range(self.timesteps, len(history)):
            x_train.append(history[i - self.timesteps:i])
            y_train.append([history[i][0]])

        x_train, y_train = np.array(x_train), np.array(y_train)
        x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 4))
        y_train = np.reshape(y_train, (y_train.shape[0], 1))
        if np.any(np.isnan(x_train)):
            self.Debug("Error in Training Data")
            continue
        if np.any(np.isnan(y_train)):
            self.Debug("Error in Validation Data")
            continue

        # Scale features and targets to [0, 1]; the fitted scalers are kept
        # per symbol so predictions can be inverse-transformed later
        x_scaler = MinMaxScaler(feature_range=(0, 1))
        y_scaler = MinMaxScaler(feature_range=(0, 1))
        x_train = x_scaler.fit_transform(x_train.reshape(-1, x_train.shape[-1])).reshape(x_train.shape)
        y_train = y_scaler.fit_transform(y_train)

        if symbol in self.modelBySymbol and self.modelBySymbol[symbol] is not None:
            # An existing model only needs a short refresh
            model = self.modelBySymbol[symbol]
            iterations = 1
        else:
            # If no model exists for this symbol yet, build one
            opt_cells = 5
            model = Sequential()
            model.add(LSTM(units=opt_cells, return_sequences=True, input_shape=(x_train.shape[1], 4)))
            model.add(Dropout(0.2))
            model.add(LSTM(units=opt_cells, return_sequences=True))
            model.add(Dropout(0.2))
            model.add(LSTM(units=opt_cells, return_sequences=True))
            model.add(Dropout(0.2))
            model.add(LSTM(units=opt_cells, return_sequences=False))
            model.add(Dropout(0.2))
            model.add(Dense(1, activation='linear'))

            adam = Adam(lr=0.001, clipnorm=1.0)
            model.compile(loss='mean_squared_error', optimizer=adam, metrics=['accuracy'])

            iterations = 50

        # pick an iteration number large enough for convergence
        for step in range(iterations):
            hist = model.fit(x_train, y_train, epochs=1, verbose=0)
            acc = list(hist.history['accuracy'])[-1]
            loss = list(hist.history['loss'])[-1]

        self.scalersBySymbol[symbol] = (x_scaler, y_scaler)
        self.modelBySymbol[symbol] = model
        self.Debug("End Training for symbol {0} with accuracy {1}".format(symbol, acc))
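For anyone following along, the sliding-window construction in the loop above can be reproduced standalone; this is a minimal sketch with a toy OHLC array (the shapes and values are purely illustrative, not the algo's actual data):

```python
import numpy as np

# Toy OHLC history: 8 rows x 4 columns (open, close, high, low)
history = np.arange(32, dtype=float).reshape(8, 4)
timesteps = 3

# Same windowing as in NeuralNetworkTraining: each sample is the
# previous `timesteps` rows; the target is the next row's open price
x_train = []
y_train = []
for i in range(timesteps, len(history)):
    x_train.append(history[i - timesteps:i])
    y_train.append([history[i][0]])

x_train = np.array(x_train)   # shape: (samples, timesteps, 4)
y_train = np.array(y_train)   # shape: (samples, 1)

print(x_train.shape)  # (5, 3, 4)
print(y_train.shape)  # (5, 1)
```

Each of the `len(history) - timesteps` windows feeds the LSTM as one sample, which is why the network's `input_shape` is `(timesteps, 4)`.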

According to the log, my algo pulls the data and then starts the training; the runtime error is generated after that.

Is this simply a function of the fact I'm on the $20/month plan with limited ML training time? Do I need to move to running LEAN on a beefier server?

Appreciate your time.


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.


How often is this function being called? For the $20/mo plan, there's a leaky-bucket allocation of 60 min + 10 min/24 hrs. Based on the number of iterations, training this model once a week during live deployment should be okay; any more frequently and you'll likely hit the limit. Keep in mind, though, that training resources are shared across all calls to scheduled functions.

That said, Jared mentioned there are some exciting changes coming today or tomorrow to make cloud resources more scalable, which should get rid of these limits.


Thanks Adam. Was calling it every night, but will dial it back to once/week.

Look forward to seeing what Jared announces, wasn't aware this was coming up.


=D It's very exciting for us too! We're working on it. We're pushing for this week.

@Ryan, are you using a Train() method? It looks like you're just using a scheduled event. Please attach the code, or representative code, so we can give you better assistance.

Great news Jared, looking forward to the announcement

I was calling my ML training script (linked in the OP) through the Train method.

Would it be more efficient to actually place it inside the Train() method?

 

self.Train(
    self.DateRules.Every(DayOfWeek.Friday),
    self.TimeRules.At(3, 0),
    self.NeuralNetworkTraining)

 

0

Hi Ryan,

self.Train is just as efficient as other methods; the main reason to use it is that it tells the engine to allow the registered method to take longer than 10 minutes to finish its computations.
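To illustrate the distinction, here is a minimal sketch of both registration styles inside Initialize() (assuming the standard QCAlgorithm API; NeuralNetworkTraining is the method from the OP's code):

```python
# Plain scheduled event: subject to the 10-minute single-loop limit
self.Schedule.On(
    self.DateRules.EveryDay(),
    self.TimeRules.At(3, 0),
    self.NeuralNetworkTraining)

# Training event: same scheduling rules, but exempt from the loop limit
self.Train(
    self.DateRules.Every(DayOfWeek.Friday),
    self.TimeRules.At(3, 0),
    self.NeuralNetworkTraining)
```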

For more details, please read this page.

Best,
Shile Wen


Thanks, Shile, for your answer.

Looking at my code and the documentation, it appears I'm calling it correctly. I'll wait to see about the scalability of the QC hardware options.



This discussion is closed