Kernel dies when gathering fundamental data

I am trying to gather some fundamental data on earnings dates from a large dataset, found it on kaggle. There are >5k symbols and >90k earnings events over 5 years. I plan to sort the earnings dates based on their fundamental info prior to earnings release, so I am trying to get the data just on the earnings days. Since QuantBook.GetFundamental() only gets daily I thought this would be a bit overkill.

The below code pulls several Morningstar selectors for each earnings date and saves to list. However I get random "Your kernel has died and will be restarted". Hence the reason I started saving to ObjectStore every 500 dates. Sometimes it runs for 300 dates or so. Sometimes it runs for 3000 days. I am about halfway through now, saved up to about 45k dates with the fundamental data. I have had to restart the notebook maybe 100 times at this point.

This is not a large dataset, only about 4MB so far, all said and done about 10MB. What is causing the kernel to crash? Is there anything I am doing incorrectly?



QuantConnect Logo

from clr import AddReference
AddReference("System")
AddReference('System.Memory')
#AddReference("QuantConnect.Common")
from System import *
#from QuantConnect import *
#from QuantConnect.Data.Market import TradeBar, QuoteBar
from datetime import datetime, timedelta
import pandas as pd
import cloudpickle, os
import pickle
import numpy

qb = QuantBook()

days = 3 #must be odd days only
date_start = datetime(2015,1,1)
date_end = datetime(2020,12,31)

quantQuote = pickle.loads(bytes(qb.ObjectStore.ReadBytes('quantQuote')))
qqSymbols = quantQuote.symbol.unique()
print(len(qqSymbols))
quantQuote.head()
del quantQuote

19019

earnings = pickle.loads(bytes(qb.ObjectStore.ReadBytes('earnings'))).reset_index()
allSymbols = earnings.symbol.unique()

print('number of symbols in earnings data, ' + str(len(allSymbols)))
earnings = earnings.query('symbol in @qqSymbols')
mask = (earnings['edate'] > date_start) & (earnings['edate'] < date_end)
earnings = earnings.loc[mask]
del mask
del qqSymbols
print('number of symbols after filtering on QuantQuote smybols, ' + str(len(allSymbols)))
print('number of earnings events, ' + str(len(earnings)))
earnings.head()

number of symbols in earnings data, 5282
number of symbols after filtering on QuantQuote smybols, 5282
number of earnings events, 90572

	index 	symbol 	edate 	qtr 	eps_est 	eps 	release_time
23 	23 	A 	2015-02-17 	Jan-15 	0.41 	0.41 	post
24 	24 	A 	2015-05-18 	Apr-15 	0.39 	0.38 	post
25 	25 	A 	2015-08-17 	Jul-15 	0.41 	0.44 	post
26 	26 	A 	2015-11-16 	Oct-15 	0.47 	0.50 	post
27 	27 	A 	2016-02-16 	Jan-16 	0.43 	0.46 	post

def ShowData():
    qb = QuantBook()
    keys = [str(j).split(',')[0][1:] for _, j in enumerate(qb.ObjectStore.GetEnumerator())]
    sizes = [os.path.getsize(qb.ObjectStore.GetFilePath(key)) for key in keys]
    for i in keys:
        print(f"{i} - {os.path.getsize(qb.ObjectStore.GetFilePath(i))/1e6:.1f} MB")
    print(f"Total size: {sum(sizes)/1e6:.1f} MB")
ShowData()

earnings_fund_data - 8.5 MB
quantQuote - 0.8 MB
EarningsData_1501_3000_sym - 1.1 MB
EarningsData_0_1500_sym - 1.7 MB
EarningsData_4501_end_sym - 0.4 MB
earnings - 6.6 MB
EarningsData_3001_4500_sym - 1.6 MB
Total size: 20.6 MB

edates = earnings.copy(deep=True)
edates = edates[['symbol','edate']].reset_index()
edates = edates.drop(['index'],axis=1)
del earnings

def GetFundFromSymbol(s,d,codes):
    l = [s,d]
    try:
        for i in range(len(codes)):
            f = qb.GetFundamental(s, codes[i],d,d)
            v = f[f.columns[0]][0]
            l.append(v)
        return l
    except:
        l.extend(numpy.repeat(float('NaN'),len(codes)))
        return l

%%time
import time
codes = ["ValuationRatios.PERatio",
       "ValuationRatios.BookValueYield",
        "ValuationRatios.RatioPE5YearAverage",
        "ValuationRatios.NormalizedPERatio",
        "ValuationRatios.ForwardROE",
        "ValuationRatios.ForwardROA",
        "ValuationRatios.TotalAssetPerShare",
        "EarningReports.BasicAverageShares.ThreeMonths",
        "EarningReports.BasicEPS.TwelveMonths"]
qb = QuantBook()

try:
    data = pickle.loads(bytes(qb.ObjectStore.ReadBytes('earnings_fund_data')))
    last = data[len(data)-1]
    lastSym = last[0]
    lastDate = last[1]
    start = False
except:
    data = []
    start = True

for i,row in edates.iterrows():
    s = row['symbol']
    d = row['edate']
    if start==True:
        l = GetFundFromSymbol(s,d,codes)
        data.append(l)
        if (i % 100) == 0:
            print(str(i) + ',  ' + "{:.4f} %".format(100*i/len(edates)) + ',   ' + datetime.now().strftime("%m/%d/%Y, %H:%M:%S"))
        if (i % 500) == 0:
            qb.ObjectStore.SaveBytes('earnings_fund_data', bytearray(cloudpickle.dumps(data)))
            qb = QuantBook()
            print('Saved data, total items=' + str(len(data)))
            time.sleep(1)
    else:
        if (s==lastSym) & (d==lastDate):
            start = True
            print('starting on iter=' + str(i))

print('Completed\n')
qb.ObjectStore.SaveBytes('earnings_fund_data', bytearray(cloudpickle.dumps(data)))
print('Saved data, total items=' + str(len(data)))

cols = ['symol','edate']
cols.extend(codes)
df = pd.DataFrame(data, columns = cols) 
df.head()
qb.ObjectStore.SaveBytes('earnings_fund_df', bytearray(cloudpickle.dumps(df)))

starting on iter=42500
42600,  47.0344 %,   12/03/2020, 22:09:17
42700,  47.1448 %,   12/03/2020, 22:09:28
42800,  47.2552 %,   12/03/2020, 22:09:38

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.

Hi Jared,
Is it only the GetFundamental that is causing it? Every time we save new data, we also fetch all previously saved fundamental data. Furthermore, every time we ask for certain data,ValuationRatios.PERatio, it fetches all fundamental data fields, so 9 different fields make 9 requests for the same data. I suggest maybe storing the fundamental data to different ObjectStore keys so that previous data doesn't need to be refetched. We might also need to ask for the data in different cells and not in a loop.

Best,
Shile Wen

I did notice the GetFundamental allows only one "selector". When digging into the private GetAllFundamentals function on the Lean Github it appears that all of the data is available and only the "selector" data is returned. Would it not be appropriate to allow this function to take an array of selectors? Without this I do not see a way around making the 9 requests for the identical data just to return a single selector each time. I am missing something here?

Shile Wen

63.5k Pro ,

Jared

192 Pro ,

Hi Jared,
At the moment, having 9 requests is the only option. We have submitted an issue here to make the GetFundamentals returrn more fields, and we have been working on increasing fundamental data access speed for Research.

Best,
Shile Wen

Jared INVESTOR

Update Backtest

Notebook

person upvoted this people upvoted this

To unlock posting to the community forums please complete at least 30% of Boot Camp.
You can continue your Boot Camp training progress from the terminal. We hope to see you in the community soon!

Radically Open-Source Algorithmic Trading Engine

Join Our Discord Channel

Draft Discussions

Bookmarked Discussions

SEARCH DISCUSSIONS

TOP 5 Research Publications

444,300 Quants.

VOTE FOR UPCOMING FEATURES

Kernel dies when gathering fundamental data

Allocate to this Strategy

Organization

Team

Clone Strategy

Previous Ranking

IN THIS RESEARCH

PARTICIPANTS

Discussion Awards

Actions

Join QuantConnect for Free

SIGN IN

Radically Open-Source Algorithmic Trading Engine

Join Our Discord Channel

Draft Discussions

Bookmarked Discussions

SEARCH DISCUSSIONS

TOP 5 Research Publications

444,300 Quants.

VOTE FOR UPCOMING FEATURES

Kernel dies when gathering fundamental data

Allocate to this Strategy

Organization

Team

Clone Strategy

Previous Ranking

IN THIS RESEARCH

PARTICIPANTS

Discussion Awards

SHARE RESEARCH

SHARE DISCUSSION

SHARE ARTICLE

SHARE

Actions

Join QuantConnect for Free