Hi,
I'm running Lean locally and have found some very strange behaviour that I'd like to ask you all about. I have a super simple algo that adds AAPL with AddEquity at minute resolution, and then in OnData I do a self.History lookup. The strange thing is that the history returns the data incorrectly. The dates don't match the dates in the aapl.csv file, and it also returns a bar for the CURRENT day (which it shouldn't - that data shouldn't be available yet, right?).
I use the built-in demo data files, but since the values are really hard to track because of the factor_file, I've changed the volume column in the daily csv to be able to see what's going on.
Here's a snippet from my aapl.csv file around the dates I backtest:
20140529 00:00,6280000,6368700,6277500,6353800,13073141
20140530 00:00,6379800,6441700,6289000,6330000,100 # changed volume below
20140602 00:00,6340000,6348300,6225000,6286500,101
20140603 00:00,6284700,6387100,6282500,6375400,102
20140604 00:00,6372800,6479000,6361000,6448200,103
20140605 00:00,6462500,6493500,6426100,6473500,104
20140606 00:00,6500000,6512500,6444800,6455700,105
20140609 00:00,926900,938800,917500,937000,1000
20140610 00:00,946400,950500,935700,942500,1001
20140611 00:00,941300,947600,928600,938600,1002
20140612 00:00,940400,941200,919000,922600,53269396
As you can see I've changed the volume column to 100's and 1000's, just to easily be able to see them in my logs.
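For anyone who wants to read these rows outside of Lean, here is a minimal sketch of a parser. It assumes the columns are date, open, high, low, close, volume, and that prices are stored multiplied by 10,000; both are my reading of the file layout, not something the snippet itself states. The function name parse_daily_row is made up for illustration.

```python
from datetime import datetime

def parse_daily_row(line):
    """Parse one row of the daily csv (layout assumed, see note above)."""
    # Drop any trailing "# ..." annotation before splitting the columns.
    stamp, o, h, l, c, v = line.split("#")[0].strip().split(",")
    scale = 10000.0  # assumption: prices are stored as price * 10,000
    return {
        "time": datetime.strptime(stamp, "%Y%m%d %H:%M"),
        "open": int(o) / scale,
        "high": int(h) / scale,
        "low": int(l) / scale,
        "close": int(c) / scale,
        "volume": int(v),
    }

bar = parse_daily_row("20140529 00:00,6280000,6368700,6277500,6353800,13073141")
print(bar["time"].date(), bar["close"], bar["volume"])
```

Under those assumptions the first row reads as a 2014-05-29 bar with a close of 635.38 and the original, unmodified volume.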
If I run this, and print my history request at the first minute of the algo (2014-06-05 09:31), I get the following:
symbol            time        ...
AAPL R735QTJ8XC9X 2014-05-30  20.449754  ...  406186412.0
                  2014-05-31  20.373154  ...       3107.0
                  2014-06-03  20.233149  ...       3138.0
                  2014-06-04  20.519274  ...       3169.0
                  2014-06-05  20.753581  ...       3200.0
As you can see, the dates seem to have slipped a day: 2014-05-31 is not even a trading day, and 2014-06-05 should not be available yet. The data also doesn't line up with the dates. I suppose it's a time zone problem. I have tried changing the time (00:00) in the daily csv, but haven't found anything that makes the data and dates align properly.
Here's my very simplistic algo to reproduce this:
from AlgorithmImports import *  # Lean's Python imports (QCAlgorithm, Resolution, ...)

class DateBugTest(QCAlgorithm):
    def Initialize(self):
        # Set start and end date
        self.SetStartDate(2014, 6, 5)
        self.SetEndDate(2014, 6, 9)
        self.SetCash(100000)  # Set strategy cash
        self.aapl = self.AddEquity("AAPL", Resolution.Minute).Symbol

    def OnData(self, data):
        # It's the first minute of the day - check for our conditions
        if str(self.Time.time()) == '09:31:00':
            # Request the last 5 daily bars for AAPL and log them
            hist_bars = self.History(self.aapl, 5, Resolution.Daily)
            self.Log(str(hist_bars))
Is this a huge bug, or am I missing something very obvious?
Micke Teeling
I experimented with the self.History request and added “dataNormalizationMode=DataNormalizationMode.Raw”, which makes things easier to read. Same result as above (incorrect timestamps for the data), but shown more clearly:
Yuri Lopukhov
Micke Teeling, history is indexed by the bar's end time, not its start time. I don't know why, but that's how it is. It is documented behavior:
Micke Teeling
Hi,
Thanks for your reply, appreciate it! In my mind, it makes perfect sense to tag a bar with its end timestamp, since that is the point in time where the bar is fully defined, so I'm happy with that concept. That is also why I was a bit confused: the rows in the daily .csv files are *not* tagged with their closing time, but with their open time! When read into Lean, they're imported correctly and tagged in the data frames with the bar end time, but to me it's a bit unintuitive and confusing. This row in the .csv:
becomes this when printed from a history request df:
The documentation README in the csv data folder isn't very clear, it only says:
The time, 09:00, could be a clue that it's supposed to be the bar open time, but all the files in the data directory have the time 00:00 and not 09:00, which also makes it a bit confusing.
Anyway, this is how it seems to work according to my testing, in case anyone else has the same problem!
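To double-check this against the numbers posted earlier in the thread, here is a small sketch that shifts each csv date forward by one day and compares it with the history index. The one-calendar-day bar length is my assumption about how the end time is derived. The volumes are copied from the csv snippet and the printed history frame.

```python
from datetime import date, timedelta

# (csv date, raw volume) from the aapl.csv snippet earlier in the thread
csv_rows = [
    (date(2014, 5, 29), 13073141),
    (date(2014, 5, 30), 100),
    (date(2014, 6, 2), 101),
    (date(2014, 6, 3), 102),
    (date(2014, 6, 4), 103),
]

# (history index date, volume) from the printed history frame
history_rows = [
    (date(2014, 5, 30), 406186412.0),
    (date(2014, 5, 31), 3107.0),
    (date(2014, 6, 3), 3138.0),
    (date(2014, 6, 4), 3169.0),
    (date(2014, 6, 5), 3200.0),
]

# Assumption: a daily bar stamped "D 00:00" in the csv spans one calendar day,
# so its end time (the history index) is D + 1 day.
shifted = [(d + timedelta(days=1), v) for d, v in csv_rows]

# Does every shifted csv date equal the corresponding history index date?
dates_match = [s[0] == h[0] for s, h in zip(shifted, history_rows)]

# Ratio of history volume to raw csv volume, row by row.
ratios = [h[1] / s[1] for s, h in zip(shifted, history_rows)]

print(dates_match)
print([round(r, 2) for r in ratios])
```

Every shifted date matches the history index, including the odd-looking Saturday 2014-05-31 (the end of the Friday 2014-05-30 bar). The volume ratio comes out at roughly 31.07 for every row, which is consistent with one uniform adjustment factor being applied, presumably from the factor_file.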
Yuri Lopukhov
Hi Micke Teeling, I believe the data in the csv is stored the way it is received from the data provider, and most systems use the bar start time as the index. There may also be a time zone conversion: the data may be in one time zone, and QuantConnect may convert it to another.
If you believe the question is resolved, please don't forget to accept the answer.
Micke Teeling