This post is the result of the first six weeks of class from the Computational Investing course I’m currently following on Coursera. The course is an introduction to Portfolio Management and Optimization in Python and lays the foundations for the second part of the track, which will deal with Machine Learning for Trading. Let’s move quickly to the core of the matter.
The question I want to answer is the following:
- Is it possible to exploit event studies in a trading strategy?
First of all we should clarify what an event study is. As Wikipedia states, an event study is a statistical method to assess the impact of an event on the value of a firm. This definition is very broad and can easily cover facts directly concerning the company (e.g. the private life of the CEO, mergers with other firms, confidential news from insiders) as well as anomalous fluctuations in the price of the stock. I naively (and maybe incorrectly) categorized events regarding a company into these two types, news related and market related, but there should be no real difference as the two are generally tightly correlated. In any case, since it is not easy to access and parse news feeds in real time, we will focus on market related events: in the rest of the post an event should be understood as an anomalous behavior in the price of a stock whose consequences we could exploit to trade more efficiently.
Now that we have properly defined an event we can go back to the beginning and think a little more about what studying an event really means. To understand it, let’s walk through a complete example and suppose that we have an event whenever the closing price of a stock at the end of day i is less than $10 while at the end of day i-1 it was more than $10. In other words, we are looking at a significant drop in the price of the stock. Given this definition, the question is: what happens, statistically, to the prices of stocks experiencing this kind of fluctuation? Is there a trend that could somehow be exploited? The reasoning behind these questions is that if we knew in advance that a stock followed a specific pattern as a consequence of some event, we could adjust our trading strategy accordingly. If the statistics suggest that the price is bound to increase, it might be a good idea to go long on the shares, whereas in the opposite case the best decision would be to go short.
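Concretely, the event is just a comparison between two consecutive closing prices. Here is a minimal, self-contained sketch; the helper name is_event is mine and not part of the course code, while the real check lives in find_events() further down.

```python
# Illustrative only: the event fires on day i when the actual close crosses below $10.
def is_event(price_yesterday, price_today, threshold=10.0):
    """True when yesterday's close was at or above the threshold and today's is below it."""
    return price_today < threshold and price_yesterday >= threshold

print is_event(10.45, 9.80)   # True: the stock just dropped through $10
print is_event(9.80, 9.50)    # False: it was already trading below $10
```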
In order to run an event study we take advantage of the EventProfiler class in the QSTK library. This class allows us to define an event and then, given a time interval and a list of stocks, works as follows: it scans the firms one by one and, whenever it finds an event, marks that day as day 0. It then looks 20 days ahead and 20 days back and saves the timeframe. After having analyzed all the stocks, it aligns the events on day 0, averages all the prices before and after, and scales the result by the market (SPY). The output is a chart which basically answers this question: what happens on average when the closing price of a stock at the end of day i is less than $10 while at the end of day i-1 it was more than $10? The test period runs from 1 January 2008 to 31 December 2009 (in the middle of the financial crisis), while the stocks chosen were the 500 contained in the S&P index in 2012. The graph is shown below and the following information can be extracted: first, 461 such events were registered during the investigated time frame. Second, on the day of the event there is a drop of about 10% in the stock price with respect to the day before. Third, the price seems to recover after day zero, even though the confidence intervals of the daily increase are huge.
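The profiler call itself does not appear in the final script below (find_events() goes straight from events to orders), so here is a hedged sketch of how the event matrix is typically built and fed to the profiler. It follows the pattern used in the QSTK tutorials; the helper name run_event_study and the threshold parameter are mine, and the keyword arguments should be double-checked against your QSTK version.

```python
# Sketch of the event study itself, following the QSTK tutorial pattern.
# run_event_study() and threshold are illustrative names, not part of the post's code.
import copy
import numpy as np
import QSTK.qstkstudy.EventProfiler as ep

def run_event_study(ls_symbols, d_data, threshold=10.0):
    df_close = d_data['actual_close']
    ldt_timestamps = df_close.index

    # start from a matrix of NaNs and put a 1 wherever the event fires
    df_events = copy.deepcopy(df_close) * np.NAN
    for s_sym in ls_symbols:
        for i in range(1, len(ldt_timestamps)):
            price_today = df_close[s_sym].ix[ldt_timestamps[i]]
            price_yest = df_close[s_sym].ix[ldt_timestamps[i - 1]]
            if price_today < threshold and price_yest >= threshold:
                df_events[s_sym].ix[ldt_timestamps[i]] = 1

    # align events on day 0, look 20 days back and forward, scale by the market (SPY)
    ep.eventprofiler(df_events, d_data, i_lookback=20, i_lookforward=20,
                     s_filename='event_study.pdf', b_market_neutral=True,
                     b_errorbars=True, s_market_sym='SPY')
```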
Now the idea is the following. If the observed behavior holds, we can build a trading strategy consisting of buying on the day of the event and selling, let’s say, after 5 trading days (we don’t want to hold for too long, even though the price increases almost monotonically). Just to recap, here is the whole pipeline from event definition to portfolio assessment:
- define the event (close below $10 today, at or above $10 yesterday);
- run the event study over 2008-2009 on the S&P 500 (2012 list);
- for every event, issue a Buy order on day 0 and a Sell order 5 trading days later;
- feed the resulting orders file to a market simulator that tracks the portfolio value over time;
- compare the portfolio performance against the market benchmark (SPY).
Now that we have a plan let’s dive into the code (you can find all the code on Github).
```python
# import statements
import pandas as pd
import numpy as np
import math
import copy
import sys
import matplotlib.pyplot as plt
from pylab import *

import QSTK.qstkutil.qsdateutil as du
import datetime as dt
import QSTK.qstkutil.DataAccess as da
import QSTK.qstkutil.tsutil as tsu
import QSTK.qstkstudy.EventProfiler as ep

# save the marketsim() function as marketsim.py
from marketsim import marketsim
```
First of all I’ll introduce the two main functions, one after the other.
find_events(ls_symbols, d_data, shares=100): given the list of stocks in the portfolio, their historical prices and the number of shares to be traded, this function identifies events and issues a Buy order on the day of the event and a Sell order 5 trading days later. It then saves the orders to a csv file to be passed to the market simulator. The first lines of the csv file are previewed below (year, month, day, stock, order, shares).
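Based on the format written by find_events(), the file looks roughly like this; the tickers and dates below are placeholders, not actual events from the study.

```
2008,12,03,XYZ,Buy,100,
2008,12,10,XYZ,Sell,100,
2009,01,05,ABC,Buy,100,
2009,01,12,ABC,Sell,100,
```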
```python
def find_events(ls_symbols, d_data, shares=100):
    ''' Finding the event dataframe '''
    df_close = d_data['actual_close']
    orders = ''
    print "Finding Events"

    df_events = copy.deepcopy(df_close)
    df_events = df_events * np.NAN

    ldt_timestamps = df_close.index

    for s_sym in ls_symbols:
        for i in range(1, len(ldt_timestamps)-5):
            f_symprice_today = df_close[s_sym].ix[ldt_timestamps[i]]
            f_symprice_yest = df_close[s_sym].ix[ldt_timestamps[i - 1]]
            if f_symprice_today < 10 and f_symprice_yest >= 10:
                buy_time = pd.to_datetime(ldt_timestamps[i]).strftime('%Y,%m,%d,')
                buy_order = buy_time + str(s_sym) + ',Buy,' + str(shares) + ',\n'
                orders += buy_order
                sell_time = pd.to_datetime(ldt_timestamps[i+5]).strftime('%Y,%m,%d,')
                sell_order = sell_time + str(s_sym) + ',Sell,' + str(shares) + ',\n'
                orders += sell_order

    with open('event-orders.csv', 'w') as ord_file:
        ord_file.write(orders)
    print 'Saved orders to csv file'
```
marketsim(investment, orders_file, out_file): given the initial investment in dollars ($50,000 in our case), the csv file containing all the orders (the output of find_events()) and the file in which to save the results of the simulation, this function places the orders in chronological order and automatically updates the value of the portfolio. It returns a csv file with the portfolio value over time, a plot comparing the portfolio performance against the market benchmark, and prints to screen a summary of the main financial metrics used to evaluate the portfolio.
```python
def marketsim(investment, orders_file, out_file):
    # read the orders file and sort it chronologically
    df = pd.read_csv(orders_file, parse_dates=[[0,1,2]], header=None)
    df.columns = ['date', 'stock', 'order', 'shares', 'no']
    df = df.drop('no', 1)
    df = df.sort('date', 0)
    df = df.reset_index(drop=True)
    df['date'] = df['date'] + dt.timedelta(hours=16)

    start_date = df['date'][0]
    end_date = df['date'][df.shape[0]-1]
    dt_timeofday = dt.timedelta(hours=16)
    ldt_timestamps = du.getNYSEdays(start_date, end_date, dt_timeofday)

    # download closing prices for the traded equities
    c_dataobj = da.DataAccess('Yahoo')
    ls_keys = ['close']
    equities = list(df.stock.unique())
    data = c_dataobj.get_data(ldt_timestamps, equities, ls_keys)[0]

    data['cash'] = float(investment)
    for equity in equities:
        data['shares_'+equity] = 0

    # walk through the orders, updating cash and share positions
    for row in range(df.shape[0]):
        order = df.ix[row]
        if order['order'] == 'Buy':
            bought = order.shares
            data['shares_'+order.stock][data.index >= order.date] += bought
            cash_paid = bought * data[order.stock][data.index == order.date][0]
            data['cash'][data.index >= order.date] -= cash_paid
        elif order['order'] == 'Sell':
            sold = order.shares
            data['shares_'+order.stock][data.index >= order.date] -= sold
            cash_taken = sold * data[order.stock][data.index == order.date][0]
            data['cash'][data.index >= order.date] += cash_taken

    # value of the equity positions = prices * shares held
    def compute_equities_value(row):
        return (row[:len(equities)].values * row[len(equities)+1:].values).sum()

    data['eq_value'] = data.apply(lambda row: compute_equities_value(row), axis=1)
    data['portfolio'] = data['cash'] + data['eq_value']

    # daily returns, volatility, Sharpe ratio and cumulative return of the fund
    portfolio = data['portfolio'].copy()
    dret = tsu.returnize0(portfolio)
    vol = dret.std()
    daily_ret = dret.mean()
    sharpe = np.sqrt(252)*daily_ret/vol
    cum_ret = data['portfolio'][data.shape[0]-1]/investment - 1

    # same metrics for the market benchmark (SPY)
    market = c_dataobj.get_data(ldt_timestamps, ['SPY'], ls_keys)[0]
    original = market.SPY.copy()
    market['dret'] = tsu.returnize0(market.SPY)
    market.SPY = original
    mvol = market.dret.std()
    mdaily_ret = market.dret.mean()
    msharpe = np.sqrt(252)*mdaily_ret/mvol
    mcum_ret = original[market.shape[0]-1]/original[0] - 1

    # plot normalized fund value against the market
    fig = figure()
    ax = fig.add_subplot(111)
    ax.set_xticklabels(data.index, rotation=45)
    ax.yaxis.grid(color='gray', linestyle='dashed')
    ax.xaxis.grid(color='gray', linestyle='dashed')
    ax.xaxis.set_major_formatter(DateFormatter('%b %Y'))
    ax.legend(('Fund', 'Market'), loc='upper left')
    ax.set_title('Fund Performance VS Market (SPY)', fontsize=16, fontweight="bold")
    ax.set_xlabel('Date', fontsize=16)
    ax.set_ylabel('Normalized Fund Value', fontsize=16)
    port = data.portfolio/data.portfolio.max()
    mark = original/original.max()
    y_min = min(port.min(), mark.min())
    ax.set_ylim([y_min-0.02, 1.02])
    plt.plot(data.index, port, lw=2., label='Fund')
    plt.plot(data.index, mark, lw=2., label='Market')
    ax.legend(('Fund', 'Market'), loc='upper left', prop={"size": 16})
    fig.autofmt_xdate()
    plt.show()

    # print the summary and save the daily portfolio values
    data = data.reset_index()
    data.columns.values[0] = 'date'
    begin = pd.to_datetime(data.date[0]).strftime('%b %d %Y')
    end = pd.to_datetime(data.date[data.shape[0]-1]).strftime('%b %d %Y')

    print 'Details of the Performance of the portfolio'
    print ''
    print 'Data Range: ', begin, ' - ', end
    print ''
    print 'Sharpe Ratio of Fund: ', sharpe
    print 'Sharpe Ratio of Market: ', msharpe
    print ''
    print 'Total Return of Fund: ', cum_ret
    print 'Total Return of Market: ', mcum_ret
    print ''
    print 'Volatility of Fund: ', vol
    print 'Volatility of Market: ', mvol
    print ''
    print 'Average Daily Return of Fund: ', daily_ret
    print 'Average Daily Return of Market: ', mdaily_ret
    print ''

    data.to_csv(out_file, index=False)
```
Finally, the main block calls the previous two functions after downloading and cleaning all the relevant data.
```python
if __name__ == '__main__':

    # test begins 1 January 2008
    dt_start = dt.datetime(2008, 1, 1)
    # test ends 31 December 2009
    dt_end = dt.datetime(2009, 12, 31)

    # getting only the trading days in the timeframe
    ldt_timestamps = du.getNYSEdays(dt_start, dt_end, dt.timedelta(hours=16))

    # downloading prices for all the stocks contained in the list of S&P 500 in 2012
    dataobj = da.DataAccess('Yahoo')
    ls_symbols = dataobj.get_symbols_from_list('sp5002012')
    ls_symbols.append('SPY')

    ls_keys = ['open', 'high', 'low', 'close', 'volume', 'actual_close']
    ldf_data = dataobj.get_data(ldt_timestamps, ls_symbols, ls_keys)
    d_data = dict(zip(ls_keys, ldf_data))

    # taking care of missing values
    for s_key in ls_keys:
        d_data[s_key] = d_data[s_key].fillna(method='ffill')
        d_data[s_key] = d_data[s_key].fillna(method='bfill')
        d_data[s_key] = d_data[s_key].fillna(1.0)

    # finding events and preparing trading strategy
    find_events(ls_symbols, d_data)

    # evaluating the strategy against the market
    marketsim(50000, 'event-orders.csv', 'event-values.csv')
```
This is the output, as promised:
```
Details of the Performance of the portfolio

Data Range:  Jan 03 2008  -  Dec 22 2009

Sharpe Ratio of Fund:  0.610680695525
Sharpe Ratio of Market:  -0.133639366311

Total Return of Fund:  0.19602
Total Return of Market:  -0.191838897721

Volatility of Fund:  0.0108878915846
Volatility of Market:  0.02205848174

Average Daily Return of Fund:  0.000418849217983
Average Daily Return of Market:  -0.000185699080962
```
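As a quick sanity check, the two Sharpe ratios above can be reproduced from the printed daily statistics using the same annualization factor applied inside marketsim():

```python
import numpy as np

# Sharpe ratio = sqrt(252) * mean(daily returns) / std(daily returns)
print np.sqrt(252) * 0.000418849217983 / 0.0108878915846     # ~0.6107 (fund)
print np.sqrt(252) * -0.000185699080962 / 0.02205848174      # ~-0.1336 (market)
```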
Well, despite the huge crisis (the market lost about 19%), our trading strategy delivered a remarkable +19.6%! This was just an example, but a very powerful one for showing the possibilities of event studies in finance.