In today’s investment world, there is no shortage of new data sets promising new ways to generate alpha, but the devil is in the detail and not all data is useful.
At the third Investment Data and Technology Summit in Sydney last week, the audience was told of numerous innovative ways to extract more alpha from investment data, ranging from stock-specific website scraping techniques to natural language processing of conference call transcripts, footnotes and US companies’ 10-Ks.
Michael Roach, head of the quantitative equity group, Asia Pacific, at Vanguard Investments, said natural language processing is one example of a strategy that can produce significant results.
“You can use NLP type processes to comb through financial statements focusing on footnotes, which is really where the key information is, taking a 100-page report and bringing it down to five pages focusing on the key information. That is something that needs to be trained and learned to make sure you are pulling the right information, but we have seen strong efficiencies from that type of work,” Roach said.
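The filtering step Roach describes can be illustrated with a small sketch. The heading pattern and the toy filing below are invented for the example, and a production system would need far more robust parsing, but the idea is the same: scan a long report and keep only the footnote sections where the key disclosures live.

```python
import re

# Footnotes in filings are commonly introduced by headings such as
# "Note 1: ..." — this pattern is an assumption for the illustration.
FOOTNOTE_HEADING = re.compile(r"^note \d+[.:]", re.IGNORECASE)

def extract_footnotes(filing_text: str) -> list[str]:
    """Return only the footnote sections of a filing, one string each."""
    sections, current = [], None
    for line in filing_text.splitlines():
        stripped = line.strip()
        if FOOTNOTE_HEADING.match(stripped):
            if current:
                sections.append("\n".join(current))
            current = [stripped]          # start a new footnote section
        elif current is not None and stripped:
            current.append(stripped)      # continue the current footnote
    if current:
        sections.append("\n".join(current))
    return sections

# Toy filing text, invented for illustration.
filing = """
Management discussion and analysis, forward-looking statements...
Note 1: Revenue recognition
Revenue is recognised when control of goods transfers to the customer.
Note 2: Contingent liabilities
The company is subject to ongoing litigation in two jurisdictions.
"""

for note in extract_footnotes(filing):
    print(note, end="\n\n")
```

In practice the “training and learning” Roach refers to would replace this regex heuristic with a model tuned to pull the right passages, but the condensation principle is the one shown here.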
However, many new data sets – particularly those classified as “big data” – do not necessarily come with a lot of history, meaning you need to think carefully about how to incorporate these data sets into your models.
“There is an abundance of new data sets that are touting new ways to generate alpha or outperformance, but for us the concern is to make sure that information is still grounded in financial theory, that you expect to see some sort of persistence involved in this information and can you implement it in a low cost long term fashion,” Roach said.
Andre Roberts, senior portfolio manager at Invesco Quantitative Strategies, said that it is important that the time history of the data set is adequate in view of the investment time horizon. “When you push your investment horizon out though to months and potentially years, what confidence have you got that the history that you have only had for a few years is going to be a respectable data set to work with?” he said.
Roberts said that while big data and artificial intelligence are real developments, they only work in certain contexts. “That context needs to be a system where the rules are reasonably well-defined and you have a bit of data history. In that context a machine is going to do a pretty good job at learning. The challenge is [working out] in what contexts there are more likely to be well defined rules and data horizons that are appropriate,” Roberts said.
Understanding the exact ways in which data sets are composed is equally important, particularly when buying alternative data sets from third party vendors.
Sam Jacob, chief information officer at Wheelhouse Investment Partners, said that with any data set it is important to understand the nuances of, and actions taken in, cleaning and structuring that data. That is the case even with market data, which is about as structured and “clean” as it comes, and it is even more important when looking at alternative data sets.
“If you think about the ‘king’ of normalised clean data, it is market data and now let’s just think about the nuances that have been overcome for that. There are corporate actions – obviously we need to account for them, how do we deal with them? What tax rates do we use, or do we do an international comparison? Are we accounting for all of the corporate structures? It depends what our objective is […] Those are pitfalls of very organised, listed data. […] When it comes to alternative data, we are in a whole new game. Despite the benefits of these alternative data sets being potential alpha generators, we need to get to the bottom of what that data set is really doing before we can plug it into our investment [processes],” Jacob said.
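One of the “nuances” Jacob mentions, accounting for corporate actions, can be sketched minimally: back-adjusting a price history for a stock split so the series is comparable across the event. The dates, prices and 2-for-1 ratio below are invented for illustration; a real pipeline would also have to handle dividends, tax treatment and cross-listed structures, which is exactly his point.

```python
# Minimal sketch, assuming ISO-formatted date keys (which sort
# lexically) and a single split event. All figures are invented.

def back_adjust_for_split(prices: dict[str, float],
                          split_date: str,
                          ratio: float) -> dict[str, float]:
    """Divide every price before split_date by the split ratio,
    so pre- and post-split prices sit on a comparable scale."""
    return {d: (p / ratio if d < split_date else p)
            for d, p in prices.items()}

raw = {"2024-01-02": 100.0, "2024-01-03": 102.0, "2024-01-04": 51.5}
adjusted = back_adjust_for_split(raw, split_date="2024-01-04", ratio=2.0)
print(adjusted)
```

Without the adjustment, the raw series would show a spurious 50 per cent “crash” on the split date; even the cleanest listed data needs this kind of intervention before it can feed a model.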
Jacob said a “particularly problematic” aspect of alternative data sets is that the technology currently available does not provide a meaningful understanding of the markets being traded.
“The technology has given us the ability to look over a lot more data for associations […] What it doesn’t do, and this is kind of critical I think, is tell us why. It doesn’t tell us what relationships are and it doesn’t give us a formal model or a formal understanding of the markets we are trading, and I think that is particularly problematic when we talk about these new alternative data sets,” Jacob said.