Wietske Blees

Investment data: strategy, governance and storage

Transitioning from legacy data management systems towards big data and unstructured ‘data lakes’ can bring significant benefits, provided you think carefully about the solutions you require and implement your systems accordingly.

Speaking at the 5th Australian Asset Owner Forum, hosted by Fund Business in Melbourne last week, Bill Pryor, global head of data & analytics for Citi Custody & Fund Services, said the promise of big data as a means of solving all problems, whether regulatory, reporting, risk, compliance, liquidity or analytical in nature, is a tempting one. He also said there are real signs that it is a much better approach than traditional legacy data warehouse technology.

Bill Pryor - Global Head of Data & Analytics, Citi Custody and Fund Services

Pryor said companies that have succeeded with this approach, such as Google, Amazon, Twitter and Facebook, have highlighted its potential to ingest and process vast amounts of data quickly and provide lightning-fast access. However, it requires a different approach to data management from the one fund managers and asset owners have applied to date.

“Historically just about everyone has envisioned the promise of data analytics as the absolute nirvana. You have all the data at your fingertips that gets put together and modelled in a way that reflects your complete investment picture. It’s the objective that we are all chasing, but with previous approaches, the reality has been that it is a lot of effort, a lot of cost, takes a long time and it comes with a lot of challenges,” Pryor said.

For example, developing strategies to support data quality can be time-consuming; acquiring technology such as data warehouses and implementing it in practice can be both expensive and slow; and not all business intelligence tools actually hit the mark.

“These are challenging and risky projects and even when you have built the right data model and you architect it to match your investment strategy – which is usually a very long and complex implementation plan – there is the ongoing maintenance that comes on top of it,” Pryor said.

Step change

However, Pryor said a new approach to data management is beginning to take hold. Among clients, he said, interest is shifting away from legacy warehousing technologies and towards big data and unstructured data lakes.

“Yes absolutely, legacy technology allows you to deliver data warehouse functionality, no doubt about that, but it is very costly, it does take a long implementation time, there is a lot of complexity and it is not massively repeatable. Similarly, while legacy data warehouses can provide real-time feeds, support data mining and predict values, they are not well suited to these activities and the type of response times that you see reflect those limitations,” Pryor said.

By contrast, a cloud-based unstructured data lake offers cheaper storage and allows market participants to ingest disparate data sets into a single pool more easily. Firms can then apply business intelligence tools to give structure to these unstructured data sets, while machine learning techniques can continually improve performance, all in a much shorter time frame.
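Loading raw data first and imposing structure only when it is read is commonly called "schema-on-read". The Python sketch below illustrates the idea under stated assumptions: the record layouts, field names, security identifiers and the `read_positions` function are all hypothetical, and a plain list stands in for cloud storage. It is not a description of Citi's platform.

```python
import json
from datetime import date

# Hypothetical raw records from different sources, kept exactly as received.
# A plain list of JSON strings stands in for cheap cloud object storage.
raw_lake = [
    json.dumps({"source": "custody", "security_id": "SEC-A", "qty": "1500", "as_of": "2018-05-31"}),
    json.dumps({"source": "accounting", "security": "SEC-A", "market_value": 48210.5}),
    json.dumps({"source": "custody", "security_id": "SEC-B", "qty": "200", "as_of": "2018-05-31"}),
]


def read_positions(lake):
    """Apply a schema at read time: select custody records and coerce their types."""
    positions = []
    for line in lake:
        record = json.loads(line)
        if record.get("source") != "custody":
            continue  # other data sets stay in the lake, untouched, for other views
        positions.append({
            "security_id": record["security_id"],
            "quantity": int(record["qty"]),  # raw value arrived as a string
            "as_of": date.fromisoformat(record["as_of"]),
        })
    return positions


for position in read_positions(raw_lake):
    print(position)
```

The design choice being illustrated is that nothing is rejected or remodelled at ingestion time; a different consumer (for example, an accounting view) could read the same raw records with its own schema.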

Pryor said that to date, most interest in unstructured solutions is coming from the operations side, where the benefits of automation are most obvious. “It’s early stages, but we are starting to see clients question the costs and the time it takes to see benefits when implementing projects that are based on the old data warehouse principles. At this point, the greatest demand is coming from the operations side, because so much is still done manually and if you can automate that, you get a higher quality process that is more straight through,” he said.

However, Pryor said that looking ahead, the potential benefits for performance analytics and investment functions are equally significant.

“The end game is taking all this data and applying it to investment analytics to properly reflect the complete investment picture. Where it is really going to get exciting going forward is the use of dynamic calculation engines and in-memory processing to recalculate your performance on the fly,” Pryor said.

“Being able to find that needle in the haystack, no matter what type of data it is, whether it is custody data, accounting data, performance or risk; being able to apply business intelligence tools to provide structure to the data is the magic formula that will give you this type of instant empowerment to get to the answers to your questions, whatever they may be,” Pryor said.

Data strategy is key

While data lakes can provide more capabilities compared to traditional data warehouses, Pryor said it is critical that clients select the technology based on the use cases they are trying to solve.

“Often technologies get selected because people are enamoured by the technology, rather than the practicalities of how it is going to solve a use case. You need to flip that around and build your technology based on the problems you are trying to solve. When you hear about big data projects that haven’t quite achieved what they set out to do or missed the mark completely, it is because they just went off and gobbled up all kinds of data without knowing what they were going to do with it,” Pryor said.

Data governance is equally critical. “For all of this to work, data governance is absolutely key. All firms are different, but you do need a data governance framework, it needs to be cross-organisational and it has to be continuous, to ensure the data quality is accurate,” Pryor said.

In terms of system requirements, Pryor recommended that, as far as possible, firms aim for zero maintenance, infinite scalability and massive repeatability.

“This is not going to happen overnight, it is probably going to take three to five years for these systems to do everything you want, but the benefits will be worth it,” Pryor said.