Risk Management in Analytics Projects

Business analytics with big data is increasingly used to help manage risk in areas such as operations, finance, insurance and security. While the broad lifecycle phases of any analytics project remain largely the same, the techniques used to implement each phase vary with the type of data, its sources and so on. The time and effort invested in each phase may also vary greatly across projects. Greater variability in method, time and effort implies less standardization and repeatability, and hence a greater chance of encountering new risks in each new project. So while analytics is being used to manage business risk, is enough conscious effort being applied to managing the project risks within these analytics projects themselves?

As with any other project, analytics projects involve the usual risks of time and cost overruns, sometimes caused by inadequate resourcing, delayed inputs, delayed decisions and so on. In addition to these, there are a number of other potential risks that can affect the reliability (and hence the business usability) of the project's output. Analytics, and predictive analytics in particular, is a science of heuristics and statistical probability, so its output will always contain some degree of inherent uncertainty. Reducing that uncertainty requires a conscious effort to manage, or at least be aware of, certain risks that can creep into the project lifecycle. Some of these risks are as follows.

Inadequate data. Statistical analysis is often done on a sample rather than on the complete data population, because it is often neither practical nor possible to have data from the whole population. In such cases statistical results are expressed along with a calculated margin of error, which is fine. For a given confidence level, the larger the sample, the lower the margin of error, and the margin narrows further when the sample is a substantial fraction of the total population. But what if the real population size is not known? Assumptions can be made, but how reliable are they, and what are they based on? It is worth asking the question and confirming with the right sources that the population size used is as accurate as it can be.
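To make the dependence on population size concrete, here is a minimal sketch of the textbook margin-of-error formula for a sample proportion, with the finite population correction applied when a population size is supplied. The function name, the default 95% confidence multiplier (z = 1.96) and the conservative proportion of 0.5 are illustrative assumptions, not something prescribed by this article.

```python
import math


def margin_of_error(sample_size, population_size=None, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumptions (illustrative): simple random sampling, z=1.96 for roughly
    95% confidence, and proportion=0.5 as the most conservative case.
    If population_size is given, the finite population correction is applied.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    if population_size is not None:
        if sample_size > population_size:
            raise ValueError("sample_size cannot exceed population_size")
        if sample_size == population_size:
            return 0.0  # a full census has no sampling error
        # Finite population correction: shrinks the error as the sample
        # becomes a larger share of the population.
        standard_error *= math.sqrt(
            (population_size - sample_size) / (population_size - 1)
        )
    return z * standard_error


# Same sample of 1,000, three different assumptions about the population.
for assumed_population in (2_000, 20_000, 1_000_000):
    moe = margin_of_error(1_000, assumed_population)
    print(f"population {assumed_population:>9,}: margin of error {moe:.4f}")
```

Running the loop with different assumed population sizes shows how much, or how little, the reported margin of error depends on that assumption, which is one way to judge how much effort it deserves.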

Lack of awareness about future change. This applies especially to predictive analysis, where data from the past is used to make probabilistic predictions about the future. The biggest risk in making such predictions is that the future environment could be different, so the past alone is not a sufficient determinant of the future. This is a fundamental rule of thumb in the world of financial investment, where using non-public knowledge to make gains is illegal in most countries. That restriction does not apply to predictions in other areas, so while the past can be used to produce good predictions, any variables whose values could change in future should, wherever possible, be examined to see whether they would alter the statistical prediction.

Data quality. Data quality can be an issue when data integration and data cleansing techniques are tested only on samples, especially when big data is involved. While the techniques may produce good data in development and test samples, how confident can we be that the sample represents the entire data population well enough? Again, this does not require a 100% check of the data population, but when the results show significant skews or outliers it’s always worth asking the question and going back to double-check the quality of the data points behind them.
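As a rough illustration of how such data points might be singled out for a second look, the sketch below flags values that fall outside Tukey's fences (1.5 times the interquartile range beyond the quartiles). The function name, the threshold and the sample values are hypothetical; this is one common screening rule, not the only way to check data quality, and a flagged value is a prompt to investigate rather than proof of bad data.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside Tukey's fences.

    k=1.5 is the conventional multiplier; it is an assumption here and
    can be tuned to the data at hand.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical cleansed sample: one value looks like a unit or entry error.
order_values = [10, 12, 11, 13, 12, 11, 10, 12, 950]
for index, value in flag_outliers(order_values):
    print(f"row {index}: value {value} is worth tracing back to its source")
```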

Not enough team input included. Business analytics can be an expensive investment, given the cost of talent and the time that may be needed before benefits are realized. At the same time, the value of soft power in analytics should not be underestimated. Getting the right data together, analysing models correctly and asking the right questions of the output all require as much creativity, business expertise and experience as possible. So even if the core analytics team is small, it helps if they engage with as many other colleagues as possible to get additional perspectives on their line of thinking.

Biased interpretations. Sometimes, though, the power of experience and gut feel can get in the way. That is when there has to be a discussion and debate between the statistician, who is detached from the business, and the business expert, who knows the domain so well that they may expect only to see validation of what they already believe to be right. It is a risky situation when either side forces through a view that is potentially biased because it ignores the data or past experience in some way, whether consciously or inadvertently.

Groupthink. Groupthink is a twist on the biased interpretation issue. It refers to the phenomenon where those who don’t really have an opinion, or don’t wish to voice one, defer to another opinion that seems more credible for whatever reason, and go along with it without any real basis for doing so. For example, in a group of three people, if one states an opinion and the other two go along with it with no conviction or basis other than that person's perceived authority, then on paper three people have voiced the same opinion. Groupthink is common, and since its consequences can be expensive, it is something any analytics leader should be aware of.

Unintended illegality. This risk is relatively easy to control. When the analytics team is given the freedom to gather whatever data they feel is necessary for their work, a control should be in place, and exercised, to ensure that the collection and use of that data is not illegal from the perspective of factors such as security, privacy and confidentiality. It is quite usual to allow employees access to several kinds of data, but there is a risk that they are unaware that the way they intend to use it may be illegal.

More risks could be added to the list, depending on the specifics of the project. The point is that, because analytics involves applying creativity alongside probabilistic statistics, there is a case for adopting a formal risk management process in analytics projects to improve the chances of reliable project outcomes.
