As the adoption of predictive analytics in business continues to pick up, there is also a parallel thrust to harness the use of Big Data for analytics purposes. Working with Big Data, of course, means working with huge volumes of varied data. The statistical techniques that are used in analytics are not new, but their application across a much larger number of purposes and across more functions and sectors of industry has boomed only over the past few years. This has been made possible because of the sheer number of new platforms and tools that have emerged in the technology landscape of data analysis, including Big Data analytics. The drivers for the development and introduction of all these have been the need of businesses to compete ever more fiercely, to improve the way they operate, and to better understand their customers in a very dynamic and ever-changing market environment.

Enter the data scientist. Some functions, such as risk management and pricing in the insurance sector, have had actuaries applying statistics ever since the business first evolved, but the role of the data scientist is a relatively new one, perhaps not more than 15 years old. The role is an evolutionary progression from those earlier ones that used statistics to answer business questions, a key difference being a level of proficiency in certain areas of IT. A data scientist is typically expected to understand data models and how data can be extracted, integrated and analysed. He or she would also have a reasonably strong understanding of a business domain, so as to be able to start with a business question and work backwards to assemble the data needed for statistical analysis and (ultimately) arrive at a predictive model built on correlations established between one variable and one or more other variables. And, of course, a data scientist would be able to present the findings in language that the business can understand.
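To make the idea of a predictive model built on a correlation between variables more concrete, here is a minimal sketch in Python. It fits a straight line by ordinary least squares, which is only one of many possible techniques, and the data (monthly advertising spend against units sold) and the names `fit_line`, `ad_spend` and `units_sold` are invented purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical historic data, made up for this example.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.1, 13.9, 16.2, 17.8, 20.0]

slope, intercept = fit_line(ad_spend, units_sold)

# Use the fitted relationship to predict sales at a spend level not yet seen.
predicted_units = intercept + slope * 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"prediction={predicted_units:.2f}")
```

In practice the harder part is usually what comes before this step: deciding which business question to ask and assembling trustworthy data to answer it.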

Tools like SPSS, SAS, MATLAB and a few others were among the early ones that dominated the market for automating some of the statistical analysis work. Recently, though, with the advent of Big Data, not only have the capabilities of these tools become richer and more sophisticated, but a growing number of tools and technologies have also become available that claim to automate the creation of predictive models to the extent that users can do the work themselves, with much less involvement from data scientists. Data scientists, however, know that although this may be true in theory, in practice it is difficult to achieve if the lifecycle of work starts with the sourcing and extraction of data and works forward to eventually produce a predictive model. The depth of business insight needed to do this, and to keep identifying subtle changes in the environment or in the data itself that need manual intervention to correct course, makes it difficult to automate the entire lifecycle.

If machine learning were used to come up with predictive models, however, the scenario could be different. Machine learning is a broader science that, in a way, works backwards from a business fact (rather than forwards): it searches a larger population of historic data for all sorts of possible relationships to find what may have been causative factors, and then comes up with as many candidate predictive models as possible. Most machine learning technologies take less time to do this than manual predictive modelling. The well-known Google driverless car is an example of applied machine learning, where the set of predictive models in use requires little or no manual intervention once released.
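The "work backwards and try many relationships" idea can be sketched very roughly as follows. This is a toy illustration, not how any particular machine learning product works: it fits one simple model per candidate variable on older records, scores each on records held back from fitting, and ranks them. The variables (`ad_spend`, `store_visits`, `day_of_month`), their values and the helper names are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predicted and actual values."""
    return mean([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)])


def rank_candidates(features, target, n_train):
    """Fit one model per candidate variable and rank by held-out error."""
    results = []
    for name, values in features.items():
        # Fit on the first n_train records only.
        slope, intercept = fit_line(values[:n_train], target[:n_train])
        # Score on the remaining records, which the fit never saw.
        error = mean_squared_error(values[n_train:], target[n_train:],
                                   slope, intercept)
        results.append((error, name))
    return sorted(results)  # smallest error first


# Hypothetical historic records, made up for this example.
history = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "store_visits": [30.0, 34.0, 41.0, 44.0, 52.0, 55.0, 61.0, 70.0],
    "day_of_month": [3.0, 17.0, 9.0, 25.0, 1.0, 12.0, 30.0, 6.0],
}
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]

ranking = rank_candidates(history, units_sold, n_train=6)
best_error, best_name = ranking[0]
for error, name in ranking:
    print(f"{name}: held-out error {error:.3f}")
```

Note that a low held-out error shows only that a variable predicted well on this data; judging whether the relationship is genuinely causal, or will keep holding, still calls for domain understanding.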

At the end of the day, whether or not manual intervention is required is really a matter of knowing how the available technologies work, and that requires a solid understanding of data science combined with a business domain understanding. As the old saying goes in the IT world, a fool with a tool is still a fool. Different techniques and technologies have different capabilities and a good understanding of the sciences and how they are to be applied is the only way to make a prudent judgement on how to balance the use of automation with manual intervention.