After identifying the right team, the actual analytics work can begin, and that’s Step 5 of the cycle.
Step 5: Execute and iterate
That sounds simple enough, but in reality it takes time to produce results. First comes the implementation of a framework to collect, process, cleanse and integrate the necessary data. If very large volumes of data need to be gathered continuously, a Hadoop implementation may be necessary. Getting the right data in place, and scrubbing it to a level of quality adequate for statistically significant results, can be a painstaking task. Large parts of the scrubbing and integration can be automated once the business rules for doing so are created and implemented, but some manual intervention is still likely to be required. Many types of data issues can arise, such as inadequacy, inconsistency, errors and so on. Depending on the quality of the original data, this stage of the project could easily account for 60% or more of the total project effort. It is important that it be done patiently and meticulously nonetheless. After all, data is the prime input to the whole analytics exercise, and the quality of the results will depend at least partly on the quality of the input data.
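As a rough illustration of automated scrubbing with a manual fallback, the sketch below applies a few business rules to raw records and sets aside anything that fails for human review. The field names, the country alias table and the rules themselves are hypothetical, invented for this example only; a real project would derive them from its own data and domain experts.

```python
# Hypothetical raw records; field names and values are made up for illustration.
RAW_RECORDS = [
    {"customer_id": "C1", "country": "uk", "monthly_spend": "120.50"},
    {"customer_id": "C2", "country": "United Kingdom", "monthly_spend": "n/a"},
    {"customer_id": "", "country": "US", "monthly_spend": "80"},
    {"customer_id": "C4", "country": "U.S.", "monthly_spend": "-15"},
    {"customer_id": "C5", "country": "Narnia", "monthly_spend": "60"},
]

# Assumed business rule: map known spellings to one consistent country code.
COUNTRY_ALIASES = {"uk": "GB", "united kingdom": "GB", "us": "US", "u.s.": "US"}


def cleanse(records):
    """Split records into cleaned rows and rows needing manual review."""
    clean, review = [], []
    for rec in records:
        issues = []

        # Adequacy: the key field must be present.
        customer_id = rec["customer_id"].strip()
        if not customer_id:
            issues.append("missing customer_id")

        # Consistency: normalise country spellings to a single code.
        country = COUNTRY_ALIASES.get(rec["country"].strip().lower())
        if country is None:
            issues.append("unknown country")

        # Errors: spend must be a non-negative number.
        try:
            spend = float(rec["monthly_spend"])
            if spend < 0:
                issues.append("negative spend")
        except ValueError:
            spend = None
            issues.append("non-numeric spend")

        if issues:
            # Rules cannot repair this record; leave it for a person.
            review.append((rec, issues))
        else:
            clean.append(
                {"customer_id": customer_id, "country": country, "monthly_spend": spend}
            )
    return clean, review


clean_rows, review_rows = cleanse(RAW_RECORDS)
print(f"{len(clean_rows)} clean, {len(review_rows)} flagged for manual review")
```

Keeping the rejected records together with the reasons they failed makes the manual step easier to manage, and recurring reasons are good candidates for new automated rules.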
Statistical analysis of the data usually has to follow a process of experimentation and iteration. Decisions must be made on the best treatments to apply, and correlations, outliers and the like need to be identified correctly at every stage with the help of a business domain subject matter expert. Once the best model has been arrived at, it needs to be validated repeatedly before it can be accepted for further use. Analytics is also an exercise in patience and repeated reviews. Predictive analytics is all about statistical forecasting, and the best forecast is one that applies the most comprehensive treatment to the most appropriate data available. Models need to be run repeatedly, and since many external variables may change over time, these changes need to be tracked continuously and factored into the models periodically.
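One common way to validate a model repeatedly is k-fold cross-validation: fit on part of the data, measure the error on the held-out part, and rotate. The sketch below does this for a simple least-squares line on synthetic data. The data, the choice of model and the helper names (`fit_line`, `k_fold_rmse`) are assumptions made for illustration, not a method prescribed by this article.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Return the held-out root-mean-square error for each of k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds

    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors = [
            (ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out
        ]
        scores.append((sum(squared_errors) / len(squared_errors)) ** 0.5)
    return scores


# Synthetic data (an assumption for this example): a linear trend plus noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [3.0 * x + 10.0 + rng.gauss(0, 2.0) for x in xs]

scores = k_fold_rmse(xs, ys, k=5)
print("RMSE per fold:", [round(s, 2) for s in scores])
```

If the per-fold errors vary widely, or drift upward when the same check is rerun on newer data, that suggests the model needs to be revisited, which is the kind of periodic re-examination described above.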