
Let's come clean

The dark side of data analytics is data cleansing. A set of SCADA data, for example, is packed with insights about the state of our wind turbines, but unfortunately it is also contaminated by a load of junk. We don't hear about this very often, since no data analyst likes to admit that a significant proportion of their time is spent searching for missing information and correcting errors in format, channel naming and timestamps, along with gaps, jumps and so on. These activities are of no direct value to our stakeholders and are therefore hard to sell. So we prefer to keep our dirty (data) secret and focus instead on all the good stuff we can do, like predictive analytics, performance diagnostics and availability optimisation. But to use operational data to understand the state of our equipment, we must first be able to tell errors in the numbers apart from errors in the machines. This is truly a science in itself, so let's come out and admit it. The first step to recovery is accepting that we have a problem.