If data is the answer, what is the question?
In Douglas Adams’ Hitchhiker’s Guide to the Galaxy, a group of superbeings build a computer – Deep Thought – to learn the ultimate answer to life, the universe and everything. After a very very very long time of deep thought and computation and checking the solution, Deep Thought gives the answer: 42.
Seems meaningless, because it is. Without a context, it is so random that you can read anything into it. The computer itself points out the lack of any real meaning to the answer, because the super-intelligent beings who created it for this sole purpose didn’t actually know what the question was!
Moral of the story: You can have all the data and computational power you need, but unless you ask the right questions, the answer is ultimately meaningless. Unless one is just happy fishing in data lakes hoping for that big insight (read answer) to bite. Or using data visualisation like a Rorschach test, and giving in to confirmation bias.
The best use of data is when the right question is asked.
Because if you ask the right questions and you have a clear purpose in mind, you can accurately predict the rise and fall of empires eons in advance. If that sounds like science fiction, it is for now, at least on the galactic scale.
I’m referring to Psychohistory, the fictional science created by Isaac Asimov in his Foundation series. A form of mathematical sociology coupled with statistics and history, Psychohistory was created by the mathematician Hari Seldon to make predictions about the future on a large scale, using massive amounts of historical data and behavioural information. There are only two conditions: the population whose behaviour is modelled should be sufficiently large, and it should not know the results of the psychohistorical analyses.
Using the data well, Hari Seldon foresees the outcome that psychohistory predicts: The imminent fall of the current empire and a dark age lasting 30,000 years before a second great empire rises again. But also an alternative scenario in which the dark age lasts only a thousand years. Now that the question has been asked and the answers given, Seldon creates the titular ‘Foundation’ to guide humanity towards the more favourable outcome.
So, it’s not just the predictions we derive from data that are important; the assumptions and processes underlying those predictions and projected scenarios are equally significant. They make data tell us not just the ‘what’ but also the ‘why’.
The problem with not asking the right question, and just waiting for something to emerge from data being crunched, is not just that you end up with a meaningless ‘answer’; you also have to invest more to find out the question, and what the context to the answer really is. That is what the superbeings of Hitchhiker’s Guide find out when they ask Deep Thought to come up with the Ultimate Question to 42, the Ultimate Answer. Deep Thought admits it is unequal to the task and suggests building an even greater computer that is up to it, with the big difference that this computer will have living beings as a part of its computational matrix.
That computer turns out to be Earth, which in the world of Hitchhiker’s was created to find the question to which the answer is ‘42’. Unfortunately for the superbeings, Earth is destroyed mere minutes before the Question is computed. Unwilling to start all over again, and still badly in need of a question, the superbeings settle for ‘How many roads must a man walk down?’ The question was blowing in the wind, picked out of thin air. Or rather, picked at random from Bob Dylan’s song. See what happens when you don’t know what you’re asking of data or, even worse, don’t know the questions to the answers you seek!
At the other extreme, the right question could also be the last question, answered only once there is sufficient data. I refer of course to one of the most famous sci-fi short stories ever written, The Last Question by Isaac Asimov, which was also his personal favourite among all of his prodigious output. Here the question comes first, and the data later. Quick aside: I’ve always wanted to put this trick question to corporate types who only talk about ‘big data’: “What came first, the data or your question? If there was a question at all.”
The Last Question features generations of a fictional supercomputer called Multivac trying to answer one simple question, originally asked of it by a couple of drunk technicians making a bet: “Will mankind one day without the net expenditure of energy be able to restore the sun to its full youthfulness even after it had died of old age?” In other words, can entropy be reversed? There is insufficient data in the beginning to answer the question. What happens next will stun you! Honest! There is a reason why this story is so well acclaimed.
It ultimately comes down to the fact that data is nothing without the right goal, the right question. What that is, is up to you. And I hope you agree with me when I say that, as we’ve seen above, if the answer does indeed lie in – and with – data, you can do justice to it just by asking the right question. Because data itself is never the answer. Of course there are other dimensions to data, and we’ve only scratched the surface here. But for now, let’s go back to The Last Question. If you’re wondering where you can read it and why it was the author’s favourite, I’ll do one better and point you in the direction of this awesome graphic rendition of the story by the Korean comic book artist Ryul.
Enjoy the read, Live Long and Prosper, and hope to see you again next week.