Recently, at the Gartner BI Conference, IBM announced its Dynamic Warehousing initiative. Dynamic Warehousing is about "real-time", "incorporation of unstructured data into warehouses", "embedding business intelligence into business processes", and about our other "Information on Demand" initiatives. Today, I will talk to you about the incorporation of unstructured data into warehouses.
Many of us are able to spout the statistic -- over 85% of the data in an enterprise are unstructured. Yeah? And it needs to be managed! Therefore good for the content management guys! But is it really useful for business intelligence? Everyone has a nagging suspicion that not looking at 85% of your data cannot be good for making business decisions, but then, they are doing fine without it, no? It turns out that we are just at the beginning of exploiting unstructured information for the purpose of business intelligence.
Take a call center application, for example, in the automotive space. Of course, standard, structured information about who called, when they called, etc. is available. But attached to the call record are the notes of the CSR. And what do they contain? Often jewels of information. Mistyped ("break" as opposed to "brake"), abbreviated ("bk" as opposed to "brake"), but information that is critical if one is to determine what types of problems are occurring in the field. Dimensional information is hidden in these notes. If we could only extract one more dimension, say "part", wouldn't my business intelligence be significantly improved? Indeed, and that is what our dynamic warehousing capabilities enable our customers to do. Through analytics powered by UIMA, metadata are extracted from unstructured columns and added to the dimensional structure. And as a corollary, the millions of dollars invested in an enterprise warehouse -- the ETL, the warehouse database, and the business intelligence tools -- can all be leveraged over unstructured data.
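To make the "one more dimension" idea concrete, here is a minimal sketch in Python. It is not UIMA and not how our products do it; it is a naive dictionary lookup, and the synonym table, the sample notes, and the function name `extract_part` are all made up for illustration. It only shows the shape of the result: a free-text column turned into a "part" value that can be counted like any other dimension.

```python
import re
from collections import Counter

# Hypothetical lookup table: surface forms found in CSR notes -> canonical part.
# A real annotator would need context ("break" is not always a misspelled "brake").
PART_SYNONYMS = {
    "brake": "brake", "brakes": "brake",
    "break": "brake", "breaks": "brake", "bk": "brake",
    "transmission": "transmission", "trans": "transmission",
}

def extract_part(note):
    """Return the first canonical part mentioned in a note, or 'unknown'."""
    for token in re.findall(r"[a-z]+", note.lower()):
        if token in PART_SYNONYMS:
            return PART_SYNONYMS[token]
    return "unknown"

# Invented call records: structured fields plus the free-text CSR note.
calls = [
    {"call_id": 1, "note": "Cust says front bk squeals when stopping"},
    {"call_id": 2, "note": "Break pedal feels soft"},
    {"call_id": 3, "note": "trans slips in 2nd gear"},
    {"call_id": 4, "note": "Wants to know dealer hours"},
]

# Add the extracted value as a new dimension on each record.
for call in calls:
    call["part"] = extract_part(call["note"])

# Once "part" is a column, it can be aggregated like any other dimension.
calls_by_part = Counter(call["part"] for call in calls)
```

With "part" available as a column, the existing ETL, warehouse, and reporting tools can group and filter on it without knowing that it came from text.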
Read Marc Andrew's posting on several excellent examples of companies exploiting unstructured data in their warehousing/business intelligence play. As I said, we are just at the beginning...
This is an exciting area for me as a CTO, for several reasons.
First, this is just one of the techniques that people are using. Another class of techniques adds business intelligence to unstructured data by building different data structures, including, for example, the facets technology found in IBM Omnifind and Endeca.
The research into extracting structure out of unstructured data is of course very rich, and many players (e.g., Autonomy, Attensity, Intelliseek, nStein, etc.) provide such capabilities, many around the IBM UIMA framework. However, new techniques are emerging, such as Avatar, being pioneered by a set of young (at least compared to me) researchers at IBM's Almaden Research Center. And people are looking at what it means to do OLAP on (often uncertain or probabilistic) information derived from unstructured data. After all, in general, if a sale to customer x occurred in quarter y, modulo deliberate malfeasance, one could depend on that fact for business intelligence. But if an analytical technique tells us that the customer problem is about product z, can we really do rollups and aggregates on it?
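A small sketch shows why the rollup question is not trivial. Assume, purely for illustration, that an analytic step attaches a confidence between 0 and 1 to each "this call is about product z" fact. The data, the 0.5 threshold, and the function name `rollup` below are invented; this is not Avatar or any IBM technique. It simply compares two ways of counting: treat every fact above a threshold as certain, or sum the confidences to get an expected count.

```python
from collections import defaultdict

# Invented output of a text-analytics step: each call is assigned to a
# product with a confidence, unlike a sale, which is simply a fact.
facts = [
    {"call_id": 1, "product": "z", "confidence": 0.9},
    {"call_id": 2, "product": "z", "confidence": 0.6},
    {"call_id": 3, "product": "w", "confidence": 0.8},
    {"call_id": 4, "product": "z", "confidence": 0.3},
]

def rollup(facts, threshold=0.5):
    """Aggregate calls per product in two ways.

    hard:     count facts whose confidence meets the threshold.
    expected: sum of confidences, i.e. the expected number of calls
              if each confidence is read as a probability.
    """
    hard = defaultdict(int)
    expected = defaultdict(float)
    for fact in facts:
        expected[fact["product"]] += fact["confidence"]
        if fact["confidence"] >= threshold:
            hard[fact["product"]] += 1
    return dict(hard), dict(expected)

hard_counts, expected_counts = rollup(facts)
```

The two aggregates need not agree, and the hard count shifts with the threshold. Which number belongs in the report is exactly the kind of question this research has to answer.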
New interaction paradigms (switching between the BI and search metaphors irrespective of the underlying corpus) are beginning to emerge.
The new combination of structured and unstructured information analysis techniques is not only coming across as horizontal product capabilities (such as in IBM's data warehouse edition), but equally importantly, these techniques are being assembled to solve real customer problems. An important step in this direction is IBM's Omnifind Analytics Edition, which is a solution enabler for our clients' customer care, quality insight, market insight, risk and compliance, and research and intelligence problems that require the leveraging of structured and unstructured data.
So our dynamic warehousing capabilities are a very important step in the direction of bringing unstructured data into the existing business intelligence infrastructure, and this will continue to be an area with a lot of innovation, including, but not exclusively, from IBM.