We live inside an explosion of data. More information is being created, and more devices are networked, than ever before, and the trend shows no sign of slowing. While this makes data easy for companies to collect, it also presents the challenge of sheer scale. How does a business handle data from millions, possibly billions, of sources?
To gain some insight into the cutting edge of distributed data collection, Jeff Frick (@JeffFrick), co-host of theCUBE, from the SiliconANGLE Media team, visited the Chief Data Scientist, USA event in San Francisco, CA. There, he met up with Sam Lightstone, distinguished engineer and chief architect for data warehousing at IBM.
The discussion opened with a look at a recently announced concept technology called “Data Confluence.” Lightstone explained that data confluence is a whole new idea being incubated at IBM. It came from the realization that vast amounts of data are about to descend on businesses from distributed sources such as cellphones, cars and smart glasses.
“It’s really a deluge of data,” Lightstone said.
The idea behind data confluence is to leave the data where it is. Lightstone described it as allowing the data sources to find each other and collaborate on data science problems in a computational mesh.
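The core idea of pushing computation out to the data, rather than pulling the data in, can be sketched in a few lines. The following is a minimal illustration, not IBM's actual technology: each hypothetical device reduces its raw readings to a tiny summary, and only those summaries travel to be merged, so a global statistic is computed without the raw data ever leaving its source.

```python
# Illustrative sketch of "move the query to the data": each node
# computes a local summary of the data it holds, and only the small
# summaries are shipped and combined. Names here are hypothetical.

def local_summary(readings):
    """Run on each device: reduce raw readings to (sum, count)."""
    return sum(readings), len(readings)

def combine(summaries):
    """Merge per-device summaries into one global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count

# Three hypothetical edge devices, each keeping its raw data local.
devices = [
    [21.0, 22.5, 19.8],         # e.g. a car's sensor log
    [30.1, 28.7],               # a phone
    [25.0, 24.2, 26.3, 25.5],   # smart glasses
]

summaries = [local_summary(d) for d in devices]  # tiny payloads move
global_mean = combine(summaries)                 # the raw data never did
```

The payoff is in the arithmetic: nine raw readings stay where they were generated, and only three small (sum, count) pairs cross the network, a saving that grows with the size of each device's data.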
Using the power of processors at scale
Lightstone cited a key advantage of this concept: the ability to bring hundreds of thousands, even millions, of processors to bear on data where it lives. He called this a very powerful and necessary capability. Such a network must be automatic if it is to scale to hundreds of thousands of devices.
The complexities of such a system are too much for humans to manage by hand. Lightstone stated that his goal was to make it automatic and resilient, adapting to the state of the devices connected to it. With data confluence, he said, IBM hopes to tap into data science for Internet of Things, enterprise and cloud use cases.
Watch the complete video interview below: