This is Part 2 of a 2-part article. Click here to read Part 1.
Downstream Data Operations
Downstream is primarily concerned with the distribution and serving of data. Distribution refers to the transportation of data to its intended targets and is therefore primarily a network infrastructure concept that deals with:
- Latency: How fast data can be delivered. If the data feeds real-time decision making, for example, it has to be delivered promptly, because stale data loses much of its value for that purpose.
- Throughput: The amount (size) of data that is delivered during a given time window (e.g. 1 min). This influences the number of simultaneous users that downstream systems can support, as those users load the data transportation infrastructure in much the same way that residential homes load the power grid.
- Availability: The redundancy and failover capabilities of the distribution network, which determine whether data delivery to downstream systems is guaranteed or non-guaranteed.
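The three concerns above can be put into rough numbers. The following Python sketch is only an illustration under stated assumptions: the `Link` class, its fields and the helper functions are hypothetical names, and the availability formula assumes that redundant paths fail independently of each other.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Link:
    """Hypothetical description of one distribution link (illustrative only)."""
    throughput_mb_per_min: float  # data volume the link can deliver per minute
    latency_seconds: float        # time for a data item to reach its target


def max_simultaneous_users(link: Link, per_user_mb_per_min: float) -> int:
    """Throughput: users the link can carry if each draws per_user_mb_per_min."""
    if per_user_mb_per_min <= 0:
        raise ValueError("per-user demand must be positive")
    return int(link.throughput_mb_per_min // per_user_mb_per_min)


def is_stale(link: Link, freshness_budget_seconds: float) -> bool:
    """Latency: True if delivery time alone already exceeds the freshness budget."""
    return link.latency_seconds > freshness_budget_seconds


def combined_availability(path_availabilities: Iterable[float]) -> float:
    """Availability of redundant paths, assuming they fail independently.

    Delivery fails only if every path is down at the same time.
    """
    all_down = 1.0
    for availability in path_availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down
```

For example, under these assumptions a link delivering 600 MB per minute supports at most 150 users who each draw 4 MB per minute, and two independent paths that are each up 99% of the time are jointly up about 99.99% of the time.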
As in midstream, cost is a big factor in specifying the elements of the distribution network that deal with each of those concerns, and pragmatic choices have to be made to attain the optimum cost/value mix (e.g. choose high-cost, redundant distribution networks only for business-critical downstream feeds and stick to low-cost, non-redundant networks for all others).
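One way to picture this cost/value trade-off is as choosing, for each feed, the cheapest network option that still meets the feed's availability requirement. The sketch below is an assumption-laden illustration: `NetworkOption`, `cheapest_adequate` and all costs and availability figures in the usage note are made up for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NetworkOption:
    """Hypothetical distribution network choice (figures are illustrative)."""
    name: str
    monthly_cost: float
    availability: float  # fraction of time the network delivers, 0.0 to 1.0


def cheapest_adequate(options: Sequence[NetworkOption],
                      required_availability: float) -> NetworkOption:
    """Return the lowest-cost option that meets the feed's availability need."""
    adequate = [o for o in options if o.availability >= required_availability]
    if not adequate:
        raise ValueError("no network option meets the required availability")
    return min(adequate, key=lambda o: o.monthly_cost)
```

With a redundant option at 0.9999 availability and a basic option at 0.99, a business-critical feed that requires 0.999 ends up on the redundant network, while a feed that only requires 0.95 stays on the cheaper basic one.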
Serving (or servicing) of the data is the second, and perhaps the most important, concern of the downstream. Serving refers to the set of processes, tools and transformations that make data ready to be consumed by end users once it arrives from the distribution network. This covers not only how users consume the data (see the following discussion) but also all the transformations the data has to go through to make such consumption possible. These transformations take place on the fly (i.e. at the moment the data is consumed) and can be customized (personalized) to optimize the data consumption experience of a given user.
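As a minimal sketch of such an on-the-fly, personalized transformation, the Python generator below shapes each record only at the moment a user consumes it. The `UserProfile` class, the `serve` function and the record fields are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical per-user serving preferences."""
    fields: Tuple[str, ...]  # which fields this user wants to see
    decimals: int            # rounding applied to floating-point values


def serve(records: Iterable[Dict[str, object]],
          profile: UserProfile) -> Iterator[Dict[str, object]]:
    """Lazily transform records for one user as they are consumed."""
    for record in records:
        view: Dict[str, object] = {}
        for name in profile.fields:
            value = record[name]
            # Only floats are rounded; other values pass through unchanged.
            if isinstance(value, float):
                value = round(value, profile.decimals)
            view[name] = value
        yield view
```

Because `serve` is a generator, nothing is transformed until the consumer iterates over it, and two users with different profiles can receive different views of the same underlying records.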
The data could be served to end users in two fundamentally different modes:
- Push: This is the traditional, report-oriented way, where the scope of the data made available is largely predetermined and any expansion of this scope requires changes to the downstream systems (e.g. system developers adding new report types).
- Pull: This is the self-service approach, where users are provided with a set of tools they can use to get the data they want. This can be achieved either through technically sophisticated interfaces (e.g. by providing programming-language-based interfaces such as SQL query windows) or through easy-to-use, designer-oriented graphical interfaces. The former sacrifice ease of use and a gentle learning curve for flexibility and breadth of functionality, while the latter are more suitable for users who are not very technically oriented. Since both kinds of users exist, the ideal is to provide both kinds of interfaces: complicated but powerful programming-based interfaces for advanced users, and simple but limited graphical interfaces for common users.
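The difference between the two modes can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the `trades` table, its rows, the `REPORTS` catalogue and the function names are all hypothetical, and the `SELECT`-only check is a simplistic guard, not a real security boundary.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_store() -> sqlite3.Connection:
    """Create a small in-memory store with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (region TEXT, volume INTEGER)")
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?)",
        [("EU", 120), ("US", 300), ("EU", 80)],
    )
    return conn


# Push: the set of reports is fixed; adding one means changing this code.
REPORTS = {
    "volume_by_region": (
        "SELECT region, SUM(volume) FROM trades "
        "GROUP BY region ORDER BY region"
    ),
}


def push_report(conn: sqlite3.Connection, name: str) -> List[Tuple]:
    """Run one of the predefined reports by name."""
    return conn.execute(REPORTS[name]).fetchall()


def pull_query(conn: sqlite3.Connection, sql: str,
               params: Sequence = ()) -> List[Tuple]:
    """Pull: run a query written by the user (read-only statements only)."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql, params).fetchall()
```

In the push case a user can only ask for `volume_by_region`, whereas in the pull case the same user can write any `SELECT` statement, for instance one that lists individual trade volumes for a single region.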
Recognizing the data value chain concept and the three segments it is composed of (upstream, midstream and downstream) allows us to choose methods and technologies specific to each segment, as opposed to a one-size-fits-all approach. This brings two important benefits: the complex and diverse set of data operations that take place in the data value chain becomes easier to manage and operate, and those operations can be improved in an incremental and agile way.
Cetin Karakus will be speaking at the Chief Analytics Officer, Spring, taking place on May 2-4, 2017 in Scottsdale, Arizona. For more information, visit http://coriniumintelligence.com/chiefanalyticsofficerspring
Disclaimer: All the content provided here is for informational purposes only and belongs solely and completely to Cetin Karakus, not BP. BP is not responsible for any damage caused by any use of the content provided in this article.
By Cetin Karakus
Cetin Karakus is the Global Head of Analytics Core Strategies & Quantitative Development, Group Technology Advisor, BP IST IT&S
Cetin Karakus has almost two decades of experience in designing and building large-scale software systems. Over the last decade, he has worked on the design and development of complex derivatives pricing and risk management systems in leading global investment banks and commodity trading houses. Prior to that, he worked on various large-scale systems ranging from VOIP stacks to ERP systems.
In his current role, he has had the opportunity to build an investment-bank-grade quantitative derivatives pricing and risk infrastructure from scratch. Most recently, he has been designing a proprietary, state-of-the-art Big Data analytics platform for global energy markets while leading a global team of talented software engineers, analysts and data scientists.
Cetin has a degree in Electrical & Electronics Engineering and enjoys thinking and reading on various fields of humanities in his free time.