How to break down data silos and fully use the business potential of data?
28 October 2019
An interview with Kamil Folkert, PhD, CTO at 3Soft.
3Soft: Why break down data silos?
In my opinion, data silos are the biggest obstacle to implementing the Data-Driven Business approach in organizations. Rigid structures and processes established over the years inhibit the flow of valuable data, which hampers both the achievement of strategic objectives and the building of a competitive advantage.
Individual departments within a company work with only fragments of the available information, and these fragments do not reflect the real situation. This leads to bad decisions based on incomplete data.
Paradoxically, for some managers silos are an alibi of sorts, allowing them to avoid giving comprehensive answers to difficult questions. In large companies it becomes problematic, for example, to precisely determine the number of customers. The painstaking task of connecting the dispersed pieces of information together is bypassed by reducing the scope of data to a particular department or region. Although correct, this data does not present the full business potential, nor does it allow for a holistic view of the situation.
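To illustrate why the number of customers is hard to pin down across silos, here is a minimal Python sketch. It assumes two hypothetical departmental exports (`sales_records` and `support_records`) that identify the same customers by e-mail addresses written inconsistently. The record layout, the sample data and the normalization rule are assumptions made for illustration, not a description of any particular system.

```python
def normalize_email(email: str) -> str:
    """Normalize an e-mail so the same customer matches across silos."""
    return email.strip().lower()


def count_unique_customers(*silos: list[dict]) -> int:
    """Count distinct customers across several departmental record sets."""
    seen = set()
    for records in silos:
        for record in records:
            seen.add(normalize_email(record["email"]))
    return len(seen)


# Hypothetical exports from two departments (illustrative data only).
sales_records = [
    {"email": "anna@example.com", "region": "north"},
    {"email": "piotr@example.com", "region": "south"},
]
support_records = [
    {"email": " Anna@Example.com ", "ticket": 101},
    {"email": "ewa@example.com", "ticket": 102},
]

# Adding up per-department counts overstates the total,
# because the same customer appears in both silos.
naive_total = len(sales_records) + len(support_records)
integrated_total = count_unique_customers(sales_records, support_records)
print(naive_total, integrated_total)
```

Each department's own figure is correct within its scope, yet only the integrated view gives the company-wide answer.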
3Soft: How do Data Science teams handle data silos?
The transformation towards Data-Driven Business does not simply mean launching a platform, chaotically feeding it with raw data and sharing it with the Data Science team. This process requires making profound changes in the organizational culture, getting the organization to focus on breaking down the silos, sharing information and ensuring data quality.
From the analysts’ point of view, fragmented data is an obstacle to developing models and identifying patterns, trends and cause-effect relationships. Effective modeling of complex business events with the use of automated algorithms is only possible on the basis of valuable, complete and consistent data.
The key to success is the correct integration of data. We have a number of technological solutions at our disposal: programming frameworks, programming languages and models of distributed data processing. Implementing the Data Lake concept allows all of a company’s data to be stored flexibly and cost-effectively in one place.
3Soft: Why, despite the technological possibilities, are there still data silos in many companies?
I think there are several reasons. First of all, dismantling data silos is a lengthy process that requires cross-departmental involvement of employees at all levels of the organizational structure. It is necessary to introduce a number of new initiatives and procedures that will help to change the organization’s approach to data. The priority is to build an organizational culture based on ensuring that data is always consistent, high-quality and up-to-date. Such data is bound to have real business value.
From the technological point of view, the dismantling of data silos also requires a systemic approach. Digitalization offers the possibility of using many specialized tools that enable analyzing data from various sources. However, the use of each of these tools requires a different set of competences. Sometimes it is difficult or even impossible for a person who has specialized in one technology to quickly learn another one. Consequently, the lack of appropriate competences prevents conducting a cross-cutting analysis of data from all areas.
Additionally, companies face the problem of technical debt. Modern technologies offer many opportunities, but replacing old systems with new ones is generally costly and sometimes even impossible for organizational reasons. As a consequence, duplicate data has to be stored in several places, which makes organizing and integrating it quite a challenge.
3Soft: How does one get started to effectively dismantle data silos?
In my opinion, the answer is the Data Lake architecture. Applying this concept brings evolution – not revolution – to the process of dismantling silos. We offer our customers a platform implemented alongside the existing architecture, so that it does not interfere with the operation of the systems already used in the organization. This approach makes it possible to run an integrated data management platform safely and effectively. The decision on when to transfer all the processes to the new platform and get rid of the technical debt is left to the people responsible for the individual processes in the organization.
Integrated data management platforms built by the 3Soft team enable stream and batch processing of structured and unstructured data at every stage – from loading it from source systems, through technical and business transformations, data cleaning and imputation, exploratory data analysis (preliminary conclusions), statistical modelling, process automation (model training and the daily application of models in business), to building structures for reports and dashboards.
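The sequence of stages described above can be sketched as a chain of small functions. This is a minimal, batch-only illustration in plain Python, not the platform itself: the stage names, the record layout and the mean-imputation rule are all assumptions chosen to keep the example short.

```python
from statistics import mean


def load(source_rows):
    """Loading stage: copy raw rows from a (hypothetical) source system."""
    return [dict(row) for row in source_rows]


def transform(rows):
    """Technical transformation: cast amount strings to floats, keep gaps as None."""
    out = []
    for row in rows:
        raw = row["amount"]
        amount = float(raw) if raw not in (None, "") else None
        out.append({**row, "amount": amount})
    return out


def impute(rows):
    """Cleaning and imputation: fill missing amounts with the mean of known ones."""
    known = [r["amount"] for r in rows if r["amount"] is not None]
    fill = mean(known) if known else 0.0
    return [
        {**r, "amount": fill if r["amount"] is None else r["amount"]}
        for r in rows
    ]


def report(rows):
    """Reporting stage: build a total-per-region structure for a dashboard."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


def run_pipeline(source_rows, stages):
    """Apply each stage in order, passing the output of one to the next."""
    data = source_rows
    for stage in stages:
        data = stage(data)
    return data


# Illustrative input only: one row has a missing amount.
raw_rows = [
    {"region": "north", "amount": "100"},
    {"region": "north", "amount": ""},
    {"region": "south", "amount": "300"},
]
summary = run_pipeline(raw_rows, [load, transform, impute, report])
print(summary)
```

Keeping every stage as a separate function means a stage can be replaced, for example by a statistical model or a streaming variant, without touching the others.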
We make sure that the entire process is conducted in a way that enables efficient data management, so that new data sources can be identified and described, metadata can be defined, owners of specific data can be assigned and a dictionary of business terms can be managed. All this makes it possible to answer, when necessary, the questions of where particular data comes from and how a change of sources affects the machine learning models and reports that have already been developed.
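The two questions above, where data comes from and what a source change affects, are lineage questions. Below is a minimal Python sketch of a metadata catalog that can answer them. The `Catalog` and `DataSource` classes and the sample source and artifact names are hypothetical and are not part of any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    owner: str
    description: str


@dataclass
class Catalog:
    sources: dict = field(default_factory=dict)
    # Maps an artifact (model or report) to the names it is built from.
    dependencies: dict = field(default_factory=dict)

    def register_source(self, source: DataSource) -> None:
        self.sources[source.name] = source

    def register_artifact(self, name, inputs) -> None:
        self.dependencies[name] = set(inputs)

    def impacted_by(self, source_name):
        """Artifacts that depend, directly or indirectly, on the given source."""
        impacted = set()
        frontier = {source_name}
        while frontier:
            current = frontier.pop()
            for artifact, inputs in self.dependencies.items():
                if current in inputs and artifact not in impacted:
                    impacted.add(artifact)
                    frontier.add(artifact)
        return impacted

    def origins_of(self, artifact_name):
        """Registered sources that an artifact is ultimately built from."""
        origins, visited, stack = set(), set(), [artifact_name]
        while stack:
            current = stack.pop()
            if current in visited:
                continue
            visited.add(current)
            for item in self.dependencies.get(current, ()):
                if item in self.sources:
                    origins.add(item)
                else:
                    stack.append(item)
        return origins


# Illustrative entries only.
catalog = Catalog()
catalog.register_source(DataSource("crm", "Sales", "Customer master data"))
catalog.register_source(DataSource("billing", "Finance", "Invoices and payments"))
catalog.register_artifact("churn_model", ["crm", "billing"])
catalog.register_artifact("churn_dashboard", ["churn_model"])
catalog.register_artifact("revenue_report", ["billing"])

print(sorted(catalog.impacted_by("crm")))
print(sorted(catalog.origins_of("churn_dashboard")))
```

In this sketch a change to the `crm` source flags both the model and the dashboard built on top of it, while the revenue report is left untouched.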
3Soft: What challenges are associated with building the platform?
It is worth pointing out that until now, data silos were usually created in systems maintained in infrastructure located in one or more data centers. However, more and more companies are starting to use cloud solutions, so silos are being created at a completely new level: between clouds offered by different suppliers. There is therefore a need to build multi-cloud or hybrid platforms. At 3Soft, we meet these challenges by offering solutions based on the Hadoop technology distributed by Cloudera – a supplier that focuses on developing its product in the any-cloud and cloud-first model. As a result, we have built a team with a unique set of competences, which allows us both to fully exploit the potential of machine learning and artificial intelligence and to draw on the latest technological advances related to the cloud (in particular, we have observed an increasingly dynamic adoption of Microsoft Azure in Poland). These advances enable dynamic scaling of environments to match the current load, agile launching of new clusters on demand and implementation of disaster recovery based on several linked cloud data centers.
3Soft: How do you cooperate with customers in terms of the silos removal strategy?
First, we discuss the main problems and diagnose the situation during workshops with data owners. Together we examine domain systems to see whether they contain complete data, whether the data is reconcilable and how data streams and stream processing are integrated. This allows us to evaluate which data acquisition stages are most neglected and where the lowest-hanging fruit is. On the basis of the collected information, we design a data management platform, taking into account scalability and specific processing possibilities. Our solutions are implemented in distributed environments, depending on customer needs – on-premises, in the cloud or as hybrid solutions. They also include functions relating to security, data management and business continuity. While developing the data architecture and implementing the platform, we cooperate closely with the client’s team, as the proposed solutions address individual business problems and business needs. The implementation process is carried out iteratively, to a strictly maintained schedule that includes service windows, and we also provide continuous supervision by a project architect.
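The completeness and reconcilability checks mentioned above can be expressed in a few lines. The following Python sketch assumes two hypothetical domain systems (`crm` and `billing`) that share a `customer_id` key. The field names and sample records are assumptions for illustration only.

```python
def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(f) not in (None, "") for f in required_fields)
        for rec in records
    )
    return complete / len(records)


def reconcile(system_a, system_b, key):
    """Compare the key sets of two systems and report what is missing where."""
    keys_a = {rec[key] for rec in system_a}
    keys_b = {rec[key] for rec in system_b}
    return {
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        "in_both": sorted(keys_a & keys_b),
    }


# Illustrative records only.
crm = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 3, "email": "c@example.com"},
    {"customer_id": 4, "email": "d@example.com"},
]
billing = [
    {"customer_id": 2, "balance": 10.0},
    {"customer_id": 3, "balance": 0.0},
    {"customer_id": 5, "balance": 25.0},
]

crm_completeness = completeness(crm, ["customer_id", "email"])
diff = reconcile(crm, billing, "customer_id")
print(crm_completeness, diff)
```

Records that appear in only one system, or that lack required fields, are natural candidates for the first round of cleanup.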
3Soft: What should be the first step towards eliminating data silos in an organization?
I definitely recommend a workshop meeting with 3Soft experts. It will help to quickly identify the most important problems and to plan further steps.