Big Data solutions address the problem of ever-growing volumes of data arriving from diverse sources. Correlating and efficiently processing this information is how a company can make full use of the resources it owns.
Big Data in search of value
The modern world generates more and more data. This data is highly diverse, unstructured and variable: data from operational, transactional, scanning or facility management systems, emails, messages, websites, and posts and comments on social networks. Until a few years ago, processing data of this kind was technologically impossible. Now, thanks to Big Data platforms such as Hadoop, it is possible and cost-effective to process even petabytes of data. Why, then, according to a survey by Forrester Research, do companies analyze only 12% of the data they collect in search of business value?
A medium-sized retail chain in Poland handles about 50 million transactions per year, generating more than 400 million receipt line items.
A nationwide bank serves 3 million clients. The behavior and history of each client are described by several thousand attributes, so the scale of the bank requires analyzing more than 3 billion data points.
An online sales channel in the B2C model handles 250 thousand visitors per day. With an average of 6-7 page views per visitor and approximately 100 products presented on the website, millions of events ("clickstreams") are generated, carrying information about the quality of the sales channel and buyers' preferences. Analyzing this data in real time makes it possible to build effective recommendation models.
The use of large data sets
Technology already allows organizations not only to collect every byte of data, but, more importantly, to understand large data sets and use their value to make better business decisions.
Advanced business analytics
The tremendous increase in data generated, collected and processed by companies has resulted in the dynamic development of business analytics.
How to use Hadoop?
The Hadoop platform may complement the heterogeneous IT infrastructure of an enterprise, allowing for efficient collection and processing of large amounts of diverse and variable data.
Hadoop was designed as a distributed environment in the form of a cluster, which performs two basic functions:
data collection (storage)
effective data processing
Because the data in the cluster grows continuously, scalability is extremely important: capacity is extended by adding further machines, which, in principle, do not have to be high-end servers. Another important factor is data security, which in the case of a Hadoop cluster concerns both access control and protecting the data against loss in the event of a hardware failure.