Ultra-Large Systems and Big Data - HANA is not the solution

The big word of the second decade of this century will be "big data". The volume of information is growing steadily and exponentially, and the demand to access information anywhere, from everywhere, is unstoppable. As a consequence, information will float around the internet, and eventually every computer in the world will be able to talk to any other computer. We are no longer talking about a mere network of computers but about ultra-large systems. Typical use cases for such ultra-large systems are high-performance web shops and information harvesters.

High-performance web shops

The classic among them is Amazon.com, which can serve several million online shoppers simultaneously without noticeable performance degradation.

Information harvesters and web crawlers

These are web-aware business intelligence systems that search the web for useful information. The 2012 classics are forecast systems for financial markets: viewed abstractly, they are identical to weather forecast systems, with the difference that they do not predict real storms but financial storms. They try to detect signs of turbulence in the financial markets that may lead to a financial storm like a "Black Friday" or the Lehman Brothers crash, with the objective of being prepared early enough to take either counter-action or safety measures.

Will HANA help?

Whenever a new hype or trend comes along with complex challenges, a panacea promising a simple solution to all problems is never far away. SAP HANA appears to be such a thing. What is the problem with big data? We have an ultra-large amount of data, which in many cases is even growing exponentially, and the main task is the dynamic and fast retrieval of well-defined subsets of this data, along with complex and often non-deterministic computations on them. And what is SAP HANA? In its current version it is an enhanced version of the former APO liveCache: a huge and expensive amount of RAM that holds data in memory for faster computation. It may be helpful for certain use cases, but it is no disruptive technology. The main objective is fast data retrieval; data is loaded into RAM to get higher access speeds than reading it from a hard drive. This sounds logical at first. But when you look twice you easily recognize that storing data on modern solid-state drives (SSDs) gives you nearly the same access speed as a RAM buffer. And using SSDs allows you to leave your installation as it is today and just replace the hard drives.
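To make that comparison tangible, here is a minimal, purely illustrative timing sketch in Python (file name, data size and the toy "analysis" are made up): it runs the same scan once over data already held in a RAM buffer and once reading the data from an SSD-backed file on demand. Note that the operating system's page cache may serve the second read from memory as well, which only shows how much caching the platform already does on its own.

import os
import time

PATH = "measurements.bin"        # hypothetical sample file, assumed to live on an SSD
SIZE = 64 * 1024 * 1024          # 64 MiB of toy data

# Create the sample file once; random bytes stand in for real records.
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.write(os.urandom(SIZE))

def scan(buf):
    # Toy "analysis": a full scan counting one byte value.
    return buf.count(0xFF)

# Case 1: data already loaded into RAM (the in-memory approach).
with open(PATH, "rb") as f:
    in_ram = f.read()
t0 = time.perf_counter()
scan(in_ram)
print(f"scan from RAM buffer: {time.perf_counter() - t0:.4f} s")

# Case 2: data read from the drive on demand (the SSD approach).
t0 = time.perf_counter()
with open(PATH, "rb") as f:
    scan(f.read())
print(f"scan incl. SSD read:  {time.perf_counter() - t0:.4f} s")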

SSDs bring the same acceleration as in-memory computing

But even if you do not want to use SSDs, it makes more sense to use the incredible amount of RAM claimed by HANA to build an ultra-large cache. And indeed, manufacturers of hard drives are selling their latest devices with generous amounts of cache RAM. A cache is definitely superior to the HANA concept. HANA extracts data from a data source like an ERP system and stores it in volatile memory for the sole purpose of data analysis. Using a drive cache does largely the same, but dynamically. The client application can access data from fast RAM just as it would with HANA, with the essential difference that no separate process is needed to copy the data from a data source into a working space.
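The idea can be sketched as a simple read-through cache on the application side. The class below and its fetch_from_source helper are purely hypothetical, not part of any real product: the first access to a record goes to the slow source, every repeated access is answered from RAM, and the least recently used entries are evicted once the cache is full; no separate extract-and-load run is needed.

from collections import OrderedDict

def fetch_from_source(key):
    # Stand-in for a slow read from the ERP database or the drive.
    return {"key": key, "value": f"row for {key}"}

class ReadThroughCache:
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key in self._store:
            self._store.move_to_end(key)       # mark as recently used
            return self._store[key]            # answered from RAM
        record = fetch_from_source(key)        # slow path, hits the drive
        self._store[key] = record
        if len(self._store) > self.capacity:   # evict the least recently used entry
            self._store.popitem(last=False)
        return record

cache = ReadThroughCache(capacity=2)
cache.get("order-1001")   # first access: fetched from the source
cache.get("order-1001")   # second access: served from the RAM cache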

Are the important patents owned by the disk drive makers?

Many other database makers, like IBM and Oracle, are working on in-memory technology themselves. IBM DB2 has had a kind of in-memory technology for ten years, but their approach is similar to that of the drive makers: they add a dynamic cache to the database and intelligent algorithms to accelerate data retrieval. This includes strategies for the order in which tables and table spaces are traversed, smart parallelisation techniques (like those in Apache Hadoop), and the use of permanently refreshed measurements of hardware characteristics such as drive geometry, the access times of certain parts of a drive, and network funnelling.
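As a rough illustration of what such a parallelisation technique means (this is not IBM's or Hadoop's actual implementation, just the generic map/reduce pattern on made-up data), the sketch below splits a toy data set into chunks, aggregates each chunk in a separate process, and then merges the partial results.

from functools import reduce
from multiprocessing import Pool

def map_chunk(chunk):
    # Map step: aggregate one chunk locally (here: count values above 90).
    return {"above_90": sum(1 for v in chunk if v > 90), "total": len(chunk)}

def merge(a, b):
    # Reduce step: combine two partial aggregates.
    return {k: a[k] + b[k] for k in a}

if __name__ == "__main__":
    data = list(range(1000))                                  # toy data set
    chunks = [data[i:i + 100] for i in range(0, len(data), 100)]
    with Pool() as pool:
        partials = pool.map(map_chunk, chunks)                # map phase, in parallel
    print(reduce(merge, partials))                            # reduce phase on the results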

They realize that the real challenge in big data does not lie in having the data in fast storage but in intelligent heuristic algorithms to read it. If data is really growing exponentially, then even fast storage will not really help. This is what the designers at big data application providers are working on. But even here they may come too late. The true winners in big data will be the makers of today's hard disk drives. For decades they have invested in ever more sophisticated algorithms to read and write data efficiently from and to the disk. Their archives are full of relevant patents. Not only will they have a scientific advantage in research and knowledge, others will likely face the challenge of avoiding patent infringement whenever they come up with a clever algorithm.

Appliances will be the future of big data

So what is the prediction for the future? Looking at the big picture, one has to conclude that the race will be won by those who own the patents and have the knowledge of smart algorithms. It is more likely that companies like IBM, Seagate or Western Digital will bring the real performance and the right strategy to big data, much more than HANA will ever be capable of. Following this path further, we can expect that big data will lead to a melting together of hardware and software, bringing a renaissance of specialized appliances.
