Tuesday, December 07, 2010

Tableau Software Adds In-Memory Database Engine

Summary: Tableau has added a large-scale in-memory database engine to its data analysis and visualization software. This makes it a lot more powerful.

Hard to believe, but it's more than three years since my review of Tableau Software’s data analysis system. Tableau has managed quite well without my attention: sales have doubled every year and should exceed $40 million in 2010; they have 5,500 clients, 60,000 users and 185 employees; and they plan to add 100 more employees next year. Ah, I knew them when.

What really matters from a user perspective is that the product itself has matured. Back in 2007, my main complaint was that Tableau lacked a data engine. The system either issued SQL queries against an external database or imported a small data set into memory. This meant that response time depended on the speed of the external system and that users were constrained by the data structures of the external source.

Tableau’s most recent release (6.0, launched on November 10) finally changes this by adding a built-in data engine. Note that I said “changes” rather than “fixes”, since Tableau has obviously been successful without this feature. Instead, the vendor has built connectors for high-speed analytical databases and appliances including Hyperion Essbase, Greenplum, Netezza, PostgreSQL, Microsoft PowerPivot, ParAccel, Sybase IQ, Teradata, and Vertica. These provide good performance on any size database, but they still leave the Tableau user tethered to an external system. An internal database allows much more independence and offers high performance when no external analytical engine is present. This is a big advantage since such engines are still relatively rare and, even if a company has one, it might not contain all the right data or be accessible to Tableau users.

Of course, this assumes that Tableau's internal database is itself a high-speed analytical engine. That’s apparently the case: the engine is home-grown but it passes the buzzword test (in-memory, columnar, compressed) and – at least in an online demo – offered near-immediate response to queries against a 7 million row file. It also supports multi-table data structures and in-memory “blending” of disparate data sources, further freeing users from the constraints of their corporate environment. The system is also designed to work with data sets that are too large to fit into memory: it will use as much memory as possible and then access the remaining data from disk storage.
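
To put the buzzwords in context, here's a minimal sketch, in Python, of what "columnar" and "compressed" mean in practice. It is purely illustrative and not Tableau's actual design; the class and method names (Column, ColumnStore, sum_by) are my own. The point is that each column is stored and encoded separately, so a query touches only the columns it needs.

```python
# Illustrative sketch of a columnar, dictionary-compressed store (not Tableau's code).
from collections import defaultdict

class Column:
    """Dictionary-encoded column: each distinct value is stored once, rows hold small codes."""
    def __init__(self, values):
        self.dictionary = sorted(set(values))            # distinct values, stored once
        code = {v: i for i, v in enumerate(self.dictionary)}
        self.codes = [code[v] for v in values]           # compact per-row codes

    def decode(self, row):
        return self.dictionary[self.codes[row]]

class ColumnStore:
    def __init__(self, table):
        # table: dict of column name -> list of values, all lists the same length
        self.columns = {name: Column(vals) for name, vals in table.items()}
        self.row_count = len(next(iter(table.values())))

    def sum_by(self, group_col, measure_col):
        """Aggregate one measure by one dimension, reading only the two columns involved."""
        groups, measures = self.columns[group_col], self.columns[measure_col]
        totals = defaultdict(float)
        for row in range(self.row_count):
            totals[groups.decode(row)] += measures.decode(row)
        return dict(totals)

store = ColumnStore({
    "region": ["East", "West", "East", "West"],
    "sales":  [100.0, 250.0, 175.0, 80.0],
})
print(store.sum_by("region", "sales"))   # {'East': 275.0, 'West': 330.0}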

Tableau has added some nice end-user enhancements too. These include:

- new types of combination charts;
- ability to display the same data at different aggregation levels on the same chart (e.g., average as a line and individual observations as points);
- more powerful calculations, including multi-pass formulas that can calculate against a calculated value;
- user-entered parameters to allow what-if calculations (see the sketch after this list).
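
To illustrate the last two items, here is a rough Python sketch, not Tableau's formula syntax, of a multi-pass calculation driven by a user-entered what-if parameter: the first pass computes an aggregate, and the second pass calculates each row against that calculated value.

```python
# Illustrative sketch only: a "multi-pass" formula plus a what-if parameter.
rows = [
    {"region": "East", "sales": 100.0},
    {"region": "West", "sales": 250.0},
    {"region": "East", "sales": 175.0},
    {"region": "West", "sales": 80.0},
]

growth_assumption = 1.10   # user-entered "what-if" parameter, e.g. +10% growth

# Pass 1: a calculated value -- average sales per region.
region_totals, region_counts = {}, {}
for r in rows:
    region_totals[r["region"]] = region_totals.get(r["region"], 0.0) + r["sales"]
    region_counts[r["region"]] = region_counts.get(r["region"], 0) + 1
region_avg = {k: region_totals[k] / region_counts[k] for k in region_totals}

# Pass 2: a formula that calculates against the calculated value from pass 1,
# scaled by the what-if parameter.
for r in rows:
    projected = r["sales"] * growth_assumption
    r["vs_region_avg"] = projected - region_avg[r["region"]]

for r in rows:
    print(r)
```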

The Tableau interface hasn’t changed much since 2007. But that's okay since I liked it then and still like it now. In fact, it won a little test we conducted recently to see how far totally untrained users could get with a moderately complex task. (I'll give more details in a future post.)

Tableau can run either as traditional software installed on the user's PC or on a server accessed over the Internet. Pricing for a single-user desktop system is still $999 for a version that can connect to Excel, Access or text files, and has risen slightly to $1,999 for one that can connect to other databases. These are perpetual license fees; annual maintenance is 20%.

There’s also a free reader that lets unlimited users download and read workbooks created in the desktop system. The server version allows multiple users to access workbooks on a central server. Pricing for this starts at $10,000 for ten users and you still need at least one desktop license to create the workbooks. Large server installations can avoid per-user fees by purchasing CPU-based licenses, which are priced north of $100,000.

Although the server configuration makes Tableau a candidate for some enterprise reporting tasks, it can't easily limit different users to different data, which is a typical reporting requirement. So Tableau is still primarily a self-service tool for business and data analysts. The new database, calculation and data blending features add considerably to their power.

5 comments:

Elad Israeli said...

David,

Impressive numbers. I am curious where you got those, cos the numbers I have are quite different.

Also, a small correction - Tableau does not employ an in-memory database. An in-memory database means that the data is memory-resident, as in QlikView's case.

See here: http://tinyurl.com/2euaufc

Elad Israeli
SiSense

David Raab said...

Hi Elad,

The numbers came from Tableau.

Unless you are using some special definition of "in-memory", their new engine is indeed in-memory, at least until it runs out of space and then moves some data onto disk. Although I didn't discuss the details with them, I'd imagine that they prioritize memory use based on the specific columns required for recent queries. That is a standard approach for columnar structures and generally allows most processing to happen in-memory even if the entire database will not fit.
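
To illustrate the idea, here is a speculative sketch in plain Python, not Tableau's implementation, of keeping recently queried columns in memory under a fixed budget and pushing the rest back to disk.

```python
# Speculative sketch of column-level caching (not based on Tableau's actual code).
from collections import OrderedDict

class ColumnCache:
    def __init__(self, load_from_disk, budget=3):
        self.load_from_disk = load_from_disk   # callable: column name -> column data
        self.budget = budget                   # max number of columns held in memory
        self.cache = OrderedDict()             # column name -> data, in LRU order

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)       # mark as most recently used
        else:
            self.cache[name] = self.load_from_disk(name)
            if len(self.cache) > self.budget:
                self.cache.popitem(last=False) # evict the least recently used column
        return self.cache[name]
```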

Elad Israeli said...

David,

Data does not need to be vertically fragmented (column-stored) in order to fit in RAM. Even in the case of traditional relational databases, if you have enough RAM, the tables are stored entirely in memory. That does not make them "in-memory databases".

While it is true that columnar databases allow for better utilization of memory, it does not automatically define them as in-memory. In-Memory databases are defined at the architecture level.

The term you are looking for is "in-memory query processing" not "in-memory databases".

The distinction is important because it has implications for the end solution. For example, one important difference between "in-memory databases" (QlikView) and "in-memory processing" (SiSense/Tableau) is precisely whether the size of the data is limited by the amount of physical RAM.

Elad

Anonymous said...

Hi David:

Very good review, and I agree with your assessments, including about the in-memory data engine. I think Tableau 6.0 is now very competitive with QlikView and Spotfire...
Thank you very much for sharing your thoughts!

Andrei

Unknown said...

Hi David, thanks for sharing your article with us. I would also like to add that, according to the textbook definition, an "in-memory" database is a database that lives entirely in physical memory. Two classic examples are the former Applix TM1 OLAP engine (now marketed by IBM Corp.) and Oracle Corp.'s TimesTen in-memory database. Both databases run in (and are constrained by) the physical memory capacity of a system. A decade ago, for example, the TM1 engine was limited to a maximum of 4 GB on 32-bit Windows systems. (TM1 was also available for 64-bit Unix platforms; this enabled it to scale to much bigger volumes. OLAP itself is something of a special case: most OLAP cubes run or "live" in physical memory.)

Going by this definition, few so-called "in-memory" databases actually run completely in memory; what they do, instead, is optimize for memory usage. Analytic database specialist Kognitio, for example, positions itself as an "in-memory" database; even though Kognitio extensively optimizes for memory usage, it does not run entirely in physical memory. For more information, see http://customerexperiencematrix.blogspot.in/2010/12/tableau-software-adds-in-memory.