I was at an event for Teradata Partners this past week, and from listening to people like Scott Gnau, Jeff Carter, and Carson Schmidt I learned quite a few interesting things. One thing they discussed was that Teradata has done a lot of analysis of how data is used in an enterprise data warehouse. They studied a variety of workloads on different systems in different industries and found that approximately:
1. 1% of the data accounts for about 43% of the I/O
2. 10% of the data accounts for about 85% of the I/O
3. 20% of the data accounts for about 94% of the I/O
This is a very interesting finding, and it supports Teradata’s multi-temperature approach to data warehousing. If 94% of the I/O goes to only 20% of the data, then you get a real bump by putting that data into memory, but the benefit of putting the other 80% of your data into memory is definitely not worth the extra cost. Therefore, Teradata’s multi-temperature architecture uses main memory, SSD, high-performance disk drives, and high-capacity disk drives to build a high-performance but very cost-effective enterprise data warehouse.
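To make the diminishing returns concrete, here is a small Python sketch that turns the cumulative figures above into per-band numbers. It is my own back-of-the-envelope arithmetic, not anything Teradata presented. The function name `marginal_bands` is made up for illustration, and the final (100%, 100%) point is an assumed endpoint that simply says all of the data accounts for all of the I/O.

```python
# Cumulative figures quoted above, as (fraction of data, fraction of I/O).
# The last point (1.00, 1.00) is an assumed endpoint: all data, all I/O.
cumulative = [(0.01, 0.43), (0.10, 0.85), (0.20, 0.94), (1.00, 1.00)]


def marginal_bands(points):
    """Split cumulative (data, io) points into bands.

    Returns a list of (data_share, io_share, density), where density is the
    band's share of I/O divided by its share of data (1.0 = average).
    """
    bands = []
    prev_data, prev_io = 0.0, 0.0
    for data, io in points:
        data_share = data - prev_data
        io_share = io - prev_io
        bands.append((data_share, io_share, io_share / data_share))
        prev_data, prev_io = data, io
    return bands


bands = marginal_bands(cumulative)
for data_share, io_share, density in bands:
    print(f"{data_share:.0%} of data -> {io_share:.0%} of I/O "
          f"(about {density:.2f}x the average)")
```

Under these numbers the hottest 1% of data is roughly 43 times busier than average, while the coldest 80% takes only about 6% of the I/O, which works out to less than a tenth of the average. That gap is the whole argument for putting different bands on different storage tiers.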