Is it possible to use something like a UNIX pipe to pump the data from the generation program straight into a database load program? If the data in the database is partitioned, and the data generator spreads rows randomly across partitions rather than randomly within each partition, the database load program will need to filter the data as it loads. This will slow down the data load considerably.
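As a rough illustration of the point above, the following sketch emits rows one partition at a time and writes them to standard output, so that the output could be piped into a load program. The monthly partitioning scheme, the column layout (date, customer number, amount) and the function names are all assumptions made for the example, not part of any particular database or tool.

```python
import csv
import random
import sys
from datetime import date, timedelta


def generate_partition(month_start, days_in_month, rows, rng):
    """Yield rows whose dates all fall inside one (assumed monthly) partition."""
    for _ in range(rows):
        day = month_start + timedelta(days=rng.randrange(days_in_month))
        customer_no = rng.randrange(1, 1001)          # hypothetical key range
        amount = round(rng.uniform(1.0, 500.0), 2)    # hypothetical measure
        yield (day.isoformat(), customer_no, amount)


def generate(partitions, rows_per_partition, seed=0):
    """Emit partitions in order: random within each partition, never across them."""
    rng = random.Random(seed)
    for month_start, days_in_month in partitions:
        yield from generate_partition(month_start, days_in_month,
                                      rows_per_partition, rng)


if __name__ == "__main__":
    # Two example partitions: January and February 2024.
    partitions = [(date(2024, 1, 1), 31), (date(2024, 2, 1), 29)]
    # Writing CSV to stdout lets the rows be piped directly into a loader.
    csv.writer(sys.stdout).writerows(generate(partitions, 5))
```

Saved as a script, this could be run on the left-hand side of a pipe, with whichever bulk load utility the database provides reading from standard input on the right. Because each partition's rows arrive together, the loader has no cross-partition filtering to do.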
Whatever method is chosen to generate the data, it is important to ensure that the generated data is correctly structured and distributed. If the generated data has a normal distribution and the real data does not, any query performance tests will be useless. It is also important to ensure that the generated data has the same table-to-table ratios as the real data. For example, in a banking data warehouse application, ensure that there is the right ratio of transactions to accounts and the right ratio of accounts to customers. If these ratios are not correct, the query execution plans are likely to differ from those in the live system, and any query performance testing may prove to be of no use.
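One simple way to check the table-to-table ratios is to compare row counts from the generated data against row counts from the live system. The sketch below assumes the banking example above, with three tables named customers, accounts and transactions. The function names, the 5% tolerance and the row counts in the usage example are hypothetical values chosen for illustration.

```python
def ratios(row_counts):
    """Compute the table-to-table ratios from a dict of table row counts."""
    return {
        "accounts_per_customer":
            row_counts["accounts"] / row_counts["customers"],
        "transactions_per_account":
            row_counts["transactions"] / row_counts["accounts"],
    }


def ratio_mismatches(generated, live, tolerance=0.05):
    """Return the ratios that differ from the live ones by more than the
    given relative tolerance, as {name: (generated_ratio, live_ratio)}."""
    gen, real = ratios(generated), ratios(live)
    return {
        name: (gen[name], real[name])
        for name in real
        if abs(gen[name] - real[name]) > tolerance * real[name]
    }


# Hypothetical row counts, for illustration only.
live = {"customers": 1000, "accounts": 2500, "transactions": 500000}
generated = {"customers": 100, "accounts": 250, "transactions": 20000}

# Accounts per customer match, but there are too few transactions per account.
print(ratio_mismatches(generated, live))
```

A check like this only covers the ratios. The distribution of values within each table still has to be compared separately, for example by comparing histograms of key columns.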
