My SQL Dump

MySQL musings by a self professed MySQL Geek

Previous Entry Share Next Entry
Advantages of weighted lists in RDBMS processing
Swedish Chef (cooking)
swanhart
A list is simply a list of things. The list has no structure, except in some cases, the length of the list may be known. The list may contain duplicate items. In the following example the number 1 is included twice.

Example list:
1
2
3
1

A set is similar to a list, but has the following differences:
  1. The size of the set is always known

  2. A set may not contain duplicates

You can convert a list to a set by creating a 'weighted list'. The weighted list includes a count column so that you can determine when an item in the list appears more than once:
1,2
2,1
3,1
Notice that there are two number 1 values in the weighted list. In order to make insertions into such a list scalable, consider using partitioning to avoid large indexes.

Converting from lists to weighted lists (sets) may logically compress the data. Consider the following:
1, 10000000000000
2, 10
3, 1000
4, 100000
Consider the size of the list that would be produced if this list was not weighted by count. There are very few compression methods which allow the data to be operated on without compression. Aggregate SQL queries can use the weighted list easily without decompression.

You will need to post-process the data to create a list from a weighted list.

You are viewing swanhart