Package picalo :: Module Crosstable
[show private | hide private]
[frames | no frames]

Module picalo.Crosstable

The Crosstable module creates crosstables of data. It is similiar to Excel's powerful PivotTable function. Crosstabling takes data that is in database format and converts it to spreadsheet format, usually with summarization functions like sum, average, etc.

The primary function of this module is pivot(). The other functions are more advanced functions that show different levels of detail from the crosstabling process.

Pivoting extremely large tables can take some time, depending upon your processor speed and amount of memory. Assuming you are running Picalo in GUI mode, the functions will show a progress bar.
Classes
Crosstabler Crosstables data sets.

Function Summary
Table pivot(table, col_fields, row_fields, expressions, delimiter)
Crosstables a Picalo table into a new table.
dict pivot_map(table, col_fields, row_fields, expressions)
Matches all unique combinations of values in col_fields and row_fields with expressions run on the records that containing those unique values.
dict pivot_map_detail(table, col_fields, row_fields)
Matches all unique combinations of values in col_fields and row_fields with Tables that contain only the records with those values.
Table pivot_table(table, col_fields, row_fields, expressions)
Crosstables a Picalo table into a new table.

Variable Summary
tuple __functions__ = ('pivot', 'pivot_table', 'pivot_map', 'p...

Function Details

pivot(table, col_fields, row_fields, expressions, delimiter='_')

Crosstables a Picalo table into a new table. This routine is similar to Excel's PivotTable feature, but it is quite a bit more powerful (although not as easy to use).

The pivot function flattens the results into a two-dimensional table, with a column for each expression of each data field. Of all the pivot functions in the Crosstable module, this is the most like Excel's pivot table feature.

There are no arbitrary limits on table size in Picalo, so you can pivot tables with many values (resulting in a significant number of rows and columns). The only limit on the number of resulting rows and columns is your memory.

The first example shows a simple pivot on a single col_field, row_field, and single expression. The second example shows a more complex pivot with multiple fields and expressions. Note that the second example has some errors because the average function is being run on cells that have no values (divide by zero).

Example 1:
>>> # create a test table
>>> data = Table([
...  ('Region', unicode),
...  ('Product', unicode),
...  ('Salesperson', unicode), 
...  ('Customer', unicode),
...  ('Amount', int),
... ],[
...  [ 'Rural','Computers',  'Mollie', 'Faiz',  500 ],
...  [ 'City',  'Monitors',  'Danny',  'Sheng', 700 ],
...  [ 'Rural', 'Mice',      'Mollie', 'Brian', 900 ],
...  [ 'City',  'Computers', 'Danny',  'Faiz',  300 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'City',  'Mice',      'Danny',  'Brian', 100 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Faiz',  200 ],
...  [ 'City',  'Computers', 'Danny',  'Sheng', 400 ],
... ])
>>> # perform the pivot
>>> ret = Crosstable.pivot(data, 'Salesperson', 'Product', 'sum(group["Amount"])')
>>> ret.view()
+-----------+-------+--------+--------+
|   Pivot   | Danny | Mollie | Totals |
+-----------+-------+--------+--------+
| Computers |   700 |    500 |   1200 |
| Mice      |   100 |    900 |   1000 |
| Monitors  |   700 |   1200 |   1900 |
| Totals    |  1500 |   2600 |   4100 |
+-----------+-------+--------+--------+
Example 2:
>>> # create a test table
>>> data = Table([
...  ('Region', unicode),
...  ('Product', unicode),
...  ('Salesperson', unicode), 
...  ('Customer', unicode),
...  ('Amount', int),
... ],[
...  [ 'Rural','Computers',  'Mollie', 'Faiz',  500 ],
...  [ 'City',  'Monitors',  'Danny',  'Sheng', 700 ],
...  [ 'Rural', 'Mice',      'Mollie', 'Brian', 900 ],
...  [ 'City',  'Computers', 'Danny',  'Faiz',  300 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'City',  'Mice',      'Danny',  'Brian', 100 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Faiz',  200 ],
...  [ 'City',  'Computers', 'Danny',  'Sheng', 400 ],
... ])
>>> # perform the pivot
>>> ret = Crosstable.pivot(data, ['Salesperson','Customer'], ['Product', 'Region'], ['sum(group["Amount"])','mean(group["Amount"])'])
>>> ret.view()
+-----------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+--------------------+--------------------+-------------------+-------------------+--------------------+--------------------+--------------+---------------+
|      Pivot      | Danny_Brian_expr1 | Danny_Brian_expr2 | Danny_Faiz_expr1 | Danny_Faiz_expr2 | Danny_Sheng_expr1 | Danny_Sheng_expr2 | Mollie_Brian_expr1 | Mollie_Brian_expr2 | Mollie_Faiz_expr1 | Mollie_Faiz_expr2 | Mollie_Sheng_expr1 | Mollie_Sheng_expr2 | Totals_expr1 |  Totals_expr2 |
+-----------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+--------------------+--------------------+-------------------+-------------------+--------------------+--------------------+--------------+---------------+
| Computers_City  |                 0 |                 0 |              300 |              300 |               400 |               400 |                  0 |                  0 |                 0 |                 0 |                  0 |                  0 |          700 |         350.0 |
| Computers_Rural |                 0 |                 0 |                0 |                0 |                 0 |                 0 |                  0 |                  0 |               500 |               500 |                  0 |                  0 |          500 |           500 |
| Mice_City       |               100 |               100 |                0 |                0 |                 0 |                 0 |                  0 |                  0 |                 0 |                 0 |                  0 |                  0 |          100 |           100 |
| Mice_Rural      |                 0 |                 0 |                0 |                0 |                 0 |                 0 |                900 |                900 |                 0 |                 0 |                  0 |                  0 |          900 |           900 |
| Monitors_City   |                 0 |                 0 |                0 |                0 |               700 |               700 |                  0 |                  0 |                 0 |                 0 |                  0 |                  0 |          700 |           700 |
| Monitors_Rural  |                 0 |                 0 |                0 |                0 |                 0 |                 0 |                  0 |                  0 |               200 |               200 |               1000 |              500.0 |         1200 |         400.0 |
| Totals          |               100 |               100 |              300 |              300 |              1100 |             550.0 |                900 |                900 |               700 |             350.0 |               1000 |              500.0 |         4100 | 455.555555556 |
+-----------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+--------------------+--------------------+-------------------+-------------------+--------------------+--------------------+--------------+---------------+
Parameters:
table - The Picalo table that will be crosstabled
           (type=Table or TableArray)
col_fields - A single field name or a list containing the field names of the fields used to group columns.
           (type=str/list)
row_fields - A single field name or a list containing the field names of the fields used to group rows
           (type=str/list)
expressions - A single expression or a list containing expressions used to do calculations on groupings; when the expression is evaluated, the keyword 'group' denotes the matching records for each cell
           (type=str/list)
delimiter - The character(s) to use to combine field values and expressions (for row and column headers)
           (type=str)
Returns:
A new table containing the crosstabled results. The last column and last row of the table contain the row and column totals
           (type=Table)

pivot_map(table, col_fields, row_fields, expressions)

Matches all unique combinations of values in col_fields and row_fields with expressions run on the records that containing those unique values.

This function is very similar to Grouping.summarize_by_value, only it is formatted in a way to make crossstabling possible.

This is an advanced function that is used internally during the pivot technique. It is provided for advanced users who want to access the detail records during the crosstabling process. It is one step in the process beyond pivot_map_detail.

Most users should use Crosstable.pivot instead as it is more like Excel's pivot function.

Example:
>>> # create a test table
>>> data = Table([
...  ('Region', unicode),
...  ('Product', unicode),
...  ('Salesperson', unicode), 
...  ('Customer', unicode),
...  ('Amount', int),
... ],[
...   [ 'Rural','Computers',  'Mollie', 'Faiz',  500 ],
...   [ 'City',  'Monitors',  'Danny',  'Sheng', 700 ],
...   [ 'Rural', 'Mice',      'Mollie', 'Brian', 900 ],
...   [ 'City',  'Computers', 'Danny',  'Faiz',  300 ],
...   [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...   [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...   [ 'City',  'Mice',      'Danny',  'Brian', 100 ],
...   [ 'Rural', 'Monitors',  'Mollie', 'Faiz',  200 ],
...   [ 'City',  'Computers', 'Danny',  'Sheng', 400 ],
... ])
>>> # perform the pivot
>>> ret = Crosstable.pivot_map(data, 'Salesperson', 'Product', 'sum(group["Amount"])')
>>> ret
{
  ('Monitors', 'Mollie'): (1200,), 
  ('Computers', 'Mollie'): (500,), 
  ('Mice', 'Danny'): (100,), 
  ('Computers', 'Danny'): (700,), 
  ('Mice', 'Mollie'): (900,), 
  ('Monitors', 'Danny'): (700,),
}
Parameters:
table - The Picalo table that will be crosstabled
           (type=Table)
col_fields - A single field name or a list containing the field names of the fields used to group columns.
           (type=str/list)
row_fields - A single field name or a list containing the field names of the fields used to group rows
           (type=str/list)
expressions - A single expression or a list containing expressions used to do calculations on groupings; when the expression is evaluated, the keyword 'group' denotes the matching records for each cell
           (type=str/list)
Returns:
A dictionary of each unique key (made up of row and column combinations) mapped to their matching records.
           (type=dict)

pivot_map_detail(table, col_fields, row_fields)

Matches all unique combinations of values in col_fields and row_fields with Tables that contain only the records with those values. This function is a "mega-select" function that is the basis of the crosstabling technique.

Use this function if you just want to separate a Table in a number of subtables -- one for each unique combination of col_fields and row_fields.

This function is very similar to Grouping.stratify_by_value, only it is formatted in a way to make crossstabling possible.

This is an advanced function that is used internally during the pivot technique. It is provided for advanced users who want to access the detail records during the crosstabling process. It is the first step performed in a crosstable.

Most users should use Crosstable.pivot instead as it is more like Excel's pivot function.

Example:
>>> # create a test table
>>> data = Table([
...  ('Region', unicode),
...  ('Product', unicode),
...  ('Salesperson', unicode), 
...  ('Customer', unicode),
...  ('Amount', int),
... ],[
...   [ 'Rural','Computers',  'Mollie', 'Faiz',  500 ],
...   [ 'City',  'Monitors',  'Danny',  'Sheng', 700 ],
...   [ 'Rural', 'Mice',      'Mollie', 'Brian', 900 ],
...   [ 'City',  'Computers', 'Danny',  'Faiz',  300 ],
...   [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...   [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...   [ 'City',  'Mice',      'Danny',  'Brian', 100 ],
...   [ 'Rural', 'Monitors',  'Mollie', 'Faiz',  200 ],
...   [ 'City',  'Computers', 'Danny',  'Sheng', 400 ],
... ])
>>> # perform the pivot
>>> ret = Crosstable.pivot_map_detail(data, 'Salesperson', 'Product')
>>> ret
{
  ('Mollie', 'Computers'): <Table: 1 rows x 5 cols>, 
  ('Danny', 'Computers'):  <Table: 2 rows x 5 cols>, 
  ('Mollie', 'Monitors'):  <Table: 3 rows x 5 cols>, 
  ('Danny', 'Monitors'):   <Table: 1 rows x 5 cols>, 
  ('Danny', 'Mice'):       <Table: 1 rows x 5 cols>, 
  ('Mollie', 'Mice'):      <Table: 1 rows x 5 cols>
}
Parameters:
table - The Picalo table that will be crosstabled
           (type=Table)
col_fields - A single field name or a list containing the field names of the fields used to group columns.
           (type=str/list)
row_fields - A single field name or a list containing the field names of the fields used to group rows
           (type=str/list)
Returns:
A dictionary of each unique key (made up of row and column combinations) mapped to Tables containing their matching records.
           (type=dict)

pivot_table(table, col_fields, row_fields, expressions)

Crosstables a Picalo table into a new table. This routine is similar to Excel's PivotTable feature, but it is quite a bit more powerful (although not as easy to use).

This version creates does not flatten results to a two-dimensional table like the pivot function. It is a more advanced way of pivoting than the pivot function because it creates a list of expressions results for each cell. In other words, the results table is not normalized, but contains lists within each cell that contain the results. When a single col_field, row_field, and expression is given, this function produces the exact same results as pivot().

This way of pivoting is useful if you want a single column for each col field value and a single row for each row field value. Even if you provide multiple expressions and/or multiple data fields, you'll still only get one col/row match for a given value set. The multiple expressions on the multiple data fields will be contained in a list in the cell for each row/col match.

There are no arbitrary limits on table size in Picalo, so you can pivot tables with many values (resulting in a significant number of rows and columns). The only limit on the number of resulting rows and columns is your memory.

The first example shows a simple pivot on a single col_field, row_field, and single expression. The second example shows a more complex pivot with multiple fields and expressions. Note that the second example has some errors because the average function is being run on cells that have no values (divide by zero).

Example 1:
>>> # create a test table
>>> data = Table([
...  ('Region', unicode),
...  ('Product', unicode),
...  ('Salesperson', unicode), 
...  ('Customer', unicode),
...  ('Amount', int),
... ],[
...  [ 'Rural','Computers',  'Mollie', 'Faiz',  500 ],
...  [ 'City',  'Monitors',  'Danny',  'Sheng', 700 ],
...  [ 'Rural', 'Mice',      'Mollie', 'Brian', 900 ],
...  [ 'City',  'Computers', 'Danny',  'Faiz',  300 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'City',  'Mice',      'Danny',  'Brian', 100 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Faiz',  200 ],
...  [ 'City',  'Computers', 'Danny',  'Sheng', 400 ],
... ])
>>> # perform the pivot
>>> ret = Crosstable.pivot_table(data, 'Salesperson', 'Product', 'sum(group["Amount"])')
>>> ret.view()
+-----------+-------+--------+--------+
|   Pivot   | Danny | Mollie | Totals |
+-----------+-------+--------+--------+
| Computers |   700 |    500 |   1200 |
| Mice      |   100 |    900 |   1000 |
| Monitors  |   700 |   1200 |   1900 |
| Totals    |  1500 |   2600 |   4100 |
+-----------+-------+--------+--------+
Example 2:
>>> # create a test table
>>> data = Table([
...  ('Region', unicode),
...  ('Product', unicode),
...  ('Salesperson', unicode), 
...  ('Customer', unicode),
...  ('Amount', int),
... ],[
...  [ 'Rural','Computers',  'Mollie', 'Faiz',  500 ],
...  [ 'City',  'Monitors',  'Danny',  'Sheng', 700 ],
...  [ 'Rural', 'Mice',      'Mollie', 'Brian', 900 ],
...  [ 'City',  'Computers', 'Danny',  'Faiz',  300 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Sheng', 500 ],
...  [ 'City',  'Mice',      'Danny',  'Brian', 100 ],
...  [ 'Rural', 'Monitors',  'Mollie', 'Faiz',  200 ],
...  [ 'City',  'Computers', 'Danny',  'Sheng', 400 ],
... ])
>>> # perform the pivot
>>> ret = Crosstable.pivot_table(data, ['Salesperson','Customer'], ['Product', 'Region'], ['sum(group["Amount"])','mean(group["Amount"])'])
>>> ret.view()
+------------------------+--------------------+-------------------+--------------------+---------------------+--------------------+---------------------+----------------------------+
|         Pivot          | ('Danny', 'Brian') | ('Danny', 'Faiz') | ('Danny', 'Sheng') | ('Mollie', 'Brian') | ('Mollie', 'Faiz') | ('Mollie', 'Sheng') |           Totals           |
+------------------------+--------------------+-------------------+--------------------+---------------------+--------------------+---------------------+----------------------------+
|  ('Computers', 'City') |             (0, 0) |        (300, 300) |         (400, 400) |              (0, 0) |             (0, 0) |              (0, 0) |               (700, 350.0) |
| ('Computers', 'Rural') |             (0, 0) |            (0, 0) |             (0, 0) |              (0, 0) |         (500, 500) |              (0, 0) |                 (500, 500) |
|       ('Mice', 'City') |         (100, 100) |            (0, 0) |             (0, 0) |              (0, 0) |             (0, 0) |              (0, 0) |                 (100, 100) |
|      ('Mice', 'Rural') |             (0, 0) |            (0, 0) |             (0, 0) |          (900, 900) |             (0, 0) |              (0, 0) |                 (900, 900) |
|   ('Monitors', 'City') |             (0, 0) |            (0, 0) |         (700, 700) |              (0, 0) |             (0, 0) |              (0, 0) |                 (700, 700) |
|  ('Monitors', 'Rural') |             (0, 0) |            (0, 0) |             (0, 0) |              (0, 0) |         (200, 200) |       (1000, 500.0) |              (1200, 400.0) |
| Totals                 |         (100, 100) |        (300, 300) |      (1100, 550.0) |          (900, 900) |       (700, 350.0) |       (1000, 500.0) | (4100, 455.55555555555554) |
+------------------------+--------------------+-------------------+--------------------+---------------------+--------------------+---------------------+----------------------------+
Parameters:
table - The Picalo table that will be crosstabled
           (type=Table or TableArray)
col_fields - A single field name or a list containing the field names of the fields used to group columns.
           (type=str/list)
row_fields - A single field name or a list containing the field names of the fields used to group rows
           (type=str/list)
expressions - A single expression or a list containing expressions used to do calculations on groupings; when the expression is evaluated, the keyword 'group' denotes the matching records for each cell
           (type=str/list)
Returns:
A new table containing the crosstabled results. The last column and last row of the table contain the row and column totals
           (type=Table)

Variable Details

__functions__

Type:
tuple
Value:
('pivot', 'pivot_table', 'pivot_map', 'pivot_map_detail')              

Generated by Epydoc 2.1 on Mon Aug 20 05:38:17 2007 http://epydoc.sf.net