| Home | Trees | Index | Help |
|
|---|
| Package picalo :: Module Simple |
|
| Function Summary | |
|---|---|
| Table |
Calculates the zscore for each value in a column. |
| Table |
Joins two tables together (similar to a regular SQL inner join) that have matching values in the given columns. |
| Table |
Joins two tables together (similar to a regular SQL LEFT join) that have matching values in the given columns. |
| Table |
Finds rows in the two tables with values that match in specific columns. |
| list |
Finds records in the two tables with values that match in specific columns and returns two new tables that contain everything but those records. |
| list |
Finds records in the two tables with values that match in specific columns and returns two new tables that contain only those records. |
| Table |
Joins two tables together (similar to a regular SQL RIGHT join) that have matching values in the given columns. |
| Table |
Runs through a table sequentially and compares each record with the one after it. |
| Table |
Finds rows in the two tables with values that match as returned by the specified expression. |
| list |
Finds records in the two tables with values that do not match based upn the give expression and returns two new tables that contain only those records. |
| list |
Finds records in the two tables with values that match based upon the give expression and returns two new tables that contain only those records. |
| Table |
Prepares a set of statistical descriptives for the given columns in the given table. |
| Table |
Finds records in the two tables where the expression evalutes True. |
| Table |
Finds all duplicates in the given columns. |
| Table |
Finds gaps in the sequence of values in the given col. |
| Table |
Joins two tables based upon a fuzzy match of two column values. |
| float |
Calculates a fuzzy match of text1 and text2. |
| float |
Calculates a fuzzy search to see if searchtext is found in fulltext. |
| Table |
Finds all of the records that are out of order as measured by the values of the given cols. |
| Table |
Joins two tables together (similar to a regular SQL inner join) where the expression evalutes True. |
| Table |
Joins two tables together (similar to a regular SQL LEFT join) where the expression evalutes True. |
| Table |
Joins two tables together (similar to a regular SQL RIGHT join) where the expression evalutes True. |
Selects records from a table based upon a custom expression and returns a new table including only those records. | |
| Table |
Selects record from a table given key=value pairs and returns a new table including only those records. |
| Table |
A convenience function to select non-outliers from a table. |
| Table |
A convenience function to select non-outliers from a table. |
| Table |
A convenience function to select outliers from a table. |
| Table |
A convenience function to select outliers from a table. |
| Table |
Selects records from a table given a number of table record indices and returns a new table including only those records. |
Sorts a table by values in one or more columns. | |
| str |
Calculates a soundex computation for the given text. |
Calculates the soundex score for each value in a column and appends the scores as a new column in the table. | |
| Table |
Transposes the table, which means the columns and rows are switched. |
Internal method to perform joining. | |
| Variable Summary | |
|---|---|
list |
__descriptives__ = ['NumRecords', 'NumEmpty', 'NumNone',...
|
dict |
__dm__ = {'NumZero': 3, 'NumEmpty': 1, 'Min': 9, 'Median...
|
tuple |
__functions__ = ('col_join', 'col_left_join', 'col_right...
|
int |
round_descriptives_to = 2 |
| Function Details |
|---|
calc_zscore(table, col)Calculates the zscore for each value in a column. It is useful to add the zscore values as a column in the original table.
|
col_join(table1, table2, *match_cols)Joins two tables together (similar to a regular SQL inner join) that have matching values in the given columns. This function is much faster than Simple.join() because it uses table indices. However, it is much slower than a Database join. If you are getting data from a database, use the SQL join instead. This method is provided when you need to join tables that were loaded from CSV, etc. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.col_join(table1, table2, ['col000','col000'], ['col001','col002']) >>> matches.view() +--------+--------+----------+----------+--------+ | col000 | col001 | col000_2 | col001_2 | col002 | +--------+--------+----------+----------+--------+ | 3 | 2 | 3 | Bailey | 2 | | 3 | 2 | 3 | Sally | 2 | +--------+--------+----------+----------+--------+
|
col_left_join(table1, table2, *match_cols)Joins two tables together (similar to a regular SQL LEFT join) that have matching values in the given columns. All records in table1 are returned, and only matching records in table2 are joined with them. This function is much faster than Simple.join() because it uses table indices. However, it is much slower than a Database join. If you are getting data from a database, use the SQL join instead. This method is provided when you need to join tables that were loaded from CSV, etc. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.col_left_join(table1, table2, ['col000','col000'], ['col001','col002']) >>> matches.view() +--------+--------+----------+----------+--------+ | col000 | col001 | col000_1 | col001_1 | col002 | +--------+--------+----------+----------+--------+ | 3 | 2 | 3 | Bailey | 2 | | 3 | 2 | 3 | Sally | 2 | | 4 | 5 | <N> | <N> | <N> | +--------+--------+----------+----------+--------+
|
col_match(table1, table2, *match_cols)Finds rows in the two tables with values that match in specific columns. This method is usually used only internally, but it is available for general use for those who want it. (in other words, most users generally use other methods like join, col_match_same, etc.) Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.col_match(table1, table2, ['col000','col000'], ['col001','col002']) >>> matches.view() +--------------+--------------+--------+ | table1record | table2record | key | +--------------+--------------+--------+ | 0 | 0 | (3, 2) | | 0 | 2 | (3, 2) | +--------------+--------------+--------+
|
col_match_diff(table1, table2, *match_cols)Finds records in the two tables with values that match in specific columns and returns two new tables that contain everything but those records. Stated differently, this method filters all matching rows out of the two tables and returns the filtered tables (the original tables are not modified). All records that do not have matching keys in the other table are included. This function is the opposite of col_match_same. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> match1, match2 = Simple.col_match_diff(table1, table2, ['col000','col000']) >>> match1.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 4 | 5 | +--------+--------+ >>> match2.view() +--------+--------+--------+ | col000 | col001 | col002 | +--------+--------+--------+ | 1 | Dan | 2 | +--------+--------+--------+
|
col_match_same(table1, table2, *match_cols)Finds records in the two tables with values that match in specific columns and returns two new tables that contain only those records. Stated differently, this method filters all non-matching rows out of the two tables and returns the filtered tables (the original tables are not modified). All records that have matching keys in the other table are included. This function is the opposite of col_match_diff. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> match1, match2 = Simple.col_match_same(table1, table2, ['col000','col000'], ['col001','col002']) >>> match1.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 3 | 2 | +--------+--------+ >>> match2.view() +--------+--------+--------+ | col000 | col001 | col002 | +--------+--------+--------+ | 3 | Bailey | 2 | | 3 | Sally | 2 | +--------+--------+--------+
|
col_right_join(table1, table2, *match_cols)Joins two tables together (similar to a regular SQL RIGHT join) that have matching values in the given columns. All records in the table2 are returned, and only matching records in table1 are joined with them. This function is much faster than Simple.join() because it uses table indices. However, it is much slower than a Database join. If you are getting data from a database, use the SQL join instead. This method is provided when you need to join tables that were loaded from CSV, etc. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.col_right_join(table1, table2, ['col000','col000'], ['col001','col002']) >>> matches.view() +--------+--------+--------+----------+----------+ | col000 | col001 | col002 | col000_1 | col001_1 | +--------+--------+--------+----------+----------+ | 3 | Bailey | 2 | 3 | 2 | | 1 | Dan | 2 | <N> | <N> | | 3 | Sally | 2 | 3 | 2 | +--------+--------+--------+----------+----------+
|
compare_records(table, expression)Runs through a table sequentially and compares each record with the one after it. In other words, if table has 4 records, this method compares 0<=>1, 1<=>2, and 2<=>3. It runs the given expression for each set and returns the indices of those sets that return True. The expression should always evaluate to True or False. The expression should evaluate record1 against record2, as in: {"record1['id'] == record2['id'] - 1"} The index of the second record is stored in the results list. So if the expression compares the third and fourth records and evaluates true, index 4 is stored in the results. Example: >>> table = Table([('col000', int)], ([8000], [2000], [9000], [10000], [8000])) >>> results = Simple.compare_records(table, "record1['col000'] < record2['col000']") >>> results.view() +-------------+ | Row Indices | +-------------+ | 2 | | 3 | +-------------+
|
custom_match(table1, table2, expression)Finds rows in the two tables with values that match as returned by the specified expression. This is a more general version of col_match. It is significantly slower than col_match because it can't uses indices. It is O^2. The expression should use "record1" for the current record in table 1 and "record2" for the current record in the table 2, and it should evaluate to True or False. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> # match if the first column matches (granted, a very simple comparison in this example)P >>> matches = Simple.custom_match(table1, table2, "record1[0] == record2[0]") >>> matches.view() +--------------+--------------+ | table1record | table2record | +--------------+--------------+ | 0 | 0 | | 0 | 2 | +--------------+--------------+
|
custom_match_diff(table1, table2, expression)Finds records in the two tables with values that do not match based upn the give expression and returns two new tables that contain only those records. Stated differently, this method filters all matching rows out of the two tables and returns the filtered tables (the original tables are not modified). All records that do not have matching keys in the other table are included. The expression should use "record1" for the current record in table 1 and "record2" for the current record in the table 2, and it should evaluate to True or False. This function is the opposite of custom_match_same. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> # match if the first column matches (granted, a very simple comparison in this example) >>> match1, match2 = Simple.custom_match_diff(table1, table2, "record1[0] == record2[0]") >>> match1.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 3 | 2 | +--------+--------+ >>> match2.view() +--------+--------+--------+ | col000 | col001 | col002 | +--------+--------+--------+ | 3 | Bailey | 2 | | 3 | Sally | 2 | +--------+--------+--------+
|
custom_match_same(table1, table2, expression)Finds records in the two tables with values that match based upon the give expression and returns two new tables that contain only those records. Stated differently, this method filters all non-matching rows out of the two tables and returns the filtered tables (the original tables are not modified). All records that have matching keys in the other table are included. The expression should use "record1" for the current record in table 1 and "record2" for the current record in the table 2, and it should evaluate to True or False. This function is the opposite of custom_match_diff. Example:>>> employees = Table([('col000', unicode), ('col001', unicode)], (['Bailey', '123 North Way'], ['Sally', '456 Dety'])) >>> vendors = Table([('col000', unicode), ('col001', unicode)], (['ABC Comp', '789 Maple'], ['DEF Enterprises', '123 Nth Way'])) >>> t1, t2 = Simple.custom_match_same(vendors, employees, "Simple.fuzzymatch(record1[1], record2[1]) > .5") >>> t1.view() +-----------------+-------------+ | col000 | col001 | +-----------------+-------------+ | DEF Enterprises | 123 Nth Way | +-----------------+-------------+ >>> t2.view() +--------+---------------+ | col000 | col001 | +--------+---------------+ | Bailey | 123 North Way | +--------+---------------+
|
describe(table, *cols)Prepares a set of statistical descriptives for the given columns in the given table. While these descriptives could be retrieved from the included stats module, this method gives quick and easy access to the main statistical measures, such as mean, median, counts, totals, and so forth. The descriptives available are shown in the example below. If no columns are designated, descriptives are run for every column in the table. Note that this method is not optimized yet, so it might take a while for large tables with many columns. For numeric descriptives like the mean or std deviation, text fields and empty fields are ignored. If no numeric descriptives can be calculated, the field is set to None. Since this function is meant to show a descriptive view of a table (and not to provide hard statistical values), most fields are rounded off to 2 decimal places for readability. Example:>>> table = Table([ ... 'Name', ... 'Pay', ... ],[ ... ['Homer', 0], ... ['Marge', 20], ... ['Bart', None], ... ['Lisa', 152], ... ['Maggie', ''], ... ]) >>> descriptives = Simple.describe(table) >>> descriptives.view() +-------------+------+---------+ | Descriptive | Name | Pay | +-------------+------+---------+ | NumRecords | 5 | 5 | | NumEmpty | 0 | 2 | | NumZero | 0 | 1 | | NumText | 5 | 0 | | NumNumeric | 0 | 3 | | Median | | 20.0 | | Mean | | 57.33 | | Max | | 152.0 | | Min | | 0.0 | | Variance | | 6821.33 | | StdDev | | 82.59 | | Total | | 172.0 | +-------------+------+---------+
|
expression_match(table1, table2, expression)Finds records in the two tables where the expression evalutes True. This method is usually used only internally, but it is available for general use for those who want it. (in other words, most users generally use other methods like join, col_match_same, etc.) The resulting table shows the matching records indices in table 1 and table 2. The variable "record1" is set to a record in table1, and "record2" is set to a record in table2. The expression is evaluated for each record in table 1 and table 2. This function is *much* slower than a Database join. If you are getting data from a database, use the SQL join instead. This method is provided when you need to join tables that were loaded from CSV, etc. However, although it is slower, this function can evalute any expression (including the use of Picalo functions) in the join. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.expression_match(table1, table2, "record1[0] == record2[0]") >>> matches.view() +--------------+--------------+ | table1record | table2record | +--------------+--------------+ | 0 | 0 | | 0 | 2 | +--------------+--------------+
|
find_duplicates(table, *cols)Finds all duplicates in the given columns. Some audit procedures look for duplicates in columns where duplicates should not exist (such as invoice numbers). This function supports this need. Example:>>> table = Table([('col000', int), ('col001', int)], ([1,6000], [2,6000], [2,4000], [3,5000], [5,5000], [6,5000])) >>> dups = Simple.find_duplicates(table, 'col000') >>> dups.view() +-----+--------+ | Key | Rows | +-----+--------+ | 2 | [1, 2] | +-----+--------+Another example:
|
find_gaps(table, ascending, column)Finds gaps in the sequence of values in the given col. A gap is where the value of the column in record1 - the value of the column in record2 does not equal 1. This function supports audit procedures that look for gaps in ordinarily-sequential column values, such as invoice numbers. Gaps indicate missing values that are normally the target of further research. Example to find gaps in a set of invoice numbers and amounts>>> table = Table([('col000', int), ('col001', int)], ([1,6000], [2,6000], [3,5000], [5,5000], [6,5000])) >>> gaps = Simple.find_gaps(table, True, 'col000') >>> gaps.view() +----------+ | Gap Rows | +----------+ | 3 | +----------+
|
fuzzycoljoin(table1, col1, table2, col2, matchpercent, ngramlen=3, ignorecase=True)Joins two tables based upon a fuzzy match of two column values. The resulting table has rows of both tables that match. This method uses the Simple.join method to join where the fuzzymatch percentage is greater than or equal to the given matchpercent. Example:>>> table1 = Table([('Name', unicode), ('Address', unicode)], [['Daniel', '500 West Street'], ['Marge', '200 North Maple']]) >>> table2 = Table([('Name', unicode), ('Address', unicode)], [['Steven', '500 West St.'], ['Denny', '600 Times Avenue']]) >>> results = Simple.fuzzycolmatch(table1, 'Address', table2, 'Address', 0.30) >>> results.view() +--------+-----------------+--------+--------------+ | Name | Address | Name_2 | Add | +--------+-----------------+--------+--------------+ | Daniel | 500 West Street | Steven | 500 West St. | +--------+-----------------+--------+--------------+
|
fuzzymatch(text1, text2, ngramlen=3, ignorecase=True)Calculates a fuzzy match of text1 and text2. This is an adaptation of the Trigram method found at http://www.heise.de/ct/english/97/04/386/ by Reinhard Rapp. The score will always be a decimal between 0.0 and 1.0:
This algorithm is not as effective as more advanced neural-net-based algorithms, but it is simple and fast. It requires very little processing power compared with more advanced algorithms. This method is different than fuzzysearch because it expects text1 and text2 to be about the same length. For example, two last names, two addresses, etc. Example:>>> Simple.fuzzymatch("500 West Street", "500 West St.") 0.69230769230769229
|
fuzzysearch(fulltext, searchtext, ngramlen=3, ignorecase=True)Calculates a fuzzy search to see if searchtext is found in fulltext. This is an adaptation of the Trigram method found at http://www.heise.de/ct/english/97/04/386/ by Reinhard Rapp. The score will always be a decimal between 0.0 and 1.0:
This algorithm is not as effective as more advanced neural-net-based algorithms, but it is simple and fast. It requires very little processing power compared with more advanced algorithms. This method is different than fuzzymatch because it expects a long string for the fulltext variable. For example, searching for a phrase within a large document.
|
get_unordered(table, ascending, *cols)Finds all of the records that are out of order as measured by the values of the given cols. The point of this function is not to determine whether a table needs sorting (that would be more inefficient than simply sorting it to begin with). This function supports audit procedures that look for out-of-order items. Example:>>> table = Table([('col000', int), ('col001', int)], ([5,6], [3,2], [4,4], [4,5], [4,3])) >>> unordered = Simple.get_unordered(table, True, 'col000', 'col001') >>> unordered.view() +----------------+ | Unordered Rows | +----------------+ | 1 | | 4 | +----------------+
|
join(table1, table2, expression)Joins two tables together (similar to a regular SQL inner join) where the expression evalutes True. This function is *much* slower than a Database join. If you are getting data from a database, use the SQL join instead. This method is provided when you need to join tables that were loaded from CSV, etc. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.join(table1, table2, "record1[0] == record2[0]") >>> matches.view() +--------+--------+----------+----------+--------+ | col000 | col001 | col000_2 | col001_2 | col002 | +--------+--------+----------+----------+--------+ | 3 | 2 | 3 | Bailey | 2 | | 3 | 2 | 3 | Sally | 2 | +--------+--------+----------+----------+--------+
|
left_join(table1, table2, expression)Joins two tables together (similar to a regular SQL LEFT join) where the expression evalutes True. All records in the first table are returned, and only matching records in the second table are returned. This function is *much* slower than a Database join. If you are getting data from a database, use the SQL join instead. This method is provided when you need to join tables that were loaded from CSV, etc. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.left_join(table1, table2, "record1[0] == record2[0]") >>> matches.view() +--------+--------+----------+----------+--------+ | col000 | col001 | col000_1 | col001_1 | col002 | +--------+--------+----------+----------+--------+ | 3 | 2 | 3 | Bailey | 2 | | 3 | 2 | 3 | Sally | 2 | | 4 | 5 | <N> | <N> | <N> | +--------+--------+----------+----------+--------+
|
right_join(table1, table2, expression)Joins two tables together (similar to a regular SQL RIGHT join) where the expression evalutes True. All records in the second table are returned, and only matching records in the first table are returned. This function is *much* slower than a Database join. If you are getting data from a database, use the SQL join instead. This method is provided when you need to join tables that were loaded from CSV, etc. Example:>>> table1 = Table([('col000', int), ('col001', int)], ([3,2], [4,5])) >>> table2 = Table([('col000', int), ('col001', unicode), ('col002', int)], ([3,'Bailey',2], [1,'Dan',2], [3,'Sally',2])) >>> matches = Simple.right_join(table1, table2, "record1[0] == record2[0]") >>> matches.view() +--------+--------+--------+----------+----------+ | col000 | col001 | col002 | col000_1 | col001_1 | +--------+--------+--------+----------+----------+ | 3 | Bailey | 2 | 3 | 2 | | 1 | Dan | 2 | <N> | <N> | | 3 | Sally | 2 | 3 | 2 | +--------+--------+--------+----------+----------+
|
select(table, expression)
Selects records from a table based upon a custom expression and returns
a new table including only those records.
The expression should evaluate using "rec" for the current record and
should evaluate to True or False, as in:
"record['id'] < 1000"
The select function is very slow compared with database SELECT statements.
If you have the choice (e.g. if you are using a database data source),
use the SQL SELECT instead of this function. This is useful when data comes
from sources other than SQL, such as CSV/TSV files.
Example:
>>> table = Table([('col000', int), ('col001', int)], ([5,6], [3,2], [4,6]))
>>> results = Simple.select(table, "record['col001'] > 5")
>>> results.view()
+--------+--------+
| col000 | col001 |
+--------+--------+
| 5 | 6 |
| 4 | 6 |
+--------+--------+
@param table: The table records will be selected from
@type table: Table or TableArray
@param expression: An expression that evaluates each record and returns whether each should be included in the results table.
@type expression: str
@return: A new table containing the records for which func evaluated to true (1)
@rtype: Table
|
select_by_value(table, **col_value_pairs)Selects record from a table given key=value pairs and returns a new table including only those records. This method *is* efficient and can be used often. It calculates indices as needed and should select very fast. If you need to do more complex selection, such as where a field is greater than some value, use Simple.select, which allows the use of arbitrary (and possibly powerful) expressions. Example:>>> table = Table([('col000', int), ('col001', int), ('col002', unicode)], [ ... [5,6,'flo'], ... [3,2,'sally'], ... [4,6,'dan'], ... [4,7,'stu' ], ... [4,7,'ben'] ... ]) >>> results = Simple.select_by_value(table, col001=6, col002='dan') >>> results.view() +--------+--------+--------+ | col000 | col001 | col002 | +--------+--------+--------+ | 4 | 6 | dan | +--------+--------+--------+
|
select_nonoutliers(table, col, min=None, max=None)A convenience function to select non-outliers from a table. Outliers are those with column values below the min or above the max. The source table is not modified. This type of filtering could also be done with the Simple.select method, but since filtering of outliers is so common, it is given its own function. Example:>>> table = Table([('col000', int), ('col001', int)], ([8,6], [3,2], [0,4], [4,6])) >>> selected = Simple.select_nonoutliers(table, 'col000', min=2, max=5) >>> selected.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 3 | 2 | | 4 | 6 | +--------+--------+
|
select_nonoutliers_z(table, col, zscore)A convenience function to select non-outliers from a table. Outliers are those with column values with zscores above +zscore or below -zscore. The source table is not modified. This type of filtering could also be done with the Simple.select method, but since filtering of outliers is so common, it is given its own function. Example:>>> table = Table([('col000', int), ('col001', int)], ([8,8], [3,2], [0,4], [4,3])) >>> selected = Simple.select_nonoutliers_z(table, 'col001', 1) >>> selected.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 3 | 2 | | 0 | 4 | | 4 | 3 | +--------+--------+
|
select_outliers(table, col, min=None, max=None)A convenience function to select outliers from a table. Outliers are those with column values below the min or above the max. The source table is not modified. This type of filtering could also be done with the Simple.select method, but since filtering of outliers is so common, it is given its own function. Example:>>> table = Table([('col000', int), ('col001', int)], ([8,6], [3,2], [0,4], [4,6])) >>> selected = Simple.select_outliers(table, 'col000', min=2, max=5) >>> selected.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 8 | 6 | | 0 | 4 | +--------+--------+
|
select_outliers_z(table, col, zscore)A convenience function to select outliers from a table. Outliers are those with column values with zscores above +zscore or below -zscore. The source table is not modified. This type of filtering could also be done with the Simple.select method, but since filtering of outliers is so common, it is given its own function. Example:>>> table = Table([('col000', int), ('col001', int)], ([8,8], [3,2], [0,4], [4,3])) >>> selected = Simple.select_outliers_z(table, 'col001', 1) >>> selected.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 8 | 8 | +--------+--------+
|
select_records(table, record_indices)Selects records from a table given a number of table record indices and returns a new table including only those records. This could obviously be done with a few lines of code, but this function provides the functionality for more readable code. This method *is* efficient and can be used often. Example:>>> table = Table([('col000', int), ('col001', int)], ([5,6], [3,2], [4,6])) >>> results = Simple.select_records(table, [0,2]) >>> results.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 5 | 6 | | 4 | 6 | +--------+--------+
|
sort(table, ascending, *cols)Sorts a table by values in one or more columns. Example:>>> table = Table([('col000', int), ('col001', int)], ([5,6], [3,2], [4,6])) >>> Simple.sort(table, True, 'col000', 'col001') >>> table.view() +--------+--------+ | col000 | col001 | +--------+--------+ | 3 | 2 | | 4 | 6 | | 5 | 6 | +--------+--------+
|
soundex(text, len=4, digits='01230120022455012623010202')Calculates a soundex computation for the given text. Soundex is a standard algorithm for comparing text in a fuzzy way. For example, it sees 'Smith', 'Smoth', and 'Smiith' as the same thing. From a fraud detection perspective, it is extremely useful to match addresses, employee names, vendor names, and other text that may have variations in it. Soundex creates a number out of text. To compare two text values, compute a soundex hash of both values and compare the soundex results. Note that soundex is optimized for English names. If you need to optimize for other languages, search the Internet for an appropriate set of digits. Also note that this method calculates the raw soundex score. It may be more useful to use Simple.soundexcol, which runs the soundex algorithm on an entire column. Credits: Taken from ASPN: implementation 2000-12-24 by Gregory Jorgensen License: Public domain by G.J. Example:>>> Simple.soundex('Smith') 'S530' >>> Simple.soundex('Smoth') 'S530' >>> Simple.soundex('Smithinson', len=6) 'S53525' >>> Simple.soundex('Smithinall', len=6) 'S53540'
|
soundexcol(table, col)Calculates the soundex score for each value in a column and appends the scores as a new column in the table. The new column is named 'column_soundex' (where column is the name of the column). See the Simple.soundex method for information on the soundex algorithm. Example:>>> t = Table([('name', unicode)], [['Samuel'], ['Sally'], ['Max'], ['Maxx']]) >>> Simple.soundexcol(t, 'name') >>> t.view() +--------+------+ | name | hash | +--------+------+ | Samuel | S540 | | Sally | S400 | | Max | M200 | | Maxx | M200 | +--------+------+ >>> # Grouping by the new 'hash' column will quickly place duplicates together in groups.
|
transpose(table)Transposes the table, which means the columns and rows are switched. A transposed (inverted) table is returned. Be careful with transposing large tables -- the new table will have as many columns as the source table has rows. Large tables are often unmanageable because of the large number of columns they produce. Since Picalo column names must conform to a specific format and must be unique, some changes may be made to the values in the transposition process. Since Picalo cannot guess the new column types, all columns are typed as unicode columns. Use table.set_type to set the types after transposition.
|
_join(table1, table2, matches, left_join=False)Internal method to perform joining. If left_join is True, all records in table1 are returned (i.e. SQL LEFT JOIN) with any matching records from the right table. If left_join is False, only the matching records are returned. To perform a right_join, simply reverse table1 and table2. |
| Variable Details |
|---|
__descriptives__
|
__dm__
|
__functions__
|
round_descriptives_to
|
| Home | Trees | Index | Help |
|
|---|
| Generated by Epydoc 2.1 on Mon Aug 20 05:38:16 2007 | http://epydoc.sf.net |