About Picalo

avatar

Welcome to the world of Picalo, a collaborative, open-source effort to produce a data analysis application suitable for auditors, fraud examiners, data miners, and other data analysts.
 
Example uses of Picalo:
  - Analyzing financial data, employee records, and purchasing systems for errors and fraud
  - Importing Excel, XML, EBCDIC, CSV, and TSV files into databases
  - Interactively analyzing network events, web server logs, and system login records
  - Importing email into relational or text-based databases
  - Embedding controls and fraud testing routines into production systems

 

Goal seeking for a total value

    Tuesday, February 16th, 2010No Commented
    Categorized Under: Uncategorized

    I’ve now been asked by two different people if Picalo can add (or subtract) a group of numbers together to see if any combinations equal a goal value.  This is useful when a person comes up with a missing value during an audit.  For example, if $7.07 is left after verifying some numbers, it must be the result of one or more of the values in the table.

    The problem is going through the hundreds of thousands of potential combinations by hand.  Picalo itself doesn’t do this (yet), but here’s a script that makes quick work of it.  Note that the algorithm is *exponential*.  It takes about 1 minute for my laptop to go through 8 numbers (which results in almost 20,000 possible combinations).  If you have hundreds of numbers, prepare to wait a while.

    The script has dummy numbers in it and seeks for a value of $7.07.  It comes up with 5 potential matches.  Readers should easily be able to use this script with their own data.  I’ll include it as a detectlet in future version of Picalo.

    # written by Conan Albrecht
    # to be included in a future version of Picalo
    # released under the LGPL
    # import the PIcalo libraries
    from picalo import *
    # import commonly-needed built-in libraries
    import string, sys, re, random, os, os.path, urllib
    import itertools
    # the goal we want to find
    goal_value = 7.07
    # create our table of numbers
    # this would normally be done by importing the data
    # in other words, delete this part here.  i’m only creating
    # it to make the script self-sufficient.
    nums = Table([("Num", number, "#.00")])
    nums.append(15.50)
    nums.append(42.33)
    nums.append(-33.11)
    nums.append(5.67)
    nums.append(22.31)
    nums.append(-15.23)
    nums.append(15.55)
    nums.append(14.88)
    nums.append(-7.33)
    # create a table to store the results in
    results = Table([
    ('Formula', str),
    ('Answer', number),
    ('DiffFromGoal', number),
    ])
    try:
    comb = {}  # use a dict for the has_key below (to remove duplicates efficiently)
    pos_neg = [ [ abs(n), -1*abs(n) ] for n in nums.column(’Num’) ]
    prod = list(itertools.product(*pos_neg))
    combmap = {}
    for pi, p in enumerate(prod):
    show_progress(’Finding all possible combinations…’, float(pi) / float(len(prod)))
    for i in range(1, len(p)+1):
    for c in itertools.combinations(p, i):
    if not comb.has_key(c):
    comb[c] = None
    # go through and fill out the results table
    for i, c in enumerate(comb.keys()):
    show_progress(’Checking combinations…’, float(i) / float(len(comb)))
    formula = ‘ + ‘.join([ '(%s)' % num for num in c])
    total = sum(c)
    results.append(
    formula,
    total,
    abs(goal_value – total)
    )
    # sort the results
    show_progress(’Sorting…’, .99)
    Simple.sort(results, True, ‘DiffFromGoal’)
    finally:
    clear_progress()
    1. # written by Conan Albrecht
    2.  
    3. # to be included in a future version of Picalo
    4.  
    5. # released under the LGPL
    6.  
    7. # import the PIcalo libraries
    8.  
    9. from picalo import *
    10.  
    11. # import commonly-needed built-in libraries
    12.  
    13. import string, sys, re, random, os, os.path, urllib
    14.  
    15. import itertools
    16.  
    17. # the goal we want to find
    18.  
    19. goal_value = 7.07
    20.  
    21. # create our table of numbers
    22.  
    23. # this would normally be done by importing the data
    24.  
    25. # in other words, delete this part here.  i'm only creating
    26.  
    27. # it to make the script self-sufficient.
    28.  
    29. nums = Table([("Num", number, "#.00")])
    30.  
    31. nums.append(15.50)
    32.  
    33. nums.append(42.33)
    34.  
    35. nums.append(-33.11)
    36.  
    37. nums.append(5.67)
    38.  
    39. nums.append(22.31)
    40.  
    41. nums.append(-15.23)
    42.  
    43. nums.append(15.55)
    44.  
    45. nums.append(14.88)
    46.  
    47. nums.append(-7.33)
    48.  
    49. # create a table to store the results in
    50.  
    51. results = Table([
    52.  
    53. ('Formula', str),
    54.  
    55. ('Answer', number),
    56.  
    57. ('DiffFromGoal', number),
    58.  
    59. ])
    60.  
    61. try:
    62.  
    63. comb = {}  # use a dict for the has_key below (to remove duplicates efficiently)
    64.  
    65. pos_neg = [ [ abs(n), -1*abs(n) ] for n in nums.column('Num') ]
    66.  
    67. prod = list(itertools.product(*pos_neg))
    68.  
    69. combmap = {}
    70.  
    71. for pi, p in enumerate(prod):
    72.  
    73. show_progress('Finding all possible combinations...', float(pi) / float(len(prod)))
    74.  
    75. for i in range(1, len(p)+1):
    76.  
    77. for c in itertools.combinations(p, i):
    78.  
    79. if not comb.has_key(c):
    80.  
    81. comb[c] = None
    82.  
    83. # go through and fill out the results table
    84.  
    85. for i, c in enumerate(comb.keys()):
    86.  
    87. show_progress('Checking combinations...', float(i) / float(len(comb)))
    88.  
    89. formula = ' + '.join([ '(%s)' % num for num in c])
    90.  
    91. total = sum(c)
    92.  
    93. results.append(
    94.  
    95. formula,
    96.  
    97. total,
    98.  
    99. abs(goal_value - total)
    100.  
    101. )
    102.  
    103. # sort the results
    104.  
    105. show_progress('Sorting...', .99)
    106.  
    107. Simple.sort(results, True, 'DiffFromGoal')
    108.  
    109. finally:
    110.  
    111. clear_progress()

    Version 4.39

      Friday, January 22nd, 20102 Commented
      Categorized Under: Uncategorized

      I posted version 4.39 today.  It’s got a lot of important changes in it.  Most importantly, I’m finally happy with the way field formatting is done.  Date and number formats are a mess inherently — so many formats, so many ways to represent things.  They affect how Picalo imports data from CSV and other formats, and they affect how values come from databases.  I’ve introduced a format specification into Picalo that should clear up the mess without breaking existing tables.

      There are lots of other changes as well.  Those interested can read the README file.  Enjoy.

      The Workbook is Here!

        Wednesday, December 2nd, 2009one Commented
        Categorized Under: Uncategorized

        After many hours and sore wrists, I’ve finished the Picalo Workbook.  It’s a hands-on introduction to Picalo.  It’s got exercises, tasks, and sample datasets.  I’m hoping it will be a great introduction to Picalo for new users.

        The workbook is the result of some training I did for the a client two weeks ago.  It wanted exercises for the group to go through.  After the training, I took the exercises and sample data sets and formally put them into the workbook.  Note that the datasets were entirely generated and have no relation to the client.

        The workbook is still missing a few items.  I also found some bugs in Picalo while working through it.  I also need to edit the text to ensure grammar is correct.  But for now, it’s 95 percent there and is up on the web site in the documentation section.

        P.S. For those users waiting for PyTables integration, it’s still coming.  I want to crank through these bugs I found, then I’ll get back to work on it.  I’m hoping the Christmas break will give me some time to get it pushed forward quite a bit.

        Status of PyTables

          Saturday, October 24th, 2009No Commented
          Categorized Under: News

          I’ve been working on an agent framework for some agent research I’m doing.  It’s taken all of my time to understand and implement the agents and their messages.  As soon as I finish with it, I’ll start the implementation of PyTables.  I have finished the investigation into how PyTables would integrate into Picalo, and it should go off without a hitch.  I won’t know fully until I get into it.  But assuming it works (and I have every indication that it will), it will be totally seamless to existing scripts, including the entire Picalo application.

          What is PyTables? It’s a Python library for holding data. It’s built upon the widely-used numarray framework.  Both projects (PyTables and numarray) are written in C, so they’ll speed up Picalo’s data structures considerably.  PyTables includes a disk-caching system, so Picalo should be able to load any size table without killing memory or the processor.  Overall, this integration should bring Picalo into the “full production” arena.

          I’m excited about doing it and will jump on the implementation of it ASAP.

          New web site

            Saturday, October 24th, 20093 Commented
            Categorized Under: News

            Welcome to the new web site.  I’ve decided to use a more blog-ish style web site so I can post about new releases, progress on new features, or user experiences that people email to me.