About Picalo

avatar

Welcome to the world of Picalo, a collaborative, open-source effort to produce a data analysis application suitable for auditors, fraud examiners, data miners, and other data analysts.
 
Example uses of Picalo:
  - Analyzing financial data, employee records, and purchasing systems for errors and fraud
  - Importing Excel, XML, EBCDIC, CSV, and TSV files into databases
  - Interactively analyzing network events, web server logs, and system login records
  - Importing email into relational or text-based databases
  - Embedding controls and fraud testing routines into production systems

 

PyTables

    Wednesday, July 28th, 20103 Commented
    Categorized Under: Uncategorized

    I have not finished the work on integrating PyTables into Picalo.  We moved this summer and have been homeless for a few months while we build a home.  It’s been a difficult summer to work on my OSS projects.  I am still excited about the prospects of what PyTables can bring to Picalo, so please stay tuned!

    Goal seeking for a total value

      Tuesday, February 16th, 2010No Commented
      Categorized Under: Uncategorized

      I’ve now been asked by two different people if Picalo can add (or subtract) a group of numbers together to see if any combinations equal a goal value.  This is useful when a person comes up with a missing value during an audit.  For example, if $7.07 is left after verifying some numbers, it must be the result of one or more of the values in the table.

      The problem is going through the hundreds of thousands of potential combinations by hand.  Picalo itself doesn’t do this (yet), but here’s a script that makes quick work of it.  Note that the algorithm is *exponential*.  It takes about 1 minute for my laptop to go through 8 numbers (which results in almost 20,000 possible combinations).  If you have hundreds of numbers, prepare to wait a while.

      The script has dummy numbers in it and seeks for a value of $7.07.  It comes up with 5 potential matches.  Readers should easily be able to use this script with their own data.  I’ll include it as a detectlet in future version of Picalo.

      # written by Conan Albrecht
      # to be included in a future version of Picalo
      # released under the LGPL
      # import the PIcalo libraries
      from picalo import *
      # import commonly-needed built-in libraries
      import string, sys, re, random, os, os.path, urllib
      import itertools
      # the goal we want to find
      goal_value = 7.07
      # create our table of numbers
      # this would normally be done by importing the data
      # in other words, delete this part here.  i’m only creating
      # it to make the script self-sufficient.
      nums = Table([("Num", number, "#.00")])
      nums.append(15.50)
      nums.append(42.33)
      nums.append(-33.11)
      nums.append(5.67)
      nums.append(22.31)
      nums.append(-15.23)
      nums.append(15.55)
      nums.append(14.88)
      nums.append(-7.33)
      # create a table to store the results in
      results = Table([
      ('Formula', str),
      ('Answer', number),
      ('DiffFromGoal', number),
      ])
      try:
      comb = {}  # use a dict for the has_key below (to remove duplicates efficiently)
      pos_neg = [ [ abs(n), -1*abs(n) ] for n in nums.column(’Num’) ]
      prod = list(itertools.product(*pos_neg))
      combmap = {}
      for pi, p in enumerate(prod):
      show_progress(’Finding all possible combinations…’, float(pi) / float(len(prod)))
      for i in range(1, len(p)+1):
      for c in itertools.combinations(p, i):
      if not comb.has_key(c):
      comb[c] = None
      # go through and fill out the results table
      for i, c in enumerate(comb.keys()):
      show_progress(’Checking combinations…’, float(i) / float(len(comb)))
      formula = ‘ + ‘.join([ '(%s)' % num for num in c])
      total = sum(c)
      results.append(
      formula,
      total,
      abs(goal_value – total)
      )
      # sort the results
      show_progress(’Sorting…’, .99)
      Simple.sort(results, True, ‘DiffFromGoal’)
      finally:
      clear_progress()
      1. # written by Conan Albrecht
      2.  
      3. # to be included in a future version of Picalo
      4.  
      5. # released under the LGPL
      6.  
      7. # import the PIcalo libraries
      8.  
      9. from picalo import *
      10.  
      11. # import commonly-needed built-in libraries
      12.  
      13. import string, sys, re, random, os, os.path, urllib
      14.  
      15. import itertools
      16.  
      17. # the goal we want to find
      18.  
      19. goal_value = 7.07
      20.  
      21. # create our table of numbers
      22.  
      23. # this would normally be done by importing the data
      24.  
      25. # in other words, delete this part here.  i'm only creating
      26.  
      27. # it to make the script self-sufficient.
      28.  
      29. nums = Table([("Num", number, "#.00")])
      30.  
      31. nums.append(15.50)
      32.  
      33. nums.append(42.33)
      34.  
      35. nums.append(-33.11)
      36.  
      37. nums.append(5.67)
      38.  
      39. nums.append(22.31)
      40.  
      41. nums.append(-15.23)
      42.  
      43. nums.append(15.55)
      44.  
      45. nums.append(14.88)
      46.  
      47. nums.append(-7.33)
      48.  
      49. # create a table to store the results in
      50.  
      51. results = Table([
      52.  
      53. ('Formula', str),
      54.  
      55. ('Answer', number),
      56.  
      57. ('DiffFromGoal', number),
      58.  
      59. ])
      60.  
      61. try:
      62.  
      63. comb = {}  # use a dict for the has_key below (to remove duplicates efficiently)
      64.  
      65. pos_neg = [ [ abs(n), -1*abs(n) ] for n in nums.column('Num') ]
      66.  
      67. prod = list(itertools.product(*pos_neg))
      68.  
      69. combmap = {}
      70.  
      71. for pi, p in enumerate(prod):
      72.  
      73. show_progress('Finding all possible combinations...', float(pi) / float(len(prod)))
      74.  
      75. for i in range(1, len(p)+1):
      76.  
      77. for c in itertools.combinations(p, i):
      78.  
      79. if not comb.has_key(c):
      80.  
      81. comb[c] = None
      82.  
      83. # go through and fill out the results table
      84.  
      85. for i, c in enumerate(comb.keys()):
      86.  
      87. show_progress('Checking combinations...', float(i) / float(len(comb)))
      88.  
      89. formula = ' + '.join([ '(%s)' % num for num in c])
      90.  
      91. total = sum(c)
      92.  
      93. results.append(
      94.  
      95. formula,
      96.  
      97. total,
      98.  
      99. abs(goal_value - total)
      100.  
      101. )
      102.  
      103. # sort the results
      104.  
      105. show_progress('Sorting...', .99)
      106.  
      107. Simple.sort(results, True, 'DiffFromGoal')
      108.  
      109. finally:
      110.  
      111. clear_progress()

      Version 4.39

        Friday, January 22nd, 20102 Commented
        Categorized Under: Uncategorized

        I posted version 4.39 today.  It’s got a lot of important changes in it.  Most importantly, I’m finally happy with the way field formatting is done.  Date and number formats are a mess inherently — so many formats, so many ways to represent things.  They affect how Picalo imports data from CSV and other formats, and they affect how values come from databases.  I’ve introduced a format specification into Picalo that should clear up the mess without breaking existing tables.

        There are lots of other changes as well.  Those interested can read the README file.  Enjoy.

        The Workbook is Here!

          Wednesday, December 2nd, 2009one Commented
          Categorized Under: Uncategorized

          After many hours and sore wrists, I’ve finished the Picalo Workbook.  It’s a hands-on introduction to Picalo.  It’s got exercises, tasks, and sample datasets.  I’m hoping it will be a great introduction to Picalo for new users.

          The workbook is the result of some training I did for the a client two weeks ago.  It wanted exercises for the group to go through.  After the training, I took the exercises and sample data sets and formally put them into the workbook.  Note that the datasets were entirely generated and have no relation to the client.

          The workbook is still missing a few items.  I also found some bugs in Picalo while working through it.  I also need to edit the text to ensure grammar is correct.  But for now, it’s 95 percent there and is up on the web site in the documentation section.

          P.S. For those users waiting for PyTables integration, it’s still coming.  I want to crank through these bugs I found, then I’ll get back to work on it.  I’m hoping the Christmas break will give me some time to get it pushed forward quite a bit.

          Status of PyTables

            Saturday, October 24th, 20093 Commented
            Categorized Under: News

            I’ve been working on an agent framework for some agent research I’m doing.  It’s taken all of my time to understand and implement the agents and their messages.  As soon as I finish with it, I’ll start the implementation of PyTables.  I have finished the investigation into how PyTables would integrate into Picalo, and it should go off without a hitch.  I won’t know fully until I get into it.  But assuming it works (and I have every indication that it will), it will be totally seamless to existing scripts, including the entire Picalo application.

            What is PyTables? It’s a Python library for holding data. It’s built upon the widely-used numarray framework.  Both projects (PyTables and numarray) are written in C, so they’ll speed up Picalo’s data structures considerably.  PyTables includes a disk-caching system, so Picalo should be able to load any size table without killing memory or the processor.  Overall, this integration should bring Picalo into the “full production” arena.

            I’m excited about doing it and will jump on the implementation of it ASAP.

            New web site

              Saturday, October 24th, 20093 Commented
              Categorized Under: News

              Welcome to the new web site.  I’ve decided to use a more blog-ish style web site so I can post about new releases, progress on new features, or user experiences that people email to me.