FAQ

 

Why Should I Use Picalo?

Why not use “your favorite app here” to do analyses? Why use Picalo instead? Picalo may or may not be the right choice for what you are doing, depending upon your goals. The following are features of Picalo that seem somewhat unique:

  • Picalo is an open framework. Users can either use the built-in routines or write their own. Those who write their own can share their routines with others in the Picalo community. The goal is to create a large set of analysis routines that meet many different needs–on a scale that a single company could never do.
  • Picalo’s Detectlet framework allows those with scripting interest and ability to create wizard-based routines for others in their organizations. This helps bridge the gap between power users and others.
  • The philosophy of Picalo is to help users learn how to script. Data analysts who know basic scripting techniques (for loops, for example) are more efficient and effective than those who do not–usually by several orders of magnitude. Picalo shows you the script code for everything you do in the GUI, and it includes a function composer to help you create function calls.
  • Picalo includes advanced analysis routines not found in competing products. For example, it supports grouping by a number of days for analysis of labor and time card data. Picalo can also automatically group records to achieve a specified degree of smoothness in data.
  • Picalo’s scripting is based on Python, a powerful and easy-to-learn language. Rather than creating its own language (as competing packages do), Picalo stands on the shoulders of an extremely well-done language. You can download any of thousands of Python libraries from the Internet to use in your analyses.
  • Picalo runs on Windows, Mac OS X, Linux, and many other systems. Most competing data analysis applications run only on Windows.
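
To give a sense of what grouping by a number of days means for time card data, here is a minimal sketch in plain Python. It does not use Picalo's own routines: the helper name group_by_days, the record layout, and the sample entries are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date

def group_by_days(records, start, days):
    """Bucket (day, hours) records into consecutive windows of `days` days.

    Hypothetical helper for illustration only; not part of Picalo.
    """
    groups = defaultdict(list)
    for day, hours in records:
        # Integer division maps each date onto the index of its window.
        index = (day - start).days // days
        groups[index].append((day, hours))
    return dict(groups)

# Made-up time card entries: (work date, hours worked).
timecards = [
    (date(2024, 1, 1), 8.0),
    (date(2024, 1, 3), 9.5),
    (date(2024, 1, 8), 7.0),
    (date(2024, 1, 15), 8.0),
]

weekly = group_by_days(timecards, start=date(2024, 1, 1), days=7)
for index in sorted(weekly):
    total = sum(hours for _, hours in weekly[index])
    print(f"window {index}: {total} hours")
```

Changing the days argument (to 14 for biweekly pay periods, say) regroups the same records without touching the rest of the script, which is the kind of small change scripting makes easy.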

Where can I get help?

As an open source project, we are very helpful to new users. While you won’t find a phone number to call (since there isn’t a company behind this project), users are generally very helpful to one another. You might just find that the help you get from other users is much better than “traditional” technical support. To start, sign up for our email list and post a question.

What is the Picalo philosophy?

The Picalo community believes that data analysis is best done through scripting. Everything in Picalo is designed to help you learn basic scripting techniques so you can become more productive. Picalo certainly includes a powerful graphical user interface, but the focus of the toolkit is to help you write powerful, 10- to 20-line scripts.

Picalo is based on open source principles. This doesn’t mean the designers can’t make money with Picalo; it just means that the software code is open for others to review, fix, and improve. Profits should be made in using the software (on the job or in consulting practice) rather than in selling the software.

Can I trust Picalo? What about quality control?

In short, you can trust Picalo as much as any other analysis application. See the quality control section for more information on this important matter.

What is Picalo’s relationship to ACL and IDEA?

Picalo is a competitor. ACL and IDEA are two of the most popular data analysis applications used in corporate data analysis. Each has unique features and abilities. The scripting ability of each is great. However, both applications are primarily audit applications rather than general data analysis applications. Picalo contains routines that can be used in many fields, including auditing, fraud detection, and other areas.
Picalo’s Detectlet idea and its levels differentiate it from ACL and IDEA. The end goal of Picalo is to create a worldwide repository of Detectlets that do thousands of different analyses.

One of the primary differences between these applications and Picalo is that the latter is open source. Users can help solve bugs, contribute new modules, and do analyses not possible in closed-source software.

Finally, Picalo is built upon Python, a full-featured, mature programming language. While ACL and IDEA define their own languages, Python has existed for over 10 years. Modules to do all sorts of things have been contributed by programmers around the world. For example, the regular expression module provides powerful text searching not found in many off-the-shelf products — since it is part of the core Python language, you can be sure it is well tested, efficient, and mature. Python is not only simple, but it is also extremely powerful.
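
As a small illustration of the text searching mentioned above, the following sketch uses Python's standard re module to pull invoice numbers out of free-text descriptions. The sample descriptions and the pattern are made up for this example; nothing here is Picalo-specific.

```python
import re

# Made-up descriptions; in practice these would come from a table column.
descriptions = [
    "Invoice 10234 paid to ACME Corp",
    "Refund for invoice 10234",
    "PO Box 442, Springfield",
    "Wire transfer, no invoice",
]

# Match the word "invoice" (any case) followed by a run of digits.
pattern = re.compile(r"\binvoice\s+(\d+)", re.IGNORECASE)

for text in descriptions:
    match = pattern.search(text)
    if match:
        print(match.group(1), "found in:", text)

# Collect just the captured invoice numbers for later analysis.
invoice_numbers = [m.group(1) for m in map(pattern.search, descriptions) if m]
```

Only the first two descriptions match; the last one mentions an invoice but has no number after it, so the pattern skips it.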

Why not use MS Excel?

Microsoft Excel (or, insert your favorite spreadsheet here) has become a powerful, mature application for number analysis. Spreadsheets are widely known and used, and they are visual in their analysis. However, spreadsheets are best suited for ad-hoc analysis rather than formal database-oriented analysis. For example, Excel is an excellent choice for tracking your investments or calculating a home mortgage schedule. It is less suitable for querying, stratifying, summarizing, joining, matching, trending–routines Picalo specializes in.
Picalo is meant to work with data retrieved from large databases; Excel is meant to work primarily with small sets of numbers in free-form. While Excel worksheets were long limited to about 65,000 rows, and newer versions still stop at roughly one million, Picalo can handle millions upon millions of records (limited only by available memory in your computer).

A simple example illustrates this purpose (this example was first given in the sed and awk book published by O’Reilly). Most home owners don’t own expensive woodworking equipment, such as saws, routers, and so forth. Instead, they use smaller, nonindustrial tools for their weekend jobs like fixing their fence. However, professional carpenters do own expensive equipment. They are willing to invest time and money in industrial tools to do their daily work. Excel is a general purpose tool meant for the weekend analyst. Picalo is meant for the professional. Its startup cost is higher, but it provides economies of scale not possible with ad-hoc programs like Excel.

How does Picalo compare with EnCase or The Forensic Toolkit?

EnCase and FTK are really different products with different purposes. They are useful to inspect a computer, gather data, and document evidence gathering. Picalo is a data analysis package. It assumes you’ve already gathered the data into a data source and need to combine it in different ways to generate useful information.

What is Picalo’s relationship to Numerical Python/Numarray/SciPy?

The two are different, with different goals and different communities. Numerical Python is geared toward the scientific community whereas Picalo is geared towards the corporate data community. NumPy specializes in array storage and representation, and its functions focus on array manipulation, math, and so forth. Most NumPy routines assume you are working with large matrices of numerical data. The matrices do not normally contain empty cells, and the functions are focused on scientific applications.
Picalo specializes in database connectivity, representation of data in tables, and data analysis routines. While arrays and tables are similar, the focus of the two projects (and the resulting data structures and functions) is quite different. Picalo works with database-type data, such as text, addresses, salaries, and so forth.

What is Picalo’s relationship to Python?

Picalo is built in Python the same way a book is built in English or some other language. More technically, Picalo is a set of modules, functions, and routines built on top of Python. A casual reviewer of Picalo may not see the power of extending the Python language. Python alone is a very powerful data and text analysis platform — Picalo adds the Table type and many data analysis functions to the core Python language.
Because Picalo is built in a professional language, anything you can do in Python you can do in Picalo (this turns out to be a lot!). There are thousands upon thousands of Python routines available for free download from the Internet.

Why Python (instead of VB, Perl, Java, .Net)?

Because it’s better, of course! :)

Why open source?

This question actually has several answers. The surface level answer is because competing data analysis applications charge thousands of dollars per year for software that is not terribly difficult to write.
Another answer is that significant money can be made using the software (on the job or as a consultant) rather than selling the software. Instead of paying a software tax each year, why not create a platform that does exactly what the community needs? Thousands of programmers from around the world can contribute thousands of data analysis routines that all can use. There’s plenty of money to be made in the use of the software in our work. Why not collaborate on our tools?

Consider why businesses create software: to make money. They do things because there is a business case. Most business decisions are made based upon the return (i.e. money) they will bring. In a perfect world, software decisions should be made because of technical reasons, not business reasons. Normally, business and technical motivations are in line: good software usually sells well. But this is not always the case. Too many software packages have been ruined over the years because companies gave in to the almighty dollar/euro/etc. Software bloat, unstable software, and market-driven release dates are well-known problems in the software development world.

Open source is different. Decisions are made for their technical merits rather than for their monetary return. For example, most open source projects release new product versions when the products are tested and ready for use. Most projects don’t even give a release date; they simply say, “it will be released when it’s ready”. Contrast this with commercial software packages. Release schedules are usually driven by marketing reasons (when the competition will be releasing, when the market is ready, etc.). The result is often that programmers are pushed by marketers and executives to ship products before they are tested and ready.

Please note that this philosophy is not an argument against free market economies. Competition is a good thing. Choice is a good thing. It is simply an argument for competition in the use of the software (consulting, etc.) rather than competition in the tool development. Creating software is different than the creation of most products, for example constructing automobiles. The former can be done with almost no investment up front, has almost no distribution costs, and benefits from collaborative development. The latter requires considerable investment to produce even one product and has significant distribution costs. Software is a different type of product and should be treated as such.

Another example is the feature creep seen in many commercial products. In order to continue making money, companies have to release new versions every two years or so. Once a product matures, companies almost have to invent needs for their new features to solve! Consider Microsoft Word: basic word processing hasn’t changed much since the beginning of GUI applications. However, Microsoft continues to upgrade Word with features like the infamous “Clippy” to give its users reasons to upgrade, even though most users still only need the basic features. If a company declared its product “completed”, its money stream would quickly dry up.

Finally, Picalo is open source because it draws upon many different products, such as Python, mx.DateTime, the Gadfly database, and so forth. It is one more contribution to an ever-growing selection of excellent, free software.

Where does the name Picalo come from?

(Personal note from Conan Albrecht, who named Picalo) I do a lot of programming at home. On the day I started Picalo, my two oldest girls (six and four at the time) were dancing around my home office singing the jingle, “Daddy, Daddy, there you go, let us see you piccolo (i.e. dance)!” At this point in the song, I had to get up and dance for the next minute or so. I thought it was cute, so I searched the Internet for variations on the spelling of the word. I was happy to find that Picalo (and its variants) didn’t have any negative connotations and were not widely used. I finally settled on the “picalo” spelling. The term has no relation to the piccolo musical instrument or other variations and uses of the word.

How many records can Picalo hold?

Picalo has no explicitly defined limitations on the number of records per table, the number of columns per table, or the total number of tables you can have. Therefore, its primary limitation is the amount of memory you have in your computer. Currently, Picalo holds all loaded tables in memory. When we move the Table object to C-based code, we’ll program a disk-caching mechanism.

What are Picalo’s Limitations?

Picalo is relatively new software. It has several limitations, a few of which are listed here:

  • Picalo may still have bugs, especially in the GUI, that have not been found by the users. You should always double check your analyses, print control totals, and list intermediate results to ensure that your analyses run the way you expect. (Although you should do this regardless of the software package.)
  • Picalo is geared towards scripting. Everything in the GUI is meant to help you learn basic programming principles and structures. If you aren’t interested in this, another package might be better for you.
  • Picalo has no warranty, support, or guarantee. It is released with the hopes that it will be useful, but you are responsible for anything it does to your computer, your data, or anything else. Although there is no formal support, open source projects like Picalo have shown that user support for one another is usually superior to company support.
  • Unless you write scripts to use database results intelligently, Picalo loads all data into memory for analysis. Your computer’s memory may limit the number of records you can analyze. In practical use, Picalo has been shown to analyze millions of records easily, so this limit is quite high.
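
Two of the points above, printing control totals and using database results without loading everything into memory, can be sketched together in plain Python. This example uses the standard sqlite3 module rather than Picalo's own database routines, and the payments table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database with made-up rows, standing in for a large data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (vendor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("ACME", 100.0), ("Globex", 250.5), ("ACME", 49.5)],
)

# Iterating over the cursor yields rows as they are read, so the script
# never builds a full list of records in memory.
record_count = 0
control_total = 0.0
for vendor, amount in conn.execute("SELECT vendor, amount FROM payments"):
    record_count += 1
    control_total += amount

# Cross-check the running total against a total computed by the database.
(db_total,) = conn.execute("SELECT SUM(amount) FROM payments").fetchone()
print(record_count, control_total, db_total)
conn.close()
```

If the record count or the two totals disagree with what the source system reports, that is a signal to stop and investigate before trusting any later analysis.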