Halculon / Units

Units, Measures, and Physical Constants Database

Overview

Several Haskell libraries (including dimensional and the Numeric Prelude) can represent numerical values with physical dimensions (m, s, g, and combinations of them) that are checked at runtime and/or compile time, but none provides an exhaustive, searchable, annotated database of units, measures, and physical constants. Halculon ( http://www.updike.org/articles/Units ) is an interactively searchable unit database (with an example Haskell web application: http://www.updike.org/halculon/ ) based on the units database created by Alan Eliasen for the wonderful physical units programming language Frink ( http://futureboy.homeip.net/frinkdocs/ ).

The goal of Halculon (at this point) is to provide a complete, user- and developer-friendly units database (along with my carefully tuned search string database, which makes interactive use much more pleasant). Halculon interprets Frink’s units.txt database and, with a bit of tuning, makes all 4250 units available as a searchable, randomly accessible database (online and downloadable as UTF-8 text). Because each unit in units.txt is defined in terms of more basic unit definitions (a very elegant approach in general), the file is inconvenient for looking up a single unit at a time: the entire file must be parsed to represent each unit or constant solely in terms of the base SI units. Halculon provides precisely that representation, statically.
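To illustrate why chained definitions have to be resolved before a single unit can be looked up, here is a minimal Python sketch. It is not Halculon's or Frink's actual code: the `DEFINITIONS` table, `resolve`, and `flatten` are hypothetical names, and the table holds only a handful of well-known length units rather than anything parsed from units.txt.

```python
from fractions import Fraction

# Toy definitions in the spirit of a chained units file: each unit is an
# exact factor times another unit. Only "m" is a base unit in this sketch.
DEFINITIONS = {
    "inch": (Fraction(127, 5000), "m"),  # 0.0254 m exactly
    "foot": (Fraction(12), "inch"),
    "yard": (Fraction(3), "foot"),
    "km": (Fraction(1000), "m"),
}
BASE_UNITS = {"m"}


def resolve(name):
    """Follow the definition chain until a base unit is reached."""
    factor = Fraction(1)
    while name not in BASE_UNITS:
        step, name = DEFINITIONS[name]
        factor *= step
    return factor, name


def flatten(definitions):
    """Precompute every unit in terms of base units (the 'static' table)."""
    return {name: resolve(name) for name in definitions}
```

Once `flatten` has been run, looking up `yard` needs no knowledge of `foot` or `inch`; that precomputed form is what makes random access to a single unit cheap.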

Update

Mobile (iPhone) edition

The example web application now has a mobile version available at:

http://www.updike.org/halcmobile/

(Tested in iPhone OS 3.1, Safari 3.0, and Firefox 2.0.) For best results, use Add to Home Screen to run the application in full screen. The calculator also works completely offline.

Future direction

For the sample web app / calculator:

  • utilize MPFR’s arbitrary-precision floats to bring greater range to Real calculations, in line with those for Integers and Rationals (built in to Haskell).

Longer term goals:

Online Demo

The Database

The actual Tables:

  • AllSearches. Fields: (query_string, space_separated_string_list_of_unit_numbers)
    • about 61,000 query strings (rows)
  • AllUnits. Fields: (unit_number, unit_name_string, unit_value_string, hint_string)
    • 4,250 units (rows)

Both tables available as

Tables as a CGI web service

Requests may use GET (parameters in the URL) or POST (the request body carries everything that would follow the ? in a GET URL).

The CGI web services at

  • http://www.updike.org/units/allunits.py
  • http://www.updike.org/units/allsearches.py

take a URL query string q=blah&format=blah, where format is one of

  • txt ⇒ raw text, mimetype text/plain, fields separated by newlines
  • xml ⇒ XML result, mimetype text/xml, fields in structured xml
  • json ⇒ JSON syntax result, mimetype text/xml, results in JSON list object
  • html ⇒ JSON syntax result, mimetype text/html, results in JSON list object

Example:

http://www.updike.org/units/allsearches.py?format=txt&q=km

yields

2152
2230
3397
354

Taking the first search result, 2152:

http://www.updike.org/units/allunits.py?q=2152&format=txt

yields

Succuess=true
UnitNumber=2152
UnitName=km
UnitValue=1000 * 1 m
Hint=kilometer (length)

Or search the units table by unit name:

http://www.updike.org/units/allunits.py?q=mutchkin&format=txt

yields:

Succuess=true
UnitNumber=1887
UnitName=mutchkin
UnitValue=1077125000000000 / 2542667873083873281  m^3
Hint=1/2 choppin (volume)

The UnitNumbers have no significance except to uniquely identify and order the units in the database and to keep the search database small.
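The request and response shapes shown above can be handled with a few lines of Python. This is a hedged sketch based only on the examples in this section: `service_url`, `parse_txt_record`, and `parse_txt_numbers` are hypothetical helper names, the txt format is assumed to be exactly what the examples show (one number per line for searches, one `Key=value` pair per line for units), and no network call is made because the availability of the service is not guaranteed.

```python
from urllib.parse import urlencode

BASE = "http://www.updike.org/units/"


def service_url(script, query, fmt="txt"):
    """Build a GET URL such as .../allunits.py?q=km&format=txt."""
    return BASE + script + "?" + urlencode({"q": query, "format": fmt})


def parse_txt_numbers(body):
    """Parse an allsearches.py txt response: one unit number per line."""
    return [int(token) for token in body.split()]


def parse_txt_record(body):
    """Parse an allunits.py txt response into a dict of Key=value lines.

    Keys are kept exactly as the service spells them; lines without '='
    are ignored.
    """
    record = {}
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            record[key.strip()] = value.strip()
    return record
```

For example, `service_url("allsearches.py", "km")` produces a URL equivalent to the search example above, and feeding the km response text to `parse_txt_record` gives a dict whose `"UnitValue"` entry is `"1000 * 1 m"`.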

My Work

Frink

Frink’s unit database is supplied as a text file in Frink syntax that is parsed by the Frink runtime system and made available through an interactive Java applet and a few other HTTP-based web applications. (Compiled jars of Frink are also available for command-line, web server-based, and general programming use.) Because each unit in Frink’s units.txt database is defined in terms of more basic units (a very elegant approach, indeed), units.txt is inconvenient for looking up a single unit at a time solely in terms of the base SI units (kg, m, s, etc.). In addition, Frink’s lookup mechanism, though very liberal in what it accepts (in, inch, inches, inchs, kinch, kiloinch, kilokilogram), is not conducive to browsing or searching. Though the units.txt text file itself is a treasure trove of interesting facts, some of that value is lost to the end user. For example, a mutchkin (a measure of volume) is defined as half of a choppin, which is defined as half of a scotspint, which is about half of an American gallon (not to be confused with a brgallon). Unless you knew to look up both mutchkin and choppin, you would not know their relationship. The Hint part of my search database provides this information, enabling much more enlightening browsing and discovery.

I’m really impressed with Alan Eliasen’s calculating tool and programming language Frink (Applet). My main problem is the use of Java: it takes a lot of memory and a long time to load, so by the time it has finally started I can’t remember what I wanted to calculate. (The web version is not interactive or “Web 2.0” enough for me either.)

Desire for standalone units DB

I implemented the core unit functionality of Frink and parsed (most of) his unit database file. I have a working calculator that does most of what I need and is interactive enough, but I want to separate off the unit database for a number of reasons.

  1. Right now, every calculator query requires the CGI script to parse the entire unit database file. Bad. (My web host does not allow persistent processes but is incredibly affordable and powerful.) A MySQL database should have a faster startup time, only looking up relevant units as needed for a given query, or none if none are needed!
  2. I want to build an entirely different calculator using symbolic algebra, where the units are only incidental to the design of the thing. (If you have general symbolic algebra, you get a unit calculator for free; the base units are automatically kept orthogonal for you. Just make a low precedence division operator ->, a la Frink, and you have syntactically simple unit conversion.)
  3. Separating the units out into something entirely useful on its own is the Unix way, right? “Do one thing and do it well.” Other developers can reuse the databases in any sort of mashup, online or offline, on any device, platform, with any programming language.
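The first point above (look up only the units a query actually mentions, instead of parsing the whole file) can be sketched as follows. This is an illustration under assumptions, not the deployed setup: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, the column names follow the AllUnits fields listed earlier, the index and the `build_db`/`lookup` helpers are hypothetical, and the two rows are the ones shown in the examples on this page.

```python
import sqlite3

# The two AllUnits rows quoted in the web service examples on this page.
ROWS = [
    (2152, "km", "1000 * 1 m", "kilometer (length)"),
    (1887, "mutchkin",
     "1077125000000000 / 2542667873083873281  m^3", "1/2 choppin (volume)"),
]


def build_db(rows):
    """Create an in-memory AllUnits table (stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AllUnits ("
        "unit_number INTEGER PRIMARY KEY, "
        "unit_name_string TEXT, "
        "unit_value_string TEXT, "
        "hint_string TEXT)"
    )
    # Index on the name so a lookup does not scan every row.
    conn.execute("CREATE INDEX idx_unit_name ON AllUnits (unit_name_string)")
    conn.executemany("INSERT INTO AllUnits VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup(conn, names):
    """Fetch only the units named in a query; no names means no queries."""
    found = {}
    for name in names:
        row = conn.execute(
            "SELECT unit_value_string, hint_string FROM AllUnits "
            "WHERE unit_name_string = ?",
            (name,),
        ).fetchone()
        if row is not None:
            found[name] = row
    return found
```

A calculator query such as `3 km` would call `lookup(conn, ["km"])`, while a pure arithmetic query would pass an empty list and never touch the table.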

Slightly more organized (hacked) units DB

I made extensive modifications to the DB text file and shuffled things up and finessed everything (lots of Python scripts) until I got a usable explicit list of units good enough to demo as a sexy autocomplete search database. (See also the complete calculator.)

A fine-tuned, discoverable, browsable, explorable, interactively searchable database is not something Frink provides. Frink relies more on generalizable principles (always a good thing, in general) than on practicality in unit name lookup/query (e.g. it allows plurals formed with s; with an autocomplete mechanism, this is unnecessary and redundant).

I hope I added a bit of value by reordering prefixes in the database, and only assigning prefixes to appropriate units: metric on metric, short metric on short metric, kilobits and kibibytes but not kilokilogram or kibimeters, etc. I tuned the order in which search results appear (especially at the beginning and end of words, and with respect to case-sensitivity). It has to be seen to be appreciated.
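The idea of attaching prefixes only to appropriate units can be sketched like this. It is a hypothetical illustration, not the Python scripts actually used: the prefix lists are deliberately tiny, and the `UNIT_KINDS` tagging (which units accept metric or binary prefixes) is an assumption made up for the example. Short-form prefixes on short-form names are left out.

```python
METRIC_PREFIXES = ["kilo", "mega", "milli"]
BINARY_PREFIXES = ["kibi", "mebi"]

# Hypothetical tagging: which prefix families each unit may take.
UNIT_KINDS = {
    "meter": {"metric"},
    "bit": {"metric", "binary"},
    "byte": {"metric", "binary"},
    "kilogram": set(),  # already carries a prefix, so it gets none
    "inch": set(),      # not a metric unit
}


def prefixed_names(unit):
    """Return only the prefixed names that make sense for this unit."""
    kinds = UNIT_KINDS.get(unit, set())
    names = []
    if "metric" in kinds:
        names += [prefix + unit for prefix in METRIC_PREFIXES]
    if "binary" in kinds:
        names += [prefix + unit for prefix in BINARY_PREFIXES]
    return names
```

With this tagging, `prefixed_names("byte")` includes kibibyte and `prefixed_names("bit")` includes kilobit, while kilokilogram and kibimeter are never generated.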

Autocomplete Unit Search Demo

To demonstrate the services, I made

using AJAX/JSON for search query lookup, and YUI Autocomplete for the drop-down box. (Previously, the whole autocomplete DB was in a giant 2MB JavaScript file. Dynamically fetching the search queries is much more efficient in bandwidth and startup time.)

Currently, using the query string, autocomplete looks up the units by number and populates the drop-down with unit_name_string = hint_string. A hint may be a definition (like what mm stands for) and/or the type of unit (mass, velocity, etc.).
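The two-step lookup described above (query string to unit numbers, then unit numbers to labels) can be sketched in Python. This is an assumption-laden illustration, not the demo's code: the in-memory dicts stand in for the AllSearches and AllUnits tables, they contain only the rows quoted on this page, and `suggestions` is a hypothetical helper name. Unit numbers with no row in the toy table are skipped.

```python
# Stand-in for AllSearches: query_string -> space-separated unit numbers.
ALL_SEARCHES = {"km": "2152 2230 3397 354"}

# Stand-in for AllUnits: unit_number -> (name, value, hint).
# Only the rows shown in the examples on this page are included.
ALL_UNITS = {
    2152: ("km", "1000 * 1 m", "kilometer (length)"),
    1887: ("mutchkin",
           "1077125000000000 / 2542667873083873281  m^3",
           "1/2 choppin (volume)"),
}


def suggestions(query):
    """Return drop-down labels of the form 'unit_name = hint'."""
    labels = []
    for token in ALL_SEARCHES.get(query, "").split():
        unit = ALL_UNITS.get(int(token))
        if unit is None:
            continue  # row not present in this toy table
        name, _value, hint = unit
        labels.append(f"{name} = {hint}")
    return labels
```

Because the search table stores only unit numbers, the same unit row can be shared by many query strings, which is what keeps the search database small.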

TODO: _units

Finally, I need to make another web app and/or set of text files/tables in which every unit is prefixed with an underscore (_m instead of m, etc.), freeing up the short names such as m for use as variables. Autocomplete can type the underscore for you (or, perhaps, typing an underscore can trigger autocomplete when units are needed).