Sunday, July 6, 2008

Wordpress Blog

I'm experimenting with a Wordpress blog. So far the key advantage is that it does not impose the black boxes on my images.

Friday, July 4, 2008

Publishing Latex to the Web

It's gross.

With that in mind, I'll say a few words about how I got somewhat nice-looking mathematical notation into blogspot a few days ago. I originally wrote that entry in TexShop and put in on my website as a PDF file. While this shares it with my fans, there are some disadvantages to PDfs as opposed to HTML in a blog entry:

* As a blog entry, it is intregrated into my blog and it benefits from the search engine crawling, incoming links, tags and appearing with the rest of my recent writings.

* PDFs are less accessible than HTML. It's often more work for the reader to download it if your browser doesn't display PDFs inline and it can be harder to search or cut and paste.

Let me mention two of my workflow principles:

* There must be one master version of a document. All edits should be made to this copy, and changes are propagated to all other formats and versions as automatically as possible.

* The master version should be the format where is the easiest and most natural to express oneself. For example, LaTeX makes it many times easier to create equations such as those appearing in my most recent posting mentioned above.

It was clear I had to do the editing in LaTeX, and develop a way to publish to Blogspot. Below is the Bash script I came up with. It has very rough edges. I believe in programming with a focus on a particular problem. As I use the programs in this entry in the future, they will doubtlessly improve greatly. In the meantime, any comments are welcome. Be warned that Blogspot's narrow text column may be wrapping some of the lines.


name=`cut -f1 -d'.' <<< $tex_file`
echo Operating on $name...
rm -f -- $name-trimmed.html.txt temp.html
latex2html -no_navigation -auto_prefix -split 0 -info "" -noaddress $tex_file
xmllint --format --html $name/index.html >
mv -f $name/index.html
echo $arg
echo perl -pi -e $arg $name/index.html >>
chmod 755
./ $name/index.html >> $name-trimmed.html.txt
rsync -vaz $name/ jonathan@my_servers_ip:/var/www/jonathan/$name

Let's step through the lines one by one.


This tells the shell to use bash to run the script.

name=`cut -f1 -d'.' <<< $tex_file`
echo Operating on $name...

This segment takes in a variable tex_file that holds the name of the Latex file to convert. It then trims off the extension and saves it as the name of the project.

rm -f -- $name-trimmed.html.txt temp.html
This clears out the temporary files we'll use later.

latex2html -no_navigation -auto_prefix -split 0 -info "" -noaddress $tex_file
This is the most important part. There is actually a pre-existing program called latex2html that does just what it suggests. It doesn't do a perfect job, and blogspot has a number of idiosyncrasies, most of the rest of the script is accommodating the personality of this wonderful (seriously) program.

xmllint --format --html $name/index.html >
mv -f $name/index.html

xmllint is an excellent program that cleans up XML and HTML. I use it here to make the HTML much nicer to look at and process later.

echo $arg
echo perl -pi -e $arg $name/index.html >>
chmod 755

One of Blogspot's weaknesses (to me) is image uploading. It seems that you can only upload five at a time, and even without that restriction, I wouldn't put up with the hassle of uplaoding the dozens of images needed for a blog entry generated from Latex. This is especially true because many of them could change as the source document is corrected or refined. Instead, I host the images on my personal site and change the links produced by latex2html to use these new URLs. $name/index.html >> $name-trimmed.html.txt
Much of the HTML tags produced by Latex2html are redundant or interact poorly with Blogspot. This is a program I wrote to strip most HTML tags, which can be found at the end of the file. I originally set out to use Perl's HTML::Parser module, but this turned out to be easier to do myself. This leaves a file that can be opened in any text editor and copied into the Blogspot editing interface. For my last entry, it was barrels-of-gold-trimmed.html.txt.

rsync -vaz $name/ jonathan@my_servers_ip:/var/www/jonathan/$name
Finally, push all the images produced by latex2html up to my server.

Any comments or ways to improve are welcome (I am sure there are many). In particular, if someone knows a way to better clean up the spacing and get rid of the borders on the images that Blogspot's CSS adds, that would be helpful.


#!/usr/bin/perl -w

use strict;
use HTML::Parser;
use Data::Dumper;

my $file = shift;
my $okay_tags = {
    'img' => 1, 'table' => 1,
    'td' => 1, 'i' => 1,
    'tr' => 1, 'div' => 1,
    'a' => 1, 'h2' => 1,
my $no_content_tags = {
    'title' => 1, 'h1' => 1,

open(IN, "<$file") or die "Couldn't open $file: $!";
undef $/;
my $data = <IN>;
$data =~ s/<!--[^>]*>//gs;
my $lines = [ split('<', $data) ];
my $result = '';
foreach my $line (@$lines) {
    $line = '<' . $line;
    if ($line =~ /^<$/) {next;}
    if ($line =~ /^<!/) {next;}

    $line =~ /<(\/?)([^ >]*)([^>]*)>(.*)/s;
    my $end = $1;
    my $tag = $2;
    my $rest = $3;
    my $content = $4;

    my $temp_result = '';

    if ($tag =~ /^a$/) {
        if ($rest =~ /(.*")(.*)(#.*)/) {
            $rest = "$1$3";
        $result =~ s/\n$/ /;

    if ($okay_tags->{$tag}) {
        $temp_result = '<' . $end . $tag . $rest .'>';
    unless($no_content_tags->{$tag}) {
        $temp_result .= $content;
    if ($tag =~ /^a$/ || $content =~ /^$/) {
        $temp_result =~ s/\n//g;
    $result .= $temp_result;

$result =~ s/\n\n*/\n/sg;
$result =~ s/^\n//g;
print $result;

Tuesday, July 1, 2008

Barrels of Gold


My friend Brian Marion posed a variant of the following problem:
Suppose I just robbed Fort Knox. I was in a rush and I put the gold in what I had available: 3 large barrels. I didn't keep track of how much I put in each one, I just threw bars into whichever was closest until the klaxons drove my nerves over the edge. Then I hammered down their lids and bailed - I can't even be sure that a particular barrel has anything in it.
Since you helped plan the heist, I'll let you have any one of these three sealed barrels that you want. Of course, even an empty barrel is too heavy to lift by hand, so it's hard to tell which has the most gold. The only way you can measure their contents is to see if the forklift can lift them. As a safety feature, our forklift can be calibrated to a certain weight, which it will lift no more than. However we can only set this once because changing it a second time requires retooling and a visit from the manufacturer, which would blow our cover.
We know this: these barrels are holding between 0 and 1000 pounds of gold, with a uniform, random distribution. You get to pick a weight, any weight you'd like and test which barrels are heavier than that, and which are lighter. If just one is heavier and two are lighter - great! You found the one with the most gold. If there's a tie for heaviest, you'll just have to pick one. But hey, you might eliminate the lightest one and that's better than no information, right?
Suppose the forklift's zero calibration is the weight of an empty barrel, so we can ignore that number. Therefore, if you set it to 500 pounds, you have exactly a 1/2 chance of getting a particular barrel too heavy to lift, but a 3/8 chance of getting one too heavy and two too light. On the other hand, you might get two too heavy, and you'll wish you had set it higher. Obviously, there's a tradeoff: higher weights are less likely to have a tie among several barrels being the heaviest, but on the other hand they risk overshooting all the barrels and you'll be stuck just picking one of the three.
What is the weight capacity you should set on the forklift to maximize your average expected gain?


First we'll solve a more general problem, then plug in the numbers to solve this one. Without loss of generality, say the amount of gold in a barrel ranges from 0 to 1. Suppose there are $b$ barrels and $g$ (for guesses) calibrations available to us. Suppose the set $C$ is the set of calibrations, which we'll write as
$C = \{c_{i} \hspace{.3em} \vert \hspace{.3em} i \in [1,g]\}$. $c_{g}$ will be the biggest one and they descend from there, so this equation is true:
$\displaystyle 0 < c_{1} < ... < c_{g-1} < c_{g} < 1$  

Finally, add the two constants $c_{g+1} = 1$ and $c_{0} = 0$ to make our formulas cleaner. Each barrel can fall into any of the $g + 1$ ranges: less than the smallest calibration, bigger than the largest, or between any pair of consecutive calibrations.
To find the expected value of a set of calibrations, we can find the expected values of each of these gaps and multiply those values by their probabilities, then sum up the products. The average value of a barrel between $c_{i+1}$ and $c_{i}$ is
$\frac{c_{j+1}+c_{j}}{2}$ and the only way you'll end up with that as the amount of gold you get is if the largest barrel is in that range. Fortunately, that probability is easy to calculate: the chance of all the barrels being less than $c_{i+1}$ is $c_{i+1}^{b}$ and the chance of them all being smaller than $c_{i}$ is $c_{i}^{b}$, so we can simply subtract these two powers to get the probability that all the barrels are less than $c_{i+1}$, but not all are less than $c_{i}$.
Then the expected value is
$\displaystyle E(C)$$\textstyle =$$\displaystyle \sum_{j = 0}^{g}\frac{c_{j+1}+c_{j}}{2}(c_{j+1}^{b} - c_{j}^{b})$
 $\textstyle =$$\displaystyle \frac{1}{2} \left[\sum_{j = 0}^{g}(c_{j+1}+c_{j})(c_{j+1}^{b} - c_{j}^{b}) \right]$
 $\textstyle =$$\displaystyle \frac{1}{2} \left[\sum_{j = 0}^{g}(c_{j+1}^{b+1} - c_{j+1}c_{j}^{b} +c_{j}c_{j+1}^{b} - c_{j}^{b+1}) \right]$
 $\textstyle =$$\displaystyle \frac{1}{2} \left[\sum_{j = 0}^{g}(c_{j}c_{j+1}^{b} - c_{j+1}c_{j}^{b}) +\sum_{j = 0}^{g}(c_{j+1}^{b+1}- c_{j}^{b+1}) \right]$
 $\textstyle =$$\displaystyle \frac{1}{2} \left[\sum_{j = 0}^{g}(c_{j}c_{j+1}^{b} - c_{j+1}c_{j}^{b}) + 1 \right]$

The maximum value of $E(C)$ can be found by taking partial derivatives with respect to each $c_{i} \in C$, setting these expressions to zero, and solving them simultaneously. The constant 1 will not survive differentiation and the $\frac{1}{2}$ is superfluous. A variable $c_{i}$ appears in 4 terms, namely

\begin{eqnarray*}c_{i}c_{i+1}^{b}-c_{i}^{b}c_{i+1} + c_{i-1}c_{i}^{b}-c_{i-1}^{b}c_{i}\end{eqnarray*}

with derivative,

\begin{eqnarray*}c_{i+1}^{b}-bc_{i}^{b-1}c_{i+1} + bc_{i-1}c_{i}^{b-1}-c_{i-1}^{b}\end{eqnarray*}

Thus the solution can be found by solving the equations:
$\displaystyle c_{i+1}^{b}-bc_{i}^{b-1}c_{i+1} + bc_{i-1}c_{i}^{b-1}-c_{i-1}^{b} = 0, \forall i \in [1, g]$  

Finally, If g = 1 and b = 3

Returning to our original problem, if there are 3 barrels and 1 calibration $C = \{c_{1}\}$, the Equations (3) require that we solve

\begin{eqnarray*}1 - 3c_{1}^{2} & = & 0\end{eqnarray*}

Therefore, the solution is
$c_{1} = \frac{1}{\sqrt{3}} = 0.57735... $, meaning the best calibration is 577 lbs. This has a nice 42% chance of isolating the heaviest barrel and a mere 27% chance of not separating the barrels at all. You must be wondering what your payout is. By Equation 2 it is 748 pounds of Gold.

Another Example, g = 3 and b = 4

As another example, for 3 calibrations
$C = \{c_{1}, c_{2}, c_{3}\}$ and 4 barrels, Equations (3) are

\begin{eqnarray*}c_{2}^{4}-4c_{1}^{3}c_{2} & = & 0 \\c_{3}^{4}-4c_{2}^{3}c_{3......{4} & = & 0 \\1-4c_{3}^{3} + 4c_{2}c_{3}^{3}-c_{2}^{4} & = & 0\end{eqnarray*}

This might be pretty tricky to solve by hand, but the mathematical software Sage gives the numerical solution:

\begin{eqnarray*}c_{3} & = & 0.83463... \\c_{2} & = & 0.64395... \\c_{1} & = & 0.40566...\end{eqnarray*}

In fact, Sage finds 57 solutions but only 5 are real. Of these 5, only 1 satisfies the ordering in Inequalities (1).

Friday, June 13, 2008

The Factorial Base II

After my factorial base post, a commenter asked about multiplication and it turns out multiplication is hard in factoradix, as far as I know. A natural question is, what does factoradix make easy? Here's an example: testing for divisibility.

In fact, this can be downright awesome by comparison. In decimal, testing if a number n divides into m takes time about equal to the length of m. Checking if 3 divides into a billion digit decimal number will take your computer about one billion cycles, within some constant factor of time spent.

Checking if m divides n in factoradix takes time proportional only to m. Checking if 3 divides a billion digit number in factoradix (which would be about a 9 billion digit number in decimal because of decimal's less efficient use of digits), only requires looking at the last 2 places!

Let me give an example. Does 3 divide into 9a0a4023220011?

Recall that each of the places corresponds to a certain number of factorials. Well, every factorial greater than 2! contains 3 as a factor, by definition. Therefore, the values of any digit other that the two rightmost, is irrelevant, because it could only change the value of the number by a multiple of three. This is sort of like how you can tell if a decimal number is divisible by ten only by looking at the last digit (all the other digits cannot affect the divisibility by ten).

Getting back to 9a0a4023220011, we need only check 11. Sure enough 11(factoradix) = 1 * 2! + 1 * 1! = 3. So yes, 3 does divide 9a0a4023220011.

Thursday, June 12, 2008

Is No One Rational?

I ran into a math subproblem in the course of a bigger problem. I thought I'd give the Internet a shot before posting the answer. Here it is:

Show that for any finite set of positive real numbers, there must exist some number x such that x times each of the numbers in the set produces an irrational number.

Or rephrased:

If n runners start out together on an infinite race, each at a constant speed, which is different from the others' speeds, show that at some time each one will be an irrational distance from the start line.

Just a reminder: irrational numbers are those that cannot be written as a fraction a/b with a and b both whole numbers.

Sunday, June 8, 2008

The Factorial Base

We've all heard of binary and decimal as systems to represent integers. The idea is simple: In the decimal number 328, 3 reflects three hundreds, 2 reflects two tens and the 8 adds in eight ones.

The same number is 101001000 in binary. Specifically, one 256, one 64 and one 8. 256, 64 and 8 are the eighth, sixth and third powers of two, and those occur as ones that many spaces over from the right side of the number.

Clearly, numbers are longer in binary than they are in decimal, but the digits are much simpler in binary. If all we cared about was reducing the number of digits, we could use base 20, or 2008 or the largest number we could think of.

Of course the tradeoff that comes is that it can be hard to make up and keep straight all the extra digits we would need. For example, the decimal number 1330 only needs three digits in base 11, but each of those digits is the digit "10", because 1330 = 10 * 112 + 10 * 11 + 10. How would we write this? One option is the hexadecimal system where 10 is a, 11 is b and so on. Then 1330 (base 10) = aaa (base 11). We can take this all the way up to base 36 using all 26 letters, in which case a long decimal number like 2621102907451 (base 10) compacts down to xg48zv3v (base 36).

But for a small number like 10, it seems unreasonable to write it as "a" just to save one digit. What if there was some way to have small digits for small numbers and compact representations for large numbers? It turns out there is a way to do this using the "factorial base" or "factoradix".

In factoradix, the rightmost digit is base 2, the next one is base 3, and so on. As an example, here's how you would calculate the factoradix notation for 99:

First we notice that 99 gives a remainder of 1 when divided by 2. The quotient is 49. We keep both of these numbers: 1 because it is the rightmost digit of the factoradix representation of 99 and 49 because we need it for the next steps. Next, 49 divided by 3 is 16 with remainder 1. So 1 is the next digit and so far we have _11 as our representation with the _ part still to be filled in (we can't know how many digits it will be, yet). Divide 16 by 4, quotient: 4, remainder: 0. Last, divide 4 by 5 to get a remainder of 4 and quotient of 0 (0 means the algorithm is finished). Combining all the remainders, 4011 ends up as the factoradix notation for 99. Longer yes, but with a simpler choice of digits.

We don't need to go through the details, but larger numbers make the opposite tradeoff in factoradix. A 1 with 47 zeros behind it becomes 4z7,h0x,ipf,5gr,0bc,l7l,977,460,068,952,315,551,220 which is only 39 digits long. A 1 with 100 zeros is only 69 digits long, but we need some way to encode digits like "58".

What do some more of these representations look like? First, the Perl function below encapsulates the algorithm above.

sub factoradix {
my $n = shift;
my $base = Math::BigInt->new(2);
my (@digits, $digit);
while ($n->bcmp(0)) {
($n, $digit) = $n->bdiv($base);
unshift @digits, $digit->bstr();
return [ @digits ];

Using this function, here's how the powers of ten start out in factoradix:
For 100, we get 1
For 101, we get 120
For 102, we get 4,020
For 103, we get 121,220
For 104, we get 1,651,220
For 105, we get 23,551,220
For 106, we get 266,251,220
For 107, we get 2,750,051,220
For 108, we get 25,551,151,220
For 109, we get 210,564,451,220
For 1010, we get 1,7a5,726,651,220
For 1015, we get 2d,caa,584,234,451,220
For 1020, we get 1k2,13c,54a,a81,881,151,220
For 1025, we get g2i,h8c,8f7,bc3,5b8,250,051,220
For 1030, we get 3,7mf,cdg,584,64c,f8b,c01,182,251,220
For 1035, we get c,4tr,pn8,9ef,g2b,d3f,34b,225,975,551,220
For 1040, we get x,tkt,18i,jep,8bh,g9j,f7e,1bb,652,056,651,220
For 1064, we get g,lqd,ila,9ii,ju0,a5m,bas,91k,7l5,m66,1ii,4h5,462,7a5,726,651,220

Every power of ten larger than 1064 seems to use a digit that we don't have a letter for (and I checked up to 101400!). Of course, all we need to do is add more symbols, maybe hearts or that thing The Artist Formerly Known as Prince uses...

What do you think is the next factoradix question to investigate?

Sunday, March 2, 2008

From the Outbox: a Balanced Diet

From: Me
Subject: Re: The 20 Worst Foods in America]
Date: March 1, 2008 5:44:56 PM CST

On Feb 29, 2008, at 9:48 AM, Xxxxx Xxxxxxx wrote:

> Thought this was interesting.

> Xxxxx

> > 1. The Worst Food in America
> > Outback Steakhouse Aussie Cheese Fries with Ranch Dressing

> >

> > 2,900 calories
> > 182 g fat 240 g carbs

Of course, there's nothing inherently wrong with calories taken out of context of all food one eats. The Outback Steakhouse Aussie Cheese Fries with Ranch Dressing is a perfect complement to some diet, somewhere. For instance, if the only other thing we ate was grasses and roots that we dug up on the hllls, the aussie cheese fries might balance it out nicely.