Illegal Numbers and the Dangers of Banned Information

October 15, 2013 3:07 pm

Did you know that some numbers are illegal?  It's true.

No, there isn't a law somewhere that says the numbers 745, 1,889, and 131,101 are illegal.  In fact, which numbers are illegal isn't even known a priori.  And there's an unbounded (and possibly infinite) number of illegal numbers!  It's crazy!

So how are some numbers illegal?

It all started with the invention of digital storage media (i.e., the computer).  You may have some notion about computers operating entirely on 0s and 1s.  That is, they use a binary number system.  Physically, these values are usually stored as a high or low voltage (electrical storage, like a USB thumb drive), the direction of magnetization (magnetic storage, like a hard drive or floppy disk), or as actual pieces of material (physical and optical storage, like a punch-card/CD/DVD/Blu-ray).  These physical representations are interpreted as either a zero or a one.

Since every piece of data on your computer is stored in some fashion and then interpreted, every piece of data on your computer is represented by a series of zeros and ones.  Any series of zeros and ones in a binary system represents a specific number.  In binary, 01101 is the decimal value 13.  Therefore, every piece of data on your computer has a corresponding number that exactly represents that data.
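To make the bits-to-number idea concrete, here's a minimal Python sketch (an illustration, not taken from any particular tool) that reads a binary string or a run of bytes as one big unsigned number and turns it back again. It assumes big-endian order, meaning the first byte is the most significant; that's a convention I've picked, not the only possible one.

```python
def bits_to_number(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as a binary number."""
    return int(bits, 2)


def bytes_to_number(data: bytes) -> int:
    """Treat a whole file's contents as one big unsigned integer (big-endian)."""
    return int.from_bytes(data, "big")


def number_to_bytes(number: int, length: int) -> bytes:
    """Turn the number back into exactly `length` bytes.

    The length has to be remembered separately: leading zero bytes don't
    change the number's value, so they can't be recovered from it alone.
    """
    return number.to_bytes(length, "big")


data = b"\x00\x0d"                       # a tiny two-byte "file"
n = bytes_to_number(data)
print(bits_to_number("01101"))           # 13
print(n)                                 # 13 (the leading zero byte vanishes)
print(number_to_bytes(n, len(data)) == data)  # True
```

The explicit `length` parameter is the one wrinkle: the number alone doesn't say how many leading zero bytes the original file had, so the computer has to keep track of the file size somewhere else.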

Take this image for example:

[Image: a 64x64-pixel square with bands of red, green and blue]

It's just a 64x64-pixel square with bands of red, green and blue.  The number that is stored on the computer to represent this image is:

1238301683640466317815934360806135690116434697154877923
4494792367923755906673163981641861895458337604702343466
2810700348261887772411034091469320270991594673908631991
5844885402876248165441943076299189588485408932698051652
9274954637784672242688658729288559072939506175348618889
6499525714591179951958772678400887694932269665261700174
3252008568133749407688395385589539990171146138088533455
4251168727225022618657147246713345507166265266049883607
2457758487000098535759665677768294453081438629169206933
668292205864758173826

In any sense that matters, this number and that picture are equal, with the important understanding that I told the computer to treat that number as if it were a picture (specifically a PNG file).  I could just as easily tell the computer to interpret that number as audio, or video, or text (but it would appear to be garbage if interpreted in any of those ways).
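Here's a small Python sketch of the "same number, different interpretation" idea. It takes one number and reads its bytes three ways: as Latin-1 text (chosen because every byte decodes to some character), as a list of raw 8-bit values (the kind of thing an image or audio format might treat as pixels or samples), and as hexadecimal. The function name and the choice of interpretations are just for illustration.

```python
def interpretations(number: int) -> dict[str, object]:
    """Read the same number's bytes in several different ways."""
    length = max(1, (number.bit_length() + 7) // 8)  # bytes needed to hold the number
    data = number.to_bytes(length, "big")
    return {
        "as text": data.decode("latin-1"),   # every byte maps to some character
        "as 8-bit values": list(data),       # e.g. pixel brightnesses or audio samples
        "as hex": data.hex(),
    }


for name, value in interpretations(18537).items():
    print(f"{name}: {value!r}")
# as text: 'Hi'
# as 8-bit values: [72, 105]
# as hex: '4869'
```

Nothing about the number 18537 says which of these is "right"; that choice lives entirely outside the number.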

That picture is this number and this number is that picture.  It has to be so in order for computers to work.

Now here's where this gets interesting and a little bizarre.

There are many laws that make certain physical objects illegal to possess under many circumstances (drugs, explosives, etc.).  But our legal system has also made certain types of information illegal to possess.  One such category is child pornography.  As I understand it, it is illegal to possess any instance of child pornography, regardless of intent.  If such an instance exists on a computer, then the number that represents that illegal information is itself illegal, for the number and the image are one and the same.

Other types of information are legal to possess, but illegal to share with others.  The DMCA makes it illegal to provide to others any tool which is capable of circumventing any measure designed to prevent access to a copyrighted work.  This means I could write a piece of software to copy DVDs, but it would be illegal for me to use it or give it to anyone else.

This seems like it creates some real legal challenges.

I don't think anyone would disagree that a website dedicated to posting and sharing child pornography would be illegal.  But suppose that instead of pictures a website is set up dedicated to posting numbers.  The only things posted are numbers and discussion of those numbers.  Surely there's no harm in a website dedicated to numbers.

Now suppose during the discussion of some number, someone suggests that people tell their computers to interpret that number as an image (create a file, load in the binary form of the number, and set the file extension to .jpg, .png, or .gif).  And suppose that the resultant image is illegal as described above.  Who, if anyone, is legally liable for possessing or sharing this illegal number?
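The "create a file, load in the binary form of the number" recipe takes only a few lines of Python. This sketch assumes the number is written big-endian with no padding; `save_number_as` is a hypothetical helper name. Note that the file extension is only a label telling the operating system which program to try; it doesn't change a single byte.

```python
from pathlib import Path


def number_to_file_bytes(number: int) -> bytes:
    """The binary form of the number, using as few bytes as possible (big-endian)."""
    length = max(1, (number.bit_length() + 7) // 8)
    return number.to_bytes(length, "big")


def save_number_as(number: int, path: str) -> Path:
    """Write the number's bytes to `path`; the extension (.png, .gif, ...) is just a label."""
    target = Path(path)
    target.write_bytes(number_to_file_bytes(number))
    return target


print(number_to_file_bytes(18537))  # b'Hi'
# Hypothetical usage: both files would contain exactly the same two bytes.
# save_number_as(18537, "example.txt")
# save_number_as(18537, "example.png")
```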

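As far as I'm aware this practice is theoretical, but let's take another step anyway.  MIT hosts a site with the first 1 billion digits of Pi.  (The current record for calculating Pi is just over 10 trillion digits, though they aren't posted for viewing.)  If Pi is a normal number, as is widely suspected but not yet proven, then every finite string of digits appears somewhere in it, so it would eventually include every illegal number.  (Continuing forever without repeating isn't enough on its own: 0.101001000100001... never repeats, yet never contains a single 2.)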

Suppose someone sets up a website that says start at the 19,995th digit of Pi, take the next 3,021 digits, and interpret them as an image or run them as a program.  And again, suppose that interpretation is illegal.  Is anyone at fault?  Should they be?
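Following a "start at digit X, take the next Y digits" recipe is entirely mechanical. Below is a Python sketch that computes Pi's digits with Machin's formula (pi = 16 arctan(1/5) - 4 arctan(1/239)) in plain integer arithmetic, then slices out a run of digits and turns it into a number. I'm assuming the leading 3 counts as the 1st digit; a real recipe would have to pin that convention down. This approach should be quick enough for the 19,995/3,021 example, but it's nowhere near fast enough for anything like a billion digits.

```python
def arctan_inv(x: int, unity: int) -> int:
    """arctan(1/x) as a fixed-point integer scaled by `unity` (Taylor series)."""
    power = unity // x          # unity / x**1
    total = power
    x_squared = x * x
    k = 1
    while power:
        power //= x_squared     # unity / x**(2k+1)
        term = power // (2 * k + 1)
        total += -term if k % 2 else term  # alternating signs
        k += 1
    return total


def pi_digits(count: int) -> str:
    """The first `count` digits of Pi, starting with the leading 3."""
    guard = 10                               # extra digits to absorb truncation error
    unity = 10 ** (count + guard)
    pi = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi // 10 ** guard)[:count]


def digits_at(start: int, count: int) -> int:
    """Take `count` digits of Pi beginning at the `start`-th digit (3 is digit 1).

    Note: a leading 0 in the slice is lost when converting to int.
    """
    digits = pi_digits(start + count - 1)
    return int(digits[start - 1:])


print(pi_digits(10))      # 3141592653
print(digits_at(3, 4))    # 4159
# The website's recipe would be digits_at(19995, 3021).
```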

This is the trouble that occurs when information itself is made illegal.

This kind of referencing is exactly how your computer works, though.  You can consider your hard drive as an incredibly long list of zeros and ones, and through some conventions your computer looks at one set of numbers which tells it how to interpret the other numbers (go to the 1,313,163rd bit, take 766,122 bits and treat them as a picture).
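Here's a toy version of "go to bit X, take Y bits" in Python. The disk is just a byte string, and a small hypothetical directory table holds the (offset, length, format) entries that say where a file lives and how to treat it. Real file systems are far more elaborate; this only illustrates the pointer idea.

```python
def read_bits(disk: bytes, offset: int, length: int) -> int:
    """Return `length` bits starting at bit `offset` (0 = first bit on the disk)."""
    total_bits = len(disk) * 8
    if offset < 0 or length < 0 or offset + length > total_bits:
        raise ValueError("bit range falls outside the disk")
    whole_disk = int.from_bytes(disk, "big")      # the entire disk as one number
    shift = total_bits - offset - length          # drop the bits after the range
    return (whole_disk >> shift) & ((1 << length) - 1)


# A hypothetical directory: name -> (bit offset, bit length, how to interpret it).
directory = {
    "greeting": (8, 16, "text"),
}

disk = b"\x00Hi\xff"
offset, length, fmt = directory["greeting"]
raw = read_bits(disk, offset, length)
print(raw.to_bytes(length // 8, "big").decode("ascii"), fmt)  # Hi text
```

Change the `"text"` entry to `"image"` and the same 16 bits on the disk would be handed to an image viewer instead.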

So, given an illegal number sitting on your hard drive, is the number itself illegal, or is it the other numbers elsewhere on the hard drive that tell the computer how to treat the first number?  Or is it only the pair together that's illegal?

How many steps out do we go before illegal numbers are no longer illegal?  Can I break up the numbers into parts and have people add them back together?  Can I tell you to get X numbers from pi and then use those numbers to look up in pi the illegal number?
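Breaking a number into parts that add back up to it is trivial, which is part of what makes the question so awkward. Here's a Python sketch, assuming we just want non-negative parts: pick random cut points between 0 and the number, and the gaps between them are the parts. The function name is mine, for illustration only.

```python
import random


def split_into_parts(number: int, parts: int, rng: random.Random) -> list[int]:
    """Split a non-negative integer into `parts` non-negative pieces that sum to it."""
    if number < 0 or parts < 1:
        raise ValueError("need a non-negative number and at least one part")
    cuts = sorted(rng.randint(0, number) for _ in range(parts - 1))
    bounds = [0, *cuts, number]
    return [high - low for low, high in zip(bounds, bounds[1:])]


pieces = split_into_parts(18537, 3, random.Random(2013))
print(pieces, sum(pieces))  # three unremarkable numbers that add up to 18537
```

Each piece on its own is just a random-looking number; only the sum carries any meaning.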

We can sort of borrow a concept from quantum mechanics to describe the situation: A number is a superposition of information.  It only takes on definite meaning once a specific interpretation is applied.  So is it the person applying or sharing the specific interpretation that is at fault?

Kyle's Hypothesis

Using this superposition idea, I propose the following hypothesis:

A number exists which represents a perfectly innocuous piece of data but when interpreted in another format (e.g., image) is illegal.

What happens if some particularly popular number (e.g., a song in mp3 format) turns out to be the same number that represents something illegal?

In this case, it's only the other numbers on the hard drive that specify whether to interpret the mp3 as a song or as something illegal.  Is it then illegal to suggest that other people interpret the same number they already have in a different way?

Once you start making information itself illegal to possess or distribute, you start creating some really bizarre corner-cases for the legal system.

(For the technically minded:  For simplicity I'm ignoring the scenario where magic-number headers may be used to suggest file format within the file itself.)

Update to answer Megan's question:

Megan asked what

846513265498765646454545431313

15464875465134876532165400014654684
would look like as a picture.  As my parenthetical at the end of the post alluded to, it's not quite as simple as I made it sound.
There is this notion of "magic-number" headers, which is really just a convention computer scientists use: when I want data to be treated as a BMP file, the very first thing in the file will be the number "16973", which stands for the letters BM (the bytes 66 and 77, the ASCII codes for B and M, read together as one 16-bit number: 66 × 256 + 77 = 16973).  There are similar magic numbers (or "fingerprints") for many different file formats.  Many programs that know how to open a BMP file won't even try if the magic number doesn't match.
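Here's a rough Python sketch of how a program might sniff a file's format from its first few bytes. The signature table lists a handful of well-known magic numbers (PNG, GIF, JPEG, BMP, and a few of the netpbm formats); it's nowhere near complete, and real programs usually check more than just the first bytes.

```python
from typing import Optional

SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),
    (b"P1", "PBM (plain)"),
    (b"P4", "PBM (binary)"),
    (b"P3", "PPM (plain)"),
    (b"P6", "PPM (binary)"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Guess a file format from its leading magic number, or None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


print(int.from_bytes(b"BM", "big"))   # 16973
print(sniff_format(b"BM\x00\x00"))    # BMP
print(sniff_format(b"\x12\x34"))      # None
```

A number like Megan's, dumped straight into a file, lands in the `None` case, which is why image viewers refuse to open it.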
So by directly dumping Megan's number into a file and trying to open it as various image formats I only got error messages.  So I used the simplest file format with which I'm familiar (pbm) and added the proper magic number (P4, which marks a binary pbm file) and header (which describes the width and height of the image; I chose both to be 5 arbitrarily).
The resultant number is now
6086291824888092467866770275946841
3783905258129499855885219551478363
6286731887369226
And the image itself is just a smudge.  The pbm format is black and white only, no grey.  And I've blown the image up to 65x65 pixels so you could actually see something (remember it was only 5x5 to start with):
[Image: Megan's number as a 5x5 pbm image, enlarged to 65x65 pixels]

I did not change anything about the number Megan provided, I simply added the necessary information that tells the computer how to interpret that number into an image.
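Here's a Python sketch of "adding the necessary information": prepend a pbm header to the number's bytes. I'm assuming the binary pbm variant (magic number P4), where each row of pixels is packed into whole bytes, most significant bit first, and a 1 bit is black. The helper names are mine, and the plus-sign number is made up for illustration; the text rendering is only there so the result can be inspected without an image viewer.

```python
def number_to_bytes(number: int) -> bytes:
    """Big-endian bytes of the number, as few as possible."""
    return number.to_bytes(max(1, (number.bit_length() + 7) // 8), "big")


def make_pbm(number: int, width: int, height: int) -> bytes:
    """A binary pbm file: magic number, width, height, then the number as pixel data."""
    header = f"P4\n{width} {height}\n".encode("ascii")
    return header + number_to_bytes(number)


def render_pbm_raster(raster: bytes, width: int, height: int) -> list[str]:
    """Draw packed pbm pixel rows as text: '#' for black (1), '.' for white (0)."""
    row_bytes = (width + 7) // 8        # each row is padded out to whole bytes
    if len(raster) < row_bytes * height:
        raise ValueError("not enough data for an image of this size")
    rows = []
    for r in range(height):
        row = raster[r * row_bytes:(r + 1) * row_bytes]
        bits = "".join(f"{byte:08b}" for byte in row)[:width]  # drop the padding bits
        rows.append(bits.replace("1", "#").replace("0", "."))
    return rows


plus_sign = 0x2020F82020  # five rows: ..#.. / ..#.. / ##### / ..#.. / ..#..
print(make_pbm(plus_sign, 5, 5)[:7])   # b'P4\n5 5\n'
print("\n".join(render_pbm_raster(number_to_bytes(plus_sign), 5, 5)))
```

Writing `make_pbm(...)` out with `Path("plus.pbm").write_bytes(...)` should give a file most image viewers can open. Note that a number whose first row would be all white starts with a zero byte, which `number_to_bytes` drops, so the rows would shift.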

If I change that interpretation to expect a very simple color image (a PPM file with magic number P6, 4x4 pixels with only 8 possible colors) then the image looks like this (blown up to 64x64 pixels):

[Image: Megan's number as a 4x4 PPM image, enlarged to 64x64 pixels]

Most of the image is white, so you can't really see it against the white background.

2007 Honda Civic Efficiency Update

May 7, 2013 7:41 pm

Just an update on the gas mileage I've been getting with my 2007 Honda Civic (automatic transmission).  The orange line represents the average, which is currently hovering just above 30 MPG.  The EPA rating was 25/36 (city/highway), so I'm still doing pretty well.

It does look like the efficiency may have dropped a little starting around autumn 2011.  But it's hard to say for sure, since gasoline formulations change regularly, what with the summer/winter mixes and inconsistent ethanol levels.

[Chart: Civic MPG over time, with the average shown as an orange line]

Automaton Simulator

March 23, 2013 5:20 pm

I posted about this on Google+ a while back, but I've updated the site and it's now much cleaner.  I still have a few features I'd like to add in the future, but they don't really impact the site's purpose.

Anyway, I present AutomatonSimulator.com:

[Image: AutomatonSimulator.com]

In Computer Science we study simple automata such as finite-state machines.  Each kind of machine is equivalent in power to a useful class of languages.  For example, Deterministic Finite Automata (DFA) can process any Regular Language (i.e., the languages described by regular expressions, which are infinitely useful).  And Push-Down Automata (PDA), which add a stack to a finite-state machine, can process any Context-Free Language (the languages generated by Context-Free Grammars).

In the CS course I TA'd for as a student, CS 252, a chunk of the course is devoted to working with these concepts.  This usually means designing a working automaton that recognizes some desired language.  For example, make a machine that will accept strings that alternate between "A" and "B".  Or, make a machine that will accept strings that have the same number of "A"s as "B"s.
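For a flavor of what these machines look like, here's a Python sketch of a DFA for the first example language, taking "alternate" to mean that no letter appears twice in a row. The state names and transition table are my own design. The regular expression `B?(AB)*A?` describes the same language, and the loop checks that the machine and the regex agree on every short string.

```python
import itertools
import re

# States: "start" (nothing read), "A"/"B" (last letter read), "dead" (a letter repeated).
# Only A and B are in the alphabet; any other symbol raises a KeyError.
TRANSITIONS = {
    ("start", "A"): "A",   ("start", "B"): "B",
    ("A", "A"): "dead",    ("A", "B"): "B",
    ("B", "A"): "A",       ("B", "B"): "dead",
    ("dead", "A"): "dead", ("dead", "B"): "dead",
}
ACCEPTING = {"start", "A", "B"}


def dfa_accepts(text: str) -> bool:
    """Run the DFA one symbol at a time; accept if it ends in an accepting state."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING


ALTERNATING = re.compile(r"B?(AB)*A?")

# The machine and the regular expression agree on every string up to length 8.
for n in range(9):
    for letters in itertools.product("AB", repeat=n):
        s = "".join(letters)
        assert dfa_accepts(s) == bool(ALTERNATING.fullmatch(s)), s

print(dfa_accepts("ABAB"), dfa_accepts("ABBA"))  # True False
```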

They're usually quite meaningless in and of themselves, but the point is to develop the skills necessary to understand how programming languages are created and why, as well as to hone the ability to logically analyze problems and build logically consistent solutions.

Well, we had to do all this work by hand.  Drawing out machines, tracing through their execution, finding bugs, and making sure they did what they were supposed to without doing things they weren't supposed to.

As the TA I had to grade a lot of these messily drawn machines that often didn't work.  It was tiring.  So to aid my grading I wrote a simple simulator in Python for each machine type.  Then I'd encode each student's machine into my simulator, run a bunch of tests and figure out from there whether it worked and, if not, how badly it was wrong.
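The grading approach (encode the student's machine, run a bunch of tests) can be sketched in Python like this; it's an illustrative sketch, not the original grading simulator. It compares a candidate recognizer against a reference on every string up to some length and reports the strings where they disagree. The reference is for the second example language (as many A's as B's), which no finite-state machine can recognize, so it uses a stack the way a push-down automaton would. The "student" machine is a made-up buggy attempt.

```python
import itertools
from typing import Callable, Iterator, List


def equal_as_and_bs(text: str) -> bool:
    """Reference recognizer: a PDA-style stack that cancels each A against a B."""
    stack: List[str] = []
    for symbol in text:
        if stack and stack[-1] != symbol:
            stack.pop()            # an A meets a B (or vice versa): cancel them
        else:
            stack.append(symbol)   # otherwise remember the unmatched symbol
    return not stack               # accept when everything cancelled out


def student_attempt(text: str) -> bool:
    """A hypothetical buggy submission: only checks that the length is even."""
    return len(text) % 2 == 0


def all_strings(alphabet: str, max_length: int) -> Iterator[str]:
    """Every string over `alphabet`, shortest first."""
    for n in range(max_length + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)


def counterexamples(candidate: Callable[[str], bool],
                    reference: Callable[[str], bool],
                    alphabet: str = "AB", max_length: int = 6) -> List[str]:
    """Every string up to `max_length` on which the two recognizers disagree."""
    return [s for s in all_strings(alphabet, max_length)
            if candidate(s) != reference(s)]


mistakes = counterexamples(student_attempt, equal_as_and_bs)
print(len(mistakes), mistakes[:3])  # 56 ['AA', 'BB', 'AAAA']
```

The number of counterexamples gives a rough sense of "how badly it was wrong", and the shortest ones are usually the easiest to explain back to the student.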

AutomatonSimulator.com is a fully functional tool to visually create and test these types of machines.  I took my Python simulators, rewrote them in JavaScript, and built a lovely UI around them.

You can save/load machines from your browser's local storage.  Or you can copy/paste machine descriptions to share with other people.  A small set of examples is included on the site.  You can debug a machine by stepping through an input and you can bulk test a large set of strings with a single button press.

I had fun creating the site and hopefully CS students will find it useful in developing their understanding of finite-state machines.

Something I'd like to do in the future is to build a simple game around the site.  It wouldn't be very involved, but it would challenge the user to build a machine for a certain language and help them make the connections between these machines and regular expressions.  We'll see if I get around to it someday.

XBMC LIRC XBox DVD Dongle

July 25, 2012 8:53 pm

Just sticking this here for the rest of the Internet to benefit from.

I was rebuilding the HTPC with an updated OS and needed to get LIRC working again for my remote using the original XBox DVD dongle connected via USB.

I found this post, which describes the process of building and using the lirc_xbox kernel module.  It worked fine in XBMCBuntu (based on Ubuntu 11.10).  But when I tried to follow the instructions in Linux Mint 13 XFCE I got the error "E: Unable to find a source package for lirc" when running "sudo apt-get build-dep lirc".

I worked around the problem by running the same command on my laptop running Ubuntu 12.04 which gave me the correct list of dependencies:

autotools-dev build-essential debhelper devscripts dh-apparmor diffstat dpkg-dev g++ g++-4.6 gettext html2text intltool-debian libasound2-dev libdpkg-perl libftdi-dev libftdi1 libgettextpo0 libice-dev libirman-dev libjack-dev libjack0 libportaudiocpp0 libpthread-stubs0 libpthread-stubs0-dev libsm-dev libstdc++6-4.6-dev libunistring0 libusb-dev libx11-dev libxau-dev libxcb1-dev libxdmcp-dev libxt-dev patch po-debconf portaudio19-dev quilt x11proto-core-dev x11proto-input-dev x11proto-kb-dev xorg-sgml-doctools xtrans-dev

I used that list in a "sudo apt-get install [long list of package names]" and was able to successfully get the build dependencies installed and follow the rest of Mr. Plow's instructions and get my remote working properly.

Since I saw that a few other people were having the same issue, I thought I'd post this in case anyone else gets stuck on the problem.

Reducing power usage: SheevaPlug and Squeezebox Radio

July 7, 2012 12:36 pm

Since we've entered summer our electricity rate tiers have switched to the summer levels.  This means that the lower (cheaper) tiers are smaller so you start running into the much more expensive tiers sooner.  Tiers 1 and 2 are both pretty cheap, 13 and 15 cents per kWh, but tier 3 jumps to 30 cents per kWh.  So whenever possible we try to avoid landing in tier 3.

A while back Mom gave me a Kill-A-Watt meter.  Our electricity bill informs us that we're consistently using substantially more electricity than "similar homes in your area."  Which seemed odd since we don't obviously waste energy.  So I finally got around to checking on our electronics to find out what's guzzling our energy.

Jess has an old stereo thing that we use to listen to music when going to bed.  I discovered that this stereo was drawing ~18 watts regardless of what it was doing, 24 hours a day, 7 days a week.  So just having this stereo plugged in was costing us somewhere between $20 and $47 a year, depending on the tier.
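The back-of-the-envelope math is just watts times hours in a year, divided by 1000 to get kWh, times the price per kWh. A quick Python version, using the tier prices above and assuming the device really does draw the same power around the clock:

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(watts: float, dollars_per_kwh: float) -> float:
    """Yearly cost of a constant load running 24/7."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * dollars_per_kwh


TIER_PRICES = {"tier 1": 0.13, "tier 2": 0.15, "tier 3": 0.30}

for tier, price in TIER_PRICES.items():
    print(f"stereo at {tier}: ${annual_cost(18, price):.2f}")
```

For the 18 W stereo that's about 158 kWh a year, or roughly $20.50 at tier 1 up to about $47.30 at tier 3.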

I have a desktop computer which I use as a file server and media center (via XBMC).  It holds all of our DVDs so that the actual DVDs sit in a box somewhere out of the way.  It also holds all of our music files and I use it to download various things via bittorrent (Linux ISOs, games purchased via the Humble Bundle, perfectly legal things, of course).  Thus the computer was usually on 24/7 also.

So I was rather shocked and appalled to discover that it was drawing ~106 watts when running.  Keeping that machine on was costing us ~$120-$280 a year!  So the first thing I did was dig into configuration options and disconnect unused components.  Via this route I was able to bring its energy usage down to about 80 watts.  Better, but not great.

Enter the Logitech Squeezebox Radio and the SheevaPlug.

Logitech Squeezebox Radio

[Photo: Logitech Squeezebox Radio]

The Squeezebox Radio was something I'd wanted for a little while.  It's a nifty device and the very low power usage was just a nice bonus.  It's, essentially, a music streaming device with built-in speaker and wireless network connection.  So you just plug it in and you can listen to Internet radio stations, connect to a Pandora account (or most other music streaming services), and play music from a local server via the freely available Logitech Media Server.  Something I like about it is that all the software is open source and they don't make any attempt at locking down the software or hardware.

Anyway, I received the Squeezebox Radio for my birthday this year.  Part of its job was to replace Jess' old stereo system.  It's working great at that task and takes up less than a third of the space.  It's small (smaller than I expected) and easy to move, so Jess often moves it to the living room during the day, into the bathroom for Heather's bathtime, etc.

The Squeezebox Radio draws ~2.0 watts when running (~2.2 watts when the screen is on).  So that's a big win over the stereo drawing 18 watts.  But it also contributes to savings in other ways.  Instead of running the full-blown stereo system in the living room for music, Jess uses the Squeezebox Radio, so that's going to count for something.

Overall I am very pleased with the Squeezebox Radio.

The Squeezebox Radio spends much of its time streaming music locally from the desktop (using the mentioned media server software).  And the computer was the big power hog.  So let's address that now.

SheevaPlug

[Photo: SheevaPlug]

To try and reduce the power usage of the computer I spent some time researching low-power computing options.  I researched building an Intel Atom based machine, an AMD Fusion based machine, a dedicated NAS device, and a few other options.  But based on those systems it looked like I was going to get a lot more computing power than I needed and still be pulling 20-30 watts.

I then turned my attention to the Plug Computer scene.  Plug computers are designed to be cheap, low power, plugged in somewhere out of the way, and mostly forgotten. They have a vibrant community built around them.

There are several plug computers to choose from.  I went with the SheevaPlug because it has a long history with many success stories and guides.  Its age means it's a little less capable than some of the other offerings, but it looked like it would do what I needed just fine.

It's small, about the size of 3 decks of cards.  It features a 1.2 GHz ARM processor, 512 MB DDR2 RAM, SD Card slot, USB 2.0 port, and a gigabit ethernet jack.

I set it up with a 4 GB SD card and a 64 GB USB flash drive.  I had planned to use a 2 GB SD card, but it didn't like the one I had and a 1 GB card was too small.

I used the 4 GB card to install the operating system (Debian Linux) and other necessary software (like the Logitech Media Server, Transmission [a bittorrent client], etc.).

The 64 GB USB flash drive is holding all the data I need.  It has our music library, backup files from the Board (the Board gets backed up nightly, previously to my desktop, now to the SheevaPlug), and any currently active torrent files.

Maybe some of the other plug computers are different, but setting up a SheevaPlug isn't exactly for the novice.  I had to cobble together bits and pieces from various guides in order to get everything working correctly.  It requires a working knowledge of Linux, a comfortable familiarity with command lines and a basic understanding of memory addressing (well, if you want to have any idea what the commands you're typing do, that is).

Here are the main resources I used:
http://www.cyrius.com/debian/kirkwood/sheevaplug/
http://www.cyrius.com/debian/kirkwood/sheevaplug/install.html
http://plug.noloop.net/sheevaplug-hacks/installing-debian/
http://wiki.slimdevices.com/index.php/SheevaPlug_Installation_guide
http://d-i.debian.org/daily-images/armel/20120705-08:35/kirkwood/netboot/marvell/sheevaplug/ (for the latest Debian installer images)
I also needed some decent Google skills to solve various issues along the way.

The SheevaPlug is up and running smoothly now.  Running full tilt it draws about 3.5 watts.  So the Squeezebox Radio and SheevaPlug together use about 6 watts compared to the ~125 watts previously needed for the desktop and stereo.  So over the course of 1 year this setup will save us somewhere between $135 and $313 in electricity.  The SheevaPlug costs $99, so it will pay for itself within a year, and that's ignoring the reduced cooling costs.
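The payback estimate works the same way: the difference in power draw, running all year, priced at the cheapest and most expensive tiers. A Python sketch using the ~125 W before and ~6 W after figures; like the estimate above, it counts only the SheevaPlug's $99 price and ignores cooling.

```python
HOURS_PER_YEAR = 24 * 365


def annual_savings(old_watts: float, new_watts: float, dollars_per_kwh: float) -> float:
    """Money saved per year by replacing a constant load with a smaller one."""
    saved_kwh = (old_watts - new_watts) * HOURS_PER_YEAR / 1000
    return saved_kwh * dollars_per_kwh


def payback_years(price: float, old_watts: float, new_watts: float,
                  dollars_per_kwh: float) -> float:
    """How long the savings take to cover the purchase price."""
    return price / annual_savings(old_watts, new_watts, dollars_per_kwh)


for rate in (0.13, 0.30):
    saved = annual_savings(125, 6, rate)
    years = payback_years(99, 125, 6, rate)
    print(f"at ${rate:.2f}/kWh: saves ${saved:.0f}/year, pays back in {years * 12:.1f} months")
```

Cutting ~119 W saves about 1,042 kWh a year, which works out to roughly $135 to $313 depending on the tier, so the $99 pays for itself in well under a year either way.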