Thursday, April 8, 2010

Using Scala to teach programming

Still not using Scala at work. I think I'm not pushing hard enough :-)

This hasn't stopped me fooling around with it on my own time though. I think I'm familiar enough with Scala now to be able to give a talk on it at work and bring people to the dark side. I'm still not what I'd call good at it yet, though. Still get tripped up on syntax occasionally.

Interestingly, the other night my girlfriend expressed curiosity about programming, so I thought I'd show her how you get stuff done in Scala. She's not a programmer, but has done tons of mathematics, so I fired up the Scala REPL and started showing her how to do stuff, and she picked it up with no problem.

It occurred to me that shells like the Scala REPL or Ruby's IRB are great ways to teach people how to program, just like older languages such as Logo or C64 BASIC. You get instant feedback, so you know right away when you're doing it wrong. You try something else and you learn.

Of course, where Scala really shines here is that you have an interactive shell that's completely typesafe. I don't believe there's ever been a language this popular that's had a typesafe interactive shell - this is something new (correct me if I'm wrong!)

Imagine what an invaluable teaching tool this could be. Scala's friendly syntax allows you to express really complex concepts with a minimum of syntactic cruft.

So I started messing around with the shell, showing my girlfriend how easy it is to do various things. I started off by saying that programming is a lot like algebra - you have named variables to refer to values. Then I moved on to expressions, if/else constructs, functions, passing functions around, simple value classes, and manipulating collections of them. An hour long demonstration covered quite a bit of territory and was rather well received and understood. Which is pretty good when you consider that functional programming concepts are something I didn't get for quite some time, yet I was able to explain them to a smart non-programmer using a syntactically sane language with little difficulty.

I got to thinking that this would make a really good live coding demonstration that I could do in person, or even a Youtube video. So I've been thinking about ways to make that happen.

In the meantime, I saved a transcript of the session, have included some of it below (excluding the more complex stuff), and have added comments to describe what's going on.

Want to try your hand at programming? Try this.

If you've been thinking about getting your feet wet with programming, try some of this stuff in the scala shell. Bold type indicates stuff typed by a human, the green stuff in italics is my commentary, everything else is printed by Scala. Typically you will type expressions in at the "scala>" prompt, and press Enter, then Scala will evaluate what you typed and try to do something with it.

Store the number 3 in the variable named x. Scala will show that it now remembers that x, an integer, has the value 3.
scala> var x = 3
x: Int = 3

Evaluate x, i.e. recall what it's storing.
scala> x
res1: Int = 3

Evaluate x+3, an example of simple arithmetic
scala> x +3
res2: Int = 6

Define a function called "bump", which given an integer (which we'll call "num"), will give back whatever num is plus six.
scala> def bump(num:Int) = num + 6
bump: (Int)Int

Use the bump function on x, and see what we get
scala> bump(x)
res3: Int = 9

We can remember groups of things together in arrays. Let's make an array called nums, to hold this bunch of numbers.
scala> val nums = Array(4, 7, 9, 3, 1, 2)
nums: Array[Int] = Array(4, 7, 9, 3, 1, 2)

Make a new array called newnums out of nums, by mapping each of its values to a new value. For each value in nums (let's call it x), the corresponding value in newnums will be x+7.
scala> val newnums = nums.map { _ + 7 }
newnums: Array[Int] = Array(11, 14, 16, 10, 8, 9)

Find all the nums that are more than 5. This will return a new array containing the elements that meet this condition.
scala> nums.filter { _ > 5 }
res6: Array[Int] = Array(7, 9)

Say hi! What this is really doing is supplying a string of characters (marked in quotes) to a function called println, which does the "saying" around here.
scala> println("hello")
hello

Let's do something for each of our numbers. For each one, let's call it x, and print out hello alongside x.
scala> nums.foreach { (x) => println("hello " + x ) }
hello 4
hello 7
hello 9
hello 3
hello 1
hello 2

Do the same as above, but only for the numbers more than 5. What this is really doing is creating a new array out of nums containing only the ones more than 5, and doing the printing operation for each element in that.
scala> nums.filter { _ > 5 }.foreach { (x) => println("hello " + x ) }
hello 7
hello 9

What if we needed to remember something more complex? Say we had to work with people, and we needed to remember their name and age? We need to create a new type of variable to do that. Let's create the class Person, which contains values for name and age.
scala> case class Person(val name:String, val age:Int)
defined class Person

Let's create a person and store him in the value "fred"
scala> val fred = Person("Fred Bloggs", 28)
fred: Person = Person(Fred Bloggs,28)

Make another for Jane
scala> val jane = Person("Jane Doe", 24)
jane: Person = Person(Jane Doe,24)

Show me Fred's name, then Jane's age
scala> fred.name
res21: String = Fred Bloggs

scala> jane.age
res22: Int = 24

Is Fred older than Jane? This shows it's true.
scala> fred.age > jane.age
res24: Boolean = true

Define a function to determine the older of two people, then use it to see who's oldest out of fred and jane. Clearly we see that Fred is older.
scala> def oldest(p1:Person, p2:Person): Person = if (p1.age>p2.age) { p1 } else { p2 }
oldest: (Person,Person)Person

scala> oldest(fred, jane)
res25: Person = Person(Fred Bloggs,28)

As you can see it's not that hard to get started learning programming with Scala.

Thursday, September 24, 2009

Roll your own cloud based backup

So I'm moving out of my current apartment shortly, ahead of giving a talk at Hadoop World NYC and going on a two week vacation overseas. I'm going to be putting my stuff in storage while I'm away, and this includes my hard drives with my photos and music on them. It occurred to me, well what if this gets stolen, or the drives damaged in transit?

It was time to make a backup. Long overdue, really.

There were several options. The easiest option probably would have been to go and pick up another 500GB external drive, copy the existing one that has the goodies on it, and leave that with a friend. The cheap option would perhaps be to back up onto DVDs.

But I've been messing around with Amazon's Simple Storage Service (S3) an awful lot lately, so it occurred to me that I could just back up to the cloud like I've been doing at work. So I wrote a backup script in Ruby, making use of the aws-s3 gem, which makes dealing with S3 almost trivial.

I did these steps on Ubuntu; adjust appropriately for whatever OS you use.

First things first - an Amazon account.

If you're going to do things this way, you'll need an account on Amazon Web Services (AWS). Note that this stuff isn't free - you will get charged $0.15/GB for storage. However, keeping it running is someone else's problem, not yours. If you've ever shopped at Amazon you can just use the same account you shop with to sign into AWS.

Once you're signed up and signed in, click on the Your Account menu near the top, and select Security Credentials. You'll see a section on this page called Access Credentials, and as part of that, Your Access Keys:

This contains the two pieces of information you need to be able to programmatically connect your machine to S3 - your "Access Key ID" and "Secret Access Key". These are pretty much like a username/password combination that provides access to the account identified by your email address. (You could have more than one, or even multiple accounts for one email address since Amazon accounts are really identified by a unique "Canonical ID", but I digress). Obviously, don't share these with anyone. Since you signed up with a credit card, you don't want anyone else storing stuff in S3 on your account and having you pay for it. So be careful.

Take the Access Key ID and Secret Access Key values and write them into a text file in your home directory called .awssecret. Two lines. Put the Access Key ID on the first line and the Secret Access Key on the second. Now you're ready to get the software on.

Required packages

You'll need to have ruby, gem, and libopenssl-ruby installed. These might be called something different on your system, but on Ubuntu, you just run this:
$ sudo apt-get install ruby gem libopenssl-ruby
Then you can install the aws-s3 gem trivially.
$ sudo gem i aws-s3

Experimenting in the Ruby REPL

Probably the easiest way to try anything out in Ruby is by using the interactive Ruby interpreter, irb. This is pretty much like a Ruby shell. Don't forget to fire it up with the -rubygems flag so that you can use the gem libraries.
irb -rubygems
At that point, you can use these commands to read your credentials from the file you saved earlier and get connected to S3. Using SSL is recommended.
require 'aws/s3'
include AWS::S3
# read the Access Key ID (line 1) and Secret Access Key (line 2)
creds = File.open("#{ENV['HOME']}/.awssecret") { |f| {
  :access_key_id => f.readline.chomp,
  :secret_access_key => f.readline.chomp,
  :use_ssl => true
} }
Base.establish_connection! creds
You can then start issuing commands to see what's around, such as creating a bucket or getting a listing of the buckets in your S3 account. Buckets are used to group the objects that you store in S3.
Bucket.create ''
=> [#"", "creation_date"=>Fri Sep 25 03:27:55 UTC 2009}>]
Doing the backup

I decided that since I was backing up photos, that I wanted to non-recursively tar up all the photos in each directory, uncompressed (JPG is already insanely compressed, don't squeeze rocks!), one tar file per directory. I caught the output of find to find out how many dirs I was dealing with.
irb> (dirs = `find  /media/HD-HCIU2/photos -type d`.split("\n")) && nil
=> nil
irb> dirs.size
=> 2859

Since a single listing request on an S3 bucket returns at most 1000 objects, I planned to create 3 buckets to hold the tar files for these 2859 directories. Here's the first one.
irb> bucketName=""
irb> Bucket.create(bucketName)
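As a rough sketch of how the 2859 directories could be divided up, something like the following splits a list into groups of at most 1000, one group per bucket. The bucket name prefix and the generated directory names are hypothetical placeholders (the real bucket names aren't shown here), and nothing in this sketch talks to S3.

```ruby
# Split a list of directories into groups of at most 1000, one per bucket.
# The bucket names produced here are hypothetical placeholders.
MAX_PER_BUCKET = 1000

def plan_buckets(dirs, prefix)
  plan = {}
  dirs.each_slice(MAX_PER_BUCKET).each_with_index do |group, i|
    plan["#{prefix}-#{i + 1}"] = group
  end
  plan
end

dirs = (1..2859).map { |n| "/photos/dir#{n}" }  # stand-in for the find output
plan = plan_buckets(dirs, "my-photo-backup")
plan.each { |bucket, group| puts "#{bucket}: #{group.size} directories" }
```

Each key of the resulting hash could then be passed to Bucket.create, and its group of directories uploaded into that bucket.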
OK, ready to go. What I needed next was a function which given a directory name and a bucket name, would tar up the contents of that directory and upload it to the bucket.

I decided to name the tar files as an md5 hash of the full path to avoid any complications from odd characters. This function after some experimentation and adjustments did the trick. It needs Digest for Md5 included. (It's a little verbose with logging what it's doing to stdout).
require 'digest/md5'  # for MD5
include Digest

def uploadDirFiles(dir,bucketName)
  dirkey=MD5::hexdigest dir
  puts "chdir to #{dir}"
  Dir.chdir dir
  files = Dir.glob("*.*")
  puts "#{files.size} in #{dir}"
  if (files.size>0) then
    # NOTE: the archive path and the S3 key used below are assumptions;
    # those parts of the original listing were lost
    archive="/tmp/#{dirkey}.tar"
    qfiles = files.map {|f| "\"#{f}\""}.join " " # wrap filenames in quotes, join with spaces
    puts "creating archive #{archive}"
    cmd="tar --create --verbose --no-recursion --file #{archive} #{qfiles}"
    print `#{cmd}`
    puts "uploading archive #{archive} - #{File.size(archive)} bytes long"
    S3Object.store(File.basename(archive), open(archive), bucketName)
  end
end
Notice that this handles one directory. If something goes wrong, the store operation should raise an exception. We don't catch it here, but handle it in a higher level function which records successes and failures for all the directories, cleans up the tar files, and returns the failed and successful directories:
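def upload(dirs,bucketName)
  dirs.partition do |dir|  # returns [successful dirs, failed dirs]
    begin
      uploadDirFiles(dir,bucketName); true
    rescue=> ex
      puts "#{dir} upload failed: #{ex}"; false
    ensure
      archive="/tmp/#{MD5::hexdigest dir}.tar"  # assumed path; the original line was lost
      File.unlink(archive) if File.exist?(archive)
    end
  end
end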
Then we just call that with the first 10 directories whose names we collected earlier, to test.

It works quite well, and the uploader returns a pair of arrays indicating successes and failures on a per-directory basis. You can run the failure array through the uploader again to retry, whittling the failure list down until you're done. This is probably better suited for turning into a script rather than running in the irb shell.
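The retry idea can be sketched as a loop that keeps feeding the failures back in. To keep this runnable without S3, it accepts any callable that returns a [successes, failures] pair; the flaky uploader below is a made-up stand-in rather than the real upload function, and the max_rounds cap is an extra assumption so that a directory which never uploads can't loop forever.

```ruby
# Retry failed directories until none remain or we run out of rounds.
# `uploader` is any callable returning [successes, failures].
def upload_with_retries(dirs, uploader, max_rounds = 5)
  done = []
  pending = dirs
  max_rounds.times do
    break if pending.empty?
    successes, failures = uploader.call(pending)
    done.concat(successes)
    pending = failures
  end
  [done, pending]
end

# Hypothetical stand-in uploader: a directory whose name ends in an odd
# digit fails on its first attempt and succeeds on later attempts.
attempts = Hash.new(0)
flaky = lambda do |batch|
  batch.partition do |dir|
    attempts[dir] += 1
    attempts[dir] > 1 || dir[-1, 1].to_i.even?
  end
end

done, failed = upload_with_retries(%w[dir1 dir2 dir3 dir4], flaky)
puts "uploaded: #{done.inspect}, still failing: #{failed.inspect}"
```

Whatever is left in the second array after the final round is what would need a closer look by hand.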

So far I've just tested with uploading the first 10 directories of my music collection and did not encounter any failures. In an hour of testing I probably uploaded a CD's worth of data. Which isn't very fast, but this isn't the script's fault.

Maybe don't try this at home

The main problem is that running this over your average household ADSL completely sucks - your mileage may vary here, but my upload speed appears to be capped at about 1.5MBit/sec. If all goes well, 1GB would take about 1.5 hours to upload. Which makes for a slow slow backup of 300GB (would take 18 days!)
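The arithmetic behind those numbers is easy to check. This is a back-of-the-envelope sketch only: it assumes decimal units (1 GB = 8000 Mbit), a link that sustains the full 1.5 Mbit/sec, and no protocol overhead, so real uploads will be somewhat slower.

```ruby
# Rough upload-time estimate from a size in GB and a link speed in Mbit/sec.
# Assumes decimal units (1 GB = 8000 Mbit) and ignores protocol overhead.
def upload_hours(gigabytes, mbit_per_sec)
  megabits = gigabytes * 8000.0
  seconds = megabits / mbit_per_sec
  seconds / 3600.0
end

one_gb = upload_hours(1, 1.5)
full = upload_hours(300, 1.5)
puts "1 GB:   %.1f hours" % one_gb
puts "300 GB: %.1f days" % (full / 24)
```

That works out to roughly an hour and a half per gigabyte, and a bit over 18 days for the full 300GB.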

For home data I think I'll just have to back up my media drive to another media drive and store it somewhere else.

So this technique is better for backing up to the cloud from your office with its big fat data pipe, right?


Wednesday, September 2, 2009

Preventing gcj from being installed by Ubuntu

If you're using a real Sun JDK/JRE on Ubuntu then you will want to prevent gcj from being dragged in by apt-get when installing Java applications like ant, eclipse, groovy etc. Trust me, gcj is horrible, and if you have Sun Java installed, you don't need any other JVMs anyway.

Add the following to /etc/apt/preferences:
Package: gcj-4.3-base
Pin: version 0.001
Pin-Priority: 1000

Package: gcj-4.2-base
Pin: version 0.001
Pin-Priority: 1000

Package: gcj-4.3
Pin: version 0.001
Pin-Priority: 1000

Package: gcj-4.2
Pin: version 0.001
Pin-Priority: 1000

Package: openjdk-6-jre
Pin: version 0.001
Pin-Priority: 1000

Package: openjdk-6-jre-headless
Pin: version 0.001
Pin-Priority: 1000

Package: openjdk-6-jre-lib
Pin: version 0.001
Pin-Priority: 1000

Package: openjdk-5-jre
Pin: version 0.001
Pin-Priority: 1000

Package: openjdk-5-jre-headless
Pin: version 0.001
Pin-Priority: 1000

Package: openjdk-5-jre-lib
Pin: version 0.001
Pin-Priority: 1000

Package: kaffe
Pin: version 0.001
Pin-Priority: 1000
This pins the gcj packages to a version which doesn't exist, preventing their installation, thereby forcing apt to use packages provided by the Sun runtime instead. I've included version 4.2 since the Ubuntu Eclipse packages depend on that. Not that anyone in their right mind should use such an ancient version of Eclipse, but hey, if you want to roll like that, then at least this'll prevent you from having to suffer the additional indignity of running it with gcj.

For some reason I couldn't get this to work by blocking java-gcj-compat instead, which means if the gcj package name ever changes (i.e. if 4.3 version number in package name gets bumped) then I'll need to update this rule, but for now I can keep an eye on my installs and it's happy. I'm guessing that it's probably a provided virtual package and that apt happily fails to block things which provide it.

I used debtree to do package dependency graph analysis while messing about with this. It's an interesting package; pity it's not in the repos. The author does provide source and a .deb for easy installation though.

Surveying the damage

You can use dpkg -l to quickly and easily tell you what's installed. I wanted to see if there were any alternate JRE remnants lurking on my system, and this did the trick.
$ dpkg -l |egrep "openjdk|kaffe|gcj"|egrep "^ii"
ii gcj-4.2-base 4.2.4-5ubuntu1 The GNU Compiler Collection (gcj base packag
ii libgcj-common 1:4.3.3-1ubuntu1 Java runtime library (common files)
ii libgcj8-1 4.2.4-5ubuntu1 Java runtime library for use with gcj
ii libgcj8-1-awt 4.2.4-5ubuntu1 AWT peer runtime libraries for use with gcj
ii libgcj8-jar 4.2.4-5ubuntu1 Java runtime library for use with gcj (jar f
Yuk. Let's ditch those!

First, verify what Sun Java stuff you have installed. You should at least see a JRE. This is what I see:
$ dpkg -l |egrep "sun-java"|egrep "^ii"
ii sun-java6-bin 6-14-0ubuntu1.9.04 Sun Java(TM) Runtime Environment (JRE) 6 (ar
ii sun-java6-fonts 6-14-0ubuntu1.9.04 Lucida TrueType fonts (from the Sun JRE)
ii sun-java6-jdk 6-14-0ubuntu1.9.04 Sun Java(TM) Development Kit (JDK) 6
ii sun-java6-jre 6-14-0ubuntu1.9.04 Sun Java(TM) Runtime Environment (JRE) 6 (ar
ii sun-java6-plugin 6-14-0ubuntu1.9.04 The Java(TM) Plug-in, Java SE 6
ii sun-java6-source 6-14-0ubuntu1.9.04 Sun Java(TM) Development Kit (JDK) 6 source

OK great, so that means I can delete all the other garbage. Be CAREFUL when you run this. Pay attention to what apt-get says it's going to remove - make sure it's really not going to remove any packages you care about. As long as you do have the Sun Java stuff installed it should be fine, since any Java-using Ubuntu packages that can only satisfy their Java dependencies with alternate JREs are clearly Doing It Wrong.
$ sudo apt-get remove `dpkg -l |egrep "openjdk|kaffe|gcj"|egrep "^ii"|awk '{printf "%s ",$2}'`
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
gcj-4.2-base gij-4.2 libgcj-common libgcj8-1 libgcj8-1-awt libgcj8-jar
0 upgraded, 0 newly installed, 6 to remove and 4 not upgraded.
After this operation, 47.5MB disk space will be freed.
Do you want to continue [Y/n]? y

Indeed, in this case, it's only taking out the garbage, leaving my groovy, ant, maven, etc packages intact.
(Reading database ... 202154 files and directories currently installed.)                                                     
Removing libgcj8-jar ...
Removing libgcj8-1-awt ...
Removing gij-4.2 ...
Removing libgcj8-1 ...
Removing gcj-4.2-base ...
Removing libgcj-common ...
Processing triggers for man-db ...
Processing triggers for libc6 ...
ldconfig deferred processing now taking place
Nice, that cleans things up!

Friday, June 26, 2009

Scala brevity, good and bad

So I've been playing around in Scala enough now that when I look at fairly well-written Java, I find myself thinking how much more concise it would be in Scala.

Take for example the fairly common case of iterating over a collection of business objects, picking some of them out according to some criterion, then applying a transformation to those, and returning a collection of those filtered, transformed objects. This is the kind of stuff you see floating around en masse in a typical business application, and a method which does this will typically run you at least 9 lines of Java, with a lot of boilerplate type specification cluttering the place up if you're being good and using generics.
public List<Result> youngOnes(Collection<Person> peeps) {
    final List<Result> res = new ArrayList<Result>();
    for (Person person : peeps) {
        if (person.age < 30) {
            res.add(new Result(person));
        }
    }
    return res;
}
The Scala alternative is much simpler:

def youngOnes(items: Seq[Person]) = items.filter(_.age < 30)
                                         .map(new Result(_))
Type inference is a wonderful thing, and I would have to say that it and support for functional programming constructs are so far my favourite Scala features. Obviously the example above has a little more going on than simple syntactic sugar - this also shows off the power of Scala's collections API. The filter and map methods, which both accept closures as arguments, enable you to selectively transform the items of a collection with ease.

So what's going on with this Scala code? Well, it's doing exactly the same thing that the Java above it does. filter() returns all items in a collection which match a condition which is specified as the first argument and is a closure. The closure takes one argument (indicated by the _ placeholder) and returns a boolean indicating whether to retain the object in the placeholder. map() runs on filter()'s results, making a new Result from each of those lucky young people, and returning a collection of those.
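For comparison, roughly the same filter-then-map pipeline can be sketched in Ruby, where select plays the role of filter and both methods take a block as their closure. The Person and Result structs here are hypothetical stand-ins for the business objects, and the sample people are made up for illustration.

```ruby
# Hypothetical stand-ins for the Person and Result business objects.
Person = Struct.new(:name, :age)
Result = Struct.new(:person)

# select keeps the people for whom the block returns true,
# then map builds a Result from each one that survived.
def young_ones(people)
  people.select { |p| p.age < 30 }.map { |p| Result.new(p) }
end

peeps = [Person.new("Fred", 28), Person.new("Jane", 24), Person.new("Bob", 45)]
young_ones(peeps).each { |r| puts r.person.name }
```

As in the Scala version, no temporary or mutable variables are needed; the intermediate collection from select is handed straight to map.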

The beauty of this is we didn't have to create any temporary or mutable variables.

There's a LOT more you can do with Scala to express logic concisely. I'm just starting to get into it, and really, I find some of the example code to be pretty unreadable. There is a danger with Scala that, with all those concise operator-like method definitions, it starts looking like Perl to me.

[examples to come]

The other thing that bothers me is that presumably a lot of libraries will be built in Scala - and these operators will have different meanings for different classes. I am guessing that this will be a lot more confusing to remember than verbal method names, but perhaps it's just a matter of getting used to them when coming from Java's world of words.

Tuesday, June 16, 2009

Bugs I've encountered in Ubuntu 9.04 Jaunty beta

Since installing Ubuntu 9.04 Jaunty on various systems that I use, I've noticed a number of significant improvements as well as problems. Some of these I've been able to find workarounds for, and some I haven't. Most of these problems occur on my personal laptop, which is a Dell XPS M170.


Ubuntu's come a long way with this and I was pleasantly surprised to find that my onboard Intel PRO/Wireless 2915ABG adapter was supported out of the box, with none of the previous manual messing about with wpa_supplicant required. Not so much as a command line was touched in order to get online. This is a vast improvement over learning and manually configuring wpa_supplicant which was previously an annoying rite of passage for anyone unfortunate enough to have one of these extremely common wireless systems.

However, I find that the ipw2200 kernel driver is not without problems. For some reason, when under modest loads, this driver will start eating all the CPU and kick me offline. This is most often reproducible by restarting Firefox and having it restore its state; apparently something about handling multiple large TCP streams causes the driver to barf. I can work around this by kicking the module out and reloading it:
sudo rmmod ipw2200 && sudo modprobe ipw2200
But that gets tedious fast. In usual web browsing usage the problem rarely shows up, but when refreshing lots of tabs or using a site like Flickr it does.

Not sure what to do about this one short of hacking the driver myself. It's been a while since I worked on kernel drivers (2001!) so that might not be an effective use of time.

If I can somehow freeze the ipw2200 driver when this occurs and use a debugger or something to figure out where in the source it's failing, then I'll try submitting a patch, but I'd need to relearn the tools for this stuff.


Firewire doesn't work any more. I have an external firewire drive for media storage. It doesn't get seen at all. dmesg reports confusion from the IEEE 1394 driver. Previously, in Intrepid, it wouldn't get seen unless I turned it on after I'd booted up, which was merely annoying. I'd gone without my music collection for a couple of weeks before I realised the external drive also has a USB connector which works perfectly, so I'm not going to bother chasing this.


I use a PCMCIA Sound Blaster Audigy2 card for audio and previously in Intrepid, this has worked very well, despite a little difficulty in getting certain applications to use this card and not the onboard audio. IMO, the ALSA configuration is still way too confusing.

Anyway, in Jaunty, once I got my music back, I found that playing resulted in a LOT of skipping during playback. I searched around and apparently the cause is misconfiguration in the packaging of some subsystem called PulseAudio. The following post has instructions which I followed for fixing that and it worked very well, though make sure to use the name of your audio card and not quote "Intel" verbatim as the author does.
Video Improvements

I'd had trouble previously, getting Intrepid to properly display in dual head mode on my workstation using an ATI video card. There was lots of editing of xorg.conf and messing around with drivers, general hassle, and swearing.

I'm happy to say that in Jaunty, It Just Works right out of the box. YAY! I'd previously relegated ATI to that mental pigeonhole of crappy manufacturers who fail to make decent drivers for Linux. Well, maybe they're still there, but at least it's not causing me any problems any more.

That said, I will point out that the upgrade process from Intrepid to Jaunty did not work on this machine, it killed the video display and the system in general to the point where I couldn't even get in on a console or SSH. A fresh install was required and that fixed it. Let's hear it for backups.


Hopefully by the time the LTS (Long Term Support) release on Jaunty arrives, they'll have these issues ironed out.

One thing I forgot to mention is that Jaunty feels more responsive than Intrepid. I'm not sure whether this is due to process scheduler improvements or running on fresh clean systems.

For a beta, Jaunty is quite good, but obviously at the time of writing it isn't production ready yet.

Thursday, January 29, 2009

The Price of Not Doing Functional Programming

I need to find a description of lambda calculus that doesn't make my eyes bleed.

Right now it's not making much sense to me. This is the price I pay for never having learned functional programming. Well, I'm learning it now.

And yes, it does have a real world application. Or it might. This stuff is certainly very parallelizable, let's put it that way. I might end up writing some MapReduce jobs in Ruby if I can call Ruby code from within Hadoop, and a Y combinator is 6 lines of almost comprehensible Ruby compared to 60 lines of Java gibberish (and I *like* Java).

That doesn't mean I really understand how the y combinator works yet, though. It just seems to bounce off my brain. Sleeping on it will probably work.

Oh well, at least I can say I've had a successful 15-year software development career and haven't run into anything I had trouble understanding until now. Really, this is one of those delicious moments when you notice that something you hadn't ever given much thought is a complete rabbit hole that could swallow your brain for as long as you let it.

Functional programming, for me right now, is in that realm of things I never took the time to understand - alongside quines, advanced statistics, and why anybody in their right mind would program in COBOL. And yesterday I discovered this thing called the Y Combinator, which for my procedural/OO tainted eyes, is probably one of the most baffling things I've ever seen. I looked at versions of it written in a few different languages - the Java one was horrific, the Lisp one I didn't get because I can't read Lisp yet, but the Ruby one almost makes sense. I think I can understand what it's doing but not how it's doing it. Running a debugger on it doesn't really help.
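For the curious, here is a minimal sketch of one way to write it with Ruby lambdas. It is not necessarily the same version as the Ruby one mentioned above, and because Ruby evaluates arguments eagerly, this is strictly the applicative-order variant (sometimes called the Z combinator). The names y_combinator, recur and fact are just illustrative choices.

```ruby
# Y combinator for an eagerly evaluated language (the "Z" variant).
# It hands a function a way to call itself without ever naming itself.
y_combinator = lambda do |f|
  lambda { |x| x.call(x) }.call(
    lambda { |x| f.call(lambda { |v| x.call(x).call(v) }) }
  )
end

# Factorial with no reference to its own name: `recur` is supplied
# by the combinator and stands for "the function being defined".
fact = y_combinator.call(lambda do |recur|
  lambda { |n| n.zero? ? 1 : n * recur.call(n - 1) }
end)

puts fact.call(5)
```

The inner lambda { |v| ... } wrapper is what delays the self-application until a value actually arrives; without it, Ruby would keep expanding x.call(x) forever before f ever got to run.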

This requires an epiphany on the order of the one I had in the old Data Structures And File Organizations class from university days, when pointers were being explained to me for the first time. I remember that feeling of when I got it, and realized the implications of them - it changed the way I thought about everything all at once. It really was a transforming experience. I wonder if I'll have another when I finally get this stuff. I'm going to have to spend some time in the rabbit hole and see where the labyrinth leads.

Having slept on it, I can feel my brain at least starting to gnaw on the problem. I'm thinking about programs more as expressions to be evaluated than lists of instructions to be processed. Now to just think more about functions as first class objects. Or if like me you are infected with Patterns, let's start passing these Strategies around...

Things I found interesting along the way:

Tuesday, January 13, 2009


It's about time I started laying down in a readable place my thoughts on software, development, and Java. So here it is. Welcome to Entwinery.

Your host is Ben Hardy, a software engineer currently working in Los Angeles, California. My home town is Sydney, Australia. I've been in the software business since 1994, started out writing multimedia, 3D, networking, and web applications in C++, and Java soon after. I've done LAMP development, Swing, J2EE, Perl, Linux device drivers and other stuff, but I'm mostly oriented towards large scale (millions of users) web sites these days.

My first computer was a Commodore 64. Now there's a machine that'll install a love of code in anyone who uses it, since it's pretty useless if you don't code ;-)

Anyway, on with the code!