Global Announcement, Haskell

Setdown: the best tool for fast and repeatable line based set operations

Introducing setdown

Have you ever been on the command line and tried to perform set operations? Have you ever followed crazy cli guides on the internet that suggest complicated commands to try and perform set operations on files. I have. And I did not like it; I think that we can do better.

Over the weekend I wrote a pretty nifty program that I am calling: Setdown. Setdown requires you to specify the set operations that you wish to perform as a definitions in a set definitions file (often suffixed with ‘.setdown’). The setdown language definitions are written in a very similar format to Makefiles; except that it performs set operations.

If you want to install setdown right now or checkout the code then you can follow these links:

If you want to learn how to use setdown and write set operations in it’s language then you should read the README file provided in the source code. However, to show you how easy the language is to read you I have provided an example .setdown file right here:

-- All of the letters of the alphabet
alphabet: "alphabet.txt.unsorted"

-- Calculating the consonants with a set difference
consonants: alphabet - "vowels.txt.unsorted"

-- Getting any letter than is e-sounding or a vowel
e-or-vowels: "e-letters.txt.unsorted" / "vowels.txt.unsorted"

-- Get any letter that is e-sounding and a vowel
e-and-vowel: "e-letters.txt.unsorted" / "vowels.txt.unsorted"

-- Get all of the e-sounding letters, the vowels and the consonants
e-or-vowels-or-consonants: ("e-letters.txt.unsorted" / "vowels.txt.unsorted") / consonants

You should install setdown and then check out the setdown-examples project to give it a try right now!

Can you show me an example?

By this point in time you are probably wondering “I love the look of it but show me an example”. So I will. Here is the output of a full running example by checking out the first example in the setdown-examples repository and running it:

$ setdown ex1.setdown 
==> Creating the environment...
Base Directory: ./
Output Directory: ./output

==> Parsed original definitions...
e-or-vowels-or-consonants: ("e-letters.txt.unsorted" / "vowels.txt.unsorted") / consonants

e-and-vowel: "e-letters.txt.unsorted" / "vowels.txt.unsorted"

e-or-vowels: "e-letters.txt.unsorted" / "vowels.txt.unsorted"

consonants: alphabet - "vowels.txt.unsorted"

alphabet: "alphabet.txt.unsorted"

==> Verification (Ensuring correctness in the set definitions file)
OK: No duplicate definitions found.
OK: No unknown identifiers found.
OK: All files in the definitions could be found.

==> Simplifying and eliminating duplicates from set definitions...DONE:
alphabet: "alphabet.txt.unsorted"

consonants: alphabet - "vowels.txt.unsorted"

e-and-vowel: "e-letters.txt.unsorted" / "vowels.txt.unsorted"

e-or-vowels: "e-letters.txt.unsorted" / "vowels.txt.unsorted"

e-or-vowels-or-consonants: e-or-vowels / consonants

==> Checking for cycles in the simplified definitions...DONE:
OK: No cycles were found in the definitions.

==> Copying and Sorting all input files from the definitions...
"alphabet.txt.unsorted" (unsorted) => "./output/alphabet.txt.unsorted.1.split.sorted" (sorted)
"e-letters.txt.unsorted" (unsorted) => "./output/e-letters.txt.unsorted.1.split.sorted" (sorted)
"vowels.txt.unsorted" (unsorted) => "./output/vowels.txt.unsorted.1.split.sorted" (sorted)

==> Computing set operations between the files...
Required results:
alphabet: ./output/alphabet.txt.unsorted.1.split.sorted

consonants: ./output/c989d1cf-b860-41cc-a52c-e2afc1e6a235

e-and-vowel: ./output/a8bd5974-22d5-4fdb-b269-0c09a1eeeb18

e-or-vowels: ./output/c3a8cc7c-f246-4eb4-b321-57f900964960

e-or-vowels-or-consonants: ./output/493ca813-7e3c-4259-9435-e2d5ddb4d6a5
$

As you can see we have ended up with a number of output files. Just to pick one example lets see the contents of the consonants file:

$ cat ./output/c989d1cf-b860-41cc-a52c-e2afc1e6a235
b
c
d
f
g
h
j
k
l
m
n
p
q
r
s
t
v
w
x
y
z
$

And look at that, we have computed the consonants when we were given the vowels and the rest of the alphabet. Hopefully you can see that this is very powerful and will let you write increasingly more correct set operations from the command line.

The benefits of setdown

Depending on your command line bent you may have used other tools in the past to perform set operations on files, like comm or fgrep, but these tools are quite lacking. Instead let me show you the full range of features that setdown gives you:

  • Maintainability
    If you get more set data to add to your collection (as often happens) then it is trivial to edit the setdown definitions to include it.
  • Repeatability
    Even if the data changes you run one single command and all of your set operations are performed again.
  • Sorted input is not required!
    Programs like comm require that you have sorted input if you want to do efficient set operations on files. This make sense because sorted files make set operations very efficient. However, we don’t put the onus on you to provide us with sorted input. Setdown will sort any files that you give it itself. We even use External Sort so that you can give us truly massive files and expect that we will still be able to perform your set operations.
  • Simplification of definitions
    If you write the same definition twice then setdown will factor that out and only perform the set operation once. This makes setdown run as efficiently as possible:

    ==> Parsed original definitions...
    C: "b-1.out" - ("a-1.out" / "a-2.out")
    B: "a-1.out" / "a-2.out"
    A: ("a-1.out" / "a-2.out") / "b-1.out"
    
    ==> Verification (Ensuring correctness in the set definitions file)
    OK: No duplicate definitions found.
    OK: No unknown identifiers found.
    OK: All files in the definitions could be found.
    
    ==> Simplifying and eliminating duplicates from set definitions...DONE:
    A: "b-1.out" / B
    B: "a-1.out" / "a-2.out"
    C: "b-1.out" - B
  • Dependencies and cyclic dependency detection
    Since you can write set definitions that depend on other set definitions it is possible to write a cyclic dependency. We will spot this for you and also tell you exactly where the cycle is in your file, meaning that you don’t have to search for it yourself!

    ==> Simplifying and eliminating duplicates from set definitions...DONE:
    A: C
    B: D
    C: B
    D: A
    
    ==> Checking for cycles in the simplified definitions...DONE:
    [Error 20] found cyclic dependencies in the definitions!
    We found the following cycles:
       A -> C -> B -> D -> A
  • Validation
    We verify that your set description only references files that exist and that if you reference dependencies that do not exist then you will get an error.
  • Works nicely with version control
    You check your .setdown file and your input files into the repository and share them with your co-workers. Everybody can use setdown to get the same results! To prove it, I have written three examples in a setdown-examples repository. Check it out and give it a try!
  • Written in Haskell
    This makes the program very fast and efficient while, in my opinion, reducing the chances of having bugs.

I think that this is a much more compelling set operations tool for the command line than anything else that exists out there and I am really happy to share it with you today for free. I also really hope that you get some great usage out of this tool and that it makes your life easier.

Concluding words

From experience I can say, without this tool, dealing with complicated set operations on the command line and sharing your results with your co-workers is much more difficult than it should be.

At any rate I hope that you get a great deal of value from this tool and if you have any comments or suggestions then please ask them here on this blog or raise them as issues. If you have any questions then ask them here or on Stack Overflow.

Thanks for reading and I hope this makes somebodies like a little bit easier.

Global Announcement, Interesting

A Tribute to CSE and Uni

Breif History

Back in my first year of University I was enrolled in a course called COMP1917. It was an excellent course taught by Richard Buckland and I enjoyed it very much. It was also the exact course in which they taped the famous YouTube video lectures; I look back on those lectures with pride. The purpose of this course was to introduce students to computer science and teach them about the fundamentals that they simply must know. To that end we were taught C as our primary language (a common choice) but we were also given an emulator for an imaginary microprocessor called a “4917 chip” which I will call a “4917” for short. This blog post is going to focus on that chip.

The imaginary 4917 was a tiny device. It had sixteen nibble sized memory locations and sixteen different instructions. All instructions were either one or two nibbles in size. I hope you notice that this entire machine worked in nibble sized chunks. It also had only two working registers called  R0 and R1; it also technically had a program counter that you manipulate with instructions 13-15. The instruction set of this device is here:

Format:
Instruction in decimal => What the instruction does

One Byte Instructions:

0 => Halt Program
1 => R0 = R0 + R1
2 => R0 = R0 – R1
3 => R0++
4 => R1++
5 => R0- –
6 => R1- –
7 => Ring Bell (Nothing really [Side Effect])

Two Byte Instructions:
8 x => Print x to the output device; maybe a screen but it did not matter because this device is imaginary.
9 x => Load value at location x into R0
10 x => Load value at location x into R1
11 x => Store value in R0 into location x
12 x => Store value in R1 into location x
13 x => Goto x
14 x => If R0 == 0 Then Goto x
15 x => If R0 != 0 Then Goto x

It is actually quite a nice little instruction set. You can even write a quine in this 4917 machine.

The Problem

But what am I even getting at I hear you ask? Well it goes like this, Richard Buckland posed a question to us that said:

I have a bonus question for you. For every different length program, I want you to create/write a program that will print out the most characters that you possibly can to the screen and still terminate. That is you cannot write a program like “8 0 13 0” which will print ‘0’ forever. That program is incorrect because it will never terminate.

And with that the battle was on, if you wrote a program in 4917 that printed out the most characters for that program length, that anybody had found yet, then you gained one ‘brownie point’. Brownie points are tallied up at the end of the session and the person with the most wins. (Spoiler Alert: I had the most brownie points at the end of that session; by far.) You should also know that the length of a program is calculated as 16 – NUM_TRAILING_ZEROES. That means that you must count all of the memory spaces that you actually needed to write to.

Those of you that are clever would have realised that Richard was teaching the class a valuable lesson; and also, quietly, having a laugh at our expense – but I mean that only in the best way, right now I am laughing right there with him. The problem with this problem, as it were, is that you cannot write a fast program to find the best program for you. Any algorithm that attempted to do so would need to ensure that  it eventually terminated; and to do that you need to solve the Halting Problem. Haha Richard, very funny, try and get the first years to solve an impossible problem so that they learn the value of computing theory; actually it was quite a good move in retrospect. It teaches a good respect of theory.

Hovewer, we are also Engineers, and as with most problems this one can be partially solved using brute force methods. You can write your own emulator, count the number of characters that get printed to the screen, and include a timeout for programs that go for too long and are thus deemed “probably infinite”. And that is what this post is really about; me writing a brute force solver to this problem on a PicMicro device for sentimental value.

Past Results and My New Sentimental Solution

At the time of the course I wrote a brute force solver, using some code (an emulator) written by Brett Phillips, and ran it on my laptop (RIP Dell XPS M1710) and on increasingly large program sizes to find the best finite program. These were my results:

program_size: program bytecode => number of printed digits
4: 3 8 10 15 => 32
5: 4 1 8 10 15 => 62
6: 4 8 8 13 2 15 => 93
7: 1 8 10 14 4 1 15 => 172
8: 1 8 10 14 4 2 4 15 => 482
9: 1 3 8 10 15 1 4 2 15 => 512
10: 1 8 8 15 6 8 13 2 5 15 => 1173
11: 1 3 8 10 8 10 15 1 4 2 15 => 1024

And that earned me 8 brownie points in one hit. I was never able to get a solution for the higher values because it would have required me to leave my laptop on for around a week or two for a program of size 12 and much more for the others; class and the exams would have passed before I had an answer.

However, in recent days, for nostalgic purposes, I have engaged the problem again and I am pleased to say that I have written a number of implementations but the ‘best’ one of all would have to the the Microchip PIC16F688 version which I will display for you now via youtube:

This is awesome and makes me very happy. You can find the code for this device on Github. It should be pretty easy to recreate and pretty cheap to buy the parts for.

Conclusion

I enjoyed my time at University very much and I was very happy to have been there. I loved every minute of it and met amazing people and this little tribute is my offering to my time there. To have something to remember it by. A big thanks to Richard for the awesome course and 4917 problem, and also to all of my peers that made it an awesome time. I hope you enjoy this little project and my story.

Weekly Summary

Weekly Summary Six (Uni Work)

This week has met with alot more work to be getting on with; so much that it is a real balancing act to stay ontop of it all. However, I have learned a large amount in the process:

DOE (Digest of Events)

  • Started week one of University – It was a really full week but it looks like I am in for an excellent session; every subject contains interesting content
  • Read up on Lambda Calculus – I can now perform all possible reductions quite easily.
  • Read the first two and a bit chapters of the Modern Operating Systems textbook.
  • Dealt with some issues in the University working group that I head.

Dvorak Update

My typing speed with Dvorak has increased but I am not yet sure if it is fast enough to enable me to perform well during this session. I can type at 3.5 characters per second with Qwerty and average about 1.35 with Dvorak. It may not seem to bad but I am literally more than two times slower. This will not be good enough when I have to start seriously writing code at a prodigious pace. However, I have to take into account that one is touch typing while the other is not and I can actually see the keys. It undoubtedly creates a speed increase that I can not account for unless I learn how to touch type in Qwerty; and I can safely say now that I do not want to invest any effort into that. Therefore the true test will be when the Keyboard stickers come and I can see the keys that I am typing on. I cannot wait.

Overview

Essentially, I did not manage to get much done this week that was not university related, everything else took a back seat. However, alot was accomplished, and it is only a matter of time before I discover exactly how much time I can afford to spend on extra projects. When I do, and it will be soon, then you can expect more from me in the way of news. For the meantime I will have to be patient and simply perform the work that is most pressing. At any rate it is all exciting and I look forward to it all.

Database, Fuppes, Weekly Summary

Before Uni Begins (Weekly Summary Five)

DOE (Digest of Events)

This week in the news:

  • The Fuppes Packaging effort is moving along slowly; nobody has chosen to sponsor it yet.
  • I have written a html Logic Tree drawing program and it works on all HTML versions.

Uni Comes Back

University starts up again this week and I am really looking forward to it; after all, I will be doing AI, OS and Haskell. I have actually done quite a bit of study already for these subjects. Unfortunately, it means that I will not be able to spend as much time as I usually do on extra projects. It does not mean that I will stop working on them but that I will be moving slowly instead.

Dvorak

They say that “Prevention is better than Cure” and I do worry occasionally that all of this extended work on the Computer may eventually hurt my hands. To prevent that scenario I am now learning to use Programmer Dvorak and learning how to Touch Type at the same time. It is proving to be rewarding already and, though I type really slowly, I can already type without touching the keyboard. And I have noticed that my speed is regularily increasing. I was so happy with the improvements that I went online and bought Programmer Dvorak Keyboard Stickers and when they arrive in the mail I fully intend on applying it immediately. (You may have noticed that this post is not as long as usual; this is why)

Fuppes Config

Uli has spoken with me and he does not wish to add another dependency to Fuppes by using SOCI. Fair enough. As a result he is currently doing some database work and I am going to work on getting the Fuppes config separated into multiple files. That way for each different device you use a different file; so sharing devices will become much easier.

Weekly Summary

Busy Week (Weekly Summary 4)

This week was another very busy one. There was alot to do, and many people were making requests, so I was forced to try and distribute my time. Unfortunately this meant little work on Fuppes, which I intend on changing in the coming weeks, and pumping it right back up again.

Digest of Events (DOE)

  • I started using Google Chrome on Linux instead and I don’t think I’ll be going back to Firefox any time soon. Chrome is just faster and did you know that it actually compiles javascript so that it can gain faster execution speeds? And because I now use Google Chrome I looked into Vimium for all those Vim shortcuts that we know and love.
  • I was required to setup and eCommerce web application by a family member that really needed a solid one; so I decided to use Spree which turned out to be a joy to use. I’ll discuss this further down the page.
  • I talked to Ulrich and he had very exciting news that I am going to keep secret until he sees fit to talk about it; I only mention it because it got very excited hearing about it.
  • I did small amounts of Fuppes development and tried to keep up with the requests on various forums and locations.
  • Did some work for my local society; University begins in a week for me and they need all the help that I can give them. Lots of Django work there; actually on that note I don’t like Django. I much prefer Rails.
  • I have read about 150 pages of my “Modern Operating Systems” textbook and 210 pages of “Real World Haskell”. I even sent emails to the authors saying thankyou, asking questions and letting them know about some bugs. This is all in preparation for University; I’m trying to make the session easier on me so that I can distribute my time evenly over everything.
  • Two spammers left comments this week. So I am just going to let every spammer know that if they try and use me for their spam then I will block them with the tools at my disposal; and more; Please do not use this blog for your spam; for both of our sakes. Spammers, you have been warned.

All in all alot went on this week and there is always so much more to be done. I think I’ll be balancing and prioritising for a little while longer yet.

Extended Summaries

Spree and eCommerce (and Rails and Django)

A family member wanted me to consider creating them an eCommerce engine so that they could sell products online. Personally, I don’t have mush time to create a fully fledged eCommerce engine, so the first thing I did was look for and eCommerce engine and Spree was the best one I found (and It was just built on top of Ruby on Rails). After a few minutes I had spree running on my local machine but I really needed to provide static pages to the site. To that extent I decided to use the spree-static-content extension and it was almost perfect. After a quick patch it was ready to go and working beautifully. I’m actually really happy with the result. It all looks rather nice and I would recommend Spree to everyone and anyone. Actually, if this proves to be a fruitful affair, then I think that I will quickly try and get a few clients and make them sites using Spree.

While doing this nice, easy and beautiful programming with Ruby and Rails I was also forced to maintain the Django made website for CSEsoc. Django is painful and the part that I don’t like the most is the lack of a db:migrations framework. But other than that, doing things in Django just seems that much more difficult. They are technically equivalent; they are both web frameworks. But I know that I spend much less time using Rails than I do using Django.

Future Planning

The next week is going to be all about planning for University and getting as much done as I possibly can before I no longer have time to sleep. It will involve fore work on Fuppes, quickly getting that Spree website finalised, helping CSEsoc and finishing the reading of my textbooks. I have heaps to do but, despite everything, I’m having alot of fun doing it.

Fuppes, Packaging, Weekly Summary

Packaging Gets Complicated (Weekly Summary Three)

Happy Valentines Day readers. I’ve had quite a little trip over the past few days. I have done alot this week in my attempt to package Fuppes for Debian and it makes for a long sort of story; so long, that you should consider this the weekly summary. I had finally taken Fuppes to the stage where I believe that it was ready to package; it compiled just fine on Ubuntu and, believing that Debian and Ubuntu were not that different, I wanted to make and submit a package for Debian. But first that would require a Debian install on my machine.

Dual Booting Ubuntu and Debain

The logical option was to dual boot Debian and Ubuntu and make them share the same /home partition. For those of you who have not heard of this before, many GNU/Linux users that I know prefer to always partition their machines so that one partition is for the root directory and one for the home directory. This has important advantages for data retention:

  • Your OS is separated from your personal data
    The operating system exists in the root directory and all of you data (should) exist in the /home directory; therefore, if you wanted to, you could completely reinstall your OS and keep all of your data. This has saved me quite a few times when I have damaged my OS beyond repair and done a complete reinstall; a few moments after the reinstall I’m exactly where I left off. No loss of data at all.
  • Better for Hard Drive Fails
    I have been led to believe that partitioned hard drives are easier to extract data from in the case of faults. Should you be crazy enough not to make backups, then you may have a greater chance of success of recovering your drive with this setup.

What this all means is that when I wanted to dual boot two operating systems that point to the same data all I had to do was add an extra partition for the Debian OS and tell it to mount the /home directory as its own. Therefore I have the following partitions on my machine:

  • /dev/sda6 – Ubuntu /
  • /dev/sda7 – /home
  • /dev/sda8 – Debian /
  • swap – Common swap directory

However, getting there was not that easy, and I encountered two problems when trying to install Debian lenny via a net install I met with two problems. The first was one that I could not explain; when I tried to install Gnome as my window manager it would download all of the required packages and then freeze for no apparent reason. I had no idea why that happened and I still don’t but I can tell you that using Xfce as my window manager instead solved the problem.

The next issue that I encountered was more serious, my Debian net install CD had no knowledge of the existence of /ext4 file systems; which was the type of file system that I had installed the Ubuntu root on. So when the Debian install went looking for other OS installs it could find nothing. It said that it found nothing else on the machine and asked me if I would like to overwrite the MBR. I didn’t really think straight and clicked OK; byebye Ubuntu Boot record. So Debian installed correctly but now my Ubuntu partition was unreachable; good one Robert, I felt pretty dumb. But, luckily, it was fairly easy to just reinstall Ubuntu on that partition again. Ubuntu recognized the /ext3 partition that Debian was installed on and now I have a Dual Boot-able system with Debian and Ubuntu. Finally I could try and compile Fuppes on Debian.

Compiling Fuppes on Debian

When I tried to compile fuppes on Debian it proved to be more difficult than I thought. It seems that the real difference between Debian and Ubuntu is the strictness of which packages you can install and what they provide. Debian is very strict about the whole FOSS concept and does not let you use anything that is not. As a result many of the meta-packages for fuppes did not build. My short term solution was to simply comment them out and try and get what remained into Debian. Once it was in Debian then it would be my future task to add support for all of the extras. As a result I am pleased to announce the following:

FUPPES has finally gone up for Debian RFS (Request For Sponsor)!

The package itself has been placed on mentors.debian.net for sponsors to download it and check it out. As of the time of this writing three prospective sponsors have checked it out downloaded it. I hope to hear from them soon with any comments that they may have. I expect that there will be a number of things that I got wrong, being my first package, but I will put in whatever effort is required to actually get this package in there.

Other Notes of Interest

  • It turns out that it is not that hard to make a package archive mirror for Ubuntu; I’ll be making one later this week.
  • The Ubuntu Vimperator package is broken in Ubuntu but not Debian. So for a dual booter like me I cannot currently have Vimperator in both Ubuntu  an Debain. This should be fixed in Ubuntu Lucid.
  • Bioshock 2 is an awesome game! I finished it in four days. The storyline was a major disappointment but the gameplay was beyond compare. All in all good work 2K.
  • Telstra Bigpond frustrates me. I am only allowed 12GB per month and there are no other competitors that I can goto to get a better deal; somehow they have monopolised my area and I am stuck with their ripoff prices.
  • The next game I might look at playing is Trine; suggested to me by a friend.

I hope you’ve had a great week. Thanks for reading this long post and I will keep everyone updated.

Weekly Summary

Weekly Summary Two

This is a small weekly summary mainly because I focussed on only one thing; changing the database code.

Digest of Events (DOE)

  • Essentially I spent the week converting Fuppes to use different database code and I still have more left to go.
  • I had a little moment while doing some Haskell programming; it was a good moment. I have decided that I really like Haskell.

New News

HISHE is BACK!

I have no real new news so instead how about some humour to distract you? One of my favourite sources of humour, they sit there and make fun of movies. HISHE stands for How It Should Have Ended and they remake movies in cartoon form of how movies should have ended. They are back in action and rolling a new video out every month. Go take a look at them do their thing and enjoy.

Future Planning

  • Still working on the database items and getting it all working together.
  • Transcoding seems to be failing in all sorts of different ways so I will need to look into it soon.
  • Packaging of Fuppes for Debian is still going, soon I will just bypass waiting on people and attempt to get it into debian. I’ll send the RFS; it’ll be an exciting day.