Final thoughts on GSoC

August 24, 2009 in en, google, gsoc, linux, open source, projects, software engineering

So this year’s Google Summer of Code is officially over. Today at 19:00 UTC was the deadline for submitting evaluations for both mentors and students. Therefore I think some kind of summary of what happened and what I was doing is in order.

I was working on implementing a neat idea that would allow previously impossible things for Gentoo users. The original name of the idea was “Tree-wide collision checking and provided files database”; you can still find it on the Gentoo wiki. I later named the project collagen (as in collision generator). Of course, the implemented system is quite a bit different from the original wiki idea: some things were added, some were removed. If you want to relive how I worked on my project, you can read my weekly reports on the gentoo-soc mailing list (I will not repeat them here). Some information was also aggregated on soc.gentooexperimental.org. As the final “pencils down” date approached, I created final bug reports for features not present in the delivered release (and for bugs that were present, for that matter). Neither the missing features nor the present bugs are real show-stoppers; they mostly affect performance. More importantly, I plan to continue my work on this project and perhaps on other Gentoo projects. I guess some research into what those projects are is in order :-)

Before GSoC I kind of had an idea of how open-source projects work, since I’ve participated in some to a degree. However, I underestimated a lot of things, and now I would do them differently. But that’s a good thing. I kind of like the idea that no project is a failed one as long as you learn something from it. It reminds me of Jeff Atwood’s recent post about Microsoft Bob and other disasters of software engineering. To quote him:

The only truly failed project is the one where you didn’t learn anything along the way.

I believe I have learned a lot. I believe that if I started collagen now, it would be much better in the end. And the best thing is that I can still do that. I get to continue my project and learn some more. If I learned anything during my work on collagen it’s this:

If you develop something in a language without strong type checking, CREATE THE DAMN UNIT TESTS! It will make your life much easier later on.

In the next episode: Why I think Gmail is corrupting the minds of people, and why I hate mobile phones

We need CAPTHHA

October 11, 2008 in en, privacy, rant, security, software engineering

I am pretty sure everyone has seen a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) before. Maybe you didn’t know the (full) name, but you have encountered them when registering accounts, posting comments or accessing some parts of the web. You know, those annoying things that exercise your ability to recognize distorted words with weird backgrounds.

CAPTCHAs are used to protect against automated attacks. For example, automated registration of new users on Gmail would create great opportunities for spammers. CAPTCHAs mostly work, even when they get “hacked” from time to time. The biggest problem? They are reaching levels where even humans have trouble reading the letters. I still have nightmares when I remember the CAPTCHAs used on RapidShare. Telling cats from dogs was somehow not that easy for me. I am not sure about the “hackability” of reCAPTCHA, but as far as usability goes, it’s one of the best for me. Too bad only a few sites are using it.

The main problem with CAPTCHAs is not their complexity but relay attacks and human solvers from third-world countries paid to solve thousands of CAPTCHAs a day. What we really need is a CAPTHHA (Completely Automated Public Test to tell Humans and Humans Apart). Computer science is far from being able to tell humans with “clean” intentions from those being paid to get past the defences. One solution would be to issue certificates of “humanity” signed by a central authority. You could then ban users who misused their certificates. There are of course privacy and security problems with this approach, not to mention financial “issues”, so I guess this is not how it’s gonna work. Other approaches have also been tried, but they usually pose problems for disabled people. I am certainly interested in how computer science solves this problem.

Stackoverflow launched

September 16, 2008 in en, howto, software engineering

If someone actually read my previous posts (heh), (s)he may have noticed that I quite often link to www.codinghorror.com. It is Jeff Atwood’s blog, and I usually find it very exciting to read. His style of writing and his ability to convey complex messages in a simple way are my holy grail. And when he cannot say it better himself, he links to other authors A LOT. Instead of repeating the same thing that has been said over and over again, he just links to the proper post made by some other fellow programmer/software engineer. Avoiding duplication is really one of the basic goals of programming: instead of repeating the same code 20 times, just write a function and call it 20 times.

But that is just the low-level stuff. Jeff, Joel Spolsky and a few others embarked on an adventure to get rid of duplication in the minds of programmers. Programming is so inherently complex that no one really knows the solution to every problem. And don’t get me started on optimal solutions. What did you do when you found a solution to some programming challenge, or some tricky workaround for a problem that had been bugging you for weeks? If you have a blog, you could go and post your solution there. Maybe someone would notice. Maybe not. So what did Jeff & Co. do? They created and launched www.stackoverflow.com. Quoting from the about page:

Stack Overflow is a programming Q & A site that’s free.

As is often the case, powerful ideas come in simple packages :-). It is that simple. You ask, others answer. Then you vote, and the best answer wins. It’s kind of like experts-exchange.com, just without the paying part, and with better user participation. Users who have proven themselves worthy can earn karma points and thus become more or less moderators. Try it out, and you will see what I mean. Begin by reading their FAQ.

Google Chrome mass betatesting

September 16, 2008 in en, google, rant, security, software, software engineering

Google released its own web browser called Chrome a few weeks ago, and the whole web has been buzzing with excitement ever since. They did it Google style. Everything is neat, clean and simple. And quite a few features are also unique. Google engineers obviously put a lot of thought into scratching their itches with web applications. The JavaScript engine is fast, and the whole browser is built around the idea that the web is just a place for applications. One of the most touted things about Chrome was its security features. You can read a whole account of basic Chrome features on its project page.

In Chrome, each tab runs as a separate process communicating with the main window through standard IPC. This means that if there is a fatal error in the handling of some page (malicious or otherwise), other tabs should be unaffected, and your half-written witty response to that jerk on the forum will not be lost. Chrome also has other security enhancements that should make it more secure. I said should. Within a few days of Chrome’s release, several security vulnerabilities surfaced, ranging from a simply annoying DoS to plain dangerous remote code execution.

What caught my attention was a bug that enabled downloading files to the user’s desktop without user confirmation. It was caused by Googlers using an older version of the WebKit open-source rendering engine in Chrome. Integrating “foreign” software with your application can be tricky, especially when you have to ensure that everything will work smoothly after an upgrade. In that respect, it is sometimes OK to use older versions of libraries, as long as you fix at least the security bugs. People write buggy software, Google engineers included. I am just surprised that they don’t have any process that would prevent distributing software with known security vulnerabilities to the public.

And that is the main problem. Chrome is beta software. Because of this, bugs are to be expected. But Google went public with Chrome in the worst possible way. They included a link to the Chrome download page on their home page, making hundreds of thousands of people their beta testers. People who have no idea what “beta testing” actually means. They just know that Google has some cool new stuff. So let’s try it, right? Wrong. Most of us expect our browser to be safe for e-banking, porn and kids (not necessarily in that order). Unfortunately, Chrome is not that kind of browser. Yet. I am pretty sure it is gonna be a great browser in the future, though. But right now Google should put a big red sign saying “DANGEROUS” in the middle of the Chrome download page.

Until Chrome becomes polished enough for Google to stop calling it “beta”, it has no place on the desktops of ordinary computer users. Even the oh-so-evil Microsoft doesn’t show a download link for the IE8 beta on their main page to promote it. The mentioned issues aside, Chrome really does sport a few good ideas that other browsers could use as well. Try it out, and you will like it. Then go back to your old browser for the time being.

Keepin’ it short

August 29, 2008 in en, rant, software engineering

I have been wondering how I should write this blog. For example, the previous post about the false sense of security was originally supposed to be a longer rant. In the end I shortened it considerably and focused on Perspectives. Why?

Well, the main reason is that I realized there are already a lot of essays/rants about the perception of security. Maybe I could write a really good essay if I took some time to think it through. But if you are interested in privacy or security, then you already know all these things. And if you are just another ordinary computer user who does not care if his email account gets cracked, my blog will not change that.

Jeff Atwood actually wrote some time ago about the unreachable type of software engineers (or, for that matter, any professionals). The paragraph that completely describes my feelings is this one:

The problem isn’t the other 80%. The problem is that we’re stuck inside our own insular little 20% world, and we forget that there’s a very large group of programmers we have almost no influence over. Very little we do will make any difference outside our relatively small group. The problem, as I obviously failed to make clear in the post, is figuring out how to reach the unreachable. That’s how you make lasting and permanent changes in the craft of software development. Not by catering to the elite– these people take care of themselves– but by reaching out to the majority of everyday programmers.

Writing for the masses is not easy, and I don’t think I’m up to it. Yet. It makes me angry that not everyone loves his job or profession, but I cannot change that. So I will keep writing about my passions the way I see fit, and hopefully one day I will become a good enough software engineer and writer in one person to be able to influence the 80%.

Note: If you are wondering what’s up with the 80%/20% thing, I recommend the article by Ben Collins-Sussman about the two types of programmers.

Developer isolation

August 23, 2008 in en, rant, software engineering

I recently stumbled upon a blog post about TraceMonkey (thanks to Sisken). TraceMonkey is the codename for new improvements to SpiderMonkey (the Firefox JavaScript engine). The results are very impressive, with speedups ranging from 2x to more than 20x. I love Firefox, and I’m looking forward to every new version bringing more exciting features. But what struck me most in the post was this statement:

I fully expect to see more, massive, projects being written in JavaScript. Projects that expect the performance gains that we’re starting to see. Applications that are number-heavy (like image manipulation) or object-heavy (like relational object structures).

Now don’t get me wrong. I get excited about new features just as much as every other geek :). I see a problem here, though. Firefox is biting off more of the market-share pie every month. But however we put it, it’s still at most at 30% in some parts of Europe (the US is dominated by IE even more). So how can Firefox create an incentive for developers to create web applications for ONE specific browser? Sure, a few years from now JavaScript performance will be much better in other browsers too. But what until then? Do you think that “Sorry, this site was designed for Firefox 3.1 or higher” is any better than “Sorry, this site was designed for Internet Explorer 5.0 or higher”?

You may ask “What about in-house applications, for one company?”. In-house applications are already dominated by IE and ActiveX. That’s not gonna change overnight. Or maybe I’m wrong.

GDevs (Geeky Developers) are rightly proud of their creations. The problem comes when they fail to see the surrounding world. The now almost famous blog post from Ben Collins-Sussman about the two types of programmers contains this pearl:

Shocking statement #1: Most of the software industry is made up of 80% programmers. Yes, most of the world is small Windows development shops, or small firms hiring internal programmers. Most companies have a few 20% folks, and they’re usually the ones lobbying against pointy-haired bosses to change policies, or upgrade tools, or to use a sane version-control system.

Shocking statement #2: Most alpha-geeks forget about shocking statement #1. People who work on open source software, participate in passionate cryptography arguments on Slashdot, and download the latest GIT releases are extremely likely to lose sight of the fact that “the 80%” exists at all. They get all excited about the latest Linux distro or AJAX toolkit or distributed SCM system, spend all weekend on it, blog about it… and then are confounded about why they can’t get their office to start using it.

Fortunately for the open-source community, people like John Resig, Andreas Gal, Mike Shaver, and Brendan Eich are in the 20% crowd. Let’s just hope they won’t lose sight of the rest of us :)

How to change author of git commit?

August 19, 2008 in en, howto, software engineering


I recently needed to rewrite the history of commits in a private Git repository, because I wanted to change my email address in every commit. Note that you should not try the following tip in a repository that anyone has pulled from. Normally Git doesn’t allow you to do this kind of thing, since changing authorship is… well, bad (usually also against the law).

Let’s assume the email address changed from dev@comp1.com to dev@comp2.com. To create a copy of the repository $PROJECT_DIR in a new repository $NEW_REPO with the emails changed, we can do the following:

$ cd $PROJECT_DIR # change to the project repository

$ git fast-export --all > /tmp/project.git # export the repository to a temporary file

$ sed -i 's/^author \(.*\)<dev@comp1\.com>/author \1<dev@comp2.com>/' /tmp/project.git # rewrite the email on every line starting with 'author' (repeat with 'committer' if that email should change too)

$ cd $NEW_REPO # change to an empty directory

$ git init # initialize the new repository

$ git fast-import < /tmp/project.git # import the rewritten history

The third step is potentially dangerous, because you have to make sure you don’t edit the contents of any file, only the metadata. If you change the contents of files, git fast-import will complain because the sha1 hashes will no longer be correct.
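Before running the sed step on the real export file, it is worth checking the expression against a sample header line. This is a minimal sketch (the author name and timestamp below are made up); note that a fast-export stream also contains ‘committer’ lines carrying an email, which you may want to rewrite the same way:

```shell
# A hypothetical 'author' header line as it appears in a fast-export stream.
# Only the email inside the angle brackets should change; any other line,
# including blob data, must pass through untouched.
printf 'author Dev <dev@comp1.com> 1219100000 +0200\n' |
  sed 's/^author \(.*\)<dev@comp1\.com>/author \1<dev@comp2.com>/'
# prints: author Dev <dev@comp2.com> 1219100000 +0200
```

A non-matching line, such as file content that happens to mention dev@comp1.com, comes out of the same sed command byte-for-byte identical, which is exactly what fast-import requires.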

Be sure to read the git fast-import and git fast-export man pages for additional information. It took me a while playing with git-rebase and similar tools to realize that they do not offer such a feature, so if this tip helps anyone else, I’ll be glad.