Canonical URLs
Canonical URLs are all the rage on the Internet nowadays. All one has to do is search Twitter for “canonical” to see the conversations going on.
The problem is this: micro-blogging services such as Twitter allow only 140 characters per message, so a URL like http://eddorre.com/posts/buildin-the-blog-part-5-refactoring-part-2 is too long. This is where URL shorteners like tinyurl.com come in.
They take a long but perfectly good URL and turn it into something like http://tinyurl.com/Hukjfd (an example, not a real link), which is much more useful on Twitter, where character space is at a premium. However, this has its own problems. One is trust: how can you trust tinyurl.com to actually deliver you to a URL that you would even want to visit? Short answer: you don’t. Also, what happens if tinyurl.com goes away one day? Every Twitter search result that relies on these links immediately becomes less valuable. Some on the Internet argue that a site should take care of its own short URLs.
For example, the link http://eddorre.com/posts/buildin-the-blog-part-5-refactoring-part-2 could become http://eddorre.com/s/SQ2BAs. Notice that the domain stays the same and the short URL is under my control. If the link goes away, it’s MY fault, not someone else’s, so I have a vested interest in making sure that it doesn’t. This doesn’t solve the trust issue, but at least you know the link comes from eddorre.com rather than from anywhere on the Internet.
Enter the debate about canonical URLs and rev=canonical. The theory behind rev=canonical is simple. When I post a link such as http://eddorre.com/posts/buildin-the-blog-part-5-refactoring-part-2 to Twitter or a similar site, the service loads the original page and looks inside its HTML for markup like this:
<link rev="canonical" href="http://eddorre.com/s/SQ2BAs" />
When Twitter or a similar service finds this tag, it should replace the long link with the short one. It makes sense, but it has one drawback that will most likely kill its adoption: it’s expensive. For every link that is posted, Twitter has to connect to the URL, download and parse the HTML, and then swap in the short link if the page declares one.
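To make that cost concrete, here is a rough sketch of what a service would have to do for every posted link under the rev=canonical scheme: fetch the whole page, then hunt through it for the tag. The method names (`rev_canonical_href`, `short_url_from_page`) are hypothetical, and the regex-based scan is a simplification; a real service would want a proper HTML parser.

```ruby
require "net/http"
require "uri"

# Matches each <link ...> tag in a page (a simplification, not a full HTML parser).
LINK_TAG = /<link\b[^>]*>/i
# Matches name="value" or name='value' attribute pairs inside a tag.
ATTRIBUTE = /([\w-]+)\s*=\s*["']([^"']*)["']/

# Hypothetical helper: return the href of the first <link rev="canonical"> tag, or nil.
def rev_canonical_href(html)
  html.scan(LINK_TAG).each do |tag|
    attrs = tag.scan(ATTRIBUTE).to_h { |name, value| [name.downcase, value] }
    # rev may hold several space-separated values, e.g. rev="canonical alternate".
    return attrs["href"] if attrs["rev"].to_s.split.include?("canonical")
  end
  nil
end

# Hypothetical: the full round trip a service would make for every posted link.
def short_url_from_page(url)
  html = Net::HTTP.get(URI(url)) # downloads the entire page body
  rev_canonical_href(html)
end
```

The expensive part is the `Net::HTTP.get` call: the whole page body has to come over the wire before the scan can even start.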
With that in mind, there is a proposed HTTP header for canonical URLs that a web server could return to a service like Twitter, something like X-Rev-Canonical: href="http://eddorre.com/s/SQ2BAs". This way, Twitter or other services would only have to make a HEAD request to the original URL and read the short URL from the response headers, with no downloading and parsing of the HTML. Whether either of these methods becomes widely adopted remains to be seen.
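For comparison, here is a sketch of the header-based variant. A HEAD request returns only the status line and headers, so nothing has to be parsed except one header value. Because the header is only a proposal, the href="..." value format is an assumption taken from the example above, and the method names are hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical: pull the URL out of a header value shaped like  href="http://..."
def parse_rev_canonical_header(value)
  return nil if value.nil?
  value[/href\s*=\s*"([^"]*)"/, 1]
end

# Hypothetical: ask for headers only. A HEAD response carries no body,
# so there is no HTML to download or parse.
def short_url_from_header(url)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  # Header lookup on a Net::HTTPResponse is case-insensitive.
  parse_rev_canonical_header(response["X-Rev-Canonical"])
end
```

On the blog’s side, a Rails controller could set the header with something like `response.headers["X-Rev-Canonical"] = %(href="#{short_url}")`, where `short_url` is whatever full short link the post has.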
This past Saturday, after reading comments on Twitter and in blogs about short URLs, I decided to try coding up my own system for my blog. It turns out it took me only about 15 minutes to code something in Ruby on Rails.
Here is how I did it.
- I created a migration that adds a short_url column to my posts table.
- I added a before_create callback to my Post model that generates the short_url.
- The short_url is built by picking 6 random characters from the Base 62 alphabet (0-9, A-Z and a-z).
- To give my older posts short URLs, I wrote a migration that looped through all existing posts, generated a random token for each one, and saved it.
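The token generation described in the list above might look roughly like the following plain-Ruby sketch. The constant and method names are hypothetical, and `taken` is a stand-in for a database check; in a Rails app, a before_create callback could assign the result to short_url. Six characters from a 62-character alphabet give 62**6 = 56,800,235,584 possible tokens, so collisions are rare, but the retry loop (ideally backed by a unique index on the column) handles them when they happen.

```ruby
require "securerandom"
require "set"

# Base 62 alphabet: 0-9, A-Z, a-z (62 characters in total).
ALPHABET = [*"0".."9", *"A".."Z", *"a".."z"].freeze
TOKEN_LENGTH = 6

# Build a token by picking TOKEN_LENGTH characters at random from ALPHABET.
def random_token(length = TOKEN_LENGTH)
  Array.new(length) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
end

# Random tokens can collide, so retry until the token is not already taken.
# `taken` is any collection responding to include? (hypothetically, a stand-in
# for a query such as Post.exists?(:short_url => token)).
def unique_token(taken)
  loop do
    token = random_token
    return token unless taken.include?(token)
  end
end
```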
I created a named route similar to the following:
# config/routes.rb (Rails 2.x syntax): send /s/SQ2BAs to PostsController#show
map.short_url "/s/:short_url", :controller => 'posts', :action => 'show'
Next, in my controller I did something similar to this:
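def show
  if params[:short_url]
    # Reached via /s/:short_url: look the post up by its token
    @post = Post.find_by_short_url(params[:short_url])
    # Unknown token: 404 instead of an error from post_path(nil)
    raise ActiveRecord::RecordNotFound unless @post
    # 301 so clients treat the permalink as the permanent address
    redirect_to post_path(@post), :status => :moved_permanently and return
  else
    @post = Post.find_by_permalink(params[:id])
  end
end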
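The same redirect can also be sketched outside Rails as a bare Rack-style endpoint, which makes the moving parts easy to see: match /s/:token, look up the long URL, and answer with a 301 and a Location header, or a 404 for unknown tokens. The `SHORT_LINKS` hash is a hypothetical stand-in for Post.find_by_short_url, and this is an illustration rather than how the blog itself does it.

```ruby
# Hypothetical lookup table standing in for Post.find_by_short_url.
SHORT_LINKS = {
  "SQ2BAs" => "/posts/buildin-the-blog-part-5-refactoring-part-2"
}.freeze

# A minimal Rack-compatible app: any object responding to call(env) that
# returns [status, headers, body]. Header names are lowercase, as Rack 3 expects.
SHORT_URL_APP = lambda do |env|
  token = env["PATH_INFO"].to_s[%r{\A/s/([0-9A-Za-z]+)\z}, 1]
  target = token && SHORT_LINKS[token]
  if target
    # 301 Moved Permanently: the long URL is the permanent home of the post.
    [301, { "location" => target, "content-type" => "text/plain" }, ["Moved Permanently"]]
  else
    [404, { "content-type" => "text/plain" }, ["Not Found"]]
  end
end
```

With the Rack gem installed, a config.ru that requires this file and calls `run SHORT_URL_APP` could serve it via `rackup`.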
There you have it: something quick and dirty in about 15 minutes. Now if I want to post links to my blog on Twitter, I can use my own short URL instead of a service like TinyURL.
I should note that this probably isn’t the best way of doing things and it’s most certainly not the only way, but for a quick hack project on a Saturday afternoon, it fit the bill.
I forgot to mention that the inspiration for this post was Duncan Davidson’s post “Everybody Wants Short Links.”