Is there a Twitter aux. service that takes any search term like “node.js” and creates list of hottest & recent links tweeted for that term?
This tweet is where the idea for Firetweets originated. Twenty minutes and a dinner later, the first version was online. While the look of the site hasn’t changed much, the code that powers it went through several iterations. I thought it’d be of some interest to share, not the actual code, but rather the thinking behind it. It’s no rocket science, but identifying shortcomings and being able to solve them is a favorite activity of mine :)
The first iteration was extremely simple. It consisted of making a call to the Twitter API asking for all tweets containing a link and our query (node.js).
For each result:
Link object, set its id to the URL found, set count at 1 and save
Even though it worked as expected, this is a rather naive approach. Here are a few reasons why:
Learning from the shortcomings of the first iteration, some things needed to change:
My first thought was to use the APIs that the URL shortener services provide. Even though a handful of them dominate the market, we’re dealing with a much larger number of services, from bit.ly to ff.im, tinyurl…, and not all of them have an API. Those that do, unfortunately, didn’t think it’d be a good idea to use a standard format. Atom or RSS, anyone?
The good news is that, despite the lack of a standard json/xml stream, those services aren’t really disparate. They all do the same thing: provide a short link that redirects to the final URL. And this is perfect for our use case: let’s just follow the short URL, which will ultimately redirect us to the page we’re looking for.
Since our app is running on top of Google App Engine, we can use the urlfetch module that is bundled with the SDK. The result of fetching a URL with urlfetch has a final_url property, which is exactly what we’re looking for: when redirects were followed, the value of the final_url property is the actual URL whose request returned this response. Sweet! While we’re at it, let’s grab the content of the page and, with some very basic regexp, extract its title. We’re almost there: we just need to clean the URL by removing tracking and other parameters, and we’ll have our unique identifier.
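The cleaning and title steps can be sketched with the Python 3 standard library alone. This is only an illustration, not the code running on Firetweets: the helper names clean_url and extract_title are made up, and treating the "utm_" prefix as the tracking marker is an assumption, since the post does not say which parameters get removed.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: which parameters count as "tracking" is a guess for this sketch.
TRACKING_PREFIXES = ("utm_",)

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def clean_url(url):
    """Drop tracking parameters and the fragment so the same page gets one id."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if not key.lower().startswith(TRACKING_PREFIXES)]
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))


def extract_title(html):
    """Very basic regexp extraction; returns None when there is no <title>."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse newlines and runs of spaces inside the title.
    return " ".join(match.group(1).split())
```

On App Engine, the input to clean_url would be the final_url of the fetched response and the input to extract_title its content; here both are plain strings so the sketch runs anywhere.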
To keep things simple, when fetching the data from the Twitter API, we’re only asking for the latest 50 results that contain both a link and our query. Even though we keep track of the last fetched tweet_id for further calls, in the worst-case scenario we will have 50 short links to convert.
Since we want to regularly retrieve new links from the Twitter API, we’ll use scheduled tasks, also called crons. On App Engine, crons invoke URLs, which means that the steps mentioned above will happen within the lifespan of a request. That’s far from ideal; it’s even a terrible idea. Why, you ask? Because we’re dealing with third parties here, up to 50 of them. If just one of them is terribly slow or down, the request could time out before we’ve finished processing all our input. And this, we don’t want.
But that’s easy, you say: let’s make asynchronous requests, that’ll be much faster, and we won’t time out! Or not. Since we’ll process requests in parallel, we’ll definitely get a speed boost in indexing all our URLs. But we’re still bound to the slowest request, which means that we’ve no guarantee that this single call won’t make the cron request time out. Damn!
We need to find a way to move these operations out of the initial request processing, and execute them in the background. Fear not, my friend: the App Engine team has released a task queue API, which makes offline processing ridiculously easy. I’ve opted for the deferred library. Here’s an example:
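from google.appengine.ext import deferred
for link in links:
    # Queue one background task per link, delayed by at least 30 seconds.
    deferred.defer(tools.get_title, link, tag, _countdown=30)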
It can hardly be simpler. We’ll let App Engine take care of executing our tasks in the background. Our Links will automagically appear in the Datastore once the final URL and the title of the page have been retrieved, and the count incremented.
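The argument for moving the work out of the request can be checked with a little arithmetic. The sketch below is purely illustrative: the function names are invented, and the latencies and the 30-second deadline are assumed values, not measurements.

```python
def sequential_time(latencies):
    """Fetching one URL after another costs the sum of all latencies."""
    return sum(latencies)


def parallel_time(latencies):
    """Fetching every URL at once still costs as much as the slowest one."""
    return max(latencies, default=0)


def fits_in_request(latencies, deadline, parallel):
    """True if all fetches finish before the request deadline."""
    total = parallel_time(latencies) if parallel else sequential_time(latencies)
    return total <= deadline


# Hypothetical batch: 49 quick shorteners and one that takes 45 seconds.
latencies = [0.5] * 49 + [45.0]
DEADLINE = 30.0  # assumed request deadline, in seconds
```

With these numbers, going parallel shrinks the total from 69.5 to 45 seconds, but fits_in_request still returns False in both cases: the single slow service is enough to blow the deadline, which is why each link gets its own background task instead.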
You’ve probably noticed that keeping track of hot links by just incrementing a counter of an object won’t get us very far. We’re not logging enough information to be able to extract stats, patterns and such. That’s what I’m working on right now, when not procrastinating on Twitter / Delicious ;)
It’s only a matter of minutes to have a prototype running on App Engine. But there’s a gap between a prototype and a useful and perennial service, and it sure helps to think it through first. Unless you love data migration of course ;)
“Design is not just what it looks like and feels like. Design is how it works.” (Steve Jobs)
INTERNATIONAL DESIGN EXCELLENCE AWARD WINNERS 2009 on BusinessWeek
My personal favorites are below:
It is unfortunate, but far too common, that extremely well-engineered systems, and the amount of work and talent on which they are based, are eclipsed by a bad User Experience (UX), often regarded as a minor detail.
How often have you been frustrated at being squeezed into a train at peak hours, and complained about how incompetent the railway companies were for not allocating more trains?
If the frustration engendered by a bad UX escalates to the point where it obscures the complexity and effort put into providing the service in its entirety, then it is by no means a minor detail and needs immediate consideration. Customer loyalty can be heavily influenced by hasty opinions in a crisis situation.
It’s nothing earth shattering, but sometimes it’s important to be reminded that every single link of the chain matters.
The overall quality of the UX does not equal the sum, but the product of the individual experiences.
Let the Apple/AT&T deal illustrate the point. On one hand, we have Apple, famous for striving towards excellence. On the other hand, we have AT&T, renowned for less than stellar network coverage. Taken separately, the iPhone is the jewel of the Apple Kingdom. I will let History decide whether its release was a revolution, or just an impressive evolution of the smart phone industry, but many praise its technological wonders. I have yet to come across such a consensus about any of the services provided by AT&T. Lumped together, the iPhone experience, and to some extent Apple, loses some of its shine. It doesn’t matter if your device sports the fastest mobile web browser if the network is dead slow.
The exception being the “Visual voicemail feature” which proves itself quite handy during dropped or missed calls due to network issues. :)
This applies to any situation where responsibilities are chained, where all links are interdependent. It may be a bit cliché, but small or middle-sized Open Source projects are often impacted by this chain reaction. It works well, but it looks terrible. Due to a lack of design/aesthetics skills, the project stays anonymous, despite high-quality code.
You can spend hundreds of thousands of $$$ to hire the programming rock star of XXX’s fame, but if the product you are building, or the service you are providing, depends on someone else’s skills in order to be completed, then your product or service will only be as good as the weakest of its parts.
Let’s not forget that weakest does not mean weak, so before we exploit social media tools and abuse the influence and the responsibility they entail, we should step back and look at the efforts made on every other part of the chain. That will certainly soften even our most legitimate frustration. There’s always room to improve :)
Steve Krenzel wrote an article entitled “Finding friends with MapReduce” which, after going through the basics, explains how you could use MapReduce to compute friendships on a social network, to offer features such as “You & John have XXX friends in common”.
The article does a great job of explaining how the data is processed at each step, and what the output should be. I wrote a quick Python script that does exactly what Steve explains; you can find it at http://gist.github.com/184137
I hope it’ll be useful to someone.
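For readers who want the gist of the technique without leaving the page, here is a minimal in-memory sketch. It is not the script from the gist above: the function names and the sample graph are made up for illustration, and friendships are assumed to be symmetric. The map step emits each person's friend list under every (person, friend) pair, and the reduce step intersects the lists that end up under the same pair.

```python
from collections import defaultdict


def map_friends(person, friends):
    """Emit one (pair, friend set) record per friendship of `person`."""
    for friend in friends:
        # Sorting the pair makes (A, B) and (B, A) land on the same key.
        pair = tuple(sorted((person, friend)))
        yield pair, set(friends)


def reduce_common(pair, friend_sets):
    """Intersect the friend sets grouped under one pair."""
    return pair, sorted(set.intersection(*friend_sets))


def common_friends(graph):
    """Run map, group by key (the shuffle step), then reduce, all in memory."""
    grouped = defaultdict(list)
    for person, friends in graph.items():
        for pair, friend_set in map_friends(person, friends):
            grouped[pair].append(friend_set)
    return dict(reduce_common(pair, sets) for pair, sets in grouped.items())


# Hypothetical, symmetric friendship graph.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "D", "E"],
    "D": ["A", "B", "C", "E"],
    "E": ["B", "C", "D"],
}
result = common_friends(graph)
```

Looking up result[("A", "B")] then gives the friends A and B share. A real MapReduce framework would distribute the map and reduce calls across machines; the grouping dictionary here only stands in for its shuffle phase.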
It’s never been easier to set up a web app:
web.py, werkzeug and webapp have some additional features, and are considered lightweight frameworks instead.
I prefer the url dictionary approach (mnml, newf, webpy) to the decorator approach (juno). djng is a bit different: routing is based on the django module (from django.conf.urls.defaults import url).
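To make the two routing styles concrete, here is a toy sketch in plain Python. It does not reproduce the actual API of mnml, newf, web.py or juno; dispatch, route, URLS and ROUTES are hypothetical names invented for this example.

```python
import re


# Style 1: an explicit URL table (a list of pairs keeps the matching order clear).
def hello(name):
    return "Hello, %s!" % name


URLS = [
    (r"^/hello/(\w+)$", hello),
]


def dispatch(urls, path):
    """Return the response of the first handler whose pattern matches `path`."""
    for pattern, handler in urls:
        match = re.match(pattern, path)
        if match:
            return handler(*match.groups())
    return "404 Not Found"


# Style 2: a decorator that fills the same kind of table as a side effect.
ROUTES = []


def route(pattern):
    def register(handler):
        ROUTES.append((pattern, handler))
        return handler
    return register


@route(r"^/bye/(\w+)$")
def bye(name):
    return "Bye, %s!" % name
```

Both styles end up with the same table, so dispatch(URLS, "/hello/node") and dispatch(ROUTES, "/bye/node") work the same way. The difference is where the mapping lives: in one place you can read top to bottom, or scattered next to each handler.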
Micro frameworks + key-value stores are a match made in heaven for web services development.