Few Against Many

Steven Paul Jobs

2011-10-05T11:28:00+00:00

To the man who inspired me to do what I do today and to be who I am. Rest in peace.

Dreaming of Web Hooks for Twitter

2010-04-12T00:16:00+00:00

In case you’re not familiar with web hooks, here’s a simple explanation from Jeff Lindsay, web hooks evangelist:

Web hooks are user-defined callbacks over HTTP. They’re intended to, in a sense, “jailbreak” our web applications to become more extensible, customizable, and ultimately more useful.

Why Web Hooks for Twitter?

Twitter provides a solid API for its service. Even though calls are rate limited, they’re pretty generous about it, and with 20,000 calls per hour, there’s a lot you can do without being throttled. Over the years, they have heavily invested in caching mechanisms to be able to handle such a load. Given the insane number of services built on top of this API, we can say that they managed to build a scalable platform. Yet, in some cases, using the API doesn’t seem to be efficient.

Let’s take a look at a real-world example: tracking unfollowers.

The Twitter API provides a method to retrieve the ids of your followers. The doc states:

cursor. Required. Breaks the results into pages. A single page contains 5000 ids. This is recommended for all purposes. Provide a value of -1 to begin paging. Provide values as returned to in the response body’s next_cursor and previous_cursor attributes to page back and forth in the list.

http://api.twitter.com/1/followers/ids.json?screen_name=pims&cursor=-1 outputs:

{"previous_cursor_str":"0","next_cursor":0,"previous_cursor":0,"ids":[86977748,38477048…]}

If you want to know who stopped following you, you have to make another request at a later date, and by diff-ing both lists, you’ll see which ids were part of the first set but not part of the second. Easy. You’ll notice that we don’t know exactly when it happened, we just know it was between the two requests. A daily request is probably plenty; after all, they’ve stopped following you, it’s not really an emergency.
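The diff itself is a plain set difference. Here is a minimal sketch in Python, assuming both ids lists have already been fetched and parsed from the JSON responses. The function name and the sample ids are illustrative, not part of the Twitter API.

```python
def find_unfollowers(earlier_ids, later_ids):
    """Return ids present in the earlier snapshot but missing from the later one."""
    return sorted(set(earlier_ids) - set(later_ids))


# Two hypothetical snapshots of the "ids" array, taken a day apart.
yesterday = [86977748, 38477048, 12345]
today = [86977748, 12345, 67890]

unfollowers = find_unfollowers(yesterday, today)
print(unfollowers)  # [38477048]
```

Swapping the two arguments gives the new followers instead.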

With roughly 250 followers, I’m far from being a Twitter celebrity, and the API handles my situation fairly well. Of course, I wouldn’t mind knowing exactly when they stopped following me. Why? Well, I may have hurt their feelings and would like to apologize, but with a 24h timeframe, it’s difficult to know which one of my tweets is to blame.

The situation gets a bit more complicated when you’re @ev and have 1,182,333 followers. At 5000 ids per page, you’ll only be able to find out if somebody unfollowed you, and who, once you’ve made 237 requests. At roughly 48kb per request, that’s 11,376kb of data transferred to find out that today, nobody stopped following @ev.
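The number of requests follows from the documented page size of 5000 ids. A quick check in Python, using the follower count and the rough 48kb response size given in the text (the constant names are only for illustration):

```python
PAGE_SIZE = 5000         # ids per page, per the API doc quoted above
FOLLOWERS = 1182333      # @ev's follower count from the text
KB_PER_RESPONSE = 48     # rough size of one full page, from the text

# Ceiling division: a partially filled last page still costs one request.
requests_needed = -(-FOLLOWERS // PAGE_SIZE)
total_kb = requests_needed * KB_PER_RESPONSE

print(requests_needed)  # 237
print(total_kb)         # 11376
```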

Now, imagine if, when someone stopped following @ev, Twitter notified our little app. They’d push that information once to us, instead of us pulling it repeatedly. Wouldn’t that be an efficient, and elegant, solution to this problem? Well, that’s how web hooks work. @ev could tell Twitter that when someone stops following him, Twitter should notify http://mylittleapp.com/ev/unfollowers and push the ids of the offenders who committed a crime of lèse-majesté. And knowing exactly when it happened, well, that’s the icing on the cake.
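On the receiving end, a web hook is just an HTTP endpoint that accepts a POST. The sketch below shows what our little app could look like, using only the Python standard library. Since Twitter offers no such hook, the payload shape ({"event": "unfollow", "ids": [...]}) and every name here are assumptions made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler


def parse_unfollow_event(body):
    """Decode a hypothetical payload such as {"event": "unfollow", "ids": [1, 2]}."""
    event = json.loads(body.decode("utf-8"))
    if event.get("event") != "unfollow":
        return []
    return list(event.get("ids", []))


class UnfollowHookHandler(BaseHTTPRequestHandler):
    """Receives the POST that the service would send to the callback URL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        ids = parse_unfollow_event(self.rfile.read(length))
        # A real app would store the ids with the current timestamp here.
        print("unfollowed by:", ids)
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()


# To try it locally (blocks forever, so left commented out):
# from http.server import HTTPServer
# HTTPServer(("", 8000), UnfollowHookHandler).serve_forever()
```

The time the POST arrives gives us the "when" for free, which is exactly what polling cannot provide.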

The following events would benefit from being web hook enabled:

you have a new follower
you’ve lost a follower
you have a new direct message
your tweet has been favorited
your tweet has been retweeted
you have a new @reply
you’ve been added to a list
you’ve been removed from a list

Web hooks work as an instant notification system, and would be a great addition to the Twitter API. If you’re going to Chirp, maybe you could ask them why Twitter isn’t web hooks enabled yet?!

Embracing change

2010-03-16T07:13:00+00:00

This is a transcript of a conversation between PETE and JOHN. They work in the same company. They live in a small town, and they happen to be neighbors, and software engineers too.

PETE: Hey John, did you hear about this new Green bottle coffee place?

JOHN: No, never heard of it. Where is it?

PETE: It’s on Market St.

JOHN: Ah! I never drive on Market St. I always take Main St. to go to work. Been doing that for the last 5 years.

PETE: Not sure I get it, are you saying that, in the last 5 years, you’ve always taken Main St? Never tried an alternative to go to work?

JOHN: Yup, Main St. every single day, why would I change? Main St. isn’t Main Street for no reason! Hey, did you know that if you leave before 8:06am, you can avoid the school bus coming from 4th St.? You save around 2 minutes! See, that’s one of the reasons why I always stay on Main St. I know everything about it.

PETE: No, I didn’t know that. But now, I understand why you haven’t heard about this new Green bottle coffee place, since it’s on Market.

JOHN: Whatever, had they been doing serious business, they would have opened their store on Main St. Like every other serious coffee place in this city.

PETE: Hmmm, well, it’s a whole different concept, so it kinda makes sense for them to be on Market St, they can accommodate many more customers there, they have room to grow. So I guess you didn’t know that at this coffee place, the first small coffee is free and if you want a bigger one, you just pay for the difference in quantity between the small and the one you picked. Which means that, if you only drink one small cup of coffee every day, you don’t pay a cent. It’s free.

JOHN: Free?! Really? Man, that’s awesome. They really should have opened a store on Main St. Gotta find a way to have this free coffee on Main St.

PETE: But you know, Main St isn’t th…

JOHN: (interrupting Pete) Do they deliver? Because, I was just thinking. There’s kind of a parking spot on Main St. It’s not really a parking spot, but if I try hard enough, I’m sure I can park my big SUV there. Yeah, pretty sure I can do it. I could park there and wait for the delivery.

PETE: Hmmm, no they don’t. To be able to offer free coffee, they need to be close to their coffee crops. If you want free coffee, you have to go there.

JOHN: I don’t really feel like going to Market St. every morning. I need to find a way to have this coffee on Main St.

PETE: There’s something I don’t get. You want free coffee, but you still want to drive on Main St, am I right?

JOHN: Yes! I’m surprised they haven’t thought about this issue before opening their store on Market St. They probably have no idea what a real business is anyways.

PETE: Well, they’re the biggest coffee company in the world, so, I think they do. You, and other people, will just need to change some of your old habits to have some free or cheap coffee. That’s all. Plus, since it’s on Market St, there’s no traffic there. You should be able to make it to work around the same time every morning. But it does require you to stop driving on Main St.

JOHN: But I’ve been driving on Main St for 5 years now. I know every single inch of that street. If my car breaks down, I know where to look for help. So why would I want to change this?

PETE: Free coffee, remember?

JOHN: Yes! I want free coffee. I need to find a way to have this free coffee on Main St. There must be a way.

PETE: Alright, but don’t forget to get some sleep. Coming up with a shady plot to have free coffee and to continue driving on Main St is not worth sleepless nights. Just embrace change and drive on Market St. Or give up on free coffee. You realize that you can’t have your cake and eat it too, right?

JOHN: (ramblings) there must be a way. Maybe if I park, ride the subway for a couple stops and then rent a bike, I should be able to stay on Main St… (more ramblings)

Firetweets : the making of.

2010-02-23T19:10:00+00:00

Is there a Twitter aux. service that takes any search term like “node.js” and creates list of hottest & recent links tweeted for that term?

This tweet is where the idea for Firetweets originates. 20 minutes (and a dinner) later, the first version was online. While the look of the site hasn’t changed much, the code that powers it went through several iterations. I thought it’d be of some interest to share, not the actual code, but rather the thinking behind it. It’s no rocket science, but identifying shortcomings and being able to solve them is a favorite activity of mine :)

1st try

The first iteration was extremely simple. It consisted of making a call to the Twitter API asking for all tweets containing a link and our query (node.js).

For each result:

extract link from text
query datastore for an object with an id equal to the link we’ve extracted
if there is a match, increment its count by 1 and save
if there isn’t, create a Link object, set its id to the URL found, set count at 1 and save
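The steps above can be sketched in a few lines of Python. This is not the original Firetweets code: a plain dict stands in for the datastore and the Link objects, the regular expression is a deliberately crude link matcher, and the sample tweets are made up.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

# A plain dict stands in for the datastore: link -> count.
link_counts = {}


def record_links(tweet_texts, store):
    """Extract the first link of each tweet and bump its counter in the store."""
    for text in tweet_texts:
        match = URL_PATTERN.search(text)
        if match is None:
            continue
        link = match.group(0)
        # Existing link: increment. New link: start at 1.
        store[link] = store.get(link, 0) + 1
    return store


tweets = [
    "node.js is fun http://bit.ly/abc",
    "reading about node.js http://bit.ly/abc",
    "node.js docs http://example.com/docs?view=print",
]
record_links(tweets, link_counts)
print(link_counts["http://bit.ly/abc"])  # 2
```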

Even though it worked as expected, this is a rather naive approach. Here are a few reasons why:

URLs are shortened by different services, which means that two different URLs could link to the same resource. Duplicates would ruin the whole rank by popularity aspect.
tracking parameters or preferences in URLs (?utm=twitter&campaign… or ?view=print) generate duplicates
both previous points combined
short URLs aren’t easy to remember

2nd try

Learning from the shortcomings of the first iteration, some things needed to change:

short URLs must be resolved to avoid duplicates
expanded URLs must be stripped of tracking and other polluting parameters.
links need a context: title or tweet from which they have been extracted

My first thought was to use the APIs that the URL shortener services provide. Even though a handful of them dominate the market, we’re dealing with a much larger number of services, from bit.ly to ff.im, tinyurl…, and not all of them have an API. Those that do, unfortunately, didn’t think it’d be a good idea to use a standard format. Atom or RSS, anyone?

The good news is that, despite the lack of a standard json/xml stream, those services aren’t really disparate: they all do the same thing, which is to provide a short link that redirects to the final URL. And this is perfect for our use case: let’s just follow the URL, which will ultimately redirect us to the page we’re looking for.

Since our app is running on top of Google App Engine, we can use the urlfetch module that is bundled with the SDK. The result of fetching a URL with urlfetch has a final_url property, which is exactly what we’re looking for: when the request has been redirected, the value of final_url is the actual URL whose request returned this response. Sweet! While we’re at it, let’s grab the content of the page and, with some very basic regexp, extract its title. We’re almost there: we just need to clean the URL by removing tracking and other parameters, and we’ll have our unique identifier.
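The cleaning and title-extraction steps can be sketched with the Python standard library alone. This is an illustration, not the Firetweets code: it leaves out the urlfetch call itself, it uses the Python 3 urllib.parse module rather than what the App Engine runtime of the time shipped, and the list of parameters to drop is an assumption based on the examples mentioned earlier (utm… and view).

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed noise parameters; adjust to whatever pollutes your own URLs.
NOISE_PARAMS = {"view"}
NOISE_PREFIXES = ("utm",)

TITLE_PATTERN = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def clean_url(url):
    """Drop noise query parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NOISE_PARAMS and not key.startswith(NOISE_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def extract_title(html):
    """Very basic title extraction; returns None when no <title> is found."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None


print(clean_url("http://example.com/post?utm_source=twitter&id=7&view=print#top"))
# http://example.com/post?id=7
print(extract_title("<html><head><title> Hello </title></head></html>"))
# Hello
```

Two tweets pointing at the same page through different shorteners and tracking parameters then end up with the same identifier, which is what the ranking needs.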

To keep things simple, when fetching the data from the Twitter API, we’re only asking for the latest 50 results that contain both a link and our query. Even though we keep track of the last fetched tweet_id for subsequent calls, in the worst-case scenario we will have 50 short links to expand.

Since we want to regularly retrieve new links from the Twitter API, we’ll use scheduled tasks, also called crons. On App Engine, crons invoke URLs, which means that the steps mentioned above happen within the lifespan of a request. That’s far from ideal; it’s even a terrible idea. Why, you ask? Because we’re dealing with third parties here, up to 50 of them. If just one of them is terribly slow or down, the request could time out before we’ve finished processing all our input. And this, we don’t want.

But that’s easy, you say: let’s make asynchronous requests, that’ll be much faster, and we won’t time out! Or not. Since we’ll process requests in parallel, we’ll definitely get a speed boost in indexing all our URLs. But we’re still bound to the slowest request, which means we’ve no guarantee that this single call won’t make the cron request time out. Damn! We need to find a way to move these operations out of the initial request processing and execute them in the background. Fear not, my friend: the App Engine team has released a task queue API, which makes offline processing ridiculously easy. I’ve opted for the deferred library. Here’s an example:

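from google.appengine.ext import deferred
for link in links:
    # Queue one background task per link, each starting after a 30-second delay.
    deferred.defer(tools.get_title, link, tag, _countdown=30)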

It can hardly be simpler. We’ll let App Engine take care of executing our tasks in the background. Our Links will automagically appear in the Datastore once the final URL and the title of the page have been retrieved, and the count incremented.

3rd try (in progress)

You’ve probably noticed that keeping track of hot links by just incrementing a counter of an object won’t get us very far. We’re not logging enough information to be able to extract stats, patterns and such. That’s what I’m working on right now, when not procrastinating on Twitter / Delicious ;)

Conclusion

It’s only a matter of minutes to have a prototype running on App Engine. But there’s a gap between a prototype and a useful and perennial service, and it sure helps to think it through first. Unless you love data migration of course ;)

References:

Firetweets
Background work with the deferred library by Nick Johnson
The URL Fetch Python API
Twitter Search API
Bit.ly API
Tweet from which the service was born by John Wright

IDEA 2009 Awards

2009-12-06T11:01:00+00:00

Design is not just what it looks like and feels like. Design is how it works – Steve Jobs

INTERNATIONAL DESIGN EXCELLENCE AWARD WINNERS 2009 on BusinessWeek

My personal favorites are below: