It’s kind of amusing to see this back and forth about the open web and people suddenly being concerned about whether challenges to Google’s position of dominance are good for the “open web”. I do think it’s very possible that Google is entering a phase similar to that of Microsoft circa the late 90s but, if so, we won’t really know for sure for another 10 years, and a lot can change. However, I do think we’re seeing the decline of the search-driven, page-oriented web as it has existed for the last decade. The reason is pretty simple and I’ll try to summarize it as best as I can, although I’m a long-winded kind of guy, so bear with me.
I’m happy to announce that Apigee is acquiring Usergrid, the startup I’ve worked on for the last 18 months since leaving Six Apart. I’m very excited to be joining Apigee and taking Usergrid to the next level with their help. This has been a very cool project based on amazing technology for a market that’s still in its earliest stages of growth. Thanks to all my friends who helped me get it this far.
Here are a few links:
I’ve been working on this for some time now and we’ve finally released the source code. Usergrid is a comprehensive platform stack for mobile and rich client applications. The entire codebase is now available on GitHub at https://github.com/usergrid/stack. There’s a full blog post up on the Usergrid Blog.
One thing that’s pretty interesting is that although this was initially envisioned as purely a cloud platform-as-a-service, a lot of the users I talked to were very interested in self-hosting. So, we decided to adopt the WordPress model by making the source completely open and then following that with a cloud version. One difference is that, unlike WordPress, which kept the source for WordPress.com and WordPress.org separate, we’re releasing the entire multi-tenant architecture. This means that if this takes off, anyone can run their own private grid. This is the sort of thing we used to debate at Six Apart, but we have the luxury of starting with a clean slate here. It will be interesting to play the role of WordPress this time against the folks that are trying to offer similar functionality purely as closed-source, hosted-only options.
Key to making this possible was the development of a double-clickable app that fires up the complete stack, including an embedded Cassandra installation, right on your desktop. This means that anyone can get started right away playing with Usergrid. This is kind of an old-school approach, but then again, everything related to mobile is fundamentally about rethinking and in many ways turning back the clock on the relation between the client and the server. For us, though, it means we don’t have to raise money to start getting developer traction. And frankly, that’s a good thing.
Here’s a presentation that explains what Usergrid is all about:
I’m going to be speaking on indexing in Cassandra at the upcoming Cassandra Summit 2011. It’ll cover some of the material from my previous blog posts on the subject with some new examples, and should be interesting. I’ve been a big fan of Cassandra but it provides a much lower level data model than most people are used to with conventional databases. It compensates for this by being much more scalable than any of the other NoSQL databases. However, it pushes a lot of the more advanced data modelling up to the application layer, in particular building your own relationship models and the queries against those. Hopefully I can shed some light on how to do that.
I often talk to people who are grappling with the question of how to get their products, particularly cloud services, adopted by developers. If you ask people to name companies that are really good at getting developers to use their products, you typically hear companies like Facebook, Google, or Apple listed. These companies, as successful as they are, don’t really provide a lot of useful hints on how to do this, because, in truth, it’s not that they’re good at getting developers, it’s that they’re “not not good” at it. What I mean by that is that the reason people are interested in learning the API of Facebook or figuring out how to develop for iPhone has everything to do with the market that can be tapped into by creating products for those platforms. So, people would have made the effort even if those were the most difficult platforms to learn, and, in fact, at the beginning, they were none too easy, although the situation has changed considerably over time. When we started Widgetbox back at the end of 2005, the idea was that we’d provide a platform for developers to build and deliver widgets and that we’d also create a destination site, essentially an “app store”, for users to find widgets.
We didn’t initially succeed as well as we’d hoped in being a consumer destination, but we did manage to get an impressive number of widgets built on the platform, literally thousands of them in the first six months. We didn’t have any market clout to make this happen. Although we’d hoped to have partnerships with social networks to aid in distribution of widgets built using our service, these didn’t really kick in until much later.
So, what accounted for the early developer traction at Widgetbox?
There’s been a lot of activity in the PaaS space lately. This is largely fueled by the successful exit of Heroku, and since a lot of investment tends to be made by looking in the rear-view mirror, a lot of people are turning their eyes to infrastructure and platforms. These sorts of businesses haven’t been as much in favor lately, and there are more than a few startup “experts” who have been very critical of platforms and infrastructure. While it’s true that the business models for these have taken some time to adapt to the era of open source and the cloud, it’s silly to ignore just how much long term value and ROI has come from these types of companies. Oracle now owns MySQL. There are a lot of ways to look at that, but I tend to view it as both companies having won, and won big (as did both platforms, and both closed source and open source; this stuff is nowhere near as mutually exclusive as the pundits would have you believe).
Java’s UUID class compares UUIDs using signed comparisons, which gives the opposite ordering from what you might expect for some values and is incompatible with the ordering used in other languages. If you’re writing an application that compares and sorts UUIDs, you should use an alternate UUID library for the comparison function or roll your own.
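To make that concrete, here’s a minimal sketch of what “roll your own” could look like. The class name UuidCompare and the helper compareUnsigned are my own hypothetical names, not part of any library, and the sketch assumes Java 8 or later for Long.compareUnsigned. It treats each UUID as an unsigned 128-bit value, which is only one possible ordering and not necessarily the one another language or library uses.

```java
import java.util.UUID;

public class UuidCompare {

    // Hypothetical helper: compares two UUIDs as unsigned 128-bit values,
    // most significant 64 bits first, then the least significant 64 bits.
    static int compareUnsigned(UUID a, UUID b) {
        int c = Long.compareUnsigned(a.getMostSignificantBits(),
                                     b.getMostSignificantBits());
        if (c != 0) {
            return c;
        }
        return Long.compareUnsigned(a.getLeastSignificantBits(),
                                    b.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        // "high" has its top bit set, so its most significant long is
        // negative when read as a signed value.
        UUID low = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID high = UUID.fromString("80000000-0000-0000-0000-000000000000");

        // Built-in comparison: with signed longs, "low" sorts AFTER "high".
        System.out.println("UUID.compareTo:  " + low.compareTo(high));

        // Unsigned comparison: "low" sorts BEFORE "high", matching the
        // order of the hex strings.
        System.out.println("compareUnsigned: " + compareUnsigned(low, high));
    }
}
```

To sort a list with it, pass the helper as a comparator, for example list.sort(UuidCompare::compareUnsigned).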
I’m writing this up because there’s always quite a bit of discussion on both the Cassandra and Hector mailing lists about indexes and the best ways to use them. I’d written a previous post about Secondary indexes in Cassandra last July, but there are a few more options and considerations today. I’m going to do a quick run through of the different approaches for doing indexes in Cassandra so that you can more easily navigate these and determine what’s the best approach for your application.
The Primary Index
Most conversations about indexing in Cassandra are about secondary indexes. This raises the question, what is the primary index? Your primary index is the index of your row keys. There isn’t a central master index of all the keys in the database; each node in the cluster maintains an index of the rows it contains. This is what the Partitioner in Cassandra manages, as it decides where in a cluster of nodes to store your row. Because of this, the index typically only enables basic looking up of rows by key, much like a hashtable. The discussions that break out about the OrderPreservingPartitioner versus the RandomPartitioner are really about how literally the primary index behaves like a hashtable versus an ordered map (i.e. something you could do a “select * from foo order by id” against). This is because, in the case of the RandomPartitioner, you can’t easily traverse your set of row keys in meaningful ways since the sorted order of those keys is assigned by Cassandra based on a hashing algorithm. The OrderPreservingPartitioner, as the name implies, orders the keys in string-sort order, so you can not only look up a row via a specific key, but can also traverse your set of keys in ways that are directly related to the values you are using as your keys. In other words, if your row key was a “lastname,firstname,ss#” string, you could iterate through your keys in alphabetical order by lastname. Generally, though, people try to use the RandomPartitioner because, in exchange for the convenience of the OrderPreservingPartitioner, you lose the even distribution of your data across the set of nodes in your overall system, which impedes the scalability of Cassandra. For more understanding of this, I’d recommend reading Cassandra: RandomPartitioner vs OrderPreservingPartitioner.
By definition, any way of finding your row other than by its row key makes use of a secondary index. Cassandra uses the term “secondary index” to refer to the specific built-in functionality that was added in version 0.7 for specifying columns for Cassandra to index upon, so we’re going to use the broader term “alternate index” to refer to both Cassandra’s native secondary indexes and other techniques for creating indexes in Cassandra.
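If it helps to see the hashtable-versus-ordered-map distinction in code, here’s a rough analogy in plain Java. This is not Cassandra code: PartitionerSketch, md5Token, orderPreserving, and hashed are hypothetical names I’ve made up for illustration, and using the MD5 digest of the key as its position is only a stand-in for a hash-based partitioner.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class PartitionerSketch {

    // Hypothetical stand-in for a hash-based token: the MD5 digest of the
    // key, read as a non-negative integer.
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Keys positioned by their own string value: traversal order is
    // meaningful (alphabetical), like an ordered map.
    static List<String> orderPreserving(List<String> keys) {
        TreeMap<String, String> ring = new TreeMap<>();
        for (String k : keys) {
            ring.put(k, k);
        }
        return new ArrayList<>(ring.values());
    }

    // Keys positioned by their hash: lookups by exact key still work, but
    // traversal order has nothing to do with the key values.
    static List<String> hashed(List<String> keys) {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (String k : keys) {
            ring.put(md5Token(k), k);
        }
        return new ArrayList<>(ring.values());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "smith,john,123", "adams,jane,456", "jones,bob,789");
        System.out.println("order-preserving: " + orderPreserving(keys));
        System.out.println("hashed:           " + hashed(keys));
    }
}
```

The first listing comes back alphabetical by lastname. The second comes back in whatever order the hashes happen to fall, which is the price of spreading keys evenly.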
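As a rough mental model of an alternate index, here’s a sketch using plain Java maps rather than Cassandra itself. Everything in it (AlternateIndexSketch, put, findByLastName, the “lastname” column) is a hypothetical name chosen for illustration. The point is only that a lookup by anything other than the row key needs a second structure that maps values back to row keys, and that something has to keep that structure in sync on every write.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AlternateIndexSketch {

    // "Primary index": row key -> (column name -> value).
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // "Alternate index" on the lastname column: value -> row keys.
    private final Map<String, Set<String>> byLastName = new HashMap<>();

    // Writes a row and keeps the alternate index in sync with it.
    void put(String rowKey, String lastName) {
        Map<String, String> old = rows.get(rowKey);
        if (old != null) {
            // Remove the stale index entry left by the previous value.
            Set<String> staleKeys = byLastName.get(old.get("lastname"));
            if (staleKeys != null) {
                staleKeys.remove(rowKey);
            }
        }
        Map<String, String> columns = new HashMap<>();
        columns.put("lastname", lastName);
        rows.put(rowKey, columns);
        byLastName.computeIfAbsent(lastName, v -> new TreeSet<>()).add(rowKey);
    }

    // Finds row keys by column value instead of by row key.
    Set<String> findByLastName(String lastName) {
        return byLastName.getOrDefault(lastName, Collections.emptySet());
    }

    public static void main(String[] args) {
        AlternateIndexSketch store = new AlternateIndexSketch();
        store.put("u1", "Smith");
        store.put("u2", "Smith");
        store.put("u3", "Jones");
        System.out.println(store.findByLastName("Smith"));
        store.put("u1", "Jones"); // u1 moves from Smith to Jones in the index
        System.out.println(store.findByLastName("Smith"));
    }
}
```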
Done with a clear cache and VPN’d into a couple of different locations, but hardly an exhaustive test:
A is for Amazon, B is for Bank of America, C is for Craigslist, D is for DMV, E is for eBay, F is for Facebook, G is for GMail, H is for Hotmail, I is for Ikea, J is for Jet Blue, K is for Kaiser, L is for Lowes, M is for Mapquest, N is for Netflix, O is for Outside Lands, P is for Pandora, Q is for Quotes, R is for REI, S is for Skype, T is for Target, U is for USPS, W is for Weather, X is for XBox, Y is for Yahoo, and Z is for Zillow.
Some of these appear to be location dependent: VPN’ing into a server in Los Angeles gives me KTLA instead of Kaiser, Lakers instead of Lowes, Myspace instead of Mapquest, and OC Fair instead of Outside Lands.
It seems like most of my friends and colleagues have heard I’ve been using Cassandra in my current project, and they forward on to me every blog post or tweet where someone has something negative to say about the Apache NoSQL open source database. To be clear, it’s debatable whether you should use Cassandra in production today, although at the recent Cassandra Summit, it was clear a lot of people were, and were having success with it, and of course, as we all know from the blog headlines, a few people are not. But, I also think that it’s probably useful to keep in mind how many of the basic building blocks of the web started on very shaky ground and went through many iterations before getting to where they are today. I’m not exactly talking about the Gartner Hype Curve, because it’s very hard to apply and very hard to actually determine where you are on that curve until years after the fact. Having ridden that curve on a number of web technologies, I can say it’s not as simple as drawing a sloppy sideways S-curve and saying “here we are”. The reality is that there are a number of little peaks and valleys inside the overall process.
This joke never gets old
It all comes down to packaging and distribution, of course. App stores never go away, even if they ultimately become, under the hood, a way to sell password-protected, pay-to-open, web bookmarks. And that’s not a bad thing, because for all the headaches of dealing with opaque approval processes and such, at least they’ve figured out how people get paid.
Lots of people are getting into the weeds of this Oracle/Google/Java spat, but it really is little more than a thinly veiled shakedown gambit. When I look at it as the latest in a string of well publicized disputes between virtually every single major platform owner today and the developers trying to build on those platforms, as well as the major conflicts between potentially competitive platforms, I’m more concerned with the fact that we’ve recently moved into a new era of aggressiveness and heavy handed behavior by platform owners that we haven’t seen since the early 90s. I used to suspect that many of the companies that were the most vocal in decrying Microsoft’s dominance back in the day would have behaved no differently than Microsoft if they’d had the ability to do so. Now, when I take a look at the way that every single platform owner of any significance is behaving, I realize that I was wrong: most of them would have behaved far worse.
Note: I’m not using the term “platform” in the way that every company with an API puffs up their chest and tries to claim, but to mean that the company and its technology have a meaningful ecosystem with a large base of third party vendors, partners, developers, and other participants, all of whom are earning a living (or at least trying to) on top of it. Platforms are ultimately markets, not technologies.