One of the challenges of building a strong product management team is that most people don’t really know what a product manager is supposed to do. Typically, the responsibilities ascribed to the product manager are actually product marketing or program management. In some organizations, these might need to be part of the product manager job description, and they can be important aspects of the role.
In my experience, the most critical responsibility of product management is the application of the Pareto Principle to the influx of requirements that flood the product planning process. The Pareto Principle, also known as the 80/20 rule or the identification of the vital few and trivial many, is a simple heuristic that we tend to apply every day except when it comes to figuring out what to build and what not to build. What happens if this isn’t done correctly? Lots of very bad things…
It’s kind of amusing to see this back and forth about the open web and people suddenly being concerned about whether challenges to Google’s position of dominance are good for the “open web”. I do think it’s very possible that Google is entering a phase similar to that of Microsoft circa the late 90’s but, if so, we won’t really know for sure for another 10 years, and a lot can change. However, I do think we’re seeing the decline of the search-driven page-oriented web as it has existed for the last decade. The reason is pretty simple and I’ll try to summarize it as best as I can, although I’m a long-winded kind of guy, so bear with me.
I’m happy to announce that Apigee is acquiring Usergrid, the startup I’ve worked on for the last 18 months since leaving Six Apart. I’m very excited to be joining Apigee and taking Usergrid to the next level with their help. This has been a very cool project based on amazing technology for a market that’s still in its earliest stages of growth. Thanks to all my friends who helped me get it this far.
Here are a few links:
I’ve been working on this for some time now and we’ve finally released the source code. Usergrid is a comprehensive platform stack for mobile and rich client applications. The entire codebase is now available on GitHub at https://github.com/usergrid/stack. There’s a full blog post up on the Usergrid Blog.
One thing that’s pretty interesting is that although this was initially envisioned as purely a cloud platform-as-a-service, a lot of the users I talked to were very interested in self-hosting. So, we decided to adopt the WordPress model by making the source completely open and then following that with a cloud version. One difference is that, unlike WordPress, which kept the source for WordPress.com and WordPress.org separate, we’re releasing the entire multi-tenant architecture. This means that if this takes off, anyone can run their own private grid. This is the sort of thing we used to debate at Six Apart, but we have the luxury of starting with a clean slate here. It will be interesting to play the role of WordPress this time against the folks that are trying to offer similar functionality purely as closed-source, hosted-only options.
Key to making this possible was the development of a double-clickable app that fires up the complete stack, including an embedded Cassandra installation, right on your desktop. This means that anyone can get started right away playing with Usergrid. This is kind of an old-school approach, but then again, everything related to mobile is fundamentally about rethinking and in many ways turning back the clock on the relation between the client and the server. For us, though, it means we don’t have to raise money to start getting developer traction. And frankly, that’s a good thing.
Here’s a presentation that explains what Usergrid is all about:
I’m going to be speaking on indexing in Cassandra at the upcoming Cassandra Summit 2011. It’ll cover some of the material from my previous blog posts on the subject with some new examples, and should be interesting. I’ve been a big fan of Cassandra but it provides a much lower level data model than most people are used to with conventional databases. It compensates for this by being much more scalable than any of the other NoSQL databases. However, it pushes a lot of the more advanced data modelling up to the application layer, in particular building your own relationship models and the queries against those. Hopefully I can shed some light on how to do that.
I often talk to people who are grappling with the question of how to get their products, particularly cloud services, adopted by developers. If you ask people to name companies that are really good at getting developers to use their products, you typically hear companies like Facebook, Google, or Apple listed. These companies, as successful as they are, don’t really provide a lot of useful hints on how to do this, because, in truth, it’s not that they’re good at getting developers, it’s that they’re “not not good” at it. What I mean by that is that the reason people are interested in learning the API of Facebook or figuring out how to develop for iPhone has everything to do with the market that can be tapped into by creating products for that market. So, people would have made the effort even if those were the most difficult platforms to learn, and, in fact, at the beginning, they were none too easy, although the situation has changed considerably over time. When we started Widgetbox back at the end of 2005, the idea was that we’d provide a platform for developers to build and deliver widgets and that we’d also create a destination site, essentially an “app store”, for users to find widgets.
We didn’t initially succeed as well as we’d hoped in being a consumer destination, but we did manage to get an impressive number of widgets built on the platform, literally thousands of them in the first six months. We didn’t have any market clout to make this happen. Although we’d hoped to have partnerships with social networks to aid in distribution of widgets built using our service, these didn’t really kick in until much later.
So, what accounted for the early developer traction at Widgetbox?
There’s been a lot of activity in the PaaS space lately. This is largely fueled by the successful exit of Heroku, and since a lot of investment tends to be made by looking in the rear-view mirror, a lot of people are turning their eyes to infrastructure and platforms. These sorts of businesses haven’t been as much in favor lately, and there are more than a few startup “experts” who have been very critical of platforms and infrastructure. While it’s true that the business models for these have taken some time to adapt to the era of open source and the cloud, it’s silly to ignore just how much long-term value and ROI has come from these types of companies. Oracle now owns MySQL. There are a lot of ways to look at that, but I tend to view it as both companies having won, and won big (as did platforms, closed source, and open source; this stuff is nowhere near as mutually exclusive as the pundits would have you believe).
Java’s UUID class compares UUIDs using signed comparisons, which can produce the opposite of the ordering you might expect and is incompatible with the ordering used in other languages. If you’re writing an application that compares and sorts UUIDs, you should use an alternate UUID library for the comparison function or roll your own.
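Here’s a minimal sketch of what rolling your own might look like. It assumes all you want is plain unsigned ordering over the 128 bits, treating each 64-bit half as unsigned, and it is not a time-aware comparison for version 1 UUIDs. The class name UnsignedUuidOrder and the UNSIGNED comparator are names I made up for illustration; only Long.compareUnsigned and the java.util.UUID accessors are standard library calls.

```java
import java.util.Comparator;
import java.util.UUID;

public class UnsignedUuidOrder {
    // Hypothetical helper: orders UUIDs by treating each 64-bit half as an
    // unsigned value, most significant half first.
    public static final Comparator<UUID> UNSIGNED = (a, b) -> {
        int high = Long.compareUnsigned(a.getMostSignificantBits(),
                                        b.getMostSignificantBits());
        if (high != 0) {
            return high;
        }
        return Long.compareUnsigned(a.getLeastSignificantBits(),
                                    b.getLeastSignificantBits());
    };

    public static void main(String[] args) {
        UUID low = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID high = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");

        // Print the sign of each comparison so the two orderings can be compared.
        System.out.println("UUID.compareTo: " + Integer.signum(low.compareTo(high)));
        System.out.println("unsigned:       " + Integer.signum(UNSIGNED.compare(low, high)));
    }
}
```

To sort a list with it, pass the comparator explicitly, for example uuids.sort(UnsignedUuidOrder.UNSIGNED), rather than relying on the natural ordering.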
I’m writing this up because there’s always quite a bit of discussion on both the Cassandra and Hector mailing lists about indexes and the best ways to use them. I’d written a previous post about Secondary indexes in Cassandra last July, but there are a few more options and considerations today. I’m going to do a quick run through of the different approaches for doing indexes in Cassandra so that you can more easily navigate these and determine what’s the best approach for your application.
The Primary Index
Most conversations about indexing in Cassandra are about secondary indexes. This raises the question, what is the primary index? Your primary index is the index of your row keys. There isn’t a central master index of all the keys in the database; each node in the cluster maintains an index of the rows it contains. This is what the Partitioner in Cassandra manages, as it decides where in a cluster of nodes to store your row. Because of this, the index typically only enables basic lookup of rows by key, much like a hashtable. The discussions that break out about the OrderPreservingPartitioner versus the RandomPartitioner are really about how literally the primary index behaves like a hashtable versus an ordered map (i.e. something you could do a “select * from foo order by id” against). This is because, in the case of the RandomPartitioner, you can’t easily traverse your set of row keys in meaningful ways since the sorted order of those keys is assigned by Cassandra based on a hashing algorithm. The OrderPreservingPartitioner, as the name implies, orders the keys in string-sort order, so you can not only look up a row via a specific key, but can also traverse your set of keys in ways that are directly related to the values you are using as your keys. In other words, if your row key was a “lastname,firstname,ss#” string, you could iterate through your keys in alphabetical order by lastname. Generally, though, people try to use the RandomPartitioner because, in exchange for the convenience of the OrderPreservingPartitioner, you lose the even distribution of your data across the set of nodes in your overall system, which impedes the scalability of Cassandra. For more understanding of this, I’d recommend reading Cassandra: RandomPartitioner vs OrderPreservingPartitioner.
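To make the hashtable versus ordered map distinction concrete, here’s a rough sketch in plain Java, with no Cassandra client involved. It assumes a hash-based partitioner can be approximated by ordering rows on an MD5 digest of the key, and an order-preserving one by ordering on the key string itself. The class PartitionerSketch and its methods are hypothetical names for illustration, not Cassandra APIs.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

public class PartitionerSketch {
    // Hypothetical stand-in for a hash-based partitioner: token = MD5(key).
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Keys in token order: the traversal order is decided by the hash,
    // so it says nothing useful about the key values.
    static List<String> hashOrder(List<String> keys) {
        TreeMap<BigInteger, String> byToken = new TreeMap<>();
        for (String key : keys) {
            byToken.put(md5Token(key), key);
        }
        return new ArrayList<>(byToken.values());
    }

    // Keys in string-sort order, as an order-preserving scheme would keep them.
    static List<String> keyOrder(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    // A range scan only makes sense when keys are kept in key order.
    static List<String> rangeScan(List<String> keys, String from, String to) {
        return new ArrayList<>(new TreeSet<>(keys).subSet(from, to));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("smith,john,3", "adams,amy,1", "jones,bob,2");
        System.out.println("hash order: " + hashOrder(keys));
        System.out.println("key order:  " + keyOrder(keys));
        System.out.println("a..k range: " + rangeScan(keys, "a", "k"));
    }
}
```

The trade-off described above shows up here as well: the hashed tokens spread keys evenly across the token space regardless of what the keys look like, while the key-ordered set clusters rows wherever the key values happen to fall.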
By definition, any way of finding your row other than by its row key makes use of a secondary index. Cassandra uses the term “secondary index” to refer to the specific built-in functionality, added in version 0.7, for specifying columns for Cassandra to index upon, so we’re going to use the broader term “alternate index” to refer to both Cassandra’s native secondary indexes and other techniques for creating indexes in Cassandra.