Development Update – 7

It has been a while since our last update. In this time, we’ve been working closely with our first customers to determine and implement various essential features. We’ve also applied our experience and research on hosting in the cloud to their projects. This led to a major milestone last week: observu.com now hosts the latest beta and a more descriptive website. The beta is still private, but if you join our mailing list, we can let you in soon.

In terms of development, we’ve made a lot of progress on organizing the API and mobile website code so that they share 100% of the codebase with the main website (proper MVC, with the View being the only difference). We are big fans of Redis, which we’ve used extensively for various queues and rate-limiting solutions that would be challenging to get right otherwise.
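To give an idea of what Redis-backed rate limiting can look like, here is a minimal Python sketch of a fixed-window limiter built on the INCR and EXPIRE commands. It is an illustration under assumptions, not Observu's actual code: the fixed-window approach, the `allow_request` function, the `ratelimit:` key prefix and the `InMemoryCounterStore` stand-in (used so the example runs without a server) are all made up for this example.

```python
import time


class InMemoryCounterStore:
    """Hypothetical stand-in for a Redis client; supports only incr() and expire()."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._values = {}     # key -> current count
        self._deadlines = {}  # key -> time at which the key expires

    def _purge(self, key):
        # Drop the key if its expiry time has passed, as Redis would.
        deadline = self._deadlines.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(key, None)
            self._deadlines.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, seconds):
        if key in self._values:
            self._deadlines[key] = self._clock() + seconds


def allow_request(store, client_id, limit, window_seconds):
    """Fixed-window rate limit: at most `limit` calls per window per client."""
    key = "ratelimit:" + client_id
    count = store.incr(key)
    if count == 1:
        # First hit in this window: start the countdown for the whole window.
        store.expire(key, window_seconds)
    return count <= limit
```

With a real server, a redis-py client could be passed as `store`, since it offers `incr` and `expire` calls with the same shape. Note that this simple version issues the two commands separately, so a production version would want to make them atomic.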

What we are working on now is usability and extended reporting. Of course we also have a heap of features in mind, but we would love to have your feedback first, to know what really matters.

Posted in Progress Report | Leave a comment

Development Update – 6

As Observu is all about improving uptime and removing bottlenecks, we strongly believe that we can’t make do with an ad-hoc infrastructure ourselves. Because the moment you need Observu most is often during an emergency, we care a great deal about being able to recover from outages quickly.

We’ve selected Amazon for hosting because it is flexible, is available in multiple parts of the world and has excellent network quality. However, earlier this year it was shown multiple times that no datacenter has 100% uptime and that when a failure occurs, it is big. We are therefore currently working on our architecture so that we can recover quickly from such events. The details warrant a separate post.

Our obsession with reliability also touched another area of development: our initial implementation of SMS notifications proved unreliable, so we switched to Nexmo as our partner. It provides us with actual delivery confirmations, allowing us to monitor delivery.

Our Growing Development Stack

I personally always like to know what people are using to create their product, so here is a list of almost everything we use:

  • Ubuntu on Amazon EC2
  • MySQL
  • Redis
  • PHP
  • Perl
  • jQuery
  • RaphaelJS
  • boto
  • Fabric
  • chef-solo

In the area of 3rd party services we rely on:

  • Amazon AWS
  • Nexmo
  • Tropo
  • SendGrid
  • GitHub
  • UserVoice

These services allow us to focus on the things that really matter: gaining insight into all parts of your deployment and staying on top of the events that occur. To improve that insight, we are currently working with our first customers to implement monitoring as part of their stack. A great example is the need to monitor log files centrally as soon as multiple servers handle your front-end.

Posted in Progress Report | Leave a comment

Development Update – 5

An important milestone has been met: we’ve implemented all core features that make up our monitoring system. The last important hurdle was the completion of a flexible, yet easy to understand, system to create and configure event rules.

Of course it is far from the system we imagine. Nevertheless, we feel confident that what we have now is a very useful product.

In addition to this technical progress, I can also present the new observu.com logo:

[Image: Observu.com logo]

Current efforts are focused on documentation and workflow, to make sure the first users get the experience they deserve. Furthermore, the API is being finalized and the production environment is being designed.

We are very excited about the coming weeks when the first testers will finally enter the system.

Posted in Progress Report | Leave a comment

Development Update – 4

We are getting closer each day and are almost ready for the first batch of testers. Our attention is shifting towards the user interface, to create the best experience possible.

An important part of that user interface is the graphs, which are starting to look better and better. An example is this stacked CPU usage graph:

[Image: stacked CPU usage graph]

This also shows another major point of progress: we now have a fairly easy-to-install daemon script that collects this data on Linux servers. Currently it collects load, CPU, memory, disk and network statistics. We are planning to collect a whole lot more soon.
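The daemon itself is not shown here. As a rough illustration of the kind of collection involved, the following Python sketch parses two of the Linux /proc files such a script could read for load and memory figures. It is only a sketch: the function names and the keys of the returned dictionaries are made up for this example and say nothing about how the real agent is built.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into the three load averages."""
    fields = text.split()
    return {
        "load_1m": float(fields[0]),
        "load_5m": float(fields[1]),
        "load_15m": float(fields[2]),
    }


def parse_meminfo(text):
    """Parse /proc/meminfo into a dict of integer values (unit suffix dropped)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def collect():
    """Read the current values from /proc (Linux only)."""
    with open("/proc/loadavg") as f:
        stats = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    stats["mem_total_kb"] = mem["MemTotal"]
    stats["mem_free_kb"] = mem["MemFree"]
    return stats
```

Keeping the parsing separate from the file reads makes each piece easy to check against a captured sample, which helps when the same script has to run on many different servers.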

The final part of this month’s effort is a mobile website. It provides a simple and clear way to check your site and server status on the go. For example, when you receive a text notification, you can instantly check what is actually going on.

We feel confident that we can give the first testers access next month and are really looking forward to their feedback. To make sure you can be one of those testers, sign up for our mailing list!

Posted in Progress Report | Leave a comment

Development Update – 3

[Image: Observu dashboard]

It has been a while since I last updated you about our progress. We are still continuing our work on various reports, and an important part of this is creating informative graphs. Although we liked Open Flash Chart, it felt a bit sluggish, so we decided to go for a solution based on the Raphael library: an SVG abstraction with a fallback for IE. It comes with a limited graphing library, g.raphael, but that was not mature enough for our needs. Another Raphael-based charting library is Grafico, which can display a few great graphs. However, we chose to create our own, mostly because Grafico depends on Prototype, which we do not use, and because we would need to extend it for our own graph types. Although the library is well coded, we did not feel confident about customizing it to our needs. We will open-source our own library as soon as it is in a usable state.

At the same time, we’ve started work on a very basic mobile website, which allows you to check your status on the go. We hope to slowly add more functionality while keeping things really quick and simple.

Another major part of development involves the collection agent. We’ve chosen to use Perl to maximize portability and reduce dependencies. Additional advantages include easy customization and the ability to verify that it does not contain harmful code. The next challenge in this area is to create an install script that works across distributions.

On the front-end, we’ve introduced a new splash page for observu.com. It contains the first iteration of our new logo, which still needs a bit of work. We also had a very nice mascot designed, but we will keep that a secret for now.

If you visit the new splash page, you will also notice that we’ve selected UserVoice for feedback and support. I’ll write a separate blog post about the selection process soon.

We are also talking to potential users about their monitoring needs. If you feel you could contribute by telling us about your problems, please feel free to contact me.

Posted in Progress Report | Leave a comment

Amazon RDS vs DIY MySQL on EC2 Benchmark

As I was researching online whether Amazon RDS was a viable option, I had a hard time finding reliable benchmarks. The authors of this good book on EC2 mention that it is a bit faster, but without further clarification. The best benchmark I could find was this one. It uses the sysbench tool to test an EC2 instance vs RDS, which is exactly what I need. It provides the tools for benchmarking and points out the difference between running 1 and 10 threads. However, for me this benchmark was missing some vital information, so I decided to run my own benchmark using sysbench in a very similar way, with the following adjustments:

  • I’ve used a much bigger dataset: I’ve set it to use 50 million objects, in order to create a 12GB database that will surely not fit in the 1.7GB of memory.
  • I’ve made explicit some parameters that were left unspecified, such as instance disk vs EBS and the MySQL configuration.

I’ve used the following setups:

  • A small EC2 instance in the US East region, with Debian squeeze and a standard MySQL install. The database is set up on a separate EBS volume. (Named “Mysql on EBS (standard)” in the results.)
  • The same instance with MySQL tuned to more reasonable values: key_buffer=512M, query_cache=128MB. (Named “Mysql on EBS (optimized)” in the results.)
  • A small RDS instance, set up in the same region

Single Client Thread

First, I repeated the single thread experiment. In this case the instance is not fully utilized. The results are shown below:

System                    Transactions/sec  Read/Write ops/sec  Other ops/sec  Min (ms)  Avg (ms)  Max (ms)  95th perc. (ms)
Mysql on EBS (standard)   18                334                 35             4.4       56.9      1186.5    149.1
Mysql on EBS (optimized)  52                991                 104            0.0       19.2      728.6     84.4
RDS                       23.2              440.6               46.4           11.1      43.1      691.4     90.0

In this experiment the difference between a standard MySQL install and the optimized one is huge. RDS performs comparably to a standard MySQL install, which seems reasonable.

50 Threads

Now, in real development we don’t care about the difference between fast and faster. If your website is growing, what matters much more is that performance does not deteriorate when things get tougher. Therefore I tried to stretch the database much further by using 50 client threads. This is much closer to the real world, with multiple Apache processes constantly hitting the database, especially when you have multiple front-end servers connecting to a single database instance. Again, the results are shown below:

System                    Transactions/sec  Read/Write ops/sec  Other ops/sec  Min (ms)  Avg (ms)  Max (ms)  95th perc. (ms)
Mysql on EBS (standard)   38                724                 76             30.2      1310.7    4662.8    2179.0
Mysql on EBS (optimized)  46                871                 92             27.55     1089.4    3031.43   1853.76
RDS                       111               2110                222            13.47     450.0     1557.4    807.3

First, the difference between a standard install and the optimized version has been greatly reduced. The most notable result is that RDS performs so much better: 111 transactions per second against 46 for the optimized install. This confirms the results of the original benchmark, but now under conditions that matter to me. Maybe even more important than the difference in query throughput is that RDS does a much better job of keeping request times within reasonable bounds: 95% of requests return within 807ms, compared to 1854ms for the optimized MySQL on the EC2 instance.
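For readers less familiar with percentile figures, the following Python sketch shows one common way to compute a 95th percentile from raw response times, the nearest-rank method. This is an assumption for illustration only: sysbench may compute its percentile differently, and the `percentile` function and the sample values are invented for this example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it.

    `pct` must be an integer between 0 and 100.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Invented response times in ms: mostly fast, with one very slow request.
response_times_ms = [10] * 18 + [20, 1000]
p95 = percentile(response_times_ms, 95)
worst = percentile(response_times_ms, 100)
```

In these invented samples the single slow request pulls the average up to 60ms, while the 95th percentile stays at 20ms. That is why the percentile column says more about what most requests experience than the average or maximum alone.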

My conclusion is that although RDS may not perform as well as what you can do yourself under ideal conditions, as soon as you move to realistic loads, RDS can be pushed much further. Of course this should also be possible with DIY optimizations, since RDS is after all running MySQL. However, I’m sure that would take a significant amount of time, and it would not outweigh the other benefits of RDS: easier backups and much less management.

Update, November 3rd, 2011: Further benchmarking has shown me that it is actually quite easy to bring the throughput of your own instance running MySQL much closer to RDS by increasing innodb_buffer_pool_size. My lack of experience with InnoDB clearly biased the benchmark above. I do, however, still notice the difference in response times: RDS is much more stable.

Notes:

  1. I’ve also benchmarked thread numbers in between, but there was no interesting pattern. Results at 4 threads and up are largely similar to the 50-thread ones for RDS, while for MySQL the times gradually get worse as the number of threads grows.
  2. I’ve also run an experiment with MySQL on the instance disk instead of EBS, but it wasn’t better and it removes all benefits of using EBS, so those results are not included.
  3. For more reliable results, this should probably be repeated at different points in time with multiple instances.

Posted in Benchmarks | 3 Comments

Where does it hurt?

We want to create the best monitoring service around, so we would like to know: where does it hurt?

We’ve got a few pains of our own that we are currently focusing on:

  • Too much information: with a lot of servers, there is always something going on. At some point we were receiving so many notifications that we missed the critical ones. To counter this, we’ve created a summary dashboard and smart notifications.
  • Servers in serious trouble can’t send e-mails. We solve this by providing a central monitoring dashboard that receives updates directly from the individual servers, with the added benefit of being able to alert you when no information is coming in.
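Alerting on silence can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a description of how Observu implements it: the `find_silent_servers` function, the server names and the 120-second threshold are all hypothetical.

```python
def find_silent_servers(last_seen, now, max_silence_seconds):
    """Return the names of servers whose last update is older than allowed.

    last_seen maps a server name to the Unix timestamp of its latest update.
    """
    return sorted(
        name
        for name, seen_at in last_seen.items()
        if now - seen_at > max_silence_seconds
    )


# Hypothetical state of the central dashboard at time 1000.
last_seen = {"web1": 1000, "web2": 880, "db1": 700}
silent = find_silent_servers(last_seen, now=1000, max_silence_seconds=120)
```

Because the check runs on the central side, it keeps working when the monitored server itself is too broken to send anything.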

We would love to hear what you feel is important for Observu to provide. You can post your suggestions to our feedback forum.

Posted in Uncategorized | Leave a comment

Reviewing Pivotal Tracker

To plan and track Observu features we are using Pivotal Tracker. If you recognize the situation of having a feature list where half of the tasks have the highest priority, you will probably like the idea behind it: the tasks are in a completely ordered list. This enables you to think and communicate clearly about what really needs to happen first. You can’t just move one thing forward without moving other stuff backwards: decisions need to be made.

Other critical ideas are an icebox with unscheduled features, the need to estimate a task before you can start on it, and planning based on actual output in past iterations.

By keeping features in the ‘icebox’ you can already write down your ideas without them clouding your view of what is on your current critical path. That’s hard to do in other systems: do you give the idea a low priority? But what if it is a really important aspect, just not right now?

Because Pivotal Tracker lays out your list of tasks over time based on your previous progress, it is immediately clear how you are doing. At first, it may seem daunting that your milestones are much farther away than you initially guessed, but it makes it obvious that there are only two ways to change that: actually speeding up or reducing the tasks for that milestone. It all makes you take a much more realistic look at your project. This is especially true because programmers tend to estimate tasks by the time they take under ideal conditions, without taking into account that such conditions do not come in abundance.

For example, coding up one piece may have taken you four hours and the whole project is 10 times as big, so you should be finished in a week, right? This thinking tends to neglect the time you’ve spent thinking about the task and the fact that not every day is as productive. As Pivotal Tracker expresses things as points done per iteration instead of hours spent, the estimates become much more realistic.
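The idea of planning from points per iteration can be made concrete with a small Python sketch. It is a simplified illustration and not Pivotal Tracker's actual algorithm: averaging the last three iterations, the `iterations_remaining` function and the example numbers are all assumptions.

```python
import math


def iterations_remaining(completed_points, backlog_points, window=3):
    """Estimate how many iterations the backlog needs at the recent velocity.

    completed_points lists the points finished in each past iteration,
    oldest first. Velocity is the average over the last `window` iterations.
    """
    recent = completed_points[-window:]
    if not recent:
        raise ValueError("need at least one completed iteration")
    velocity = sum(recent) / len(recent)
    if velocity <= 0:
        raise ValueError("velocity must be positive to make a projection")
    return math.ceil(backlog_points / velocity)


# Hypothetical history: 8, 12 and 10 points done; 45 points still planned.
estimate = iterations_remaining([8, 12, 10], backlog_points=45)
```

The projection only changes when the measured velocity goes up or the backlog for the milestone shrinks, which are exactly the two options mentioned above.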

As with any service, things can’t be perfect. First of all, I miss the opportunity to add extra documentation. Although you can write a description and add attachments, the space is fairly limited. I would love to be able to add an extra stage to the workflow: adding specification/documentation. Then again, maybe I should just use the icebox for tasks without a proper specification and only schedule them once they have one.

Another thing is that having a completely ordered list makes you feel obliged to start on the first task, but if that one is particularly hard or daunting it may hurt productivity. This is purely in my mind, because there is no problem at all with starting on other tasks.

Even after a few months of use I am still in doubt whether I miss having a hierarchy. The feeling that I need one probably has to do with the fact that I sometimes tend to create tasks that are too big and keep dangling in the ‘current’ box. Working a full day and then not being able to cross anything off is bad for motivation. So I do often feel like I need to split up bigger tasks, but then, why not just reduce the scope a bit and add a secondary task for the rest of the feature? Doing so will probably be better from a planning point of view as well, because it makes much more explicit that the task was actually bigger than anticipated and that you need to allot some extra time to it.

From a more practical point of view, the whole thing clearly shows that it is intended for teams. As I am mostly the only dedicated developer on this project, I feel that it works better if you increase the length of iterations to as much as four weeks.

Taking everything into account, the idea of having a single ordered list of features instead of just a bunch of features with priorities is amazing. It gives you so much more information and forces you to make decisions on what really needs to happen and what needs to happen first.

Posted in Reviews | 2 Comments

Development Update – 2

It’s been a month since our last update. In that time we’ve added three critical features:

  • multi-location monitoring: track availability and performance from multiple locations
  • a ping monitor, to ping your servers from multiple locations
  • (international) SMS and phone notifications: nothing grabs your attention like our service calling you when a server has been unavailable for some time

Furthermore, we’ve worked a lot on our internals: creating hourly and daily reporting data from the raw measurements. Work continues on our server collection agent and we are reading up quickly on mobile development as we don’t think we can do without a mobile client. Other than that, we are aiming at creating amazing report pages for both current and historic data.
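Turning raw measurements into hourly reporting data can be sketched as follows in Python. This is only an illustration of the general technique: the `hourly_rollup` function, the choice of min/max/avg/count as summary values and the sample data are assumptions, not the actual reporting code.

```python
from collections import defaultdict


def hourly_rollup(measurements):
    """Group (unix_timestamp, value) pairs per hour and summarize each hour.

    Returns a dict mapping the timestamp of the start of each hour to its
    min, max, avg and count.
    """
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        hour_start = timestamp - timestamp % 3600
        buckets[hour_start].append(value)
    return {
        hour: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for hour, values in sorted(buckets.items())
    }


# Hypothetical response times in ms, measured at 0s, 1800s and 3600s.
raw = [(0, 10.0), (1800, 20.0), (3600, 30.0)]
report = hourly_rollup(raw)
```

Daily data can be built the same way with a bucket size of 86400 seconds, ideally from the raw values rather than from the hourly averages so that the daily average stays exact.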

For those of you more interested in the development side of things: we’ve moved the development server over to EC2, which will also be its home as soon as we go live. The micro instances work great as development servers. They are actually a lot quicker than the name implies. Furthermore, we’ve added Redis to our stack to store and update current system state efficiently.

We are planning to move our code from Subversion to git (GitHub). Not only because everyone seems to be making that move, but also to rely less on a single server to hold our most precious assets and to benefit from the excellent features GitHub offers.

After two months of usage we are very happy with Pivotal Tracker to hold our features, bugs and chores. Having everything ordered is a great advantage in focusing on what needs to happen next. (Full review coming up)

If all of this makes you curious, please sign up for our mailing list to make sure you will get access as soon as possible.

Posted in Progress Report | Leave a comment

Development Update

We are making steady progress on the next release of Observu. Our main focus so far has been the monitoring of website availability and the creation of a dashboard that instantly tells you whether or not everything is OK and what happened in the last 24 hours.

For me the dashboard is permanently open in a tab, just like Gmail, and I spot outages before the notification mail is in my inbox. The biggest advantage of a dashboard over e-mail notifications? I don’t have to check whether the problem has already been resolved.

Now that the basics are in place, we are working on the server agent, which will collect your server vitals, and on more advanced reporting and notification options. Meanwhile, we are working out ways to make notification rules flexible, easy to understand and quick to set up.

Is there something you really miss in your current monitoring setup? Please let us know and we may be able to fit it in.

Posted in Progress Report | Leave a comment