Development Update – 6

Because Observu is all about improving uptime and removing bottlenecks, we strongly believe that we can’t make do with an ad-hoc infrastructure ourselves. You often need Observu most during an emergency, so we care a great deal about being able to recover from outages quickly.

We’ve selected Amazon for hosting because it is flexible, is available in multiple parts of the world and has excellent network quality. However, events earlier this year have shown multiple times that no datacenter has 100% uptime and that when failure occurs, it is big. We are therefore reworking our architecture so that we can quickly overcome such events. The actual details warrant a separate post.

Our obsession with reliability also touched another area of development: our initial implementation of SMS notifications proved unreliable, so we switched to Nexmo as our partner. It provides us with actual delivery confirmations, allowing us to monitor delivery.
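To illustrate what monitoring delivery can look like, here is a minimal Python sketch. It is not our production code and does not use the Nexmo API: the class name, method names and the 60-second timeout are all hypothetical. It only shows the idea of remembering each sent message until its confirmation arrives and flagging the ones that stay unconfirmed for too long.

```python
import time


class DeliveryTracker:
    """Remembers sent messages until a delivery confirmation arrives.

    Hypothetical helper for illustration; names and timeout are assumptions.
    """

    def __init__(self, timeout_seconds=60.0):
        self.timeout_seconds = timeout_seconds
        self.pending = {}  # message id -> time the message was sent

    def record_sent(self, message_id, sent_at=None):
        # Store the send time so we can later tell how long we have waited.
        self.pending[message_id] = time.time() if sent_at is None else sent_at

    def record_confirmation(self, message_id):
        # Returns True if the confirmation matched a message we were waiting for.
        return self.pending.pop(message_id, None) is not None

    def overdue(self, now=None):
        # Messages that have waited longer than the timeout without confirmation.
        now = time.time() if now is None else now
        return sorted(
            message_id
            for message_id, sent_at in self.pending.items()
            if now - sent_at > self.timeout_seconds
        )
```

Anything returned by `overdue()` could then be treated like any other monitored event, for example by retrying the notification through another channel.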

Our Growing Development Stack

I personally always like to know what people use to create their product, so here is a listing of almost everything we use:

  • Ubuntu on Amazon EC2
  • MySQL
  • Redis
  • PHP
  • Perl
  • jQuery
  • RaphaelJS
  • boto
  • Fabric
  • chef-solo

In the area of 3rd party services we rely on:

  • Amazon AWS
  • Nexmo
  • Tropo
  • SendGrid
  • GitHub
  • UserVoice

These services allow us to focus on the things that really matter: gaining insight into all parts of your deployment and staying on top of the events that occur. To improve that insight, we are currently working with our first customers to implement monitoring as part of their stack. A good example is the need to monitor log files centrally as soon as multiple servers handle your front-end.

Posted in Progress Report | Comments Off on Development Update – 6

Development Update – 5

An important milestone has been met: we’ve implemented all core features that make up our monitoring system. The last important hurdle was the completion of a flexible, yet easy to understand, system to create and configure event rules.

Of course it is still far from the system we envision. Nevertheless, we feel confident that what we have now is a very useful product.

In addition to this technical progress, I can also present the new observu.com logo:

[Image: Observu.com logo]

Current efforts are focused on documentation and workflow, to make sure the first users get the experience they deserve. Furthermore, the API is being finalized and the production environment is being architected.

We are very excited about the coming weeks when the first testers will finally enter the system.


Development Update – 4

We are getting closer each day and are almost ready for the first batch of testers. Our attention is shifting more towards the user interface, to create the best experience possible.

An important part of that user interface is the graphs, which are starting to look better and better. One example is this stacked CPU usage graph:

[Image: stacked CPU usage graph]

This also shows another major point of progress: we now have a fairly easy-to-install daemon script that collects this data on Linux servers. Currently it collects load, CPU, memory, disk and network statistics. We plan to collect a whole lot more soon.
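To give an idea of where such statistics come from on Linux, here is a minimal Python sketch. It is not our actual daemon; all function names are made up for this example. It assumes the usual `/proc` files are present and only takes a single snapshot, whereas a stacked CPU graph needs the difference between two consecutive samples.

```python
import os
import shutil


def parse_cpu_times(text):
    """Cumulative CPU counters from the aggregate 'cpu' line of /proc/stat."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            names = ("user", "nice", "system", "idle", "iowait")
            return dict(zip(names, map(int, fields[1:6])))
    return {}


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: numeric value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 9:
            # Field 0 is received bytes, field 8 is transmitted bytes.
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_text(path):
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def collect_sample(disk_path="/"):
    """Take one snapshot of load, CPU, memory, disk and network statistics."""
    disk = shutil.disk_usage(disk_path)
    return {
        "load": os.getloadavg(),
        "cpu": parse_cpu_times(read_text("/proc/stat")),
        "memory": parse_meminfo(read_text("/proc/meminfo")),
        "disk": {"total": disk.total, "used": disk.used, "free": disk.free},
        "network": parse_net_dev(read_text("/proc/net/dev")),
    }
```

Keeping the parsers separate from the file reads makes it easy to test them with sample text on a machine that has no `/proc` at all.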

The final part of this month’s effort is a mobile website. It provides a simple and clear way to check your site and server status on-the-go. For example, when you receive a text notification, you can instantly check what is actually going on.

We feel confident that we can allow the first testers access next month and are really looking forward to their feedback. To make sure you can be one of those testers, sign up for our mailing list!


Development Update – 3

[Image: Observu Dashboard]

It has been a while since I last updated you about our progress. We are still continuing our work on various reports. An important part of this is creating informative graphs. Although we liked Open Flash Chart, it felt a bit sluggish, so we decided to go for a solution based on the Raphael library: an SVG abstraction with a fallback for IE. It comes with a limited graphing library, g.raphael, but that was not mature enough for our needs.

Another Raphael-based charting library is Grafico, which can display a few great graphs. However, we chose to create our own, mostly because Grafico depends on Prototype, which we do not use, and because we would need to extend it for our own graph types. Although the library is well coded, we did not feel confident about customizing it to our needs. We will open-source our own library as soon as it is in a usable state.

At the same time, we’ve started to work on a very basic mobile website, which allows you to check your status on-the-go. We hope to slowly add more functionality and at the same time keep things really quick and simple.

Another major part of development involves the collection agent. We’ve chosen Perl to maximize portability and reduce dependencies. Additional advantages include easy customization and the ability to verify that it does not contain harmful code. The next challenge in this area is to create an install script that works across distributions.

On the front-end we’ve introduced a new splash page for observu.com. It contains the first iteration of our new logo, which still needs a bit of work. We also had a very nice mascot designed, but we will keep that a secret for now.

If you visit the new splash page, you will also notice that we’ve selected UserVoice for feedback and support. I’ll write a separate blog post about the selection process soon.

We are also talking to potential users about their monitoring needs. If you feel you could contribute by telling us about your problems, please feel free to contact me.


Amazon RDS vs DIY MySQL on EC2 Benchmark

As I was researching online whether Amazon RDS was a viable option, I had a hard time finding reliable benchmarks. The authors of this good book on EC2 mention that it is a bit faster, but without further clarification. The best benchmark I could find was this one. It uses the sysbench tool to test an EC2 instance against RDS, which is exactly what I need. It provides the tools for benchmarking and points out the difference between running 1 and 10 threads. However, for me this benchmark was missing some vital information, so I decided to run my own benchmark using sysbench in a very similar way, with the following adjustments:

  • I’ve used a much bigger dataset: I’ve set it to use 50 million objects, in order to create a 12GB database that will surely not fit in the 1.7GB of memory.
  • I’ve made explicit some parameters that were unspecified in the original benchmark, such as instance disk vs EBS and the MySQL configuration.

I’ve used the following setups:

  • A small EC2 instance in the US East region, with Debian squeeze and a standard MySQL install. The database is set up on a separate EBS volume. (Named “Mysql on EBS (standard)” in the results.)
  • The same instance with MySQL tuned to more reasonable values: key_buffer=512M, query_cache=128MB. (Named “Mysql on EBS (optimized)” in the results.)
  • A small RDS instance, set up in the same region.

Single Client Thread

First, I repeated the single thread experiment. In this case the instance is not fully utilized. The results are shown below:

                            Operations/sec                     Times (ms)
System                      Transactions  Read/Write  Other    min     avg.    max.     95th perc.
Mysql on EBS (standard)     18            334         35       4.4     56.9    1186.5   149.1
Mysql on EBS (optimized)    52            991         104      0.0     19.2    728.6    84.4
RDS                         23.2          440.6       46.4     11.1    43.1    691.4    90.0

In this experiment the difference between a standard MySQL install and the optimized one is huge. RDS comes in close to the standard MySQL install (slightly ahead of it), which seems reasonable.
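As a quick sanity check on the single-thread numbers, the Python sketch below compares the reported transactions per second with what the average time per transaction implies. It assumes each client thread issues its next transaction as soon as the previous one returns, which is my understanding of how sysbench behaves here; the function name is just for illustration.

```python
# (setup, reported transactions/sec, average time in ms) from the single-thread results
SINGLE_THREAD = [
    ("Mysql on EBS (standard)", 18.0, 56.9),
    ("Mysql on EBS (optimized)", 52.0, 19.2),
    ("RDS", 23.2, 43.1),
]


def expected_throughput(threads, avg_ms):
    """Transactions/sec if every thread issues transactions back to back."""
    return threads * 1000.0 / avg_ms


for name, reported, avg_ms in SINGLE_THREAD:
    estimate = expected_throughput(1, avg_ms)
    print(f"{name}: reported {reported:.1f}/s, estimated {estimate:.1f}/s")
```

The estimates land within a few percent of the reported throughput, so the throughput and timing columns are consistent with each other.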

50 Threads

Now, in real development we don’t care about the difference between fast and faster. If your website is growing, what matters much more is that performance does not deteriorate when things get tougher. Therefore I tried to stretch the database much further by using 50 client threads. This is much closer to the real world, with multiple Apache processes constantly hitting the database, especially when you have multiple front-end servers connecting to a single database instance. Again the results are shown below:

                            Operations/sec                     Times (ms)
System                      Transactions  Read/Write  Other    min     avg.     max.      95th perc.
Mysql on EBS (standard)     38            724         76       30.2    1310.7   4662.8    2179.0
Mysql on EBS (optimized)    46            871         92       27.55   1089.4   3031.43   1853.76
RDS                         111           2110        222      13.47   450.0    1557.4    807.3

First, the difference between a standard install and the optimized version has been greatly reduced. The most notable result is that RDS performs so much better. This confirms the results of the original benchmark, but now under conditions that matter to me. Maybe even more important than the difference in query throughput is that RDS does a much better job of keeping request times within reasonable bounds: 95% of requests return within 807ms, compared to 1854ms for the optimized MySQL on the EC2 instance.

My conclusion is that although RDS may not perform as well as a setup you tune yourself under ideal conditions, as soon as you are going for realistic loads, RDS can be pushed much further. Of course this should also be possible with DIY optimizations, as RDS is after all running MySQL. However, I’m sure that would take a significant amount of time, and the gain would not outweigh the other benefits of RDS: easier backups and much less management.

November 3rd, 2011: Further benchmarking has shown me that it is actually quite easy to bring the throughput of your own instance running MySQL much closer to RDS, by increasing the innodb_buffer_pool_size. My lack of experience with InnoDB clearly biased the benchmark above. I do however still notice the difference in response times: RDS is much more stable.

Notes:

  1: I’ve also benchmarked thread counts in between, but there was no interesting pattern. Results for 4 threads and up are largely similar to the 50-thread results for RDS, while for MySQL the times gradually get worse as the number of threads grows.
  2: I’ve also done an experiment running MySQL on the instance disk instead of EBS, but it wasn’t better and it removes all benefits of using EBS, so those results are not included.
  3: For more reliable results this should probably be repeated at different points in time with multiple instances.
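To put numbers on that comparison, the short Python sketch below computes the ratios between two setups from the 50-thread results. The dictionary layout and the `compare` helper are only an illustration; the values are copied from the results above.

```python
# Transactions/sec and 95th percentile time (ms) from the 50-thread results
RESULTS_50_THREADS = {
    "Mysql on EBS (standard)": {"tps": 38, "p95_ms": 2179.0},
    "Mysql on EBS (optimized)": {"tps": 46, "p95_ms": 1853.76},
    "RDS": {"tps": 111, "p95_ms": 807.3},
}


def compare(baseline, candidate, results=RESULTS_50_THREADS):
    """Return how the candidate setup relates to the baseline setup."""
    base, cand = results[baseline], results[candidate]
    return {
        "throughput_ratio": cand["tps"] / base["tps"],
        "p95_ratio": cand["p95_ms"] / base["p95_ms"],
    }


summary = compare("Mysql on EBS (optimized)", "RDS")
print(
    f"throughput: {summary['throughput_ratio']:.2f}x, "
    f"95th percentile: {summary['p95_ratio']:.2f}x"
)
```

Against the optimized install, RDS handles roughly 2.4 times as many transactions per second while its 95th percentile time is less than half.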

Posted in Benchmarks | 7 Comments