At the end of 2010 we decided that our development efforts were too fragmented and we needed to focus. We had dozens of websites, each of which either needed a lot of work or was not really future-proof. We decided to select three of them, one of which was Observu. There were two main reasons: first, we really needed it ourselves at that time; second, we wanted to appeal to other developers, as that is what we do best. Our other projects show the same pattern: FlexLists mostly appeals to developers and people in education, and even our consumer-oriented website picturepush.com appeals more to a techie, professional crowd than to anyone else.
We set out with the following key ideas:
- We want to collect all kinds of data, especially combining availability, server and application data
- We really wanted notifications by phone
- It should fit well with the cloud, so it should not rely on manually configuring each server
- Receive data at a fine-grained (every minute) resolution
By April 2011, after a few months of full-time development, we already had a product that helped us a great deal by monitoring our own websites (more than 20 at that time).
We then started setting up the basics of the infrastructure: load balancing, automated deployment, efficient storage of the time series data, etc. As big sufferers of not-invented-here syndrome, we did almost everything ourselves, including designing the website and the logo.
In September 2011 we ran out of funds to continue development. We really believed in the product, so we decided to sell most of our other websites. Beyond that, we were lucky to find a client where we could apply much of the knowledge we had gained while building Observu, as well as use Observu itself in practice. We advised them on performance improvements, automated deployment, auto-scaling, a redundant database setup and proper load testing.
This was a nice opportunity, but it did slow our development down at first. We did, however, learn a lot about features we really needed but had never considered. For example, auto-scaling a server pool produces a lot of short-lived servers, and thus monitors that simply stop receiving data.
By June 2012 we felt development wasn't progressing as it should: consulting and other projects got in the way again. We decided to invest a bit more of our consulting revenue and hired a developer on Odesk. We were lucky enough to find a young but very bright guy who made a lot of progress, especially on our reporting and data explorer. We continued this until September; unfortunately our funds were limited and the developer had to go back to university, which further limited his availability. Development came down to us again, but our workload was already heavy with customer projects. Finishing those last few features had to be done on weekends, when there were no projects to coordinate.
Of course some ‘last fixes’ had bigger implications than I anticipated, but we've finally reached a point where we feel confident that we have a product that is really useful for a lot of admins and developers. Inevitably, many features we really want have been left for the future, and we do feel some anxiety about competitors that popped up while we were developing. However, we could not postpone the launch any longer and even skipped payment integration just to get your feedback as soon as possible.
We got quite a few signups from the mailing list we had built, but very little actual feedback or feature requests came our way. In the meantime we had to pay our bills, so we worked on mobile application development. That work was taking up all of our time, and as a result we were not getting the most out of our trial users at all. This led to a big go/no-go moment. In July 2013 we decided to take the plunge one more time and to bring in someone to help us with marketing and business development. This paid off in many ways: we quickly learned a lot more about our users and started turning trials into paying subscribers.
For the long term, we believe we can leverage our open architecture to monitor practically anything, and use machine learning techniques to automatically discover trends and outliers, making big strides in prioritising information and excluding false positives. We want to apply this not just to infrastructure and availability but to everything measurable in operating an online business.
Some more detailed aspects we feel we need to focus on as soon as possible:
- Support for more real-time data (every few seconds), following the current trend
- Full page load measurements and error checking (already in testing)
- Support for monitoring high-volume log files (e.g. access logs)
- Log file search and filtering
- Create low-overhead (async) ways of sending data to Observu
- Create proper support for rich exception logging that is easy to browse and includes metadata, along with client libraries for all popular platforms
- Import of CloudWatch metrics
- Aggregated reporting (e.g. combine error logs from all servers in a cluster into a single view)
- An app with push notifications
Next time I’ll write more about what we did the last few months to turn our beta into a serious subscription business.