Data Infrastructure Services, Part 4: Monitoring

tl;dr - We use New Relic, and we have ELK but aren't relying on it much.

Despite having been to several meetups at the New Relic offices, I didn't really understand what New Relic does until I took this job. It's a suite of tools fed by small agent programs that run on the servers alongside your Ruby app(s) and report how those apps are performing. All that data turns into lots and lots of graphs, from a weekly email summarizing how each of your apps performed and when it was busiest, to things like this:

[Screenshot: New Relic overview]

I feel like there's a lot you can DO with the New Relic data, but we basically use it for two things: investigating a page that seems slow, and receiving alerts on performance issues.

[Screenshot: New Relic events]

All of our apps have timeout and error-rate thresholds that trigger alerts. Because New Relic is always receiving data from our servers, it notices when a threshold is crossed and pages someone, posts an alert in Slack, or sends smoke signals, as appropriate.
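New Relic does this evaluation on its side, so none of the following is code we run. Purely to illustrate what a timeout threshold and an error-rate threshold mean, here is a minimal sketch that checks one window of requests. The Thresholds values and the check_window function are hypothetical, not New Relic's API and not our actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values for illustration only.
    max_error_rate: float = 0.05   # alert if more than 5% of requests error
    timeout_ms: float = 30_000.0   # a request slower than this counts as a timeout
    max_timeouts: int = 3          # alert if more than this many requests time out


def check_window(requests, t=Thresholds()):
    """requests: list of (duration_ms, is_error) tuples for one time window.

    Returns a list of alert messages; an empty list means the window is healthy.
    """
    if not requests:
        return []
    errors = sum(1 for _, is_error in requests if is_error)
    timeouts = sum(1 for duration, _ in requests if duration > t.timeout_ms)
    error_rate = errors / len(requests)

    alerts = []
    if error_rate > t.max_error_rate:
        alerts.append(f"error rate {error_rate:.1%} exceeds {t.max_error_rate:.0%}")
    if timeouts > t.max_timeouts:
        alerts.append(f"{timeouts} timeouts exceed limit of {t.max_timeouts}")
    return alerts
```

For example, a window with 90 fast successes and 10 errors returns ["error rate 10.0% exceeds 5%"]. A real alerting service also debounces and waits for a condition to persist before paging anyone, which this sketch leaves out.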

When we do have a slowdown for whatever reason, the New Relic alerts usually reach us about as quickly as the support emails do, and they're more likely to be noticed in the middle of the night.


We also have the ELK stack collecting data for most of our apps. (ELK: Elasticsearch, Logstash, Kibana. Logstash sits on your servers and sends things to Elasticsearch, Elasticsearch indexes them, and Kibana queries Elasticsearch and makes graphs and stuff.)
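To make "sends things to Elasticsearch" a little more concrete: one common way documents get into Elasticsearch is its _bulk API, which takes newline-delimited JSON (an action line, then the document, repeated, with a trailing newline). The sketch below builds such a body in Python. It is not how Logstash is implemented internally, and the index name logstash-2016.03.01 is a hypothetical example following Logstash's default daily logstash-YYYY.MM.DD naming. You would POST the result to http://<your-host>:9200/_bulk with an ndjson content type (application/x-ndjson on recent versions).

```python
import json


def to_bulk_body(index, docs):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each document is preceded by an action line telling Elasticsearch which
    index to put it in. The whole body must end with a newline.
    Note: older Elasticsearch versions also expect a "_type" in the action line.
    """
    if not docs:
        return ""  # an empty bulk request is not valid, so send nothing
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical log line, shaped roughly like what a log shipper might send.
body = to_bulk_body("logstash-2016.03.01", [{"message": "GET /home 200"}])
```

Once documents are indexed this way, Kibana is just another client running searches against the same index.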

Unlike the web event logging, the ELK stack data is near real-time, allowing us to look into weirdness and debug a little quicker. We haven't been relying on it much, and I have barely used it at all. It's definitely been a nice-to-have investigative tool.

I played with it a bit this week, so while it's on my mind, here's what I've learned:

  • you can't download search results from Kibana
  • you CAN download graph data from Kibana
  • you can query the Elasticsearch instance directly over HTTP (curl works fine) and get JSON back
  • you can POST a JSON query body to the _search endpoint, or do simple queries in the URI with the q= parameter
  • and if you're me, you then do terrible things to that JSON in a Python notebook
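The two query styles in that list can be sketched with nothing but the Python standard library. This is a minimal sketch, not our actual notebook code: the host http://localhost:9200, the logstash-* index pattern, the status field, and the helper names are all hypothetical assumptions, so swap in your own. hits_to_rows does the "terrible things to JSON" step by flattening each hit's _source into a plain dict you could hand to a DataFrame.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical; point this at your own instance


def uri_search_url(index, query, size=10, base=ES_URL):
    """Simple query in the URI, e.g. q=status:500 (Lucene query string syntax)."""
    params = urllib.parse.urlencode({"q": query, "size": size})
    return f"{base}/{index}/_search?{params}"


def body_search_request(index, body, base=ES_URL):
    """Build a POST request carrying a JSON query body to the _search endpoint."""
    return urllib.request.Request(
        f"{base}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def hits_to_rows(response):
    """Flatten a search response into one plain dict per hit, keeping the _id."""
    return [dict(hit["_source"], _id=hit["_id"]) for hit in response["hits"]["hits"]]


# Usage (needs a running Elasticsearch, so it is left commented out):
# with urllib.request.urlopen(uri_search_url("logstash-*", "status:500")) as resp:
#     rows = hits_to_rows(json.load(resp))
# req = body_search_request("logstash-*", {"query": {"match": {"status": 500}}})
# with urllib.request.urlopen(req) as resp:
#     rows = hits_to_rows(json.load(resp))
```

The URI style is handy for quick one-off checks. The JSON body style scales better once queries need filters, aggregations, or anything you'd rather not escape into a URL.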

