Data Infrastructure Services, Part 3: Other Data Sources, especially Web Events
This is an area where I feel like there's GOT to be a ton of other solutions out there, but I only know of the one we happen to be using. (1) Please fill me in on other tools that do some of this and what you like/hate about them!
We use a tool called Segment for web events: getting a record of everything people do on your site. (I remember when we'd just look at web requests, but it's all JavaScript these days, and ideally you want to see both together.) As well as giving us libraries to log anything we want to Redshift, Segment integrates with a bazillionity other services as sources and destinations of data. The only other source we're using is Intercom, our customer service platform. (We're also sending data to it, which is real cool for the people answering questions, since they get context on the account.) We also send the data to our web analytics tools. (I promise to talk more about them later.)
Segment lets us log events from the Ruby backend of our app and from the JavaScript running locally in the user's browser. They provide a nice debugger view into the real-time events they're receiving, which is key for checking whether you've correctly added logging for a new event.
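For anyone who hasn't looked at one, here's a minimal Python sketch of what a single tracked event roughly contains, modeled loosely on the fields in Segment's public "track" spec (`type`, `event`, `properties`, `userId` or `anonymousId`, `timestamp`). It only builds the dictionary; it doesn't call any Segment library, and the function name, event name, and properties below are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def make_track_event(
    event: str,
    properties: dict[str, Any],
    user_id: Optional[str] = None,
    anonymous_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build a dict shaped like a Segment-style "track" call (sketch only)."""
    # Every call needs some way to tie it to a person: a known user id,
    # or an anonymous id for visitors who haven't logged in yet.
    if user_id is None and anonymous_id is None:
        raise ValueError("track events need a userId or an anonymousId")
    if not event:
        raise ValueError("track events need a non-empty event name")
    payload: dict[str, Any] = {
        "type": "track",
        "event": event,
        "properties": properties,
        # Client-side timestamp of when the action happened, in UTC.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if user_id is not None:
        payload["userId"] = user_id
    if anonymous_id is not None:
        payload["anonymousId"] = anonymous_id
    return payload


# Hypothetical example: a backend event for a report export.
example = make_track_event("Report Exported", {"format": "csv"}, user_id="user_123")
```

When you add logging for a new event, this is roughly the shape you're looking for in the debugger: the right event name, the properties you meant to attach, and an identifier you can join back to your users.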
They also provide logging for our iOS app. To be nitpicky, it's occasionally frustrating that many of the web events I look at are in the "pages" table, but every iOS event gets its own table, each with exactly the same schema. (Write ALLLL the UNIONs.) (Maybe I need to get the devs to use a single screen_viewed event with the screen name as an attribute?)
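Since the per-event iOS tables share one schema, the UNIONs don't have to be written by hand. Here's a minimal Python sketch that generates the query; the table and column names are hypothetical, and it assumes the table names come from a trusted list (they're pasted straight into the SQL).

```python
def union_all_query(tables: list[str], columns: list[str]) -> str:
    """Stack several same-schema event tables into one query with UNION ALL."""
    if not tables:
        raise ValueError("need at least one table")
    col_list = ", ".join(columns)
    selects = [
        # Tag each row with its source table so the event type survives the union.
        f"SELECT '{table}' AS event_table, {col_list} FROM {table}"
        for table in tables
    ]
    return "\nUNION ALL\n".join(selects)


# Hypothetical iOS event tables and shared columns.
sql = union_all_query(
    ["ios.home_screen_viewed", "ios.settings_screen_viewed"],
    ["user_id", "received_at"],
)
print(sql)
```

This uses UNION ALL rather than UNION because plain UNION deduplicates rows, which costs extra work and could silently drop genuinely repeated events. The `event_table` column plays the same role a screen-name attribute on a single screen_viewed event would, and the generated SQL could be saved as a view so nobody has to rewrite it.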
Because of realistic engineering limits and Redshift load times, Segment only promises to get your data to you within 24 hours. That's totally fine for me, but maybe you need something fancier? (Can someone please tell me a really useful story about what you're doing with real-time data?)
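Even with a relaxed 24-hour window, it's worth noticing when data is later than promised. A minimal Python sketch of a freshness check, assuming you've already pulled the newest load timestamp out of the warehouse (for example, the maximum of a received-at style column; the query and column are assumptions, not something the post describes):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The delivery window Segment promises, per the paragraph above.
PROMISED_LAG = timedelta(hours=24)


def is_within_promised_lag(latest_loaded: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the newest loaded event is no older than the promised window.

    latest_loaded must be timezone-aware (UTC) so the subtraction is meaningful.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest_loaded <= PROMISED_LAG
```

Run on a schedule, a False result is the cue to check the pipeline before someone asks why yesterday's numbers look low.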
Big caveat for those who haven't fought this yet: because a lot of web event logging happens in JavaScript, you're going to have missing data. Ad blockers will keep out your handy cookies and stop the JavaScript from running. Slower computers or slower network connections mean the JavaScript may never get a chance to send the info back to you. You will lose data, and that loss will be biased toward people on mobile and people with slower computers.
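One way to size that bias (a technique I'm suggesting here, not something the post describes doing) is to use backend events as a baseline, since server-side logging doesn't depend on the browser. If the same action is logged both in the backend and in the browser JavaScript with a shared id, the fraction of backend events with no client-side match estimates the loss rate per group. The shared id and the device grouping are assumptions; you'd have to log them yourself.

```python
from collections import defaultdict


def client_loss_by_group(
    server_events: list[tuple[str, str]],
    client_event_ids: set[str],
) -> dict[str, float]:
    """Fraction of server-logged events with no matching client-side event, per group.

    server_events: (event_id, group) pairs, e.g. group = device type.
    client_event_ids: ids of the same actions as recorded by the browser JavaScript.
    """
    totals: dict[str, int] = defaultdict(int)
    missing: dict[str, int] = defaultdict(int)
    for event_id, group in server_events:
        totals[group] += 1
        if event_id not in client_event_ids:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Toy, made-up data: two desktop and two mobile actions, one mobile event lost.
rates = client_loss_by_group(
    [("a", "desktop"), ("b", "desktop"), ("c", "mobile"), ("d", "mobile")],
    {"a", "b", "c"},
)
```

If the mobile rate comes out well above desktop, that's the bias showing up, and any client-side metric split by device should be read with it in mind.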
Maybe I just have too much trauma from past work, but I'm really happy not to own this system. There's nothing I can do if data somehow gets lost, but there generally isn't much I can do when it's an internal system, either. It's just one more huge set of details to get right, and I don't want to specialize in it.
(1) I have heard of precisely one other answer for web events, but haven't yet found anyone who use(s|d) it: Snowplow.