> One of my clients' website traffic is composed of over 75% bot traffic. Server-side logs are unusable for anything other than site performance.
I'm unclear how broad you intend that second sentence to be, but there's still a ton of info you can glean just from server-side logs (rough parsing sketch after the list):
- Referrer info and, by extension, popular search terms being used to find your site
- Paths on your site causing 5xx errors (so pages which might be triggering an error in a server-side script)
- Paths on your site causing 4xx errors, and associated referrers (might be broken links on your own site; might be stale search engine indexing)
- Mobile vs desktop access statistics
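For instance, here's a rough sketch of pulling most of the above out of a standard combined-format access log in Python. The filename and the "Mobile" UA heuristic are assumptions; adjust the regex if your log format differs:

```python
import re
from collections import Counter

# Matches the common Apache/nginx "combined" log format (an assumption;
# adapt this if your server logs something else).
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

referrers, errors_5xx, errors_4xx = Counter(), Counter(), Counter()
mobile = desktop = 0

with open("access.log") as f:  # hypothetical filename
    for line in f:
        m = LINE.match(line)
        if not m:
            continue
        status = int(m["status"])
        if status >= 500:
            errors_5xx[m["path"]] += 1
        elif status >= 400:
            # keep the referrer so you can tell broken internal links
            # from stale external links / search-engine indexes
            errors_4xx[(m["path"], m["referrer"])] += 1
        if m["referrer"] not in ("-", ""):
            referrers[m["referrer"]] += 1
        # crude heuristic: most mobile browsers include "Mobile" in the UA
        if "Mobile" in m["ua"]:
            mobile += 1
        else:
            desktop += 1

print("Top referrers:", referrers.most_common(10))
print("Top 5xx paths:", errors_5xx.most_common(10))
print("Top 4xx paths + referrers:", errors_4xx.most_common(10))
print(f"Mobile: {mobile}  Desktop: {desktop}")
```

A real log analyzer does all of this for you, of course; the point is just that the data is sitting right there in the default log format.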
Finding this data among a bunch of bot-induced noise might be annoying, but if they're good bots sending proper UA headers declaring their botness, they're easy enough to filter out (sketch below). Failing that, there's typical bot-like behavior you can detect and account for, such as not sending a referrer header or probing for known exploitable PHP scripts, in which case you should block that IP address for a few hours or days. There are programs that can do this sort of thing automatically, but frustratingly I can't recall the name of one off the top of my head right now.
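And a quick, self-contained way to split out the self-identifying bots from a combined-format log. The marker list here is illustrative, not exhaustive:

```python
# Well-behaved crawlers declare themselves in the UA string; most of the
# big ones (Googlebot, Bingbot, YandexBot, ...) include "bot" somewhere.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def looks_like_bot(log_line: str) -> bool:
    # in the combined format, the user agent is the last quoted field
    parts = log_line.rsplit('"', 2)
    if len(parts) < 3:
        return False  # malformed line; keep it for manual inspection
    user_agent = parts[-2].lower()
    return any(marker in user_agent for marker in BOT_MARKERS)

# write the presumed-human traffic to a separate file for analysis
with open("access.log") as src, open("humans.log", "w") as dst:
    for line in src:
        if not looks_like_bot(line):
            dst.write(line)
```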
Granted, a lot of this can be spoofed, but I'm pretty sure the number of people sending spoofed referral or UA headers is dwarfed by the number of those (like me) who block Google Analytics and similar cruft entirely.
No, you're right, I shouldn't have written something so dismissive. (I do count error tracing as part of "performance", for what it's worth, but errors have their own system within the app itself.)
Frankly, I would love to see some serious low-config solutions for analyzing server-side logs. Especially for Fastly: the client in question uses Fastly, and it blew my mind to find out that there was nothing in place to answer simple questions such as "which paths are slowest to respond?", "which paths are a cache hit most often?", "which paths are hit most overall?", etc., ideally sliced along dimensions such as browser, bot traffic, country of origin, and so on. If you have suggestions…
Any log analyzer will tell you which paths are hit most. For slowest paths, you'd want the server to log how long each request took from request in to last byte out; nginx can do this with $request_time in a custom log_format, and Apache's mod_log_config has %D (microseconds), though you'd then need to tell your analyzer how to interpret the extra field. For cache hits, it depends on what sort of cache you have in mind: an application-level cache you could only effectively log from the application itself, but a caching proxy or CDN knows its own hit/miss state (nginx exposes $upstream_cache_status, and I believe Fastly can include the hit/miss state in its streamed logs).
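As a minimal sketch of the timing aggregation, assuming a custom log format whose last field is $request_time in seconds (the filename and field positions are assumptions):

```python
from collections import defaultdict

# Assumes a custom nginx log_format ending in $request_time (seconds), e.g.
#   log_format timed '$remote_addr [$time_local] "$request" $status $request_time';
totals = defaultdict(lambda: [0, 0.0])  # path -> [hits, total_seconds]

with open("timed.log") as f:  # hypothetical filename
    for line in f:
        try:
            # the request line is the first quoted field: "GET /path HTTP/1.1"
            path = line.split('"')[1].split()[1]
            duration = float(line.split()[-1])
        except (IndexError, ValueError):
            continue  # skip malformed lines
        entry = totals[path]
        entry[0] += 1
        entry[1] += duration

# rank paths by average response time
ranked = sorted(totals.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
for path, (hits, total) in ranked[:10]:
    print(f"{total / hits:8.3f}s avg  {hits:6d} hits  {path}")
```

A median or p95 would be more robust than the average for spotting genuinely slow paths, but that's the idea.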