kevindong's comments | Hacker News

You can't realistically expect every log format to get a custom schema declared for it prior to deployment.


If you never intend to monitor them systematically, absolutely!

If you're a bit serious you can at least impose a date, time to the millisecond, a pointer to the source of the log line, a level, and a message. Let's be crazy and even say the message could have a structure too, but I can feel the weight of effort on your shoulders, so let's just say you've already saved yourself the embarrassment a colleague of mine faced when he realized he couldn't give me a millisecond timestamp, rendering a latency calculation on past data impossible.
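
To make that baseline concrete, here's a minimal sketch in TypeScript (Node-style stderr output assumed; the field names are just illustrative, not any standard):

    // Minimal structured log line: millisecond timestamp, pointer to the
    // source, level, and a message plus structured fields.
    // Field names are illustrative only.
    function logLine(
      level: "debug" | "info" | "warn" | "error",
      source: string,
      message: string,
      fields: Record<string, unknown> = {},
    ): void {
      const line = {
        ts: new Date().toISOString(), // e.g. "2020-02-01T12:34:56.789Z", millisecond precision
        source,                       // e.g. "api/orders.ts:88"
        level,
        msg: message,
        ...fields,                    // structured payload instead of string interpolation
      };
      process.stderr.write(JSON.stringify(line) + "\n");
    }

    logLine("info", "api/orders.ts:88", "order_created", { order_id: "o_123", latency_ms: 42 });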


Sorry if I was ambiguous before. When I said "log format", I was referring to the message part of the log line. Standardized timestamp, line in the source code that emitted the log line, and level are the bare minimum for all logging.

Keeping the message part of the log line's format in sync with some external store is deviously difficult, particularly when the interesting parts of the log are the dynamic portions that can take on multiple shapes.


Sure, the relative difference between the fastest and slowest approach is 5x. But the absolute difference is still just 125.23 ns. For some perspective, 1 millisecond / 125.23 ns = ~7,985.

There certainly are cases where that slowdown matters. But for the vast, vast, VAST majority of applications, it is completely irrelevant.


Dunno, it's not unheard of for a program to do the same thing over and over again, sometimes millions of times. Every slow program consists of a bunch of individually fast instructions.


If your hot loop is doing configuration setup then your code has bigger problems than this particular pattern.


You could also be configuring many things.


Indeed, but you should probably do that right before looping


Right, but if you needed to configure a million different things, wouldn't you also loop the configuration process?


Graphing metrics and doing transformations/comparisons is (much) faster. Yes, metrics do have to be defined first but in my experience that's a non-issue since the things you want to monitor are usually immediately obvious during development (e.g. request-response times, errors returned, new customers acquired, pages loaded, etc.).

With that being said, it's not a mutually exclusive situation. You can have both. However, some logs used for plotting metrics have near-zero debugging value (e.g. a log line that just includes the timestamp and the message "event x occurred"). Those kinds of logs should be fully converted over to metrics.

Some other logs however are genuinely useful (e.g. an exception occurred, the error count should be incremented, and this is the stack trace).
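
Roughly what I mean by converting those over, with a hand-rolled in-memory counter standing in for whatever metrics client you actually use (names are made up):

    // Before: log.info("event x occurred") -- timestamp + message, near-zero debugging value.
    // After: bump a named counter that gets scraped/flushed periodically.
    const counters = new Map<string, number>();

    function increment(name: string, by = 1): void {
      counters.set(name, (counters.get(name) ?? 0) + by);
    }

    try {
      // ... handle a request ...
      increment("requests.handled");
    } catch (err) {
      // Exceptions keep their log line (the stack trace is genuinely useful),
      // but the count still goes to a metric.
      increment("requests.errors");
      console.error("request failed", err);
    }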


I don't understand the claim that "some logs used for plotting metrics have near-zero debugging value (e.g. a log line that just includes the timestamp and the message "event x occurred"). Those kinds of logs should be fully converted over to metrics."

What is the difference between logs and metrics then?

Logs are pieces of text and metrics are well defined data points?

I can represent both using the same log format. I can also extract the logs with well defined structure into a columns store or inverted index for fast querying and plotting.

The entire distinction between logs and metrics, and keeping one vs. the other, reeks strongly of premature optimization by the software community. Storage is cheap; just dump the raw logs to S3 and run ETL on them to extract meaningful metrics.

Logs, metrics, and traces have the same representation: text, or some kind of JSON if you want it structured. Metrics are just logs with a well-defined schema. Traces are logs with correlation IDs in them to allow for joining between logs coming from different services.

It's just a data problem, nothing else. However, people keep overpaying for complicated observability services that they don't need. When they split it into logs, metrics, and traces, they have to connect their programs to these external services via proprietary connectors, adding vendor lock-in and a potential point of failure when the observability service has downtime, instead of just dumping logs to stderr as JSON objects as the Unix philosophy intended.
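
To be concrete about "it's just a data problem", this is roughly the single record shape I have in mind (the shape itself is made up, not any standard):

    // One record shape covers all three: every record is a log, a record with a
    // numeric value is aggregatable as a metric, and a record with a correlation
    // id is joinable across services as a trace.
    interface LogRecord {
      ts: string;                // ISO timestamp
      service: string;
      level: "debug" | "info" | "warn" | "error";
      msg: string;
      trace_id?: string;         // present => joinable across services
      value?: number;            // present => aggregatable as a metric
      [extra: string]: unknown;
    }

    function emit(e: LogRecord): void {
      // Dump to stderr as JSON; ship to S3 and ETL into whatever store you like later.
      process.stderr.write(JSON.stringify(e) + "\n");
    }

    emit({
      ts: new Date().toISOString(),
      service: "checkout",
      level: "info",
      msg: "request_latency",
      trace_id: "abc123",
      value: 87,
    });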


As you correctly point out, everything you can do with metrics can be achieved via logs if you have enough compute and I/O. But that ignores the reality of what happens when you genuinely have too many logs even for something fancier like the column store/inverted index you mention. I agree that in the vast majority of cases it's likely fine to just take the approach of using logs and counting. But plenty of developers (particularly here on HN) are in that small slice of the overall community that does have a genuine need for greater performance than some form of optimized logging affords.

Likewise, traces are indeed (as you point out) functionally just correlating logs from multiple services, which is akin to syntactic sugar. But again, that's precisely their value _at scale_: easing the burden of use. I've personally seen traces that reach across tens of services in an interconnected and cyclical graph of dependencies that would be hellish to query the logs for by hand.


I think the author is asserting that trading away the hackability is worth the faster compile times because faster compiles let you iterate faster and therefore increase hackability (i.e. put in minutes of effort once and get minutes of benefit per compile in the future).


From the summary of the bill:

> Private insurance that duplicates benefits offered under New York Health could not be offered to New York residents.

---

My interpretation of that is that private insurance (which includes employer-based insurance) would be banned.


That's an incorrect interpretation. Private insurance would be restricted to cover things not already covered by the public insurance.


> The benefits will include comprehensive outpatient and inpatient medical care, long-term care, primary and preventive care, prescription drugs, laboratory tests, rehabilitative, dental, vision, hearing, etc. all benefits required by current state insurance law or provided by the state public employee package, Family Health Plus, Child Health Plus, Medicare, or Medicaid, and others added by the plan.

I'm struggling to understand what is not covered beyond that.


If they're no longer manufacturing older devices, they're probably not manufacturing parts for them anymore either.

The only new iPhone that Apple is selling that's not on the repair parts website is the iPhone 11, which is a bit peculiar.


Keep in mind that a substantial portion of users now use ad blockers such that a lot of URLs used for analytics like this are blocked.

Consequently, you can't actually expect to capture 100% of these analytics events nor even expect the percentage captured to stay the same over time since the filter lists are very regularly updated and users enable/disable different ad blockers over time.

More broadly speaking, once you have sent a webpage to the user, you should not expect anything from the user's browser. They may or may not allow whatever arbitrary JS you have on the page. They may even intentionally give you bad data (e.g. hijack the payload to give you intentionally malformed data).

edit: even more broadly speaking, there are additional reasons why you can't expect to receive these kinds of callbacks: consider what happens if a user loses connectivity between loading the page and navigating away (e.g. their phone loses service because they went into an elevator before navigating away)
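
To illustrate the "don't expect anything from the user's browser" point, here's roughly what defensive handling looks like on the receiving end (the payload shape and limits are made up):

    // Treat every analytics payload as untrusted input: it may never arrive,
    // arrive truncated, or be deliberately malformed.
    interface PageView { path: string; durationMs: number; }

    function parseBeacon(body: string): PageView | null {
      try {
        const data = JSON.parse(body);
        if (typeof data.path !== "string" || data.path.length > 2048) return null;
        if (typeof data.durationMs !== "number" || data.durationMs < 0 || data.durationMs > 86_400_000) return null;
        return { path: data.path, durationMs: data.durationMs };
      } catch {
        return null; // drop garbage silently; client input should never be able to throw
      }
    }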


> Keep in mind that a substantial portion of users now use ad blockers such that a lot of URLs used for analytics like this are blocked.

How sure are we about this? I'm pretty sure it depends on which market specifically you're in, and the data I'm about to show is of course not perfect, but it seems that not that many users actually use ad blockers today. (Although I don't know a single developer who doesn't, and in some web applications I run that are focused on developers, the majority of users do use ad blockers.)

Chrome is assumed to be the most popular browser (by a large margin last time I checked, so I won't bother to check again) and a quick search puts the user base around 2-3 billion users. Searching for "adblock" in the Chrome Web Store (https://chrome.google.com/webstore/search/adblock?hl=en&_cat...) shows that the most popular adblocker has a user base of ~300,000 users.

That would mean 0.015% to 0.01% of Chrome users have the "AdBlock" extension installed. Not that substantial.

If someone has some more accurate numbers than my slightly-educated guess, I'd be happy to be proven wrong.

Edit: The above user base of the adblock extension is wrong. As Jabbles pointed out, I was seeing the number of reviews, not number of users.

So instead, the page lists "10,000,000+ users", so we can assume the true number to be above that, but below 100,000,000 users.

That would put the share of Chrome users using the "AdBlock" extension between 0.3% and 5%, more or less. Closer to "substantial", but I'm not sure whether it would impact businesses' choices regarding ads/tracking or not.


> So instead, the page lists "10,000,000+ users", so we can assume the true number to be above that, but below 100,000,000 users

Can we? I can't seem to find anything that indicates if/when the next number jump is, just a lot of big name extensions at "10,000,000+". Back in 2016 ABP had a post about their extension alone having 100+ million active users https://blog.adblockplus.org/blog/100-million-users-100-mill... and that's ignoring the >50% of Chrome users on mobile which requires non-extension based blocking.

Going for someone else's numbers instead of trying to build my own, I'm finding anything from 10% to near 50%, with most estimates in the range of ~25%.


I just assumed that Google has brackets of 10 - 100 - 1000 - 10000 and so on, which led me to the "but below 100,000,000 users" part. Not sure why they are hiding the true count (maybe they don't have exact numbers themselves), but if there truly are that many users, I'm not sure why they wouldn't show it.

> and that's ignoring the >50% of Chrome users on mobile which requires non-extension based blocking.

That might be true for Chrome users on Android, but for Chrome users on iOS (of which there are fewer than on Android) there isn't a choice of any extension-based or non-extension-based blocking. Firefox on iOS doesn't even allow extensions, and only Safari seems to have ad blocking. At least that was the case last time I checked; it might have changed lately.


You are in fact wrong. iOS users use DNS-based blocking through VPN apps.


You're saying that a significant portion of Chrome + iOS users are A) constantly connected to a VPN and B) that VPN has DNS-based ad-blocking?

I'm not doubting you that some are, but to say that a significant portion of them are, would need some sort of evidence/source to back you up.


> You're saying that a significant portion of Chrome + iOS users are A) constantly connected to a VPN and B) that VPN has DNS-based ad-blocking?

No, they simply said this statement:

> but for Chrome users on iOS (of which there are fewer than on Android) there isn't a choice of any extension-based or non-extension-based blocking.

Is factually wrong, as there is a choice: namely, ad-blocking VPN-based apps.

Also these apps probably work differently than you're thinking. The "VPN" isn't a VPN in a traditional sense where your traffic goes to a remote server somewhere on the internet. Using VPN transport is just the method of convincing iOS/non-rooted Android to send the traffic to the app, which itself acts as the VPN server. Of course traditional true VPNs with ad filtering are also a thing but less common and more catered towards those already interested in true VPN service in the first place.

All that said, it's hard to provide numbers showing the usage scale on iOS one way or the other, as Apple doesn't publish them like other stores do. If I had to guess, iOS users would probably be the user group with the lowest ad-blocking penetration, though I wouldn't go as far as to say it's not worth noting. All of these smaller alternatives probably add up to more than the largest "normal" ad-blocking method in the long run, and ignoring them on an individual basis can significantly skew your overall result.


Ah, ok. Yeah, if we focus on the specific statement of "there isn't a choice for ad-blocking on iOS", then I was wrong, that's true.

I thought the comment was made in the larger context of my initial comment, namely the "Keep in mind that a substantial portion of users now use ad blockers" statement, but I understand now that devmor, in classic HN tradition, chose to respond to one specific part of my comment while ignoring the rest, without making it clear exactly what they were responding to.


My reply was entirely unambiguous, as you only made a single claim in the comment I replied to, which was that Chrome users on iOS do not have the ability to block ads.

But, in classic HN tradition, you choose to put me at fault for your inability to understand how you represented your own train of thought.


I use an adblocker, and lots of filtering lists, but most of the `navigator.sendBeacon` requests I was seeing weren't being blocked. Sometimes they were when the URL matched a pattern, but often they weren't. Which makes sense since they aren't ads and by design have nearly zero effect on the user experience.

I still wanted to block them though... so I started killing all `navigator.sendBeacon` requests by replacing it with a no-op function on page load. [0]

I have the no-op function log the results to console and it's fascinating seeing all the sites attempting to use it. Some pages on Amazon will fire a sendBeacon request every second or so.

[0] With this uBlock Origin user script: https://gist.github.com/varenc/7c0c17eea480a03d7c8844d81e608...
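
The general idea, as a rough sketch rather than the linked script itself:

    // Replace sendBeacon with a logging no-op. Returning true tells callers the
    // data was "queued" so they don't fall back to fetch/XHR.
    navigator.sendBeacon = (url: string | URL, data?: BodyInit | null): boolean => {
      console.log("[blocked beacon]", url.toString(), data);
      return true;
    };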


That's interesting, I wasn't aware of "navigator.sendBeacon" before.

Question about the user script. Blocking WebSockets I can understand as they can be used for exfiltrating data you don't want them to get a hold of. But why disable WASM? It can't be used for exfiltrating, and disabling it probably gives them a stronger data-point than just leaving it on, for when they are able to exfiltrate data (via CSS HTTP requests for example).


Oh I don't actually disable WASM. Those are just different scripts that can apply to pages when a filter matches. I disable sendBeacon everywhere with `*##+js(disable-sendBeacon.js)` but the others I don't use or only use on a specific site. I believe I added the WASM removal just to test how a particular site's fallback would work when it wasn't present. That said, disabling WASM probably reduces your browser fingerprinting bits. I bet fingerprintjs[0] uses it.

UBO already has a built-in set of powerful scripts[1], but I just wrote my own for fun. I think I could have done this by just using the built-in ones.

edit: This filter does the same thing just using the built in `set` script, but won't log to console:

   *##+js(set, navigator.sendBeacon, trueFunc)   
[0] - https://fingerprintjs.com/demo/

[1] - https://github.com/gorhill/uBlock/wiki/Resources-Library


From tests I've run, about 8-12% of our visitors disable or block some sort of tracking, analytics, or JavaScript. This is on an ecommerce site focused toward non-technical users. I'd expect a tech-savvy browsing audience to be composed of 20% or more visitors with blocking.


You're looking at number of reviews, the number of downloads is significantly higher.


Doh, you're right! I blame it on poor UX in the Chrome Web Store :) Will update my comment...


Not always - more sophisticated analytics will proxy these requests through the website's own domain.


> they may even intentionally give you bad data (e.g. hijack the payload to give you intentionally malformed data).

Interesting. Are there tools we can install to send malicious payloads to surveillance companies?


There are plugins that do this, but I think trying to choke off the signal is more effective than just adding noise.


One interesting thing is that First Class Parcels have noticeably faster delivery times than First Class Letters (as long as you both start and end in the continental US).


I've personally experienced three kinds of coding interviews.

1. Trivial questions that can be done very quickly; e.g. does this list of integers contain any duplicate integers? (Sketched at the end of this comment.)

2. Non-trivial questions that require a large amount of time to solve if you haven't memorized the solution; e.g. the skyline problem https://leetcode.com/problems/the-skyline-problem/

3. Practical questions; e.g. a question that more-or-less represents something you might actually encounter in your day to day job, for instance querying multiple APIs and joining the data together into a particular expected format

---

1 isn't bad due to its level of ease. 2 is awful since you mostly just need to grind many hours of Leetcode to be able to consistently solve those kinds of problems. 3 is enjoyable since the problems given might actually teach you something you'll use day-to-day.
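
For reference, the sketch for item 1 mentioned above; a set makes it a one-pass check:

    // Item 1: does this list of integers contain any duplicates?
    function hasDuplicates(nums: number[]): boolean {
      const seen = new Set<number>();
      for (const n of nums) {
        if (seen.has(n)) return true;
        seen.add(n);
      }
      return false;
    }

    console.log(hasDuplicates([1, 2, 3, 2])); // true
    console.log(hasDuplicates([1, 2, 3]));    // false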


I'd upgrade personally since the further behind you get, the more painful it becomes to upgrade down the road.

