
I have some insight into this because this claim is about my company Fivetran:

“…relies on the data source being able to seek backwards on its changelog. But Postgres throws changelogs away once they're consumed, so the Postgres data source can't support this operation”

Dan’s understanding is incorrect: Postgres logical replication allows each consumer to maintain a bookmark in the WAL, and Postgres will retain the WAL until you acknowledge receipt of a portion and advance the bookmark. Evidently, he tried our product briefly, had an issue or thought he had an issue, investigated the issue briefly and came to the conclusion that he understood the technology better than people who have spent years working on it.
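
As a rough illustration of the bookmark behavior described above, here is a toy Python model of a changelog that keeps entries until every consumer has acknowledged them. It is only a sketch of the idea, not how Postgres implements replication slots; the names `ChangeLog`, `register`, `read`, and `confirm` are hypothetical and exist only for this example.

```python
class ChangeLog:
    """Toy append-only log that keeps entries until every consumer confirms them."""

    def __init__(self):
        self.entries = {}     # position -> change record
        self.next_pos = 0     # position the next append will receive
        self.bookmarks = {}   # consumer name -> first unconfirmed position

    def register(self, consumer):
        # A new consumer starts at the current end of the log.
        self.bookmarks[consumer] = self.next_pos

    def append(self, change):
        self.entries[self.next_pos] = change
        self.next_pos += 1

    def read(self, consumer):
        # Reading does not move the bookmark, so a consumer that crashes
        # before confirming can read the same changes again.
        start = self.bookmarks[consumer]
        return [(pos, self.entries[pos]) for pos in range(start, self.next_pos)]

    def confirm(self, consumer, upto):
        # Advance the bookmark past `upto`, then drop entries no consumer needs.
        self.bookmarks[consumer] = max(self.bookmarks[consumer], upto + 1)
        oldest_needed = min(self.bookmarks.values())
        for pos in [p for p in self.entries if p < oldest_needed]:
            del self.entries[pos]


log = ChangeLog()
log.register("sync")
for change in ["insert a", "update a", "delete a"]:
    log.append(change)

first_read = log.read("sync")
second_read = log.read("sync")   # identical: nothing has been confirmed yet
log.confirm("sync", 1)           # acknowledge positions 0 and 1
remaining = log.read("sync")     # only position 2 is still retained
```

In this model, entries are discarded only by `confirm`, never by `read`, which is the distinction the paragraph above draws between consuming a change and acknowledging it.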

Don’t get me wrong, it is absolutely possible for the experts to be wrong and one smart guy to be right. But at least part of what’s going on in this post is an arrogant guy who thinks he knows better than everyone, coming to snap conclusions that other people’s work is broken.



> When their product attempts to do this and the operation fails, we end up with the sync getting "stuck", needing manual intervention from the vendor's operator and/or data loss. Since our data is still on Postgres, it's possible to recover from this by doing a full resync, but the data sync product tops out at 5MB/s for reasons that appear to be unknown to them, so a full resync can take days even on databases that aren't all that large. Resyncs will also silently drop and corrupt data

I don't know, but it sounds like you skipped over most of the reasons why the author was annoyed by Fivetran. You advertise "Connect data sources to PostgreSQL in minutes using Fivetran" but if Dan Luu -- who is certainly an intelligent and capable engineer -- and his coworkers can't figure out how to use your product correctly, and if your customer support also can't figure out why the sync breaks, then maybe this isn't mere customer 'arrogance'.


> if Dan Luu -- who is certainly an intelligent and capable engineer -- and his coworkers can't figure out how to use your product correctly, and if your customer support also can't figure out why the sync breaks, then maybe this isn't mere customer 'arrogance'.

Dan Luu claims, among other things, to experience hundreds of software bugs per week. If you believe the things he writes then he's not at all representative of a normal customer.


Hundreds seems attainable.

Today, for me: * Firefox mobile didn't load the keyboard on a textbox, making me kill and restart the app to get it to work. * Firefox mobile pull to refresh triggered while scrolling up. * I asked Alexa to turn on some lights and she dinged like she did it but nothing happened. * I turned on my office light that has a routine to turn on a space heater and another light. Only the first light turned on. * The Roomba got lost and ignored its keep-out zone and ran into the Christmas tree skirt. * When I ran out to get groceries Android Auto wouldn't connect until I restarted the car. * On another errand, Apple CarPlay refused to play music even though it said it was. * A website told me I had unsaved changes and wanted to know if I wanted to navigate away from the page without saving.... While clicking the save button. * I got a letter in the mail from Amex telling me that they couldn't reach me by email and I needed to log into my account and pay a zero dollar balance. This is after I closed all my accounts months ago; I get two letters each month to sign into an account that was deleted to pay a bill that doesn't exist. * OctoPi said its webserver wasn't running; a refresh fixed it. * Build tools at work linked the wrong binary for some tools and I had to manually correct the symlinks. * Insert 10 or more bugs with PipeWire and PulseAudio on Ubuntu.

I'm sure there's more, every day is like this. Yesterday I had a plethora of bugs trying to get screenshare and webcam streaming to work for a video conference despite working for a dry run a few minutes prior.

And right now, line breaks aren't working in this reply


This sounds like a plausible list.

Oftentimes on HN it can be hard to distinguish between scoundrels/trolls/losers/etc. with ulterior motives and genuine people with accurate information, but in this case, since Dan Luu has a credible track record, he should be given the benefit of the doubt.


Many of those things may not be software bugs as I would normally understand the term, but rather software behaving as specified/intended, where that specification/intention has unexpected and/or undesirable consequences. (The line break thing certainly is, for example). I find it unhelpful to conflate the two.


> software behaving as specified/intended, where that specification/intention has unexpected and/or undesirable consequences

Unexpected behaviour is how I choose to define "bug".

It doesn't matter if the programmer intended it. It's still a bug if it behaves contrary to the user expectation.

It might be that the best resolution is better documentation / training, but it's still worthy of a bug being raised to fix.


You've never had contradictory user requirements thrown at you, with the expectation that you somehow implement them both? By this definition all software with more than one user is buggy, and it's impossible to do otherwise until we get AGI to do everything.


It's a bug if it behaves contrary to the programmer's expectation. Full stop. There is too much diversity in users to go the other way on this.

If a product doesn't meet a user's expectations it may be a poor fit, improper usage, or even a braindead terrible design, but these are not bugs.


I personally don't mind when people lump intentionally bad behavior along with the bugs.


Which ones? I could see the pull to refresh maybe not being a bug, but every single other one sounds like a bug to me.


I mentioned the line breaks are by design. Roomba could be a hardware error or sensor noise. Text box not bringing up the keyboard it's hard to be sure whether that's a bug or intended. The auto thing not connecting could easily be hardware or deliberate. Letters from Amex might well be as specified by their processes. Webserver not running might have been deliberate maintenance.

Many of them could be software bugs, sure, but without actually figuring out what's going on and what the root cause is it's hard to tell.


Oh, I wasn't sure which direction the line break "certainly is" went in. But they have line breaks elsewhere in their post...

I would say failing to deal with hardware error that strongly is a bug. Keyboard I'm pretty sure is a bug, I've had plenty of situations where the keyboard code locks up and needs app restarts. Auto not being auto would be a weird thing to lie about, otherwise it's a bug. "Specified by their processes": a process is an algorithm, so sending incorrect messages for $0 could be an algorithm bug or an implementation bug, but either way it's a bug in the software; it's not doing that because someone decided it actually should do that. It said the webserver wasn't running but it was, that's a bug if they didn't have an exact unlucky timing.


I think most people experience hundreds of bugs per week; they just may not notice them or realize they are bugs.


I experience hundreds of bugs per week. Some in my own product. Most in others. Many are small and solved with a refresh. Most go unnoticed or are easily ignored.


That wouldn't really surprise me. Off the top of my head things I've experienced in the past week or two and remembered:

- Four involving a Vim upgrade being pushed by one tool and blocked by another; a broken shortcut causing gVim to load with an error message and no working menus - not fixed by uninstall/reinstall; unable to delete one of the Vim DLLs; Windows Explorer window lockup after right click (possibly it had the DLL loaded for Vim context menu icon?).

- FireFox regularly stops loading pages until I visit Help -> About FireFox and see that it's silently done an upgrade and is waiting to restart.

- Cloud service with SSO login working for months, stopped working with mysterious 'error processing your credentials' type message.

- and it has a broken 'send password reset email' feature, no email gets to me.

- checking for blocked emails in SaaS email filter, when I put my email in the 'Recipient' box of the search, Edge browser autofill puts my email address in the 'Sender' search field as well, incorrectly / uselessly.

- Edge browser autofill dropdowns routinely cover SaaS product's HTML/JS dropdown menus.

- New SaaS product which has a username/password login instead of SSO, but login goes to a prompt with only 'use SSO' (which we don't) or 'forgot my password' (after logging in successfully). Clicking 'use SSO' gets past that screen, I think it's internal between-different sub-sites SSO abstraction leaking to the customer.

- SaaS product offers 2-factor auth using Google Authenticator, if using the signup key instead of QR code it appears in a difficult-to-select text label. Entering an expired TOTP code shows an incorrect error saying the code can only be used once (Time Based OTP codes are valid for the time duration, often ~30 seconds for any number of uses).

- Saving details for that site in a SaaS password vault from a different vendor, an error like 'no saved password' popped up despite having a password saved.

- At least four different debugger errors - set breakpoint either not setting, or not breaking, or debugger crashing, or code crashing without error message until upgrading the runtime.

- Windows 'restart and update' over and over until 'check for updates' and it decided there weren't any, then the option disappeared.

- Dell laptop firmware upgrade which rebooted to upgrade firmwares, then said it failed.

- Always on VPN for work which stops passing traffic until the service is restarted, a couple of times a month.

- Backup software which runs for a week then fails, until rebooted. Vendor support blames the network.

- Monitoring software which reported the wrong datastore sizes from a VMware host until the host management services were restarted, one or both of them are bugged.

- Theater website's date range selection on mobile, I dropped down the start, slowly worked through year/month and tapped a day, dropped down end, again went through year/month and tapped a day after the start, 'apply' button was greyed out, tapped outside the calendar controls, and they hadn't registered the days. Retried and it worked second time.

- Multiple websites where the infinite scroll or 'load more' links stop working, including big ones like YouTube.

- SaaS product search finding an item for one coworker, not finding it for another coworker.

- SaaS ticketing system not finding the ticket ID typed into the search until the search is retried a couple of times, multiple times per day.

- SaaS ticketing system not handling session timeout properly and presenting a normal interface where everything looks fine until attempting to interact with it and then presenting an 'oops something went wrong' style error message instead of going to the login prompt.

- Switching back to VMware web client in a browser tab, finding it showing the "your session will expire in 20 seconds" countdown box, knowing that the session expired an hour or three ago, and the 'extend session' button doesn't work.

- SaaS password vault, VMware web client, not handling session timeout properly and presenting a normal interface where everything looks fine until attempting to interact with it and only then whisking away to the logon prompt.

- Secure Jumpbox tool doesn't save its settings between sessions, even though it explicitly has a 'make this the default value' type option.

- Secure jumpbox tool presents multiple screen resolution options which don't work, only some of them work.

- Database engine sync reporting different sync status in different views.

- Reddit discussion from 1 year ago with URL to the CDC website which is now a 404.

- iOS keyboard unreliable at opening the select-all/copy/paste popup dialog, often needs prompting and reprompting.

- iOS keyboard unreliable at starting swiping.

- After powering off phone and on again, torch GUI would switch 'on' but torch LED was not lit up, persisted for maybe 10 seconds after unlocking, as if there were internals still booting and the GUI was disconnected from control of the hardware.

- pfSense (FreeBSD-based) firewall upgrade process from old versions is broken.

- Computer game where the character turned invisible and then never turned visible again; glitchy bounds checks; glitch-teleported back to a starting point in the middle of a fight; unreliable animation on characters being enchanted; glitchy behaviour happening for an attack which doesn't have that behaviour. 'Disable background service' setting doesn't disable the background service. Network glitched and dropped back to the main menu not counting any of the progress made. Showed loading screen then didn't load. Network glitch and the fire button became the wrong way round - gun stopped firing when clicked, fired continuously when not clicked. Audio stopped working after RDP connection until game restarted.

- Proprietary software content to fill the disk with logfiles that aren't tidied up or limited by default.

- Search engines which don't search for what I type in and show something else instead. Try Google for "spirtling", top result is a Wiktionary page for it as an archaic spelling of using a Scottish porridge stirrer, other relevant results. DuckDuckGo result has changed it to "sporting" and are horse racing tips, then spirling and spiraling, then spirtling.

- New York Times' Connections game, on small screen phone the "one away" message appears in the wrong Z-order behind the top bar, unreadable unless tapped on to bring it to the front. After a game the 'register an account to track progress' popup partially blocks the 'close summary screen and look at the final game state' X button.

- Runaway memory use in Amazon.co.uk and YouTube.com tabs left open too long until FireFox restarted that 'about:processes' can't help with.

- Teams continuing ringing on cellphone after I've answered the call on laptop.

- Google StreetView glitches where it sometimes gets stuck at junctions and can't move forwards but can double-click-jump there instead.

- and if we count glitchy / unexpected / poor user interface or user experience as well.. Python's IDLE laggy keyboard input until latest version installed. Vendor renewal email which doesn't say what is being renewed in a useful way. Etsy online shop which encouraged me to click on 'related' links then gave me a captcha because they couldn't tell if I was human or robot. Inconsistent YouTube keyboard shortcuts. AWS's chat LLM gave incorrect instructions for how to achieve firewall rule change on AWS server. VoIP phone app which switches back to the signing-in animation every few minutes.

- The https://www.skyscanner.net/ date selection is poor: it's not clear that clicking 'Depart' also sets the 'Return' date until you try and get rid of the calendar so you can set the return date and find you cannot. You can click (X) to clear the return date and the depart date stays configured so you can set a new return date, but you can't X out of the depart date to set a different one without clearing the return date as well and having to reset that. Once both dates are set, you can't adjust them; clicking the 'Return' box where the UX looks like you're setting a return date, when you click on a day it will clear both dates and then set the date you click on as the new departure date, getting in the way. The depart and return boxes have the same visual look as the From/To boxes which you can type airport codes in, but this is misleading as you cannot type a date into the date boxes.

- https://calculator-online.net/sohcahtoa-calculator/ says "This calculator uses the SOH.CAH.TOA mnemonic method to solve the sides and angles of a right triangle. It provides step-by-step calculations using the SOHCAHTOA formula" but put in leg a=1 and sin ⍺=30 and it calculates that the hypotenuse is length 2, showing the step-by-step solution using Pythagoras' Theorem using the length of sides b and c that we haven't given and don't know.

And this isn't counting design I find confusing, documentation which is unclear, layout bugs where things don't visually align or the spacing is misleading or the organization is unhelpful, bugs in tabstop ordering, or not-really-bugs like Notepad++ lagged with a single line of 10MB of data, or EMACS lagged with a long line, etc.
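
On the TOTP item in the list above: a time-based code depends only on the shared secret and the index of the current time window, which is why the same code is produced for the whole window rather than being consumed on first use. A minimal Python sketch of the RFC 4226 / RFC 6238 calculation follows, assuming HMAC-SHA1, a 30-second step, and 6 digits (common defaults, not necessarily what that product uses). The secret is the test secret published in the RFCs, not a real key. Whether a server rejects a second use of a still-valid code is a separate policy choice; RFC 6238 recommends that verifiers do reject it.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time-step index."""
    return hotp(secret, unix_time // step, digits)


SECRET = b"12345678901234567890"  # RFC test secret, for illustration only

early = totp(SECRET, 30)   # first second of one 30-second window
late = totp(SECRET, 59)    # last second of that same window
after = totp(SECRET, 60)   # first second of the next window
```

Here `early` and `late` are the same code because both timestamps fall in the same window, while `after` belongs to the next window and differs.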


Just calling this one out: "Inconsistent YouTube keyboard shortcuts." because it's so annoying and wouldn't it have been easier if they were all hooked to the same code, than to implement them differently? Assume YouTube.com in FireFox on Windows.

- 'space' will play/pause the video if the main video has focus, or if most other controls are selected (e.g. volume), but if the settings are open then 'space' will change the setting and play/pause the video.

- holding 'space' when the main video has focus will 2x speed the video for the duration of the hold. Holding space when the 'play' button has focus will not do anything. Not double-speed, not repeatedly play/pause.

- 'k' will play/pause the video whatever control has the focus. Holding 'k' will stutter the video repeatedly switching play/pause. (Why?). Holding 'k' will never 2x speed the video. Space and K are both play/pause but implemented differently and inconsistently.

- Left/Right arrows rewind/fastforward by 5 seconds when most controls have focus, but when the volume slider has focus then left/right arrows change the volume. The Auto-play enabled/disabled control is also a left/right sliding one but the left/right arrows do not move that when it has focus.

- Up/Down arrows control the volume. Even when the volume slider has focus and left/right arrows are also controlling the volume. But wait, the up/down arrows don't control the volume when the settings are open; then they only move the selection in the settings. (Contrast with spacebar which activates settings and play/pauses at the same time, contrast with Return which sometimes does play/pause and does activate settings but does not do both at the same time).

- Page Up/Page Down with the video focused is browser page scroll. Click on the video position bar, the thin red one, and Page Up/Down do rewind/ffwd by 1 minute! With this focused, up/down arrows stop being volume control and now do the same rewind/ffwd as left/right arrows.

- j/l do rewind/ffwd like the left/right arrow keys, but they jump 10 seconds instead of 5 seconds.

- 'Return' activates a focused control so with play/pause focused it will do play/pause; it behaves almost like spacebar and it stutters the video like 'k' when held - but with the main video having the focus, Enter does not play/pause, it does nothing.

WHY are they so inconsistent?


Fivetran works perfectly fine for syncing Postgres databases into Snowflake. My company syncs dozens of them without problems. I can only assume their Postgres database has a non standard set up.


Considering the comment chain involved here chiming in with "I can only assume non standard setup" is pretty hilarious.


Yeah, if you have 1 million dollars to spend every time you run a data migration or anything else that touches many rows.

I've seen some new libraries crop up for writing your own replication slot clients. I wouldn't use fivetran for PG.

Either you have a lot of data and fivetran will be too expensive or you don't, and you're better off just using a postgres OLAP plugin/extension.

Maybe it was because it was in beta, but I had a nightmare of a time with Fivetran's API trying to coordinate connectors and destinations and git access.


I have no idea who is right or wrong as to the capabilities. But I believe his story that he couldn't get it working. And I believe your statement that it can be made to work.

When very smart people can't get your product to work as advertised, that's a problem with either the advertising, or the documentation, or maybe the default settings. Or maybe it needs the source data set up in a very specific way.

That kind of plays into the larger point of the essay that outsourcing this sort of thing still requires significant internal knowledge, and therefore may not be as cheap as it looks at first glance.


In general, I absolutely agree with you. It’s basically an instance of “the customer is always right”: if a smart customer can’t get our product working, there is a problem with the product. But this post made a much bolder (and wrong) claim: “the product has a number of major design flaws that mean that it literally cannot work”.


You went too far in your pushback though.

"part of what’s going on in this post is an arrogant guy who thinks he knows better than everyone, coming to snap conclusions that other people’s work is broken"

He's probably wrong about why it was broken, but it was broken.

And it's not exactly "arrogance" to give the best explanation you have, in a blog post about something else, while not mentioning the company name.


> He's probably wrong about why it was broken, but it was broken.

That's going too far. If the customer misunderstands the product or misreads the documentation, that's something worth addressing, but "broken" is not an informative way to describe it.


Dan never called out Fivetran, he wrote a couple sentences about problems he experienced with an anonymous ETL provider. That was it. Hell we don't even know that he's actually talking about Fivetran.

George, however, should not be allowed to come to HN and start talking shit about random people who had the professional courtesy to never even mention the provider in question. The fact that his post isn't flagged, is highly upvoted, and dang hasn't swooped in to chastise him is a prime example of why HN is so fucking ridiculous and hypocritical.


If, sure. Do we have any reason to think the problem was a misunderstanding or a misreading?


We know that Luu was misunderstanding at least some things, since he gave an inaccurate description of what was happening. Given that context, I find it more plausible that he was also misunderstanding other things, than that the thing was broken for him despite other people seeing it working. Even if you weigh the likelihood differently from me, you must admit that that's a possibility, so concluding that it was broken because Luu thought it was broken is at a minimum premature.


Given the other comments talking about problems here, we don't know for sure if it was inaccurate. But even if it was, "plausible" that he was misunderstanding other things is a hell of a lot weaker than the way your comment above treated it as the truth of the matter.

Overall I think it's pretty unlikely that it wasn't broken.


> Evidently, he tried our product briefly

> investigated the issue briefly

> coming to snap conclusions

Where exactly is the evidence that he tried your product only briefly and that he investigated briefly? I've read through it and don't see that anywhere.

After reading your comment, I lean towards you being the arrogant, thin-skinned one about your product and coming to snap conclusions about your customers who are paying for your product and having trouble with it and calling them arrogant instead of looking into why they are experiencing frustration with your product.

Perhaps Dan's conclusion was wrong, but the tone and wording of your response are just off-putting and devoid of tact, empathy and teachability.

Something like "No, I don't believe it's broken because x, y and z. But I do see how the developer experience here is left wanting. Maybe we can improve it" would have been so much better.


The evidence is that he didn't read the postgres manual section on log-based replication[1] which would have told him how to configure a postgres master server so that it doesn't delete logs until all consumers have processed them.

It's not a five minute setup, but Dan doesn't write that the setup takes longer than five minutes - he writes that the design is fundamentally broken. Which it isn't, if you read the postgres manual. We're not even talking about the manual of the product he tried out for five minutes - we're talking about the manual of the database he's responsible for administering!

The overall point of the article is fine though. Original Commenter was nitpicking.

[1] https://www.postgresql.org/docs/current/warm-standby.html


> how to configure a postgres master server so that it doesn't delete logs until all consumers have processed them.

Do you think it'd be reasonable for FiveTran to include this little tidbit in their setup documentation? I'm not talking about repeating the Postgres docs, but just a blurb about the need to do this kind of Postgres config?

That's an example of what I mean when I'm calling on georgewfraser to be humble and use Dan's feedback to improve his product (in this case by improving the docs) instead of name-calling his customers.

Ok, so Dan came to the wrong conclusion and was wrong to say the product was broken, but he had the professional courtesy to not name the company/product. George just attacks his character. Like another commenter mentioned, we don't even know for sure it was FiveTran. Yet, George just jumps in head first with guns blazing.


And "manual intervention from the vendor" didn't involve fixing that, if it was such a trivial configuration issue?


The article is also several years old. No idea if that has an impact on the issue.


Weird approach, chastising your customers' lack of expertise in something they’re actively trying to pay you to solve for them. He shouldn’t have to be an expert in it.

I was a longtime customer of fivetran who hit these sync issues constantly. Forced resyncs every other month. Was so thankful when our contract ended.


In 2021, my employer was a major customer of Fivetran. Our Postgres syncs routinely broke and required time-consuming resyncs from scratch.

Dan's essay is dated 2022. It is now 2024, so maybe something has changed since then on the code path between Postgres and Fivetran to allow backtracking.


You would be the best one to evaluate if this applies in your case but in many cases where my users say "it's not possible" I end up finding a gap that's more related to usability than technical. I often still find there's something worth learning from this kind of feedback even if it's "wrong".


I assume Dan Luu was using the old “XMIN” method and not Logical Replication.

https://fivetran.com/docs/connectors/databases/postgresql/tr...


> Evidently, he tried our product briefly, had an issue or thought he had an issue, investigated the issue briefly and came to the conclusion that he understood the technology better than people who have spent years working on it.

This doesn't match this:

> Syncing from Postgres is the main offering (as in the offering with the most customers) from a leading data sync company, and we found that it would lose data, duplicate data, and corrupt data. After digging into it, it turns out that the product has a design that, among other issues, relies on the data source being able to seek backwards on its changelog. But Postgres throws changelogs away once they're consumed, so the Postgres data source can't support this operation. When their product attempts to do this and the operation fails, we end up with the sync getting "stuck", needing manual intervention from the vendor's operator and/or data loss. Since our data is still on Postgres, it's possible to recover from this by doing a full resync, but the data sync product tops out at 5MB/s for reasons that appear to be unknown to them, so a full resync can take days even on databases that aren't all that large. Resyncs will also silently drop and corrupt data, so multiple cycles of full resyncs followed by data integrity checks are sometimes necessary to recover from data corruption, which can take weeks. Despite being widely recommended and the leading product in the space, the product has a number of major design flaws that mean that it literally cannot work.

That description doesn't sound like _he_ briefly used your product, but that company he was working for used your product, found bugs and despite contacting support couldn't make it work. This doesn't read at all as a minor experiment that he didn't put in the time.


The kind of arrogance this comment displays has ensured that I’ll try my best to never use Fivetran anywhere I work ever again.


But did you ever in the first place?


I'm not quite following. His argument appears to be: The replication system requires a backwards seek, Postgres does not support that operation, things break when that operation is attempted.

I don't understand why replication would need a backwards seek - are you saying it doesn't and he is mistaken on that?


This is unavoidable if you are at least a bit smarter than the average person, since in many cases their work just is broken.

It’s taken me far too long to internalize that the chances of someone making an (egregious) mistake in something I rely on to be correct are very much nonzero.


> an arrogant guy who thinks he knows better than everyone, coming to snap conclusions that other people’s work is broken

I see you’ve met my boss.


...how is _this_ the insight that you come away from this post with?

This post is a commentary on product quality issues, the underlying cost models (both goods and services), and the interplay with American culture. There's like 20+ company/product anecdotes in there - a mistake about one technical detail of one product is wildly uninteresting.


This is the case where you buy from experts instead of doing it yourself. You tried, thought it was impossible, someone else figured it out.



