Rezo's comments | Hacker News

Cloudcraft | Senior Software Engineers | NYC, Seattle, USA, REMOTE only | Full-time | https://cloudcraft.co

We're looking for full-stack Senior Software Engineers with React & Node.js experience.

Cloudcraft provides tools for software developers, currently focusing on AWS architecture design. Join our small 100% remote team and you will have the chance to make a big impact.

Our ideal candidate is self-motivated, has excellent written communication skills, and is always looking to improve and learn. Previous experience with AWS services, graphics/game programming, contributions to open source or personal GitHub projects, and any additional programming languages are a big plus but not a must.

Our stack consists of modern JavaScript with React on the frontend + Node.js on the backend and literally every single AWS service due to our product's unique nature. You'll have the opportunity for a lot of learning and experimenting on the job!

We're bootstrapped, profitable and growing. Competitive salary and serious about work-life balance. To apply please email [email protected] with your resume or any links you'd like us to check out. No recruiters or agencies please.


Cloudcraft | Senior Software Engineers; Graphic Artists | REMOTE only | Full-time | https://cloudcraft.co

We're looking for:

- Full-stack Senior Software Engineers with React & Node.js experience.

- Graphic Artists with technical chops (please include your portfolio when applying).

Cloudcraft provides tools for software developers, currently focusing on AWS architecture design. Join our small 100% remote team and you will have the chance to make a big impact and take ownership of projects and your own work.

Our ideal candidate is self-motivated, has excellent written and verbal communication skills, and is always looking to improve and learn. Previous experience with AWS services, contributions to open source or personal GitHub projects, and any additional programming languages are a big plus but not a must.

Our stack consists of modern JavaScript with React on the frontend and Node.js on the backend and literally every single AWS service due to our product's unique nature. You'll have the opportunity for a lot of learning and experimenting on the job!

We're bootstrapped, profitable and growing. Competitive salary and serious about work-life balance. Work from anywhere in the world. To apply please email [email protected] with your resume or any links you'd like us to check out, and include "HN: [Position] Cloudcraft" in the subject line. Even if you've applied before, please do feel free to apply again. No recruiters or agencies please.


Cloudcraft | Senior Software Engineer; UX Designer; Graphic Artist | REMOTE only | Full-time | https://cloudcraft.co

We're looking for:

- Full-stack Senior Software Engineers with React & Node.js experience.

- UX Designers

- Graphic Artists with technical chops (please include your portfolio when applying).

Cloudcraft provides tools for software developers, currently focusing on AWS architecture design. Join our small 100% remote team and you will have the chance to make a big impact and take ownership of projects and your own work.

Our ideal candidate is self-motivated, has excellent written and verbal communication skills, is interested in UX and has a sense of design, and is always looking to improve and learn. Previous experience with AWS services, contributions to open source or personal GitHub projects, and any additional programming languages are a big plus.

Our stack consists of JavaScript with React on the frontend and Node.js on the backend and literally every single AWS service due to our product's unique nature. You'll have the opportunity for a lot of learning and experimenting on the job!

We're bootstrapped, profitable and growing. Competitive salary and serious about work-life balance. Work from anywhere in the world. To apply please email [email protected] with your resume or any links you'd like us to check out, and include "HN: [Position] Cloudcraft" in the subject line. No recruiters or agencies please.


Cloudcraft | Senior Software Engineer | 100% REMOTE | Full-time | https://cloudcraft.co

We're looking for full-stack Senior Software Engineers with modern JavaScript and React experience.

Cloudcraft provides tools for software developers, currently focusing on helping teams work with AWS. Join our small, 100% remote, engineering team and you will have the chance to make a big impact and take ownership of projects and your own work.

Our ideal candidate is self-motivated, has excellent written and verbal communication skills, is interested in UX and has a sense of design, and is always looking to improve and learn. Previous experience with AWS services, contributions to open source or personal GitHub projects, and any additional programming languages are a big plus.

Our stack consists of JavaScript with React on the frontend and Node.js on the backend and literally every single AWS service due to our product's unique nature. You'll have the opportunity for a lot of learning and experimenting on the job!

We're bootstrapped, profitable and growing. Competitive salary and serious about work-life balance. Work from anywhere in the world. To apply please email [email protected] with your resume or any links you'd like us to check out, and include "HN - Cloudcraft" in the subject line. No recruiters, agencies etc. please.


Lots of people seem to be wondering why Instant Pot has become a hit while electronic pressure cookers have existed as a category for quite some time.

I personally think it's a great case of tipping points, network effects and branding all working together.

The Instant Pot is a genuinely good product, so it didn't have any trouble finding early users. These people then produced recipes, books and videos not for pressure cookers in general, but for the Instant Pot specifically. There are 1,600+ books for the Instant Pot on Amazon, covering everything from Keto meals to Indian food. If you have a Breville Fast Slow and I have a Cuisinart CPC-600 pressure cooker, the cooking times, settings and pressure levels aren't transferable between the two, and may produce quite different results. Hence the network effect of everyone having the same brand and model of cooker, combined with the tipping point of reaching a critical mass of Instant Pot users, caused an explosion of recipes and guides, which in turn drives further adoption.

So why don't people create "Breville Fast Slow Pressure Cooker" recipes in the first place? I think it's because of the branding. The Instant Pot name itself is already fun and self-describing, and the marketing downplays the pressure cooking aspects. Pressure cooking historically has a negative association from a safety point of view. So while everyone is trying to sell electronic pressure cookers, I think most people who buy this product aren't interested in pressure cookers at all, instead they're specifically getting an Instant Pot. And while technically they may be the same, the customers don't necessarily perceive it that way.


>I think most people who buy this product aren't interested in pressure cookers at all, instead they're specifically getting an Instant Pot. And while technically they may be the same, the customers don't necessarily perceive it that way.

I don't know how deliberate it was but de-emphasizing its pressure cookerness was a smart move. A lot of people still associate "pressure cooker" with "explosions" and, even if they know intellectually they're not really dangerous, they'll still move on to the next item.

As others have said, the price is also quite reasonable--in fact, a lot of stovetop pressure cookers cost more. $100-ish is around the point where a lot of people will take a flyer on something and won't be too put out if it starts gathering dust after a few months.


I have read an interview with the owner where he said he deliberately added & marketed a bunch of redundant safety features because Americans are afraid of pressure cookers.


I wonder how one would properly make a safety valve that works even when smeared with food, or prevent the smearing from happening. Should the valve have a really large cross-sectional area, so that the force a food clot would have to withstand becomes large?


A pressure sensor that cuts off the electricity if the pressure gets too high. No more electricity, no more heating, and the pressure drops. Another one I've seen, on a pressure cooker my mom has had for probably 40 years or more, is a physical plug that will shoot out if the pressure gets too high and the release valve gets stuck. Of course, I'm not sure what the potential damage from this plug would be (I've only ever seen it shoot into the pot, when it was removed from the stove and cooled too quickly). But better than the whole thing popping.


The Instant Pot claims to have some sort of mechanism that will lower the cooking pot and break the seal in an overpressure situation: http://instantpot.us/benefits/safety-features/


I think almost everyone was expecting this; still, it's great to see it happen.

Amazon truly listens to its customers and delivers what they want, even when it has a competing in-house solution of its own. I do think that for new projects you'll see EKS being the more popular pick over ECS, which never reached quite the same mindshare as Kubernetes.


ECS will slowly be replaced; it doesn't make sense anymore, and AWS was smart to "kill" its own product. Either you do it like this, or you're out. Google and Azure have been offering Kubernetes for months now... Companies increasingly want solutions that allow them to "easily" migrate from one cloud to another - having everything based on a proprietary solution is a no-go...


ECS always felt rushed to me. The semantics of day-to-day operations always felt really awkward. Common things like rolling updates were a 3-step operation (or they were for us), and node replacement was a pain. It could be that we were using it wrong, but I never felt motivated to find out the right way. Kubernetes came along, I played with it briefly, and never wanted to go back.

I think ECS will, and should, be retired. It was a lurch in the right general direction, but ultimately missed the mark.


It WAS rushed. Amazon had customers banging on their door demanding container support in 2014 and they got ECS out the door, fast. Now it’s 2017 and Kubernetes seems to be the winner of the “orchestration wars” so they’re pivoting to that. Smart move of them to ship something and patch it up later if needed, they successfully defended against GKE’s assault and are vying to stay on top.


> Smart move of them to ship something and patch it up later if needed, they successfully defended against GKE’s assault and are vying to stay on top.

If you had to release a product to support the open source front runner, you did not successfully defend against it; you conceded after your tooling adoption failed.

As long as Kubernetes leads the way, lock in at any cloud provider is prevented (you can even move back on-prem when the winds shift again). Kudos to Google for enabling that, but they have their own motives (i.e. disrupting AWS uptake).


>As long as Kubernetes leads the way, lock in at any cloud provider is prevented

That might be a little strong. They still have lots of other proprietary offerings you might use along with K8S. Cloudwatch, various database services, Lambda, SQS, S3, etc.


Don’t rely on primitives without open alternatives unless you want to be chained to your vendor. Today it’s roasting Oracle during the keynote, tomorrow it’ll be today’s “underdog”. It’s easy to talk customer success when the money firehose flows.

MariaDB and PostgreSQL both work well outside of RDS.


I suppose, but there's some stuff that's just hard to avoid. Like tooling to break down cloud costs, or network configuration, monitoring, provisioning block storage, etc. You can get to less lock-in, but it's hard to get to none.

Last I saw, "pets" like databases were not optimal in K8S either. You can make it work, but it's higher effort.


I agree that more work is required on the K8S front. Vendor lock-in is a risk to insure against IMHO.

Disclaimer: I’m in risk management


Kubernetes wasn't the front runner when ECS was released. Nobody knew what would happen. There was a distinct chance ECS could own everything, or Mesos, or Docker’s orchestration, or something else altogether.


>Common things like rolling updates were a 3-step operation (or they were for us), and node replacement was a pain

I had the same experience until I created an ECS stack using CloudFormation, which made it much easier to manage.


They resisted vehemently for a long time, even when it was painfully obvious it was wrong, and you’d just end up having these totally surreal conversations with their ECS product manager that made so little sense they would make your head spin. Glad that’s over.


That’s cause it’d be way better for them if ECS “won”, but it didn’t, so they adapted. Similar to how Docker now has had to add Kubernetes support after being combative and dragging their feet for years.


It was just pride, since Google is their rival; it didn’t make any sense otherwise.


I'm glad that I can run Kubernetes on AWS now (again), but I still can't run ECS on my laptop... hello Amazon? Are you home?


I thought that’s what Blox is for https://github.com/blox/blox/blob/dev/README.md


Does this just use a tool that runs on your laptop, to schedule and manage ECS clusters on Amazon? That's really not what I was looking for... I was hoping to prototype ECS-based solutions without spending money on cloud resources.

I have a laptop with 8GB+ RAM and a fast SSD, it doesn't have any trouble running fairly complex constellations on Minikube and I could later rebuild and/or install them on a production Kubernetes cluster, without any changes.

Can I do something like that with Blox, or is this another different way to consume ECS and spend money on EC2 nodes to run containers?

Edit: I would be satisfied if you told me that I still need to consume some AWS services like SNS and CloudWatch to use this toolkit, but that with Blox I don't actually need to run my containers on ECS unless I want to expose them to the world.

I haven't found any tutorial or guide that indicates this is anything other than a different scheduler for ECS.


Thank you!


"ECS product manager" pretty much sums it up. They are gonna tow the ECS line till the ship sinks. A program manager would have been the real convo. Now the EKS product manager's star will shine.


They are at least a year late with this, though. People were expecting it last year as well. And it's still only a preview.


Exactly, this looks perfect for taking a screenshot of a page[1], or converting a page to a PDF[2] in just a few lines of code.

If you have an existing web service, this appears suitable for actual production usage, delivering features like PDF invoices and receipts, on-demand exports to multiple file formats (PNG/SVG/PDF), etc., which have quite different requirements compared to an automated testing framework.

[1] https://github.com/GoogleChrome/puppeteer/blob/master/exampl...

[2] https://github.com/GoogleChrome/puppeteer/blob/master/exampl...
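For reference, this is roughly the pattern those linked examples follow - a minimal sketch, not the verbatim example code; the URL and output paths are placeholders:

    // Capture a page as a PNG screenshot and as a PDF with Puppeteer.
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com', { waitUntil: 'networkidle2' });
      await page.screenshot({ path: 'page.png', fullPage: true });
      await page.pdf({ path: 'page.pdf', format: 'A4' }); // PDF generation requires headless mode
      await browser.close();
    })();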


Cloudcraft | Frontend Engineer | Earth | REMOTE (only), FULL-TIME or CONTRACTOR | https://cloudcraft.co

We're looking for experienced Frontend or Full-Stack Engineers with modern JavaScript and React experience.

Cloudcraft provides tools for software teams working with AWS. Join our small, 100% remote, engineering team and you will have the chance to make a big impact and take ownership of projects and your own work.

Our ideal candidate is self-motivated, has excellent written and verbal communication skills and has worked professionally with React or equivalent frontend experience. Functional programming and previous experience with AWS services, graphics programming (including games) or SVG rendering are a big plus.

Our stack consists of JavaScript (100% ES6+), with React on the frontend and Node.js on the backend and literally every single AWS service due to our product's unique nature. You'll have the opportunity for a lot of learning and experimenting on the job! We're bootstrapped, profitable and growing. Work from anywhere in the world, we don't care. I'm in NYC. To apply please email [email protected] with your resume or any links you'd like us to check out, and include "HN - Cloudcraft" in the subject line. No recruiters or agencies, please.


You should put some way of contacting you in your HN profile, or message me if you're open to remote work.


Sorry, but if a junior dev can blow away your prod database by running a script in his _local_ dev environment while following your documentation, you have no one to blame but yourself. Why is your prod database even reachable from his local env? What does the rest of your security look like? Swiss cheese, I bet.

The CTO further demonstrates his ineptitude by firing the junior dev. Apparently he never heard the famous IBM story, and will surely live to repeat his mistakes:

After an employee made a mistake that cost the company $10 million, he walked into the office of Tom Watson, the C.E.O., expecting to get fired. “Fire you?” Mr. Watson asked. “I just spent $10 million educating you.”


Seriously. The CTO in question is the incompetent one. S/he failed:

- Access control 101. Seriously, this is pure incompetence. It is the equivalent of having the power cord to the Big Important Money Making Machine snaking across the office and under desks. If you can't be arsed to ensure that even basic measures are taken to avoid accidents, acting surprised when they happen is even more stupid.

- Sensible onboarding documentation. Why would prod access information be stuck in the "read this first" doc?

- Management 101. You hired a green dev fresh out of college who has no idea how things are supposed to work, then fired him in an incredibly nasty way for making an entirely predictable mistake that came about because of your lack of diligence at your own job (see above).

Also, I have no idea what your culture looks like, but you just told all your reports that honest mistakes can be fatal and that their manager's judgement resembles that of a petulant 14-year-old.

- Corporate Communications 101. Hindsight and all that, but it seems inevitable that this would lead to a social media trash fire. Congrats on embarrassing yourself and your company in an impressive way. On the bright side, this will last for about 15 minutes and then maybe three people will remember. Hopefully the folks at your next gig won't be among them.

My takeaway is that anyone involved in this might want to start polishing their resumes: the poor kid and the CTO for obvious reasons, and the rest of the devs because, good lord, that company sounds doomed.


Yeah when I read that my first thought was that the CTO reacted that way because he was in fear of being fired himself. I wouldn't be at all surprised if he wrote that document or approved it himself.


So at what point are you allowed to fire someone for being incompetent? Blowing away the production database seems to rank pretty high.

Note that I'm not talking about the situation in this article. That was a ridiculous situation and they were just asking for trouble. I'm asking about the perception that is becoming more and more common, which is that no matter what mistakes you make you should still be given a free pass regardless of severity.

Is it the quantity of mistakes? Severity of mistakes? At what point does the calculus favor firing someone over retaining them?


> blowing away the production database seems to rank pretty high.

In this case, it's not a matter of degree, it's a matter of responsibility. The junior dev is absolutely not the one responsible for the prod db getting blown away, and is absolutely not responsible for the backups not being adequate. As somebody else mentioned, this is like leaving the electric cable to the production system strewn across the office floor, then firing the person who happened to trip over it.

I agree somebody's job should be in jeopardy, especially if the backups weren't adequate: not for a single mistake, but for a long series of bad decisions and shoddy oversight that led to this.


This has nothing to do with competence. Competence is not never making mistakes (if somebody tells you he never makes mistakes, it's actually more likely he's just incompetent enough not to notice them). Competence is arranging work in a way that mistakes don't result in a disaster. Which clearly wasn't the job of the junior dude, and very likely was the job of the CTO. I could easily count at least a half-dozen ways in which the situation was a total fail before the mistake even happened. So no, one shouldn't be given a free pass for mistakes. But one should expect mistakes to happen and deal with them as facts of life. As for killing a whole prod database, anybody who has spent some good time in ops/devops/etc. has war stories like this. Dropping the wrong instance, wiping the wrong system, disabling the network on the wrong system... A lot of people have been there and done that. If it hasn't happened to you yet and you're in a line of work where it can - it will. It will feel awful too. But it'll pass and you'll be smarter for it.


> Competence is arranging work in a way that mistakes don't result in a disaster.

I like this definition.


> So at what point are you allowed to fire someone for being incompetent?

If someone is still expected to be learning, mistakes (even large ones) are to be expected. Incompetence has to be judged against reasonable expectations. In the case here, there was a series of pretty severe mistakes, but deleting the database isn't what I'm thinking of.

Protecting the database was the job of the more experienced employees of the company, and ultimately the CTO. Some combination of people at the company were negligent, and the absence of actions taken to protect their systems shows a pattern of irresponsible behavior.


You fire people when they stop producing value for the company.

In my opinion, mistakes should never be considered the person's fault. The development process should be designed to prevent human mistakes. If mistakes happen, that only means the process has been designed poorly.


No kidding - he should get a bonus for finding a HUGE bug in their security lol.


You can't treat mistakes as no-ops. This event demonstrated a lack of attention.


Don't be ridiculous - the first day at a new job can be incredibly stressful and disorienting. And even if somehow this doesn't apply to you, keep in mind that it does to a lot of people.


Ha! I missed that it was his first day.

World's worst on-boarding guide!


Firing should only really be an option when someone doesn't respond well to re-training and education.


It's hard to imagine a startup allowing themselves to die because they're trying to patiently re-train and educate someone.

I know startups are a limit case, but we didn't bother to make those sorts of distinctions for this article, so it's worth considering.


If the startup is large enough to have employees who can be fired then it's large enough to train/educate them. 5 whys may be too many for a small startup but 1 is certainly too few.


Here are some simple, practical tips you can use to prevent this and other Oh Shit Moments(tm):

- Unless you have full-time DBAs, do use a managed db like RDS, so you don't have to worry about whether you've set up the backups correctly. Saving a few bucks here is incredibly shortsighted; your database is probably the most valuable asset you have. RDS allows point-in-time restore of your DB instance to any second during your retention period, up to the last five minutes. That will make you sleep better at night.

- Separate your prod and dev AWS accounts entirely. It doesn't cost you anything (in fact, you get 2x the AWS free tier benefit, score!), and it's also a big help in monitoring your cloud spend later on. Everyone, including the junior dev, should have full access to the dev environment. Fewer people should have prod access (everything devs may need for day-to-day work, like logs, should be streamed to some other accessible system, like Splunk or Loggly). Assuming a prod context should always require an additional step for those with access, and the separate AWS account provides that bit of friction.

- The prod RDS security group should only allow traffic from whitelisted security groups that are also in the prod environment. For those really requiring a connection to the prod DB, it is therefore always a two-step process: local -> prod host -> prod db. But carefully consider why you're even doing this in the first place. If you find yourself doing this often, perhaps you need more internal tooling (like an admin interface, again behind a whitelisting SG).

- Use a discovery service for the prod resources. One of the simplest methods is just to set up a Route 53 Private Hosted Zone in the prod account, which takes about a minute. Create an alias entry like "db.prod.private" pointing to the RDS instance and use that in all configurations. Except for the Route 53 record, the actual address of your DB should not appear anywhere. Even if everything else goes sideways, you've assumed a prod context locally by mistake, and you run some tool pointed at the prod config, the address simply doesn't resolve in a local context. (See the sketch after this list.)
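To illustrate that last point, here's a minimal sketch using the Node.js AWS SDK; the zone ID, record name and RDS endpoint are placeholders, and it assumes the private hosted zone already exists and is attached to the prod VPC:

    const AWS = require('aws-sdk');
    const route53 = new AWS.Route53();

    route53.changeResourceRecordSets({
      HostedZoneId: 'Z_PRIVATE_ZONE_ID', // placeholder: the prod private hosted zone
      ChangeBatch: {
        Changes: [{
          Action: 'UPSERT',
          ResourceRecordSet: {
            Name: 'db.prod.private',
            Type: 'CNAME', // points at the RDS endpoint; only resolves inside the prod VPC
            TTL: 60,
            ResourceRecords: [{ Value: 'mydb.abcdefgh12.us-east-1.rds.amazonaws.com' }],
          },
        }],
      },
    }).promise().then(() => console.log('db.prod.private upserted'));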


You made a lot of insightful points here, but I'd like to chime in on one important point:

> - Unless you have full-time DBAs, do use a managed db like RDS, so you don't have to worry about whether you've set up the backups correctly.

The real way to not worry about whether you've set up backups correctly is to set up the backups, then actually test and document the recovery procedure. Over the last 30 years I've seen nasty surprises beyond counting when people actually tried to restore their backups during emergencies. Hopefully checking the "yes, back this up" checkbox on RDS covers you, but actually following the recovery procedure and checking the results is the only way to not have some lingering worry.

In this particular example, there might be lingering surprises: part of the data might live in other databases, in storage facilities like S3 whose backups aren't in sync with the primary backup, or in caches and queues that need to be reset as part of the recovery procedure.
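One hedged sketch of regularly exercising the restore path, using the Node.js AWS SDK - instance identifiers are placeholders, and the actual verification queries and instance cleanup are left out:

    const AWS = require('aws-sdk');
    const rds = new AWS.RDS();

    async function restoreLatestSnapshot() {
      // Find the most recent automated snapshot of the prod instance.
      const { DBSnapshots } = await rds.describeDBSnapshots({
        DBInstanceIdentifier: 'prod-db', // placeholder
        SnapshotType: 'automated',
      }).promise();
      const latest = DBSnapshots
        .sort((a, b) => b.SnapshotCreateTime - a.SnapshotCreateTime)[0];

      // Restore it into a throwaway instance.
      await rds.restoreDBInstanceFromDBSnapshot({
        DBInstanceIdentifier: 'restore-test-' + Date.now(),
        DBSnapshotIdentifier: latest.DBSnapshotIdentifier,
      }).promise();
      // Once the instance is available: run sanity queries, compare row
      // counts, then tear the throwaway instance down again.
    }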


"Backups are a tax you pay for the luxury of restore" [1]

A lot of people pay the tax and never even try the lux.

[1] http://highscalability.com/blog/2014/2/3/how-google-backs-up...


Good blog post. This, I suggest, is its most essential point:

"Prove it. If you don’t try it it doesn’t work. Backups and restores are continually tested to verify they work"


And put a firewall between your dev machines and your production database. All production database tasks need to be done by someone who has permission to cross into the production side -- a dev machine shouldn't be allowed to talk to it.


I would argue that no machines should be allowed to talk to each other unless their operation depends directly on one another. If I want to talk to the database, I have to either SSH to a worker machine and use the production codebase's shell, or SSH directly to a DB machine and use a DB shell.

We've made things so that reports and similar read-only queries can be done from properly firewalled/authenticated/sandboxed web interfaces, and write queries get done by migrations. It's very rare that we need to write to the database directly and not via some sort of admin interface like Django's admin, which makes it very hard to do bulk deletions (it will very clearly warn you).


Would you recommend all these steps even for a single-person freelance job? Or is it overkill?


Depends. Do you make mistakes?

I absolutely do. "Wrong terminal", "Wrong database", etc. mistakes are very easy to make in certain contexts.

The trick is to find circuit-breakers that work for you. Some of the above is probably overkill for one-person shops. You want some sort of safeguard at the same points, but not necessarily the same type.

This doesn't really do it for me, but one person I know uses iTerm configured to change terminal colors depending on machine, EUID, etc. as a way of avoiding mistakes. That works for him. I do tend to place heavier-weight restrictions, because they usually overlap with security and I'm a bit paranoid by nature and prefer explicit rules for these things to looser setups. Also, I don't use RDS.

I'd recommend looking at what sort of mistakes you've made in the past and how to adjust your workflow to add circuit breakers where needed. Then, if you need to, supplement that.

Except for the advice about backups and PITR. Do that. Also, if you're not, use version control for non-DB assets and config!


For Windows servers I use a different colored background for more important servers.


I do this with bash prompt colors on all our servers. Prod is always red.


I don't do production support on freelance development jobs. Even if I have to sub the hours to one of my associates, I always have a gatekeeper. That being said, when I design systems the only way to get to production is via automation, e.g. something gets promoted to a prod branch in GitHub, and production automation kicks off a backup and then applies said changes. The trick is to have a gatekeeper and never have open access to production. It's easy even as a one-man shop. Git automation and CI are simple with tools like GoCD and other CI tooling, and only take a day or two to set up - faster if you are familiar with them.


It depends on how much is at stake. If the product does not have users yet, then there is only a small downside to accidentally killing the database, so it probably makes sense to loosen production database access in order to increase development speed. But if you already have a legacy system on your hands with many users and lots of data - then it's time to sacrifice some of the convenience of immediate production database access for security.


Depends on what you are hired for. If you are hired to create a web application and you spend time trying to create a stable environment with proper build processes, it might be looked upon poorly. Everyone has different priorities and some have limited budgets.


I agree, it's the fault of the CTO. To me, the CTO sounds pretty incompetent. The junior engineer did them a favor. This company seems like an amateur-hour operation, since data was deleted so easily by a junior engineer.


Yup, I've heard stories of junior engineers causing millions of dollars worth of outages. In those cases the process was drilled into, the control that caused it was fixed, and the engineer was not given a reprimand.

If you have an engineer that goes through that and shows real remorse, you're going to have someone who's never going to make that mistake (or similar ones) again.


Agreed. Several years ago, as a junior dev, I was tasked with adding a new feature: only allowing a user to have one active session.

So we added a "roadblock" post-auth with two actions: log out other sessions, and log out this session.

Well, the db query for the first action (log out other sessions) was missing a WHERE condition... the user_id!

Tickets started pouring in saying users were logged out and didn't know why. Luckily the on-call dev knew there was a recent release and was able to identify the missing where clause and added it within the hour.

The feature made it through code review, so the team acknowledged that everyone was at fault. Instead of being reprimanded, we decided to revamp our code review process.

I never made that kind of mistake again. To this day, I'm a little paranoid about update/delete queries.
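A hypothetical reconstruction of that bug, with made-up table and column names:

    // What shipped: "log out other sessions" with no user scope,
    // so it killed every user's other sessions.
    const buggy = 'DELETE FROM sessions WHERE id <> $1';

    // What was intended: scoped to the current user.
    const fixed = 'DELETE FROM sessions WHERE id <> $1 AND user_id = $2';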


We all make this mistake eventually, often in far more spectacular fashion. My lessons learned are:

1) Always have a USE statement (or equivalent);

2) Always start UPDATE or DELETE queries by writing them as SELECT;

3) Get in the habit of writing the WHERE clause first;

4) If your SQL authoring environment supports the dangerous and seductive feature where you can select some text in the window and then run only that selected text — beware! and

5) While developing a query to manipulate real data, consider topping the whole thing with BEGIN TRANSACTION (or equivalent), with both COMMIT and ROLLBACK at the end, both commented out (this is the one case where I use the run-selected-area feature: after evaluating results, select either the COMMIT or the ROLLBACK, and run-selected).

Not all of these apply to writing queries that will live in an application, and I don't do all these things all the time — but I try to take this stance when approaching writing meaningful SQL. (A rough sketch of points 2 and 5 follows below.)
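As a rough illustration of points 2 and 5 with node-postgres - the table, values and row-count check are placeholders, not a recommendation of any particular driver:

    const { Client } = require('pg');

    async function carefulUpdate() {
      const client = new Client(); // connection settings come from PG* env vars
      await client.connect();
      try {
        // Point 2: preview which rows the WHERE clause actually matches.
        const preview = await client.query(
          'SELECT id, amount FROM cashouts WHERE user_id = $1', [42]);
        console.log('would touch ' + preview.rowCount + ' rows');

        // Point 5: do the real write inside a transaction.
        await client.query('BEGIN');
        const result = await client.query(
          'UPDATE cashouts SET amount = 0 WHERE user_id = $1', [42]);
        if (result.rowCount === preview.rowCount) {
          await client.query('COMMIT'); // matches the preview: keep it
        } else {
          await client.query('ROLLBACK'); // surprising row count: back out
        }
      } finally {
        await client.end();
      }
    }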


#5 is the big one. I left WHERE off an UPDATE on "prod" data once. Fortunately it wasn't mission critical data that I ended up wiping and I was able to recover it from a backup DB. I never did anything in SQL without wrapping it in a transaction again.

I will note that depending on your DB settings and systems, if you leave the transaction "hanging" without rolling back or committing, it can lock up that table in the DB for certain accessors. This is only for settings with high levels of isolation such as SERIALIZABLE, however. I believe if you're set to READ_UNCOMMITTED (this is SQL Server), you can happily leave as many hanging transactions as you want.


> 2) Always start UPDATE or DELETE queries by writing them as SELECT;

On that point, I'd love a database or SQL spec extension that provided a 'dry-run' mode for UPDATE or DELETE and which would report how many rows would be impacted.

-- 123338817 rows will be affected

Oooops made an error!


> I'd love a database or SQL spec extension that provided a 'dry-run' mode for UPDATE or DELETE and which would report how many rows would be impacted.

I mean, if your DB supports transactions, you are in luck.

Start a transaction (that may be different among vendors - BEGIN TRANSACTION or SET AUTOCOMMIT=0 etc) and run your stuff.

If everything looks good, commit.

If you get an "oops" number of records, just roll back.


MySQL Workbench won't let you run an UPDATE or DELETE without a WHERE by default.


Not sure why this isn't the default in many tools. That would prevent 99% of these hazards.


Put the following into your ~/.my.cnf to enable it for the command-line client:

    [client]
    safe-updates=1


Thanks for the tip! However, put that in the [mysql] section, or you'll break the mysqldump command, which doesn't recognize that option.


We have to manually whitelist raw ActiveRecord queries that do full table scans, so this also helps with mistakes like this.


You are not alone.

There is even a song in Spanish about forgetting to add a WHERE in a DELETE: https://www.youtube.com/watch?v=i_cVJgIz_Cs


This is amazing.


> Luckily the on-call dev knew there was a recent release and was able to identify the missing where clause and added it within the hour.

This raises questions about deployment. Didn't the on-call have a previous build they could roll back to? Customers shouldn't have been left with an issue while someone tried to hunt down the bug (which they "luckily" located); instead, the first step should have been a rollback to a known-good build, with the bug tracked down before a re-release (e.g. reviewing all changesets in the latest).


Yup, agreed entirely. I'm a bit fuzzy on the details of that decision...he very well may have rolled back and then deployed a change later in the evening. It was fixed by the time I checked my email after dinner.


UPDATE cashouts SET amount=0.00 <Accidental ENTER>

Oops. That was missing a 'WHERE user_id=X'. Did not lose the client at the time (this was 15+ years ago), but that was a rough month. Haven't made that mistake again. It happens to all of us at some point though.


I'm beginning to think this is a flaw in SQL. It's so easy to bust the entire table.

Could've had something like 'UPDATE ALL' required for queries without filtering.


Just run BEGIN TRANSACTION beforehand.


I'm guessing this feature was never tested properly

We should all assume that code (or features) that isn't tested is broken.


At a former employer, we had a Megabuck Club scoreboard; you got your name, photo and a quick outline of your (very expensive!) mistake posted on it. Terrific idea, as:

a) The culture was very forgiving of honest mistakes; they were seen as a learning opportunity.

b) Posting a synopsis of your cockup made it easier for others to avoid the same mistake while we were busy making sure it would be impossible to repeat it in the future; also, it got us thinking of other, related failure modes.

c) My oh my was it entertaining! Pretty much the very definition of edutainment, methinks.

My only gripe with it was that I never made the honor roll...


We had something similar at one of my jobs; it's hard to relay in text, but it was really a fun thing. Mind you, this was at a Fortune 100 company, and the CIO would come down for the ceremony when we did it, to pass on the award. We called it the Build Breakers Award, and we had a huge trophy made up at a local shop, with a toilet bowl on it. If you broke the build and took down other developers, the passing-of-the-award ceremony was initiated: I would ping the CIO (as it was my dev shop), he would come down and do a whole sarcastic speech about how the wasted money was all worth it because the developers got some screw-off time while the breaker fixed the build. It was all in good spirit though, and people could not wait to get that trophy off their desk; it helped that the thing was probably as big as the Stanley Cup.


We built actual, physical thingamajigs; 'breaking a build' more often than not meant fragments of steel whizzing around the workshop while we were all getting doused in hydraulic fluid for our trouble.

Note to self: be very careful using Ziegler-Nichols to tune multi-kHp-systems. Very careful. Cough.


Yep. I had a junior working for me a few years ago who made a rather unfortunate error in production which deleted all of several customers' data. I could tell he was on pins and needles when he brought it to me, so I let him off the hook right away and showed him the procedures to fix the issue. He said something about being thankful there was a way to fix the problem, and I just smiled and told him A) it would have been my fault if there hadn't been; and B) he wouldn't have had the access he did without safeguards in place. Then I told him a story about the time I managed to accidentally delete an entire database of quarantined email from a spam appliance I was working on several years earlier. Sadly, my CTO at the time did NOT prepare for that.

I lost a whole weekend of sleep in recovering that one from logs, and that was when I learned some good tricks for ensuring recoverability....


Agreed. Also, why didn't they have a backup of some sort? The hard drive on the server could have failed and it would have been just as bad.

Sounds like an incompetent set of people running the production server.


Like a lot of companies, I bet they HAVE backups - they just never tested whether the backup process works. It's absurdly common...


This is trivial, though: just set up a regular refresh of the dev env via the backup system. Sure, it takes longer because you have to read the tapes back, but it's worth it for the peace of mind, and it means that every dev knows how to get at the tapes if they need to.


Well yes, but there's a string of WTFs here, lots of 'trivial' stuff that wasn't done!


From my experience, if I do not test my backups they stop working in about 1 year. So if I do not test backups for over a year then my assumption is that I probably do not have working backups.


Most likely something like this. There is probably backup software running but it's either nothing but failed jobs or misconfigured so the backups aren't working correctly.


My favourite in that regard is an anecdote shared by a customer; he said one of their techs had discovered (luckily in time) that the long-running automated backup script on a small, but important server wrote the backup volume to...

...RAM.


And here I've been wondering how it could get worse than writing to local storage ...


I can tell you since I work support for a backup product.

Lot of people think that naming a folder on a local drive "Disaster Recovery" counts as having an offsite Disaster Recovery copy of backups. The number of large corporations whose backups are in the hands of such people is frightening.


OP says the team is 40+ and CTO just let them all walk on a catwalk.


"It's your first day, we don't understand security so here's the combination to the safe. Have fun!!"


"we have a bunch of guns, we aren't sure which ones are loaded, all the safeties are off and we modified them to go off randomly"


"your first day's task will be to learn how to use them by putting them to the heads of our best revenue-generating sales people and pulling the trigger. don't worry it's safe, we'll check back in with you at the end of the day."


If someone on their first day of work can do this much damage, what could a disgruntled veteran do? If Snowden has taught us anything, it's that internal threats are just as dangerous as external threats.

This shop sounds like a raging tire fire of negligence.


He didn't follow the docs exactly. That doesn't matter, though: your first day should be bulletproof, and if it's not, it's on the CTO. The buck does not stop with junior engineers on their first day.


> He didn't follow the docs exactly

Sure, but putting the plaintext credentials for a readily-deletable prod db in the doc as the example, right before instructing someone to wipe their db, doesn't salvage much competence.


I wouldn't be surprised if the actual production db was never properly named and was left with an example name.


Don't tell Etsy that


Thanks for the Tom Watson quote, I'd never heard it before; it's a good one. I also agree with everything else you said - this is not the junior dev's fault at all.


He might be inept, but in this instance the CTO is mainly just covering his own ass.


"Yeah the whole site is buggered, and the backups aren't working - but I fired the Junior developer who did it" Is not how you Cover Your Ass ™.


You can be inept WHILE covering your ass. I'm not saying he's a genius.


Blaming the new guy is not covering your ass. Blaming the senior engineer who put the credentials in the document would be covering your ass.

