This seems to be a pretty common sentiment here. But here's my question: if your startup doesn't have big data now, should you assume it will never have big data?
I work with a startup which currently doesn't have "big data", but perhaps "medium data". I can probably manage without a big data stack in production. But if I look at the company/sales targets, then in the next 6-12 months we will be working with clients that guarantee "big data".
Now, here are my choices -
1. Stick to Python scripts and/or large AWS instances because they work for now. If the sales team closes deals in the next few months after working tirelessly, they will have sold the client a great solution, but in reality we can't scale, and we fail.
2. Plan for your startup to succeed, and plan according to the company targets. Try to strike a balance: a production stack that isn't huge overkill now, but isn't underbuilt either.
It's easy to say we shouldn't use a big data stack until we have big data, but it's too late (especially for a startup) to start building a big data stack after you already have big data.
Been there. Believe me: stick to Python scripts! Always. And when you finally land that first customer and you have trouble scaling, first scale vertically (buy a better machine) and work day and night to build a scalable solution. But no sooner.
Why? Because your problem is not technical, it is business related. You have no idea why your startup will fail or why you will need to pivot. Because if you did, it wouldn't be a startup. Or you would have had that client already.
You might need to throw away your solution because it is not solving the right problem. Actually, it is almost certain that it is solving a problem nobody is prepared to pay for. So stick to Python until people start throwing money at you - because you don't have a product-market fit yet. And your fancy Big Data solution will be worth nothing, because it will be so damn impossible to adapt it to new requirements.
I wish I could send this comment back in time to myself... :-/ But since I can't, how about at least you learn from my mistakes and not yours?
With tools like message queues and Docker making it so easy to scale horizontally, you often don't even have to scale vertically.
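The queue-plus-workers pattern behind that claim can be sketched in a few lines. This is a stdlib-only toy (in a real deployment the queue would be something like RabbitMQ or SQS, and each worker its own Docker container — those names are my assumption, not from the thread): jobs go into a shared queue, and you scale by adding workers.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# A shared work queue: producers put jobs in, workers pull them out.
work = queue.Queue()
for n in range(100):
    work.put(n)

results = []

def worker():
    # Each worker drains jobs until the queue is empty.
    while True:
        try:
            n = work.get_nowait()
        except queue.Empty:
            return
        results.append(n * n)  # stand-in for real processing

# "Scaling horizontally" here just means adding more workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(worker)

total = sum(results)
print(total)  # sum of squares 0..99
```

The point is that the application code never changes when you add capacity — you only turn the worker count (or container count) up.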
We just won an industry award at work for a multi-billion-data-point spatial analysis project that was all done with Python scripts + Docker on EC2 and PostgreSQL/PostGIS on RDS. A consultant was working in parallel with Hadoop etc. and we kept up just fine. Use what works, not what is "best".
> With tools like message queues and Docker making it so easy to scale horizontally you don't even have to go vertically.
That depends entirely on the workload. It's not always a good idea to move from one SQL instance to a cluster of them. Just buy the better machine; that buys you time to build a truly scalable solution.
I assume searching for [employer] could direct someone here, whereas searching for knz/me specifically would take you to my HN profile page.
I'm not ashamed of anything I've said on HN but would rather not have people just searching for my employer ending up here (especially since I work in an office that routinely deals with sensitive political and community issues). It's a minor amount of (perceived) anonymity vs stating my name/job title/employer here!
Well, they could start by using something faster than Python. I would tend to use Common Lisp, but Clojure would be the more modern choice.
But yes, scaling up is far easier than scaling out. A box with 72 cores and 1.5TB of DRAM can be had for around $50k these days. I think it would take a startup a while to outgrow that.
Python is plenty fast where it matters. You have heavily optimized numerical and scientific libraries (numpy and scipy) and can easily drop down to C if it matters that much to you. But in my experience, bad performance is usually a result of the wrong architecture and algorithms, sometimes even outright bugs, often introduced by "optimization" hacks which only make code less readable.
This holds for all languages, of course, not only Python. Forget raw speed, it is just the other end of the stick from Hadoop. Believe me, you don't need it. And even when you think you do, you don't. And when you have measured it and you still need it, ok, you can optimize that bottleneck. Everywhere else, choose proper architecture and write maintainable code and your app will leave others in the dust. Because it is never just about the speed anyway.
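A toy illustration of that point (my example, not from the thread): picking the right data structure dwarfs any constant-factor speedup a faster language would give you. Here the same membership query is answered with a list (O(n) per lookup) and a set (O(1) per lookup):

```python
import time

items = list(range(5000))
needles = list(range(0, 10000, 2))  # half present, half absent

# Wrong data structure: linear scan of the list per lookup.
start = time.perf_counter()
hits_list = sum(1 for n in needles if n in items)
list_time = time.perf_counter() - start

# Right data structure: hash lookup per query.
item_set = set(items)
start = time.perf_counter()
hits_set = sum(1 for n in needles if n in item_set)
set_time = time.perf_counter() - start

# Same answer, wildly different cost.
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

No compiler or "fast language" rescues the list version; the algorithmic fix does, in any language.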
I agree that it depends on what you're doing, and the speed of the language often doesn't matter -- that has to be the case or Python would never have caught on to begin with.
But you can write code in Common Lisp or Clojure that's just as readable and maintainable (once you learn the language, obviously) as anything you can write in Python, and the development experience is just as good if not better.
Your choice of "big data" vs. python scripts sounds just like the classic trade-off of scope creep vs. good enough.
IMO the answer is almost always "good enough". This has been expressed in countless tropes/principles from many wise people, like KISS (Keep It Simple, Stupid), YAGNI (You Ain't Gonna Need It), "premature optimization is the root of all evil", etc.
If you go the YAGNI route, then when your lack of scale comes back to bite you (a happy problem to have), you'll have hard data about what exactly needs to be scaled, and you'll build a much better system. Otherwise, you'll dig deeper into the premature-optimization rabbit hole of hypotheticals, and in that case, it's turtles all the way down (to use another trope).