
This solution doesn't match the problem. Even the SQL injection example shows him sanitizing the input, which is at odds with the title of the post. Log4j is a more recent example of it being too late/useless to escape the output.


This is an example of why the term "sanitize" just brings confusion and leads to incorrect software. If we say "escape" (for concatenation) or "parameterize" (for discrete arguments) instead, then there's no confusion: we know that it should be done at the point of use, because the procedure for doing so depends on that use.
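To make the "procedure depends on the use" point concrete, here is a minimal sketch using only the Python standard library. The sample string is made up for illustration; the point is just that one raw value needs three different encodings for three different destinations, so no single up-front "cleaning" could be right for all of them.

```python
import html
import shlex
from urllib.parse import quote

# Hypothetical raw value, kept verbatim until the point of use.
raw = "Tom & Jerry's <b>"

# Each destination has its own rules, so each gets its own encoding.
as_html = html.escape(raw)      # for interpolation into HTML text or attributes
as_shell = shlex.quote(raw)     # for use as a single POSIX shell argument
as_url = quote(raw, safe="")    # for use as a URL path or query component

print(as_html)
print(as_shell)
print(as_url)
```

None of the three results is a "clean" version of the data; each is only meaningful in its own context, and the original string stays untouched.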

Calling it "sanitization" implies that the data is somehow dirty, so naturally it should be cleaned as soon as possible, and after that it's safe. But all that accomplishes in general is corrupting the data, often in an unrecoverable way, and then opening up security vulnerabilities because the specific use doesn't happen to exactly match the sanitization done in advance.

It's great to validate the data on input and make it conform to the correct domain of values, but conflating this with output formats and expecting this to take care of downstream security as well just leads to incorrect data along with security vulnerabilities.

PHP's long-ago-removed magic quotes feature was an example of this confusion in action. It not only mangled incoming strings containing single quotes in an effort to prevent SQL injection, but did so in a way that left some databases completely exposed, depending on their quoting syntax.
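A rough sketch of that failure mode, with assumptions stated up front: `magic_quotes` below is a hypothetical imitation of the old behavior (backslash-escaping quotes on the way in), not PHP's actual code, and SQLite stands in for "a database whose quoting syntax doesn't treat backslash as an escape".

```python
import sqlite3

def magic_quotes(value: str) -> str:
    """Hypothetical imitation of magic quotes: backslash-escape quotes on input."""
    return (value.replace("\\", "\\\\")
                 .replace("'", "\\'")
                 .replace('"', '\\"'))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Problem 1: legitimate data is mangled before it is ever used.
mangled_name = magic_quotes("O'Brien")

# Problem 2: SQLite escapes a quote by doubling it, not with a backslash,
# so the "sanitized" value still closes the string literal when concatenated.
attack = magic_quotes("' OR 1=1 --")
unsafe_sql = "SELECT name FROM users WHERE name = '" + attack + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Binding the raw value as a parameter matches no rows and leaks nothing.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", ("' OR 1=1 --",)
).fetchall()

print(mangled_name, leaked, safe)
```

So the up-front escaping both corrupts the name and fails to stop the injection, while the parameterized query needs no escaping at all.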


What?

SQL injection is avoided at the point of usage. Trying to sanitize your input against it is an extremely bad practice. The same is true about HTML injection (whether you call it XSS or something else).

Log4j is an example of needing to not interpret text that the developer was never aware was code. It's kind of the extreme opposite of escaping your text on usage.


The article says DON'T sanitize when putting it into the database. I think contextual escaping counts as "sanitizing input", so the solution of "don't try to sanitize input" is undermined.


If the user says his name is "Bob'; drop tables students --", that is what you should store in your database. Unless, of course, it's not a valid name for the rest of the system.

That's such old and obvious advice that I'm surprised people keep posting it here and upvoting it. And even more surprised when people keep disagreeing with it here.


If you're storing "Bob'; drop tables students --" in the database, you had to have sanitized your inputs, or there would be no students table.

The article title says NOT to sanitize inputs. Perhaps it's that nuance doesn't fit in a headline, but eh...


The confusion is what is input and what is output. The string "Bob'; drop tables students --" should not be sanitized/encoded on *input* to the application. However, if you're not using parameterized queries, it should be encoded on *output* to the database.

Data should only be sanitized in transit and not stored in a sanitized form. That's what the article is really saying.


No you don't. You use a parameterized query: execute("INSERT INTO foo VALUES (?)", user_input)
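For anyone who wants to try it, a minimal runnable version of that one-liner, assuming Python's built-in `sqlite3` module and an in-memory database (the `students` table and its single column are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

user_input = "Bob'; drop tables students --"

# The value is bound to the placeholder; it is never spliced into the SQL text.
conn.execute("INSERT INTO students (name) VALUES (?)", (user_input,))

# The table still exists and holds the string exactly as the user typed it.
stored = conn.execute("SELECT name FROM students").fetchone()[0]
print(stored == user_input)
```

No sanitizing happened anywhere: the input was stored verbatim and the table survived.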


I interpreted the message as not sanitizing inputs at the point they are received, a la PHP magic quotes. Instead, escape at the output (the output to the database engine).


> a la PHP magic quotes

To this day, the official way to deal with XSS in .NET is to sanitize at the receiving point. I imagine the article is directed at that.


That sounds pretty terrible, do you have an example of some docs which demonstrate that practice?


Nowhere in the article do they use "output" to mean from the database engine; they use it to mean "outputting HTML".


The article doesn't explicitly say the words "outputting SQL to the database engine", but that's because the focus is on XSS attacks and the part about SQL injection is just an aside. Clearly it's what they were trying to imply with language like this:

> The only code that knows what characters are dangerous is the code that’s outputting in a given context. And of course use your SQL engine’s parameterized query features so it properly escapes variables when building SQL: ... This is sometimes called “contextual escaping”.

The "context" is that you are outputting to the database engine.


  > your SQL engine’s parameterized query features so
  > it properly escapes variables when building SQL
This is wrong. Parameterized queries do not build an SQL string by escaping the input. The input is actually sent to the database separately from the SQL.

Well, in all sane implementations, anyway. PHP has a PDO::ATTR_EMULATE_PREPARES option that does build SQL from a parameterized query. And, of course, WordPress has $wpdb->prepare(), which returns an SQL string with the parameter escaped. Also, so far as I know, one cannot run a prepared statement from the SQLite CLI, so no parameterized queries there either:

https://stackoverflow.com/questions/20065990/how-to-prepare-...
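To illustrate the distinction, here is a sketch in Python with `sqlite3`. `emulate_prepare` is a hypothetical helper written for this comment, not how PDO or WordPress actually implement it; it is deliberately naive (string parameters only, no `?` inside literals) and uses SQLite's quote-doubling rule. The second insert uses a native placeholder, where the value is bound through the driver rather than placed in the SQL text.

```python
import sqlite3

def emulate_prepare(sql: str, params: tuple) -> str:
    """Hypothetical client-side emulation: splice quoted literals into the SQL."""
    parts = sql.split("?")
    if len(parts) - 1 != len(params):
        raise ValueError("placeholder count does not match parameter count")
    out = parts[0]
    for value, tail in zip(params, parts[1:]):
        # Escape for SQLite string literals by doubling single quotes.
        out += "'" + str(value).replace("'", "''") + "'" + tail
    return out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (v TEXT)")
user_input = "Bob'; drop tables students --"

# Emulated: the parameter ends up inside the SQL string itself.
built_sql = emulate_prepare("INSERT INTO foo VALUES (?)", (user_input,))
conn.execute(built_sql)

# Native: the SQL text keeps its placeholder and the value travels separately.
conn.execute("INSERT INTO foo VALUES (?)", (user_input,))

rows = [r[0] for r in conn.execute("SELECT v FROM foo")]
print(built_sql)
print(rows)
```

Both paths store the same value here, but the emulated one is only as safe as its escaping rules match the target database, which is the context-dependence the thread keeps coming back to.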


>This is wrong. Parameterized queries do not build an SQL string by escaping the input. The input is actually sent to the database separately from the SQL.

Your blanket observation is not necessarily true of all databases or database drivers. You found three counter-examples yourself, but there's no reason to not consider them "sane". It's not less correct than for databases that do support prepared statements in the driver protocol.


Sure, maybe it does not literally send a substituted SQL string, but in order to send the parameters "separately" from the query, do they not still eventually get concatenated into a single binary string of some form to be sent across the wire? In spirit I think the same arguments apply there, it's just that the format of the data is not strictly SQL. It's actually the wire format of the database protocol.


You are correct that the parameters go across the wire, obviously, but I've never heard of an attack in which the parameters caused any type of compromise in the wire protocol. I would highly appreciate examples if any exist.


It probably wouldn't result in an attack (unless you were dealing with a really sophisticated attacker), it's just necessary for correctness. Which is also true of all these examples: for example, people won't appreciate having backslashes wrongly inserted around legitimate characters of their names or other personal information, or having the software fail to process their request due to the characters in their name. It's not just a security concern.

In the general case there are certainly many examples of security vulnerabilities created by wrong serialization of data into the wire protocols of services, but maybe not specifically for this situation of query parameters. But maybe there are, I have no idea really. Either way, it's not the application developer's responsibility at that point, it's the responsibility of the people who developed the database driver.


For a long while, input sanitization in the web world was about modifying inputs to strip the problem areas. As such many consider escaping and sanitization to be completely different practices.

It seems like this article is using this differentiation. In my experience, it's very common. It's not worth arguing about.


The article is specifically about sanitizing inputs to prevent XSS attacks. Sanitizing input isn't a great defense against that; you need a defense that better matches the attack.

Validating or sanitizing input is a reasonably good defense against certain other things. E.g. zeroes in values you'll later divide by, when it's too late to return an error; multi-gigabyte names; information that you want to avoid storing, like credit card numbers. That sort of use case doesn't really have a whole lot to do with the article, though.
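A minimal sketch of that kind of validation, with made-up names and limits (`validate_request`, `MAX_NAME_LENGTH`, and the 200-character cap are all assumptions for illustration). Note that it rejects bad values while the request can still fail cleanly, and it never rewrites the data it accepts:

```python
MAX_NAME_LENGTH = 200  # assumed limit, purely illustrative

class ValidationError(ValueError):
    """Raised when input falls outside the accepted domain of values."""

def validate_request(name: str, divisor: int) -> tuple[str, int]:
    # Reject oversized or empty names up front instead of truncating them.
    if not name or len(name) > MAX_NAME_LENGTH:
        raise ValidationError("name must be between 1 and 200 characters")
    # Reject a zero divisor now, while returning an error is still possible.
    if divisor == 0:
        raise ValidationError("divisor must be non-zero")
    # Accepted values pass through unchanged: no quotes stripped or escaped.
    return name, divisor

print(validate_request("O'Brien", 4))
```

A name containing a quote passes straight through, because quotes are valid in names; escaping them is the job of whatever later outputs the value.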


Yeah, little Bobby Droptables is still a thing.


What are you referring to? The SQL injection example is showing what not to do.


"So the better approach is to store whatever [data] the user enters verbatim, and then have the template system HTML-escape when outputting HTML"

With this logic, someone could use a SQL injection. It wouldn't be sanitized as the INSERT is happening, so the SQL injection would be executed.

EDIT: I know he goes on to talk about escaping characters, but the title of the post is "Don't try to sanitize input". My point is simply that SQL injections happen on input, not output. His example of escaping the SQL is at odds with the title of the post.


They're calling the SQL query "output" (from the app to the DB server). The point is that the "bad characters" depend on the context, so it's the step where you combine trusted and untrusted data that you need to think about escaping or validating.


No they're not. They're using the word "output" to mean "back into the HTML".

"So the better approach is to store whatever name the user enters verbatim, and then have the template system HTML-escape when outputting HTML, or properly escape JSON when outputting JSON and JavaScript."


The sentence immediately after that is "And of course use your SQL engine’s parameterized query features so it properly escapes variables when building SQL"


Most SQL systems have bind parameters for this sort of thing. That is a form of encoding the input. You have to encode the SQL values as well. You're basically saying if you don't use the suggested technique, the suggested technique doesn't work. Well, yeah. It has to be used consistently, all the time, every time.

Unfortunately, that's just life. There's no way around it. One way or another you're going to be doing something or you're going to get owned.


They show the solution of using parameterized queries to store the user input verbatim. What is an example of the attack you have in mind?


Sorry, how does this happen if you’re using a DB parameter in the query string?



