The less verbose way of doing semijoins is by an IN subquery.

wvbdmp · 2025-10-18T16:23:35 1760804615

>subquery

>less verbose

Well…

In any case, it depends. OP nicely guarded himself by writing “overusing”, so at that point his pro-tip is just a tautology and we are in agreement: not every use of DISTINCT is an immediate smell.

Sesse__ · 2025-10-18T16:39:00 1760805540

What do you mean? Here are your real alternatives for doing a semijoin (assuming ANSI SQL, no vendor extensions):

  SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2 WHERE t2.x = t1.x );
  SELECT * FROM t1 WHERE x IN ( SELECT x FROM t2 );
  SELECT * FROM t1 JOIN ( SELECT DISTINCT x FROM t2 ) s1 USING (x);

Now tell me which one of these is the less verbose semijoin?

You could argue that you could fake a semijoin using

  SELECT DISTINCT * FROM t1 JOIN t2 USING (x);

or

  SELECT * FROM t1 JOIN t2 USING (x) GROUP BY t1.*;

but it doesn't give the same result if t1 has duplicate rows, or if there is more than one t2 matching t1. (You can try to fudge it by replacing * with something else, in which case the problem just moves around, since “duplicate rows” will mean something else.)

wvbdmp · 2025-10-18T17:34:53 1760808893

No, sorry, you’re certainly correct, I just meant that any subqueries are generally crazy verbose. And then you usually want additional Where clauses or even Joins in there, and it starts to stop looking like a Where clause, so I’m often happy when I can push that logic into From.

Sesse__ · 2025-10-18T17:38:04 1760809084

Yes, I would certainly prefer if you could write

SELECT * FROM t1 SEMIJOIN t2 USING (x);

although it creates some extra problems for the join optimizer.

Little_Kitty · 2025-10-18T20:04:26 1760817866

It's great being able to use an any join (and the counterpart anti join) in Clickhouse to deal with these operations.