Just to not feed the witch hunt further, note that human 30 second solve times can be entirely possible for the easiest puzzles, with enough experience, practice, and a bit of risk-taking; see e.g. https://adventofcode.com/2021/leaderboard/day/1 part 1. But the 4 second solution times we saw last year are not, no matter how you look at it.
But yes, you would have people openly share repositories for automatically ingesting the puzzle text, solving the puzzles, and submitting the results the moment the puzzles opened, leading to inhuman solution times. So, despite the rules stating that you can't do that, the result is that whenever the puzzles were easy enough for an LLM to solve them with high probability – and most of them are – the leaderboards would be overrun with such solutions.
In 2023, the LLMs would still struggle enough that the overall leaderboard (taking all 25 ⋅ 2 puzzles into account) would still be dominated by ordinary people (many of them recording their solutions), in 2024 that was no longer the case. Personally I would go from being able to top 100 regularly, to almost never being able to. I'm going to miss the thrill, and think it's a bit saddening that we can't have nice things, but also ultimately think that getting rid of it is the best option.
I don't understand why this is "hard". Doesn't a donut have this coveted property? I can't think of a way to drill a hole in a donut that would allow a donut through.
Hm, yeah, tested it down to about 500 px width, and the low-resolution devices in Chromium but that was too optimistic then. The modals should of course be closeable, and both game boards simultaneously visible. Played around with the modals a bit, so maybe it works better now?
Yeah, the problem is in a sense solved asymptotically by the optimal construction in https://arxiv.org/abs/quant-ph/0302002, but that one tends to lead to long solutions in practice, so there's plenty of room to try to come up with solutions that give shorter solutions for concrete instances.
That's nifty! There's a lot of symmetry that can help to boil it down. For example, you actually only need row moves, and any solution with column moves can canonically be turned into one with row moves; post-composing with Ci→j is pre-composing with Rj→i.
One can think of the set of all possible board configurations as the vertices as a graph, with edges indicating how to move between configurations. Then your 1536 solutions are the 1536 distinct shortest paths between the starting and target configuration.
Then, you can also choose to consider not just board configurations, but board configurations up to simultaneous permutation of rows and columns; that will also reduce the number of unique solutions.
Heh, nice catch, I think you'll find that with a bit of work, you can make row reduction/Gaussian elimination work here as well. But that the resulting sequences of operations can get very long! One thing I personally like about the puzzle is that once you've played it for a few days, you start gaining some intuition about sequences of moves that are useful, but coming up with a good general algorithm (that also works for larger than 4x4 boards) is still a challenge.
Have you tried some generic pathfinding algorithm like D star lite on the graph with heuristic being the hamming distance from current node to the start ?
For the 4x4 board there is only 2^16 nodes, and 8*2^16 edges, so you can materialize the graph and get away with brute-forcing the whole graph.
But for bigger boards you won't be able to materialize the whole graph.
Maybe there are better heuristics to be found than the simple hamming distance. You should try have an AI look for them, by comparing the performance of a RL path planning vs the basic heuristic.
I tried implementing A* using pointwise Hamming distance, found that it was inadmissible (since it yielded a suboptimal result on par with my manual attempt), then tried again with row-wise Hamming distance but was pretty sure that's inadmissible too (although it did yield an optimal result). I then tried min(row-Hamming, column-Hamming) but I'm not convinced that's admissible either.
I then switched to pure Dijkstra which ended up being faster because evaluation was much cheaper at each step, and despite these heuristics being inadmissible, they didn't result in substantially fewer nodes expanded.
That's almost certainly a function of the problem size -- if it were 5x5, this approach would not have been as successful.
You may need to add some factor for the Hamming distance to make it admissible. On a 4x4 board each move change at most 4 bits. So 4 x the hamming distance should be OK.
Edit: I 'm maybe getting this wrong confusing lower bound and upper bound. Sorry I'm a little rusty.
Edit2: For 4x4 one lower bound is hamming distance/4 , because you need at least these many moves to reach the goal. For 5x5 hamming distance / 5 and so on... But not sure how much this will reduce the need to graph.
Hamming distance doesn't strike me as a useful metric here because how "close" two rows are is entirely dependent on what the other rows are. E.g. if you have a row of all 1's then two rows with maximal hamming distance are only one move away and if on average you have a bunch of n/2-weight rows then two rows different by 1 bit are not close. The best you can do is count the number of total rows matching the target I think?
Yeah, that was what I tried with row-Hamming / col-Hamming (namely: treat entire rows / cols as matches or not). I then used the min of the two to address those issues.
Either way, I guess my implementation had a bug -- A* does yield a significant speedup, but adding the 0.25x scaling factor to ensure that the heuristic is admissible loses almost all of those gains.
For some concrete numbers: with the bug that basically reduced to BFS, it ran in about 7s; with the bug fixed but a wildly inadmissible heuristic, it ran in about 0.01s; with the heuristic scaled down by 4x to guarantee its admissibility, it ran in about 5s.
I think scaling it down by 2x would be sufficient: that lower bound would be tight if the problem is one row move and one column move away from the goal state, but potentially all four rows and columns would not match. In that case, it ran in about 1.6s.
I know of some work on trying out various heuristics for A*; Section 5 of https://arxiv.org/pdf/2201.06508 gives some examples of what works and what doesn't. I don't think D* Lite specifically has ever featured. There's plenty of room for trying to come up with other heuristics, and just for other takes in general.
> But for bigger boards you won't be able to materialize the whole graph.
If we restrict to boards corresponding to solvable puzzles, the number of vertices is https://oeis.org/A002884 (1, 1, 6, 168, 20160, 9999360, 20158709760, …) and indeed grows quickly. It's possible to manipulate the 7×7 case (https://arxiv.org/abs/2503.01467, shameless plug) but anything bigger than that seems hard.
One can ask, for example, how many moves are needed for the hardest n×n Swapple. For n = 1, …, 7 the answers are 0, 3, 6, 9, 12, 15, 18 respectively, but we don't know what the answer is for n = 8.
I have lived in several parts of Denmark and never once encountered the notion of "lørdagsslik"; that concept sounds completely absurd and laughable. No, in every place I have stayed, kids would only ever have fredagsslik (Friday goodies), to be consumed while watching FredagsTamTam (https://www.dr.dk/drtv/serie/fredagstamtam_357146).