You also need to subtract some bits from symmetries right?
Like you can mirror the tape. You can relabel states B and C. And you can relabel symbols 1, 2, 3. Though the analysis is complicated a little bit by the fact that some (combinations) of those operations may yield the exact same machine.
Only if you want to count the number of different possible TM behaviours.
The official count of BB TMs as given in OEIS sequence A052200 ignores such symmetries.
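To make the raw count concrete, here is a small sketch (my own, not from A052200 itself, though for k = 2 it reproduces that sequence's formula a(n) = (4n+4)^(2n)): each of the n*k transition-table entries independently picks a symbol to write, a direction, and a next state including the halt state.

```python
# Raw count of n-state, k-symbol Turing machines, symmetries ignored.
# Each of the n*k transition-table entries chooses:
#   a symbol to write (k ways) * a direction (2 ways)
#   * a next state including the halt state (n+1 ways).
def tm_count(n: int, k: int) -> int:
    return (2 * k * (n + 1)) ** (n * k)

# For k = 2 this matches OEIS A052200: a(n) = (4n+4)^(2n).
print(tm_count(1, 2))  # 8^2  = 64
print(tm_count(2, 2))  # 12^4 = 20736
```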
Also, is there any sense in restricting the universe of machines to those that include one and only one halting symbol? Machines with more or fewer than one halting symbol are possible, but they're less interesting, right?
If you mean multiple halting transitions, then it doesn't really help for BB purposes. If the machine does halt, then only one of the halting transitions will ever be taken, and the others will be 'dead code'.
If you mean multiple halting states, then that is possible, and it can be used as a model of computation for, e.g., computable functions returning a single bit as output. But you still don't gain any additional running time before the machine halts.
As for no halting transitions, the problem is choosing what point to measure the tape at. One thing you can do is to put a mark on one of the transitions, then see whether the machine only runs that transition a finite number of times, or if it keeps running it forever. This yields the "Beeping Busy Beaver" numbers [0], which are uncomputable even if you have an oracle for the halting problem.
What I mean is that instead of measuring the bits of a particular BB(n,k) program, we could define a function BBWOHT(n,k) ("Busy Beaver With One Halting Transition") and measure the bits of a particular BBWOHT(n,k) program, which would need fewer bits than the equivalent BB(n,k) program.
Perhaps this is a semantic game, but I'm trying to get at the idea that while there may be 60 bits of information in the BB(3,4) program, there are fewer in the BBWOHT(3,4) program: restricting ourselves to BBWOHT(n,k) lets us exclude a bunch of "uninteresting" BB(n,k) programs prima facie, in the same way that restricting ourselves to programs that aren't symmetries of other BB(n,k) programs does.
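For what it's worth, the 60-bit figure falls out of the same table-entry counting as before (assuming a (3,4) machine means 3 states and 4 symbols, with each of the 12 table entries having 2 * 4 * (3+1) = 32 options):

```python
import math

# Bits needed to single out one n-state, k-symbol machine under the
# full-table encoding: n*k entries, each with 2*k*(n+1) options
# (symbol to write, direction, next state including halt).
def tm_bits(n: int, k: int) -> float:
    return n * k * math.log2(2 * k * (n + 1))

print(tm_bits(3, 4))  # 12 entries * log2(32) = 12 * 5 = 60.0 bits
```

Any restriction (one halting transition, quotienting by symmetries) only shaves a constant or logarithmic number of bits off this, which is why it doesn't change the asymptotics.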
There are many inefficiencies in Turing Machines which you could try to work around with increasingly complicated encodings. But that will never yield a provably optimal machine encoding such as you can obtain with BBλ2 [1].
Can you clarify what you mean by BBλ being "provably optimal"? IIUC, BB functions for any Turing-complete model of computation should be equivalent up to a constant. Maybe something like: there exist N, c such that for all n >= N: BBλ(n) < BB(n + c), and vice versa. I'm not sure what the exact inequality is off-hand; maybe in this case we'd need to replace BB() with a version based on a binary encoding of TMs.
> for any other busy beaver BB based on self-delimiting programs, there is a constant c such that a(c+n) >= BB(n)
This requires an information theoretic measure of program size, such as bits or bytes, whereas the TM-based BB measures in states. I don't believe there are additively tight bounds on how many states are needed to encode an arbitrary string of n bits.