Its number of occurrences is 103,090. In the master's thesis identified as the original source (https://cs.uwaterloo.ca/~smwatt/home/students/theses/CSo2005...), the Unicode value of that operator is given as 2061, and the thesis helpfully explains that:
Unicode 2061, 2062 and 2063 are invisible operators. TeX does not have any of these invisible operators. These invisible operators result from the TEX to MathML conversion.
– 2061 – Function application
– 2062 – Invisible times
– 2063 – Invisible separator
And Wikipedia says that function application may be represented as
U+2061 FUNCTION APPLICATION — a contiguity operator indicating application of a function; that is, an invisible zero-width character intended to distinguish concatenation meaning function application from concatenation meaning multiplication. https://en.wikipedia.org/wiki/Function_application#Represent...
I'm not sure, though, how an automated conversion process would be able to distinguish between these.
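For concreteness, these are plain Unicode code points, so it's easy to see what a converter has to emit and to spot them in extracted text. A small Python sketch (the sample string is a made-up illustration, not data from the thesis):

    import unicodedata

    # The three invisible operators discussed above.
    INVISIBLE_OPERATORS = ["\u2061", "\u2062", "\u2063"]

    for ch in INVISIBLE_OPERATORS:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+2061  FUNCTION APPLICATION
    # U+2062  INVISIBLE TIMES
    # U+2063  INVISIBLE SEPARATOR

    def count_invisible_operators(text):
        """Count each invisible operator in a string, e.g. text content
        pulled out of converted MathML."""
        return {f"U+{ord(ch):04X}": text.count(ch) for ch in INVISIBLE_OPERATORS}

    # A converter would emit U+2061 between "sin" and "(x)" for sin(x),
    # and U+2062 between "2" and "x" for the product 2x.
    sample = "sin\u2061(x) + 2\u2062x"
    print(count_invisible_operators(sample))
    # {'U+2061': 1, 'U+2062': 1, 'U+2063': 0}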
No, the data (as described in So's thesis) was mathematical expressions extracted from TeX source code, so the surrounding text, email addresses, etc. were ignored. Skimming through by eye, I can't see @ in any of So's tables, and searching for its hex Unicode value (which the tables list for every other character) yields no hits: @ is not in the tables.
∋ is there anomalously often, and @ is missing, so something seems to have gone wrong, probably at multiple stages of the pipeline.
With tools like Ollama, self-hosting is easier than using a hosted service. No sign-up, no API keys, no permission needed to spend money, no worries about data security: just an easy install and a Python library import. Qwen2.5-VL 7B is proving useful even on a work laptop with insufficient VRAM - I just leave it running over a night or weekend, and it's saving me dozens of hours of work (that I then get to spend on other higher-value work).
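For anyone curious what "just import a Python library" looks like in practice, here's a minimal sketch with the ollama Python package. The exact model tag and the image path are placeholders, and it assumes the model has already been pulled:

    # Minimal sketch with the ollama Python package (pip install ollama),
    # assuming the Ollama server is running locally and the model has been
    # pulled first (e.g. `ollama pull qwen2.5vl:7b` - the exact tag may
    # differ). The image path is a placeholder for whatever you're processing.
    import ollama

    response = ollama.chat(
        model="qwen2.5vl:7b",
        messages=[
            {
                "role": "user",
                "content": "Transcribe the text in this image.",
                "images": ["scan_page_001.png"],  # hypothetical input file
            }
        ],
    )
    print(response["message"]["content"])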
I got the 70B Qwen/Llama distill; I have 24 GB of VRAM.
I opened aider and gave a small prompt, roughly:
Implement a JavaScript 2048 game that exists as flat file(s) and does not require a server - just the game HTML, CSS, and JS. Make it compatible with Firefox, at least.
That's it. Several hours later, it finished, and the game ran. It was worth it because this was in the winter and it heated my house a bit, yay. I think the resulting one-shot output is on my GitHub.
I know it was in the training set, etc., but I wanted to see how big of a hassle it was, whether it would one-shot with such a small prompt, and how long it would take.
Makes me want to try DeepSeek 671B, but I don't have any machines with >1TB of memory.
Buy a used workstation with 512 GB of DDR4 RAM. It will probably cost around $1-1.5k and be able to run a Q4 quant of the full DeepSeek 671B models. I have a similar setup with dual-socket 18-core Xeons (and 768 GB of RAM, so it cost about $2k), and I get about 1.5 tokens/sec on those models. Being able to see the full thinking trace on the R1 models is awesome compared to the OpenAI models.
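For a rough idea of what running this looks like, something like the llama-cpp-python bindings over a Q4 GGUF works; everything below (filename, thread count, context size) is illustrative rather than an exact setup:

    # Rough sketch with llama-cpp-python (pip install llama-cpp-python),
    # pointing at the first shard of a split Q4 GGUF of DeepSeek-R1.
    # Filenames and tuning values are illustrative placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="DeepSeek-R1-Q4_K_M-00001-of-00009.gguf",  # hypothetical filename
        n_ctx=8192,      # context window; bigger costs more RAM
        n_threads=36,    # roughly one per physical core on dual 18-core Xeons
        n_gpu_layers=0,  # pure CPU inference
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain Q4 quantization briefly."}],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])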
If/when Corporate Legal approves a tool like Ollama for use on company computers, yes. Might not require purchasing anything, but there can still be red tape.
I never claimed that it did. Gemini would probably save me the same dozens of hours, but it would come with ongoing costs and additional start-up hurdles (some near-insurmountable in my organisation, like data security for some of what I'm doing).
Gemini Flash or any free LLM on OpenRouter would be orders of magnitude faster and effectively free. Unless you're concerned about the privacy of the conversation, the benefit is really just being able to say you did it locally.
I definitely do appreciate and believe in the value of open-source / open-weight LLMs - but inference is so cheap right now for non-frontier models.
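To give a sense of how little setup the hosted route needs, here's a minimal sketch using the standard openai client against OpenRouter's OpenAI-compatible endpoint; the model slug and key are placeholders:

    # Minimal sketch of the hosted route: OpenRouter exposes an
    # OpenAI-compatible API, so the standard openai client works with a
    # different base_url. The model slug and API key are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",
    )

    resp = client.chat.completions.create(
        model="google/gemini-2.0-flash-001",  # or any free model slug on OpenRouter
        messages=[{"role": "user", "content": "Summarize this paragraph: ..."}],
    )
    print(resp.choices[0].message.content)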
Texas has a plurality of fatal car accidents (for the USA), but California is not far behind, and in 2022 California had slightly more deaths. (This page doesn't have the number of fatal car accidents for 2022, which is a bit odd.)
You're not looking at absolute numbers, which is what plurality means. I don't see how "someone in the US is more likely to die in a car wreck in Texas even if they never go to Texas" could make sense.
A driver in the US dies while driving due to a crash/wreck/whatever.
Statistically, that occurred with the highest probability in TX. As I said, this was like 2015-2019 when I used to claim this. There are signs on freeways in TX that say "highway deaths so far in <year>: <16-bit int>", which led me to start looking into it, and I think my little quip is just a way to draw attention to how dangerous it is to drive in TX. But it is quite large, Texas.
No, that works well IME. If it's worth something towards the final grade, even 1%, most students will do it. It can be hard to persuade some of my students not to spend multiple hours attempting to get 0.1% more of the course grade by doing another quiz attempt when they've already achieved 90% - I think they're better off moving on to the next thing.
This one is customisable. The firmware uses QMK, so you can remap it however you like. You'd need to make some key label stickers in Inkscape or something if you want the keys to show the characters.
There was a discussion here a couple of weeks ago (with a typo in the title): https://news.ycombinator.com/item?id=44110219