Statements like this always feel a bit rude to me—as a Chinese, I use em dashes (in Chinese texts) on a daily basis and insert them in English texts when I see fit.
A bit of background: em dashes “—” (or, very often, double em dashes “——”) are to Chinese texts what hyphens “-” are to English texts. We use them in ranges “魯迅(1881—1936)”, in name concatenations “任—洛二氏溶液(Ringer-Locke solution)”, to express sounds (“呜——”火车开动了, or “Chouuuuuuuuu,” the train starts moving, in English), and in place of sentence breaks like this——just like em dashes in English texts. They are so commonly used that most Chinese input methods map Shift+- (the key that normally types an underscore “_”) to a double em dash. So while I see many English speakers having to resort to weird sequences like “Alt + 0151” for an em dash, a huge population in the world actually has no difficulty using em dashes. What a surprise!
As for this article, it was obviously translated from its Chinese version, so, yeah, I don't see em dashes as an AI indicator. And as for the weird emoji “🕊” (U+1F54A), I'm fairly certain it comes from the Chinese idiom “放鸽子” (to stand someone up; literally, to release doves/pigeons), which has evolved into “鸽了” (pigeoned), a humorous way to say “delayed, sorry!”.
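To make the keyboard point concrete, here is a small Python sketch that prints the code points involved. It assumes a Western-locale Windows system, where Alt+0150/Alt+0151 enter bytes 150/151 of the Windows-1252 code page (Python's `cp1252` codec); other locales use other code pages, and the names `describe`, `chinese_dash`, etc. are just illustrative.

```python
import unicodedata

def describe(s: str) -> list:
    """List (code point, Unicode name) for each character in s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in s]

# What Shift+- produces in most Chinese input methods: two em dashes.
chinese_dash = "\u2014\u2014"

# Alt+0150 / Alt+0151 enter bytes 150 / 151 of the active ANSI code page;
# assumed here to be Windows-1252, as on Western-locale Windows systems.
alt_0150 = bytes([150]).decode("cp1252")
alt_0151 = bytes([151]).decode("cp1252")

print(describe(chinese_dash))
# [('U+2014', 'EM DASH'), ('U+2014', 'EM DASH')]
print(describe(alt_0150 + alt_0151 + "-"))
# [('U+2013', 'EN DASH'), ('U+2014', 'EM DASH'), ('U+002D', 'HYPHEN-MINUS')]
```

In other words, the IME shortcut and the Alt code end up at the same character, U+2014; only the effort to type it differs.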
Totally agree. I don't think em dashes are a particularly useful AI tell unless they're used in a weird way. Left to my own devices (as a native English speaker who likes em dashes and parentheticals), I often end up with at least one em dash every other paragraph, if not more frequently.
On another note, it may be useful for you to know that in most English dialects, referring to a person solely by their nationality (e.g., when you wrote "as a Chinese") is considered rude or uncouth, and it may mark your speech/writing as non-native. It is generally preferable to use nationalities as adjectives rather than nouns (e.g., "as a Chinese person"). The two main exceptions are when employing metonymy, such as when referring to a nation's government colloquially (e.g., "the Chinese will attend the upcoming UN summit"), or when using the nationality to indicate broad trends among the population of the nation (e.g., "the Chinese sure know how to cook!"). I hope this is considered a helpful interjection rather than an unwelcome one, but if not, I apologize!
Thank you! It would indeed require extra effort for me to notice issues like this, and it is very nice of you to have pointed it out!
Speaking of personal devices, I also have a dedicated key binding for en dashes “–” (because, well, I already have a whole tap layer for APL symbols, and it costs nothing to add one more). Since we're on HN, I believe many people here could easily do the same if they wished, so I don't think en/em dashes are very telling either, especially on HN.
This clause is usually quoted together with the next one in the original text, Fan Zhongyan's essay 岳阳楼记:
> 先天下之忧而忧,后天下之乐而乐
> (put the world's worries before yours, and put your happiness after the world's)
Edit: this translation is wrong; raincole has a much better one.
Since the model is a language model, they probably use this to demonstrate the model's language capabilities – the model should be able to complete the whole sentence pair. The paper also mentions this:
> To ensure the model’s language capabilities, we introduced 10% of in-house text-only pretrain data.
So I believe it is just a text-only demonstration.
a) is clearly Simplified Chinese, copied from a sibling comment; b) is Traditional, copied from your comment; and c) is what I just typed in my own language. Unicode Hanzi/Kanji are a mess: characters can be the same or different, in appearance or in binary, depending on intended variants, languages, fonts, systems, keyboards, the distance between Earth and Alpha Centauri, etc.
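A rough Python illustration of "same or different, in appearance or in binary" using the standard `unicodedata` module. The specific characters are just examples picked for this sketch: a Simplified/Traditional pair, a CJK compatibility ideograph that NFC normalization rewrites, and a unified code point whose glyph depends on the font's language.

```python
import unicodedata

def codepoints(s: str) -> list:
    """Return each character of s as a 'U+XXXX' string."""
    return [f"U+{ord(c):04X}" for c in s]

# 1. Same meaning, different characters: Simplified vs. Traditional "pigeon".
simplified, traditional = "鸽", "鴿"

# 2. Same intended character, different binary: U+F900 is a CJK compatibility
#    ideograph canonically equivalent to U+8C48, so NFC normalization
#    silently replaces one with the other.
compat = "\uf900"
normalized = unicodedata.normalize("NFC", compat)

# 3. Same binary, different appearance: Han unification gives 直 a single code
#    point (U+76F4), yet Japanese and Simplified Chinese fonts draw it
#    differently; the string itself does not record which glyph was meant.
shared = "\u76f4"

for label, text in [("simplified", simplified), ("traditional", traditional),
                    ("compat", compat), ("NFC(compat)", normalized),
                    ("shared", shared)]:
    print(label, codepoints(text))
```

Case 2 is why two strings that look identical can fail a byte-for-byte comparison until both are normalized.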
Very location dependent. But when you learn to write the characters you understand the variants differently. They look like random strokes to an untrained eye. But they’re not. I’m not sure if that makes sense.
Take a lowercase a in English, for example. This font draws it differently from how a child writes it, or how it looks in cursive, or probably how you would write it yourself. But you recognize all of them and don’t really think about it.
Traditional forms are usually recognizable to me, but I'd be unsure or flat-out wrong about most Simplified versions. Overall proportions and small details often feel "wrong" for both as well, due to the cultures converging at different points.
... That post you linked was from two years ago, discussing JEP 295, which was delivered eight years ago. Graal-based AOT has evolved a lot ever since. And the answer even explicitly recommended using native images:
> I think what you actually want to do, is to compile a native image of your program. This would include all the implications like garbage collection from the JVM into the executable.
And it is this "native image" that all the comments above in this thread have been discussing, not JEP 295. (And Graal-based AOT in native images does remove the need to bundle a whole JRE.)
As someone based in China, I find it a bit surprising that techniques used by Chinese people get very few mentions here, since I think they are quite effective against access blocking, especially after coevolving with the GFW for the past decade. While I hope blocking in Indonesia never reaches the GFW's level, I will leave this here in case it helps.
I found this article [0] summarizing the history of censorship and anti-censorship measures in China, and I think it might be of help to you if the national censorship ever gets worse. As the article shows, access blocking in China can be categorized into several kinds (sorted by severity):
1. DNS poisoning by intercepting DNS traffic. This can be easily mitigated by using a DoT/DoH (DNS over TLS/HTTPS) resolver.
2. Keyword-based HTTP traffic resetting. You are safe as long as you use HTTPS.
3. IP blocking, or checking the unencrypted SNI (Server Name Indication) field of the TLS handshake. This requires a VPN or proxy.
4. VPN blocking by recognizing traffic signatures. (VPNs with identifiable signatures include OpenVPN and WireGuard, plus Tor and SSH forwarding if you count those as VPNs; basically any VPN that was not designed with obfuscation in mind.) This really levels up the blocking: if the government doesn't block VPN access, maybe any VPN provider will do; but if it does, you will have a harder time finding providers and configuring things.
5. Many other ways to detect and block obfuscated proxy traffic. This is the worst (that I'm aware of), but it also costs the government a lot to pull off, so you probably don't need to worry about it. But if you do, maybe check out V2Ray, Xray, Trojan, Hysteria, NaiveProxy, and many other obfuscated proxies.
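For item 1, the reason DoH defeats DNS interception is that the query travels inside an ordinary HTTPS request. Below is a minimal Python sketch of the RFC 8484 GET form: build a wire-format DNS query (RFC 1035) and base64url-encode it into the `dns` parameter. It only constructs the URL; actually sending it (with an `Accept: application/dns-message` header) and parsing the reply are left out, DoT (DNS over TLS on port 853) is not covered, and the endpoint below is a hypothetical placeholder, not a recommendation.

```python
import base64
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Encode a minimal RFC 1035 query message: header plus one question (class IN)."""
    # ID=0 (RFC 8484 suggests 0 for cache friendliness); flags 0x0100 sets RD.
    header = struct.pack("!6H", 0, 0x0100, 1, 0, 0, 0)  # QDCOUNT=1, no other records
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!2H", qtype, 1)  # QTYPE, QCLASS=IN

def doh_get_url(endpoint: str, name: str, qtype: int = 1) -> str:
    """RFC 8484 GET form: the wire-format query, base64url-encoded without padding."""
    wire = build_dns_query(name, qtype)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"

if __name__ == "__main__":
    # Hypothetical endpoint; substitute a real resolver's DoH URL.
    print(doh_get_url("https://doh.example/dns-query", "www.example.com"))
```

For `www.example.com` this yields the same `dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB` parameter used in RFC 8484's own example; to an on-path observer it is just HTTPS traffic to the resolver's host.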
But anyway, bypassing techniques always coevolve with blocking measures, and many suggestions here from non-Indonesians (including mine!) might not help. My personal suggestion is to find a local tech community and see what techniques they use, which could suit you better.
Is there any good DoT/DoH DNS resolver that works well in China? I know I can build one myself, but forwarding all DNS requests to my home server in NA slows down all connections...
> All JavaScript regexes engines that we could find, except one, use a backtracking implementation strategy [Chromium 2009; DukTape 2013; Hermes 2022; MuJS 2014; QuickJS 2020; WebKit 2018]. The exception is V8...
Well, maybe there's another: TRegex [1] in GraalJs [2].
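To show why the backtracking/non-backtracking split matters, here is a toy Python comparison. It is not V8's or TRegex's implementation, just the general idea: patterns are limited to a sequence of literal characters, each optionally marked `?`, which is enough to reproduce the classic pathological case a?{n}a{n} matched against a{n}. The backtracking matcher explores every choice of the optional items (at least 2^n calls), while the automaton-style matcher tracks all alternatives at once as a set of states, so its work is bounded by pattern length times text length.

```python
from typing import List, Set, Tuple

Item = Tuple[str, bool]  # (literal character, is_optional)

def pathological(n: int) -> List[Item]:
    """Build a?{n} followed by a{n}, written out item by item."""
    return [("a", True)] * n + [("a", False)] * n

def backtrack_match(items: List[Item], text: str) -> Tuple[bool, int]:
    """Full-string match by depth-first search; returns (matched, calls)."""
    calls = 0

    def go(i: int, j: int) -> bool:
        nonlocal calls
        calls += 1
        if i == len(items):
            return j == len(text)
        ch, optional = items[i]
        # Greedy: try consuming the character first, as backtracking engines do.
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        # On failure, fall back to skipping an optional item.
        return optional and go(i + 1, j)

    return go(0, 0), calls

def closure(items: List[Item], states: Set[int]) -> Set[int]:
    """Add every state reachable by skipping optional items (epsilon moves)."""
    result, stack = set(states), list(states)
    while stack:
        i = stack.pop()
        if i < len(items) and items[i][1] and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result

def automaton_match(items: List[Item], text: str) -> Tuple[bool, int]:
    """Simulate all alternatives in lockstep; returns (matched, state visits)."""
    steps = 0
    current = closure(items, {0})  # state i = "about to match items[i]"
    for c in text:
        steps += len(current)
        advanced = {i + 1 for i in current if i < len(items) and items[i][0] == c}
        current = closure(items, advanced)
    steps += len(current)
    return len(items) in current, steps  # state len(items) = accept

if __name__ == "__main__":
    n = 14
    items, text = pathological(n), "a" * n
    print("backtracking:", backtrack_match(items, text))
    print("automaton:   ", automaton_match(items, text))
```

Both report a match, but the call count of the first grows exponentially with `n` while the second stays polynomial.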
So I've been writing a JIT runtime for a Lisp dialect [0] for a while and got into the habit of feeling jealous of other languages with array primitives, especially after I learned what you can do with arrays or opaque lists (see V8 element kinds [1] for an example). And then I discovered this article (or complaint?) by Xah Lee.
Aesthetically speaking, I quite like conses, in that they are powerful and flexible enough to hold both trees and lists. But performance-wise, I guess it's always the simplest things that are hardest to optimize, and cons-style linked lists can't really compete with arrays [2] if you are aiming to be on par with V8.
However, I still believe that, with a JIT runtime, it is possible to back some Lisp lists with arrays, and I am working on it [3]. Currently, =setcdr= causes the most trouble for me, and I don't know whether the additional checks will cancel out the performance gains from arrays. But if things go well, we might get to see what conses could have cost us.
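A hypothetical Python sketch of what "lists backed by arrays" and the =setcdr= problem can look like, loosely in the spirit of Lisp Machine cdr-coding. This is not the implementation in [3]; `Backing`, `ArrayCons`, `make_list`, and `to_list` are invented names. Each cell is a view (backing array, index) whose cdr is implicitly "the next slot", so `nth` can index directly; once `setcdr` redirects a cell, every `cdr` must first check an override table, and `nth` falls back to walking the list.

```python
from typing import Any, Dict, List, Optional

class Backing:
    def __init__(self, cars: List[Any]) -> None:
        self.cars = cars
        # Explicit cdr values installed by setcdr, keyed by cell index.
        self.overrides: Dict[int, Any] = {}

class ArrayCons:
    """A cons cell that is really a view into a shared array."""
    def __init__(self, backing: Backing, index: int) -> None:
        self.backing = backing
        self.index = index

    def car(self) -> Any:
        return self.backing.cars[self.index]

    def setcar(self, value: Any) -> None:
        self.backing.cars[self.index] = value

    def cdr(self) -> Any:
        # The extra check every cdr now pays: was this cell redirected?
        overrides = self.backing.overrides
        if overrides and self.index in overrides:
            return overrides[self.index]
        nxt = self.index + 1
        return ArrayCons(self.backing, nxt) if nxt < len(self.backing.cars) else None

    def setcdr(self, value: Any) -> None:
        # All views sharing this backing see the change, as with real conses.
        self.backing.overrides[self.index] = value

def make_list(*items: Any) -> Optional[ArrayCons]:
    return ArrayCons(Backing(list(items)), 0) if items else None

def nth(n: int, lst: Any) -> Any:
    """Like Emacs Lisp's (nth N LIST) for N >= 0; None plays the role of nil."""
    if not isinstance(lst, ArrayCons):
        return None
    backing = lst.backing
    if not backing.overrides:
        # Fast path: no setcdr has touched this backing, so indexing is O(1).
        i = lst.index + n
        return backing.cars[i] if i < len(backing.cars) else None
    for _ in range(n):  # Slow path: walk cell by cell, honoring overrides.
        lst = lst.cdr()
        if not isinstance(lst, ArrayCons):
            return None
    return lst.car()

def to_list(lst: Any) -> List[Any]:
    out = []
    while isinstance(lst, ArrayCons):
        out.append(lst.car())
        lst = lst.cdr()
    return out
```

For example, after `xs = make_list(1, 2, 3, 4)` and `tail = xs.cdr().cdr()`, calling `xs.cdr().setcdr(make_list(9))` makes `xs` read as `(1 2 9)` while `tail` is still `(3 4)`, matching cons semantics, but only at the price of the override check on every `cdr`.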
If you want vectors, use vectors, elisp has them too as primitives. (I don't mean to suggest you don't know that, but still, you can just use vectors.)
I was writing from the perspective of a Lisp implementer: the question is not whether I can just use vectors, but how existing code represents sequences, and part of the runtime's job is to make that as fast as possible.
From what I see in existing ELisp code (at least in the Emacs codebase), the idiomatic representation of sequences (fixed-size or not) is the cons list. And that is not surprising: Emacs vectors are fixed-size, which makes them inflexible and suitable for only a few things. This matters because, for example, if you want Emacs to compete with VSCode in performance, you eventually end up comparing how idiomatic code performs. Note that how cons lists affect performance in real-world ELisp code remains unknown because it has yet to be benchmarked, but exposing the internals of idiomatic lists as conses does pose some challenges for an implementer aiming for further optimizations.
Emacs Lisp vectors being fixed-size seems like an easily fixable problem. More functions could be defined to do useful things with vectors, including mutations. If it is important to keep the existing type produced by make-vector immutable, a separate mutable variant could be introduced; the mutating functions would then blow up if applied to the immutable type.
A "JIT interpreter" for Emacs Lisp [1] built with Graal Truffle [2] in Java. It is really amazing how much frameworks these days simplify building a JIT runtime for a language. Currently I'm working on a pdump [3]-like feature for it.
Personally, and contrary to the article, I prefer Emacs's plain text widgets over more "GUI-like" ones. Plain text widgets minimize the differences between TUI and GUI Emacs and inherently offer text selection, searching, copying, and pasting, all of which integrate nicely with Emacs. I mean, not many GUI frameworks let you place a cursor within a button and select its text, do they? I believe this is a unique advantage of text-based widgets: while other GUI applications require a dedicated mechanism for searching through their settings, text-based widgets let you use any text-searching package to do the same.
Reading through the article, the author seems to be hoping for a pure GUI approach with Emacs-like navigation mechanisms, but I am not convinced that this can be as flexible as text-based widgets. However, for packages used exclusively within a GUI environment (like el-easydraw [1], which relies quite heavily on SVG-based widgets), it would be nice to have a dedicated GUI widget library.
(There was a discussion on Reddit about this a week ago [2], and I saw some comments defending GTK and PGTK that might be worth reading.)
This is what I was referring to when I talked about "rich verbs".
Many people feel that way. The idea here is not to tell you that you're wrong, but to understand what you want and do it better on the GUI side. TUIs can do a lot, and we should recognise their benefits. GUIs can do that too, and can sometimes do better things.
Text-based widgets rendered graphically do the trick. We can add things that can't be done in a TUI and see whether they give you anything useful. If something can be done with text widgets, that means it can be done in principle. GTK can't do it, and that's why I'm leaving it behind.
> (There was a discussion on Reddit about this a week ago [2], and I saw some comments defending GTK and PGTK that might be worth reading.)
The author of those comments abused their power on Reddit. I won't get into the weeds; I'll just say that I'd be happy to respond in good faith to the critique presented here.
I agree, and I only use Emacs in the console and love it. With the advent of LSP modes and real-time TypeScript checking and linting, it's fantastic. But there is one thing I don't like about the console: when LSP mode shows an inline error on the right side of the terminal, the error is wrapped and beautifully interspersed among your code, but it's impossible to select a multi-line error to paste into tools like ChatGPT, since selecting the error also selects your code on the left side.
[0] https://zh.wikisource.org/wiki/标点符号用法