
what could go wrong


Came here to say this.


I feel this makes some fundamental conceptual mistakes and is just riding the LLM wave.

"Semantics" is literally behavior under execution. This is syntactical analysis by a stochastic language model. I know the NLP literature uses "semantics" to talk about representations but that is an assertion which is contested [1].

Coming back to testing, this implicitly relies on the strong assumption that the LLM correctly associates the code (syntax) with assertions about its properties under execution (semantic properties). That is a very risky assumption considering, once again, that these models are stochastic in nature and cannot even guarantee syntactic correctness, let alone semantic. Even being generous about the former, there is a track record of the latter failing and producing subtle bugs [2][3][4][5]. Not to mention the observed tendency of LLMs to "agree" with whatever premise is presented to them.

It also somewhat misses the point of testing, which is the engineering (not automation) task of reasoning about code and doing QC (even if the tests are later run automatically; I'm talking about their conception). Relegating that to an LLM is a dangerous, albeit tempting, decision. Fuzzing, sure. But not assertions about program behavior.
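
To make the distinction concrete, here's a minimal sketch (hypothetical function under test, using the hypothesis property-based testing library): the inputs can be generated mechanically, but the semantic properties are still stated by a human.

  from hypothesis import given, strategies as st

  def sort_unique(xs):  # hypothetical function under test
      return sorted(set(xs))

  @given(st.lists(st.integers()))  # input generation: fine to automate
  def test_sort_unique(xs):
      out = sort_unique(xs)
      # Human-written assertions about behavior under execution:
      assert out == sorted(out)        # output is ordered
      assert set(out) == set(xs)       # no elements gained or lost
      assert len(out) == len(set(xs))  # duplicates removed

The generation step is the fuzzing part; the three assertions are exactly the part I'd want a human to reason about.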

[1] A Primer in BERTology: What we know about how BERT works https://arxiv.org/abs/2002.12327 (Layers encode a mix of syntactic and semantic aspects of natural language, and it's problem-specific.)

[2] Large Language Models of Code Fail at Completing Code with Potential Bugs https://arxiv.org/abs/2306.03438

[3] SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? https://arxiv.org/abs/2502.12115 (best models unable to solve the majority of coding problems)

[4] Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT https://arxiv.org/abs/2304.10778

[5] Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions https://arxiv.org/abs/2308.02312v4

EDIT: Added references


That was my suspicion as well. This is outside my area of expertise, but from what I can tell the dosing isn't (multiple) orders of magnitude larger.

The article mentions dosing about 4 times higher than what has been detected in humans when MPs enter through medical supplies (including surgery).

  - "about 12 μg of MPs [microplastics] per milliliter of blood have been detected in human blood. [prev studies mentioned above]"  (I haven't read those references, though, just going off a quick skim)
  - "We would like to bring mouse blood MPs to this level by injection."
  - "the diluted final concentration after entering the bloodstream should be blood of about 50 μg/mL" [sic]

A quick search for levels detected in humans led me to this paper [1], which gives 1.84-4.65 μg/mL, though with "a mean particle length of 127.99 ± 293.26 µm (7-3000 µm), and a mean particle width of 57.88 ± 88.89 µm (5-800 µm)", compared to the uniform 5-μm-diameter microspheres used in the submitted article.

So the mouse dosing is (compared to humans):

  - 4 times higher than contamination through medical interventions (if I understand correctly)
  - 15 times higher than typical background contamination (based only on the one article)

So higher, for sure, but still fairly close to heavily contaminated cases. Not sure how the particle size factors into it.
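
A back-of-the-envelope check of those ratios (just a sketch; 50 and 12 μg/mL are the figures quoted above from the article, 1.84-4.65 μg/mL the range from [1]):

  mouse_target = 50.0                  # μg/mL, injection target from the article
  medical_human = 12.0                 # μg/mL, detected after medical interventions
  ambient_lo, ambient_hi = 1.84, 4.65  # μg/mL, range reported in [1]

  print(mouse_target / medical_human)                    # ~4.2x
  print(mouse_target / ((ambient_lo + ambient_hi) / 2))  # ~15.4x vs the midpoint
  print(mouse_target / ambient_hi)                       # still ~10.8x at the high end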

[1] Microplastics in human blood: Polymer types, concentrations and characterisation using μFTIR https://doi.org/10.1016/j.envint.2024.108751

EDIT: formatting and rephrasing


Interesting! Some overlap with these Firefox add-ons:

- Tree Style Tabs: https://addons.mozilla.org/en-US/firefox/addon/tree-style-ta... (simpler, no session-saving functionality)

- Tree Tabs: https://addons.mozilla.org/en-US/firefox/addon/tree-tabs (more complex, can also save sessions, but incompatible with some other add-ons and not evaluated for security by Mozilla)

Neither rearranges tabs in the window; both just offer an alternate tree listing of open tabs.


Sidebery is a more advanced version of those.


Firefox did have an add-on to show tabs in a grid in the same window, but it died along with XUL.


Wish there were also something with Miller columns.


I've only ever seen FoxyTab rearrange tabs in the window.


Commissioning it is a huge conflict of interest though, especially since it's used as promotional material for their consulting.

Reminds me of the study saying "a teaspoon of honey per day is healthy" with funding from the American Honey Something Association.


Engprax is the consultancy that commissioned the study, mostly for self-promotion.

Hazardous levels of sodium with this one.


If anyone works in bioinformatics: please, please, for the love of god, generate your own unique IDs. Database identifiers are not generally unique (the same ID might get reused, e.g. for protein variants). Even sequences are problematic: I've found the same sequence under different IDs (can't remember the db now), and they can change (sequencing or human error might have occurred, and records get updated).
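
For illustration, a minimal sketch of what I mean (all names and fields here are hypothetical): derive your internal key from everything that identifies the record, so a reused accession or a corrected sequence can't silently collide.

  import hashlib

  def internal_id(source_db: str, accession: str, version: str, sequence: str) -> str:
      # Any change in source db, accession, version, or sequence yields a new key.
      blob = "\x1f".join([source_db, accession, version, sequence.upper()])
      return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]

  # e.g. internal_id("some_db", "P01308", "2024_06", "MALWMRLLPL") -> stable 16-hex key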


Side-note: I believe in many such cases (cancers and other serious diseases), the "placebo" is actually the existing standard treatment (not sugar pills), as it would be unethical to withhold treatment.


From the post: "It appears to be a pretty standard stealer, attempting to grab cookies, passwords, wallets, etc." And by the look of the decrypted payload, it's targeting Windows.


It still largely comes down to incentives from what I've seen. A lot of the time, all anyone (from the researcher to the reviewer) cares about is the paper. Journals don't check that code actually works, and a lot of researchers don't spend time preparing their code; they feel there's no need, since they already got a new article on their CV. It's true that they may not have the skills and experience to produce good, shareable code (depending on the area), but often:

- there's no time to prep code, since they have 3 other projects going on and a crazy work pace

- the code is seen as incidental and secondary; what matters to them is the figures and results

- some groups want to milk a topic for a few papers, so they guard their code and data

Luckily, at least plenty of journals now demand access to data, or even making it public.


In fact, there are even more incentives for researchers to make reproducing their work as hard as possible. For example, what if someone tries to reproduce it and finds contradictory results? In either case (the reproducer made a mistake, or the original authors did), it's additional hassle from which the original authors can basically only suffer and never gain.


This is just you confirming that tons of research is essentially fraudulent. If it can be contradicted, it absolutely should be; that is how fields progress and weed out bad ideas.

