It's actually pretty straightforward; we use tree-sitter for parsing and it take...

modderation · on May 31, 2024

With regard to your call for interesting language support, I'd add a very low-priority suggestion for documentation and specification formats: plain text, Markdown, simple HTML paragraphs and sections. If a new paragraph or sentence is added to a spec, or a MUST becomes a MAY, it'd be neat to surface the context of the change instead of a word/line diff.

As the Semantic History is not yet available, how do you envision it being displayed at the moment? What sort of information are you currently collecting? Is this tracked across the project history? To that end, are you building Tree-Sitter grammars and queries yourself, or are you using the pre-existing grammars and building the language support into Asqi?

For context, I've got a back-burner project that maps reimplemented code (manually mapped, with annotations or structured comments) to identifiable items in an upstream codebase and git repository. When the content of an item is changed upstream, affected downstream code could be flagged for review. It's still early in the design/prototyping phase, but it feels like there's some interesting overlap with Asqi.

st_asqi · on May 31, 2024

YES - this was very much on my mind actually. I was thinking we really need section outlining for Markdown (and for diffs etc, so like "two people want to edit this paragraph" when we have collision detection later). It didn't make it into the first cut but was and still is top of mind as I'm dogfooding it.

I have had some good ideas and some stupid ideas for semantic history, and many very mixed prototypes. The simplest option, and what I'll probably go with, is just a table of commits-modifying-this-thing. But that's not all the information; for example, a linear list of commits doesn't convey branch/merge topology well. I'm not sure how useful that information is, but I'd kinda like to see it.

The data I collect is two things: first, we can do a semantic git-diff operation in the backend; and second, we search-index the diffs so you can say "which commits modify entity X" and get a fast answer. That's what the UI will do when you focus on a function.
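
To make the "which commits modify entity X" lookup concrete, here is a minimal sketch assuming a plain in-memory inverted index. `EntityIndex`, `add`, and `commits_modifying` are hypothetical names for illustration only, not Asqi's actual backend or API.

```ruby
# Hypothetical in-memory index: entity name -> commits whose semantic diff touched it.
class EntityIndex
  def initialize
    @commits_by_entity = Hash.new { |hash, key| hash[key] = [] }
  end

  # Record that `commit` modified every entity in `entities`.
  def add(commit, entities)
    entities.each do |entity|
      commits = @commits_by_entity[entity]
      commits << commit unless commits.include?(commit)
    end
  end

  # Commits that modified `entity`, in insertion order (empty if unknown).
  def commits_modifying(entity)
    @commits_by_entity.fetch(entity, [])
  end
end
```

Feeding it one `add` call per commit (with the entity names produced by the semantic diff) is enough to answer the query without re-walking history.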

I use off-the-shelf TS grammars and write custom queries for them. I've got a custom abstraction in the backend that lets you query for ranges of nodes at once, which is how we jump to the top of comments above a function rather than to the function definition itself.

Your back-burner project is exactly where Asqi is headed -- and in fact, the backend for this is 100% done in Asqi but the frontend doesn't show the data yet. The idea is to determine things like, "two branches modify function X in two different ways" and even if the diffs don't collide, you get to see that there's a potential semantic conflict coming up. There's some potential opportunity to use past change data to detect when a function is sensitive to multiple editors, or when it's just a big list of calls/hooks or some such where it doesn't matter so much. So in the long term, I think of it as "how important is this potentially interesting conflict scenario" as some type of user-attention number that is priority-sorted so the user sees most-important first.
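
A rough sketch of the "two branches modify function X" check, assuming each branch has already been reduced to the list of entities it changed since the merge base. The `potential_conflicts` name and the input shape are hypothetical, and the number of branches touching an entity is only a stand-in for the user-attention score described above.

```ruby
# changes_by_branch: { "branch name" => ["entity", ...] } (hypothetical shape).
# Returns entities touched by more than one branch, most-contended first.
def potential_conflicts(changes_by_branch)
  editors = Hash.new { |hash, key| hash[key] = [] }
  changes_by_branch.each do |branch, entities|
    entities.uniq.each { |entity| editors[entity] << branch }
  end

  editors
    .select { |_entity, branches| branches.size > 1 }
    .sort_by { |entity, branches| [-branches.size, entity] }
    .to_h
end
```

A real score would presumably also weigh past change data, for example discounting functions that are just long lists of calls or hooks.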

(Btw, I would personally not like to hear that my pet project was being implemented by a non-free product; the silver lining here is that Asqi will always let you analyze private repos locally if you don't need to pull them automatically -- i.e. if you have local clones -- so it's probably free for what you're doing unless you're launching a SaaS product. I may also add some type of data export API later so you can use the Asqi backend to power other frontends.)

modderation · on May 31, 2024

> a table of commits-modifying-this-thing [...] a linear list of commits doesn't convey branch/merge topology well.

Agreed. Presenting both the local diff and the location in the commit graph seems like a better bet for helping people glean understanding of a change's purpose and context. I'm also thinking of using a table of per-item changes that's tied to the commit graph for topologically sorted history and reachability information. This will probably be backed by a per-commit list of item identifiers with their hashed content for easier comparison.
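
As an illustration of the per-commit list of item identifiers with hashed content, here is a small sketch. It assumes each commit can be reduced to a map of identifier to source text; `snapshot` and `changed_items` are hypothetical helper names, and SHA-256 is just one reasonable choice of hash.

```ruby
require "digest"

# items: { "identifier" => "source text of that item" } for one commit.
# Returns { "identifier" => content hash } so commits can be compared cheaply.
def snapshot(items)
  items.transform_values { |body| Digest::SHA256.hexdigest(body) }
end

# Identifiers that were added, removed, or whose content hash differs.
def changed_items(old_snapshot, new_snapshot)
  (old_snapshot.keys | new_snapshot.keys)
    .reject { |id| old_snapshot[id] == new_snapshot[id] }
    .sort
end
```

Comparing two snapshots then only needs the hashes, not the item bodies themselves.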

It sounds like your abstraction is doing a great job of representing file structure. For the most part, I'm just looking at telling users that they should look at a set of related symbols and revisions after an identifier's body has changed. The user is then responsible for performing a review and updating the "last-approved" information.

As a more concrete example, I'm expecting users to maintain their own mappings to items in the upstream sources:

  #[rawr(
      codebase = "reality",
      kind = "constant",
      identifier = "f_pi",
      path = "src/constants.h",
      revision = "123abc456",
      notes = "This probably shouldn't change, but it would be good to know if \
      the upstream team makes non-Euclidean alterations to the simulator."
  )]
  const PI: f64 = 3.14159;

If f_pi's contents have changed since revision 123abc456, the new value can be flagged for review. In the example case, upstream's f_pi was changed to a new value. The user should be informed that f_pi was updated in Reality's src/constants.h@1897246. They can review the upstream change, reimplement it in the downstream codebase, and update the metadata to reflect the coordinates of the last change.

  #[rawr(
      codebase = "reality",
      kind = "constant",
      identifier = "f_pi",
      path = "src/constants.h",
      revision = "1897246",
      notes = "Required by Legal Counsel for compliance with bill #246."
  )]
  const PI: f64 = 3.2;
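
The review check described above could be sketched roughly as follows. This assumes the upstream history of a single item is available as a list of revision and content-hash pairs, oldest first; `Mapping`, `review_needed`, and that history shape are hypothetical, and the sketch is in Ruby for brevity rather than as part of the annotation macro itself.

```ruby
# Downstream metadata for one mapped item (mirrors the annotation fields above).
Mapping = Struct.new(:identifier, :path, :revision, keyword_init: true)

# upstream_history: [[revision, content_hash], ...] for one item, oldest first.
# Returns a review message if the item changed since the pinned revision,
# or nil if it is unchanged (or the pinned revision is not in the history).
def review_needed(mapping, upstream_history)
  pinned = upstream_history.find { |revision, _hash| revision == mapping.revision }
  latest = upstream_history.last
  return nil if pinned.nil? || latest.nil?
  return nil if pinned[1] == latest[1]

  "#{mapping.identifier} changed in #{mapping.path}@#{latest[0]}"
end
```

Once the user has reviewed the change and bumped `revision` in the annotation, the same check returns nil again.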

I'm starting to think that the best way to present the changelist is to spit out deep links into an Asqi instance. By the sounds of it, you've also got all the necessary data in the self-hosted Asqi container's /db volume. If you don't mind, I'd like to see if I can directly consume that instead of building my own Tree-Sitter integration.

(Personally, I wouldn't want to hear that my non-free product was being implemented by someone's pet project :) Thankfully, I think we're heading in different directions, leveraging and presenting the same dataset in very different ways. In this case, I'm actually thrilled that someone else is implementing the machinery required by my pet project. Now I'm closer to exploring and following the fast-moving codebases that I wanted to reimplement in the first place!)

Nuzzerino · on May 31, 2024

Ruby with Sorbet would be cool, and C#.

st_asqi · on June 2, 2024

For Ruby + Sorbet, it looks like Sorbet fits into Ruby syntax so we can reuse the same grammar, but you'd probably want the signatures to be considered as part of the methods. Working with the example from their website:

  sig {params(name: String).returns(Integer)}
  def main(name)
    puts "Hello, #{name}!"
    name.length
  end

I think the idea is that Asqi should treat `sig {}` sort of like a comment: prefix metadata attached to `def main`, rather than as an unrelated imperative call (whereas `attr_reader` would not have this behavior). Do you know whether Sorbet continues to work if you alias `sig` and any of its other definitions? For example, can I do this?

  def mysig(&proc)
    sig(&proc)  # or some other transformation
  end

  mysig {params(name: String).returns(Integer)}  # does this work? I hope/assume not
  def foo(name)
    name.length
  end

If the above is disallowed, then it should be easy to implement Ruby with Sorbet support built in. Otherwise I can probably do Ruby by itself, but it would be unaware of annotations attached to methods since to my knowledge there's normally no structural connection between statements that precede definitions.
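
To illustrate the idea of treating `sig {}` as prefix metadata, here is a deliberately simplified, line-based sketch. It is not how Asqi does it (that uses tree-sitter grammars and custom queries); the `method_starts` helper is hypothetical, relies on regular expressions, and only handles a single-line `sig { ... }` directly above a plain `def name`.

```ruby
# Returns { "method name" => index of the first line belonging to it },
# pulling the start up by one line when a single-line `sig { ... }`
# sits directly above the `def`.
# Limitations: ignores `sig do ... end`, `def self.name`, and blank lines
# between the sig and the def.
def method_starts(source)
  lines = source.lines
  starts = {}
  lines.each_with_index do |line, index|
    name = line[/^\s*def\s+([A-Za-z_]\w*[?!=]?)/, 1]
    next unless name

    start = index
    start -= 1 if index > 0 && lines[index - 1] =~ /^\s*sig\s*\{.*\}\s*$/
    starts[name] = start
  end
  starts
end
```

Because only the literal name `sig` is matched, an aliased wrapper such as `mysig` would not be attached to the method, which is why the aliasing question matters.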

Nuzzerino · on June 2, 2024

> I think the idea is that Asqi should treat `sig {}` sort of like a comment: prefix metadata attached to `def main`, rather than as an unrelated imperative call (whereas `attr_reader` would not have this behavior). Do you know whether Sorbet continues to work if you alias `sig` and any of its other definitions? For example, can I do this?

No, it doesn't work if you alias it.

st_asqi · on June 2, 2024

Awesome, OK I believe I can make that work then. Very glad you mentioned Sorbet because I hadn't heard of it before but it will be cool to have it ship with Ruby out of the gate.

Nuzzerino · on June 2, 2024

That is very appreciated! Sorbet is shamefully underrated. While there are some weaknesses in the tool (which are fixable), the criticisms it typically gets from the community are unwarranted/unfair IMO. For example, I've heard that it makes the code too verbose. But in my experience, inserting T.cast or T.unsafe is more of a shortcut and often a code smell, and lately since becoming more fluent with it, I'm rarely using casts anymore (though it's unavoidable sometimes if you want to avoid T.untyped usages).

I think it has a solid foundation and puts Ruby at the top of my list of good languages if coupled together. It is very performant compared to other typed analysis tools in my opinion.

Please join the Slack community and ask questions if you have any. I'm not part of the Sorbet project or the companies that sponsor it, just an avid enthusiast. https://sorbet.org/en/community

l8nite · on May 31, 2024

Kotlin would be awesome!

st_asqi · on June 2, 2024

Looks like there's a grammar for it (https://github.com/fwcd/tree-sitter-kotlin) so I think it's on the table!