
Full disclosure: I am the author of https://github.com/gitsense/chat

> The idea behind the Agentic Data Stack is a higher-level integration to provide a composable software stack for agentic analytics that users can set up quickly, with room for customization.

I agree with this. For those who have been programming with LLMs, the difference between something working and not working can be a single sentence conveying the required context. I strongly believe data enrichment will be one of the main ways we make agents more effective and efficient. Data enrichment is the foundation for my personal assistant feature https://github.com/gitsense/chat/blob/main/packages/chat/wid...

Basically, instead of having agents blindly grep for things, you provide them with analyzers they can search with. By making it dead simple for domain experts to extract 'business logic' from their codebase/data, we can solve a lot of problems much more efficiently. Since data is the key, I can see why ClickHouse would make this move: they probably want to become the storage for all business logic.
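
To make that concrete, here is a minimal sketch of what analyzer-backed search might look like. This is not the actual gitsense/chat API; all the names are illustrative:

    // Illustrative only -- not the real gitsense/chat interfaces.
    interface Match { file: string; summary: string }   // enriched metadata, not raw text

    interface Analyzer {
      id: string;                 // e.g. "billing-rules"
      description: string;        // tells the agent when this analyzer applies
      search(terms: string[]): Promise<Match[]>;
    }

    // Instead of one grep pattern, the agent fans several candidate
    // terms out to every analyzer and works from the summaries.
    async function agentSearch(analyzers: Analyzer[], terms: string[]): Promise<Match[]> {
      const batches = await Promise.all(analyzers.map(a => a.search(terms)));
      return batches.flat();
    }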

Note: I will be dropping a massive update to how my tool generates and analyzes metadata this week, so don't read too much into the demo if you decide to play with it. I haven't really been promoting the tool because the flow hasn't felt right, but it should be after this week.


> all I want at this point, is my politicians to be smarter than me

I don't care if they are smarter than me. I need them to be smart enough to know they are not that smart. I don't expect politicians to be smart. I expect them to be good listeners and be the voice for the people.


> I don't expect politicians to be smart. I expect them to be good listeners and be the voice for the people.

I want both. I want them to be smart -- not necessarily domain-expert smart, but reasonably smart about making life-changing decisions for everyone, and to base those decisions on recommendations from domain experts.


> I'm in awe they are still allowing free users at all.

I am not.

> The free tier is enough for me to use it as a helper at work, and I'd probably pay for it tomorrow if they cut off the free tier.

You are sort of proving the point that this isn't crazy. They want to be the dealer of choice, and they can afford to give you the hit for free now.


> Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast.

Saying "technically" is really underselling the difference in intelligence in my opinion. Claude and Gemini are much, much smarter and I trust them to produce better code, but you honestly can't deny the excellent value that Qwen-3, the inference speed and $50/month for 25M tokens/per day brings to the table.

Since I paid for the Cerebras Pro plan, I've decided to force myself to use it as much as possible for the duration of the month while developing my chat app (https://github.com/gitsense/chat), and here are some of my thoughts so far:

- Qwen3 Coder is a lot dumber when it comes to prompting; Gemini and Claude are much better at reading between the lines. However, since the speed is so good, I often don't care, as I can go back to the message, make some simple clarifications, and try again.

- The max context window size of 128k for Qwen 3 Coder 480B on their platform can be a serious issue if you need a lot of documentation or code in context.

- I've never come close to the 25M tokens per day limit for their Pro plan. The most I use is about 5M/day.

- The inference speed + a capable model like Qwen 3 will open up use cases most people might not have thought of before.

I will probably continue to pay for the $50/month plan for these use cases:

1. Applying LLM-generated patches

Qwen3 Coder is very much capable of applying patches generated by Sonnet and Gemini. It is slower than what https://www.morphllm.com/ provides, but it is definitely fast enough that most people won't care. The cost savings can be quite significant depending on the work.
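
If you want a feel for the pattern, here is a rough sketch. It assumes Cerebras exposes an OpenAI-compatible endpoint; the base URL, model id, and prompt are all illustrative:

    // Sketch: a fast, cheap model applies a patch that a smarter
    // model generated. Endpoint and model id are assumptions.
    import OpenAI from "openai";

    const cerebras = new OpenAI({
      baseURL: "https://api.cerebras.ai/v1",     // assumed endpoint
      apiKey: process.env.CEREBRAS_API_KEY,
    });

    async function applyPatch(file: string, patch: string): Promise<string> {
      const res = await cerebras.chat.completions.create({
        model: "qwen-3-coder-480b",               // illustrative model id
        messages: [{
          role: "user",
          content: `Apply the patch to the file and return only the full, ` +
                   `updated file.\n\nFILE:\n${file}\n\nPATCH:\n${patch}`,
        }],
      });
      return res.choices[0].message.content ?? "";
    }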

2. Building context

Since it is so fast, and because 25M tokens per day is such a high limit for me, I find myself loading more files into context and asking Qwen to identify the files I will need and/or summarize things, so I can feed the result into Sonnet or Gemini and save significant money.
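
Roughly, the flow looks like this (a sketch reusing the cerebras client from the previous example; the prompt is illustrative):

    // Sketch: use the fast model as a context filter so the expensive
    // model only ever sees the files that matter.
    import { readFileSync } from "fs";

    async function shortlistFiles(task: string, paths: string[]): Promise<string[]> {
      const blob = paths
        .map(p => `=== ${p} ===\n${readFileSync(p, "utf8")}`)
        .join("\n\n");
      const res = await cerebras.chat.completions.create({
        model: "qwen-3-coder-480b",
        messages: [{
          role: "user",
          content: `Task: ${task}\n\nList only the file paths (one per line) ` +
                   `actually needed for the task.\n\n${blob}`,
        }],
      });
      // Only the surviving files get loaded into Sonnet/Gemini.
      return (res.choices[0].message.content ?? "").split("\n").filter(Boolean);
    }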

3. AI Assistant

Due to its blazing speed, you can analyze a lot of data quickly for deterministic searches, and because it can review results so fast, you can run multiple search-and-review loops without feeling like you are waiting forever.

Given what I've experienced so far, I don't think Cerebras can be a serious platform for coding if Qwen3 Coder is the only available model. Having said that, given the inference speed and Qwen being more than capable, I can see Cerebras becoming a massive cost-savings option for many companies and developers, which is how I think they might win a lot of enterprise contracts.


> A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task, LLMs seem incredibly bad at this.

This is how I designed my LLM chat app (https://github.com/gitsense/chat). I think agents have their place, but if you want to solve complex problems without needlessly burning tokens, you will need a human in the loop to curate the context. I will get to it eventually, but I believe that in the same way we developed different flows for working with Git, we will have different 'Chat Flows' for working with LLMs.

I have an interactive demo at https://chat.gitsense.com which shows how you can narrow the focus of the context for the LLM. Click "Start GitSense Chat Demos" then "Context Engineering & Management" to go through the 30 second demo.


I haven't looked at the code, but it might do what I do with my chat app, which is described at https://github.com/gitsense/chat/blob/main/packages/chat/wid...

The basic idea is, you don't search for a single term but rather for many. Depending on the instructions provided in the "Query Construction" stage, you may end up with a very high-level search term like 'beverage', or with terms like 'hot-drinks', 'cold-drinks', etc.

Once you have the query, you can do a "Broad Search", which returns an overview of each matching message; from there the LLM can determine which messages it should analyze further, if required.
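
In rough TypeScript, the two stages might look something like this (the function names and the in-memory index are illustrative, not how GitSense Chat is actually implemented):

    interface LLM { complete(prompt: string): Promise<string> }

    // Stage 1: expand the question into many candidate terms.
    async function queryConstruction(llm: LLM, question: string): Promise<string[]> {
      const out = await llm.complete(
        `List search terms, one per line, most general first, for: ${question}`);
      return out.split("\n").filter(Boolean);    // e.g. ["beverage", "hot-drinks"]
    }

    // Stage 2: broad search over per-message overviews (id -> overview),
    // returning candidates for the LLM to analyze further.
    function broadSearch(index: Map<string, string>, terms: string[]): [string, string][] {
      return [...index].filter(([, overview]) =>
        terms.some(t => overview.toLowerCase().includes(t.toLowerCase())));
    }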

Edit.

I should add that this search strategy will only work well if you have a post-message process. For example, after every message save/update, you have the LLM generate an overview. These are my instructions for the tiny overview https://github.com/gitsense/chat/blob/main/data/analyze/tiny... which focus on generating the purpose and keywords that can help the LLM define search terms.
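
The shape of that hook is simple. A sketch, reusing the LLM interface from the example above (the real prompt lives in the linked tiny-overview instructions):

    // Sketch of the post-message process: after each save/update, generate
    // a tiny overview (purpose + keywords) and index it for broad search.
    interface Message { id: string; text: string }

    async function onMessageSaved(llm: LLM, index: Map<string, string>, msg: Message) {
      const overview = await llm.complete(
        `In one or two sentences state the purpose of this message, then ` +
        `list search keywords:\n\n${msg.text}`);
      index.set(msg.id, overview);    // this is what "Broad Search" scans
    }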


That’s going to be incredibly fragile. You could fix it by giving the query term a bunch of different scores, e.g. its caffeine-ness, bitterness, etc. and then doing a likeness search across these many dimensions. That would be much less fragile.

And now you’ve reinvented vector embeddings.
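
For what it's worth, the "many scored dimensions" version is easy to picture (the scores here are made up):

    // Each item gets a vector of scores (caffeine-ness, bitterness, ...)
    // and a query matches by cosine similarity -- embeddings generalize
    // exactly this, just with learned dimensions.
    function cosine(a: number[], b: number[]): number {
      const dot = a.reduce((s, x, i) => s + x * b[i], 0);
      const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
      return dot / (mag(a) * mag(b));
    }

    const coffee = [0.9, 0.7];    // [caffeine-ness, bitterness]
    const decaf  = [0.1, 0.6];
    const query  = [0.8, 0.6];    // "something like coffee"
    console.log(cosine(query, coffee) > cosine(query, decaf));   // true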


You could instruct the LLM to classify messages with high-level tags; e.g., for coffee, drinks, etc., always include 'beverage'.

Given how fast inference has become, and given the context window sizes most SOTA models now support, I think summarizing and having the LLM decide what is relevant is not that fragile at all for most use cases. This is what I do with my analyzers, which I talk about at https://github.com/gitsense/chat/blob/main/packages/chat/wid...
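
A trivial way to enforce the "always include beverage" rule is to expand the classifier's tags against a small hierarchy after the fact (a sketch; the hierarchy is made up):

    // Expand specific tags so they always imply their parents.
    const parents: Record<string, string> = { coffee: "beverage", tea: "beverage" };

    function expandTags(tags: string[]): string[] {
      const out = new Set(tags);
      for (const t of tags) if (parents[t]) out.add(parents[t]);
      return [...out];
    }

    expandTags(["coffee"]);    // ["coffee", "beverage"]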


Inference is not fast by any metric. It is many, MANY orders of magnitude slower than alternatives.


Honestly, Gemini Flash Lite and models on Cerebras are extremely fast. I know what you are saying: if the goal is to get a lot of results that may or may not be relevant, then yes, it is orders of magnitude slower.

But if you take into consideration the post-analysis process, which is what inference is trying to solve, is it still orders of magnitude slower?


More like 6-8 orders of magnitude slower. That’s a very nontrivial difference in performance!


How are you quantifying the speed at which results are reviewed?


It’s not speed, but cost to compute.


It has become fast enough that another call isn't going to overwhelm your pipeline. If you needed this kind of functionality for high-performance computing, perhaps it wouldn't be feasible, but here it is being used to feed back into an LLM. The user will never notice.


Your readmes did a great job at answering my question "why is this file called 1.md? What calls this?" when I searched for "1.md". (The answer is 1=user, 2=assistant, and it allows adding other analyzers with the same structure.)


I'm guessing you are referring to https://github.com/gitsense/chat/tree/main/data/analyze or https://github.com/gitsense/chat/tree/main/packages/chat/wid...

The number is actually the order in the chat, so 1.md would be the first message, 2.md the second, and so forth.

If you go to https://chat.gitsense.com and click "Load Personal Help Guide", you can see how it is used. Since I want you to be able to chat with the document, I create a new chat tree and use the directory structure and the 1, 2, 3... markdown files to determine message order.
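
Reconstructing the thread from those files is straightforward. A sketch of the idea (the loader is illustrative, not the actual GitSense code):

    // Rebuild message order from the numbered markdown files
    // (1.md = first message, 2.md = second, ...).
    import { readdirSync, readFileSync } from "fs";
    import { join } from "path";

    function loadChat(dir: string): string[] {
      return readdirSync(dir)
        .filter(f => /^\d+\.md$/.test(f))
        .sort((a, b) => parseInt(a) - parseInt(b))   // numeric order, not lexicographic
        .map(f => readFileSync(join(dir, f), "utf8"));
    }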


https://github.com/gitsense/chat/blob/129210302ec06985bbd103... also says "put a 1.md here and the modular plugin structure will know to call it".


> We're putting aside the political stuff because there isn't a lot to discuss

I don't agree, as we are not quantifying the emotional aspect of the purchasing process. If people "love" the brand, they are willing to overlook a lot of things. Tesla was a status symbol and is now seen as a regret purchase and a toxic brand by many (see Europe and Canada for examples). I can't see how "politics" should not be considered, as it plays a critical role in how people spend money. There is a reason why a lot of companies are not open about politics, and I don't think I've ever seen a CEO as forthcoming with his beliefs as Elon Musk.


It's particularly interesting because cars are probably one of the most emotional purchases for a lot of people. Car makers know this and put a huge amount of effort into brand identity. It's also hard to think of another company of any kind where the CEO is so synonymous with the company and so public facing. Maybe Steve Jobs or Bill Gates in their heydays, but even they had a lower public profile relatively speaking. And finally, it's hard to think of another CEO of a major company who has so aggressively adopted and broadcast very polarizing political views.

Tesla is such a perfect storm that it's actually kind of amazing the stock hasn't completely tanked, which itself makes it an interesting discussion topic. They make a product where brand identity is super important, have a CEO who is unquestionably the public face of the company, and said CEO continues to go out of his way to try to alienate a significant portion of the potential customer base. I frequently see Teslas driving around with anti-Elon bumper stickers, which I've certainly never seen before for a car company. It's hard to imagine a world in which such consumer sentiment among any non-trivial percentage of your customer base isn't a death knell for the company.


> But in my testing, other models do not work well. It looks like prompts are either very optimized for Claude, or other models are just not great yet with such an agentic environment.

Anybody who has done serious development with LLMs knows that prompts are not universal. The reason Claude Code is good is that Anthropic knows Claude Sonnet is good, and they only need to create prompts that work well with their own models. They can also train their models to work with specific tools and so forth.

It really is a kind of fool's errand to try to create agents that can work well with many different models from different providers.


It will certainly be interesting to see how businesses evolve in the upcoming years. What is written in stone is that you (the employee) will be measured, and I am curious what developers will be measured by in the future. Will you be at greater risk of layoffs, lack of promotions, etc. if you spend more on AI? How do you, as a developer, prove that it is you and not the LLM that should be praised?


> So it is a game of being the one that is left standing

Or the last investor. When this type of money is raised, you can be sure the earlier investors are looking for ways to have a soft landing.


I'm not sure many investors are investing their own money. They are investing other people's money, maybe owned by shareholders of large companies in turn owned by our pension funds.


It might not be their money, but they are paid a management fee and if they cannot provide some return, people will stop using them.


The kind of thing that happens is Joe Bloggs runs the Fidelity Hot Tech fund, up 50% over the last three years. Then when it crashes that's closed and Joe is switched to the Fidelity Safe Income fund with no down years for the last five years.

