I'm James, based in Scotland. For 7 years I ran Open Tech Calendar, listing mainly in-person tech events.
After a pause I'm now relaunching Open Tech Calendar to list virtual events. This shift in focus is due to personal life changes that mean virtual events are what I'm interested in, and to other sites that are now doing a great job of listing in-person events in Scotland. I know virtual events aren't as popular, but when I talked about this on social media, people suggested many reasons why virtual events may be preferred: public health, child care and other caring duties, and people in rural areas.
I am especially going for events that include community participation - I'm keen to avoid "this virtual tech event could have been a video" and instead encourage events that people can get involved in.
We also provide Open Data exports and are keen to encourage events to provide Open Data that we (and others) can just import and reuse. Happy to help with options for that.
Brief word on the tech: I'm collecting data in a GitHub repository, building that into a SQLite database with an Open Source library, and then a Django site provides all the filter options and open data feeds I wanted. It's hosted on one virtual machine, with Apache caching content.
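For the curious, a minimal sketch of what the Django side of that might look like - the view name, table, and fields here are hypothetical, not the actual Open Tech Calendar code:

```python
import sqlite3

from django.http import JsonResponse


def events_feed(request):
    # Optional ?country=... filter - one of many possible filter options.
    country = request.GET.get("country")
    conn = sqlite3.connect("events.sqlite")  # the pre-built database
    conn.row_factory = sqlite3.Row
    sql, params = "SELECT id, title, start_at, url FROM events", []
    if country:
        sql += " WHERE country = ?"
        params.append(country)
    rows = [dict(row) for row in conn.execute(sql, params)]
    conn.close()
    return JsonResponse({"events": rows})
```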
So:
Constructive feedback on the project and site welcome!
If you know any good virtual tech events, do add them! You don't have to be the organiser. There are details of how to do so on the site, or just comment here with details.

Thanks, James
This seems to be about hosting a SQLite database on a static website like GitHub Pages - this can be a great plan; there is also Datasette in a browser now: https://github.com/simonw/datasette-lite
But that's different from how you collect the data in a git repository in the first place - or are you suggesting just putting a SQLite file in a git repository? If so, I can think of one big reason against that.
Yes, I'm suggesting hosting it on GitHub, leveraging their Git LFS support. Just treat it like a binary blob and periodically update it with a tagged release.
It's not clear if you are suggesting accepting contributions to the SQLite file from people via PR (but accepting contributions is generally the point of putting these projects on GitHub).
But if you are, I wouldn't recommend it.
PRs won't be able to show diffs. Worse, as soon as multiple people send a PR at once you'll have a really painful merge to resolve, and GitHub's tools won't help you at all. And you can't edit the files in GitHub's web UI.
I recommend one file per record - JSON, YAML, whatever non-binary format you want. Then you get:
* PRs with diffs that show you what's being changed
* Files that technical people can edit directly in GitHub's web editor
* If 2 people make PRs on different records at once, it's an easy merge with no conflicts
* If 2 people make PRs on the same record at once ... ok, you might now have a merge conflict to resolve, but it's in an easy text file and GitHub's UI will let you see what it is.
You can of course then compile these data files into a SQLite file that can be served on a static website nicely - in fact, if you see my other comments on this post, I have a tool that does this (a sketch of the idea follows below). And on that note, sorry - I've done a few projects in this space, so I have views :-)
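To make that concrete, a minimal sketch of such a compile step - the data/events/ layout and field names are hypothetical, and this isn't DataTig's actual API:

```python
import sqlite3
from pathlib import Path

import yaml  # PyYAML


conn = sqlite3.connect("events.sqlite")
conn.execute(
    "CREATE TABLE IF NOT EXISTS events"
    " (id TEXT PRIMARY KEY, title TEXT, start_at TEXT, url TEXT)"
)

# One YAML file per record, e.g. data/events/2025-01-some-event.yaml
for record_file in sorted(Path("data/events").glob("*.yaml")):
    record = yaml.safe_load(record_file.read_text())
    conn.execute(
        "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?)",
        (record_file.stem, record.get("title"),
         record.get("start_at"), record.get("url")),
    )

conn.commit()
conn.close()
```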
I'm not OP but I'll guess ... lock files with old versions of libs in them. The latest version of a library may be v2, but if most users are locked to v1.267.34 you need all the old versions too.
However a lot of the "data in git repositories" projects I see don't have any such need, and then ...
> Why not just have a post-commit hook render the current HEAD to static files, into something like GitHub Pages?
... is a good plan. Usually they make a nice static website with the data that's easy for humans to read though.
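For anyone wanting to try that, here's a minimal sketch of such a hook - git hooks can be any executable, so Python works; the data/ layout, "title" field, and docs/ output folder are all assumptions:

```python
#!/usr/bin/env python3
# Save as .git/hooks/post-commit and mark it executable.
from pathlib import Path

import yaml  # PyYAML

out_dir = Path("docs")  # e.g. a folder GitHub Pages is set to serve
out_dir.mkdir(exist_ok=True)

# Render every record in the working tree to one simple HTML listing.
items = []
for record_file in sorted(Path("data").glob("*.yaml")):
    record = yaml.safe_load(record_file.read_text())
    items.append(f"<li>{record.get('title', record_file.stem)}</li>")

(out_dir / "index.html").write_text(
    "<!doctype html><title>Records</title><ul>" + "".join(items) + "</ul>"
)
```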
It's not just package managers who do this - a lot of smaller projects crowdsource data in git repositories. Most of these don't reach the scale where the technical limitations become a problem.
Personally my view is that the main problem when they do this is that it gets much harder for non-technical people to contribute. At least that doesn't apply to package managers, where it's all technical people contributing.
There are a few other small problems - but it's interesting to see that so many other projects do this.
I ended up working on an open source software library to help in these cases: https://www.datatig.com/
Here's a write-up of an introduction talk about it: https://www.datatig.com/2024/12/24/talk.html I'll add the scale point to future versions of this talk, with a link to this post.
Some of it is different, but the basics are still the same and still relevant. Just today I've been working with some of this.
I took a Django app that's behind an Apache server and added Cache-Control and Vary headers using Django view decorators, and added Header directives to some static files that Apache was serving. This had 2 effects:
* Meant I could add mod_cache to the Apache server and have common pages cached and served directly from Apache instead of going back to Django. Load testing with vegeta ( https://github.com/tsenart/vegeta ) shows the server can now handle several times more simultaneous traffic than it could before.
* Meant users' browsers now cache all the CSS/JS. As users move between HTML pages, there is now often only 1 request the browser makes. Good for snappier page loads with less server load.
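For reference, a minimal sketch of the Django decorator side of this - the view is hypothetical and the max-age is illustrative (the static-file side is an Apache mod_headers directive like `Header set Cache-Control "public, max-age=..."`):

```python
from django.http import HttpResponse
from django.views.decorators.cache import cache_control
from django.views.decorators.vary import vary_on_headers


@cache_control(public=True, max_age=300)  # "public" lets shared caches like mod_cache store it
@vary_on_headers("Accept-Encoding")       # keep compressed/uncompressed responses separate
def event_list(request):
    return HttpResponse("<html>...</html>")
```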
But yeah, especially updating the sections on public vs private caches with regard to HTTPS would be good.
I was at an event about open data and AI recently and they were going on about making your data "ready for AI".
It seemed like this was a big elephant in the room - what's the point in spending ages carefully putting APIs on your website if all the AI bots just ignore them anyway? There are times when you want your open data to be accessible to AI, but they never really got into a discussion about good ways to actually do that.
> I'm just surprised we haven't seem some app that can act like a wordpress admin page but generating a static output you can host for free or very cheap somewhere.
Can you explain a bit more about your requirement and how many blog posts you are talking about?
I'm curious to hear more for my future work, as I have an extendable static site builder it would be easy to add this to. I don't want to be going all marketing on someone else's post (and it's early days, so you'd probably find other features lacking), so I'll just say my email is in my profile if you want.
Sure! I am coming from Sphinx, another Python-based documentation tool that is used, for example, for the Linux kernel docs, Python's docs, etc.
While it is great for documentation, we used it for the whole website of a project, mainly because people in the team already understood it. But we ran into many issues when it came to adding a blog...
Sphinx has a hard requirement that input files be on disk. That means, for example, it would be hard to add a page that lists all the blog posts in a certain category (as the "meta" page would have to inspect other pages and then be generated on the fly). The only option is to pre-generate such pages into the source folder before a build, as sketched below.
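To make that workaround concrete, a minimal sketch of such a pre-generation script - the posts/ layout and the ":category:" field line are assumptions of mine, not Sphinx built-ins:

```python
from collections import defaultdict
from pathlib import Path

source = Path("docs")  # the Sphinx source folder
posts_by_category = defaultdict(list)

# Collect post filenames per category from a ":category:" field line.
for post in sorted((source / "posts").glob("*.rst")):
    for line in post.read_text().splitlines():
        if line.startswith(":category:"):
            category = line.removeprefix(":category:").strip()
            posts_by_category[category].append(post.stem)

# Write one listing page per category into the source folder, pre-build.
for category, posts in posts_by_category.items():
    title = f"Posts in {category}"
    lines = [title, "=" * len(title), "", ".. toctree::", ""]
    lines += [f"   posts/{stem}" for stem in posts]
    (source / f"category-{category}.rst").write_text("\n".join(lines) + "\n")
```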
I think the inputs should be abstracted so that there isn't a hard requirement on the filesystem. For example, extensions should be able to add input files using code, without having to write them to the filesystem first. This would make many things much easier and open up a new world of possibilities without actually resulting in more maintenance work for the SSG.
The thing is that the output is already abstracted in that way: one can create a new plugin to write to a different output format. If one could also "generate" input/source files dynamically, one would get support for all of these output formats "for free".
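To sketch what I mean (hypothetical code, not any real Sphinx API): sources would just yield documents, and the build would consume them the same way whether they came from disk or from code:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Document:
    name: str     # logical path, e.g. "blog/2024/hello"
    content: str  # source text (reST, Markdown, ...)


def disk_source(root: Path) -> Iterator[Document]:
    # The usual case: documents really do live on disk.
    for path in sorted(root.rglob("*.rst")):
        yield Document(str(path.relative_to(root)), path.read_text())


def blog_index_source(posts: list[Document]) -> Iterator[Document]:
    # An extension contributes a generated page, never touching the filesystem.
    listing = "\n".join(f"- {post.name}" for post in posts)
    yield Document("blog/index", "All posts\n=========\n\n" + listing)
```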
As for me personally, I don't really have enough blog posts to need to store them in a db - that was just an example. But if it's abstracted transparently in the way I'm thinking about, it actually doesn't matter to the SSG: it only knows about inputs and outputs, not how it got those inputs.