Hacker News | ctur's comments

It’s fun to build things like this, but if you want to nourish a user base you need to fully understand the landscape of similar tools and then explain your differentiating value. This is /particularly/ important for security-related tools.

Specifically you should compare and contrast to tools like SOPS, Ansible Vault, pass, etc.


Or you could just build things for fun. Why do we have to care about "nourishing a user base"? Two decades ago we would build software and release it for fun and utility.

> Specifically you should compare and contrast to tools like SOPS, Ansible Vault, pass, etc.

What a boring proposition for hobby projects.


You are taking the words right out of my mouth.

This GitHub-star-hunting, CV-padding, make-it-big-and-BDFL-yourself approach to open source that has crept in over the last decade is bewildering and rather unpleasant.


This is an unnecessary optimization, particularly for the article's use case (small files that are read immediately after being written). Just use /tmp. The Linux buffer cache is more than performant enough for casual usage and, indeed, most heavy usage too. It's far too easy to clog up memory with forgotten files by defaulting to /dev/shm, for instance, and you potentially also take memory away from the rest of the system until the next reboot.

For the author's purposes, any benefit is just placebo.

There absolutely are times where /dev/shm is what you want, but it requires understanding nuances and tradeoffs (e.g. you are already thinking a lot about the memory management going on, including potentially swap).
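
If in doubt, it's quick to check what actually backs /tmp and how much memory tmpfs is already holding before reaching for /dev/shm. A minimal sketch, assuming util-linux and coreutils are available:

    $ # Is /tmp actually tmpfs on this box?
    $ findmnt -no FSTYPE /tmp
    $ # How much RAM are the tmpfs mounts currently holding?
    $ df -h /tmp /dev/shm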

Don't use -funroll-loops either.


It's true that with small files, my primary interest is simply not to wear my disk out unnecessarily. However, I also often work on large files, usually local data processing work.

"This optimization [of putting files directly into RAM instead of trusting the buffers] is unnecessary" was an interesting claim, so I decided to put it to the test with `time`.

    $ # Drop any disk caches first.
    $ sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
    $ 
    $ # Read a 3.5 GB JSON Lines file from disk.
    $ time wc -l /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl 
    255111 /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl

    real 0m2.249s
    user 0m0.048s
    sys 0m0.809s

    $ # Now read the same file from disk again, this time with the page cache warm.
    $ time wc -l /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl
    255111 /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl
    
    real 0m0.528s
    user 0m0.028s
    sys 0m0.500s

    $ 
    $ # Drop caches again, just to be certain.
    $ sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
    $ 
    $ # Read that same 3.5 GB JSON Lines file from /dev/shm.
    $ time wc -l /dev/shm/kaikki.org-dictionary-Finnish.jsonl 
    255111 /dev/shm/kaikki.org-dictionary-Finnish.jsonl

    real 0m0.453s
    user 0m0.049s
    sys 0m0.404s
Compared to the first read there is indeed a large speedup, from 2.2s down to under 0.5s. After the file had been loaded into cache from disk by the first `wc -l`, however, the difference dropped to /dev/shm being about 20% faster. Still significant, but not game-changingly so.

I'll probably come back to this and run more tests with some of the more complex `jq` query stuff I have, to see if we stay at that 20% mark or if it gets faster or slower.
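
Roughly what I have in mind for that follow-up (the jq filter below is just an illustrative query against the kaikki.org export, not something I've actually timed yet):

    $ sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
    $ time jq -c 'select(.pos == "noun")' /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl > /dev/null
    $ time jq -c 'select(.pos == "noun")' /dev/shm/kaikki.org-dictionary-Finnish.jsonl > /dev/null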


A couple of things to consider when benchmarking RAM file I/O versus disk-based file system I/O.

1 - Programs such as wc (or jq) do sequential reads, which benefit from file systems optimistically prefetching contents in order to reduce read delays.

2 - Check to see if file access time tracking is enabled for the disk-based file system (see mount(8)). This may explain some of the 20% difference.
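
For point 2, a quick way to check (and rule out) access-time updates is to look at the mount options and, if needed, remount without them for a re-run; the mount point here is just an example:

    $ # relatime/atime vs noatime on the filesystem holding the test file
    $ findmnt -no OPTIONS /home
    $ sudo mount -o remount,noatime /home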


Hard disagree. The disk buffer cache is too eager on writes (which makes sense for the usual case), so temporary data written to a filesystem almost always ends up written to the medium. With several GBs of temporary data, that can easily fill up internal SSD write buffers and make the whole system choppy.

My use case is using yt-dlp to download videos to ramfs, watching them, and then deleting them. Before I switched to ramfs, the final pass of yt-dlp (where the audio and video tracks are merged into one file) regularly made the whole system choppy.
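
If anyone wants to replicate this, pointing yt-dlp at a tmpfs path with its -P/--paths option is enough (the directory name is arbitrary):

    $ mkdir -p /dev/shm/videos
    $ yt-dlp -P /dev/shm/videos 'VIDEO_URL'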


This isn't great advice because /tmp is not always mounted as tmpfs.

I've used /dev/shm extensively for large datasets and it's consistently been a massive speed improvement. Not sure what you're talking about.


> This isn't great advice because /tmp is not always mounted as tmpfs.

Well, complain to whoever's mounting it wrong to fix it.


Not sure your aggression is warranted, friend. Many distros over the years have mounted /tmp as tmpfs, and many haven't.

Some hosts should have tmpfs mounted and some shouldn't. For those that don't, I can just use /dev/shm. This isn't a "right" or "wrong" sorta thing.


> It's far too easy to clog up memory with forgotten files by defaulting to /dev/shm, for instance, and you potentially also take memory away from the rest of the system until the next reboot.

Aren't both solved by swapping?

Although I suppose on Linux, neither having swap, nor it being backed by dynamically growing files, is guaranteed.


This only holds for modern storage devices with a wear-leveling controller. For SD cards that don't have wear leveling, writing to the same region repeatedly will make them die faster.


Woohoo, one of the highlights of this time of year. I had to do mine from an eastbound flight over the Pacific. This has become a fun tradition not just for me personally but for many friends, colleagues, and fellow HNers. Big props once again to wastl and his helper elves for making this!

I encourage anyone who gets value from this to donate to support it if they can. It is a passion project but nonetheless comes with real costs.


> I encourage anyone who gets value from this to donate to support it if they can. It is a passion project but nonetheless comes with real costs.

With the sheer number of sponsors and AoC++ users, I do believe that this is not quite a small 'passion project' struggling to pay the monthly subscription to a VPS.

That being said, adventofcode is absolutely great and people should support it if they can. But I do think the author is doing quite well with the amount of support he is currently receiving.


Architecture matters because while deep learning can conceivably fit a curve with a single, huge layer (in theory... Universal approximation theorem), the amount of compute and data needed to get there is prohibitive. Having a good architecture means the theoretical possibility of deep learning finding the right N dimensional curve becomes a practical reality.

Another thing about the architecture is that we inherently bias it with the way we structure the data. For instance, take a dataset of (car) traffic patterns. If you only track the date as a feature, you miss that some events follow not just the day-of-year pattern but also holiday patterns. You could learn this with deep learning given enough data, but if we bake it into the dataset, you can build a model on it that is _much_ simpler and faster.

So, architecture matters. Data/feature representation matters.


> can conceivably fit a curve with a single, huge layer

I think you need a hidden layer. I’ve never seen a universal approximation theorem for a single layer network.


I second that thought. There is a pretty well-cited paper from the late eighties called "Multilayer Feedforward Networks are Universal Approximators". It shows that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function. For non-continuous functions, additional layers are needed.
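
Concretely, the single-hidden-layer statement behind those results: for any continuous f on a compact set K, a suitable activation σ, and any ε > 0, there exist N, α_i, w_i, b_i with

    \left| f(x) - \sum_{i=1}^{N} \alpha_i \,\sigma(w_i^\top x + b_i) \right| < \varepsilon
    \qquad \text{for all } x \in K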


Minsky and Papert showed that single layer perceptrons suffer from exponentially bad scaling to reach a certain accuracy for certain problems.

Multi-layer substantially changes the scaling.


But not all things you might do with a dotfile (or, more generally, per-user customization) are just replacing files. Think of cronjobs, brew installs, `defaults` in macOS, etc. Viewing dotfile-based customization as strictly a matter of overwriting files with pre-existing ones is needlessly myopic.

For this broader problem, there are other more complete solutions that are more robust and flexible. Personally I like dotbot (https://github.com/anishathalye/dotbot) as a balance between power and simplicity, particularly when managing files across multiple OS homedirs (e.g. linux server, macos laptop).


I'm not suggesting you do this (and I certainly don't) but arguably you could still manage that with just files on Linux boxes:

1. Cronjobs replaced with systemd user timers (see the sketch below)

2. User packages (i.e. brew install or $HOME/bin) with systemd user services and distrobox manifest files

3. I don't think there's a `defaults` equivalent on Linux or at least not one that isn't file based (and thus manageable through dotfiles)

So maybe that's just an OSX concern.
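
To make point 1 concrete, here is a rough sketch of a user timer replacing a nightly cron entry; the unit names and the script they call are made up for illustration. Everything lives under ~/.config/systemd/user/, so it's all dotfile-manageable:

    $ cat ~/.config/systemd/user/tidy.service
    [Unit]
    Description=Tidy up ~/Downloads

    [Service]
    Type=oneshot
    ExecStart=%h/bin/tidy-downloads

    $ cat ~/.config/systemd/user/tidy.timer
    [Unit]
    Description=Run tidy nightly

    [Timer]
    OnCalendar=daily
    Persistent=true

    [Install]
    WantedBy=timers.target

    $ systemctl --user enable --now tidy.timer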


That's provisioning, not dotfiles management. My dotfiles only include config files. I'd just use the package manager to install packages, and I'd just use the relevant program to enable stuff. As I use stow, I just create different configurations for different OSes if they differ too much. At most, a handful of scripts to customize my user account.


A different view worth considering:

Dotfiles are just a component, but not the whole story, of your personal compute environment. Your environment also includes things like:

* ~/bin scripts (etc)

* programming language stuff - e.g. go, rust, python, ruby etc have tooling for per-user package management, language version, etc.

* various forms of password/key/auth stuff like ssh allow lists, encrypted password stores, etc.

And the biggest one: Type of machine - work, daily driver, server, etc

The type of machine may require different dotfiles or different parts of dotfiles (e.g. what bashrc includes from `. .config/bash/my_local_funcs`), and having some scripting around this makes life easier.

Similarly, OS packages are great, and I use them heavily, but work and personal servers and my personal desktop all use a different OS, so it's useful to have provision scripts for the type of machine, and I keep all that together with my dotfiles (etc) in my "personal environment repo" (its name is dots, and when I talk about dotfiles I really mean "personal environment"). I suspect others share this view, which leads to this "pure dotfiles" vs "dotfiles + parts of provisioning" viewpoint difference even though they largely have the same set of problems and tooling.
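
A minimal sketch of what that scripting looks like in practice (host patterns and file names are illustrative, not my actual layout):

    # in ~/.bashrc: pull in machine-type-specific functions/aliases
    case "$(hostname -s)" in
      work-*)   . "$HOME/.config/bash/work_funcs" ;;
      *-server) . "$HOME/.config/bash/server_funcs" ;;
      *)        . "$HOME/.config/bash/my_local_funcs" ;;
    esac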


The majority of my computing happens at my workstation (desktop). That is what I consider my personal environment, and I would script its setup, but I can't find the motivation to do so (and I like to make ad-hoc changes). Permanent configuration (related to my usage, not the computer; my core utilities, I could say) gets added to my dotfiles. As for servers and work, their intersection with my personal stuff is minimal (mostly bash, vim, emacs?). I'd rather have a different system/project to manage them.


This is why I use Nix + home-manager to manage my CLI, programming environment, and system configuration across Linux, macOS and WSL using one GitHub repo. It also handles differences across machine types well.

A dot file management system is only part of the picture.

To spin up a new machine is a 30 minute job, and then it feels like “home”.


What are you doing with a dotfile that needs to install a package?


I imagine that things like provisioning are essential to people that switch computers often. So it's not a dotfile-specific problem, but more of a dotfile-adjacent problem.

There are so many interesting edge cases that affect UX even when distro-hopping between Debian-based distros... especially if you've used one for several years and have plenty of custom scripts in your ~/.local/bin folder.

I may yet need to learn or (re)discover some best practices of how to get up to a working development environment faster. I'm thinking of using Guix for that... but I digress.

So far, my workflow goes like this (on a newly-installed distro):

1. Configure environment variables that affect package-specific file locations (/etc/security/pam_env.conf and a custom /etc/profile.d/xdg_std_home.sh script that creates and assigns correct permissions for required directories).

2. Provision packages

3. Deploy config files (using stow).
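
For step 3, the stow invocation is roughly this (the package names are just whatever directories the dotfiles repo happens to be split into):

    $ cd ~/dotfiles
    $ stow --target="$HOME" bash nvim git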

What I've yet to figure out (I haven't really researched it yet) is how to handle app-specific configs (think Firefox add-ons, add-on configs, Thunderbird accounts, etc.).


"Switch computers often" can also apply to "switch computers with little notice". Even if 95% of my time is spent on one computer, it's nice to know my config is safely squirreled away and, uh, trivially unsquirrelable if something terrible happens to this hardware and I have to get another computer. Seems like a relatively low probability event, but my child has already destroyed two ThinkPads (both were very old and very disposable--still an accomplishment).

As to your last question, nix+home manager gets you there, but that's a whole other Thing.


(n)vim for example: my dotfiles don't vendor the handful of plugins I use; they just include the directives to install them with the plugin manager.

I generally use a makefile + stow to handle my dotfiles and home-dir setup. Each program has an entry in this Makefile - most of them are very simple: I keep a list of programs whose dots need to be in ~, and another for ~/.config/, and using make's variable expansion they just get a stow target.

For things like the above example (nvim):

    # headless nvim pass: sync plugins via packer, then quit
    nvim: nvim_alert_install nvim_stow
    	$(shell echo "PackerSync\nqall" | nvim -es)
This also allows me to not just copy preferences, but provision a bunch of stuff that's invariant across machines (e.g. what I have installed via rustup, go install, etc.).
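
For example, the recipe behind a hypothetical `rust` target boils down to a couple of shell commands (the exact components and crates are illustrative):

    rustup component add rust-analyzer clippy
    cargo install ripgrep fd-find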


Reread the story. The child wasn’t left in the car for an extended period (and it was by a grandparent, not a parent). The child had just been buckled into a car seat, and the driver closed the door, walked around to the driver’s side, and couldn’t get in.

Absolutely no indication of improper adult behavior.


We give away potatoes to trick or treaters on Halloween. They are immensely popular and we’ve become known as the potato house in our city’s Facebook groups. The weird delight on the faces of kids of all ages was hugely unexpected but surprisingly consistent.


When I lived in Santa Cruz back in the early 2000s I lived in a duplex, and my duplex neighbour and I would cook and give away well over three 30lb bags of baked potatoes each Halloween. Bake the potatoes early in the day, cut them open, put in the butter, salt and pepper, then close them up and wrap in tin foil. Kids and teenagers would go out of their way to get a potato from us.


Ah man, you're making me look forward to winter when we can make bonfire potatoes again, by wrapping them in foil with butter and a few flavourings, then putting them into the hot coals for a couple of hours.

I'm in the southern hemisphere and in general I love summer, but those potatoes are a thing of joy.


Do you give away cooked or raw taters at Halloween?


Careful: your home directory has a ~/Library/CloudStorage folder, and if you are using, say, Dropbox or Google Drive, then that find will be incredibly slow (in addition to security software possibly slowing it down).
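
One way around that is to prune the synced tree explicitly (macOS path shown; the -name filter is just an example):

    $ find "$HOME" -path "$HOME/Library/CloudStorage" -prune -o -type f -name '*.md' -print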


I find it very useful. I made a tool similar to mcfly (before knowing it existed) and use this workflow (`--here`) constantly. Hostname and shell-session context can also be useful at times for reconstructing something from the past.

https://github.com/chipturner/pxhist


While I doubt I'd quit my day job for it, over the past couple of years I've been poking at my own database-backed shell history. The key requirements for me were that it be extremely fast and that it support syncing across multiple systems.

The former is easy(ish); the latter is trickier since I didn't want to provide a hosted service, but there also aren't easily usable "bring your own wallet" APIs like S3 that could be used. So I punted and made it directory based and compatible with Dropbox and similar shared storage.

Being able to quickly search history, including tricks like 'show me the last 50 commands I ran in this directory that contained `git`' has been quite useful for my own workflows, and performance is quite fine on my ~400k history across multiple machines starting around 2011. (pxhist is able to import your history file so you can maintain that continuity)
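
Under the hood that kind of lookup is just a SQL query over the history database; roughly this (the schema here is simplified for illustration, not pxhist's actual one):

    sqlite3 history.db "
      SELECT command FROM history
      WHERE cwd = '$PWD' AND command LIKE '%git%'
      ORDER BY start_time DESC LIMIT 50;"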

https://github.com/chipturner/pxhist


Built something similar (though I've yet to get around to the frontend for it--vaguely intend to borrow one).

I neither love nor hate it as a sync mechanism, but I ended up satisficing with storing the history in my dotfile repo, treating the sqlite db itself as an install-specific cache, and using sqlite exports with collision-resistant names for avoiding git conflicts.


CouchDB might be useful for this scenario due to its multi-master support so devices can sync to each other without using a centralized database. It's also very performant, though if you put gigabytes of data into it, it'll also consume gigabytes of RAM.

