What is fsck up to now? (toroid.org)
76 points by Arnt on March 17, 2020 | 40 comments


One of my favorite BSD features is SIGINFO, which is intended for applications to give some sort of information about what they’re currently doing. If you’re on macOS, I know some of the copying commands (such as dd) implement it, and a ^T in your terminal will tell you how far along it is.


SIGINFO is awesome. I've no idea why the Linux world refuses to copy its semantics.


While it is named somewhat poorly for what it is sometimes used for, SIGQUIT can be used for this. Ping uses it to report statistics in the middle of pinging. You can use Ctrl+\ in the terminal to send it. E.g.

    $ ping google.com
    PING google.com (172.217.5.110) 56(84) bytes of data.
    64 bytes from sfo03s07-in-f110.1e100.net (172.217.5.110): icmp_seq=1 ttl=54 time=4.04 ms
    64 bytes from sfo03s07-in-f110.1e100.net (172.217.5.110): icmp_seq=2 ttl=54 time=4.04 ms
    2/2 packets, 0% loss, min/avg/ewma/max = 4.037/4.039/4.037/4.042 ms
    64 bytes from sfo03s07-in-f110.1e100.net (172.217.5.110): icmp_seq=3 ttl=54 time=4.16 ms
    64 bytes from sfo03s07-in-f110.1e100.net (172.217.5.110): icmp_seq=4 ttl=54 time=4.06 ms
    4/4 packets, 0% loss, min/avg/ewma/max = 4.037/4.076/4.054/4.164 ms
    64 bytes from sfo03s07-in-f110.1e100.net (172.217.5.110): icmp_seq=5 ttl=54 time=4.19 ms
    64 bytes from sfo03s07-in-f110.1e100.net (172.217.5.110): icmp_seq=6 ttl=54 time=4.20 ms
    ^C
    --- google.com ping statistics ---
    6 packets transmitted, 6 received, 0% packet loss, time 11ms
    rtt min/avg/max/mdev = 4.037/4.114/4.195/0.068 ms


I was thinking it was SIGINFO, but this reminded me of the SIGUSR1 trick to get the current status of a running dd process:

  $ dd if=/dev/zero of=/dev/null& pid=$!
  $ kill -USR1 $pid; sleep 1; kill $pid
  
  18335302+0 records in
  18335302+0 records out
  9387674624 bytes (9.4 GB) copied, 34.6279 seconds, 271 MB/s


That's interesting. I always wondered how people monitored the progress of dd before it added the status option.

Now you can use the status option to get a realtime update of the progress.

    dd if=/dev/urandom of=/dev/null status=progress


It is SIGINFO on BSDs; SIGUSR1 is the fallback.
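
For anyone who wants the same behaviour in their own tools, here is a minimal Python sketch of that "SIGINFO where it exists, SIGUSR1 otherwise" pattern. The `progress` counters and the `do_work` and `format_status` names are made up for illustration; this is not how dd itself is implemented.

```python
import os
import signal
import sys
import time

# SIGINFO only exists on BSD-derived systems (including macOS).
# Python's signal module has no SIGINFO on Linux, so fall back to SIGUSR1.
STATUS_SIGNAL = getattr(signal, "SIGINFO", signal.SIGUSR1)

# Hypothetical work counters, updated by do_work() below.
progress = {"done": 0, "total": 5}


def format_status():
    """Build the one-line status report."""
    return f"{progress['done']}/{progress['total']} items processed"


def report_status(signum, frame):
    """Signal handler: print the current status to stderr and carry on."""
    print(format_status(), file=sys.stderr)


# Installing a handler replaces the default action for this signal.
signal.signal(STATUS_SIGNAL, report_status)


def do_work(steps, delay=1.0):
    """Stand-in for a long-running job that can be asked for its status."""
    for _ in range(steps):
        time.sleep(delay)
        progress["done"] += 1
```

Call `do_work(progress["total"])` and, from another terminal, run `kill -USR1 <pid>` on Linux (or press ^T in the same terminal on a BSD or macOS) to get a status line without interrupting the job.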


There have been attempts to do so over the years. https://lkml.org/lkml/2019/6/5/174 was one. I haven't looked to see what became of them.


Signal numbers are a non-renewable resource, perhaps they think it'll cut into their valuable reserves ;)


I have used SIGUSR1 and SIGUSR2 for this purpose at work.


The problem with SIGUSR1 and SIGUSR2 is that they have no defined semantics and default to terminating the application, so you can't just throw them at random processes; you need to know somehow that the process does something useful with them. Furthermore, I don't think they have a terminal control code.

SIGINFO is ignored by default, and is pretty clearly an info-dump trigger, so you can throw ^T at any random utility you're running. Worst case, you'll just get a time-type dump.


Huh, AFAIK on Linux dd responds to SIGUSR1.


The trouble with SIGUSR1 is that the default action is terminate, so you can only send it to processes you know will take it well.

It's safe to send SIGINFO to processes that don't know about it; the default action is to ignore it. You can send it to an entire process group, and maybe some of them will answer it. But even if they don't, they won't just get killed for it.

This makes it so much more useful and discoverable, since you can almost always ^T with little risk. Usually you'll get at least a bit of information about how long the current command (more or less) has been running. If the running program happens to handle it, so much the better.
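
The difference in default actions is easy to see for yourself. A rough Python sketch, assuming a `sleep` binary on the PATH (it installs no handler for either signal); `send_and_wait` is just a helper name for this example:

```python
import signal
import subprocess


def send_and_wait(sig):
    """Start `sleep 30`, send it one signal, and report how it ended.

    Returns the child's return code, or None if it was still running
    one second later (that is, the signal was ignored).
    """
    child = subprocess.Popen(["sleep", "30"])
    child.send_signal(sig)
    try:
        # A negative return code -N means "terminated by signal N".
        return child.wait(timeout=1)
    except subprocess.TimeoutExpired:
        child.kill()  # clean up the survivor
        child.wait()
        return None


if __name__ == "__main__":
    print("SIGUSR1:", send_and_wait(signal.SIGUSR1))
    # SIGINFO is only defined on BSD-derived systems, macOS included.
    if hasattr(signal, "SIGINFO"):
        print("SIGINFO:", send_and_wait(signal.SIGINFO))
```

The SIGUSR1 case comes back with a negative return code because the child was killed, while on systems that have SIGINFO the child simply keeps sleeping.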


On Linux you can use progress¹ for many of these use cases. By default it scans for running processes that you might want to know about, but you can also ask it to tell you about a PID with -p. It supports a -m[onitor] mode to report status until the command exits, and features some basic filtering options to ignore certain files.

You can also manually dig about in /proc/$pid/fd{,info}/ if you want something more fancy, like using gdbar² to display a graphical progress through files for a given process.

1. https://github.com/Xfennec/progress

2. https://github.com/robm/dzen
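
If you do want to dig about in /proc/$pid/fd{,info}/ by hand, something like the following Python sketch is a starting point. It is Linux-only, assumes the descriptor refers to a regular file, and the function names are invented here; it is not how progress is implemented.

```python
import os


def parse_fdinfo(text):
    """Turn the 'key:<tab>value' lines of an fdinfo file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def file_progress(pid, fd):
    """Return (offset, size) for descriptor `fd` of process `pid`.

    Needs permission to inspect the process (same user, or root).
    """
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        offset = int(parse_fdinfo(f.read())["pos"])
    # stat() follows the /proc/<pid>/fd/<fd> link to the open file itself.
    size = os.stat(f"/proc/{pid}/fd/{fd}").st_size
    return offset, size
```

Dividing the offset by the size gives a rough percentage for a process that reads or writes a file sequentially.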


It's good, but it suffers from the same slowdowns as htop when there are multiple operations happening, as it crawls through /proc.


Yeah, I'd recommend using the `-p $pid` option when you can. Not just because it doesn't need to scan all of /proc, like the -c[ommand] or default mode do, but also because it doesn't suddenly start listing other processes when you're in monitor mode.

That said, sometimes it is nice to see the other commands pop up in monitor mode. For example, when the rate suddenly drops in a command that you care about then the other output will often show the culprits for you to `kill -STOP`.




This is absolutely fantastic; thank you for teaching me about this. Seems like the default output is

    load: {load%}  cmd: {cmd name} {PID} running {user time}u {system time}s

Being able to grab the PID from a currently running process -- in the same shell it's running in! -- is priceless on its own. The rest is icing on the cake.


Wow! I have been a Mac user since PPC was a thing, and I had no idea! Thank you!


Does cp support it (on a BSD)?


macOS's does.


This is cool. But it seems to me that this person's setup could benefit from a way for the computers to be notified that battery power would run out soon so that they could shut down cleanly. I have that going at home through a USB connection to a UPS...it would be harder to set it up with a central battery but they seem to be up to fun challenges :)


(I'm this person.) It so happens that the solar inverter I'm using right now doesn't provide a data connection that I could use to shut down the computers cleanly.

But I should also clarify that my long-running fsck isn't always the result of an unclean shutdown. There's something about my combination of iSCSI+crypttab+NFS that causes fsck to be run too often—even if I shut down the machine cleanly while the NAS is running, it usually decides to fsck when it comes back up.

Something to investigate next winter, perhaps.


Hi :) Thanks for the awesome write-up. I understand...I have a few devices in my house that resist all attempts at integration. I have considered doing something totally ridiculous like setting up a Raspberry Pi with a camera and machine vision software to watch the LED displays on these devices to glean status information. Silly, but...

Interesting about fsck running for unclear reasons. "What is it up to now?" is a valid question at multiple levels!


> Thanks for the awesome write-up.

Glad you enjoyed it. :-)

> I have considered doing something totally ridiculous like setting up a Raspberry Pi with a camera and machine vision software to watch the LED displays on these devices to glean status information. Silly, but...

Ha! I actually have a PoE camera pointed at the display of my UPS. Here's what it looks like right now: https://toroid.org/misc/ups-display.jpg

Notice that horizontal blank row of dead pixels halfway down the right side of the display? The one that makes "54.6" look like "51.6"? That gap defeated my naïve five-minute attempt to use image recognition to extract the battery voltage.


I would be tempted to tie the LED lines to digital ins on an Arduino. I'm betting some finagling with the display data lines could get you the information from the display, as well.

Depending on how the cabling to that display works, you might be able to do all of that without having to disable the display.


What's up with the "[f]" in your grep command?

    grep '[f]sck'


It keeps the grep command itself from showing up in the output.
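
To spell out why: the bracket expression [f] matches only the letter f, so the pattern still matches "fsck", but grep's own command line contains the literal text "[f]sck", where the f is followed by "]" rather than "s". A small Python illustration with made-up ps lines:

```python
import re

# Same regular expression as in: grep '[f]sck'
pattern = re.compile(r"[f]sck")

# Hypothetical lines standing in for `ps` output.
ps_lines = [
    "root   412  /sbin/fsck.ext4 -a /dev/sda1",
    "user   987  grep [f]sck",
]

# Only the real fsck process survives the filter; the grep line does not
# contain the contiguous text "fsck", so it drops out.
matches = [line for line in ps_lines if pattern.search(line)]
print(matches)
```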


Ooh, that's very cute! I usually throw a "| grep -v grep" on the end, but I'm gonna try to remember this.


Better to remember pgrep, instead of an incremental change that you'll find still has problem cases (such as matching usernames).

* http://mywiki.wooledge.org/ProcessManagement#But_I.27m_on_so...

M. Wooledge's description of ps options is not quite accurate, but that is incidental to xyr main point.


You could run a cheap/small UPS off your main stores, maybe? That could possibly provide the signal and management.


Enjoy some more manual pages in the same vein:

* http://jdebp.uk./Softwares/nosh/guide/commands/monitored-fsc...

* http://jdebp.uk./Softwares/nosh/guide/commands/monitor-fsck-...

And a service:

    % system-control print-service-scripts monitor-fsck-progress
    start:#!/bin/nosh
    start:true
    stop:#!/bin/nosh
    stop:true
    run:#!/bin/nosh
    run:#local socket used for monitor-fsck-progress
    run:local-stream-socket-listen --systemd-compatibility --backlog 2 --mode 0644 /run/fsck.progress
    run:setsid
    run:setlogin -- daemon
    run:vc-get-tty console
    run:fdmove -c 4 2
    run:open-controlling-tty
    run:fdmove 2 4
    run:setuidgid -- daemon
    run:./service
    service:#!/bin/nosh
    service:#fsck combined progress information displayed on /dev/console
    service:monitor-fsck-progress
    restart:#!/bin/sh
    restart:exec false      # ignore script arguments
    %


Don't usually care about titles but surely, "What, the fsck, have you done?"


I've always figured that fsck is called that because they couldn't come up with an acronym with u.


Filesystem Uniformity ChecK isn't a huge reach…



Ugh, the more I see of systemd the worse it looks. What a mess of a "design" we see a glimpse of here.


Did he ever get to fsck? Or did it just hang the entire time?


    killall -USR1 e2fsck

or start it with -C


systemd-fsck, as the article outright told you, already starts it with -C.



