So I’m working on a server from home.

I do a cat /sys/class/net/eth0/operstate and it says unknown despite the interface being obviously up, since I’m SSH’ing into the box.

I try to explicitly set the interface up to force the status to say up with ip link set eth0 up. No joy, still unknown.

Hmm… maybe I should bring it down and back up.

So I do ip link set eth0 down and… I drive 15 miles to work to do the corresponding ip link set eth0 up

50 years using Unix and I’m still doing this… 😥
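
(For the record, the classic guard against exactly this trip is to chain the up so it fires even if the session dies, or to pre-arm a rescue before touching the link. A rough sketch, assuming eth0 and a root shell; the delays are arbitrary:)

    # bounce the link in one go; the up still runs even if the SSH session stalls
    ip link set eth0 down ; sleep 10 ; ip link set eth0 up

    # or pre-arm a rescue with at(1), then cancel it once you're sure
    echo "ip link set eth0 up" | at now + 5 minutes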

    • GaMEChld@lemmy.world

      You can also use a PiKVM to add a similar capability to non-server-grade hardware that doesn’t have it. I did that for a workstation once.

  • moonpiedumplings@programming.dev

    Use Cockpit by Red Hat. It gives you a GUI to make networking changes*, and it will check that the connection still works before committing the change. If the connection doesn’t work (e.g. the IP address changed), it will undo the change and warn you. You can then either force the change through or leave it be.

    *via NetworkManager only.

    • caseyweederman@lemmy.ca

      That’s probably because of netplan, right? You should be able to get the same results with just netplan try.
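
      (For reference, netplan try applies the staged configuration and automatically rolls it back unless you confirm it within a timeout, which is what makes it safe over SSH; the timeout flag and value below are just an example:)

          sudo netplan try --timeout 30
          # press Enter at the prompt to keep the new config, or let it revert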

      • moonpiedumplings@programming.dev

        Netplan is an abstraction layer, so it can sit on top of systemd-networkd, NetworkManager, or iproute. I suppose it’s better in that respect, though, because it can be used with multiple backends.
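
        (For illustration, the backend is chosen via the renderer key in Netplan’s YAML; a minimal sketch, with the interface name and DHCP setting as placeholders:)

            network:
              version: 2
              renderer: NetworkManager   # or networkd for systemd-networkd
              ethernets:
                eth0:
                  dhcp4: true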

              • moonpiedumplings@programming.dev

                No. Netplan uses its own YAML format, which people would have to learn and use. I don’t want to do that; I would rather just configure my existing NetworkManager setup than learn another abstraction layer on top of what is already an abstraction layer.

                I understand that Cockpit (and similar tools) are “the whole kitchen sink” of utilities, and it may seem like they come with more than you need. But that doesn’t change the fact that they get the job done and, in some use cases, are better than dedicated tools.

  • iriyan@lemmy.ml

    I still prefer net-tools and use ifconfig eth0 up. That ip mess I’d rather do without, and those funky UU device/interface names I wish out of my system.

    By the way, what system/init/service manager are you using? With 50 years under your belt, a cron job could check whether the link is up and reset it while you are away. You can always remotely cancel the cron job … but that will be a new mistake, not the old one :)
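
    (Something like this, purely as a sketch — the gateway address, interface and schedule are placeholders, and it sticks to net-tools:)

        # /etc/cron.d/link-watchdog (hypothetical): every 5 minutes, bounce eth0
        # if the default gateway stops answering pings
        */5 * * * * root ping -c 3 -W 2 192.168.1.1 >/dev/null 2>&1 || { ifconfig eth0 down; sleep 5; ifconfig eth0 up; }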

    I started on IRIX and Ultrix, if you remember those, so what would I know :)

  • mlg@lemmy.world

    Lol I’ve locked myself out of so many random cloud and remote instances like this that now I always make a sleep chain or a kill timer with tmux/screen.

    Usually like:

    ./risky_dumb_script.sh ; sleep 30 ; ./undo.sh

    Or

    ./risky_dumb_script.sh

    Which starts with a 30-second sleep, and:

    (tmux) sleep 300 ; kill PID
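
    (The kill-timer variant spelled out a little, purely as a sketch — the script name and timeouts are placeholders:)

        # pane 1: the risky change, which opens with its own 30-second sleep
        ./risky_dumb_script.sh

        # pane 2 (tmux/screen): a dead-man timer that kills it after 5 minutes
        sleep 300 ; kill "$(pgrep -f risky_dumb_script.sh)"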

  • dependencyinjection@discuss.tchncs.de

    Not a sysadmin, but about a year into my first software engineering job I was working on the live DB in SQL without using BEGIN TRAN / ROLLBACK TRAN.

    Suffice to say I broke the whole system by making an UPDATE without a WHERE clause. Luckily we had regular backups, but it was a lot of debugging with the boss before I realised it was me who had caused the issue the client was reporting.
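
    (The guard being described, roughly — the table and column names here are made up:)

        BEGIN TRAN;
        UPDATE Customers SET Status = 'inactive' WHERE CustomerId = 42;
        -- check @@ROWCOUNT or run a SELECT here before deciding
        ROLLBACK TRAN;   -- or COMMIT TRAN once you're sure it touched only what you meant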

  • apt_install_coffee@lemmy.ml

    A few months ago I accidentally dd’d ~3GiB to the beginning of one of the drives in a 4 drive array… That was fun to rebuild.

    • Like 3 weeks ago on my (testing) server I accidentally DD’d a Linux ISO to the first drive in my storage array (I had some kind of jank manual “LVM” bullshit I set up with odd mountpoints to act as a NAS, do not recommend), no Timeshift, no Btrfs snapshot. It gave me the kick in the pants I needed to stop trying to use a macbook air with 6 external hard drives as a server though. Also gave me the kick in the pants I needed to stop using volatile naming conventions in my fstab.

      • apt_install_coffee@lemmy.ml

        I wish.

        It was a bcachefs array with data replicas set to a mix of 1, 2, and 4 depending on what was most important, but thankfully I had the foresight to set metadata to be mirrored across all 4 drives.

        I didn’t get the good fortune of only having to do a resilver, but all I really had to do was fsck to remove references to non-existent nodes until the system would mount read-only, then back it up and rebuild it.

        NixOS did save my bacon re: being able to get back to work on the same system by morning.

  • toynbee@lemmy.world

    A decade and change ago, in a past life, I was tasked with switching SELinux to permissive mode on the majority of systems on our network (multiple hundreds, or we might have gotten above one thousand at that point, I don’t recall exactly). This was to be done using Puppet. A large number of the systems, including most of our servers, had already been manually switched to permissive but it wasn’t being enforced globally.

    Unfortunately, at that point I was pretty familiar with Puppet but had only worked with SELinux a very few times. I did not correctly understand the syntax of the config file or setenforce and set the mode to … Something incorrect. SELinux interpreted whatever that was as enforcing mode. I didn’t realize what I had done wrong until we started getting alerts from throughout the network. Then I just about had a panic attack when I couldn’t log in to the systems and suddenly understood the problem.

    Fortunately, it’s necessary to reboot a system to switch SELinux from disabled to any other mode, so most customer-facing systems were not impacted. Even more fortunately, this was done on a holiday, so very few customers were there to be inconvenienced by the servers becoming inaccessible. Even more fortunately, while I was unable to access the systems that were now in enforcing mode, the Puppet agent was apparently still running … So I reversed my change in the manifest and, within half an hour, things were back to normal (after some service restarts and such).

    When I finally did correctly make the change, I made sure to quintuple check the syntax and not rush through the testing process.
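
    (For reference, the end state being aimed for looks roughly like this on a stock RHEL-style box:)

        # switch the running system to permissive immediately
        setenforce 0

        # and make it persist across reboots, in /etc/selinux/config:
        SELINUX=permissive
        SELINUXTYPE=targeted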

    edit: While I could have done without the assault on my blood pressure at the time, it was an effective demonstration of our lack of readiness for enforcing mode.

  • markstos@lemmy.world

    I was scared to move to the cloud for this reason. I was used to running to the server room and the KVM if things went south. If that was frozen, usually unplugging the server physically from the switch would get it to calm down.

    Now Amazon supports a direct console interface, like a KVM, and you can virtually unplug virtual servers from their virtual switches too.

  • twinnie@feddit.uk

    I knew a guy who did this and had to fly to Germany to fix it because he didn’t want to admit what he’d done.

  • sleepmode@lemmy.world

    I was on-call and half awake when I got paged about a cache server’s memcached being down for the third time that night. They’d all start to go down like dominoes if you weren’t fast enough at restarting the service, which could overwhelm the database and messaging tiers (baaaaad news to say the least). Two more had their daemon shit the bed while I was examining it. Often it was best to just kick it on all of them to rebalance things. It was… not a great design.

    So I wrote a quick loop to ssh in and restart the service on each box in the tier to refresh them all just in case and hopefully stop the incessant pages. Well. In my bleary-eyed state I set reboot in the variable instead of restart. Took out the whole cache tier (50+) and the web site. First and only time I did that, but it definitely woke me up. Oddly enough, the site ran better for months after that, as my reboots uncovered a previously unknown problem.
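
    (Roughly the shape of that loop — the hostnames, service name and tooling are guesses, which is exactly where the one-word slip bites:)

        ACTION="service memcached restart"   # the half-awake edit put "reboot" here instead
        for host in $(seq -f "cache%02g" 1 50); do
            ssh "$host" "sudo $ACTION"
        done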

  • Ephera@lemmy.ml

    At $DAYJOB, we’re currently setting up basically a way to bridge an interface over the internet, so it transports everything that enters on an interface across the aether. Well, and you already guessed it, I accidentally configured it for eth0 and couldn’t SSH in anymore.

    Where it becomes fun is that I actually was at work. I was setting it up on two raspis, which were connected to a router, everything placed right next to me. So, I figured, I’d just hook up another Ethernet cable, pick out the IP from the router’s management interface and SSH in that way.
    Except I couldn’t reach the management interface anymore. Nothing in that network would respond.

    Eventually, I saw that the router’s activity lights were blinking like Christmas decorations. I’m guessing I had built a loop, and therefore something akin to a broadcast storm was overloading the router. Thankfully, the solution was relatively straightforward: unplug one of the raspis, SSH in via the second port, nuke our configuration, and then repeat for the other raspi.

  • InnerScientist@lemmy.world

    I have a failsafe service for one of my servers: it pings the router, and if it hasn’t reached it once in an entire hour, it reboots the server.

    This won’t save me from all mistakes, but it will prevent firewall, link-state, routing and a few other issues from locking me out when I’m not present.
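
    (One way to build something like that, purely as a sketch — the router address, the one-hour threshold and the paths are placeholders; run it from a cron job or systemd timer every minute or so:)

        #!/bin/sh
        ROUTER=192.168.1.1
        STAMP=/run/last-router-ok

        if ping -c 1 -W 2 "$ROUTER" >/dev/null 2>&1; then
            touch "$STAMP"                       # router reachable: refresh the timestamp
        elif [ -f "$STAMP" ] && [ $(( $(date +%s) - $(stat -c %Y "$STAMP") )) -gt 3600 ]; then
            systemctl reboot                     # unreachable for over an hour: reboot
        fi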