As in: when I watch YouTube tutorials, I often see YouTubers with a small widget on their desktop that gives them an overview of their RAM usage, security level, etc. What apps do you all use to track this?

  • @ElevenNotes
    19 months ago

    Netdata. I monitor a few thousand (virtual) servers that way.

    • @SadanielsVD
      19 months ago

      This. If you have multiple servers, you can also connect them all to a single UI with Netdata Cloud and see all the info at once.

      • @Spaceman_Splff
        19 months ago

        Just set this up yesterday. I used a parent node and pointed all my VMs at it. Took about an hour to figure out.
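For anyone trying the same thing: a Netdata parent/child setup is configured in `stream.conf` on each node (typically under `/etc/netdata/`). The child gets a `[stream]` section with `enabled`, `destination` and `api key`; the parent gets a section named after that same key. The sketch below only renders both fragments. The parent address is a placeholder, and the option names should be verified against the documentation for your agent version.

```python
import uuid
from typing import Optional, Tuple


def stream_configs(parent_host: str, port: int = 19999,
                   api_key: Optional[str] = None) -> Tuple[str, str]:
    """Return (child, parent) stream.conf fragments for one parent/child pair."""
    # Child and parent must share the same API key; Netdata expects a UUID here.
    key = api_key or str(uuid.uuid4())
    child = (
        "[stream]\n"
        "    enabled = yes\n"
        f"    destination = {parent_host}:{port}\n"
        f"    api key = {key}\n"
    )
    # On the parent, a section named after the key accepts children that use it.
    parent = (
        f"[{key}]\n"
        "    enabled = yes\n"
    )
    return child, parent


if __name__ == "__main__":
    # 192.168.1.10 is a placeholder parent address.
    child_conf, parent_conf = stream_configs("192.168.1.10")
    print("# child stream.conf\n" + child_conf)
    print("# parent stream.conf\n" + parent_conf)
```

Every child can reuse the same key, so the parent needs only one such section.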

        • @scotrod
          19 months ago

          Hey, did you use the cloud functionality or not? I'm trying to go fully local with a parent-child setup, but so far I haven't managed it.

          • @Spaceman_Splff
            19 months ago

            I don't know if I'll keep running this. The child nodes are already complaining about increased write delays since I installed the agents on them.

          • @Spaceman_Splff
            19 months ago

            The parent is still visible to the cloud portal. My understanding is that all the data resides locally, but when you log in to their cloud portal, it connects to the parent to display the information. I'm still playing with it to confirm. My parent node shows all the child nodes in the local interface, but the cloud still shows them all.
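One way to check whether the parent really holds the children's data locally is to query the parent's own REST API instead of the cloud portal. This sketch assumes the agent's `/api/v1/info` endpoint on the default port 19999, and that its JSON includes a `mirrored_hosts` list of the hosts the parent stores. Both are assumptions to verify against your agent version.

```python
import json
import urllib.request
from typing import List


def mirrored_hosts(info: dict) -> List[str]:
    """Extract the hostnames a Netdata parent stores from an /api/v1/info response."""
    # Assumption: the response carries a "mirrored_hosts" list of hostnames.
    return sorted(info.get("mirrored_hosts", []))


def fetch_info(parent: str, port: int = 19999, timeout: float = 5.0) -> dict:
    """Fetch /api/v1/info directly from the parent's local web server (no cloud)."""
    url = f"http://{parent}:{port}/api/v1/info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With a reachable parent, `mirrored_hosts(fetch_info("netdata-parent.lan"))` should return the parent and its children. `netdata-parent.lan` is a placeholder hostname.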

  • @HCharlesB
    19 months ago

    Checkmk (Raw, the free edition). Some setup aspects are a bit annoying: it wants to monitor every last ZFS dataset, and 'ignoring' them one by one takes too long. It does alert me to things that could cause issues, like the boot partition being almost full. I run it in a Docker container on my (primarily) file server.
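Alerts like "boot partition almost full" can also be added as a Checkmk local check. This is an executable placed in the agent's local-check directory (commonly `/usr/lib/check_mk_agent/local/` on Linux) that prints one line per service in the form `<status> <service_name> <metrics> <summary>`, where status 0/1/2 means OK/WARN/CRIT. The sketch below reports `/boot` usage that way. The 80/90 % thresholds and the service name are arbitrary choices for illustration.

```python
#!/usr/bin/env python3
"""Checkmk local check: report /boot usage as one service line."""
import os
import shutil

WARN_PCT = 80.0  # illustrative thresholds, not Checkmk defaults
CRIT_PCT = 90.0


def local_check_line(service: str, path: str, used: int, total: int) -> str:
    """Format '<status> <service> <metrics> <summary>' for a filesystem's usage."""
    pct = 100.0 * used / total if total else 0.0
    if pct >= CRIT_PCT:
        status = 2  # CRIT
    elif pct >= WARN_PCT:
        status = 1  # WARN
    else:
        status = 0  # OK
    # Metric syntax: name=value;warn;crit (no spaces inside the metric field).
    metric = f"used_pct={pct:.1f};{WARN_PCT:.1f};{CRIT_PCT:.1f}"
    return f"{status} {service} {metric} {pct:.1f}% of {path} used"


# Only emit the service on hosts that actually have a separate /boot.
if __name__ == "__main__" and os.path.ismount("/boot"):
    usage = shutil.disk_usage("/boot")
    print(local_check_line("Boot_partition_usage", "/boot", usage.used, usage.total))
```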

    • @TheDeepTech
      19 months ago

      I use this as well! Works well and has built-in intelligence for thresholds.

    • @joshiegy
      19 months ago

      No… Why? It's old, it's trash. Might get hate for this, but it's just not good.

    • @djbon2112
      19 months ago

      I second CMK.

      A TICK stack is unwieldy, Grafana takes a lot of setup, and all of this assumes you know both what to monitor and how to get stats on it.

      CMK, by contrast, is plug and play. Install the server on a VM or host, install the agent on your other systems, and you're good to go.
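Part of what makes the agent approach plug and play is that the agent simply prints plain-text sections, each introduced by a header like `<<<df>>>`, and the server decides which services to create from them. The sketch below groups such output by section. It assumes you already have the raw text (for example from running `check_mk_agent` locally on Linux) and does not deal with the TLS-encrypted transport that newer agents use once registered.

```python
import re
from typing import Dict, List

# Section headers look like <<<name>>> or <<<name:option>>> (e.g. <<<df:sep(9)>>>).
HEADER = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>$")


def split_sections(agent_output: str) -> Dict[str, List[str]]:
    """Group agent output lines under the section header that precedes them."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in agent_output.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])  # keep empty sections too
        elif current is not None:
            sections[current].append(line)  # repeated headers accumulate
    return sections
```

For example, `split_sections(subprocess.run(["check_mk_agent"], capture_output=True, text=True).stdout)` would return the sections on a Linux host with the agent installed.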

      • @joshiegy
        19 months ago

        I'm running a TICK stack with a couple of thousand servers, with way less CPU usage than Checkmk/Nagios or anything else from the previous millennium…

        • @djbon2112
          19 months ago

          How do you solve the problem of runaway memory usage? Even monitoring a few dozen hosts, memory usage would grow to many GB and continue to grow indefinitely until it OOM’d, and from my reading Influx has no way to prevent this.

  • @opensrcdev
    19 months ago

    InfluxDB as the metrics server and the Telegraf agent to collect metrics.
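Telegraf hands its metrics to InfluxDB in line protocol: `measurement,tag=value field=value timestamp`, with integers suffixed by `i` and a nanosecond timestamp. If you want a custom script to feed the same pipeline, a minimal formatter might look like this. It escapes spaces, commas and equals signs in names and tags, and quotes and backslashes in string fields. That covers common cases but is not a complete implementation of the spec.

```python
import time
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(s: str) -> str:
    """Escape commas, equals signs and spaces (tag keys/values and field keys)."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _field_value(v: FieldValue) -> str:
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integer fields carry an "i" suffix
    if isinstance(v, float):
        return repr(v)
    escaped = v.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # string fields are double-quoted


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, FieldValue],
            ts_ns: Optional[int] = None) -> str:
    """Render one point as InfluxDB line protocol (timestamp in nanoseconds)."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_field_value(v)}" for k, v in fields.items())
    ts = time.time_ns() if ts_ns is None else ts_ns
    return f"{name}{tag_part} {field_part} {ts}"
```

For example, `print(to_line("ups", {"host": "nas"}, {"load_pct": 23.0}))` emits one such line. Telegraf's `exec` input with `data_format = "influx"` can collect output like that. The measurement, tag and field names here are made up.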

  • @talent_deprived
    19 months ago

    I use sar for historical data, my own scripts running under cron on the hosts for specific things I'm interested in keeping an eye on, and my own scripts under cron on my monitoring machines to alert me when something's wrong. I don't use a dashboard.
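The "scripts under cron that alert me" approach can be very small, because cron mails whatever a job prints to the crontab owner (or to `MAILTO`), assuming the host can send mail. A check that stays silent when healthy and prints only when something is wrong therefore doubles as an alert. A sketch, with thresholds and paths chosen only for illustration:

```python
#!/usr/bin/env python3
"""Cron-friendly health check: silent when healthy, prints one line per problem."""
import os
import shutil
from typing import Dict, List

LOAD_LIMIT = 2.0 * (os.cpu_count() or 1)  # illustrative: twice the core count
DISK_LIMITS = {"/": 90.0, "/boot": 85.0}  # illustrative: path -> max used %


def problems(load1: float, disk_used_pct: Dict[str, float]) -> List[str]:
    """Return human-readable problem descriptions (empty list = healthy)."""
    msgs = []
    if load1 > LOAD_LIMIT:
        msgs.append(f"1-min load average {load1:.2f} exceeds {LOAD_LIMIT:.1f}")
    for path, pct in disk_used_pct.items():
        limit = DISK_LIMITS.get(path, 90.0)
        if pct > limit:
            msgs.append(f"{path} is {pct:.1f}% full (limit {limit:.0f}%)")
    return msgs


if __name__ == "__main__":
    usage = {}
    for path in DISK_LIMITS:
        if os.path.exists(path):
            du = shutil.disk_usage(path)
            usage[path] = 100.0 * du.used / du.total
    # Any printed output becomes the cron mail; no output means no mail.
    for msg in problems(os.getloadavg()[0], usage):
        print(msg)
```

Installed as, say, `/usr/local/bin/health_check.py` (a placeholder path) with a crontab entry like `*/10 * * * * /usr/local/bin/health_check.py`, it produces mail only when a threshold is crossed. Note that `os.getloadavg()` is Unix-only.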

  • @Mother_Construction2
    19 months ago

    I know something needs fixing when my dad complains that he can't watch TV and the rolling door doesn't open in the morning.

  • @JoeB-
    19 months ago

    I use Telegraf + InfluxDB + Grafana for monitoring my home network and systems. Grafana has a learning curve for building panels and dashboards, but is incredibly flexible. I use it for more than server performance. I have a dual-monitor “kiosk” (old Mac mini) in my office displaying two Grafana dashboards. These are:

    Network/Power/Storage showing:

    • firewall block events & sources for last 12 hrs (from pfSense via Elasticsearch),
    • current UPS statuses and power usage for last 12 hrs (Telegraf apcupsd plugin -> InfluxDB),
    • WAN traffic for last 12 hrs (from pfSense via Telegraf -> InfluxDB),
    • current DHCP clients (custom Python script -> MySQL), and
    • current drive and RAID pool health (custom Python scripts -> MySQL)

    Server sensors and performance showing:

    • current status of important cron jobs (using Healthchecks -> Prometheus),
    • current server CPU usage and temps, and memory usage (Telegraf -> InfluxDB)
    • server host CPU usage and temps, and memory usage for last 3 hrs (Telegraf -> InfluxDB)
    • Proxmox VM CPU and memory usage for last 3 hrs (Proxmox -> InfluxDB)
    • Docker container CPU and memory usage for last 3 hrs (Telegraf Docker plugin -> InfluxDB)
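The cron-job panel relies on how Healthchecks works. Each job pings a per-check URL: `/start` when it begins and `/<exit status>` when it finishes (0 meaning success). Healthchecks then flags checks whose pings report failure or stop arriving. Below is a sketch of a wrapper that does this around any command. It assumes the hosted `https://hc-ping.com` base URL (a self-hosted instance would use its own) and a check UUID you supply.

```python
#!/usr/bin/env python3
"""Run a command and report its start and exit status to Healthchecks."""
import subprocess
import sys
import urllib.request
from typing import List, Optional

PING_BASE = "https://hc-ping.com"  # replace with your own instance's ping URL if self-hosted


def ping_url(check_uuid: str, exit_code: Optional[int] = None) -> str:
    """'/start' before the job runs, '/<exit code>' afterwards (0 = success)."""
    base = f"{PING_BASE}/{check_uuid}"
    if exit_code is None:
        return f"{base}/start"
    # Healthchecks expects 0-255; map anything else (e.g. killed by a signal) to failure.
    status = exit_code if 0 <= exit_code <= 255 else 1
    return f"{base}/{status}"


def ping(url: str) -> None:
    try:
        urllib.request.urlopen(url, timeout=10).close()
    except OSError:
        pass  # a failed ping should not break the job itself


def run_and_report(check_uuid: str, cmd: List[str]) -> int:
    ping(ping_url(check_uuid))
    code = subprocess.run(cmd).returncode
    ping(ping_url(check_uuid, code))
    return code


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: hc_wrap.py <check-uuid> <command> [args...]
    sys.exit(run_and_report(sys.argv[1], sys.argv[2:]))
```

A crontab entry could then read `0 3 * * * /usr/local/bin/hc_wrap.py <check-uuid> /usr/local/bin/backup.sh`, where both script paths are placeholders.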

    Netdata works really well for system performance for Linux and can be installed from the default repositories of major distributions.

    • @daniel280187
      19 months ago

      Network/Power/Storage

      Pretty cool dashboards. I like the DHCP clients info. Does it also report DHCP reservations?

      Where do you do DHCP, on the PFSense or somewhere else?

      • @JoeB-
        19 months ago

        does it also report DHCP reservations?

        Thanks, and yes, entries with Type "static" are DHCP reservations.

        Where do you do DHCP, on the PFSense or somewhere else?

        Yes, on pfSense. I use the Python function from pletch/scrape_pfsense_dhcp_leases.py (on GitHub), which scrapes the pfSense status_dhcp_leases.php page. I then added my own function that queries my TP-Link APs over SNMP to determine which AP each wireless DHCP client is connected to.

        I can throw the script up on Dropbox if you are interested. I am mediocre at writing Python, so it is pretty specific to my environment.
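At its core, combining the pfSense lease list with what the APs report is a join on MAC address. The sketch below shows only that join step, not the scraping or the SNMP queries. The lease dictionary keys and the MAC-to-AP mapping are hypothetical stand-ins for whatever the scraper and SNMP code return. Reservations are detected by the `static` type mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClientRow:
    ip: str
    mac: str
    hostname: str
    reserved: bool     # True for leases whose type is "static" (DHCP reservations)
    ap: Optional[str]  # None for wired clients or MACs no AP reported


def normalize_mac(mac: str) -> str:
    """Normalize to aa:bb:cc:dd:ee:ff; assumes 12 hex digits with optional separators."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def join_clients(leases: List[dict], mac_to_ap: Dict[str, str]) -> List[ClientRow]:
    """Attach the reporting AP (if any) to each DHCP lease, matched by MAC."""
    aps = {normalize_mac(m): ap for m, ap in mac_to_ap.items()}
    rows = []
    for lease in leases:
        mac = normalize_mac(lease["mac"])
        rows.append(ClientRow(
            ip=lease["ip"],
            mac=mac,
            hostname=lease.get("hostname", ""),
            reserved=lease.get("type") == "static",
            ap=aps.get(mac),
        ))
    return rows
```

Normalizing both sides first matters because the lease page and SNMP tables may format MAC addresses differently.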

  • @MothGirlMusic
    19 months ago

    We use Zabbix here. Zabbix is amazing, and we put it in all of our templates, so any new servers and hosts pop up on the Zabbix dashboard preconfigured, just like that. For logs and security we use an Elastic "ELK stack", which gives us a heads-up if anything is wrong in the logs, while Zabbix gives us a heads-up on overall system health. Our health monitor panel combines the two windows, so we can see full server health and any problems right there as a to-do list for the IT team.
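A combined to-do style panel like this can pull the Zabbix side from the Zabbix JSON-RPC API (`api_jsonrpc.php`), whose `problem.get` method returns current problems. The sketch below builds and sends that request. The server URL and token are placeholders. It passes the API token as an `Authorization: Bearer` header, which Zabbix accepts from 6.4 on; older versions expect an `auth` field in the request body instead.

```python
import json
import urllib.request
from typing import List

# Zabbix severity levels 0-5.
SEVERITY = ["Not classified", "Information", "Warning", "Average", "High", "Disaster"]


def problem_request(min_severity: int = 2, req_id: int = 1) -> dict:
    """Build a JSON-RPC problem.get body for open problems at or above min_severity."""
    return {
        "jsonrpc": "2.0",
        "method": "problem.get",
        "params": {
            "output": ["eventid", "name", "severity", "clock"],
            "severities": list(range(min_severity, len(SEVERITY))),
            "sortfield": ["eventid"],
            "sortorder": "DESC",  # newest first
        },
        "id": req_id,
    }


def fetch_problems(api_url: str, token: str, min_severity: int = 2) -> List[dict]:
    """POST problem.get to api_jsonrpc.php using a Bearer API token (Zabbix 6.4+)."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(problem_request(min_severity)).encode(),
        headers={"Content-Type": "application/json-rpc", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(f"Zabbix API error: {body['error']}")
    return body["result"]
```

Something like `for p in fetch_problems(url, token): print(SEVERITY[int(p["severity"])], p["name"])` would print one line per open problem. Merging those with ELK alerts is left to the panel.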

  • @kindrudekid
    19 months ago

    I get ahead of it by buying extra.

    Need 16 GB of RAM and 8 cores? Well, let me add 64 GB and a 12-core CPU to my cart.

    Hasn't failed me.

  • @5c044
    19 months ago

    I already use Home Assistant, and it has a plugin for Glances. All I'm really interested in is CPU temperature and load. Any change = something's up.
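The "any change means something's up" rule can be made a little more concrete by comparing each new reading with a rolling baseline. The sketch below is independent of Home Assistant and Glances. It keeps the last N samples and flags a reading that is more than `k` standard deviations from their mean. The window size, `k`, minimum history and spread floor are arbitrary defaults to tune.

```python
from collections import deque
from statistics import mean, pstdev


class ChangeDetector:
    """Flag readings that deviate strongly from a rolling baseline."""

    def __init__(self, window: int = 60, k: float = 3.0,
                 min_samples: int = 10, floor: float = 1.0):
        self.samples = deque(maxlen=window)  # oldest readings drop off automatically
        self.k = k
        self.min_samples = min_samples
        self.floor = floor  # minimum spread, so a flat history doesn't flag tiny wiggles

    def update(self, value: float) -> bool:
        """Add a reading; return True if it is anomalous versus the history before it."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = max(pstdev(self.samples), self.floor)
            anomalous = abs(value - mu) > self.k * sigma
        self.samples.append(value)
        return anomalous
```

Feeding it one CPU temperature or load reading per minute gives a yes/no signal per sample. In practice you would probably also want to require a few consecutive flags before alerting on a brief spike.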

  • @MacGyver4711
    19 months ago

    Checkmk for general monitoring, Grafana/Prometheus for the Proxmox cluster, Wazuh for IDS purposes, and Uptime Kuma for general service uptime. It's not that it's necessary, but it's nice to tinker in my homelab before implementing the same services at a "professional level" at work.
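Uptime Kuma-style HTTP monitoring boils down to periodically requesting a URL and recording whether it answered and how fast. The sketch below does a single probe with the Python standard library. Counting any status below 400 as up and the 5-second timeout are assumptions. Uptime Kuma itself supports many more monitor types and notification options.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def is_up(status: Optional[int]) -> bool:
    """Count any HTTP status below 400 as up; no response at all is down."""
    return status is not None and status < 400


def probe(url: str, timeout: float = 5.0) -> Tuple[bool, Optional[int], float]:
    """Request url once and return (up, http_status_or_None, latency_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx
    except (OSError, http.client.HTTPException):
        status = None  # refused, DNS failure, timeout, malformed response...
    return is_up(status), status, time.monotonic() - start
```

Calling `probe("http://192.168.1.10/")` (a placeholder address) in a loop with a sleep gives a crude uptime log.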

    My HomeAssistant is stable, so wifey is not being used as a monitor ;-)