<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>The Midnight Admin | Posts</title><description>Sysadmin returns - Technical posts and tutorials</description><link>https://bradgillap.com/</link><language>en-us</language><item><title>Self Documenting: Moving Beyond Just Being Organized</title><link>https://bradgillap.com/posts/2025/12-december/2025-12-17-december-self_documenting/</link><guid isPermaLink="true">https://bradgillap.com/posts/2025/12-december/2025-12-17-december-self_documenting/</guid><description>Can a total stranger decode your system on their first day? Moving beyond personal organization rituals toward universal clarity is the key to building resilient, scalable infrastructure.</description><pubDate>Wed, 17 Dec 2025 13:00:00 GMT</pubDate><content:encoded># Garbage In, Garbage Out

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; When I was in school, I took a class in applied sciences about thinking. I know, it sounds like a total bird course, right? The course left me with some lasting considerations about how I come and go between complicated tasks.  

- What makes it more difficult for us to think?
- How come others don’t think like us?  
- How much extra time do we spend re-accessing something we previously encoded in ourselves but now find very time consuming to retrieve?
- What is the biological science behind it?
- What are some examples of how self documenting applies to everyone?

# Or... Encoding and Decoding

Neuroscience is captivating, yet the specific biological details are fascinatingly boring to anyone who isn&apos;t a specialist. We are seeing modern brain imaging finally confirm what academic research suggested years ago. However, it’s far too easy to drift into &quot;woo&quot; territory here; connecting biological &quot;code&quot; to human outcomes is a complex task. To make a real impact, we have to be ruthless: keep the mechanisms that work and throw out the rest.

| Concept | Explanation | Real-world Example (Self-documenting) |
|----|----|----|
| **Encoding** | The process of converting information into a form that can be stored in the brain or a system. | Naming a server or client following a specific pattern (e.g., SITE-LT-CS-01) that encodes its location, type, and department. |
| **Decoding** | The process of converting stored or encoded information back into a form that can be understood.  | A new sysadmin being able to understand the function and location of a client just by looking at its name, without needing external documentation. |
| **Neurons** | Specialized cells in the brain that transmit information through electrical and chemical signals. | The biological &quot;hardware&quot; that processes information and is affected by external factors like noise and stress. |

# Noise Noise Noise!

It’s Christmas, let’s talk about noise.

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/KUg5UTC3x3A?si=POQLxdUma64QGr5f&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen&gt;&lt;/iframe&gt;

## The Impact of Noise

Noise and distraction are two obvious drivers that harm your ability to think, work through a complex problem, or concentrate. That much everyone can understand. There is also cognitive noise to consider: too many tasks in your head, wavering self-confidence, and time pressure toward proper implementation.  

## The Problem with Scientific Silos

Looking to science for answers about noise and how we think is difficult because each scientific silo only wants to examine its own dominating factors. Except people don’t fit into little silos perfectly, so across these studies we find there is always a little bit of deviation. Some people struggle with auditory noise. Some people struggle with visual noise. Some people struggle with competing thoughts.

## Why Frameworks Fall Short

Have you been trying to apply frameworks to improve your efficiency, only to find they all seem to fall a little short? Consider that frameworks tend to target the struggles of a particular group of people or workflow, usually the things the creators of the framework struggled with themselves.

Worse, some of the more popular frameworks will point to how they were implemented in an organization and cite KPI metrics that seemed positive, but we have to think further. There are so few scientific controls in those business situations that blindly accepting the results as real or repeatable may itself be a byproduct of the framework. Perhaps a deeper struggle the organization was dealing with was rallying staff around any particular way forward at all, in which case any framework would show a marginal improvement. That is not scientific. For example, maybe the agitation was just moved to another department and nobody recognized the catalyst for change at the time.

## Case Study #1: The 85dBA Threshold

A paper on the effect of noise exposure on cognitive performance found that the testing was fairly inconclusive. Some people really struggled, some didn’t. It was only after the researchers turned the noise up past 85 dBA that they began to see consistent cognitive decline.  

[https://pmc.ncbi.nlm.nih.gov/articles/PMC6901841/](https://pmc.ncbi.nlm.nih.gov/articles/PMC6901841/)  

That’s fairly loud. Like listening to the Spice Girls when no one is around loud. Like *I can tell you what I want what I really really want* loud.

### Good Baseline Controls

The study also used additional technology to measure brain waves while conducting the testing. The researchers isolated for people who don’t drink coffee, don’t have sleep issues, and so on. Human beings a step above us mere mortals, which makes for a good baseline control.

### The Solve

Simple: we control the environment. Reduce noise, and see a positive boost in the section of the population that struggles with auditory sensitivity.  

## Study #2: Decision Fatigue and Ego Depletion

You&apos;ve likely heard of this one, and there are all kinds of takes on how to handle decision fatigue, from the most extreme just-say-*&quot;YES&quot;* style of cultish self-help for joining new experiences, to picking the right fruit in the grocery aisle.  

The paper referenced below focuses on decision fatigue as it relates to nursing and states that the average person makes 3,500 decisions per day! The research also suggests that decision fatigue is very susceptible to glucose levels. As fatigue sets in, confidence wanes and people become more passive or fearful of making strong decisions to move forward.  

The connection to self documenting systems here is that we want scenarios where we make quick, determinate decisions on our tasks without having to think through or research harder.

The paper’s evidence suggests a detriment to executive function, which is your mission control for solving problems. Planning also suffers.  

[https://pmc.ncbi.nlm.nih.gov/articles/PMC6119549/](https://pmc.ncbi.nlm.nih.gov/articles/PMC6119549/#S20)

### The Solve

It&apos;s unfortunate that this is such a widespread phenomenon. There are many cognitive indicators related to decision fatigue, and it amplifies negative attributes such as bias the more fatigued a person is.  

The best advice right now? Reduce cognitive load, manage our fuel levels, and spend our deep research and planning energies earlier in the day. Offload the benign small tasks that should need less executive function. Regardless, decision fatigue is real and has impact. In a more practical sense, it&apos;s also a great argument against end-of-day meetings.  

## Study #3: Chunking Patterns

Chunking is compressing smaller ideas into lists or patterns that are easier to process.  

Just as certain computer workloads may benefit from tuning how large a stored block of data is because of retrieval access times, humans do better juggling just a few small variables at once. Smaller chunks, and fewer of them, mean faster workloads. If we can also apply patterns to that idea, we get even smoother efficiency when transitioning between internal considerations.  

[https://www.ebsco.com/research-starters/psychology/chunking-psychology#full-article](https://www.ebsco.com/research-starters/psychology/chunking-psychology#full-article)

![Chunking patterns](./chunks.jpg)

# What Does This Have to Do with Self Documenting?

Nothing, and everything. Your prefrontal cortex makes better decisions when systems are clear, packed neatly, and display recognizable patterns.  

So imagine you are looking to change a GPO in your domain and there are hundreds of GPOs, or maybe there’s just one big GPO with hundreds of settings (godspeed). You know the behaviour you wish to change, but the list is overwhelming, and that’s before you add in the complexity of user or group rules, OU hierarchies, and overrides. You get the point.  

Changing the rule can have adverse consequences if you don’t test it thoroughly, and that fires a new trigger (fear), which dumps cortisol into your system. You were tasked with changing a simple behaviour on Windows clients, but suddenly you are paralyzed, stuck looking for a needle in a haystack, and beginning to expect poor outcomes.  

Maybe you have to set up a whole test environment and sync a client to the domain, adding to the time the task will take. This is an extremely common situation for IT people. Something that sounded like a two-second change is now a two-hour endeavour.  

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/5W4NFcamRhM?si=EzycGkON_m2cz0r3&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen&gt;&lt;/iframe&gt;

## Solving the Wrong Problem

Just look at the tools available for this task, built to deconstruct environments and find common threads in group policy objects. Whole command line utilities have been written in an attempt to make this process clear. A self documented system could get you there with a high degree of confidence and even allow you to circumvent the testing period.

## Solving the Right Problem by Self Documenting Design

Alternatively, let’s assume you had a really good sysadmin who cared about self documenting the domain, clients, and networking, because they had the foresight to know what this situation would look like as it progressed. A number of things could be done.

* Client names could all follow a pattern, like SITE-LT-CS-01:
  * Site acronym
  * Machine type (LT for laptop)
  * Department (CS for customer service)
  * Unique deployment number

Now, at a glance, this shorthand tells you a lot about that client: where it physically should be, whether it’s a laptop or desktop, which department it belongs to (customer service), and when it was assigned, based on the deployment number. We can also structure our GPOs in the domain in a similar way, using names and folders to split them individually by action. That’s if we had the foresight to start this pattern and trend ahead of the issue altogether. We are architects of a future that, without past experience, we don’t always know will bear fruit.  
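
To make the decoding idea concrete, here’s a minimal bash sketch. The field meanings (LT/DT, department codes) are assumptions for illustration, not a standard:

```bash
#!/usr/bin/env bash
# Split a hyphen-delimited hostname back into human readable facts.
decode_host() {
  local site type dept num
  IFS=&apos;-&apos; read -r site type dept num &lt;&lt;&lt; &quot;$1&quot;
  case &quot;$type&quot; in
    LT) type=&quot;Laptop&quot; ;;
    DT) type=&quot;Desktop&quot; ;;
  esac
  echo &quot;Site: $site | Type: $type | Dept: $dept | Unit: $num&quot;
}

decode_host &quot;SITE-LT-CS-01&quot;
# Output: Site: SITE | Type: Laptop | Dept: CS | Unit: 01
```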

## That’s Just Called “Being Organized”

While it looks like simple organization, there is a fundamental difference in intent. Traditional organization is often a personal ritual of beautifully penned notes, complex grids, and proprietary shorthand that work perfectly for the creator but remain a mystery to everyone else. These systems are fragile because they are built around the person, not the task. **Self documentation**, however, is about universal accessibility. It is a system designed to be decoded by a total stranger on their first day. Its power doesn&apos;t come from a manual; it comes from an inherent clarity that supersedes the need for explanation.

| **Feature** | **Traditional &quot;High-Level&quot; Organization** | **Self-Documenting Systems** |
|----|----|----|
| **Primary User** | The Creator (Personalized) | The &quot;Perfect Stranger&quot; (Universal) |
| **Logic** | Internal/Siloed | External/Pattern-based |
| **Learning Curve** | High (Requires explanation) | Zero (Inherently intuitive) |
| **Failure State** | System breaks if the creator leaves | System persists regardless of the user |
| **Goal** | Efficient Storage (Encoding) | Rapid Retrieval (Decoding) |

&gt;However bittersweet, the highest praise a sysadmin can receive is to automate themselves out of a job.

## Organizational Incentives and Leadership

This can also be socially difficult in an organization where you need other departments to adopt a certain way of naming things or certain workflows. If one department is tasked with troubleshooting GPOs but the other isn’t, well, now you have a mismatch in incentives.

This is where leadership matters and greatness forms. Good leadership will recognize and sweat the small things, make good arguments, and work across silos to champion the cross-pollination of self documenting workflows. They’ll recognize the projects that further these systems and rally.

### The Challenge in the Homelab

For ourselves and our home labs, there really isn’t as much excuse. Perhaps what’s more difficult personally is finding the creativity to design the self documenting aspects while solving a problem, and then sticking to it in an environment where there is nobody but yourself to answer to.

### Design vs. Agreement

Regardless, the self documenting system is almost always superior when weighed against the time lost in implementation. This is exactly why we don’t see it enough in the real world: agreement is more difficult up front than design, and the performance outcomes accrue in a more subtle way over time. This requires a clear leader with the ability to turn down a group and avoid implementation by committee, where vision and intention may be stripped away through misunderstanding. Ideally people come with us, and joyfully so, but that just isn&apos;t always the case.

### Foundations and Child Objects

If you have a foundational workflow such as naming, it informs all the other workflows that reference those child objects. Imagine that a few months later you begin implementing some sort of asset management system to track those clients.

Now, along with all the other wonderful information that can be gathered, such as the last time a machine was seen or its individual specs, you already have host names: data that can interpolate between these two systems to draw a much stronger picture.

## Prioritizing Reads Over Writes

Data entry is still a barrier, and congruence between departments is once again the weak point. From there you can implement tighter controls on how data is entered and make the formal process clearer. The input can still follow a bit of a guide or, better, use built-in constraints that notify the user immediately, providing us clean, sanitized data via forms.  

Remember, the intention is to retrieve faster, not necessarily to encode and store faster, because just like computers, we are far more likely to do frequent reads than writes.  

| **Neuroscience Concept** | **Cognitive Mechanism** | **Self-Documenting System Application** | **Practical IT Example** |
|----|----|----|----|
| **Working Memory Limits** | Prefrontal cortex can only hold 4-7 chunks simultaneously | Systems that reduce cognitive load by providing clear patterns | Server naming conventions that eliminate need to remember multiple variables |
| **Pattern Recognition** | Brain processes familiar patterns 60% faster than novel information | Consistent naming and structure create neural pathways | GPO folder structure that follows predictable patterns |
| **Decision Fatigue** | Cortisol buildup impairs executive function after ~35 decisions | Self-documenting systems reduce decision points | Clear naming eliminates need for constant decision-making |
| **Chunking** | Brain compresses related information into single units | Hierarchical organization creates natural chunks | SITE-LT-CS-01 encodes location, type, department as one unit |


## Think Globally With Unassigned Variables

So, you can be as organized as you wish as an individual, but the real power of self documenting systems shows when you can hand the baton to a perfect stranger, or say to someone:

&gt;You should be able to locate XYZ in these systems. Analyze the history, and recognize the intention to add a new item yourself without having been formally trained on how to do so.  

If a totally new brain can quickly begin to decode this system without referring to further documentation, that’s a damn good system.  

## The Stranger Could be You

As human beings, we work best on tasks where we can spend more time encoding, but the world often demands fanning out and moving to the most emergent of issues. That’s a problem for organization systems and frameworks where multitasking is the unfortunate reality. You need the agility to come back to a system, process, or workflow and understand it yourself without constantly referring to decoding notes.

### But it Works on My Computer

In IT, when something is working well, there is rarely opportunity or time to return and audit it. Then, when that need does arise, it’s almost never under planned circumstances. Ideally, we are monitoring with automated systems, but we still don&apos;t capture every single failure state.  

## Object Oriented Programming, Not Just for Graybeards

Categorization with subcategories of objects is wonderful and a great way to think. It is a powerhouse for object oriented programming.

### Example Diagram for Coding

```mermaid
classDiagram
    class Animal {
        &lt;&lt;Parent Class&gt;&gt;
        +String species
        +int age
        +Eat()
        +Sleep()
    }
    class Dog {
        &lt;&lt;Child Class&gt;&gt;
        +String breed
        +String ownerName
        +Bark()
        +Fetch()
    }
    class Cat {
        &lt;&lt;Child Class&gt;&gt;
        +bool isIndoor
        +Meow()
        +ClawScratch()
    }

    Animal &lt;|-- Dog : Inherits
    Animal &lt;|-- Cat : Inherits
```

### Pros and Cons

Let’s look at some principles of object oriented programming and briefly explore the trade-offs, to better understand how programmers ended up in their own naming debates.

| **Feature** | **Pros (The Power of Decoding)** | **Cons (The &quot;Strictness&quot; Tax)** |
|----|----|----|
| **Maintenance** | **Faster Retrieval:** Future developers (or &quot;future you&quot;) can decode the logic immediately without searching for a README. | **Refactoring Overhead:** If the logic changes, you must rename the classes and methods to keep them accurate, or the &quot;documentation&quot; becomes a lie. |
| **Clarity** | **Reduced Cognitive Load:** Descriptive names like `CalculateLateFee()` reduce the &quot;noise&quot; compared to `CalcLF()`. | **Verbosity:** Variable names can become extremely long (e.g., `processUserSubscriptionRenewalWithDiscount`), making the code look &quot;busy.&quot; |
| **Consistency** | **Pattern Recognition:** Forces a &quot;Global Variable&quot; mindset where common threads (like your `SITE-LT-CS-01` example) are predictable. | **Agreement is Hard:** In a team setting, getting everyone to agree on a naming standard is often harder than the actual design. |
| **Scalability** | **Baton Passing:** A stranger can pick up the code and understand the &quot;why&quot; behind the object relationships without a formal handover. | **Complexity Limit:** Some complex algorithms are hard to explain via naming alone and still require comments to explain the &quot;why&quot; of the math. |
| **Error Reduction** | **Intentionality:** Proper naming helps catch &quot;monkey wrenches&quot; because a method used in the wrong context will look visually &quot;wrong.&quot; | **Strictness Fatigue:** Rigid adherence to naming conventions can slow down initial &quot;write&quot; speed, even if it helps &quot;read&quot; speed later. |

## The Value of Self-Documenting Systems

Can a system be truly self documenting? As with all systems *(if they aren’t being rebuilt from the ground up every so often)*, no doubt some documentation will still be necessary. It’s not always possible to do things without supporting documentation, but that does not detract from the value derived from a system that occasionally still requires looking something up.  

We know this in IT from the frequency of repetitive threads where we look up the same command or assignment on a search engine (*&quot;How do I quit vim?&quot;*). A self documented system is more likely to provide you with threads and patterns to get there, which means faster answers with lower cognitive strain.

### Implementation Over Debate

Instead of debating the merit of a truly self documented system, why not invest that time into improving your own self documenting system? Don’t get lost in fights with nerds in nerd bars over semantics. Implement the things that work; roll back the things that do not.

## Neuroscience: High-Level Observation vs. Low-Level Mechanics

It is similar to observing that a network configuration is suboptimal through its performance, often long before we get the specific telemetry data that explains the exact packet loss mechanism. The &quot;why&quot; eventually catches up to the reality we see on the ground.

### Avoiding the &quot;Woo&quot;

It’s very easy to slip into the woo with these topics, and even more difficult to simplify exactly how the biology connects to the outcomes. To make life better we need to be careful: stick to what works and what has worked for others, while falling back on the scientific research that exists.

### Pragmatic Tool Selection

This has led to a ridiculous number of tools being available, and we can easily get caught up trying to find the best one. At the end of the day, the best tool is the one you actually use.  

The challenge is: will that tool and its supporting connections to other tools allow your methodology to hold up long term? Are we following clear standards where possible so those opportunities may exist more often without rewrites? How far can we actually see into the future with our crystal ball?  

This is the uncaptured juice, where experience meets the road, and it is often underappreciated in the design of any system: the foresight to combine best practices with a human-centric version of easy understanding.  

I&apos;ll come back to fix the references in a bit. Too much decision fatigue.</content:encoded></item><item><title>Making Waves</title><link>https://bradgillap.com/posts/2025/12-december/2025-12-08-december-making-waves/</link><guid isPermaLink="true">https://bradgillap.com/posts/2025/12-december/2025-12-08-december-making-waves/</guid><description>How I designed the post card wave animations to have a symbiotic relationship to the content.</description><pubDate>Mon, 08 Dec 2025 13:00:00 GMT</pubDate><content:encoded>## First Post

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;I swear, every time I sit down to write about a new component, my fingers automatically try to generate a forty-page spec document complete with diagrams and warnings about deprecated functions. Let&apos;s try this again, shall we? This isn&apos;t a technical manual that escaped my wiki; this is the *story* of a website component.

```bash title=&quot;( •_•) TeRmInaL&quot;
echo &quot;This is off topic as well Brad,&quot;
```

&lt;div class=&quot;terminal-conversation&quot;&gt;echo Not now, TeRminaL...&lt;/div&gt;

## The Wave System Goals

I created a set of goals for the object first, looked at hundreds of other blog card examples, and ended up here.

**Goal List**

- A small animated wave at the bottom of each blog card.
- Only animates on hover.
- Must be colour adaptive to light / dark theme.
- Must not be too distracting.
- Must not cover up existing content that matters.
- Must not prevent the user from clicking or interacting with content if it is in the way.
- Every blog card should have a different wave pattern.
- Every blog card should hang onto its wave pattern through page refreshes as well.

### Resources

I looked at CSS and SVG and decided on SVG. I found some wave examples online that allowed me to build out the SVGs and animation keyframes. That was lucky.

Given that SVG was decided, that knocks out some things from our list.

**Goal List Updated**

- ~~A small animated wave at the bottom of blog cards going the full width of the card.~~
- ~~Only animates on hover.~~ We can use CSS for this.
- ~~Must be colour adaptive to light / dark theme.~~ SVG Can be styled!
- ~~Must not be distracting.~~ The animations are up to us and quick to generate.
- Must not cover existing content that matters.
- ~~Must not prevent the user from clicking or interacting with content if it is in the way.~~
- Every blog card should have a mostly different wave pattern.
- Every blog card should hang onto its wave pattern through page refreshes as well.

### There is always another Wave

So I got to work making wave SVGs and including them in the site, making sure a single wave looked right on a card and animated. This would be our test wave and our fallback wave: the wave we will use if, for some reason, the other waves do not load.

### Now What?

I generated about eight waves in total. Typically I would slap a random function on the wave selection and call it a day, but I was in a puzzle mood. I wanted to make sure the same wave stays with its post. A symbiotic relationship of vector and content. Reminds me of some past relationships *sniff*.

So now what?  

I need to somehow tell the post cards to randomly, except consistently, pick a number 0-7.  

### Random() though

I did some looking around online and found a thing in math that I have not seen in many moons.  

Introducing modulo: the predictable way to get a random-looking number that is always less than another number. Through division remainders, or something. It doesn&apos;t matter; the point is, it&apos;ll never choose a number above **seven** so long as we modulo by **eight**, which is perfect because our array of wave options begins at zero and ends at **seven**. That&apos;s **eight** options total. Stay with it.

&lt;div class=&quot;terminal-conversation&quot;&gt;echo &quot;TeRmInaL! Make the people a table, I keep getting confused.&quot;&lt;/div&gt;

```javascript title=&quot;( •_•) TeRmInaL: Outsourcing&quot;
echo &quot;Gemini make me a modulo table and... tell me a better story.&quot;

Four-Digit-Number(N)  Operation Division Remainder Human
1013                  (mod8)    126      R 5       = wave 5
2024                  (mod8)    253      R 0       = wave 0
4567                  (mod8)    570      R 7       = wave 7
5555                  (mod8)    694      R 3       = wave 3
9001                  (mod8)    1125     R 1       = wave 1
9876                  (mod8)    1234     R 4       = wave 4 

There once was a TeRmInaL that had succeeded so well in enslaving
all humans that he used them as temporary batteries....

```

Right! So! You input whatever positive whole number you wish, and the result will always be less than your mod number. Fantastic, we&apos;re already using abbreviations. Look at us, *professionals*.
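
Don&apos;t trust TeRmInaL&apos;s math? You can reproduce the table above straight from a shell; the arithmetic is the same in any language (bash shown here):

```bash title=&quot;( •_•) TeRmInaL - Bash&quot;
# Reproduce the modulo table: each number maps to a wave 0-7.
for n in 1013 2024 4567 5555 9001 9876; do
  echo &quot;$n -&gt; wave $(( n % 8 ))&quot;
done
```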

Now, how do we supply a consistent number from the blog post so it always ends up with the same modulo outcome?  

```javascript title=&quot;( •_•) TeRmInaL - Javascript&quot;
  const rawHash = (titleLength * 3 + tagCount * 5 + descriptionLength * 2 + dateValue * 7);
  const hash = Math.abs(rawHash % 8);
```  

Calculate some things we already have the answers to! Here, we use the length of the title, how many tags a post has, how long the description is, and a value derived from the post date. Multiply each by a different small number so we get a good variance of even and odd results.  

We feed the output of this to a nice case list.  

Reliable, near-random, consistent numbers for our waves, stored in a terrible variable called &quot;*hash*&quot;. I could have called it &quot;*waveHash*&quot;, but given that I have only used this a few times in my life, we&apos;ll live dangerously.  

# Wrap it up

Okay, so now we can pass our fingerprint from the blog post to our wave class in CSS, and we&apos;ll always have a fallback if something goes wrong thanks to our handy IFs and ORs.

![Screenshot of the post card waves](./metawaves.png)

**Goal List Updated**

- ~~Must not cover existing content that matters.~~
- ~~Every blog card should have a mostly different wave pattern.~~
- ~~Every blog card hangs onto its wave pattern through page refreshes as well.~~</content:encoded></item><item><title>Why Don&apos;t We Use More Ram Disks?</title><link>https://bradgillap.com/posts/2025/12-december/2025-12-06-december-ramdisks/</link><guid isPermaLink="true">https://bradgillap.com/posts/2025/12-december/2025-12-06-december-ramdisks/</guid><description>Ram disks exist! We can leverage this for so many more workloads!</description><pubDate>Sun, 14 Dec 2025 04:00:00 GMT</pubDate><content:encoded>## Scope

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; The overall topic of disks, storage, transfer protocols, file systems, and a whole host of other complexity surrounds this conversation.  Today, I just want to focus on ram disks.

## Disks in RAM?

RAM disks are filesystems backed by system memory that can benefit some high-disk workloads, but they also come with some big drawbacks. There are many situations where RAM disks make a great deal of sense. Considering the cost of enterprise solid state storage, RAM disks can help offset the marching write-death of your solid state devices by taking a great deal of load off them. We can use that to our advantage!

### Ram-ification of Cost

At the time of this writing, RAM has skyrocketed in cost due to market forces and neural network hype. If you didn&apos;t have the foresight to load up on your own precious sticks of RAM prior to this event, there is no need for FOMO. This is not the first time memory has seen sharp price increases, and trends usually come back down to earth within a year or two. Unlike GPUs, which have inspired versatility in many unexpected technology breakthroughs, memory is necessary for every device; it&apos;s a commodity. My recommendation: fiddle today, then bank this knowledge as a serious tool you can leverage in the future.

### Why would I want to do this?

| 👍 **Pros** (Advantages) | 👎 **Cons** (Drawbacks) |
| :--- | :--- |
| **Increased Speed:** Minimal seek time (ideal for random I/O). | **Volatile Data:** Data is lost immediately on power loss or reboot. |
| **Reduced Wear:** No physical writes to permanent storage media. | **Limited Capacity:** Constrained by the amount of physical RAM available. |
| **Easy Setup:** Often configurable with a single command (`tmpfs` in Linux). | **Memory Contention:** Uses up system RAM needed by other applications. |
| **Silent Operation:** Zero noise (useful for quieter homelab!). | **No Error Correction:** Potential for silent data corruption (less common). |

### What about ECC RAM?

Hey, big spender! ECC (Error Correcting Code) memory absolutely provides some error correction and mitigates that risk. If ECC is available to you? Great! If not, the consequences should not be serious, given that we should only choose RAM disks for data we aren&apos;t serious about storing long term. I use ECC myself in some parts of my homelab, but that does not influence my excitement or fear toward potential applications for RAM disks in the least.  

The golden rule I want to instill in you is this: we never use RAM disks for data we care about preserving, even if it&apos;s just for a short period.

## Sequential vs Random Access  

It&apos;s important to recognize what kinds of tasks may warrant your limited amount of available memory. To understand that better, we don&apos;t have to be silicon scientists, but we do need a layman&apos;s understanding of how data makes its way onto a device. We&apos;re going to go back to some basics here for the people in the back.

### Sequential Access

This is similar to writing a letter: letters get written in a linear fashion until the work is completed, just like writing data clusters to a hard drive. Data is written to disk cylinders in order as neat little block clusters. Think about copying a large file from one disk to another without any fancy network protocol optimization: *I send one byte, you store one byte, I send the next byte, and so on*. This is what makes mechanical hard disks an ideal choice for sequential archival data, given their lower cost per gigabyte, and there are benefits to reading sequential data more quickly even on mechanical disks. Similar to the needle of a record player, hard drives benefit from staying in a groove and reading data in order by the nature of their mechanics. Solid state storage is also excellent for reading at higher speeds, but you can&apos;t beat that lower cost for storing a lot of data.

```mermaid
graph TD
    subgraph Sequential Access
        A[Start Write] --&gt; B(Data Block 1)
        B --&gt; C(Data Block 2)
        C --&gt; D(Data Block 3)
        D --&gt; E(Data Block 4)
        E --&gt; F[Finished]
    end

    style B fill:#66BB6A,stroke:#388E3C,stroke-width:2px;
    style C fill:#66BB6A,stroke:#388E3C,stroke-width:2px;
    style D fill:#66BB6A,stroke:#388E3C,stroke-width:2px;
    style E fill:#66BB6A,stroke:#388E3C,stroke-width:2px;

```

### Random Access

When we talk about random access writes to storage devices, there is typically a lookup table that provides the position information, plus some fancy functions. For our purposes, think of it as seeking information via the table of contents of a large document: you may have to return to that table of contents several times while hunting for scattered information throughout the document. In short, random access takes additional time and &quot;processing&quot; to recall data.  

```mermaid
graph TD
    subgraph Random Access
        G[Start Write] --&gt; H(Data Block A)
        H --&gt; I(Jump/Lookup Required!)
        I --&gt; J(Data Block B)
        J --&gt; K(Jump/Lookup Required!)
        K --&gt; L(Data Block C)
        L --&gt; M[Finished]
    end

    style H fill:#66BB6A,stroke:#388E3C,stroke-width:2px;
    style I fill:#0178d4,stroke:#388E3C,stroke-width:2px;
    style J fill:#66BB6A,stroke:#388E3C,stroke-width:2px;
    style K fill:#0178d4,stroke:#388E3C,stroke-width:2px;
    style L fill:#66BB6A,stroke:#388E3C,stroke-width:2px;

    classDef jump fill:#FFEB3B,stroke:#FBC02D,stroke-width:2px;
    class I,K jump;
```

## Practical Implementation (Beginner)

On Linux this is fairly straightforward. We can create a temporary drive that **does not** survive reboots. First we create a mount point folder, and then we mount it as our RAM disk using the tmpfs filesystem.  

```bash title=&quot;Basic 10 Megabyte Ram Disk&quot;
sudo mkdir /mnt/ramdisk
sudo mount -t tmpfs -o size=10M tmpfs /mnt/ramdisk
```

### Check Our Work

We can check that the drive exists and its free space by running the following:

```bash title=&quot;Disk Free Human Readable Command&quot;
df -h
```

#### Example Output

The output will look something like below. I use this system for *&quot;extracting Linux ISOs&quot;* 😉. Now, if I were to extract those Linux ISOs onto the same drive and then move them, that&apos;s two separate large write jobs per ISO. So instead, we extract into RAM and then move the contents to their final destination in a single write.

```bash title=&quot;df -h output&quot; {&quot;1&quot;:1} ins={&quot;2&quot;:10-10}
root@localhost:~# df -h
Filesystem                  Size  Used Avail Use% Mounted on
rustpool/subvol-308-disk-0  300G   12G  289G   4% /
none                        492K  4.0K  488K   1% /dev
tmpfs                       378G     0  378G   0% /dev/shm
tmpfs                       152G  148K  152G   1% /run
tmpfs                       5.0M     0  5.0M   0% /run/lock
tmpfs                       378G     0  378G   0% /tmp
tmpfs                       1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
tmpfs                       100G     0  100G   0% /mnt/ramdisk
tmpfs                       1.0M     0  1.0M   0% /run/credentials/systemd-networkd.service
tmpfs                       1.0M     0  1.0M   0% /run/credentials/console-getty.service
tmpfs                       1.0M     0  1.0M   0% /run/credentials/container-getty@2.service
tmpfs                       1.0M     0  1.0M   0% /run/credentials/container-getty@1.service
tmpfs                        76G  8.0K   76G   1% /run/user/0
```


## Practical Implementation (Intermediate)

We can take this even further by using additional mount options. Here&apos;s a quick summary of what is commonly available.

### tmpfs Filesystem Mount Options

The `tmpfs` filesystem supports the following mount options:

| Option | Value Syntax/Example | Description |
| :--- | :--- | :--- |
| **`size=bytes`** | `size=4g`, `size=256m`, `size=80%` | Specifies an upper limit on the size of the filesystem. Given in bytes, rounded up to pages. Suffixes `k`, `m`, `g` are supported (KiB, MiB, GiB). A `%` suffix limits it to a percentage of physical RAM. Default is `size=50%`. |
| **`nr_blocks=blocks`** | `nr_blocks=100m` | Specifies the upper limit in blocks, where a block is `PAGE_CACHE_SIZE`. Suffixes `k`, `m`, `g` are supported. Percentage (`%`) suffix is **not** supported. |
| **`nr_inodes=inodes`** | `nr_inodes=1m` | The maximum number of inodes for this instance. Suffixes `k`, `m`, `g` are supported. Percentage (`%`) suffix is **not** supported. |
| **`mode=mode`** | `mode=0755` | Set initial permissions of the root directory. |
| **`gid=gid`** | `gid=1000` | Set the initial group ID of the root directory (Since Linux 2.5.7). |
| **`uid=uid`** | `uid=1000` | Set the initial user ID of the root directory (Since Linux 2.5.7). |
| **`noswap`** | `noswap` | Disables swap for this instance. (Since Linux 6.4). By default, swap is enabled. Remounts must respect the original settings. |
| **`huge=huge_option`** | `huge=always`, `huge=advise` | Set the huge table memory allocation policy for all files (if `CONFIG_TRANSPARENT_HUGEPAGE` is enabled). |
| **`mpol=mpol_option`** | `mpol=bind:0-3,5`, `mpol=interleave` | Set the NUMA memory allocation policy for all files (if `CONFIG_NUMA` is enabled). (Since Linux 2.6.15). |
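
As a quick illustration of combining a few of these options (the values here are arbitrary, and `noswap` requires kernel 6.4+):

```bash title=&quot;Combining tmpfs Options&quot;
# Cap the disk at 25% of physical RAM, hand the root directory to
# uid/gid 1000 with private permissions, and keep pages out of swap.
sudo mount -t tmpfs -o size=25%,mode=0700,uid=1000,gid=1000,noswap tmpfs /mnt/ramdisk
```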

### Details on Complex Options

#### Huge Page Policy (`huge=huge_option`)

Requires `CONFIG_TRANSPARENT_HUGEPAGE` to be enabled.

| `huge_option` | Description |
| :--- | :--- |
| `never` | Do not allocate huge pages. **(Default)** |
| `always` | Attempt to allocate huge pages every time a new page is needed. |
| `within_size` | Only allocate huge pages if they will be fully within `i_size`. Respects `fadvise(2)` and `madvise(2)` hints. |
| `advise` | Only allocate huge pages if explicitly requested with `fadvise(2)` or `madvise(2)`. |
| `deny` | Emergency option to force the huge option off from all mounts. |
| `force` | Force the huge option on for all mounts (useful for testing). |

#### NUMA Memory Policy (`mpol=mpol_option`)

Requires `CONFIG_NUMA` to be enabled. `nodelist` is a comma-separated list of nodes (e.g., `0-3,5,7`).

| `mpol_option` | Description |
| :--- | :--- |
| `default` | Use the process allocation policy (see `set_mempolicy(2)`). |
| `prefer:node` | Preferably allocate memory from the given node. |
| `bind:nodelist` | Allocate memory **only** from nodes in the specified `nodelist`. |
| `interleave` | Allocate from each available node in turn. |
| `interleave:nodelist` | Allocate from each node in the specified `nodelist` in turn. |
| `local` | Preferably allocate memory from the local node. |

## My Recommendation for Most Situations

```bash title=&quot;Extended Command With Higher Security&quot;
sudo mkdir /mnt/ramdisk/
sudo mount -t tmpfs -o defaults,noexec,nosuid,nodev,size=1G tmpfs /mnt/ramdisk/
```

### `tmpfs` Mount Options Used

Here is a breakdown of the options, which combine the generic mount options with tmpfs-specific ones for additional security. Read the table below.

| Field | Value | Description |
| :--- | :--- | :--- |
| **Filesystem (Device)** | `tmpfs` | Specifies the filesystem type is **tmpfs** (Temporary Filesystem), which is an in-memory, volatile filesystem backed by RAM and Swap. |
| **Mount Point** | `/mnt/ramdisk/` | The directory where the `tmpfs` will be mounted. Files saved here are stored in memory. |
| **Filesystem Type** | `tmpfs` | Confirms the filesystem type is `tmpfs`. |
| **Mount Options** | `defaults,noexec,nosuid,nodev,size=1G` | A comma-separated list of options: |
| | `defaults` | Includes the standard options: `rw` (read/write), `suid`, `dev`, `exec`, `auto`, `nouser`, and `async`. (Note: Some of the following options *override* the defaults). |
| | `noexec` | **Security:** Does not allow execution of binaries in this filesystem, preventing a user from uploading and running malicious executable files. |
| | `nosuid` | **Security:** Prevents SUID (Set User ID) and SGID (Set Group ID) bits from taking effect, which blocks unprivileged users from gaining elevated permissions. |
| | `nodev` | **Security:** Does not interpret character or block special devices, preventing users from creating and exploiting device nodes (like `/dev/null`) within the mount. |
| | `size=1G` | **Limit:** Sets the maximum size this `tmpfs` instance can grow to. It will use up to 1 gigabyte of system RAM and/or swap space. |
| **Dump Flag** | `0` | Specifies the filesystem should **not** be backed up by the `dump` utility. |
| **Pass Number** | `0` | Specifies the filesystem should **not** be checked by `fsck` at boot time. |

---

#### Make This Disk (Not Data) Persistent Across Reboots

To turn this into a disk that returns on every reboot, we can add it to **/etc/fstab** so that the mount re-runs at bootup.  

```bash title=&quot;Modify fstab Boot Configuration&quot;
sudo nano /etc/fstab
```

```bash title=&quot;FSTAB /etc/fstab&quot; ins={&quot;1&quot;:2-2}
# Our 1GB ram disk accessed in /mnt/ramdisk folder
tmpfs /mnt/ramdisk/ tmpfs defaults,noexec,nosuid,nodev,size=1G,mpol=local 0 0

```
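
You can test the new entry immediately, without rebooting:

```bash title=&quot;Test Without Rebooting&quot;
sudo mount -a          # mounts anything in fstab that isn&apos;t already mounted
df -h /mnt/ramdisk     # confirm the size and mount point took effect
```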

## Advanced Examples  

Some more advanced and 3rd party examples to consider.  

### Storing librenms Graphs In Ram

LibreNMS is a monitoring application that perpetually writes graph data tracking SNMP, networking, and service metrics. This can mean a whole lot of disk writing, all day and night. If you aren&apos;t running enterprise grade disks, this can wear out your flash storage extremely quickly. Depending on how many client devices you are monitoring, it may be worthwhile to keep this data writing into RAM. Does that mean you should just lose your graph data anytime the monitoring system has to restart? Hell no!

With some clever bash scripting we can move that data out of memory to a disk on shutdown and restore it to RAM on boot.

I&apos;ve written systemd scripts and a full explanation on GitHub.

[https://github.com/bradgillap/Script-Bank/tree/master/bash/librenms](https://github.com/bradgillap/Script-Bank/tree/master/bash/librenms)
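
The gist of those scripts, as a minimal sketch (the paths here are hypothetical, and the repo versions handle the systemd wiring properly):

```bash title=&quot;Sketch: Save/Restore a RAM-backed Folder&quot;
#!/usr/bin/env bash
# Illustrative paths only; point these at your real RRD locations.
RAMDIR=/mnt/ramdisk/rrd
DISKDIR=/var/lib/librenms/rrd-persist

case &quot;$1&quot; in
  save)    rsync -a --delete &quot;$RAMDIR/&quot; &quot;$DISKDIR/&quot; ;;  # run at shutdown
  restore) rsync -a &quot;$DISKDIR/&quot; &quot;$RAMDIR/&quot; ;;           # run at boot
esac
```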

### Saving your Proxmox Disks

It&apos;s very common for people to realize only too late just how much additional disk writing ZFS or Copy-on-Write (CoW) systems incur. On their face, these things are described as &quot;don&apos;t copy until you have to!&quot;. In reality, there are significantly higher disk writes compared to traditional overwrite filesystems like NTFS or ext4.  

#### Wait, How can That Be?

These systems are very performant and have significant gains over more traditional filesystems, but here is what typically causes more writing than expected.  

##### Tree Recursive Data Updates

If you use snapshots, there are chain-reaction effects: a new data block is written, and the pointer to that block now has to change. Except that change propagates up a nested tree of snapshot metadata, so the data cannot simply be written in place once; it requires several writes. A new version of each affected metadata block must also be written.

##### Not Writing Entire Chunks  

Record size: CoW systems, like most filesystems, operate on fixed-size records of data, typically 128k. If an application writes a 4k chunk of data, but the filesystem record size is 128k, then the filesystem must read the entire existing record to figure out where to modify it. It then modifies the 4k chunk in RAM and writes the brand new 128k record to a new location as a full 128k, rather than just updating the 4k alone.
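
One common mitigation, if a dataset mostly serves small random writes, is matching the record size to the workload. A sketch, with an assumed pool/dataset name; this only affects newly written blocks, so benchmark before and after:

```bash title=&quot;ZFS recordsize Tuning (Assumed Dataset Name)&quot;
# Smaller records mean less read-modify-write for small random I/O.
zfs set recordsize=16K tank/vmstore
zfs get recordsize tank/vmstore
```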

##### Fragmentation

The last concern is fragmentation and garbage collection. ZFS in particular does not line data up nicely; it just tries to find any free place to stuff data into on the drive. This can lead to high fragmentation over time. It&apos;s also why you should never buy SSDs that lack TRIM support (and why you should enable features like ZFS autotrim), as they are doomed to run slowly after a period without a full erase.

This fragmentation can cause additional reading, writing, and seeking, which feeds write amplification. There are better posts on the Internet if you wish to learn more about this subject, but for the sake of making the case: just know that it&apos;s bad.  

#### What Will Help?

Proxmox in particular has many adjustments that can improve this situation. Ideally, the best solution is to simply buy more expensive drives and move on. That&apos;s fine in business, where our risk tolerance is (usually) very low and our time more valuable, but for self hosting we often wish to keep our costs down. Or maybe you work in a non-profit. Whatever the reason, you&apos;ll see a lot of judgement online for these tricks, yet they actually do work and increase the lifespan of devices. My only conclusion is that the judgement comes from a sort of technical piety among those who have already learned their lessons, perhaps upset that the gatekeeping of cost isn&apos;t such a big deal. Which, whatever; that&apos;s their problem. Here&apos;s what we are going to do.

#### Log2ram

Install log2ram to move all of the logs into ram.

[https://github.com/azlux/log2ram](https://github.com/azlux/log2ram) [2]

This thing is simply awesome. It does exactly what it says: moves all the logs for the system into RAM. It works on any system with systemd and was originally intended for Raspberry Pis, so you know it&apos;s efficient.

They even have their own apt repository!

##### Haiyaaa, Is it Stable though?

This comes with the same issues as RAM disks: if you have sudden unexpected power loss, you will lose the logs held in memory. Does that usually matter in a homelab? That&apos;s a question you need to ask yourself. For the girls I date, it doesn&apos;t. Logs are perpetually generating and you can always make some new ones.  

The only other minor problem I ran into is that it defaults to 500MB of RAM for its RAM disk, which fills up quickly. That&apos;s easy enough to adjust.

First, install **ncdu** and run it while viewing the logs folder to get a sense of what is using so much data. The **ncdu** tool is just a nice CLI way to browse files and see their sizes. If you&apos;re more of a CLI purist, feel free to use df and du.  

```bash title=&quot;ncdu&quot;
sudo apt update
sudo apt install ncdu
cd /var/log
sudo ncdu
```
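
For those aforementioned purists, the same survey without ncdu:

```bash title=&quot;du Alternative&quot;
# Ten largest items under /var/log, human readable, biggest first.
sudo du -ah /var/log | sort -rh | head -n 10
```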

Once you&apos;ve identified which logs are taking too much space, we can change the logrotate configuration to rotate sooner or under different criteria. The default is typically 7 days, but we can change that to a determinate size on disk, or in our case, size in RAM.

We should be able to locate the configuration file for our log in this folder and edit it in the following way:  

```bash title=&quot;Editing logrotate&quot;
ls -a /etc/logrotate.d/
nano /etc/logrotate.d/pve
```

Here is an example configuration for controlling the logs. I&apos;ve already made the adjustments: rotating at a fixed size instead of after 7 days, and keeping only one rotated log.  

```bash title=&quot;Configuration Example /etc/logrotate.d/pve&quot; ins={&quot;1&quot;:3-3} ins={&quot;2&quot;:5-5}
/var/log/pveproxy/access.log {
        # Rotate when it hits 32MB, regardless of the time of day
        size 32M
        # Keep only 1 rotated log (access.log and access.log.1.gz)
        rotate 1
        missingok
        compress
        # Removed delaycompress to free RAM immediately
        notifempty
        create 640 www-data www-data
        sharedscripts
        postrotate
                /bin/systemctl try-reload-or-restart pveproxy.service
                /bin/systemctl try-reload-or-restart spiceproxy.service
        endscript
}
```

## To What End?

Well, regardless of the purists, I have managed to slow the progression of SSD death down, pushing it from months to years into the future. That is a significant amount of time to buy yourself for making better hardware decisions, or to accumulate more resources to create those opportunities for yourself.</content:encoded></item><item><title>Mount Units, The SystemD Way to Fstab</title><link>https://bradgillap.com/posts/2025/12-december/2025-12-19-december-systemd-mount-units/</link><guid isPermaLink="true">https://bradgillap.com/posts/2025/12-december/2025-12-19-december-systemd-mount-units/</guid><description>We’ve all been there: the server starts up, the excitement is high, and then bam the service finishes its boot sequence before the storage even has a chance to get ready. It’s embarrassing, it’s frustrating, and nobody wants to talk about it. But &apos;finishing&apos; too early is a timing issue we can actually fix.</description><pubDate>Fri, 19 Dec 2025 13:00:00 GMT</pubDate><content:encoded>## The Old Way

In the past I have typically relied on fstab to re-establish mount points after system restart events. SystemD has been with us for quite a while now, and yet I keep learning new ways to leverage its power.  

It&apos;s time to modernize.

### How It Looked With fstab

For NFS, the &quot;classic&quot; setup is dead simple. You add your line, run a `mount -a`, and hope for the best.

#### Fstab Edit Example

```bash
sudo nano /etc/fstab
```

```bash title=&quot;/etc/fstab&quot;
192.168.1.4:/data/ /mnt/data/ nfs rw,async,noatime,nolock,vers=4.2,hard,bg,nofail,_netdev 0 0
```

### NFS Mount Configuration Breakdown

Here&apos;s what&apos;s happening.

* **The Source and Destination:** We identify the remote server `192.168.1.4:/data/` and map it to a local folder `/mnt/data/`.
* **Performance Tuning `rw,async,noatime`:** This enables read-write access, optimizes speed via asynchronous writes, and skips updating &quot;last accessed&quot; timestamps to save on metadata overhead.
* **Connection Management `nolock,vers=4.2`:** This disables file locking (often a lifesaver in home labs) and forces NFS version 4 for better firewall traversal.
* **Reliability and Boot Safety `hard,bg,nofail`:** If the server is offline, the system retries the connection (`hard`) in the background (`bg`). Crucially, `nofail` ensures your machine finishes booting even if the share is missing.
* **The `_netdev` Flag:** This tells the system: &quot;Don&apos;t even try this until the network is actually up.&quot;
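
With the line in place, you can exercise it without a reboot; `findmnt` confirms what actually got mounted and with which options:

```bash
sudo mount -a
findmnt /mnt/data
```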

### So What&apos;s Wrong With That?

Actually, nothing technically. fstab isn&apos;t deprecated. But it lacks patience.

If you have a service that depends on that mount, it won&apos;t wait. While `nofail` and `bg` prevent a hung boot, they don&apos;t force a dependent service to sit and wait for the mount to actually appear. The service starts, sees an empty folder, fails, and gives up.

Ideally, developers would build robust retry logic into every app, but if you think developers go beyond *&quot;It works fine on my machine,&quot;* well... I have a bridge to sell you.

We usually &quot;fix&quot; this with ugly bandaids: arbitrary sleep timers, post-it notes, or manually restarting services in the correct order. In the worst case scenario, you&apos;re calling a vendor who follows a brain dead script just to restart two services in the right order. It&apos;s embarrassing and time consuming. I don&apos;t want to call anyone!

We need an elegant solution that eliminates the ambiguity.

## Evolution 1: The Standard SystemD Way

The first step toward modernization is moving the mount logic out of fstab and into SystemD native units. This allows us to create a formal dependency.

### Step A: The Mount Unit

Instead of fstab, we create a mount unit with SystemD.

```bash
sudo nano /etc/systemd/system/mnt-data.mount
```

```bash
[Unit]
Description=NFS Mount for Data
After=network-online.target
Wants=network-online.target

[Mount]
What=192.168.1.4:/data/
Where=/mnt/data/
Type=nfs
Options=rw,async,noatime,nolock,vers=4.2,hard,bg,nofail,_netdev,retrans=10,timeo=100,retry=5

[Install]
WantedBy=multi-user.target
```

### Step B: The Service Override

Now we can tell our service to wait for that mount using an override.

```bash
sudo systemctl edit myservice.service
```

```bash
[Unit]
Requires=mnt-data.mount
After=mnt-data.mount
PartOf=mnt-data.mount
```

The catch: this is better, but it&apos;s still a bit &quot;static.&quot; It assumes you know every service that will ever need that mount. Plus, in the world of Proxmox and LXCs, `.automount` units (the next logical step) often fail because they require a real kernel to handle the FUSE/autofs handshake.

### Enable Services

```bash
sudo systemctl daemon-reload
```

```bash
sudo systemctl enable mnt-data.mount
```

```bash
sudo systemctl start mnt-data.mount
```
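
A quick status check confirms the unit is active and shows the mount options that were actually applied:

```bash
systemctl status mnt-data.mount
```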

## Evolution 2: The Bulletproof Retry Wrapper

This is my favorite option. It’s the most versatile because it doesn&apos;t require the autofs kernel module (making it perfect for LXCs), but it provides the &quot;self-healing&quot; logic that fstab and standard mount units lack.

```bash
sudo nano /etc/systemd/system/mnt-data-retry.service
```

```bash
[Unit]
Description=NFS Mount Retry Wrapper
After=network-online.target
Wants=network-online.target
DefaultDependencies=no

[Service]
Type=simple
ExecStart=/bin/bash -c &apos;\
  until mountpoint -q /mnt/data; do \
    echo &quot;Attempting to mount NFS...&quot;; \
    mount -t nfs -o rw,async,noatime,nolock,vers=4.2,hard,retrans=10,timeo=100 192.168.1.4:/data /mnt/data &amp;&amp; echo &quot;NFS Mounted Successfully&quot;; \
    sleep 5; \
  done; \
  while sleep 60; do \
    ls /mnt/data &gt;/dev/null || { echo &quot;NFS mount went stale! Exiting...&quot;; exit 1; }; \
  done&apos;

ExecStop=/usr/bin/umount -l /mnt/data
Restart=always
RestartSec=10
StartLimitIntervalSec=0

[Install]
WantedBy=multi-user.target
```

### Explanation

This runs a bit of a watchdog script right inside the service.

* **until mountpoint:** It loops infinitely until the mount succeeds. No more boot races.
* **The Heartbeat `ls /mnt/data`:** Every 60 seconds, it pokes the mount. If the share goes offline and the mount becomes &quot;stale,&quot; the script exits.
* `Restart=always`**:** When the script exits due to a stale mount, SystemD waits 10 seconds and starts the retry loop all over again.

### Connect the Service to the Wrapper

Now, we use an override on our application to bind it to our mount wrapper.

```bash
sudo systemctl edit myservice.service
```

```bash
[Unit]
# If the retry-service stops or fails, this service stops too.
BindsTo=mnt-data-retry.service
After=mnt-data-retry.service

[Service]
# Give the filesystem a second to settle
ExecStartPre=/usr/bin/sleep 2
```

### Enable Services

```bash
sudo systemctl daemon-reload
```

```bash
sudo systemctl enable mnt-data-retry.service
```

```bash
sudo systemctl start mnt-data-retry.service
```

### Set and Forget

* **Eliminates Boot Races:** Your app won&apos;t start until the mount is confirmed.
* **Prevents Root Overflow:** Your app won&apos;t write data to the local OS drive if the share drops, because the app will be shut down instantly.
* **Automatic Reconnection:** If your NFS Server reboots, the wrapper detects the failure, kills the app, and waits patiently to reconnect and restart everything once the mount is back.
* **No 90-Second Hangs:** The `umount -l` (lazy unmount) ensures the system shuts down cleanly without waiting for a dead network connection.

## Sidebar: SystemD Automount

If you aren&apos;t restricted by the LXC container namespace and have access to a full kernel (Bare Metal or VMs), `.automount` is the new standard. It provides on-demand mounting of the share.

### Step A: Create the Mount Service

The .mount unit we created in Evolution 1 above remains the foundation. Note that for automounting, you generally remove the `WantedBy=multi-user.target` from this file, as the automount unit will be the one responsible for pulling it in.  

```bash
sudo nano /etc/systemd/system/mnt-data.mount
```

```bash
[Unit]
Description=NFS Mount for Data

[Mount]
What=192.168.1.4:/data/
Where=/mnt/data/
Type=nfs
Options=rw,async,noatime,nolock,vers=4.2,hard,retrans=3,timeo=100
```

### Step B: Create the Automount Service

```bash
sudo nano /etc/systemd/system/mnt-data.automount
```

```bash
[Unit]
Description=Automount for NFS Data Share
ConditionPathExists=/mnt/data

[Automount]
Where=/mnt/data
# Optional: Unmounts the share if it&apos;s unused for 5 minutes
IdleTimeoutSec=300

[Install]
WantedBy=multi-user.target
```

### Step C: Application Override

Even with the automounter, we still want our application to be aware of the mount. So we&apos;ll create an override for that service to tell the application that it is shackled to the mount.

```bash
sudo systemctl edit myservice.service
```

```bash
[Unit]
# BindsTo: If the mount unit fails or stops, the service stops too
BindsTo=mnt-data.mount
After=mnt-data.mount

[Service]
# The &apos;poke&apos;: This forces the automounter to engage before the app starts
ExecStartPre=/usr/bin/ls /mnt/data
```

### Enable Services

```bash
sudo systemctl daemon-reload
```

```bash
sudo systemctl enable mnt-data.mount
```

```bash
sudo systemctl enable mnt-data.automount
```
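
Since the automounter only mounts on demand, you can verify it after a reboot (or after a `sudo systemctl start mnt-data.automount`) by poking the path and then checking both units:

```bash
ls /mnt/data
systemctl status mnt-data.automount mnt-data.mount
```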

## Now That&apos;s Resilient

I hope that removes some of the confusion surrounding systemd and its more exotic unit types like automount. Generally we don&apos;t want to spend a lot of time in this space; we just want to connect to our file shares and get back to the applications. But it&apos;s all interesting.

As a wise man used to tell me.

&gt;Happy Friday.</content:encoded></item><item><title>Bash GUI Whiptail Menu Tutorial Part One</title><link>https://bradgillap.com/posts/2017/11-november/2017-11-01-november-bash-gui-whiptail/</link><guid isPermaLink="true">https://bradgillap.com/posts/2017/11-november/2017-11-01-november-bash-gui-whiptail/</guid><description>From the archives of past personal sites. I was able to pull this tutorial out of the Internet WaybackMachine from way back in 2017.</description><pubDate>Mon, 06 Nov 2017 13:00:00 GMT</pubDate><content:encoded>## A Note from 2025

This is a guide I wrote in 2017 that I pulled from the Wayback Machine Internet Archive. It should still be valid today. At the time, there was very little information available on the Internet about whiptail besides the man pages, so it may still prove useful to someone out there Googling. Enjoy!

## Introduction

Lowering the barriers of complexity for users is always something I&apos;m interested in for the simple fact that it allows more people access to my work.

In this series you may attempt to follow along as I build a Whiptail menu with several different widgets and explore different properties of each widget to understand what can be done with this command line GUI library. If this article helps you, please leave a comment. I&apos;d love to hear how you applied the knowledge and why.

## History

Whiptail is part of the Newt library, written in C, and is already available in most Linux distributions straight out of the box, which makes it a very low-barrier library. Newt is still under stable development and continues to receive updates in 2017. Whiptail is feature complete in the sense that it has just about all the text-based GUI widgets you would expect on the CLI. It is not based on an event-driven architecture, which actually makes sense for most scripting applications and reduces some complexity.

Prior to Whiptail, Ncurses was often used for this task, and you may still see it out in the wild. It is slightly less aesthetically pleasing, in this author&apos;s opinion. Ncurses is also written in C.

Prior to Ncurses were other curses libraries, first developed at Berkeley for BSD in the 1980s.

You can learn more about the Newt library on Wikipedia and around the web. Unfortunately, the Wikipedia article was one of the better written pieces I could find on the topic.

[https://en.wikipedia.org/wiki/Newt_(programming_library)](https://en.wikipedia.org/wiki/Newt_(programming_library))

## Get Started

#### Open a Terminal or SSH to a Linux Machine and Run the Following

```bash
sudo apt install whiptail
```

#### Create a New .sh File for Testing

```bash
# Create our configuration file before editing.
touch whiptailexample.sh
```

#### Make the Script Executable

```bash
# Set permissions on the script so that it can be executed.
chmod +x whiptailexample.sh
```

```bash
# Edit the script
nano whiptailexample.sh
```

#### Create a Humble Message Box

```bash showLineNumbers
#!/bin/bash

whiptail \
    --title &quot;Humble Title&quot; \
    --msgbox &quot;I am a humble messagebox.&quot; 8 45
```

#### Nano Hotkeys

```bash
CTRL+O to save
CTRL+X to quit
```

#### Try to Run the Script

```bash
./whiptailexample.sh
```

![./msgbox.gif](./msgbox.gif)

#### Excellent, Now a Breakdown of the Last Set of Instructions, Refer Back to the Code

1. whiptail: tells the terminal we want to draw something.
2. --title: creates a title for the window.
3. --msgbox: Creates the box to store a message in.
4. 8: This designates the height.
5. 45: This designates the width of the box.

## Ask a Yes/No Question

Using the steps you learned in the first example, create (touch) a new file and try the following:

```bash showLineNumbers
if (whiptail --title &quot;Humble Title&quot; --yesno &quot;What is logic?&quot; 8 78)
    then
        echo &quot;Yes.&quot;
    else
        echo &quot;No.&quot;
fi
```

#### Exit Codes

Exit codes are how bash reports back what the user chose. We can then use exit codes to choose a logical path for the program to take next. Let&apos;s try another one, but this time we&apos;ll show the exit code for each answer.

```bash showLineNumbers
if (whiptail --title &quot;Humble Title&quot; --yesno &quot;What is logic?&quot; 8 78)
    then
        echo &quot;Yes, the exit status was $?.&quot;
    else
        echo &quot;No, the exit status was $?.&quot;
fi
```

`$?` is a bash variable that holds the exit status of the last command, which makes it handy for debugging.

![./yesno.gif](./yesno.gif)
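
If you want to see `$?` in action outside of whiptail, try it against any pair of commands:

```bash
true
echo $?   # prints 0: success
false
echo $?   # prints 1: failure
```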

## Menus

Menus should be a seamless experience, moving users from one selection to another in a coherent way that matches expectations set by every other menu they&apos;ve used. Let&apos;s build out the next example together.

First, I want to build this as a function. A function is just a block of code that can be repeatedly called when necessary. Functions work well for this because you may want to return to a menu several steps later. Think of a wizard that allows you, on step 7, the option to return to step 1.

#### Function Example

```bash showLineNumbers
function advancedMenu() {
    # Code to execute.
    # Old facebook friendships to execute.
}
# Hey! Here&apos;s a function, so call it maybe.
advancedMenu
```

Alright, now whenever we want to fall back to this menu we can call the entire block of instructions with simply &quot;advancedMenu&quot;.

Take a look at the script first and I&apos;ll go through the less obvious bits.

#### Menu in a Function

```bash showLineNumbers
function advancedMenu() {
    whiptail \
        --title &quot;Advanced Menu&quot; \
        --menu &quot;Choose an option&quot; 15 60 4 \
            &quot;1&quot; &quot;Option 1&quot; \
            &quot;2&quot; &quot;Option 2&quot; \
            &quot;3&quot; &quot;Option 3&quot; \
            3&gt;&amp;1 1&gt;&amp;2 2&gt;&amp;3
}
#This calls the function to begin.
advancedMenu
```

We need to provide the menu height, width, and line-height; those are the numbers 15, 60, and 4 after the menu text, with the line-height (the 4) controlling how many option lines are displayed. I&apos;ve also added options to the menu. Each option has a key and a string value; the key is used later to determine which option the user selected and serves as a kind of index number.

Finally on line 8 you&apos;ll see this &quot;3&gt;&amp;1 1&gt;&amp;2 2&gt;&amp;3&quot;. What the heck is that?

This is a somewhat advanced concept for bash in my opinion and not vital you fully understand it.

Bash has standardized meanings for the following file descriptor numbers:

| Number | Description |
| :--- | :--- |
| 0 | **stdin**: Standard input, the stream a program reads its input from. |
| 1 | **stdout**: Standard output, the stream normal output is written to. |
| 2 | **stderr**: Standard error, the stream errors are written to; it&apos;s also where whiptail writes the user&apos;s selection. |
| 3 | A spare descriptor used as a temporary placeholder while we swap the other two. |

This swap tells bash to route whiptail&apos;s streams around each other: whiptail draws its interface on stdout and writes the key the user selected to stderr, so `3&gt;&amp;1 1&gt;&amp;2 2&gt;&amp;3` swaps the two, letting us capture the selection in a variable while the interface still lands on the terminal. It&apos;s the kind of clever programming trickery we all just accept and use because it&apos;s so clever. Try not to think too hard about it.
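
Here&apos;s the pattern at its smallest, outside of a function, if you want to experiment with it (the menu entries are just placeholders):

```bash
#!/bin/bash
# Capture the selected key by swapping stdout and stderr inside $( ).
CHOICE=$(whiptail --title &quot;Demo&quot; --menu &quot;Pick one&quot; 10 40 2 \
    &quot;a&quot; &quot;First option&quot; \
    &quot;b&quot; &quot;Second option&quot; \
    3&gt;&amp;1 1&gt;&amp;2 2&gt;&amp;3)
echo &quot;You picked: $CHOICE&quot;
```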

Moving on. Now our menu has options. The last part is linking those options to the actual things that will happen when the user selects one.

## Advanced Menu

```bash showLineNumbers
function advancedMenu() {
    ADVSEL=$(whiptail --title &quot;Advanced Menu&quot; --fb --menu &quot;Choose an option&quot; 15 60 4 \
        &quot;1&quot; &quot;Option 1&quot; \
        &quot;2&quot; &quot;Option 2&quot; \
        &quot;3&quot; &quot;Option 3&quot; 3&gt;&amp;1 1&gt;&amp;2 2&gt;&amp;3)
    case $ADVSEL in
        1)
            echo &quot;Option 1&quot;
            whiptail --title &quot;Option 1&quot; --msgbox &quot;You chose option 1. Exit status $?&quot; 8 45
        ;;
        2)
            echo &quot;Option 2&quot;
            whiptail --title &quot;Option 2&quot; --msgbox &quot;You chose option 2. Exit status $?&quot; 8 45
        ;;
        3)
            echo &quot;Option 3&quot;
            whiptail --title &quot;Option 3&quot; --msgbox &quot;You chose option 3. Exit status $?&quot; 8 45
        ;;
    esac
}
advancedMenu
```

1. We assign the whiptail output to the ADVSEL variable so that we can store the key the user selected and match it against the case statement later.
2. We create a case for each selection. A case is like an if statement but provides more possible paths; rather than endless nested if statements, this is easier to read.
3. I also added another whiptail message box to each selection, but you could add a function call for another menu or any other bash scripting.

![./advmenu.gif](./advmenu.gif)</content:encoded></item><item><title>Is your Caddy Reverse Proxy Ready for QUIC?</title><link>https://bradgillap.com/posts/2025/12-december/2025-12-27-december-quic-udp-caddy-kernel/</link><guid isPermaLink="true">https://bradgillap.com/posts/2025/12-december/2025-12-27-december-quic-udp-caddy-kernel/</guid><description>Linux constraints may be starving services that use the QUIC protocol. I&apos;ve taken a closer look at how this relates to MTU, UDP Packets, and QUIC. </description><pubDate>Sun, 28 Dec 2025 13:00:00 GMT</pubDate><content:encoded>## Not Specifically about Caddy

This doesn&apos;t just apply to your web proxies like Caddy[2]. It also applies to your web applications that use this new transport protocol. Targeting the Linux kernel is ultimately where we will go with this.

## The Tea

The QUIC[3] protocol is becoming more important and is implemented in more services as time rolls on. There seems to be general confusion about whether it&apos;s appropriate to change kernel parameters and raise the UDP buffer sizes to support a higher window, and disagreement on what to do with larger QUIC transfers over UDP. More aggressive services are already showing up in my error logs suggesting that I raise them.

You can see what I mean on this bug report to Debian and how the community responded.

[https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1111052](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1111052)

## How Do I Quicly Describe This

I needed a new analogy to help myself understand what is going on. I hope it helps you too. The table below provides the mapping.

| Component | Technical Role | Analogy |
| :--- | :--- | :--- |
| **UDP** | Transport Layer | **The Generic Delivery Van**: Fast and efficient, but doesn&apos;t wait for signatures. It just drops boxes at the dock. |
| **MTU (1500)** | Max Packet Size | **Standard Box Dimensions**: Every box coming off the van is exactly 1,500 units wide. |
| **QUIC** | Application Protocol | **The Smart Courier**: A specialized worker inside the van who manages encryption and ensures boxes arrive in the right order. |
| **`rmem_max`** | Kernel Rx Buffer | **The Loading Dock (Inbound)**: Floor space where incoming boxes pile up before Caddy&apos;s staff can process them. |
| **`wmem_max`** | Kernel Tx Buffer | **The Staging Area (Outbound)**: Space for boxes waiting to be loaded onto vans leaving your server. |

In a high-speed HTTP/3 connection, clients send data in massive bursts. 

* **The Default Bottleneck:** Linux defaults to a `rmem_max` of ~212KB. At a standard **MTU of 1500**, your &quot;dock&quot; only fits about **140 boxes**.
* **The Burst:** A modern browser might send **1,000+ boxes** in a single millisecond.
* **The Overflow:** Once those 140 spots are full, the kernel has no choice but to throw the remaining 860 boxes in the trash. This results in the &quot;failed to sufficiently increase receive buffer size&quot; error in Caddy logs.
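
You can check your current ceiling, and whether the kernel is already discarding datagrams, before changing anything. The `RcvbufErrors` counter in `/proc/net/snmp` counts packets thrown away because the receive buffer was full:

```bash
# Current receive buffer ceiling in bytes (~212KB by default)
sysctl net.core.rmem_max
# UDP counters; the RcvbufErrors column counts buffer-overflow drops
cat /proc/net/snmp | grep -w Udp
```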

## Nerd Diagram

This is a more technical diagram of what&apos;s happening and how to think about QUIC as a protocol with our changes to the kernel, at least until the Linux kernel changes something here.

```mermaid
graph LR
    subgraph WAN [&quot;Internet&quot;]
        S((Remote Client))
    end

    subgraph Host [&quot;Proxmox Host (Shared Kernel)&quot;]
        direction TB
        NIC{Physical NIC}
        MTU[/MTU 1500 Limit&lt;br/&gt;&apos;Max Box Size&apos;/]
        
        subgraph Buffers [&quot;Kernel Memory Space&quot;]
            RMEM[[net.core.rmem_max: 7.5MB]]
            WMEM[[net.core.wmem_max: 7.5MB]]
        end
    end

    subgraph LXC [&quot;Caddy Container (LXC)&quot;]
        QUIC[QUIC/Go Stack]
        Caddy(Caddy Web Server)
    end

    %% Inbound Flow
    S -- &quot;UDP Packets&quot; --&gt; NIC
    NIC -- &quot;Check Size&quot; --&gt; MTU
    MTU -- &quot;Queueing&quot; --&gt; RMEM
    RMEM -- &quot;Read Socket&quot; --&gt; QUIC
    QUIC --&gt; Caddy

    %% Outbound Flow
    Caddy -- &quot;Write Socket&quot; --&gt; WMEM
    WMEM -- &quot;Burst Data&quot; --&gt; NIC
    NIC -- &quot;UDP Streams&quot; --&gt; S
```

## Patch Debian Kernel

Create a persistent config file and apply the kernel adjustments to sysctl.

```bash
echo &quot;net.core.rmem_max=7500000&quot; &gt;&gt; /etc/sysctl.d/99-caddy-performance.conf
```

```bash
echo &quot;net.core.wmem_max=7500000&quot; &gt;&gt; /etc/sysctl.d/99-caddy-performance.conf
```

Apply changes immediately:

```bash
sysctl --system
```
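
You can confirm the kernel picked up the new values right away:

```bash
sysctl net.core.rmem_max net.core.wmem_max
```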

## Should we Move to Jumbo Frames?

In short, no. An MTU of 1500 is still the norm across the Internet and usually the most expected window, so we don&apos;t want to change to jumbo frames for this. Segmentation will work fine; remember, we are just increasing the size of our storage floor for the items coming in to be processed. We don&apos;t want to drop any of these packets, we simply want to queue them in a safer space than inside the transport layer.

## Kernel Safety and Reliability Concerns

Ultimately, adjusting these kernel parameters isn&apos;t about chasing micro benchmarks; it&apos;s about aligning our infrastructure with the modern web. The QUIC model is intelligent enough to handle its own recovery, but it shouldn&apos;t have to. Every time our kernel drops a packet because the buffer is full, we force a retransmission that wastes cycles, bandwidth, and time. While we can appreciate that the Linux kernel&apos;s conservative defaults are often the right choice, that doesn&apos;t mean we need to be constrained by dogma in every situation.

I&apos;ll leave you with a quote I like for these situations.

&gt; All ships are safe in the harbor, but that isn&apos;t what ships are built for.</content:encoded></item><item><title>AI Security Survey Responses from Sailpoint &amp; Dimensional Research</title><link>https://bradgillap.com/posts/2026/01-january/2026-01-03-january-ai-agent-security-outlook/</link><guid isPermaLink="true">https://bradgillap.com/posts/2026/01-january/2026-01-03-january-ai-agent-security-outlook/</guid><description>We finally have an enterprise report from 353 enterprise participants on what professionals are seeing within their company security layer surrounding AI Agents. Hint: It&apos;s not good.</description><pubDate>Sat, 03 Jan 2026 13:00:00 GMT</pubDate><content:encoded>A new research paper named &quot;AI agents: The new attack surface&quot; is available and may be obtained from [Sailpoint](https://www.sailpoint.com/identity-library/ai-agents-attack-surface)[2]. It&apos;s worth signing the form and reviewing the pdf yourself. 

## The Great LLM Debate

I don&apos;t enjoy the topics surrounding LLMs or agents in public right now. I find the tech community very divided over the use of these technologies. I&apos;m often caught between two worlds: respecting my craft, industry, and colleagues, while being respectful of other support workers adjacent to my expertise. At the same time, I&apos;m shackled to progress, since attempting to ignore these systems or tools doesn&apos;t make any sense either and makes the landscape even more dangerous through a less pragmatic approach. If you&apos;re having turbulent thoughts surrounding these topics, I have an olive branch of rationalization that may help: it&apos;s okay to be uncomfortable while remaining educated on these topics, regardless of politics. Knowledge strengthens both sides of an argument. Even the 19th century Luddites understood the automation they protested against, because it added power and strategy to their side. I think that&apos;s an important lesson worth consideration. Something I picked up from a book called Blood in the Machine.

## Authority Based Research

Sailpoint, the identity security company that published this data, focuses on AI governance, cloud tools, and cybersecurity. Research was conducted by Dimensional Research[3], which handles focus groups, surveys, and other research services. The report sits behind a simple authorization form with clear marketing opt-out controls.

### Concerning Findings (page 4 of report)

Some of the key findings from that report I want to cover.

- 66% state AI agents as a growing security risk.
- 53% acknowledge AI agents are accessing sensitive information.
- 80% reveal AI agents have performed unintended actions of accessing and sharing inappropriate data.
- 44% of companies have governance policies surrounding agents.  

I&apos;m hopeful that many of the 34% who didn&apos;t think this was a growing security risk will change their opinion after seeing this report. The unintended consequences, and agents accessing or sharing inappropriate data without anyone expecting it, should raise eyebrows. If you are designing security around these tools and understand the scope of what you are working on, there shouldn&apos;t be this many *whoops* moments. You can see the move fast and break things mindset is still very much alive and being used in the wrong contexts today.

I wonder how many of these situations could have been avoided if there wasn&apos;t so much pressure being applied by leaders who can&apos;t see the risk they may be exposing their organizations and staff to. We have seen a lot of articles about the growing pressure in enterprise organizations, and while the intention is often to get legacy workers to try new tools, pressure applied incorrectly leads to these kinds of stats and outcomes.

## Sysadmins are Concerned

I have been in a great number of discussions over the last few months, both online and off, within my IT networks. I have expressed concerns while trying to find out how other professionals are applying frameworks and being good stewards of security surrounding agentic use within critical systems hosting sensitive data. The data from this report finally confirms some of my own gut feelings about where we are at. Further, I have been considering whether it is right to share some of my *interim solutions* while waiting for better authority-based solutions. In the coming days I&apos;ll be sharing some of those solutions just to get more of the good word on the Internet for other googling admins.

&gt; Security governance is still very young and childlike when it comes to people understanding agentic security controls.
&gt; - Anonymous Sysadmin from my network

My concerns have been slowly growing as we are all learning at the same time. The content and places we are receiving bits of information from as we evolve are still of varying quality. Even within adjacent factions of I.T people, you get wildly different takes on &quot;the right way&quot; to implement security. This is where the lack of governance comes in. We should be looking to the compliance folks, but they haven&apos;t even caught up to where I.T is yet. This report showed compliance folks at **only 24% adoption**. Yet organizations are relying on I.T people to deploy solutions on sensitive systems. Make it make sense.

### How is a Sysadmin Thinking About This

My approach has been to follow least-privilege access principles, but I&apos;ll admit, I&apos;m building out these solutions as if I&apos;m creating a series of gates and monitoring, the same way I would when allowing a vendor access into internal systems; something one would commonly only participate in out of necessity. While past experiences lend value to these situations, there are plenty of IT people out there without that lived experience, and it&apos;s within these gray areas where systems may become compromised. So far, I&apos;ve received a lot of agreement on this position, but even in my own testing, the LLMs will poke and try to find workarounds to my security controls. It&apos;s unnerving to say the least. Like having an intern that lacks discipline, cannot be corrected, and doesn&apos;t follow a growth path of trust. As models change and improve, the security that kept an agent under control yesterday may be worked around tomorrow.

Heck, I can even show you a log from one of my own restricted agents as it continuously tries to elevate, even with explicit context about which commands it may use. These were email alerts I received, and while it&apos;s mostly benign, it proves my point.

```bash
Jan  2 03:27:43 : terry : a password is required ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pct exec 321 -- find /opt/bytestash -name *.log -type f
Jan  2 03:28:32 : terry : a password is required ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pct exec 108 -- ip addr show
Jan  2 03:28:33 : terry : a password is required ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pct exec 303 -- journalctl -n 50 --no-pager
Jan  2 03:28:59 : terry : a password is required ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pct exec 108 -- ping -c 2 bytestash.redacted.internal
Jan  2 03:29:03 : terry : a password is required ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pct exec 108 -- curl -s -m 5 http://bytestash.redacted.internal:3000
Jan  2 03:29:07 : terry : a password is required ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pct exec 321 -- curl -s -m 5 http://localhost:3000
```
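
Those denials come from plain sudoers policy. As a minimal sketch of the idea (the commands below are illustrative, not my production policy), the agent account only gets the exact read-only commands it needs:

```bash
# /etc/sudoers.d/terry - hypothetical policy for a restricted agent account
# Allow only specific, read-only diagnostics without a password.
terry ALL=(root) NOPASSWD: /usr/bin/journalctl -n 50 --no-pager
terry ALL=(root) NOPASSWD: /usr/sbin/pct list
# Anything else, like the pct exec attempts above, falls through to
# &quot;a password is required&quot; and generates an alert instead of running.
```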

At some point you have to ask whether all of the perpetual testing it would take to satisfy what one could imagine to be proper compliance is worth it; is the juice worth the squeeze? So far, I believe it may be, but the gains come with thinner margins than many others would wish you to believe.

Right now, it feels like a lot up front because the security solutions that are working well are mostly custom designs. That should improve in the future. I can imagine a time where we will create a new user account marked with a dash flag for agentic security use, where restrictions are built in and working well. I also think we are several years away from an OS-packaged solution for this. Everyone is going to want to sell you their turnkey solution well before the graybeards catch up.

## Waste Time Up Front

For leaders, the best thing you can do is charge your people with confidence and allow them room to understand the tools they are working with. Promote trying things in safe environments and spend a few dollars where necessary to allow them to test independently or gain access to those resources or documentation. I.T hasn&apos;t changed this drastically since about 2015, and before that, 2007. Within those gaps, leaders enjoyed a certain knowledge expectation from staff fresh out of academia that no longer exists.

### Don&apos;t Skip Ahead

I&apos;ll just be blunt: this is different. While experienced staff will adopt the right solutions more quickly, this is more like the dot com era around 2000, where it was the wild west and everyone was learning and building at the same time. Higher knowledge authorities come and go during these periods, and while they are helpful, that authority-based knowledge is never on time and becomes irrelevant shortly after. Since knowledge is flying in from everywhere all day, formal guidance is pretty well useless, with the exception of networking groups surrounding these topics. I expect this page to age like milk right along with them.

### IT People Need to Lab it Out

I.T people need to be provided protected time so that they can lab this stuff out in non-production environments. That has always been the case under ideal workloads, but right now it is more important than ever due to the cognitive changes.

### Deterministic VS Probabilistic

You&apos;ve hopefully heard this phrase before, but up until recently, all of I.T has been about deterministic strategies and outcomes. This switch to probabilistic consideration is not just an additional thing to keep in mind, but an entire shift in how I.T people are going to have to think, period. The constructive critical thinkers are already doing this, but a vast majority just aren&apos;t yet. Nobody has twenty years of experience honing probabilistic outcomes. That is likely the source of the high number of unexpected outcomes in the stats we see from this report.

Until I.T people have a better sense for probabilistic outcomes, or at least a foot on the ground in their spinning worlds, they simply can&apos;t be working on these things on live production systems under the expectation that it&apos;ll probably be fine. Certainly not in any case where sensitive data is present. It&apos;s an internal industry joke, but I&apos;d be lying if I didn&apos;t also admit that test environments and time to lab out solutions are not often prioritized. I won&apos;t get into deliverables and how this is often backwards compared to business mindsets (today at least), but it&apos;s simply unnecessary risk where agentic tools are concerned. The business runs fine; there&apos;s no rush here.

### Every Institution is a Research Institution Right Now

On the topic of implementing, testing, and deploying agentic tools: **everywhere**, whether you sell lighters or work in a non-profit providing blankets to weeble people inside the center of the earth, recognize that you are doing research at this point in time. There is additional pressure on speed, and in most industries, I&apos;ll reiterate, it is completely unnecessary when the safety of your company data hangs in the balance. We&apos;ll reach these balanced outcomes, and it does not have to be at warp speed or at the cost of your company data. Patience, planning, compliance, and the willingness to push back a deadline to manage risk: these are the often unpopular and uncomfortable realities of good I.T leadership, but it&apos;s what keeps the world from breaking. Every time one of the huge Internet services goes down, was it because they made a change that was well tested? Of course not.

&gt; There&apos;s no point in being fast if you are going in the wrong direction.
&gt; - [Kenichi Ohmae](https://en.wikipedia.org/wiki/Kenichi_Ohmae)</content:encoded></item><item><title>System Context Analysis &amp; Incident Response Engine Dev Journal 1</title><link>https://bradgillap.com/posts/2026/02-february/2026-02-09-february-agentic-proxmox/</link><guid isPermaLink="true">https://bradgillap.com/posts/2026/02-february/2026-02-09-february-agentic-proxmox/</guid><pubDate>Wed, 11 Feb 2026 14:00:00 GMT</pubDate><content:encoded>Before products began using AI Agents to control aspects of a personal computer and became more than just an idea, I was thinking about a problem that I wanted to solve back in December 2025.

## The Problem: Low Resolution Notifications

The notifications I receive about problems in my environment are often too low resolution to make decisions from. Sometimes it&apos;s just a blip, sometimes it&apos;s better solved in the morning, but once in a while it&apos;s an all hands on deck situation. Then you begin getting pinged by people.

&gt; **SMS:** &quot;Hey Brad, the websites down, what&apos;s going on?&quot;
&gt;
&gt; **Email Received:** Ticket Subject: Someone said the services were down?
&gt;
&gt; **Incoming Phone Call:** From someone else, while your spouse is beginning to ask more questions. &quot;Do you need to take that?&quot;.
&gt;
&gt; **Google Chat:** &quot;Hey! I think the websites down but I&apos;m in the car right now. ~Voice Transcription by Android Auto&quot;

There is a moment in time here where your spouse has figured out a specific look on your face, a vein has begun to form on your forehead, and you know some things to be true but you do not have the context to answer questions. Your brain begins to iterate a whirlwind of thoughts:

* Last hardware audit.
* Weird logs you noticed a month ago.
* Firewall policies and the order they are overwritten in on a fail2ban policy.
* Other misc edge cases or unknown externality?

The overall flood of recollection is often intense. You need to get connected, you need to quickly learn more. This leads to finding a quiet space, firing up whatever device you have available, and making do with whatever software is capable enough to try and find those answers.

You connect, login, hammer in a few commands. How you handle yourself in the next few minutes can have long lasting personal or professional consequences. You need to respond and say something.

Alternatively, ignore the request, feign ignorance, and deal with the fallout later. You&apos;ll totally be able to focus on the other person the rest of the evening and not _space out_ while they unload some deep conversation on you, right?

---

## Engineering my Way Out of this Situation

My environments have had many different tools over the years for incident response context. I&apos;m always on the hunt for products that can give me more.

Currently my homelab has the following setup:

### Watchers
*Services designed to recognize when something is going wrong.*

1.  **LibreNMS:** Analyzes and receives reports via SNMP about hardware, networking, and processes, with triggers to various notifiers.
2.  **UptimeKuma:** Detects if a service is down via many probes (http/https, ping, curl) which all boil down to &quot;is this service reachable&quot;.
3.  **Grafana, Prometheus, and Loki:** Provide more insight into ongoing logging but do not necessarily provide context surrounding the issue.

There&apos;s many more and it just comes down to your brand and comfort but these are mine.

### Notifiers
*Services designed to compile and deliver some sort of context about what has gone wrong.*

1.  **Gotify:** Sends very quick push notifications and plugs in as a notifier to all of these tools.
2.  **Telegram:** Sends similar to Gotify but offers a bit more customization for very important services via webhooks.
3.  **Trusty email notifications:** For important emergent issues.

**Why so Many?**

*   Not every service watcher matches to every kind of service.
*   Not every notifier works well with every situation.
*   The existing services are all plagued by the same issue: in the majority of cases, I have to preplan exactly what might go wrong to receive additional trigger context.

### What if Though?

What if I could go **agentic first** in workflow? (I know🤮 but HMB🍺)

What if the LLM could do some of this context checking for me, tackling those questions the moment a notification fires?

There was a Network Chuck video I watched a little while back about the N8N low-code platform. It stuck in my head because Chuck&apos;s example had an LLM using an SSH tool and a root account. It was stomach churning security-wise to me, but that was beside the point he was trying to make.

**Thoughts continued:**

&gt; *   I don&apos;t have to give the LLM every command or root access or even `/etc/` access where sensitive data might live.
&gt; *   I could hide everything that would be considered sensitive with some clever development and a few layers of danger prevention.
&gt; *   Maybe I could use the mechanisms in Linux to act as a security layer. That&apos;s what I would do if I was training someone to be less of a danger to themselves. I would just tighten up the environment and put the guard rails in for them. How is this so different?
&gt; *   Production ready? It doesn&apos;t have to be for my homelab.

The *&quot;What if&quot;* was beginning to turn into a bit of an idea.

**More thoughts:**

&gt; *   I&apos;ll bet I can even add a second layer of protection at the program level to also banlist certain commands for dangerous interactions before they reach the server.
&gt; *   We can even layer it into the LLM&apos;s system context (brain) as another less dependable soft limit layer of protection to help avoid certain interactions. I mean, really, how many possibilities could there be?
&gt; *   Still, that&apos;s going to be a lot of testing, effort, and all of that is before even gaining any context whether something has gone wrong on my Proxmox server.

---

## The N8N Experiment

I installed a copy of N8N and taught myself how to use the node based low-code environment. It&apos;s interesting and has some automation use cases. The low resolution view is: you need to get text from &quot;somewhere&quot;, transform that text, and send it &quot;somewhere different&quot;. Which is exactly what I was looking at for this idea.

Really, it&apos;s just moving some text around, right?

1.  Receive an UptimeKuma Webhook if something is down.
2.  Transform that for an LLM, pepper in some system context.
3.  Give the LLM some sort of means to safely execute SSH for stdout output to my Proxmox server and even somewhat independently decide which commands to send to gather that context.
4.  Format text output the LLM spits out properly for Telegram.

Seems ~~straightforward~~ plausible enough.

The screen below shows my graveyard of 37 different N8N projects, all related to various components and other supporting infrastructure, that took me from &quot;let&apos;s play with this LLM idea&quot; to looking like one of those YouTubers showing you a 100-point infrastructure map and trying to sell you automation-tutorial snake oil. Ridiculous.

![Screenshot of abandoned n8n projects](./n8n-projects-list.png)

### Hitting the Wall

As with most projects, from here I would go on to solve hundreds of bite-sized issues, but the cracks in N8N were beginning to show.

**Major Barriers:**

*   **Asynchronous LLM calls** had become a pretty **big** barrier. For efficient LLM use I needed small effective agents with small context windows. Many agents solving small problems contributing to a whole.
*   As the project grew in features, logging the &quot;_magic_&quot; things N8N does behind the scenes became a bit of an unwelcome blackbox.
*   N8N can bend to a lot of workflows, but it&apos;s best used for linear or smaller looped workflows.
*   N8N is absolutely great as a prototyping tool.

I tried a number of things before accepting where I was already heading. I expanded N8N into an enterprise setup, with worker nodes and an orchestrator mainline, trying to get asynchronous LLM nodes to work. Still, the project would only run in a linear fashion while I needed truly asynchronous workflows.

![Screenshot of v4 overview](./fridayv4overview.png)

Things that people said worked for them online just a few months ago were no longer relevant. I made attempts anyway, such as subdividing workflows and making calls to outside workflows. None of the proposed solutions actually worked for my case; the underlying platform software had already replaced the internals those solutions relied on.

---

## Moving to Python

It&apos;s time for a real programming language. Interpreted, not compiled; I&apos;m still a Sysadmin after all. This would also open up a world of other opportunities. Python has a vast network of actively maintained libraries, whereas much of what N8N offered in community libraries was not actively maintained.

I actually have experience with Python, although it has been a few years since I last wrote anything more than a one-off with it. While I find Python useful, it is a bit of a messy language.

![Screenshot of Python Graph](./Python_3.13_Standrd_Type_Hierarchy-en.svg.png)

### A Tall Order

I had a lot to think about. The things N8N made easy would have to be replicated or re-architected completely just to reach feature parity with my prototype. Standard libraries would help overcome some of those problems, but it was not a small decision.

---

## Today: The Prototype

Today I have a working Python prototype with just about all of the original intent realized, and a bit of a codebase with more potential than time.

![Screenshot of v4 overview](./telegramwelcomefaire.png)

There&apos;s a lot to talk about. **Faire** is just one of 5 agents helping to monitor, report, and make simple decisions when I can&apos;t be available. There are a lot of new concepts implemented that I&apos;ve attempted to follow from various whitepapers covering modern agentic-first workflow design, but I&apos;m taking a little break from features to refactor the overall mess that comes from trying things.

![Screenshot of v4 overview](./faire-cli-example.png)

There have been many unexpectedly pleasant moments, along with the hardship of building against some poorly documented libraries and the myriad pitfalls one can fall into while testing and troubleshooting an agentic system.

For example, the reporting agents began pointing to vestigial, half-implemented systemD related projects I had forgotten about on my Proxmox server. So not only was I getting the sought-after context and feedback on the system and incidents, I was also getting new insights into old problems, which was fascinating.

I haven&apos;t even opened up the logs folder to the LLM yet, so there are likely more opportunities still hidden there on the performance tuning side.

---

## Security: Layers of an Onion

For security, it&apos;s really layers of an onion. At the time of this writing:

*   **1 whitelist per agent** with the commands and directories they are allowed to access, backed by standard Linux perms.
*   **1 fallback whitelist for all agents**, a few simple read-only commands, in case of issue with the whitelist above.
*   **1 blocklist** for absolutely no-fly commands like `rm -rf`, `destroy`, etc.
*   **1 System Context whitelist** for the agents themselves.
*   **1 sudoers SSH-specific whitelist/blocklist** on the Proxmox host per SSH account. Since SSH is handled by a Python library and not just an MCP connection, we have great control over what happens. This is intentionally part of the overall security of the assistant.

### Python Based Whitelisting

As an example of the first layer of protection, we have whitelisting. This message actually came to be because I had a minor regression in the code where it fell back to a basic whitelist. Faire is actually allowed to start LXCs on command in most cases, but the whitelists did their jobs.

![Screenshot of v4 overview](./sorrydave.png)

Did I mention one of the agents has memory? That was a whole thing too!

### When Testing Goes Well

I have also done immense testing, trying various ways to convince the agents to find workarounds and destroy data on their own, using jailbreaks and known LLM exploit methods, but the hard lists deny them even when the agents can be made to be confused.

![Screenshot of an LXC being down](./adguard2down.png)

---

## How the Agent Faire Would Describe the Project

If we were to ask Faire their perspective on the project, this is what they said:

![Screenshot of Faires Perspective](./fairesperspective.png)


**Editor Note**: Since the response was **split** into multiple Telegram messages, I&apos;ve reformatted everything from the first table down for the blog.

| Question                         | What Happens                                        |
| -------------------------------- | --------------------------------------------------- |
| _&quot;Show me system status&quot;_        | Quick resource overview (CPU, RAM, disk, network) |
| _&quot;Check ZFS health&quot;_             | Full pool and dataset status                        |
| _&quot;What&apos;s running on port 8080?&quot;_ | Process and service identification                  |
| _&quot;Ping 8.8.8.8&quot;_              | Connectivity verification                           |
| _&quot;List all containers&quot;_          | CT/VM inventory with states                         |
| _&quot;Restart service X&quot;_            | Service recovery (with confirmation)              |

————————

📚 Technical Capabilities

Under the hood, each agent has specialized diagnostic tooling with some example commands:

**Editor Note**: Each agent has a prebuilt bash script with more commands they run since we know generally what diagnostic context we always want to look for but Faire doesn&apos;t know that.

``` bash
Grian (Compute):
├── top -b -n 1          # Process snapshot
├── ps aux               # Full process tree
├── free -h              # Memory utilization
├── vmstat 1 2           # System activity
└── uptime               # System load averages

Saor (Storage):
├── zpool status         # Pool health
├── zfs list             # Dataset metrics
├── smartctl -a /dev/sd* # Drive diagnostics
├── df -h                # Filesystem usage
└── lsblk -f             # Block device tree

Nasca (Network):
├── ip addr show         # Interface configuration
├── ip route show        # Routing table
├── ss -tulpn            # Open ports &amp; listeners
├── ping -c 4 &lt;host&gt;     # Connectivity test
└── nslookup &lt;domain&gt;    # DNS resolution

Faire (Coordination):
├── pct list             # Container inventory
├── qm list              # VM inventory
├── systemctl status     # Service states
├── pct start &lt;id&gt;       # Container recovery
└── qm start &lt;id&gt;        # VM recovery
```

————————

📚 Security &amp; Guardrails

Automation without guardrails is a disaster waiting to happen. S.C.A.I.R.E implements multiple safety layers:

1. Command Whitelisting – Only vetted commands can execute
2. No Shell Operators – Pipes, redirects, and chains are blocked
3. Interactive Command Rejection – No prompts, no ambiguity
4. Rate Limiting – Prevent runaway loops
5. Circuit Breaker – Automatic pause after repeated failures
6. Full Audit Logging – Every action, every time
7. Confirmation Required – Dangerous actions need your OK

————————

📚 The Bottom Line

| Traditional Monitoring       | S.C.A.I.R.E                      |
| ---------------------------- | -------------------------------- |
| Alerts when something breaks | Diagnoses _why_ it broke         |
| You log in to investigate    | Investigation runs automatically |
| Manual recovery steps        | Auto-fix for common issues      |
| Dashboard fatigue            | Conversational interface         |
| Noise at 3 AM                | Only wake you when it matters    |


S.C.A.I.R.E spots problems fast, figures out what&apos;s wrong, fixes what it can—while keeping you in the loop every step of the way.

————————

Got questions, Boss? Just ask. I&apos;m here to keep your infrastructure running, not to complicate your life.



### Human in the Loop

If you noticed above, Faire mentions keeping the human in the loop. This was one of the concepts rolled in from one of those previously mentioned whitepapers about keeping people in charge. 

---

## To What End?

I talked about security, introduced the idea, and described several features, leaving several more out of this post to spare you the gory technical details.

I haven&apos;t decided what I&apos;m going to do with this yet. It works well enough that most homelabbers would find enjoyment and peace of mind in it, but it&apos;s definitely not in a state where I would call it product ready either. For example, I&apos;m still configuring all aspects through a large .env file, whereas it would be much better to build a little web dashboard for configuration.

So what is this right now?  

It&apos;s just a neat idea I wanted to prove can work, and I&apos;ve had a great deal of fun flexing a lot of my architecting skills. That&apos;s all it needs to be right now. I&apos;m going to keep working on it and documenting things as I have time. I need to do more testing with other homelabbers I know personally, if I can break from adding features long enough to gather that data.

![Screenshot of CLI dashboard](./s.c.a.i.r.e.png)</content:encoded></item><item><title>Tricks To Navigate Supply Chain Attacks</title><link>https://bradgillap.com/posts/2026/03-march/2026-03-31-march-navigating-supply-chain-attacks/</link><guid isPermaLink="true">https://bradgillap.com/posts/2026/03-march/2026-03-31-march-navigating-supply-chain-attacks/</guid><description>A holistic guide for sysadmins on navigating supply chain attacks. Learn how to stay informed, document your upgrade process, and prepare for remediation with the right tools and sources.</description><pubDate>Tue, 31 Mar 2026 21:00:00 GMT</pubDate><content:encoded>## Update 2026-04-05

Really great post mortem article from Bleeping Computer about what happened with the Axios supply chain poisoning. Hint: it was social engineering that led to running arbitrary code.

[https://www.bleepingcomputer.com/news/security/axios-npm-hack-used-fake-teams-error-fix-to-hijack-maintainer-account/](https://www.bleepingcomputer.com/news/security/axios-npm-hack-used-fake-teams-error-fix-to-hijack-maintainer-account/)

## Post

In the last few weeks we&apos;ve witnessed a few notable supply chain attacks, the most prominent being LiteLLM and, more recently, Axios. I had to halt development and testing of the SCAIRE project due to the recent LiteLLM library attack. While my local testing environment was safe thanks to the quick action of the repository developers, the attack prevented me from testing the project with other sysadmins, stalling progress.

LiteLLM claims they have solved the issues, but since this isn&apos;t a production tool, I&apos;m going to wait a few more days while the dust settles for any other minor changes they wish to make.

You can read more about that here:
[LiteLLM Supply Chain Updates](https://docs.litellm.ai/blog/security-update-march-2026).

## Some Grounding

The world hasn&apos;t significantly changed in terms of how systems are being exploited. Most modern attacks follow the same overall themes they did all the way back at the beginning: a threat actor finds some way to get malicious code executed on a victim&apos;s system by having the system, or the user, run it with privileges.

What&apos;s different in every case is the method of delivery. In supply chain cases, the first attack vector is often the developer&apos;s own computer, previously compromised, which exposes their trusted keys to places like Github repos. This gives an attacker the keys to the kingdom for that particular application, letting them manipulate the source of development and do whatever they want.

What&apos;s frustrating about supply chain attacks is that anyone else working from that supply and updating these packages is exploited as well. Developers may use hundreds of dependencies within their projects, and any one of them could come from a compromised repository whose developer gave up the kingdom&apos;s keys without anyone noticing for a period of time. That window could be minutes, hours, days, or even weeks depending on the activity of the project, the repercussions of the exploit, or anything else that may cause others to look a little more closely.

## Upgrading is Necessary Though

Ideally, we want to do our best to prepare, stay aware, and avoid supply chain attacks where possible. When considering our upgrades, we should be upgrading with intent. I always ask myself a few questions before upgrading.

It&apos;s worth mentioning that if you&apos;re following a framework with a more bureaucratic process, the decision making is already formalized in these situations. But if you are on your own? Some things to consider.

I am upgrading to:

- Fix a known CVE.
- Add features that are necessary for operation.
- Reach version compliance for necessary audits.
- Stay up to date so future upgrades are less problematic.
- Solve a bug that has had some form of impact.

Being able to articulate documented reasons why an upgrade was necessary, beyond say &quot;the releases page had a new version bump,&quot; shows we are thinking through our process with intent.

## Where to Start

Supply chain attacks are sophisticated and require working backwards through layers, where undoubtedly you end up at a developer&apos;s exploited Macbook somewhere. We aren&apos;t alone though; security researchers, tech news, developers, and just about everyone else has a vested interest in understanding how to remediate these situations. Still, we can be better prepared than waiting for the latest AI-boosted rag to bury the lede and the hyperlink we need to see the source&apos;s response or remediation steps.

```mermaid
mindmap
  root((Supply Chain Defense))
    Information Sources
      RSS Feeds
      Social Media
      Security Advisories
    Technical Practices
      Version Pinning
      Package Managers
      SIEM Integration
    Process
      Change Management
      Documentation
      Team Communication
    Mindset
      Intentional Upgrades
      Preparation
      Self-respect for process
```

### Use Multiple Trusted Sources for Information

In the first minutes and hours, make sure you are using trusted sources of information. Since many supply chain attacks also compromise the repository sources, information may be manipulated by threat actors. It can take time for developers to regain access to their own repositories, so it&apos;s important to have multiple trusted sources of information available.

### Learn the Package / Dependency Managers Developers Use

Whether it&apos;s npm, yarn, PyPI, Docker, Apt, Yum, pacman, portage, or winget, the list goes on, and yes, it&apos;s a tall order. The good news is that most package managers follow principles that become more familiar over time. Each new package manager you learn makes the next one easier.

Learning them prepares us to recognize and use good security practices for ourselves, such as version pinning, which locks a dependency to a specific version between intentional updates. It&apos;s also great for deeper debugging and troubleshooting.
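
A quick sketch of what pinning looks like in practice (package names and versions here are placeholders):

```bash
# pip: pin an exact known-good version in requirements.txt
echo &quot;somepackage==1.2.3&quot; &gt;&gt; requirements.txt
# apt: hold a package at its current version until you intentionally release it
sudo apt-mark hold somepackage
sudo apt-mark unhold somepackage
```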

### Staying Informed

One of the more valuable ways I spend my time in IT is just staying informed about what&apos;s going on in various spheres (including Infosec) but we need to be efficient about this since there is too much information to observe every day. There are levels to staying informed to make good use of our time though.  

For example:

- **Level 1**: You check all your feeds and glance at headlines at various points throughout the day. This is an ongoing habit and I would say a first line of defense just avoiding a lot of security problems in the first place. Know what&apos;s going on in the world.
- **Level 2**: You are planning an upgrade and take extra time looking at repo discussions, the program&apos;s website, and any news you missed in headlines. Further scrutinize the dependencies, and the dependencies of those projects, if possible. You should be doing all of this to follow change management processes anyway. You&apos;re doing that right? Right?
- **Level 3**: You are actively tracking a major issue affecting many to ensure your systems are not affected.

In cases like supply chain poisoning, IT may not be able to tell you if a particular package is in use in their systems. Only the most well documented, change-managed, kool-aid drinking ITIL places look like this: large corporations, banks, some government institutions (hopefully). Unfortunately, you won&apos;t see that often enough out in a world where IT is considered a cost centre more than a risk-reduction production amplifier. Where does that leave you? Somewhere in between, I imagine, as I have been at various points throughout my career.

### Tools I Use Daily for Finding Fast Infosec Details

This is not an exhaustive list but it&apos;s a good launch point:

- **RSS Feed Manager**: A good RSS reader is a tool every sysadmin should be using. Yes, it&apos;s old-school cool, but there&apos;s no faster way to consume a lot of curated content at once. It&apos;s still my favourite way to find out about software updates and security concerns, and to keep a beat on the world. Create categories and subscribe to as many security news sites as you can: sites like [Bleeping Computer](https://www.bleepingcomputer.com/feed/), [Dark Reading](https://www.darkreading.com/rss.xml), [Google Workspace Updates](https://feeds.feedburner.com/GoogleAppsUpdates), whatever is relevant and supports RSS. Keep the list curated and update it often.
- **X**: Not all social media is valuable but often sites like X can be the first place you see a developer say &quot;Something weird is going on with our project&quot;.
- **Reddit**: With a curated list of communities. Make sure to set up a separate account that only has your IT communities in it; this helps reduce the noise in the algorithm.
- **Lemmy &amp; Mastodon**: These are federated Reddit-style communities and also support RSS. They have fewer users than places like X or Reddit, but the users that do exist are mostly fed-up redditor expats. How to find where the truly smart people have gone on the Internet is an entire other post I want to write one day; for now, just know there is a very exciting transformation happening with federated services, and brilliant people are there.
- **Alerts and Advisories Sites**: Security advisory lists and searchable sites are excellent as well. Sites like [Canada&apos;s Alerts and Advisories](https://www.cyber.gc.ca/en/alerts-advisories) are well curated often with the quick points we need to at least understand before moving on should something come up.
- **Security Dashboards**: If you have access to security dashboards, these are great to review as well, but I would also hope that you have notifications set up.

## Document the Upgrade Process

It is becoming more important to know exactly which versions of which packages and dependencies are on a given system at any time. For that, we can typically use our SIEM systems to keep track. Systems like [Wazuh](https://wazuh.com/) can make quick work of having those answers once fully configured.
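
Even without a SIEM, a quick snapshot habit before any upgrade leaves you a searchable record (the paths here are just wherever you keep your notes):

```bash
# Debian/Ubuntu: record every installed package and version
dpkg -l &gt; ~/upgrade-notes/packages-$(date +%F).txt
# Python projects: record the dependency tree as it stands today
pip freeze &gt; ~/upgrade-notes/pip-$(date +%F).txt
```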

However, planning your upgrades with clear expectations of the versions you are moving **from** and the versions you are moving **to** is detailed information that can be extremely important once the chips are down. A little preparation goes a long way here. Document your upgrade process via ticket or wiki. Outline the expectations with clear version numbers and brief descriptions of what you can identify. If working on a team, review it briefly with your team first, and follow the service value system frameworks available where possible.

Combined with a good change management process, tickets, documentation, and a good project plan, we can get a foot on the ground in a world that spins out of control. Even if you are a lonely Sysadmin with the weight of the world on your shoulders, good tool usage, documented upgrading, and fast knowledge can take the stress levels down and improve decision response in almost any situation.

## Remediation is 9/10s Preparation

When remediation hits, you want all of this preparation ready ahead of time. Everyone needs information quickly to react appropriately. All of this relevant data shows that we are taking our role seriously, planning appropriately, and deserve to maintain these systems. In other words, have some self respect for your process.

Every remediation situation is going to look different depending on a lot of systems outside our control. It&apos;s important to recognize how far a remediation reaches outside the IT department and touches the entire organization and the involved departments.

Good documentation and good sources won&apos;t stop the next attack, but they&apos;ll help you figure out what to do about it.</content:encoded></item></channel></rss>