When Automation Goes Wrong: A Painful Lesson in PowerShell

Have you ever written a script that worked flawlessly in testing only to wreak havoc in production? It’s a nightmare scenario for any automation specialist. I’ve been there, and in this post, I’ll share one of my biggest mistakes, how it happened, and what I learned from it. If you write PowerShell scripts for critical tasks, this story might save you from making the same mistake.

Learning Through Mistakes

Learning is not just about gaining knowledge; we also learn by making mistakes. Below, I break my mistake down step by step and look at the lessons we can take from it.

The Background

A few years ago, I was working on a script for offboarding users. Part of this script involved disabling the user’s mailbox. Throughout the script, I identified users by their GUID because it is unique for every user.

I looked up how to disable a mailbox and found the Disable-Mailbox cmdlet. I quickly realized I didn’t need many parameters. My initial command looked like this:

Disable-Mailbox -Identity "00000000-0000-0000-0000-000000000000"

Almost perfect, but I didn’t want the script to prompt for confirmation. It needed to run fast and without manual intervention. I had already built checks for permissions earlier in the script. So, I added the -Confirm:$false parameter:

Disable-Mailbox -Identity "00000000-0000-0000-0000-000000000000" -Confirm:$false

I tested it against a test user, first with -WhatIf as a dry run for safety. Everything worked as expected.
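
For readers less familiar with how -WhatIf and -Confirm:$false interact, here is a minimal sketch. Disable-TestMailbox is a hypothetical stand-in I made up for illustration, not an Exchange cmdlet, and it touches nothing real. It only shows the standard SupportsShouldProcess pattern that PowerShell functions use to support both switches.

```powershell
# Hypothetical stand-in for a destructive cmdlet; it only returns a string.
function Disable-TestMailbox {
    [CmdletBinding(SupportsShouldProcess = $true, ConfirmImpact = 'High')]
    param(
        [Parameter(Mandatory = $true)]
        [string]$Identity
    )
    # ShouldProcess returns $false under -WhatIf, so the action is skipped.
    if ($PSCmdlet.ShouldProcess($Identity, 'Disable mailbox')) {
        "Disabled $Identity"
    }
}

# Dry run: prints a "What if" message and emits no output object.
$dryRun = Disable-TestMailbox -Identity 'test.user' -WhatIf

# Real run: -Confirm:$false suppresses the prompt and performs the action.
$realRun = Disable-TestMailbox -Identity 'test.user' -Confirm:$false
```

Note that a dry run only proves the command accepts the input you gave it. It says nothing about inputs you did not try.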

The One-Liner That Broke Everything

Next, I wanted to check if a mailbox existed before disabling it. My idea was simple: use Get-Mailbox and pipe the result to Disable-Mailbox. I had done similar one-liners before, and Microsoft Learn often showed this approach. So, I wrote:

Get-Mailbox -Identity "00000000-0000-0000-0000-000000000000" | Disable-Mailbox -Confirm:$false

One more detail: at the top of my script, I set $ErrorActionPreference = "Stop". My assumption was that any error would stop the script before causing damage.

I tested this thoroughly. Everything worked. Time to deploy to production.

When It Went Wrong

Weeks passed with successful offboardings, until one day, disaster struck.

A colleague triggered an offboarding. Later, the service desk reported multiple customers having email issues. After investigation, the source was traced to the server running my scripts. My IP address was in the logs.

A large number of mailboxes had been disabled: not just a few, but hundreds belonging to active users. I was shocked. How could this happen? I had tested everything!

After digging deeper, I found the problem: The GUID didn’t match a mailbox and Get-Mailbox did not return an error. Instead, it returned the first 1000 mailboxes by default. And because I used -Confirm:$false, the script disabled them all without asking.

My assumption was wrong. I thought nothing would be piped if nothing was found and I would get an error. But the cmdlet behaved differently. $ErrorActionPreference didn’t help because no error occurred.
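
To make the failure mode concrete, here is a small simulation. Both functions are hypothetical stand-ins that I invented for this sketch. Get-TestMailbox deliberately mimics the behavior described above (falling back to everything when nothing matches) and is not how Get-Mailbox is implemented. The point is only that a pipeline processes whatever the upstream command emits, and $ErrorActionPreference = "Stop" never fires when no error is raised.

```powershell
$ErrorActionPreference = 'Stop'

# Hypothetical lookup that returns ALL entries when the identity matches nothing.
function Get-TestMailbox {
    param([string]$Identity)
    $all = 'alice', 'bob', 'carol'
    $found = @($all | Where-Object { $_ -eq $Identity })
    if ($found.Count -gt 0) { $found } else { $all }
}

# Hypothetical destructive step; it only records what it would have disabled.
function Disable-TestMailbox {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Name
    )
    process { "disabled:$Name" }
}

# Matching identity: exactly one object flows down the pipeline.
$expected = @(Get-TestMailbox -Identity 'bob' | Disable-TestMailbox)

# Non-matching identity: no error, and every entry flows down the pipeline.
$surprise = @(Get-TestMailbox -Identity 'nobody' | Disable-TestMailbox)
```

In this simulation $expected holds one entry and $surprise holds three, and at no point does the script stop.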

The Impact and Recovery

The result? Dozens of active users lost access to their mailboxes.

We immediately formed a crisis team. I checked if any other scripts were running. We restored mailboxes using Enable-Mailbox. For users with archive mailboxes, recovery was trickier, but standard naming conventions helped us reconnect them quickly.

Lessons Learned

The next day, we held a post-mortem to understand what went wrong and how to prevent it in the future. One of the biggest takeaways was to never assume how a command behaves. Always test every possible scenario. What you expect to happen might not be what actually occurs. I also realized the importance of reading documentation thoroughly; in this case, the behavior of Get-Mailbox when no results are found was clearly documented, but I overlooked it. Another lesson was to be cautious with one-liners. They are powerful and convenient, but if you don’t fully understand their implications, they can lead to disastrous results. Finally, mistakes are inevitable, but what matters most is learning from them and putting safeguards in place so they don’t happen again.

The Fix

I added extra checks to ensure the mailbox exists before disabling it. We also decided to avoid one-liners for critical actions. Here’s the updated approach:


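try {
    Write-Output "Remove User Objects | Remove Mailbox | $UPN"
    # Fail fast on an empty identity so Get-Mailbox never runs unfiltered.
    if ([string]::IsNullOrWhiteSpace($UserGuid)) {
        throw "Remove User Objects | Remove Mailbox | $UPN | UserGuid is empty"
    }
    # Request up to 2 results so an ambiguous match is detectable; @() makes Count reliable.
    $Mailbox = @(Get-Mailbox -Identity $UserGuid -DomainController $Domaincontroller -ResultSize 2 -ErrorAction Stop)
    if ($Mailbox.Count -eq 1) {
        # Pass the GUID of the mailbox that was actually found, not the raw input.
        Disable-Mailbox -Identity $Mailbox[0].Guid.ToString() -DomainController $Domaincontroller -Confirm:$False -ErrorAction Stop
    } else {
        # Throw with a message; a bare throw outside catch only reports "ScriptHalted".
        throw "Remove User Objects | Remove Mailbox | $UPN | Mailbox Object not equals 1 ($($Mailbox.Count))"
    }
} catch {
    Write-Output $_.Exception.Message
    throw
}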
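
One more guard worth considering is validating the identity before any Exchange cmdlet runs at all. The sketch below is an illustration under my own assumptions, not part of the production script: Test-UserGuid is a hypothetical helper name, and it relies only on the .NET [guid]::TryParse method. It rejects empty strings, malformed values, and the all-zero GUID.

```powershell
# Hypothetical helper: returns $true only for a well-formed, non-empty GUID.
function Test-UserGuid {
    param([string]$Value)
    $parsed = [guid]::Empty
    # TryParse returns $false for null, empty, or malformed input.
    if (-not [guid]::TryParse($Value, [ref]$parsed)) {
        return $false
    }
    # Treat the all-zero GUID as invalid; it is a placeholder, not a user.
    return ($parsed -ne [guid]::Empty)
}

$emptyResult     = Test-UserGuid -Value ''
$malformedResult = Test-UserGuid -Value 'not-a-guid'
$zeroResult      = Test-UserGuid -Value '00000000-0000-0000-0000-000000000000'
$validResult     = Test-UserGuid -Value ([guid]::NewGuid().ToString())
```

Only $validResult is $true here. Calling a check like this at the top of the offboarding flow means a missing or broken identifier stops the script before it reaches anything destructive.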

Final Thoughts

This experience taught me to be far more cautious when writing scripts and to never underestimate the importance of thorough testing and documentation. Automation can save time, but it can also cause massive disruption if not handled carefully.

Have you ever had a script go wrong in production? What lessons did you learn from it? Share your experiences and tips in the comments. I’d love to hear how others approach preventing these kinds of mistakes.