With 4x4 you can get into bigger trouble

In many cases, with more experience comes greater responsibility, and with greater responsibility comes the user permissions to make far more impactful mistakes.
Experience allows you to skip the smaller mistakes and make much bigger ones.

Twitter user ElleArmageddon started a thread with the theme 'sharing mistakes is caring'. Especially for people new to the IT field (or, actually, to any field...), making a mistake can feel like far too big a deal: "If I make a mistake, they'll see I don't know my stuff!" Impostor syndrome was also mentioned.

That thread contains many real gems when it comes to war stories and mistakes people have made, and it also shows that making mistakes is normal. Everyone makes mistakes. But of course, if you keep making the same mistake over and over again, you should probably take a look in the mirror....

The quotes above are from that Twitter thread.

After some discussion on the Citysec Mattermost server, I decided to shed some light on my own murky past on this sensitive topic.

Be prepared! Nerdy details, missed heartbeats, sweat and tears, and so on.

Kicking you out

My job was to look after the network systems that power grid companies used to control their grids. Quite a critical piece of infrastructure, especially during storms, some might say.

The more critical locations had two separate connections using totally different, completely independent technologies. The modem for the primary connection was not as stable as I would have liked, and it crashed every now and then. The only way to recover was to connect to the remote system via the backup link and power cycle the primary connection's modem.

It was quite a stormy day, and yet again one connection failed. I connected to the remote system, wrote the command to shut down the power to the primary connection aaaaand.... Yes, one tiny mistake, and the connection was lost. I had typed 2 instead of 1 in the command and cut the power to the backup modem. *sigh*

So I had to find a car quickly and drive to the location. When I called the power grid company's operations center, I heard that several thousand people were without electricity and it was really important to get the connection back ASAP.

Luckily the location wasn't too remote, only an hour's drive or so.

I arrived, went to the ICT rack, reached behind it to pull the power cord so I could power cycle the whole rack and *HUGE BOOM*

This was a substation that transforms 110 kV to 20 kV, and one of the 20 kV protection relays in the next room had just tripped. It's quite a loud sound, especially because those are located in metal cabinets. You can guess that my heart skipped a couple of beats before I realized what had happened.

After I power cycled the rack, everything started working again: the power grid company could once more monitor and control the substation remotely, which made fixing the grid much easier.

Lessons learned: The UX of this setup was crap. The commands were too similar to each other. With one small typo I managed to lock myself out. Also, when you are shutting things down, always double- and triple-check that you have really written the correct command.
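One way a remote tool could make this kind of typo harder is to stop treating outputs as bare numbers. The sketch below is entirely hypothetical. The output table, the names `primary-modem` and `backup-modem`, and the `power_off()` function are invented for illustration and are not the interface of the system we actually had. It requires the operator to type the output's name as confirmation, and it refuses to cut the link the current session runs over.

```python
# Hypothetical power outputs on a remote unit: output number -> descriptive name.
OUTPUTS = {1: "primary-modem", 2: "backup-modem"}


def power_off(output, confirm_name, session_link):
    """Cut power to an output only if the typed confirmation matches its name
    and it is not the link the current remote session depends on."""
    name = OUTPUTS.get(output)
    if name is None:
        raise ValueError(f"unknown output {output}")
    if name == session_link:
        raise PermissionError(f"refusing: this session runs over {name}")
    if confirm_name != name:
        raise PermissionError(f"confirmation {confirm_name!r} does not match {name!r}")
    # A real tool would switch the relay here; this sketch only reports it.
    return f"{name} powered off"


# The intended command: power cycle the primary modem over the backup link.
print(power_off(1, "primary-modem", session_link="backup-modem"))  # primary-modem powered off

# The typo: 2 instead of 1. The guard refuses instead of cutting the backup link.
try:
    power_off(2, "primary-modem", session_link="backup-modem")
except PermissionError as err:
    print(err)  # refusing: this session runs over backup-modem
```

Both checks catch the typo in this story. The session check alone would have saved the drive, and typing a name rather than a digit makes a one-key slip far less likely.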

You shall not pass

I think most people who have worked long enough in sysadmin/networking roles have made a problematic firewall rule at least once. Example firewall rule books are usually really simple, but reality is totally different. Even if a rule book started out clean and nice, after a few years it's anything but. There are dusty, murky corners you haven't touched for a while. Some parts were configured before you knew enough, but you have still left them there. Don't touch it if it works! Or actually, "if it doesn't work too badly".



This is one of those cases.

The rule book for this particular firewall was long and complex. We had said many times that it should be rewritten from scratch, but as usually happens, we didn't have enough resources (basically time) to do it.

This time we were creating a really secure and important network whose firewall rules would sit quite high in the rule list.

"Okay, let's add a dedicated rule at the end of this section that denies everything else to and from this network."

"I'll do that. So, one rule to deny traffic to this network and another to deny traffic from this network. Hmm... I'll combine those to make this cleaner."

Some of you might already guess where this is going...


"No, wait! You can't combine them. If you combine those, that rule will basically deny everything from all to all!"

Luckily, this time the huge mistake was avoided, just before I clicked the commit button.

To make it worse, this would have locked us out of firewall management, because this rule was above the management rules....
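Why does combining the two rules go so badly wrong? On many firewalls, a rule with several source objects and several destination objects matches when the source matches any of the listed sources and the destination matches any of the listed destinations. Here is a minimal Python sketch of that matching logic. It assumes those semantics; real firewalls differ in the details, and the addresses are hypothetical.

```python
import ipaddress

ANY = ipaddress.ip_network("0.0.0.0/0")
SECURE_NET = ipaddress.ip_network("10.20.30.0/24")  # hypothetical protected network


def matches(rule, src, dst):
    """A rule matches if src is in any of its source networks
    and dst is in any of its destination networks."""
    sources, destinations = rule
    return any(src in net for net in sources) and any(dst in net for net in destinations)


def denied(rules, src, dst):
    return any(matches(rule, src, dst) for rule in rules)


# What was intended: two separate rules.
separate = [
    ([ANY], [SECURE_NET]),  # deny anything -> secure network
    ([SECURE_NET], [ANY]),  # deny secure network -> anything
]

# The "cleaner" version: both objects in source AND both in destination.
combined = [([ANY, SECURE_NET], [SECURE_NET, ANY])]

# Hypothetical traffic unrelated to the secure network: admin PC -> firewall management.
admin_pc = ipaddress.ip_address("192.168.1.10")
fw_mgmt = ipaddress.ip_address("192.168.1.1")

print(denied(separate, admin_pc, fw_mgmt))  # False
print(denied(combined, admin_pc, fw_mgmt))  # True: ANY -> ANY is now denied
```

Because `ANY` appears on both sides, the combined rule contains the pair ANY -> ANY. It therefore matches every packet, including the management traffic that the rules below it were supposed to allow.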

Lessons learned: When you are making critical changes to any system, pair work and peer review are a must! I think quite a few issues in the IT world would have been prevented if a lone sysadmin hadn't made the changes by him/herself (during the murky hours).

AC/DC weirdness

We were sitting in a power grid company's meeting room. The agenda for the next two days consisted of proof-of-concept installations of new remote control systems at smaller substations. I had one of the remote control units with me so I could show the electricians and the customer's project manager how it would actually work. The room was full of people with deep experience of everything related to electricity, but I was just a network guy.

I connected the power cable to the device's screw terminals, plugged the other end into the wall socket and *BOOM* (+ smoke).

Again, my heart skipped a couple of beats. What just happened!??!? We had definitely lost one of the devices and needed to change the agenda. It had to be sent back to the manufacturer so they could put the magic smoke back in. *sigh*

After a couple of seconds someone came to the door. "Do you have any idea why we lost power to the whole floor? My computer crashed because of that and I lost some work!"

Well. If a device wants 24 VDC, it usually doesn't like 230 VAC, or does it? *shrug*

Lessons learned: There are good reasons why laypeople are not allowed to play with electrical connections. If you really, really, really want to try the above, please do it somewhere it won't trip the fuses/RCDs for the whole floor of a customer's building!

Packets here and there and everywhere

It was just a normal workday at the office, quite a few years ago. Nothing out of the ordinary was happening or had happened for a while, except for far too many tasks constantly popping up on the Kanban board.

I was playing with some new switches, trying to get them working as a stack. I connected the cables, enabled the stacking configuration, and the switch rebooted.

My colleague sent me a slightly troubling message: 'Our CEO just called to say his video conference was cut off! Do we have any issues here??' I noticed that I had one cable connected from the stacked switches to our central switch, but only the switch management VLAN was tagged on that port. I hastily unplugged it. This couldn't be the reason, could it?

We got some more reports from users that there had been a hiccup in our network, but it had recovered quickly. Weird? We checked the network and other graphs and saw some odd things even in our Internet connection. Okay, a hiccup in the connection to our ISP. Hopefully it was just a temporary issue and we could continue working normally.

I closed some other open issues and returned to the switch thingy. For some reason I can't remember anymore, I had to revert the stack configuration and start from scratch.

"Do you want to reboot this device to enable the stack? Yes/No" I pressed Y and waited a while.

What!?? Network issues again! No, this can't be it. I looked at the switches and noticed that I had plugged the cable from the central switch back in. This time I unplugged it for good....

Things returned to normal.

We checked everything, and the network graphs showed huge traffic spikes inside the central switch, which should easily have coped with the traffic from a single 1G port. But based on the timestamps and logs, I was 99.42% sure I was the guilty one.

'Hello, we have this weird issue with our system. One of the databases is not working correctly. Do you have any ideas?'

"Hi, our customer environment crashed. And when we checked it, we noticed that at least one of the virtual machines had rebooted? Any thoughts? Why did it do this in the middle of the day?"

No, it couldn't be. No no no...

Yes. Because of the network outages I had caused with my packet accelerator, our virtual environment had decided that some (well, actually a few too many...) virtual machines were unreachable, and had restarted them. *sigh*

Good thing: The high availability part of the virtual environment was tested in a live scenario! Hooray! It worked!

Bad thing: Not many systems like it when they, or part of their setup, go away unexpectedly. So kids, do not try this at home, especially with high-profile customer systems.

I don't know what actually caused this issue. I have some ideas, but too many years have passed for me to be really sure of the order of my actions. Most probably the way I set up the stacking configuration caused a packet loop between the switches, but I'm not 100% sure about that, given how the system behaved otherwise.
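I can't prove it was a loop, but this is why a layer-2 loop produces exactly this kind of traffic spike. Switches flood broadcast frames out of every port except the one they arrived on. Ethernet frames have no TTL, so unless something like Spanning Tree Protocol blocks a redundant link, the copies keep circulating and multiplying. The sketch below counts the copies of a single broadcast frame in flight after each forwarding step. The topology is a hypothetical example, not my actual setup: a core switch cabled to two new switches that also have two extra cables between them, assuming those links behave like ordinary switch ports.

```python
def flood(links, start, steps):
    """Count copies of one broadcast frame in flight after each forwarding step.

    Each switch forwards a broadcast out of every link except the one it
    arrived on. Nothing ever drops a copy: there is no TTL and no spanning tree.
    """
    ports = {}  # switch -> list of (link_id, neighbour)
    for link_id, (a, b) in enumerate(links):
        ports.setdefault(a, []).append((link_id, b))
        ports.setdefault(b, []).append((link_id, a))

    # Each copy in flight is (link it travels on, switch it is heading to).
    in_flight = [(link_id, nb) for link_id, nb in ports[start]]
    counts = [len(in_flight)]
    for _ in range(steps):
        in_flight = [
            (link_id, nb)
            for arrived_on, here in in_flight
            for link_id, nb in ports[here]
            if link_id != arrived_on
        ]
        counts.append(len(in_flight))
    return counts


# Loop-free topology: the copies die out at the edge switches.
tree = [("core", "sw1"), ("core", "sw2")]
print(flood(tree, "core", 5))  # [2, 0, 0, 0, 0, 0]

# Hypothetical looped topology: two extra cables between sw1 and sw2.
looped = tree + [("sw1", "sw2"), ("sw1", "sw2")]
print(flood(looped, "core", 5))  # [2, 4, 8, 12, 20, 36]
```

The loop-free network forgets the broadcast after one hop, while the looped one keeps multiplying it. Real networks carry a steady trickle of broadcasts (ARP, DHCP and so on), so a loop like this can saturate links quickly. Preventing that is the job of Spanning Tree Protocol, or of a correctly formed stack.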

Lessons learned: When configuring a new network setup, it's better to keep it away from the main network until it's 'ready for testing'. Additionally, I probably should have spent 5 more minutes reading the HOWTOs and manuals. I know, manuals are for rookies, but still.... ;)

Btw, we all have test systems. Some of the lucky ones might even have separate production systems!

Screw those terminals!

"We have this spare UPS, so I'll connect our whole ICT rack to that."

"Umm... Why? We are located at the center of this big city. There are no power outages here."

"Yup, but this UPS is spare. So why couldn't I use it?"

A few weeks after this discussion, I was testing a broken device that used screw terminals for all its connections. Almost all the terminals were for either 24 VDC or low-voltage sensors. But someone had thought it would be just fine to put the 230 VAC power input on those same connectors.

I took the connector out, screwed the 230 VAC cable into it, plugged it back in and connected the cable to the wall socket. *BOOM*


It was one of those connectors

I had connected the cable to the "correct" terminals, but I had counted from the wrong end because I was holding the connector the wrong way round.... Perhaps I was a bit too tired, because I didn't notice the issue when I rotated the connector to plug it in. Or perhaps I got distracted for some reason? I can't remember anymore.

Luckily we had that UPS and everyone had laptops, so work could continue even though we lost power to the whole floor. *sigh* And where was the phone number for the maintenance department....

Good thing: No need to test that device anymore, it really was broken!

Bad thing: The magic smoke from electronic devices doesn't smell good.

Lessons learned: Usability should be considered even when designing things for industrial use. And again, when playing around with 230 VAC, it's better to triple-check before plugging anything in.....

Coffee !addict

I don't drink coffee; I never have and never will. I also have a rule: if you want coffee, you can make it yourself. It's better that I don't touch the coffee machine.

Most of the time I've been able to follow this rule, but it has also caused some issues.

We had a morning meeting with a customer. I had checked that everything looked good in our meeting room, bought sandwiches and made sure clean cups and plates were available. Alas, my coffee-drinking colleague was late again, so we didn't have any fresh coffee.

The customer's engineers arrived, and one of them decided he couldn't start the day without coffee. "I can load the machine, it's fine by me," he said, and went to the kitchen corner.

He opened the coffee machine, and we heard him say in a slightly mysterious tone: "So..... this is why your coffee usually tastes like it does. I should also try leaving the coffee grounds to go moldy! This pink looks nice!"

Luckily we had a long contract with them, with many years still to go.

Lessons learned: Check everything. And a bit more.

Epilogue

I have made other mistakes too, but these have left quite strong memory traces. No idea why, and I'm not sure I even want to know... All of them have taught me something, so they haven't been purely bad. I now have much more knowledge, so I can skip the smaller issues and stumble into even bigger ones.

Oh, and what about the title of this post? I was once chatting with a friend who loves driving off-road, mud and everything. I said I'd been thinking about getting a 4x4 because a front-wheel-drive car is a bit problematic in some situations. His answer was basically: "4x4 is nothing magical. It just lets you get into much worse situations. You get stuck in places where the car is much more difficult to rescue."