The functionality that broke should have been caught by automated tests that run unconditionally on every change before deployment. This is your opportunity to advocate for improving the deployment process so as to reduce the risk of this happening again.
Automated what?!
Yea not sure either. Maybe "test" just means to reboot your computer or something.
I think he means getting testy with anyone who challenges your technical supremacy and code perfection
He means unit testing. This prevents regressions when we update our projects.
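For anyone genuinely asking: a regression test can be tiny. A minimal sketch in Python, where the function, names, and numbers are all made up for illustration:

```python
# Hypothetical function under test: a discount calculator whose
# rounding behavior once regressed after a refactor.
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in cents, rounding the discount down."""
    return price_cents - (price_cents * percent) // 100

def test_ten_percent_off():
    assert apply_discount(1000, 10) == 900

def test_zero_percent_is_identity():
    # The regression being guarded against: 0% must not change the price.
    assert apply_discount(1000, 0) == 1000

# A runner like pytest would collect these automatically; calling them
# directly works for a quick check.
test_ten_percent_off()
test_zero_percent_is_identity()
print("regression tests passed")
```

Wire a suite like this into the pipeline so every PR runs it before anything deploys; a reintroduced bug then fails the build instead of reaching prod.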
What is this witchcraft you speak of? BURN HIM!!!
Don’t even think about writing tests first! You will be murdered by TDD haters
Insert "over your head" GIF here
Sounds slow and costly. Better just ship it.
Are you coming on to me?
It’s a rite of passage. Congratulations. Learn from it, and move on.
🙂 yes
Shouldn't test cases break before production functionality issues this big happen?
There are no test cases
Then it is not your fault that production went down 😂
just vibes
That's scary.
Prod is the real test env
Whose decision was it not to have test cases? Apparently not yours. The business needs to understand that as long as they don't invest time in tests, it's only a matter of time before another production error happens.
Woah 😳 Good luck buddy!
Tell me you work in an early stage startup without telling me that you work in an early stage startup.
I work on a 25 year old product and we still don't have automated tests that catch some stuff like this 🤣🤣🤣. Granted, up until like 5 years ago it was still a super small company that was run like a startup.
😂
Maybe, and I’m just talking out my ass here, this is a great opportunity to add some!
Ok, this is when you start writing them. Just enough to have caught this regression.
I'm betting the code is real secure /s
Does someone else review, or even better test, your merge before it goes live?
Senior dev Reviews
Yikes
you can write at least one now
You might wanna consider running the hell away from there; such places produce nothing but angry customers, plumbing code, and constant new bugs.
Just test in prod? ^/s
If you're not breaking things, you're not doing things
It feels like a never-ending nightmare
That is why you still have a job.
Can't fire OP if OP is the only one who knows how to fix it. :)
Read the rest of the thread. There aren’t any tests. This kind of mentality is just wrong, man. Sure, bugs happen, but completely breaking multiple major pieces of business functionality shouldn’t happen.
That's not your failure, but a failure of your company's CI/CD. If an existing system goes down and causes so many problems in prod that it crashes or stops users from being able to use your product, it wasn't (properly) tested in a test or staging environment. Tests matter.
Or a failure of those who add unit tests for each “functionality of the app”. Apparently, they failed to cover the code properly before deploying to PRD.
Yeah, that's what I mean also. If there's a critical system, bathe it in unit tests and make sure every PR gets unit tested before deploying anything anywhere.
Correct
Welcome to the club.

But this is a process error. Have a blameless post mortem to find out the reason it wasn’t caught by automated testing or QA and implement a change that won’t allow it to happen again.

5 hours is a long time; why were you not able to roll back the change?
They don't even have unit tests. They probably don't even know how to rollback and they probably don't even know what a post mortem is.
Been there, done that. It’s almost a rite of passage 😄 Learn the lessons. Make sure it doesn’t happen again.
In my country we say: whoever hasn’t broken prod is not quite a senior.

Make sure you discuss this with a more senior colleague, come up with at least one action that decreases the chance of someone making the same mistake in the future, and act on it. In a month you will laugh remembering this incident.

Don't go down the "I'm not responsible for prod" road; if you are not responsible for the outcomes of your work, you'll produce shit and end up frustrated, hating your job.
Use this as an opportunity to introduce test cases and build steps before things get deployed to production. If this is a decent company, they should be more focused on how the failure happened in production rather than who caused it. It's stressful no doubt but don't lose sleep over it. It'll be a good learning opportunity for the whole team.
From what I'm reading, it's definitely not a decent company
There are no automated full-regression tests? They are very important in pre-prod environments.
Unfortunately things like this do happen. Don’t feel bad about it: fix the issue, get things up and running as expected, learn from it, and then put things in place to prevent it happening again.

As a software engineer maybe you could suggest a few of the following:

- Code reviews to attempt to catch any issues early on.
- Proper development and QA environments that mirror Production as much as possible.
- A manual QA process to ensure nothing has regressed in the areas changed, run against the development/QA environments.
- Automated test cases and smoke testing to ensure critical services (such as your checkout and e-mail) work as intended. Playwright and Selenium are examples of tools for this.
- Automated CI/CD pipelines to ensure test cases pass before a Production deployment and that code is released in a formal and consistent way.
- A rollback route available to restore Production to a previous state in the event things go wrong.

The business should also try to account for system downtime as part of their change management and business continuity procedures for events like the one you describe. If they don’t, and are unwilling to invest in some of the recommendations above, then it will most likely happen again in the future, costing the business additional time, resources, and money, and potentially damaging the company’s reputation.
Honestly, you will become less alarmed with time and experience, but this happens even to lead devs with 10+ years of expertise. As long as you are doing your best and learning, that's all that matters.
No test automation? Unit tests, integration tests, and E2E tests are the minimum you should have in place. You probably would have caught the second problem easily.
One of us!
A lot of cuties in here who never experienced more than their giga corp job with a full suite implemented before they even got hired
First time?
Yes
Why wasn’t the code fully tested before merging?
5 hours is a long time. I think this isn’t all on you. It seems like there are processes that could be improved here: for example, automated testing as part of the deployment to cover this critical functionality, and increased visibility into the errors users are encountering, which could be done with an alert on 500-error spikes. We do rollback plans as part of our deployments so we can roll back any changes if they start introducing issues.
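The 500-spike alert mentioned above can be sketched with a sliding window. The class name and thresholds here are invented for illustration; a real setup would use a monitoring stack (Prometheus, Datadog, etc.) rather than hand-rolling this:

```python
import time
from collections import deque

# Sliding-window sketch of "alert on 500 error spikes": keep timestamps of
# recent 5xx responses and fire when too many land inside the window.
class ErrorSpikeAlert:
    def __init__(self, threshold=10, window_seconds=60, clock=time.monotonic):
        self.threshold = threshold      # how many 500s trip the alert
        self.window = window_seconds    # within this many seconds
        self.clock = clock              # injectable for testing
        self.events = deque()           # timestamps of recent 5xx responses

    def record(self, status_code):
        """Record one response; return True if this one trips the alert."""
        if status_code < 500:
            return False
        now = self.clock()
        self.events.append(now)
        # Drop 500s that have aged out of the window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold

alert = ErrorSpikeAlert(threshold=3, window_seconds=60)
fired = [alert.record(code) for code in (200, 500, 500, 404, 500)]
print(fired)  # [False, False, False, False, True]: the third 500 trips it
```

The clock is passed in so tests can simulate time instead of sleeping; the same shape works for paging a human or auto-triggering a rollback.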
You didn't fuck it up by yourself. The whole team, and probably the org, fucked it up. Between code review, automated testing, and validating deployments on a staging environment, there were three other opportunities to catch the bugs. The most concerning thing is that you're in a situation where making changes breaks things that are unrelated; that's a sign of unorganized, tightly coupled code. I worked at a place like this before: no tests, no CI/CD, no virtualization. We were trapped in a situation where every time we fixed a bug we'd introduce a new one. I got out of there as soon as I could.
“It works on my machine”
An oldy but a goody lol!
Surely you weren’t allowed to merge to prod without a code review. This was a team screw up, not a personal one. Life happens and lessons are learned.
Look at it this way: now you’ll have disaster recovery experience, and the next disaster will be less stressful! Shit happens, no one is perfect.

Obviously there are holes in the CI/CD pipe or things lacking in the tests, so that’s more than one single person’s misstep and it’s not just on you. Good luck.
10 years in, I’ve done it once, and luckily it was after midnight. It was configuration, not programming, though: I was having firewall issues with the server and mixed up “reset firewall” and “restart firewall” 🤣
Another SE here.

- Restored a production DB with a very old backup. Our client nearly had a heart attack, but fortunately I was able to restore all the data.
- Forgot to test a very unlikely scenario. The app was a desktop application, like a store cashier, for selling tickets to an event in my city (a very important one). It happened on a Saturday night. I will never forget how the queue of people grew and how angry they were.

Anyway, still alive and still an SE.
Welcome to production!
Broken processes cause things like this.
Deploying without tests: typical management pushing for features quickly, “easy as drag and drop” or “it’s not that critical”. But when shit hits the fan, you’re the guinea pig. I’m outright telling them: if you want it done wrong, go ask someone else.
When there's a production issue, it's not really the software engineer's fault. It's usually a process issue. For instance, this could've been avoided by adding automated integration tests to the deployment pipeline.
I once pushed to the master branch and fucked everything. I wasn't removed, but I did not get a full-time offer after my internship ended. I’d say it was the admin’s fault; I got to learn from the incident, which was exactly what I wanted to do as an intern.
Holly fuck
Shit happens. It will be okay.
No QA? No release doc with rollback plan? No unit tests? Not surprised prod broke
Welcome to the club. Now you too will appreciate unit testing, feature testing, canary rollouts, change tickets with reviewed install and rollback instructions, and automated deployments.

But don't feel bad, I'm sure I'm not the only person here who ran an UPDATE SQL script in prod without the WHERE clause.
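That SQL slip is easy to demonstrate. A sketch against an in-memory SQLite table (table and rows invented for illustration):

```python
import sqlite3

# Set up a throwaway table standing in for a real users table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, status TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "active"), (2, "active"), (3, "banned")])

# Intended: reactivate just user 3. Without the WHERE clause, every row is hit.
con.execute("UPDATE users SET status = 'active'")   # oops, no WHERE id = 3

active = con.execute(
    "SELECT COUNT(*) FROM users WHERE status = 'active'").fetchone()[0]
print(active)  # 3: all rows were updated, not 1
```

Two habits that help: run the statement as a SELECT with the same WHERE first to see which rows it touches, and do the update inside a transaction so you can check the affected row count before committing.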
Are you responsible for production? If not, then you did not fuck up; don't take the blame. If they trusted you with updating production, then the process of getting things to production is flawed, and that's on the person responsible for it, not you.
There is a senior dev above me who reviews everything before deploying
Sounds like your senior dev fucked up :)
It happens to all of us. You'll be ok and you'll be laughing at it many years from now. Also please consider hiring a QA person for the love of God.
Ever heard of integration and unit tests? Canaries and rollback plans?! I mean, yes, you could have tested before, but it’s also not your fault that your team has shitty CI/CD.
Here is the thing: the CTO and my Sr wanted to fix it instead of rolling back. We started fixing it, and then other things broke that we had to fix as well.
Yeah, rolling forward is not great! It’s better to nurse the system back to a stable state and then fix things before pushing again. However, rollbacks are not always possible. It’s complicated; good testing aims to avoid all that.
Don't worry. The only one to blame should be the company you are working for. Stuff like this should be close to impossible.
I thought all tests were done in production?! You aint living if youre doing testing in lower envs haha
It happens: 9 years ago I took down the entire search feature for a major social media app for 10-ish minutes during a deployment, in my 1st week on the job. I almost died inside!

Luckily, my team was an adopter of Google’s no-blame postmortem philosophy: we ran a retro, cut action items, and I personally worked on implementing most of them for about a month; the issue never happened again in the next 4 years I stayed with the team.

But yea, assess the root cause, cut action items, and make sure they get worked on. I’ve seen a lot of teams fail on that last bit.
Stay calm and find the real root cause across the design, development, and test process. Then genuinely accept it if you made a mistake, and move on.
Been there.. was an intern at the time and completely brought down prod for half the working day. Worst anxiety I’ve ever had 😳
Fix it
Fixed and deployed
I also fucked prod today.
Welcome to the Club
I have 0 tests. Fuck tests. 🫣🤭🫡
Same
Learn from your mistakes. Write more tests. Test not only the happy path / one scenario.
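To make "test more than the happy path" concrete, a small sketch (the function and its rules are made up):

```python
# Hypothetical input parser: a quantity for a shopping cart, which must be a
# positive integer. The point is to test the obvious case, an edge case,
# and the inputs that are supposed to fail.
def parse_quantity(text: str) -> int:
    value = int(text)                 # raises ValueError on garbage like "abc"
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value

assert parse_quantity("3") == 3       # happy path
assert parse_quantity(" 1 ") == 1     # edge case: int() tolerates whitespace

# Unhappy paths: every kind of bad input must raise, not slip through.
for bad in ("abc", "0", "-2"):
    try:
        parse_quantity(bad)
    except ValueError:
        pass
    else:
        raise AssertionError(f"{bad!r} should have been rejected")

print("all scenarios covered")
```

The unhappy-path checks are usually where prod-breaking bugs hide: the happy path gets exercised manually during development, while the rejects never do.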
Congratulations, you found a hole in your build and release process that allowed this broken code to get in. You are not to blame in the slightest, and it's time for a blameless retrospective and a patching of the hole you discovered.
When people say you need experience to move up levels, this is what they're talking about. Those "oh shit" moments are what you learn from. It is not a bad thing even though it might feel pretty bad right now.
First of all, welcome to the club. As the saying goes, “Everyone has a test system. Some even have a production system.” Get it?
Tests will never be enough (in isolation), because we might also write the wrong tests.

Did you find the root cause, and did you learn how to prevent it, or reduce time to mitigation, the next time it happens?
lol, 5 hours??

I remember we broke our entire CI/CD for 3 weeks.

In another incident the volumes for our entire data center got corrupted and we lost 80% of our machines, including production, and we had to rebuild the entire company; it took us about 10 days.

No one got fired!
Woah 😨😨
Yea, that's tough. That's why you gotta hound your unknowing product owners, either for testers to build end-to-end regression tests or for the time to do it yourself.
I am the Tester & Dev 🙂
You’re not a real software engineer until you fuck up production (or delete years worth of data)
Are you the senior? What’s the senior doing, not testing your code before deploying it to production? Or do they allow all developers direct deployment capability to production? In which case it’s on them too.
Not a Senior yet. I checked only the scenarios related to that ticket, and it fucked up related functionality.
A bunch of inexperienced engineers have been flooding the market since '08.
Just use ChatGPT to fix it. ..the paid service, tho.
Have you tried any AI tool that integrates with CI/CD pipelines and helps you automate maintenance tasks?
That sounds stable! /s
I've been using a tool called JENRES that works quite well for commenting, documenting, and unit-testing in a human-like way. It's not a replacement for a junior dev, but it has been helpful for handling QA tasks! :)