I’ve got so many more stories about bad optimizations. I guess I’ll pick one of those.
There was an infamous (and critical) internal application somewhere I used to work. It took in a ton of data, putting it in the database, and then running a ton of updates to populate various fields and states. It was something like,
Put all data in x table with batch y.
Update rows in batch y with condition a, set as type a. (just using letters as placeholders for real states)
Update rows in batch y that haven’t been updated and have condition b, set as type b.
Update rows in batch y that haven’t been updated and have condition c, set as type c.
Update rows in batch y that have condition b and c and condition d, set as type d.
(Repeat many, many times)
It was an unreadable mess. Trying to debug it was awful. Business rules encoded as a chain of sql updates are incredibly hard to reason about. Like, how did this row end up with that data??
Me and a coworker eventually inherited the mess. Once we deciphered exactly what the rules were and realized they weren’t actually that complicated, we changed the architecture to:
Pull data row by row (instead of immediately into a database)
Hydrate the data into a model
Set up and work with the model based on the business rules we painstakingly reverse engineered (i.e. this row is type b because conditions x,y,z)
Insert models to database in batches
I don’t remember the exact performance impact, but it wasn’t markedly faster or slower than the previous “fast” SQL-based approach. We found and fixed numerous bugs, and when new issues came up, issues could be fixed in hours rather than days/weeks.
A few words of caution: Don’t assume that building things with a certain tech or architecture will absolutely be “too slow”. Always favor building things in a way that can be understood. Jumping to the wrong tool “because it’s fast” is a terrible idea.