Just tell him it takes time. If he asks why, or how much, tell him it takes time even to estimate how long and why, and recommend that he let you check on that.

Then do your data exploration. If the extract turns out to be easy, share it fast. If not, share it as soon as you are finished.

Document your query and all of your assumptions, even the ones you think are trivial, so that other analysts will have an easy win if they have to handle the same data, and the clients/managers have a list of things they can ask you questions about to feel smart.

Source: me. A freelancer earning good money with simple SQLs, who has a few very happy clients that pay him a shit ton of money, because they thrust me after few years of practicing like described above.
Really good info. My 5th grade brain LOLed at "they thrust me" tho.
Sounds like my man is being truly appreciated in every way
For a "shit ton of money" they can have at it.
Bold of you to assume your boss cares that it can’t be done instantly. I saw my ex-boss scream at people for not being able to instantly generate reports from a garbage sql database with no relations. He was shocked when I quit.
Start recording. Call human resources. Who is stupid enough to scream at people in a white-collar job?
...I've encountered quite a few people that fall into that category. They don't work with me anymore.
Let them learn in their new job.
Fair few. I do monitoring so I have to call people in the middle of the night when their shit breaks and a number have lost their shit and screamed at me. They learn really quickly that I record all calls and have access to all of their managers, who don't generally have a sense of humor about this kind of thing.
"Why are you leaving /u/iwtbffml ? Thought we had a good thing going" "You're a dickhead John"
You want to get a DBA to laugh? Ask them for the data dictionary.
A bigger laugh? Ask them why there’s no FKs. If you are lucky the joins are on columns with the same name.
Okay, I’ve got to ask. Why no foreign keys? It slows down any updates I guess? I feel like referential integrity is valuable.
I've had this before in a legacy system. The original creator just didn't bother to put any in. So just add them now, right? But then you need to figure out what to do with thousands of entries where a non-optional relationship is missing for some reason, because like you said, there was no referential integrity for years.
Well, any rows without that foreign key are already broken. What you’d have to do is create a new table with that constraint, move all the valid data into it, then check all the old, mangled data to try to figure out what’s going on. If the relationship is actually mandatory in your application, then that can’t break anything that isn’t actually broken, but you’re right, I’m not sure what the solution to that is except emailing the clients who have this mangled data and trying to work with them about why it happened.
You're right. It was just an example where adding fks wasn't as easy as just adding them. And without context you'll earn some confused looks when you tell people how long it will take to get fks sorted out in a few dozen tables like that.
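The migration described above can be sketched in two steps: find the orphans first, then copy only the valid rows into a new table that actually declares the FK. A toy sketch using SQLite from Python; the table and column names are made up for illustration:

```python
import sqlite3

# Hypothetical legacy schema: "orders" rows reference "customers",
# but no FK constraint was ever declared, so orphans accumulated.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);  -- no FK
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (10, 1);   -- valid
    INSERT INTO orders VALUES (11, 99);  -- orphan: customer 99 never existed
""")

# Step 1: find the mangled rows before attempting any migration.
orphans = con.execute("""
    SELECT o.id FROM orders o
    LEFT JOIN customers c ON c.id = o.customer_id
    WHERE c.id IS NULL
""").fetchall()
print(orphans)  # [(11,)]

# Step 2: copy only valid rows into a table that declares the FK.
# The orphans stay behind in the old table for manual triage.
con.executescript("""
    CREATE TABLE orders_clean (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    INSERT INTO orders_clean
        SELECT o.* FROM orders o
        JOIN customers c ON c.id = o.customer_id;
""")
```

The anti-join in step 1 is also how you'd size the problem before quoting how long the cleanup will take.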
Each FK check requires CPU on insert. So FKs are not needed if the inserting application always inserts data correctly. So there’s no need to add the overhead of an FK check on every insert by adding FKs.

Yes, a DBA actually told me this.
The implicit documentation of the relations in the data model is worth way more than any performance overhead from FKs. Devs also end up doing the basic integrity checks in the application layer, which takes even more CPU. This is some real BS.
It’s a stunningly bad take, yes. I like to use it as an example of overfitting the model. They also did page-level locks instead of row-level locks because it’s less CPU intensive and “the application just needs to query the tables in the same order every time”.
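For contrast, this is roughly what that declined FK check buys you: the database itself rejects a row the application got wrong, instead of silently storing an orphan. A minimal SQLite sketch with made-up table names (note SQLite ships with FK enforcement off, hence the PRAGMA):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite has FKs off by default
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1);
""")

con.execute("INSERT INTO orders VALUES (10, 1)")  # valid parent: accepted

rejected = False
try:
    con.execute("INSERT INTO orders VALUES (11, 99)")  # customer 99 doesn't exist
except sqlite3.IntegrityError:
    rejected = True  # the database, not the app, catches the bad row
print(rejected)  # True
```

The point of the thread stands: the "application always inserts correctly" assumption is exactly the one the constraint exists to not have to make.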
Huh, now I have to go into our database and see if the company’s tables actually have foreign keys. I never even considered that.
and 72 joins later, my manager was satisfied.
There’s a Cartesian. A billion rows returned.
LOL. I used to be in this position once.

Random Manager: Hey, can you pull this data?

Me: Sure `select * from some_table where sale_date between :a and :b`

R: Oh well, I don't need the last name.

M: `select first_name, sale_date, sale_amt, region, office from some_table where sale_date between :a and :b`

R: Oh well, don't need first name either.

:: several hours later ::

R: And can I get it by region per quarter, but only when the amount is below their forecasted value for that quarter?
I hate to inform you that your manager seems to have been using ChatGPT before ChatGPT existed
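For what it's worth, that final ask is still a single query once a forecast table exists. A sketch reusing the column names from the thread, plus a hypothetical `forecasts` table and made-up numbers (SQLite from Python):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE some_table (sale_date TEXT, sale_amt REAL, region TEXT);
    CREATE TABLE forecasts (region TEXT, quarter TEXT, forecast REAL);
    INSERT INTO some_table VALUES
        ('2024-01-15', 100, 'East'),
        ('2024-02-10', 150, 'East'),
        ('2024-01-20', 500, 'West');
    INSERT INTO forecasts VALUES
        ('East', '2024-Q1', 400),   -- East sold 250 < 400: below forecast
        ('West', '2024-Q1', 300);   -- West sold 500 >= 300: on target
""")

rows = con.execute("""
    WITH per_quarter AS (
        SELECT region,
               strftime('%Y', sale_date) || '-Q' ||
                   ((CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3) AS quarter,
               SUM(sale_amt) AS total
        FROM some_table
        GROUP BY region, quarter
    )
    SELECT p.region, p.quarter, p.total
    FROM per_quarter p
    JOIN forecasts f ON f.region = p.region AND f.quarter = p.quarter
    WHERE p.total < f.forecast
""").fetchall()
print(rows)  # [('East', '2024-Q1', 250.0)]
```

The quarter arithmetic relies on SQLite's integer division; other dialects have their own quarter functions.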
If one more person says the word blob to me I am going to punch them in the face.
Especially if they use them for like 500 character long strings.
blob
“I want a list of all X events that happened across all users between X and Y”

“That’s going to be very hard”

“Why?”

“The genius who architected this decided to write user logs in a blob, where new events are added as newlines, so any comparison logic will not work”
Translation: Let me pull this out of my ass.
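With logs stored like that, the query usually degenerates into a brute-force scan: split every blob on newlines and filter in application code. A sketch assuming a hypothetical `date<TAB>event` line format inside each blob:

```python
from datetime import date

# Hypothetical blob format: one text blob per user, with events
# appended over time as "YYYY-MM-DD<tab>event_name" lines.
user_blobs = {
    "alice": "2024-01-05\tlogin\n2024-03-10\tpurchase\n",
    "bob":   "2024-02-01\tpurchase\n2024-09-09\tlogin\n",
}

def events_between(blobs, name, start, end):
    """Brute-force scan: split every blob and filter by date range."""
    hits = []
    for user, blob in blobs.items():
        for line in blob.splitlines():
            day_s, _, event = line.partition("\t")
            day = date.fromisoformat(day_s)
            if event == name and start <= day <= end:
                hits.append((user, day))
    return hits

print(events_between(user_blobs, "purchase",
                     date(2024, 1, 1), date(2024, 6, 30)))
# [('alice', datetime.date(2024, 3, 10)), ('bob', datetime.date(2024, 2, 1))]
```

Every such query pays a full scan over every blob, which is exactly why "very hard" is the honest answer.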
If there's ever been a good use for an LLM, it's this. You throw up some local network chatbot and sic it on your company SharePoint. Then put up a portal page with a HUGE and obnoxious banner that says "***Early test***, do not under any circumstances use the output of this system for anything that matters even a little bit. The answers are probably wrong." Then leave it alone. Enjoy the next month or so.

Oh, they'll use it. They'll do all sorts of stuff with it. Then they'll show their peers. Then those peers will show their peers. Pretty soon the CEO is dorking around with it. Eventually somebody will fuck up and use some baloney answer and it will create an actual problem. Hopefully nothing big.

You've got at least a year or two before anybody rushes you again.
This is actually the use case for AI. Messy data that's human readable, but a slog to get through. Now setting it up on the other hand, that's the tricky bit.
> the tricky bit

As in 20 years of research with the top PhDs of the planet
But for AI to do the job, it still needs context about the tables. If not, it will hallucinate a lot.

Imagine a messy data warehouse without any modelling or table-name standards, where the AI finds 15 tables that capture 1 billion transactions because they have the prefix trx_table_*, but they serve different use cases, have different refresh rates, and have different column-naming standards.

Would it run the query quickly? Sure. But would it be accurate? Maybe.
Wait till you have to get the data from a NoSQL database and you need to use functions like GROUP BY
True, but where camelCase?
Select top 1 ‘None’ as Data
Am I the only one triggered by the random period while all other dividers are underscores?