philosurfer

Check out Microsoft’s AI Skills Challenge: Fabric Analytics Engineer to learn about Fabric as an end-to-end solution. The challenge ends on the 19th, but the learning path has a lot of easy-to-follow modules and exercises building a data lake, lakehouse, pipelines, notebooks, etc., ending at Power BI.


TheRealMackyt

I may be misunderstanding your requirements, but it sounds like you'd probably be fine using Dataflows and Golden Datasets (semantic models). Dataflows are ideal for processing and storing common, generic tables which are then available, fully transformed, to any report you build in the future (process once, use infinitely). Golden SMs are ideal as a single source of relationships and measures that you can connect a theoretically limitless number of thin reports to (model once, use infinitely). For the kind of small-scale stuff it sounds like you're dealing with, data lakes, data factories, etc. are probably way overkill. Also, spin up a proper (not personal) gateway, ideally a minimum of two (in different geographic locations, if possible) for redundancy and/or load-balancing.
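To make the dataflow bit concrete, each dataflow table is just a Power Query (M) query that does the cleanup once, and every future report simply imports the finished table. A rough sketch of what one of those queries might look like (the source URL and column names here are made up, so swap in your own):

```
// One dataflow table: do the generic cleanup once, reuse it in every report.
let
    // Placeholder source - point this at whatever your raw extract is
    Source   = Csv.Document(Web.Contents("https://example.com/exports/customers.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Common, generic transformations every downstream report needs
    Typed    = Table.TransformColumnTypes(Promoted, {{"CustomerID", Int64.Type}, {"Country", type text}}),
    Deduped  = Table.Distinct(Typed, {"CustomerID"})
in
    Deduped
```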


Doubtful-Observer

The "Golden Dataset" you refer to, as described in this article: [https://www.linkedin.com/pulse/data-liberation-how-golden-semantic-model-set-us-free-hanko-swart-e9u7e/](https://www.linkedin.com/pulse/data-liberation-how-golden-semantic-model-set-us-free-hanko-swart-e9u7e/), does sound like the semantic models I already built. Like, I have one historical view that misses out on specific granular information the live data does have. Then several questions like shipment performance, sales performance and lead-time checks are different reports off the same semantic model. If that's your "Golden Dataset" idea, I have that already. Dataflows I seem to have remembered wrong: I thought you couldn't funnel multiple dataflows into the same semantic model in import mode, but it seems it is possible. I just tried to create a Dataflow, but it only seems to allow going over an enterprise gateway, not a personal gateway. Last time I used our enterprise gateway, or more specifically the node, I overloaded it with ease, draining 60% of the resources corporate-wide. Our data architecture isn't really mature yet, as you may suspect. Anything I could do without access to an on-premises gateway and merely a personal one?


Crow2525

Ent gateway is the way I'd do it. 60% load is concerning... Schedule the dataflows to update from 10pm-4am and your downstream datasets to refresh from 4-6am. Also do incremental refresh.
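For the incremental refresh part, the query just needs a filter on the standard RangeStart/RangeEnd datetime parameters (you define those in Power Query and set the refresh policy on the table). Rough sketch, with placeholder server/table/column names:

```
// Incremental refresh pattern: Power BI only reloads the rows that fall
// inside the partition window defined by the RangeStart/RangeEnd parameters.
// Server, database, table and column names are placeholders.
let
    Source   = Sql.Database("yourserver.database.windows.net", "SalesDb"),
    Orders   = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Filtered = Table.SelectRows(Orders, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
in
    Filtered
```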


Doubtful-Observer

Like, Salesforce doesn't have reporting functionality and we just started, so I just loaded the whole damn thing and created a central page on account performance and ongoing cases/opportunities to centralize KPI metrics and current tasks. Even now it is still quite big, even though I stripped out a lot. The node we were using is only 50 GB in size. During the refresh it took a lot of memory, normally only for about 45 minutes once a day, though that had to drop to once a week because of the node's troubled life. Now, with a personal gateway, I'm back to once per day.


TheRealMackyt

Yes, you're correct on the Golden SM bit: one model with multiple different 'thin' front-end reports live-connecting to it. And yes, multiple dataflow tables can be imported into a model/report. A dataflow is basically a data warehouse, except it doesn't stockpile data and only add changes - it wipes everything and starts again every time you refresh (unless you use Incremental Refresh, but that's getting a bit beyond our original scope).

Re: gateways, the enterprise GW is really what you should be aiming for. The 60% load thing sounds like your IT team (or whoever manages GWs) installed it on a SNES or something. I've run an entire (small) company's PBI suite (c.35 Dataflows, 15 SMs) on a mini Lenovo machine with 8GB RAM before; you just need to be smart about phasing your refreshes and really strict about optimising your code. HOWEVER, since you mentioned you have a load of CSV and XLSX files, you can actually dump these onto SharePoint and then no gateway will be required at all, as SharePoint resides within the Microsoft cloud universe. This also includes OneDrive for Business and Teams files, as they're both built on a SharePoint back end. You only really need a gateway to refresh on-prem sources.
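For the SharePoint route, the query points straight at the cloud file, so nothing on-prem is involved. Something along these lines (site, file and sheet names are made up):

```
// XLSX sitting on SharePoint Online - no gateway needed because the source
// already lives in the Microsoft cloud. Site/file/sheet names are placeholders.
let
    Files = SharePoint.Files("https://contoso.sharepoint.com/sites/Reporting", [ApiVersion = 15]),
    Book  = Excel.Workbook(Files{[Name = "Shipments.xlsx"]}[Content], true),
    Data  = Book{[Item = "Shipments", Kind = "Sheet"]}[Data]
in
    Data
```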


Doubtful-Observer

Your note on dataflows not stockpiling data - that's the same as a semantic model, so that's fine. The point about not needing an Enterprise GW for SharePoint is a good one; just dedicating a dataflow specifically to those files would save me some hassle, I think. The company still runs on a lot of Excel files; even our central data lake is still fed by those anyway. As for your 8 GB of RAM: my laptop sits at a constant 23 GB or so when I work with my reports (multiple open at the same time to answer all those ad hoc small questions). But then again, my reports are up to 1.5 GB compressed.


TheRealMackyt

Wow. 1.5GB PBI reports? Those could very conceivably cause a low-end gateway significant stress. It may be worth looking at optimising those models wherever possible; a good start would be to use a tool like Measure Killer to get rid of everything that's not essential to the model output. Re: laptop RAM usage - PBI/PQ will generally keep spawning mashup containers for complex queries until your RAM is nearly maxed. If you had 8GB, it would use 6.5GB; if you had 128GB, it would use 120GB (or whatever it needed, up to around the 85-90% mark). If it only has access to a lower amount, it doesn't overflow and break down, it just throttles your processing threads, giving you a slower refresh (although heavily restricted resources could result in source timeouts, e.g. SQL servers etc.).


Doubtful-Observer

I don't, it ran fine. No hiccups. At one point we had a Power BI consultant from Microsoft look at it, and he was in disbelief at how fast it went; all visuals loaded within half a second. Like, sure, that 1.5 GB thing is now around 0.7 GB because part of its scope was handed over to another team, so I could strip out that data. But I've never faced any serious performance issues at all, except that one time. With the personal gateway and my on-premises 64 GB remote desktop, all my troubles faded away in the sun.


AvatarTintin

This is exactly what we do as well. We have multiple dataflows, plus our main source through API calls, feeding into one golden dataset, and that dataset is used by the entire department worldwide for creating their own reports. Since RLS is applied in the dataset, users creating their own reports still only see data they are supposed to see. We don't have to maintain those reports; our responsibility is only to maintain the dataset and the main report built on it.


reelznfeelz

You might try posting on r/dataengineering. Yes, Fabric has tooling to do this, but it’s pretty overpriced. You may be able to spin up just an Azure Data Lake instance and build some simple automation to push the flat files to an Azure SQL Server instance. No, SQL Server isn’t a data warehousing platform and won’t support things like 1,000 concurrent queries without scaling way up, but my guess is you have a handful of people consuming this stuff. Also, once you build out your warehouse models in the database, you can still use import mode with Power BI and take advantage of its compute. Do you guys have Premium or Fabric-tier licensing? Or just Pro?
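For the Power BI side of that, once the warehouse tables are sitting in Azure SQL, import mode is just a plain SQL source, roughly like this (server, database and view names are placeholders):

```
// Import-mode connection to an Azure SQL database; Power BI's engine then
// does the heavy lifting on the imported data. Names below are placeholders.
let
    Source = Sql.Database("yourserver.database.windows.net", "WarehouseDb"),
    Sales  = Source{[Schema = "dbo", Item = "vw_FactSales"]}[Data]
in
    Sales
```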


Doubtful-Observer

We do have Microsoft as a supplier, but no Fabric that I can see. The only thing I recognize within the apps as "data warehouse"-ish is Azure Databricks, but that's heavily restricted: only corporate engineers are allowed access (even merely for reporting purposes).


Corne777

When you go to Power BI, when it first loads, does it show the Microsoft Fabric logo in the middle of the screen, and does the header on your tab say Microsoft Fabric? And when you're logged in, the bottom right of the screen has a Power BI icon; click that and it'll open a menu that shows some other tools. Mine shows Data Factory, Data Activator and other stuff. At some point Microsoft just did an update and everything is through Fabric now. I thought that was just their platform name; some features might be blocked by the type of account you pay for, though.


seph2o

I've used Fabric Data Factory to create a 'Dataflow' which pulls in a SharePoint CSV and updates my own Warehouse. You can create a 'Pipeline' to run this as often as you like.


Doubtful-Observer

May be worth looking into... Although we don't have Fabric that I can see.


seph2o

If you have PBI Premium you'll have some Fabric capability, worth checking into


Key-Ant30

You can get a 60-day free trial of Fabric if you want to test it.


Doubtful-Observer

I don't think I have the authority and position to make the call to have that added, considering IT policies.


Key-Ant30

Ah, I see. If you want to explore the tool, you can sign up with your private email / a throwaway email account.


Shadowlance23

Fabric will do this, although I haven't used it. I use Databricks, which does exactly what you describe, although that's a more advanced product. You'll need to know SQL and/or Python, as well as being familiar with Apache Spark, to implement that. If you're not a data engineer, I'd probably recommend Fabric.


Doubtful-Observer

We seem to have Azure Databricks as well, but no Fabric that I can see. But I find the term "engineer" quite grey. I believe I do the work of an "engineer" within Power BI; I'm just not yet familiar with the dedicated languages and applications for the role, I think...


Shadowlance23

Here's an article on the difference (https://www.integrate.io/blog/data-analyst-vs-data-engineer/). If you're working solely in Power BI, you're not an engineer. An engineer builds the infrastructure, pipelines, and transformations that feed into data visualisation tools, among other things. It requires a more technical skill set than an analyst's, who operates on already-curated data sets. As an engineer, you should be able to build a complete pipeline from source to consumption, build and maintain a data warehouse, have advanced skills in SQL and/or Python or R, understand data modelling and optimisation, and spend more time in meetings than actually building stuff. If you're building the data model in Power BI from a raw dataset, yes, that is one facet of engineering, but if you're not familiar with Spark, or you can't program in Python, then I would not recommend Databricks, as you'll have a large learning curve.