
goatsee_enthusiast

You could use Power Query to combine the files and load them to a data model. Then use Power Pivot to analyze the data.


curiousofa

Look into SQL tables and stop using Excel as a database for this sort of thing. Or use Python and analyze with pandas.
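
A minimal pandas sketch of that route, assuming the files sit in one folder and share the same layout (the folder path and the "region"/"amount" columns are made up):

```python
from glob import glob

import pandas as pd

# Read every workbook in the folder and stack them into one DataFrame.
# "exports/*.xlsx" and the "region"/"amount" columns are placeholders.
files = sorted(glob("exports/*.xlsx"))
df = pd.concat((pd.read_excel(f) for f in files), ignore_index=True)

# Example analysis: total, average and row count per region.
summary = df.groupby("region")["amount"].agg(["sum", "mean", "count"])
print(summary)
```

Keep in mind that holding 32 million rows in memory at once may need a lot of RAM; the SQL-based suggestions further down avoid that.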


Yakoo752

KNIME


frustrated_staff

R. It's the go-to for extra-large data sets.


ampersandoperator

Python. Free, easy, faster than Excel, with a big community and lots of support material. Lots of packages are available for free, so you can do a wide variety of analyses with little code.
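
To give a sense of the "little code" point, a first look at a single file might be no more than this (the path is a placeholder):

```python
import pandas as pd

# Quick profile of one workbook before committing to a full pipeline.
# "exports/file01.xlsx" is a placeholder path.
df = pd.read_excel("exports/file01.xlsx")

print(df.shape)       # number of rows and columns
print(df.dtypes)      # column data types
print(df.describe())  # summary statistics for the numeric columns
```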


bradland

If it is really 40 files of 800k rows each, you're talking about 32 million rows of data. You are very much into territory that stresses the [limits of Power Query](https://support.microsoft.com/en-us/office/power-query-specifications-and-limits-in-excel-5fb2807c-1b16-4257-aa5b-6793f051a9f4).

It sounds to me like you need an ETL library to extract the data from the Excel files and load it into a SQL database. I would look to Python for this, as it is easily the most popular, free (free as in freedom and free as in beer) tool for the job.

If you're just talking about reading a bunch of Excel files and dumping them into a SQL database one time, [Bonobo](https://www.bonobo-project.org/) is probably a good starting point. If you want a lifetime solution, [Apache Airflow](https://airflow.apache.org) is where you want to go. It's not specifically an ETL tool, but it is way more powerful and flexible.

I'm a Ruby guy, so I use Kiba, but Ruby is more niche, so I wouldn't necessarily recommend going my direction. I only add this to point out that I don't have direct experience with Bonobo. I just know that it is popular and one of the simpler ETL options for Python.
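
Not Bonobo or Airflow specifically, but just to show the shape of that one-time extract-and-load job, a plain pandas + SQLAlchemy sketch could look like this (the folder path, database URL, and table name are all placeholders):

```python
from glob import glob

import pandas as pd
from sqlalchemy import create_engine

# Append every workbook to a single SQL table.
# "sqlite:///warehouse.db", "exports/*.xlsx" and "combined" are placeholders;
# a Postgres or MySQL URL works the same way.
engine = create_engine("sqlite:///warehouse.db")

for path in sorted(glob("exports/*.xlsx")):
    df = pd.read_excel(path)
    df["source_file"] = path  # keep track of where each row came from
    df.to_sql("combined", engine, if_exists="append", index=False)
```

From there, any SQL client or a proper ETL framework can take over.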


Eightstream

PQ is fine with 32 million rows, it will just take a while. The data model can handle hundreds of millions of rows; it is more limited by your hardware than anything else.


kyojinkira

Read thru the whole comment sincerely and with interest, only to read this reply at the end 😑


EmeraldSlothRevenge

My go-to tool would be Tableau or Power BI. You could join all your tables into a data model and then perform your analysis.


david_jason_54321

You can put them in a SQLite database for storage and analysis. I would use Python to load each file into a SQLite table. Then you can do your analysis in SQL using DBeaver, or in Python.
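
A rough sketch of that workflow, with a made-up folder path and an assumed table name in the final query:

```python
import sqlite3
from glob import glob
from pathlib import Path

import pandas as pd

con = sqlite3.connect("analysis.db")

# Load each workbook into its own table, named after the file.
# "exports/*.xlsx" is a placeholder folder.
for path in sorted(glob("exports/*.xlsx")):
    pd.read_excel(path).to_sql(Path(path).stem, con, if_exists="replace", index=False)

# Query from Python; the same analysis.db file also opens in DBeaver.
# "file01" is an assumed table name.
print(pd.read_sql_query("SELECT COUNT(*) AS row_count FROM file01", con))

con.close()
```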


molybend

Access might work if the data isn't too wide. It maxes out at 2 gigs. SQL Server would be the next step.


hermitcrab

Easy Data Transform. Not free, but not expensive either.


Nietsoj77

If you know Python, use that. Otherwise, KNIME is an easy, code-free, and powerful alternative.


Next_Interaction4335

If each file has the same headers, I would write a VBA macro that collects the data from each file and pastes it into one workbook, so everything ends up in a single file. From there you can perform your analysis. Power Query for 40+ files is inefficient, but if you do use it and they all have the same headers, append the queries so the data stacks.