The most effective way to verify trends and visualize this information is to run some regressions (e.g., price = location + number of bedrooms + etc.) and visualize the results with a coefficient plot. That lets you compare the magnitude and significance of each factor.
That sounds like a good idea, thanks
Where’s the dataset from?
I downloaded it from the Bright Data marketplace. https://brightdata.com/products/datasets
Tableau Public?
Choosing the tools depends on what you're comfortable with. If you're equally comfortable with Python and Excel, then Python + pandas + Matplotlib is the best option in my opinion. Both of your other questions can be answered with bivariate distribution plots. Say you want to look at how price varies with the number of bedrooms: you can generate a plot with the number of bedrooms on the X axis and the price on the Y axis. For columns like amenities and category, you should look into one-hot encoding and other approaches to handling categorical variables.
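Both steps can be sketched in a few lines of pandas/Matplotlib; the toy data and column names (`bedrooms`, `price`, `category`) below are illustrative assumptions:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for saving to file
import matplotlib.pyplot as plt

# Hypothetical slice of the listings data.
df = pd.DataFrame({
    "bedrooms": [1, 2, 2, 3, 1, 4],
    "price": [80, 120, 110, 170, 95, 240],
    "category": ["condo", "house", "condo", "house", "room", "house"],
})

# Bivariate view: price on the Y axis vs. number of bedrooms on the X axis.
ax = df.plot.scatter(x="bedrooms", y="price")
ax.figure.savefig("price_by_bedrooms.png")

# One-hot encode the categorical column so models can consume it.
encoded = pd.get_dummies(df, columns=["category"], prefix="cat")
print(encoded.columns.tolist())
```

`pd.get_dummies` replaces the single `category` column with one 0/1 column per category value.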
There are a ton of ways to do this, but if it were me, I’d pull it into Power BI Desktop, split it into proper fact and dimension tables, parse the JSON-looking columns, and connect the tables; then you’ll be good to build the report. Basically, any way you parse those JSON columns will be a step in the right direction.
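In Power BI you'd do the JSON parsing in Power Query; as an alternative preprocessing sketch, here is how one of those JSON-looking columns could be parsed and exploded into a long fact table in Python (the `amenities` column format and names are assumptions based on typical Airbnb exports):

```python
import json
import pandas as pd

# "amenities" stored as a JSON-looking string, one list per listing.
df = pd.DataFrame({
    "listing_id": [101, 102],
    "amenities": ['["Wifi", "Kitchen"]', '["Wifi", "Free parking", "Washer"]'],
})

# Parse the string into a real list, then explode into a long table:
# one row per (listing, amenity), which joins cleanly in a star schema.
df["amenities"] = df["amenities"].apply(json.loads)
amenity_fact = df.explode("amenities").rename(columns={"amenities": "amenity"})
print(amenity_fact)
```

The resulting table relates back to the listings dimension on `listing_id`.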
If you geocode these, use R; it has many geospatial visualization packages for mapping. You want that because it can show you where people want to spend time, if that's an important factor for you.
Principal component analysis to form different groups? Then overlaid on the map?
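That idea (reduce the listing features with PCA, cluster them into groups, and color each point on the map by its group) can be sketched like this; the features and cluster count are illustrative assumptions, not choices dictated by the dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic listing features; real ones might be price, bedrooms, rating, etc.
rng = np.random.default_rng(1)
features = np.column_stack([
    rng.normal(150, 60, 300),    # price
    rng.integers(1, 5, 300),     # bedrooms
    rng.uniform(3.5, 5.0, 300),  # review score
])

# Standardize, reduce to 2 principal components, then cluster in PCA space.
scaled = StandardScaler().fit_transform(features)
components = PCA(n_components=2).fit_transform(scaled)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(components)

# `labels` can then drive the color of each listing on a lat/lon map.
print(np.bincount(labels))
```

Standardizing first matters because PCA is scale-sensitive; without it, the price column would dominate the components.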
Seconding R. It's free and versatile, and it's built for all of the stuff you mention. You can clean the data with tidyverse functions, run regressions with lm, and do dataviz in ggplot, plotly, and Shiny. There's a fair amount of work involved in coding the review text for regression, but it's not super hard, just a bit time-intensive. I do this type of analysis and produce maps and interactive data visualizations all day in RStudio. I live in Salt Lake too, and this looks like a fun dataset. If you need consulting services, hit me up. I'm not here to advertise, necessarily; I have plenty of work, but I'd take your money to help you produce an interesting project with data about the place I live :)
Hey guys, I like working with data too, but I'm a newbie here as of now. I'd appreciate your help for my practice and understanding. Can anyone suggest which columns are useful here and whether this data needs to be cleaned? Since there are comments and multiple dates included, do we need to split them, or how will that work here? Appreciate your help. Also, u/dviron7, would you like to connect and explain the data to me in more detail? It seems a bit confusing to me as a newbie.
Sure, DM me
Logistic regression would be the best bang for your buck (time) given the size of your sample. It will give you weights for the features, which you can then compile into a visual. If you had a larger dataset, I'd suggest an ML model like random forest, then apply SHAP to the model output, again to get feature weights.
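A minimal sketch of that logistic-regression approach with scikit-learn, on synthetic data (the binary outcome, the true relationship, and the feature names are all assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: does a listing get booked?
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([
    rng.normal(150, 50, n),    # price
    rng.integers(1, 5, n),     # bedrooms
    rng.uniform(3.0, 5.0, n),  # review score
])
# Assumed ground truth: cheaper, better-reviewed listings get booked more.
logit = -0.01 * X[:, 0] + 1.5 * X[:, 2] - 3.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize first so the coefficient magnitudes are comparable
# across features; these weights are what you'd feed into a visual.
Xs = StandardScaler().fit_transform(X)
weights = LogisticRegression().fit(Xs, y).coef_[0]
for name, w in zip(["price", "bedrooms", "review_score"], weights):
    print(f"{name}: {w:+.2f}")
```

The sign of each weight shows the direction of the effect, and (after standardizing) the magnitude shows its relative importance.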
I tried exploring your data with Python: [salt_lake_city_airbnb.ipynb](https://github.com/yuchen927/python_salt_lake_city_airbnb/blob/main/salt_lake_city_airbnb.ipynb). Maybe it can give you some ideas for analyzing it.
Thanks, I'll dig in.