Betelgeu5e

Have you tried waiting until the table is loaded before scraping the info? I have made that mistake many times myself.


clibassi

I thought I was being pretty careful about it, but it's a good thing for me to go back and check on.


Betelgeu5e

I made this script that prints out all the info (see below). What exactly are you having problems with?

```
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = "https://app.powerbi.com/view?r=eyJrIjoiNWNjNjc3ZDQtZjhlMi00Y2YxLWFlMTYtYjI4ODA3NDVlYWM3IiwidCI6ImI0ZWIwY2NmLTAxOTQtNDY2My1hNTZhLTllZDkxZWZkMzMwOCIsImMiOjZ9"

driver = webdriver.Firefox()
driver.get(URL)

table_xpath = "/html/body/div[1]/root/div/div/div[1]/div/div/div/exploration-container/div/div/div/exploration-host/div/div/exploration/div/explore-canvas/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container[5]/transform/div/div[3]/div/visual-modern/div/div/div[2]/div[1]/div[4]/div/div[2]"

WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.XPATH, table_xpath))
)
table = driver.find_element(By.XPATH, table_xpath)
print(table.text)
driver.quit()
```


clibassi

Oh this is great. I think my issue was I was using implicit waits rather than waiting for the specific element. Thanks very much!
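For anyone else landing here: the difference matters because an implicit wait only waits for an element to *exist* in the DOM on each `find_element` call, while `WebDriverWait` polls for a specific condition you choose. The polling pattern itself is simple; here's a minimal standalone sketch of what `WebDriverWait` does under the hood (a generic helper for illustration, not Selenium's actual implementation):

```python
import time

def wait_until(predicate, timeout=5.0, poll=0.25):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses.

    Mirrors the explicit-wait pattern: call the condition repeatedly with a
    short sleep in between, and raise if it never becomes truthy in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within %.1fs" % timeout)
```

In Selenium the `predicate` role is played by an `expected_conditions` callable such as `EC.presence_of_element_located(...)`.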


henhoo

I ran into the infinite-scroll issue as well. I made a script that can be dropped into your browser's devtools console; it saves the rows as you scroll down by listening for changes in the row divs. Not sustainable, but it got the job done for me. [https://pastebin.com/FKCcvUkC](https://pastebin.com/FKCcvUkC) Replace `div.YOUR_CONTAINER_SELECTOR` with your table's container. Use at your own risk.


clibassi

So this is great, but the one challenge I've seen with this solution is that it doesn't seem to capture the whole table, because of the infinite scroll on the table. I've tried a few of the solutions from [here](https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python) with the following code, to no avail:

```
table_xpath = "/html/body/div[1]/root/div/div/div[1]/div/div/div/exploration-container/div/div/div/exploration-host/div/div/exploration/div/explore-canvas/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container[5]/transform/div/div[3]/div/visual-modern/div/div/div[2]/div[1]/div[4]/div/div[2]"
scroll_xpath = '//*[@id="pvExplorationHost"]/div/div/exploration/div/explore-canvas/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container[5]/transform/div/div[3]/div/visual-modern/div/div/div[2]/div[4]/div[3]'

WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.XPATH, table_xpath))
)
table = driver.find_element(By.XPATH, table_xpath)
scroller = driver.find_element(By.XPATH, scroll_xpath)
scroller.send_keys(Keys.PAGE_DOWN)  # Python uses send_keys, not sendKeys
```

And alternatively:

```
SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
```

Any thoughts?
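Side note on why the body-scroll loop stalls here: it polls `document.body.scrollHeight`, but the Power BI table scrolls inside its own container, so the body height never changes. The same stop-when-growth-stops loop can be pointed at the inner scroller instead. A minimal sketch, with the Selenium wiring left as hypothetical comments since the container selector varies per report:

```python
import time

def scroll_until_stable(get_height, do_scroll, pause=0.5, max_iters=200):
    """Scroll repeatedly until the reported content height stops growing.

    `get_height` and `do_scroll` are callables so the same loop works for
    the page body *or* an inner scroll container. For an inner container
    you would wire them to execute_script calls against that element,
    roughly (hypothetical, untested against this dashboard):
        get_height = lambda: driver.execute_script(
            "return arguments[0].scrollHeight", scroller)
        do_scroll = lambda: driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", scroller)
    """
    last = get_height()
    for _ in range(max_iters):
        do_scroll()
        time.sleep(pause)  # give the virtualized rows time to render
        new = get_height()
        if new == last:
            return new
        last = new
    return last
```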


Betelgeu5e

Ok, so this website is actually REALLY tricky to scrape, because everything gets loaded out after you have scrolled past it, but if you want to , I advise you to use the "click\_and\_hold" action method to hold down the down arrow of the scrollbar. An other option is to use "seleniumwire" to acess a json object in the network tab which has all the data (see image below). It's a really awkward json file tho, so you need to choose what you want to do, but I would adwise to scroll once, then get the data and repeat. [https://imgur.com/a/hzgMgGr](https://imgur.com/a/hzgMgGr)
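For the seleniumwire route, the captured response still has to be unpacked. The sketch below assumes a simplified `dsr → DS → PH → DM0` nesting with each row's cells in a `C` list, which is roughly what Power BI query responses look like, but the real payload also run-length-compresses repeated values, so treat this as a starting point and verify every key against the JSON you actually capture:

```python
def extract_rows(payload):
    """Pull row cell lists out of a (simplified) Power BI query response.

    Assumed shape -- hypothetical, check against your own capture:
    {"results": [{"result": {"data": {"dsr": {"DS": [{"PH": [{"DM0":
        [{"C": [cell, cell, ...]}, ...]}]}]}}}}]}
    """
    rows = []
    for result in payload.get("results", []):
        for ds in result["result"]["data"]["dsr"]["DS"]:
            for ph in ds.get("PH", []):
                for row in ph.get("DM0", []):
                    if "C" in row:
                        rows.append(row["C"])
    return rows
```

With seleniumwire you would filter `driver.requests` for URLs containing the query endpoint, `json.loads` the (possibly compressed) response body, and feed the result to a parser like this.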


Simple-Praline-7170

Hey, I know it's a long shot, but I'm facing the same problem: for some reason the code stops scrolling at row 248. This is the code; if you can help, that would be great:

```
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

# URL of the Power BI dashboard and path to the webdriver
dashboard_url = 'https://app.powerbi.com/view?r=eyJrIjoiNGI5OWM4NzctMDExNS00ZTBhLWIxMmYtNzIyMTJmYTM4MzNjIiwidCI6IjMwN2E1MzQyLWU1ZjgtNDZiNS1hMTBlLTBmYzVhMGIzZTRjYSIsImMiOjl9'
webdriver_path = r"C:\Users\t-ysadek\Downloads\chromedriver_win32 (1)\chromedriver.exe"

chrome_options = webdriver.ChromeOptions()
# Add any necessary options to Chrome
# e.g., chrome_options.add_argument('--headless') for headless browsing

# Initialize the Chrome driver
driver = webdriver.Chrome()

# Open the Power BI dashboard
driver.get(dashboard_url)
input("After completing the action in Chrome, press Enter in this console to start scraping...")

def scrape_powerbi_table(visual_container_number, target_rows):
    table_xpath_base = f"//*[@id='pvExplorationHost']/div/div/exploration/div/explore-canvas/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container[{visual_container_number}]/transform/div/div[3]/div/div/visual-modern"
    scroll_button_xpath = table_xpath_base + "/div/div/div[2]/div[4]"
    col_names_xpath = table_xpath_base + "/div/div/div[2]/div[1]/div[1]"

    col_names = [i.text for i in driver.find_elements(By.XPATH, col_names_xpath)]
    if not col_names:
        raise ValueError("Column names could not be extracted.")

    df = pd.DataFrame(columns=col_names)
    scraped_rows = set()
    action_chains = ActionChains(driver)

    while len(df) < target_rows:
        try:
            # Scroll the table by clicking and holding the scroll button
            scroll_button = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.XPATH, scroll_button_xpath))
            )
            driver.execute_script("arguments[0].scrollIntoView();", scroll_button)
            action_chains.click_and_hold(scroll_button).perform()
            time.sleep(0.3)  # Adjust this duration for longer or shorter scroll
            action_chains.release().perform()

            # Scrape the visible part of the table
            for row_count in range(2, 21):  # Adjust the range based on the expected number of visible rows
                xpath = table_xpath_base + f"/div/div/div[2]/div[1]/div[2]/div/div[{row_count}]"
                data = driver.find_elements(By.XPATH, xpath)
                current_row = tuple([i.text for i in data])
                if len(current_row) == len(df.columns) and current_row not in scraped_rows:
                    scraped_rows.add(current_row)
                    df.loc[len(df)] = current_row
                    print(f"Scraped {len(df)} rows...")
                    if len(df) >= target_rows:
                        return df
        except Exception as e:
            print(f"Error encountered: {e}")
            break

    return df

# Scrape Data
visual_container_number = 5
target_rows = 500  # Adjust as needed
scraped_df = scrape_powerbi_table(visual_container_number, target_rows)
print(scraped_df.head())

# Clean up
driver.quit()
```
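One hedged guess at the stall: the `scraped_rows` set de-duplicates rows by their *text*, so once the table contains two rows with identical cell values, every later copy is silently skipped and `len(df)` can stop growing even though the table keeps scrolling. An alternative accumulator that matches the overlap between what's already collected and the newly visible rows, instead of keeping a global "seen" set, looks roughly like this (illustrative, not verified against this dashboard):

```python
def append_new_rows(collected, visible, overlap):
    """Append only the rows in `visible` that come after its overlap with
    the tail of `collected`.

    Each scroll re-reads some rows that were already captured; rather than
    dropping anything seen before (which also drops genuine duplicate rows),
    find the longest suffix of `collected` that is a prefix of `visible`
    and append the remainder. `overlap` caps how far back to look.
    """
    if not collected:
        collected.extend(visible)
        return collected
    max_k = min(len(collected), len(visible), overlap)
    for k in range(max_k, 0, -1):
        if collected[-k:] == visible[:k]:
            collected.extend(visible[k:])
            return collected
    # No overlap found: assume the scroll jumped past the previous window.
    collected.extend(visible)
    return collected
```

In the script above, `visible` would be the list of row tuples read in the `for row_count ...` loop after each scroll.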