
Whether you are searching for the best tacos in the city or trying to avoid a vacation haircut disaster, Google reviews have basically become the universal guidebook for public opinion. Millions of people rely on them every day to check everything from coffee quality to customer service. For anyone who knows how to extract that information, scraping these reviews can unlock powerful business insights.
What is Google review scraping?
Web scraping is the automated collection of information from websites, similar to copying and pasting on a large scale. When you scrape Google reviews, you gather details such as ratings, customer comments, and reviewer profiles directly from business listings. This information can then be analyzed to uncover important patterns and insights.
Google reviews show how customers feel in real time, making them extremely valuable for developers building sentiment analysis tools, monitoring brand reputation, or conducting market research. From uncovering recurring complaints to understanding what customers appreciate most, data can reveal what truly influences customer decisions.
Different ways to scrape Google reviews
There are multiple approaches to extracting Google review data, and each comes with its own strengths. Some methods are simple and structured, while others take more experimentation. Below are the four main approaches: the official Google Places API, manual collection, third-party scraping APIs, and automated scraping with Python.
Google Places API (official option)
The Google Places API provides the most stable and clean approach to working with review data. You can search for a business by name and address to get a place ID, and then use that ID to retrieve structured information such as the business name, overall rating, and a small set of user reviews in JSON format. This makes it ideal for dashboards, small applications, and any project that requires reliable, compliant data access.
The limitation is that the API returns up to 5 reviews per location, and your usage will be tracked and billed based on the number of requests you make. These reviews are also pre-sorted by Google, so you may only see a specific type of feedback rather than a balanced sample.
Use this option when you need accuracy and official data over depth or volume.
Manual collection
Manual scraping involves opening the Google Maps listing, navigating to the review section, and copying the required details yourself. You can do it completely by hand or with basic browser tools such as Chrome DevTools. It is slow and not suitable for large projects, but for one-time tasks, it works perfectly fine.
This approach is helpful when you only need data from a single business, or when testing an idea before building something more complex.
Scraping APIs
Scraping APIs offer an easier and more powerful alternative because they take care of the difficult parts of the process for you: HTML parsing, request routing, and CAPTCHA protection.
For example, Decodo provides a Web Scraping API with a Google Maps Scraper that collects place information, ratings, and other key details without needing to build a scraper from scratch.
Choose this method when you want reliability, speed, and scale without dealing with technical challenges. It is ideal for collecting large datasets across multiple businesses.
Automated scraping with Python
Using Python gives you full freedom to create a tailored scraper that meets your exact requirements. Libraries such as Selenium or Playwright allow your script to behave like a real user, interact with the page, scroll through reviews, load dynamic content, and avoid common blocking issues.
This method is ideal when you need to gather a significant amount of review data across many businesses or locations. It offers maximum flexibility and scalability, though it does take some setup effort. The advantage is that you can control every part of the scraping process.
In the rest of the guide, you will learn how to build a working scraper step by step, starting from a beginner-friendly setup to a complete automated solution.
Tools for scraping Google reviews with Python
Before you build your own review scraper, make sure you have the following items ready:
Python
Install the latest Python version on your system, as all scripts and examples will be written in this language.
Playwright
This automation framework is essential for running a headless browser, imitating real user actions, and loading content that appears only after interaction.
Beautiful Soup
A popular Python library used for pulling information from HTML and XML pages. It helps you navigate and extract the specific parts of the document you need.
Proxies
When collecting a large volume of Google review data, you can run into blocks or rate limits. A dependable proxy service lets you rotate IP addresses and avoid detection.
IDE
Use an integrated development environment such as Visual Studio Code. It makes writing scripts, running commands, and debugging much smoother.
A web browser
If you are reading this online, you already have one. In practice, something like Google Chrome is extremely helpful because its DevTools make it easy to inspect page elements and understand the structure of the content you want to scrape.
Setting up
Start by preparing a workspace for your project. Once Python is installed on your system, move through the steps below:
Create a project folder
Make a new directory in an easy-to-reach location. This will hold all the files for your scraper. You can also set up a virtual environment here if you want to keep dependencies organized.
Install the necessary libraries.
Open your terminal and run the installation command to add Beautiful Soup and Playwright to your setup.
pip install playwright beautifulsoup4
Install the required browsers.
Next, download the browser engines that Playwright depends on. These include Chromium, Firefox, and WebKit. Playwright uses these binaries for browser automation, and they are not bundled with the library’s initial installation.
python -m playwright install
Get your proxies ready.
You will need to integrate proxies into your script, so make sure you have your login details and endpoint information on hand. You can quickly find everything you need inside the Decodo dashboard.
Run a simple test script.
Create a small sample file to confirm that Playwright, Beautiful Soup, and your proxy setup are all functioning correctly. Use the sample script below to confirm that everything has been installed and configured correctly.
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

def test_proxy_with_playwright():
    proxy_config = {
        "server": "http://gate.decodo.com:7000",  # Proxy host and port
        "username": "user",  # Your proxy username
        "password": "pass"  # Your proxy password
    }

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,
            proxy=proxy_config
        )
        page = browser.new_page()
        try:
            # Visit Decodo IP info page
            page.goto("https://ip.decodo.com/")
            page.wait_for_selector(".item-value", timeout=10000)

            content = page.content()
            soup = BeautifulSoup(content, "html.parser")

            # Find all elements with class 'item-value'
            items = soup.find_all("p", class_="item-value")
            ip = items[0].text.strip()
            country = items[1].text.strip()
            print("IP Address:", ip)
            print("Country:", country)
        except Exception as e:
            print("Proxy test failed:", e)
        finally:
            browser.close()

if __name__ == "__main__":
    test_proxy_with_playwright()
Run the test script:
To execute your test file, open your terminal and run the following command, replacing the file name with whatever you used:
python file_name.py
Step-by-step guide to scraping Google reviews
After you finish preparing your environment and confirm that everything works as expected, you can move on to creating your Google reviews scraper.
Identify the target address.
The first obstacle you will face is that there is no single, simple public address where you can type a business name and instantly get reviews in a clean, easy-to-scrape layout. Google makes this process deliberately complicated to discourage automated extraction. Even so, you still have a couple of practical approaches you can rely on.
Search address trick
One of the primary places where reviews appear is Google Maps. The problem is that each business page has a long, complex address that you cannot guess in advance unless you already have a prepared list of links.
User browsing emulation
The approach described earlier might be the simplest, but it is not always reliable. In many cases, you will want something more flexible, for example, discovering all businesses in a given area and collecting reviews for each of them.
To do this, you need to move through Google Maps the same way a real visitor would. The good news is that you do not have to click around by hand. A tool like Playwright can simulate this behavior for you. You will create a script that navigates to the main Google Maps page, enters a search term into the search field, opens each business result, locates the reviews section, and extracts the information you need.
Go to the main page
Begin with the most basic task: visiting a specific address with Playwright. When you start a new browser session, Google often displays a message asking you to accept or reject cookies, which can block access to the content you want to view. Using proxies usually helps avoid this situation, but you should still be prepared to handle the prompt when it appears.
Below is a simple Playwright script that visits https://google.com/maps, checks whether the cookie dialog is visible, accepts it if needed, and then returns the raw HTML from the target address.
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

def test_proxy_with_playwright():
    proxy_config = {
        "server": "http://gate.decodo.com:7000",  # Proxy host and port
        "username": "user",  # Your proxy username
        "password": "pass"  # Your proxy password
    }

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,
            proxy=proxy_config
        )
        page = browser.new_page()
        try:
            # Navigate to Google Maps
            page.goto("https://www.google.com/maps", timeout=60000)

            # Accept cookies if prompted
            try:
                page.locator('div.VtwTSb > form:nth-of-type(2) span.UywwFc-vQzf8d').click(timeout=5000)
                page.wait_for_timeout(2000)
            except Exception:
                pass  # Cookie popup not shown

            # Wait for additional content to load
            page.wait_for_timeout(5000)

            # Get and return raw HTML
            html = page.content()
            print(html[:1000])  # Preview first 1000 characters
            return html
        except Exception as e:
            print("Failed to retrieve page:", e)
        finally:
            browser.close()

if __name__ == "__main__":
    test_proxy_with_playwright()
The script itself is quite straightforward. The real difficulty lies in identifying the correct "Accept all" button on the page. You need to locate the correct class name, then use a locator to target the specific button element for clicking. Chrome DevTools includes a useful feature called Recorder that can help here. You can record pressing the button, save the recording, export sample code, and use it to identify the correct selectors. The example script already uses a selector, but the page layout can change over time, so it is important to verify that the selector still works.
Get a list of locations.
The next step is to use the search bar to find the places whose reviews you want to collect. Before doing that, consider two key aspects: the location from which your requests are sent and how the search phrase will influence the results.
When we refer to location here, we are not talking about where you are physically, but about the region of the proxy that sends the requests. For instance, if your proxy is located in France and you search for “Starbucks,” Google Maps will show local Starbucks branches in that region, such as branches in Paris and other parts of the country.
The good news is that you can control your proxy region directly inside the Decodo dashboard. You can choose a country, city, state or even a specific postal code in the United States. Decodo then generates an endpoint tied to that region. This lets you retrieve results that are tailored to a particular country, city or neighborhood.
The wording of the search phrase also has a strong effect on what you see. Even when you pick a specific proxy region, if your query includes the name of another country or city, the results will reflect that. For example, if you use proxies located in the United Kingdom and search for “Starbucks Poland,” the results will come from Poland.
With these points in mind, you can perform your first search. Just like with the cookie prompt, you will need to locate the search field using selectors, click into it, and type your query. After that, press Enter to start the search. Finally, loop through the first five results and extract the business name and address from each. Next, you will see the code that performs this process.
from playwright.sync_api import sync_playwright

def test_proxy_with_playwright():
    proxy_config = {
        "server": "http://gate.decodo.com:7000",  # Proxy host and port
        "username": "user",  # Your proxy username
        "password": "pass"  # Your proxy password
    }

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,
            proxy=proxy_config
        )
        page = browser.new_page()
        try:
            page.goto("https://www.google.com/maps", timeout=10000)

            # Accept cookies
            try:
                page.locator('div.VtwTSb > form:nth-of-type(2) span.UywwFc-vQzf8d').click(timeout=5000)
                page.wait_for_timeout(2000)
            except Exception:
                pass

            # Search for query
            search_bar = page.locator('#searchboxinput')
            search_bar.click(timeout=5000)
            search_bar.fill("Starbucks London")
            search_bar.press("Enter")
            page.wait_for_timeout(7000)  # Wait for search results

            # Grab all result cards
            result_cards = page.locator('a.hfpxzc[aria-label]')
            for i in range(min(5, result_cards.count())):
                try:
                    result = result_cards.nth(i)  # Get the search result (business card)
                    name = result.get_attribute("aria-label")  # Extract the business name from the aria-label attribute

                    # Go up to the container holding all business info
                    parent = result.locator("xpath=ancestor::div[contains(@class, 'Nv2PK')]")

                    # Find all W4Efsd blocks (these contain all the needed info)
                    address_blocks = parent.locator("div.W4Efsd")

                    # Get the 3rd W4Efsd block, which typically contains the address
                    full_text = address_blocks.nth(2).inner_text(timeout=5000)

                    # Split the text by the "·" symbol and keep only the last part (the address)
                    parts = [part.strip() for part in full_text.split("·")]
                    address = parts[-1] if parts else full_text.strip()

                    # Print name and cleaned address
                    print(f"{i + 1}. {name} – {address}")
                except Exception as e:
                    print(f"{i + 1}. Failed to extract info: {e}")
        except Exception as e:
            print("Failed to retrieve page:", e)
        finally:
            browser.close()

if __name__ == "__main__":
    test_proxy_with_playwright()
The script begins by opening the Google Maps page and looking for a cookie window; it accepts it whenever it appears, so nothing blocks the view. Once the page settles, it locates the search box identified by the searchboxinput element and enters the phrase “Starbucks London” into it. After pressing Enter to launch the search, the script waits for the results to appear, then reads the first few entries. It prints each one on a single line, including the place name and its cleaned address.
Scrape the reviews
Now you can finally collect what you are after: the reviews themselves. If you have followed the tutorial so far, you already know the main ideas. You will repeat the same general steps as before: wait for the content to load, click the right elements, and locate the page elements that contain the data you want.
In this example, you will click the first result in the search results list, navigate to the Reviews section, and capture the overall rating along with the total number of reviews. After that, you will gather the first twenty individual reviews. The earlier part of the code that printed the first five locations is removed here.
You need one extra piece: scrolling. Only a small group of reviews is visible at the start, so you must scroll through the list to trigger the loading of more entries. The good news is that Playwright lets you scroll within a specific element, making it easy to load as many reviews as needed. The script also includes a few small adjustments to improve its reliability, which will be explained shortly.
from playwright.sync_api import sync_playwright
import re
from hashlib import sha256

def run_google_maps_review_scraper():
    proxy_settings = {
        "server": "http://gate.decodo.com:7000",
        "username": "user",
        "password": "pass"
    }

    with sync_playwright() as playwright_instance:
        # Open Chromium with proxy rules
        browser = playwright_instance.chromium.launch(
            headless=False,
            proxy=proxy_settings
        )

        # Use a fresh browser context with fixed size and language
        context = browser.new_context(
            viewport={"width": 1280, "height": 1280},
            locale="en-US",
            extra_http_headers={"Accept-Language": "en-US,en;q=0.9"}
        )
        page = context.new_page()

        query_text = "starbucks london"
        max_reviews = 20

        try:
            # Go to Google Maps
            page.goto("https://www.google.com/maps?hl=en")

            # Try to close possible consent or cookie banner
            try:
                page.locator('div.VtwTSb > form:nth-of-type(2) span.UywwFc-vQzf8d').click(timeout=5000)
                page.wait_for_timeout(2000)
            except Exception:
                pass

            # Fill search field and start lookup
            search_box = page.locator("#searchboxinput")
            search_box.click(timeout=5000)
            search_box.fill(query_text)
            search_box.press("Enter")
            page.wait_for_timeout(5000)

            # Open first place in the result list
            try:
                page.locator('a.hfpxzc[aria-label]').first.click()
                page.wait_for_timeout(5000)
            except Exception:
                pass

            # Read business name
            title = page.locator("h1.DUwDvf.lfPIob").inner_text(timeout=5000)

            # Open the reviews panel
            page.locator('button.hh2c6[aria-label*="Reviews for"]').click(timeout=5000)
            page.wait_for_timeout(5000)

            # Read overall score and total review amount
            star_score = page.locator("div.jANrlb div.fontDisplayLarge").inner_text(timeout=5000)
            reviews_text = page.locator("div.jANrlb div.fontBodySmall").inner_text(timeout=5000)
            total_reviews = int(re.sub(r"\D", "", reviews_text))

            print(f"Business Title: {title}")
            print(f"Star Rating: {star_score}")
            print(f"Total Reviews: {total_reviews}")
            print("=" * 32)

            # Do not request more reviews than there are
            if max_reviews > total_reviews:
                print(f"Requested {max_reviews} reviews, but only {total_reviews} available.")
                max_reviews = total_reviews

            collected_reviews = []
            seen_keys = set()

            # Scroll through the list until enough reviews are gathered
            while len(collected_reviews) < max_reviews:
                review_blocks = page.locator("div.jJc9Ad").all()
                found_new = False

                for block in review_blocks:
                    if len(collected_reviews) >= max_reviews:
                        break
                    try:
                        # Expand long text if button is present
                        more_btns = block.locator("button.w8nwRe.kyuRq")
                        if more_btns.count() > 0:
                            more_btns.nth(0).click(timeout=2000)
                            page.wait_for_timeout(300)

                        # Person who wrote the review
                        author_name = block.locator("div.d4r55").inner_text(timeout=5000)

                        # Read rating from main or alternate pattern
                        rating_node = block.locator("span.kvMYJc")
                        if rating_node.count() > 0:
                            rating_label = rating_node.first.get_attribute("aria-label")
                            rating_value = re.sub(r"\D", "", rating_label) if rating_label else "N/A"
                        else:
                            alt_node = block.locator("span.fzvQIb")
                            if alt_node.count() > 0:
                                rating_text = alt_node.first.inner_text()
                                match = re.search(r"(\d+(?:\.\d+)?)/5", rating_text)
                                rating_value = match.group(1) if match else "N/A"
                            else:
                                rating_value = "N/A"

                        # Main review text, or fallback message
                        text_node = block.locator("span.wiI7pd")
                        if text_node.count() > 0:
                            review_body = text_node.inner_text(timeout=5000)
                        else:
                            review_body = "No review comment"

                        # Build a unique tag to avoid duplicates
                        review_id_attr = block.get_attribute("data-review-id")
                        unique_key = review_id_attr or sha256(f"{author_name}-{review_body}".encode()).hexdigest()
                        if unique_key in seen_keys:
                            continue

                        collected_reviews.append((author_name, rating_value, review_body))
                        seen_keys.add(unique_key)
                        found_new = True
                    except Exception as extract_error:
                        print(f"Failed to extract review info: {extract_error}")

                if not found_new:
                    print("No new unique reviews found, finishing.")
                    break

                # Scroll inside the review container to reveal more entries
                try:
                    scroll_container = page.locator("div.m6QErb.DxyBCb.kA9KIf.dS8AEf.XiKgde").nth(2)
                    scroll_element = scroll_container.element_handle()
                    page.evaluate("(element) => element.scrollTop = element.scrollHeight", scroll_element)
                    page.wait_for_timeout(1500)
                except Exception:
                    print("Scrolling did not work, stopping.")
                    break

            # Show summary
            print(f"\nCollected {len(collected_reviews)} reviews for {title}:")
            for index, (author_name, rating_value, review_body) in enumerate(collected_reviews, 1):
                print(f"{index}. {author_name}, {rating_value}/5, '{review_body}'")
        except Exception as main_error:
            print("Error during script execution:", main_error)
        finally:
            browser.close()

if __name__ == "__main__":
    run_google_maps_review_scraper()
Here is a clear summary of the updates made in the script:
Imports
Two new modules were added: re and hashlib. The re module is used to cleanly extract numbers such as ratings, while hashlib.sha256 creates a unique signature for each review so repeated entries are not collected twice.
Browser context
The browser context now includes additional settings. A fixed viewport ensures the page layout stays consistent and allows reviews to load properly once a place is opened. You could use scrolling here instead, but this approach has proven more reliable. The locale and language headers were also added to make the browser request look real and to keep the interface consistent.
Page address
A language parameter (?hl=en) was added to the Google Maps link. This ensures the interface remains in English regardless of which proxy region you use. That is important because the placement and wording of the Reviews button vary in other languages.
Search and navigation
The search process has been expanded. After entering the query, the script now automatically clicks the first result, taking you directly to the page where the reviews are displayed.
Opening the review section
Once the business page loads, the script locates the Reviews button using its aria label, which is the most dependable way to target it. This is another reason the language was forced to English earlier.
Review summary
Before collecting individual comments, the script pulls essential information such as the overall rating and the total number of reviews. These values are helpful for creating summaries or dashboards.
Review extraction loop
This is the core of the scraper. The script steps through each review block, opens long comments by pressing the More button, and collects the reviewer’s name, score, and full text.
Duplicate control
Sometimes reviews repeat while scrolling. To avoid duplicating entries, the script checks for a built-in review ID. If none exists, it generates a SHA-256 signature of the author and the text to ensure the data remains unique.
Scrolling logic
Since reviews only load when the list is scrolled, the script scrolls inside the container to reveal new entries until it reaches the required number or no more fresh content appears.
Output formatting
Instead of printing only basic business info, the script now lists each review with the reviewer’s name, rating, and comment, giving a complete picture.
Error control
Additional try-except blocks keep the scraper stable. If a single review fails to load, it does not halt the entire process, which is essential when collecting large volumes of data.
Closing the browser
All cleanup happens inside a finally block. This ensures the browser always shuts down properly, even if something unexpected happens; it’s good practice for any automation script.
Storing and analyzing the data
To complete your scraper and save everything in a CSV file, you can include a small export step before the final printout. This allows you to work with the data later in spreadsheets, dashboards, or other analysis tools.
- Start by importing the CSV library at the top of your script:
import csv
- Insert the following block right before the summary printout line, print(f"\nCollected {len(collected_reviews)} reviews for {title}:")
This section will create a CSV file and write every review into it:
# Save results in a CSV file
with open("google_reviews.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)

    # Summary row with business name, rating and total review count
    writer.writerow([title, star_score, total_reviews])

    # Header row for individual review entries
    writer.writerow(["Reviewer name", "Rating", "Review text"])

    # Record each review from the list
    for author_name, rating_value, review_body in collected_reviews:
        writer.writerow([author_name, rating_value, review_body])
The snippet opens a new CSV file for export, writes an initial row with the business name, average rating, and total number of reviews, adds a header line for the columns, and then stores every collected review in a clear, table-like layout.
For deeper analysis, you can use tools such as the pandas library to compute statistics or spot patterns. You can also use modern AI services that let you upload the CSV and generate an intelligent summary of what people are saying in their comments.
Troubleshooting common issues
The script already includes several try-except blocks to prevent a single problem from stopping the entire run. Still, Google Maps can behave in unpredictable ways, so keep these points in mind:
Proxy stability
Choose strong, rotating residential proxy networks from trusted providers to minimize dropped connections and avoid blocks.
Loading times
If pages feel slow, add smarter waiting logic or increase timeouts so the content has enough time to appear, especially during busy periods.
Changing locators
Focus on reliable attributes, such as aria-label, visible button text, or heading elements, rather than class names, which tend to change frequently.
Scrolling issues
Make sure you target the correct scrollable container and set its scrollTop property to its scrollHeight to trigger loading of additional reviews.
Layout differences
Test your script against different types of businesses and search phrases so you can cope with cases where the reviews section is missing or the results are presented in another layout.
Conclusion
In this walkthrough, you learned how to collect Google Maps reviews with Playwright while using proxies, controlled scrolling, and carefully chosen selectors. You also saw how to handle shifting layouts, unstable locators, and proxy-related problems with practical safeguards that keep the scraper running smoothly. Whatever your end use for Google review data, this approach provides a reliable stream of information without unnecessary frustration.