pcdt-scraper Documentation
Table of Contents
- Introduction
- Requirements
- Installation
- Getting Started
- Core Features
- API Reference
- Examples
- Troubleshooting
Introduction
pcdt-scraper is a Python web scraping library that combines the power of PyChromeDevTools with a Selenium-like syntax. It’s designed to handle websites that block traditional HTTP requests but allow Chrome browser requests. The library provides an intuitive interface for web scraping tasks while utilizing Chrome’s DevTools Protocol.
Requirements
- Python 3.6 or higher
- Chrome or Chromium browser
- bs4 (BeautifulSoup4)
- PyChromeDevTools
Installation
Install using pip:
pip install pcdt-scraper
Or using pip3:
pip3 install pcdt-scraper
Getting Started
1. Start Chrome/Chromium in Debug Mode
First, you need to run Chrome or Chromium with remote debugging enabled:
# Regular mode
chromium --remote-debugging-port=9222 --remote-allow-origins=*
# Or headless mode
chromium --remote-debugging-port=9222 --remote-allow-origins=* --headless
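The same flags work with Google Chrome (the binary is typically google-chrome on Linux). Before scraping, you can confirm the debugging endpoint is actually listening by querying Chrome's standard /json/version DevTools endpoint; this check is independent of pcdt-scraper:
import json
from urllib.request import urlopen

# Hit Chrome's /json/version endpoint; this raises URLError if no
# browser is listening on port 9222.
with urlopen("http://localhost:9222/json/version") as response:
    info = json.load(response)

print(info["Browser"])  # e.g. "Chrome/120.0.0.0"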
2. Basic Usage
from pcdt_scraper import WebScraper
# Initialize the scraper
scraper = WebScraper()
try:
    # Navigate to a webpage
    scraper.get("https://www.example.com")
    
    # Find an element and extract its text
    element = scraper.find_element_by_class_name("my-class")
    text = element.text()
    print(text)
    
finally:
    # Always close the scraper
    scraper.close()
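Because WebScraper exposes a close() method, the standard-library contextlib.closing helper can handle the cleanup for you instead of an explicit try-finally. This sketch assumes nothing beyond the close() method documented above:
from contextlib import closing
from pcdt_scraper import WebScraper

# closing() calls scraper.close() on exit, even if an exception occurs.
with closing(WebScraper()) as scraper:
    scraper.get("https://www.example.com")
    print(scraper.find_element_by_tag_name("h1").text())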
Core Features
WebScraper Class
The main class that handles all scraping operations. It provides:
- Selenium-like syntax for easy transition
- Chrome DevTools integration
- BeautifulSoup parsing capabilities
ElementWrapper Class
Wraps web elements with convenient methods:
- text(): Get element’s text content
- get_attribute(attribute): Get specific attribute value
- is_displayed(): Check whether the element exists
- get_html(): Get element’s HTML content
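Taken together, these methods cover most extraction needs. A short sketch (the "main-link" id is illustrative, not part of any real page):
from pcdt_scraper import WebScraper

scraper = WebScraper()
try:
    scraper.get("https://www.example.com")
    element = scraper.find_element_by_id("main-link")  # hypothetical id
    if element.is_displayed():                         # element was found
        print(element.text())                          # text content
        print(element.get_attribute("href"))           # attribute value
        print(element.get_html())                      # element's HTML
finally:
    scraper.close()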
Elements Class
Collection class for handling multiple elements:
- Iterable interface
- Length checking
- Index-based access
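Based on the interface described above, a returned collection can be treated much like a Python list; a sketch assuming exactly the three behaviors listed:
from pcdt_scraper import WebScraper

scraper = WebScraper()
try:
    scraper.get("https://www.example.com")
    items = scraper.find_elements_by_tag_name("li")
    print(len(items))           # length checking
    if len(items) > 0:
        print(items[0].text())  # index-based access
    for item in items:          # iterable interface
        print(item.text())
finally:
    scraper.close()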
API Reference
WebScraper Methods
Navigation
scraper.get(url, timeout=60)  # Navigate to a webpage
scraper.close()  # Close the browser
scraper.quit()   # Alias for close()
Element Finding Methods
# Single element finders
scraper.find_element_by_id(id_)
scraper.find_element_by_class_name(class_name)
scraper.find_element_by_tag_name(tag_name)
scraper.find_element_by_name(name)
scraper.find_element_by_css_selector(css_selector)
scraper.find_element_by_xpath(xpath)  # Limited support
# Multiple elements finders
scraper.find_elements_by_class_name(class_name)
scraper.find_elements_by_tag_name(tag_name)
scraper.find_elements_by_name(name)
scraper.find_elements_by_css_selector(css_selector)
Page Content
scraper.get_page_source()      # Get page source (alias)
scraper.get_page_content()     # Get parsed page content
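When you need selectors beyond the built-in finders, the raw source can be handed to BeautifulSoup directly (bs4 is already a dependency). This sketch assumes get_page_source() returns the page HTML as a string:
from bs4 import BeautifulSoup
from pcdt_scraper import WebScraper

scraper = WebScraper()
try:
    scraper.get("https://www.example.com")
    html = scraper.get_page_source()
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title.string if soup.title else "no <title> found")
finally:
    scraper.close()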
Examples
1. Basic Scraping
from pcdt_scraper import WebScraper
scraper = WebScraper()
try:
    scraper.get("https://www.example.com")
    title = scraper.find_element_by_tag_name("h1").text()
    print(f"Page title: {title}")
finally:
    scraper.close()
2. Working with Multiple Elements
from pcdt_scraper import WebScraper
scraper = WebScraper()
try:
    scraper.get("https://www.example.com")
    links = scraper.find_elements_by_tag_name("a")
    
    for link in links:
        href = link.get_attribute("href")
        text = link.text()
        print(f"Link: {text} -> {href}")
finally:
    scraper.close()
3. Using CSS Selectors
from pcdt_scraper import WebScraper
scraper = WebScraper()
try:
    scraper.get("https://www.example.com")
    elements = scraper.find_elements_by_css_selector(".content article")
    
    for element in elements:
        title = element.find_element_by_class_name("title").text()
        print(f"Article title: {title}")
finally:
    scraper.close()
Troubleshooting
Common Issues
- ConnectionError
    - Error: “Got ConnectionError, it seems your chrome remote instance is not running”
    - Solution: Ensure Chrome/Chromium is running with remote debugging enabled
- Page Load Timeout
    - Error: “Page load timed out after X seconds”
    - Solution: Increase the timeout parameter in the get() method
- Element Not Found
    - Solution:
        - Check if the element exists in the page source
        - Try different selector methods
        - Ensure the page has fully loaded
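For intermittent "Element Not Found" failures on slow pages, a small retry loop often helps. This sketch works whether the finder raises on a miss or returns nothing; both behaviors are assumptions, so check how your version actually signals a missing element:
import time

def find_with_retry(scraper, selector, attempts=5, delay=1.0):
    # Poll for the element, sleeping between attempts while the page loads.
    for _ in range(attempts):
        try:
            element = scraper.find_element_by_css_selector(selector)
            if element is not None and element.is_displayed():
                return element
        except Exception:
            pass  # treat a failed lookup as "not there yet"
        time.sleep(delay)
    return None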
Best Practices
- Always use try-finally blocks to ensure proper cleanup
- Close the scraper after use
- Handle potential exceptions appropriately
- Use appropriate timeouts for your use case
- Choose the most specific selector method available
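Putting these practices together in one sketch (pcdt-scraper's exception types are not documented here, so the broad except is an assumption):
from pcdt_scraper import WebScraper

scraper = WebScraper()
try:
    # Generous timeout for a slow page, and a specific CSS selector
    scraper.get("https://www.example.com", timeout=120)
    headline = scraper.find_element_by_css_selector("article h1")
    print(headline.text())
except Exception as exc:
    # Catch broadly since the library's exception types aren't documented
    print(f"Scrape failed: {exc}")
finally:
    scraper.close()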
This documentation provides a comprehensive guide to using pcdt-scraper. For more information or to contribute to the project, visit the GitHub repository.