python - How to load additional episodes for a series via IMDB pagination-container? - Stack Overflow


I am writing this code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/title/tt5189554/episodes/'
headers = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
}
response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

I want to get data such as the episode ID and name for all 80 episodes, but when I run this code it only gives me 50 episodes; the others are hidden behind the '30 more' pagination button.

I tried several things, such as inspecting the page's HTML and finding this class:

<div class="sc-f09bd1f5-1 hoKmdt pagination-container">
        <span class="ipc-see-more sc-33e570c-0 cMGrFN single-page-see-more-button">
            <button class="ipc-btn ipc-btn--single-padding ipc-btn--center-align-content ipc-btn--default-height ipc-btn--core-base ipc-btn--theme-base ipc-btn--button-radius ipc-btn--on-accent2 ipc-text-button ipc-see-more__button" tabindex="151" aria-disabled="false">
                <span class="ipc-btn__text">
                    <span class="ipc-see-more__text">
                        30 more
                    </span>
                </span>
                <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" class="ipc-icon ipc-icon--expand-more ipc-btn__icon ipc-btn__icon--post" viewBox="0 0 24 24" fill="currentColor" role="presentation">
                    <path opacity=".87" fill="none" d="M24 24H0V0h24v24z"></path>
                    <path d="M15.88 9.29L12 13.17 8.12 9.29a.996.996 0 1 0-1.41 1.41l4.59 4.59c.39.39 1.02.39 1.41 0l4.59-4.59a.996.996 0 0 0 0-1.41c-.39-.38-1.03-.39-1.42 0z"></path>
                </svg>
            </button>
        </span>
    </div>

but I couldn't find a way to get the data for all episodes.

asked Jan 5 at 1:01 by Kasper Jcob, edited Jan 8 at 9:09 by HedgeHog

1 Answer

Based on the comment that mentioned Selenium, here is a basic example that loads the additional episodes by clicking the specific element. It may also be worth taking a closer look at the IMDb API, or at performing additional requests to load the remaining content:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
# open the episode list for season 1
driver.get('https://www.imdb.com/title/tt5189554/episodes/?season=1')

# wait for the "more episodes" button to be present, then click it via JavaScript
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, '.ipc-see-more__button'))
)
driver.execute_script("arguments[0].click();", element)

# give the page some time to load the additional episodes
time.sleep(2)

# convert the driver's page source into a bs4 object (explicit parser avoids a warning)
soup = BeautifulSoup(driver.page_source, 'html.parser')

data = []

# iterate the episodes and select specific content
for e in soup.select('article.episode-item-wrapper'):
    data.append({
        'title': e.h4.get_text(),
        'link': 'https://www.imdb.com'+e.a.get('href')
    })

print(data)

Output

[{'title': 'S1. E1 ∙ Un sueño, sobre ruedas',
  'link': 'https://www.imdb.com/title/tt5378740/?ref_=ttep_ep1'},
 {'title': 'S1. E2 ∙ Una nueva historia, sobre ruedas',
  'link': 'https://www.imdb.com/title/tt5585752/?ref_=ttep_ep2'},
 {'title': 'S1. E3 ∙ Nuevas aventuras, sobre ruedas',
  'link': 'https://www.imdb.com/title/tt5585754/?ref_=ttep_ep3'},
...
 {'title': 'S1. E79 ∙ La final de la InterContinental, sobre ruedas (Parte 1)',
  'link': 'https://www.imdb.com/title/tt6139312/?ref_=ttep_ep79'},
 {'title': 'S1. E80 ∙ La final de la InterContinental, sobre ruedas (Parte 2)',
  'link': 'https://www.imdb.com/title/tt6139318/?ref_=ttep_ep80'}]
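
Since the question asks for the episode ID and name rather than the full link and prefixed title, the records above could be post-processed along these lines. This is only a sketch: `parse_episode`, `EPISODE_ID_RE` and `TITLE_RE` are hypothetical names, and it assumes that links and titles keep the shape shown in the output (`/title/tt<digits>/` and `S1. E1 ∙ <name>`), which IMDb may change at any time.

```python
import re

# Assumption: episode links contain "/title/tt<digits>" and titles look like
# "S1. E1 ∙ <name>", as in the output shown above.
EPISODE_ID_RE = re.compile(r'/title/(tt\d+)')
TITLE_RE = re.compile(r'^S(\d+)\.\s*E(\d+)\s*∙\s*(.+)$')


def parse_episode(record):
    """Turn one {'title', 'link'} record into id / season / episode / name."""
    id_match = EPISODE_ID_RE.search(record['link'])
    title = record['title'].strip()
    title_match = TITLE_RE.match(title)
    return {
        'id': id_match.group(1) if id_match else None,
        'season': int(title_match.group(1)) if title_match else None,
        'episode': int(title_match.group(2)) if title_match else None,
        # fall back to the raw title if the "S1. E1 ∙" prefix is missing
        'name': title_match.group(3) if title_match else title,
    }


# two records copied from the output above
sample = [
    {'title': 'S1. E1 ∙ Un sueño, sobre ruedas',
     'link': 'https://www.imdb.com/title/tt5378740/?ref_=ttep_ep1'},
    {'title': 'S1. E80 ∙ La final de la InterContinental, sobre ruedas (Parte 2)',
     'link': 'https://www.imdb.com/title/tt6139318/?ref_=ttep_ep80'},
]
episodes = [parse_episode(r) for r in sample]
print(episodes)
```

With the scraper above, `[parse_episode(r) for r in data]` would apply the same step to every scraped episode.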
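
The example clicks the button once, which is enough for this series (50 episodes plus '30 more'). For a longer season the button might have to be clicked several times. A possible sketch of that loop follows; `click_until_gone` is a hypothetical helper, and it assumes the button is removed from the page once everything is loaded, which is not confirmed here. It only relies on the driver's `find_elements` and `execute_script` methods.

```python
import time

# Selenium's By.CSS_SELECTOR is the plain string "css selector"; it is spelled
# out here so the helper does not need a Selenium import of its own.
CSS = "css selector"


def click_until_gone(driver, selector=".ipc-see-more__button",
                     max_clicks=20, pause=2.0):
    """Click the first element matching `selector` until none is left.

    Returns the number of clicks performed. `max_clicks` is a safety cap
    in case the button never disappears (assumed behaviour, not verified).
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(CSS, selector)
        if not buttons:
            break
        # same JavaScript click as in the answer's example
        driver.execute_script("arguments[0].click();", buttons[0])
        clicks += 1
        time.sleep(pause)  # crude wait, same idea as time.sleep(2) above
    return clicks
```

It would be called as `click_until_gone(driver)` right after `driver.get(...)`, in place of the single click, before handing `driver.page_source` to BeautifulSoup.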