python - How to extract the value of the link tel from the internal web page using Beautifulsoup? - Stack Overflow

admin2025-04-25  2

I'm trying to write a script that collects customers' phone numbers and adds them to a dataframe. I have a dataset with customer IDs, shown below. The phone numbers themselves are stored on a web page, in the form of a tel: link.

Date                    ID     Comment
20240514 May, 14 22:00  R_111  Le client ne répond pas

I think I can take the list of customer IDs from the dataframe and use a library to look up the phone number for each ID. Example: example.com/client/?id=111

The order (ID) page looks like this:

%%html
<!doctype html>
<html>
    <head>
        <title>id 111</title>
    </head>
    <body>
    <div>
            <div id="contactButton" class="bg-primary-subtle py-2 px-3 rounded-3 text-primary fw-medium" style="cursor: pointer">
                Contact
            </div>
            <div class="d-flex flex-column position-relative mt-2 d-none" id="contactBlock">
                <div id="phone" class="position-absolute end-0 text-nowrap">
                    <a href="tel:+77777777777" class="btn btn-lg btn-outline-primary fw-medium">
                    </a>
                    <button class="btn btn-lg btn-outline-secondary fw-medium" data-bs-toggle="modal" data-bs-target="#exampleModal">
                     
                    </button>
                </div>
            </div>
        </div>
</body>
</html>

I want to get a dataframe like this:

ID Phone
R_111 777777777

I wrote the following code:

import requests
from bs4 import BeautifulSoup

def get_client_phone(client_id):
    # Build the client page URL
    _url = f"https://example.com/client/?id={client_id}"

    # NOTE: cfg.payload and headers are not defined in this snippet
    response = requests.get(_url, data=cfg.payload, headers=headers)

    # Stop early on a non-200 response
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        return None

    # Parse page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the block that holds the phone link
    phone_element = soup.find(id='phone')

    if phone_element:
        # Extract the number from the tel: link
        phone_link = phone_element.find('a', href=True)
        if phone_link:
            return phone_link['href'].replace('tel:', '')  # Remove 'tel:'

    # Reached when either the block or the link is missing
    print("The phone was not found")
    return None


client_id = 'R_111' 
phone_number = get_client_phone(client_id)

if phone_number:
    print(f"Phone {client_id}: {phone_number}")
else:
    print("Error")
asked Jan 16 at 12:00 by mikhailtugushev, edited Jan 16 at 12:23 by HedgeHog
  • Based on the HTML provided, your code should produce the expected result, so the requests to the (unfortunately unknown) source need to actually return that HTML. Tip: "something does not work" is very general. You will usually get an error message; include it, since that leads to much more focused answers. As a first step, print the response status and the soup to make sure they contain the expected content. – HedgeHog, Jan 24 at 9:52
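
To follow the suggestion above without touching the network, the parsing step can be checked on its own against the HTML from the question. This is only a minimal sketch: `extract_phone` and `SAMPLE_HTML` are hypothetical names introduced here, and the sample is a trimmed copy of the page shown in the question, not a real server response.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample page from the question (not fetched from any server)
SAMPLE_HTML = """
<div class="d-flex flex-column position-relative mt-2 d-none" id="contactBlock">
    <div id="phone" class="position-absolute end-0 text-nowrap">
        <a href="tel:+77777777777" class="btn btn-lg btn-outline-primary fw-medium"></a>
    </div>
</div>
"""


def extract_phone(html):
    """Return the number from the first tel: link inside #phone, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: an <a> under id="phone" whose href starts with "tel:"
    link = soup.select_one('#phone a[href^="tel:"]')
    if link is None:
        return None
    return link["href"][len("tel:"):]


print(extract_phone(SAMPLE_HTML))              # +77777777777
print(extract_phone("<p>login required</p>"))  # None
```

If this works on the sample but the live request returns None, the server is probably sending different HTML (a login page, for instance), which printing response.text would confirm.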

1 Answer

It seems the scraping function itself works and the focus is on the dataframe. Extract the IDs from your dataframe as a series, iterate over them, and collect the results in a list of dictionaries, which you can then easily turn back into a dataframe.

import pandas as pd

# list or series of your ids
client_id_series = ['R_111','R_222']

pd.DataFrame(
    [
        {'ID':client_id,'Phone':get_client_phone(client_id)} 
        for client_id 
        in client_id_series
    ]
)
ID Phone
R_111 +77777777777
R_222 +88888888888
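
One thing worth checking before looping over the IDs: the example URL in the question uses id=111, while the dataframe stores R_111. If the site really expects the bare number (an assumption, since the real endpoint is not shown), the prefix has to be dropped before the request is made. A minimal sketch, where `build_client_url` and its `prefix` parameter are hypothetical names:

```python
from urllib.parse import urlencode

BASE_URL = "https://example.com/client/"  # URL from the question


def build_client_url(client_id, prefix="R_"):
    """Build the client page URL, dropping an assumed ID prefix."""
    # Assumption: the site wants "111", not "R_111"
    bare_id = client_id[len(prefix):] if client_id.startswith(prefix) else client_id
    return f"{BASE_URL}?{urlencode({'id': bare_id})}"


print(build_client_url("R_111"))  # https://example.com/client/?id=111
print(build_client_url("222"))    # https://example.com/client/?id=222
```

If the site accepts the full R_111 form, this step is unnecessary and the f-string from the question can stay as it is.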

Or simply apply the function to your existing dataframe directly and add only a column with the resulting phone numbers:

data = {
    'Date': ['20240514 May, 14 22:00', '20240514 May, 14 23:00'],
    'ID': ['R_111', 'R_222'],
    'Comment': ['Le client ne répond pas', None]
}

df = pd.DataFrame(data)

df['Phone'] = df['ID'].apply(get_client_phone)

print(df)
   Date                    ID     Comment                  Phone
0  20240514 May, 14 22:00  R_111  Le client ne répond pas  +77777777777
1  20240514 May, 14 23:00  R_222                           +88888888888
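
The target table in the question shows the number without the leading +, while the tel: link carries one. If only the digits are wanted, the value can be cleaned up after extraction. A small sketch using the standard library; `normalize_phone` is a hypothetical helper, and keeping digits only is an assumption about the desired format:

```python
import re


def normalize_phone(href):
    """Strip the tel: scheme and every non-digit character from a link target."""
    number = href[len("tel:"):] if href.startswith("tel:") else href
    return re.sub(r"\D", "", number)


print(normalize_phone("tel:+77777777777"))        # 77777777777
print(normalize_phone("tel:+7 (777) 777-77-77"))  # 77777777777
```

On the dataframe this could be applied with something like df['Phone'].map(normalize_phone, na_action='ignore'), so that rows where no phone was found are left untouched.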