I'm trying to write a script that collects customers' phone numbers and adds them to a dataframe. I have a dataset with customer IDs like the one below. The phone numbers themselves are stored on each customer's web page, in the form of a link.
Date | ID | Comment |
---|---|---|
20240514 May, 14 22:00 | R_111 | Le client ne répond pas |
I think I can take the list of customer IDs from the dataframe and use a library to fetch the phone number for each ID, for example from example.com/client/?id=111.
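Note that the IDs in the dataset look like R_111 while the example URL uses id=111, so the prefix may need to be dropped before the URL is built. A minimal sketch, assuming the site expects only the part after the underscore and that the base URL is the example.com one from the question (`build_client_url` is a hypothetical helper name, not part of the original code):

```python
from urllib.parse import urlencode

BASE_URL = "https://example.com/client/"  # example URL from the question


def build_client_url(client_id: str) -> str:
    """Build the client page URL from a dataframe ID such as 'R_111'."""
    # Assumption: the site expects only the part after the last underscore.
    numeric_id = client_id.rsplit("_", 1)[-1]
    return f"{BASE_URL}?{urlencode({'id': numeric_id})}"


print(build_client_url("R_111"))  # https://example.com/client/?id=111
```

If the site actually accepts the full R_111 form, the split can simply be removed.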
The order (ID) page looks like this:
%%html
<!doctype html>
<html>
<head>
<title>id 111</title>
</head>
<body>
<div>
<div id="contactButton" class="bg-primary-subtle py-2 px-3 rounded-3 text-primary fw-medium" style="cursor: pointer">
Contact
</div>
<div class="d-flex flex-column position-relative mt-2 d-none" id="contactBlock">
<div id="phone" class="position-absolute end-0 text-nowrap">
<a href="tel:+77777777777" class="btn btn-lg btn-outline-primary fw-medium">
<button class="btn btn-lg btn-outline-secondary fw-medium" data-bs-toggle="modal" data-bs-target="#exampleModal">
</button>
</div>
</div>
</div>
</body>
</html>
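In this markup the number sits in a tel: link inside a block that is only hidden with the d-none class, so it may be readable without clicking the Contact button, provided the server really sends this markup rather than injecting it with JavaScript. A minimal offline check on a fragment of the page above, using the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages (`TelLinkParser` is a hypothetical name):

```python
from html.parser import HTMLParser


class TelLinkParser(HTMLParser):
    """Collect the href of every <a href="tel:..."> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.tel_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        href = dict(attrs).get("href") or ""
        if href.startswith("tel:"):
            self.tel_links.append(href)


# Fragment copied from the sample page in the question.
sample_html = """
<div id="phone" class="position-absolute end-0 text-nowrap">
<a href="tel:+77777777777" class="btn btn-lg btn-outline-primary fw-medium">
</div>
"""

parser = TelLinkParser()
parser.feed(sample_html)
print(parser.tel_links)  # ['tel:+77777777777']
```

If the list comes back empty for a real response, the phone block is probably rendered client-side and a plain HTTP request will not be enough.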
I want to get a dataframe like this:
ID | Phone |
---|---|
R_111 | 777777777 |
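The target table shows the number without the tel: scheme or the leading +. A minimal sketch of such a cleanup, assuming that "digits only" is the desired format (the question does not say whether any leading digits should also be dropped, so this keeps them all; `normalize_phone` is a hypothetical helper name):

```python
import re


def normalize_phone(href: str) -> str:
    """Turn a tel: href such as 'tel:+77777777777' into a digits-only string."""
    # Drop the URI scheme if present, then every character that is not a digit.
    number = href[4:] if href.startswith("tel:") else href
    return re.sub(r"\D", "", number)


print(normalize_phone("tel:+77777777777"))       # 77777777777
print(normalize_phone("tel:+7 (777) 777-77-77"))  # 77777777777
```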
I wrote the following code:
import requests
from bs4 import BeautifulSoup

# Note: cfg.payload and headers are defined elsewhere and are not shown here.

def get_client_phone(client_id):
    # Build the client page URL
    _url = f"https://example.com/client/?id={client_id}"
    response = requests.get(_url, data=cfg.payload, headers=headers)
    # Check the HTTP status
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        return None
    # Parse the page
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the phone block
    phone_element = soup.find(id='phone')
    if phone_element:
        # Extract the phone link
        phone_link = phone_element.find('a', href=True)
        if phone_link:
            phone_number = phone_link['href'].replace('tel:', '')  # Remove 'tel:'
            return phone_number
    # Reached when either the block or the link inside it is missing
    print("The phone was not found")
    return None

client_id = 'R_111'
phone_number = get_client_phone(client_id)
if phone_number:
    print(f"Phone {client_id}: {phone_number}")
else:
    print("Error")
It seems that the scraping itself works and the question is really about the dataframe. Extract the IDs from your dataframe as a series, iterate over them, and record the results as a list of dictionaries, which you can then easily turn back into a dataframe.
import pandas as pd

# list or series of your ids
client_id_series = ['R_111', 'R_222']
pd.DataFrame(
    [
        {'ID': client_id, 'Phone': get_client_phone(client_id)}
        for client_id in client_id_series
    ]
)
ID | Phone |
---|---|
R_111 | +77777777777 |
R_222 | +88888888888 |
Or simply iterate over your existing dataframe directly and add only a column with the resulting phone numbers:
data = {
    'Date': ['20240514 May, 14 22:00', '20240514 May, 14 23:00'],
    'ID': ['R_111', 'R_222'],
    'Comment': ['Le client ne répond pas', None]
}
df = pd.DataFrame(data)
df['Phone'] = df['ID'].apply(get_client_phone)
print(df)
 | Date | ID | Comment | Phone |
---|---|---|---|---|
0 | 20240514 May, 14 22:00 | R_111 | Le client ne répond pas | +77777777777 |
1 | 20240514 May, 14 23:00 | R_222 | None | +88888888888 |
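Both variants make one HTTP request per row, so an ID that appears several times is fetched several times. One possible way to avoid that is to wrap the lookup in functools.lru_cache. The sketch below uses a stand-in for get_client_phone with made-up responses so that it runs without network access; the `calls` list and `cached_get_client_phone` are hypothetical names used only for illustration:

```python
from functools import lru_cache

calls = []  # records every simulated request, for demonstration only


def get_client_phone(client_id):
    """Stand-in for the real scraper: pretends to fetch a client page."""
    calls.append(client_id)
    fake_pages = {"R_111": "+77777777777", "R_222": "+88888888888"}
    return fake_pages.get(client_id)


# Wrap the lookup so that each distinct ID is fetched only once.
cached_get_client_phone = lru_cache(maxsize=None)(get_client_phone)

ids = ["R_111", "R_222", "R_111"]
phones = [cached_get_client_phone(i) for i in ids]
print(phones)      # ['+77777777777', '+88888888888', '+77777777777']
print(len(calls))  # 2
```

With a dataframe the cached function can be passed to apply in the same way, for example `df['Phone'] = df['ID'].apply(cached_get_client_phone)`. Keep in mind that lru_cache also remembers None results, so a temporarily failed request would not be retried for that ID.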
Inspect soup to make sure that it delivers the expected content. – HedgeHog Commented Jan 24 at 9:52