python - How to filter a pandas object type column using an array for indexing - Stack Overflow


I'm trying to extract the title from the names in the famous "Titanic" dataset, where the format is like this:

Name column (screenshot): https://i.sstatic.net/HlkH8zHO.png

I'm trying to avoid an iterative solution, so I've tried something like this:

df['Title'] = df['Name'].str[df['Name'].str.find(' ') + 1 : df['Name'].str.find('.')]

This doesn't work, since I'm using Series as the slice bounds instead of single values. What would be the correct way to do this?

This works, but seems too complex:

space_pos = data.Name.str.find(" ")
dot_pos = data.Name.str.find(".")
data["Title"] = [data.Name[i][space_pos[i] + 1:dot_pos[i]] for i in range(len(data.Name))]
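For reference, here is a minimal sketch of how this find-based slicing behaves on a couple of rows. The sample names are made up for illustration (they only follow the "LastName, Title. FirstName" pattern), and `zip` is used in place of positional indexing so the snippet doesn't depend on the DataFrame's index:

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real dataset
data = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris",
                              "Vander Planke, Mrs. Julius"]})

space_pos = data["Name"].str.find(" ")  # position of the first space in each name
dot_pos = data["Name"].str.find(".")    # position of the first dot in each name

# Slice each name between its own first space and first dot
titles = [name[s + 1:d] for name, s, d in zip(data["Name"], space_pos, dot_pos)]
print(titles)  # ['Mr', 'Planke, Mrs']
```

Note that the second row comes out wrong: when the surname itself contains a space, the first space is no longer the one right before the title.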


asked Jan 11 at 15:33 by Martin Beraldo
  • The code `space_pos=data.Name.str.find(" ")`, `dot_pos=data.Name.str.find(".")`, `data["Title"]=[data.Name[i][space_pos[i]+1:dot_pos[i]] for i in range(len(data.Name))]` does not select titles like Prof. and others. So why do you think it works? – Subir Chowdhury, commented Jan 11 at 16:15

2 Answers


You can use regular expressions to pull out text without looping through each row. Here's a way to do it using str.extract in pandas:

import pandas as pd

# Assuming df is your DataFrame and 'Name' is the column with the names
# Raw string so that "\." is a regex escape; expand=False returns a Series
df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
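As a quick check, here is a small self-contained sketch of the `str.extract` approach. The three sample names are assumptions for illustration, written in the "LastName, Title. FirstName" pattern; the pattern captures the first run of letters that follows a space and ends with a dot:

```python
import pandas as pd

# Hypothetical sample rows in the "LastName, Title. FirstName" format
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ]
})

# One capture group -> with expand=False the result is a Series of titles
df["Title"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)
print(df["Title"].tolist())  # ['Mr', 'Mrs', 'Miss']
```

Rows where the pattern doesn't match get NaN, so it may be worth checking `df["Title"].isna().sum()` on the real data.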

According to the image, the names seem to follow the pattern "LastName, Title. FirstName Other", with no comma or dot in Other. So you can split the name on the comma and take the second element (index 1), then split that on the dot and take the first element (index 0), and strip the surrounding whitespace:

data['Title'] = data['Name'].map(lambda s: s.split(",")[1].split(".")[0].strip())
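If you'd rather stay within the `.str` accessor than use `map` with a lambda, the same split logic can be sketched as below. This is only an illustration under the same format assumption, and the sample names are made up:

```python
import pandas as pd

# Hypothetical sample rows, including a surname that contains a space
data = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Vander Planke, Mrs. Julius",
        "Heikkinen, Miss. Laina",
    ]
})

data["Title"] = (
    data["Name"]
    .str.split(",").str[1]   # text after the first comma
    .str.split(".").str[0]   # text before the first dot in that part
    .str.strip()             # drop the leading space
)
print(data["Title"].tolist())  # ['Mr', 'Mrs', 'Miss']
```

Because it anchors on the comma rather than the first space, this handles surnames that contain spaces.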