Python Forum
Output substrings from rows in pandas
#1
I have a dataframe with a column "C" that contains address information. How can I extract the city from each row and count the unique cities? In the example below the cities would be London, Paris and Barcelona.

A     B    C
Grey  no   84 Sussex Gardens Westminster Borough London W2 1UH United Kingdom
Red   yes  71 Rue de Charonne 11th arr 75011 Paris France
Blue  yes  Diputaci 262 264 Eixample 08007 Barcelona Spain
Reply
#2
Do you have a list of the cities you want to look for? You'll need some form of logic to identify which words in the string are cities. If you're looking for a particular city (e.g. London), you could create a flag column with something like the line below (I'm writing this syntax from memory, so you may need to adjust it).

df['Is London'] = np.where(df['C'].str.contains('London'), 1, 0)
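
If a list of candidate cities is available, the same idea extends to several cities at once. The following is only a sketch under that assumption: `known_cities` and the `City` column are hypothetical names, and the dataframe is rebuilt from the example in post #1 so the snippet runs on its own.

```python
import re

import numpy as np
import pandas as pd

# Example data copied from post #1
df = pd.DataFrame({
    "A": ["Grey", "Red", "Blue"],
    "B": ["no", "yes", "yes"],
    "C": [
        "84 Sussex Gardens Westminster Borough London W2 1UH United Kingdom",
        "71 Rue de Charonne 11th arr 75011 Paris France",
        "Diputaci 262 264 Eixample 08007 Barcelona Spain",
    ],
})

# Flag for a single city, as in the line above
df["Is London"] = np.where(df["C"].str.contains("London"), 1, 0)

# Assumption: the candidate cities are known in advance (hypothetical list)
known_cities = ["London", "Paris", "Barcelona"]

# Build one regex with a capture group; str.extract returns the first match per row
pattern = r"\b(" + "|".join(re.escape(c) for c in known_cities) + r")\b"
df["City"] = df["C"].str.extract(pattern, expand=False)

# Number of distinct cities found (rows with no match are NaN and are not counted)
n_unique = df["City"].nunique()
```

Rows that match none of the listed cities end up as NaN in `City`, so this only helps when the list is reasonably complete.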
Reply
#3
(Jun-20-2018, 02:54 PM)mecampbell Wrote: Do you have the list of cities you want to look for?
No, but it's usually the second-to-last word in the column or, if the address is in the UK, the word before the postcode.


(Jun-20-2018, 02:54 PM)mecampbell Wrote: df['Is London'] = np.where(df['C'].str.contains('London'), 1, 0)
This works, but it means I need to know the cities beforehand.
Reply
#4
How about using split() with " " to get the second last word and then use isnumeric() to determine if it has a number. If it has a number, then consider the previous word.
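
A rough sketch of that idea, applied to the example rows from post #1. The function name `guess_city` and the column name `City` are made up for illustration. One deliberate change: `"W2".isnumeric()` is False because `isnumeric()` requires every character to be numeric, so the sketch checks whether a word contains any digit instead.

```python
import pandas as pd

def guess_city(address):
    """Return the second-last word, stepping back past words that contain digits."""
    words = address.split()
    i = len(words) - 2  # index of the second-last word
    # Step backwards while the candidate word contains a digit
    while i >= 0 and any(ch.isdigit() for ch in words[i]):
        i -= 1
    return words[i] if i >= 0 else None

df = pd.DataFrame({"C": [
    "84 Sussex Gardens Westminster Borough London W2 1UH United Kingdom",
    "71 Rue de Charonne 11th arr 75011 Paris France",
    "Diputaci 262 264 Eixample 08007 Barcelona Spain",
]})

# Apply the heuristic to every cell of column "C"
df["City"] = df["C"].apply(guess_city)
```

With these rows the heuristic finds Paris and Barcelona, but it returns "United" for the UK address because the country name is two words, so UK rows need separate handling.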
Reply
#5
(Jun-20-2018, 03:57 PM)Nwb Wrote: How about using split() with " " to get the second last word and then use isnumeric() to determine if it has a number. If it has a number, then consider the previous word.

That sounds good. Do you know how I can extract these from each cell and output them separately?
Reply
#6
I have singled out a column of data from a dataframe via
address = reviews.iloc[:,1]
which contains addresses. How can I output only the city for each row? Each city is the second-to-last word or, if the address is in the UK, the third-to-last word that does not contain numbers.
Thanks in advance

0 372 Strand Westminster Borough London WC2R 0JJ United Kingdom
1 Rossell 249 Eixample 08008 Barcelona Spain
2 Damrak 1 5 Amsterdam City Center 1012 LG Amsterdam
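
One way the rule described above could be written, as a sketch rather than a tested solution. Assumptions: UK rows are recognised by the address ending in "United Kingdom", `extract_city` is a hypothetical helper name, and `address` is rebuilt here from the three sample rows instead of being taken from `reviews.iloc[:,1]`.

```python
import pandas as pd

def extract_city(address):
    """Second-last word, or for UK rows the third-last word without digits."""
    words = address.split()
    if address.endswith("United Kingdom"):
        # Keep only words with no digits, which drops the postcode parts
        plain = [w for w in words if not any(ch.isdigit() for ch in w)]
        return plain[-3] if len(plain) >= 3 else None
    return words[-2] if len(words) >= 2 else None

# Stand-in for: address = reviews.iloc[:,1]
address = pd.Series([
    "372 Strand Westminster Borough London WC2R 0JJ United Kingdom",
    "Rossell 249 Eixample 08008 Barcelona Spain",
    "Damrak 1 5 Amsterdam City Center 1012 LG Amsterdam",
])

# One city guess per row, then the distinct values and their count
cities = address.apply(extract_city)
unique_cities = cities.unique()
n_unique = cities.nunique()
```

On these rows the result is London and Barcelona for the first two. The third row, as posted, does not end with a country name, so the rule picks "LG"; rows like that would need an extra rule or a list of known cities.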
Reply

