There comes a point in every Excel users workflow that they've got to work with a list of names. Usually the data you get is in the wrong format, or you have to compare two dataset of different formats. This can be frustrating but here is a quick guide of how to manipulate them.
This tutorial is based around extracting the surname / family name from a variety of different formats, if you need to get to the title or firstname then it should be easily enough to adapt the workarounds.
Possibly the best format to get the data in, it is quick and easy to extract the surname, like everything in excel all you have to do is break the task into logical steps. Lets go with this example:
Fish, Phil the
Breaking it down, the portion of the name we want to take are all the letters before the comma. Using the FIND
function and assuming the string is in cell A1 we'd use this formula:
=FIND(",",A1,1)-1
This results in the expected answer of 4. All we need to do next is use the LEFT
formula to extract the letters:
=LEFT(A1,4)
We can then nest these formulas into one compact equation:
=LEFT(A1,FIND(",",A1,1)-1)
Et voila
Fish
It wouldn't be so bad to extract the Surname from a Forename Surname configuration, you just repeat the above steps but instead look for a space and use the RIGHT function along with LEN
- which tells you how many characters are in the string so you can extract the space to last letter. E.g. for
Phil Fish
=RIGHT(A1,LEN(A1)-FIND(" ",A1))
Again giving you
Fish
the trouble is that the data rarely comes through like this. There are several permitations which will come through including middle name/no middle name/multiple middle names and title:
Phil Fish
Phil the Fish
Lord Phil Fish
Phil Overloard of All Fish
Back to our human logic, all we really have to do here is take the letters after the last space, unfortunately this is trickier than it sounds!
To accomplish our task we have to break this down into futher steps, first off:
Counting how many occurances a string has can be calculated by counting the total length of the string minus the length of the string without the text you are trying to find. Pretty cunning really, you'll have needed a bit of out of the box thinking for that one. Here's the formula to use:
=SUM(LEN(<range>)-LEN(SUBSTITUTE(<range>,"text","")))/LEN("text")
or in the context of our example:
Phil the Fish
this formula:
=SUM(LEN(A1)-LEN(SUBSTITUTE(A1," ","")))/LEN(" ")
will yield the answer of 2.
Now we have found how many times the space appears we have to locate the last occurance, we do this by replacing the last space with something unique which we know won't appear for this we'll use a special system charachter Null and call it using the CHAR function which retreives is using its ASCII code:
CHAR(1)
So as a formula we'll substitute the space in the text string and find it:
=FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))
This give us the position of 9,
Next all we have to do is proceed to grab the last 9 letters of the string as described earlier and we are away:
=RIGHT(A1,9)
Finally once we've figured out all the steps we can bung all the code into one formula:
=RIGHT(A1,LEN(A1)-FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),SUM(LEN(A1)-LEN(SUBSTITUTE(A1," ","")))/LEN(" "))))
And we are away. I told you it wasn't as easy as you first imagined!