More Exercises
Overview
Teaching: 0 min
Exercises: 50 min
Questions
How do you find and match strings with regular expressions?
Objectives
Test your knowledge of regular expressions
Exercises
These exercises are designed to consolidate the regex knowledge you gained during this module. We recommend working through them after class (within a week or so).
What does Fr[ea]nc[eh] match?
Answer
This matches France and French, in addition to the misspellings Frence and Franch. It would also find strings with characters on either side of the pattern, such as France's, in French, or French-fried.
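You can check this answer with a minimal sketch using Python's re module. Python is our choice here, since the exercise itself is not tied to any language. The sample strings are made up for illustration, including the non-matching Frunch. re.search looks for the pattern anywhere in a string, which is why text on either side does not prevent a match.

```python
import re

# "Fr", then e or a, then "nc", then e or h
pattern = re.compile(r"Fr[ea]nc[eh]")

samples = ["France", "French", "Frence", "Franch",
           "France's", "in French", "French-fried", "Frunch"]

# re.search scans the whole string, so surrounding characters are allowed
matches = {s: bool(pattern.search(s)) for s in samples}

for s, found in matches.items():
    print(f"{s!r}: {found}")
```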
What does Fr[ea]nc[eh]$ match?
Answer
This matches France, French, Frence, and Franch, but only at the end of a line. It would also match strings with other characters appearing before the pattern, such as in French or Sino-French.
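Here is a minimal Python sketch of the $ anchor, using illustrative text only. One Python-specific detail matters here. By default $ matches only at the end of the string (or just before a final newline). Passing re.MULTILINE makes it match at the end of every line, which is the "end of a line" behaviour described above.

```python
import re

text = "Vive la France\nFrench fries\nSpeaks French"

# Without MULTILINE, $ only matches at the very end of the string
end_of_string = re.findall(r"Fr[ea]nc[eh]$", text)

# With MULTILINE, $ matches at the end of each line
end_of_each_line = re.findall(r"Fr[ea]nc[eh]$", text, flags=re.MULTILINE)

print(end_of_string)     # ['French']
print(end_of_each_line)  # ['France', 'French']
```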
What would match the strings French and France only when they appear at the beginning of a line?
Answer
^France|^French
This would also find strings with other characters coming after France or French, such as Frenchness or France's economy.
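The following is a minimal Python sketch with made-up lines of text; re.MULTILINE makes ^ match at the start of every line. re.findall returns only the matched part. So France's economy yields France and Frenchness yields French, which shows that the pattern ignores whatever follows. The compact form with a non-capturing group, ^Fr(?:ance|ench), is our own equivalent rewrite and is not part of the lesson's answer.

```python
import re

text = "France's economy\nin France\nFrenchness\nThe French"

# Each alternative is anchored with ^; MULTILINE applies ^ to every line
starts = re.findall(r"^France|^French", text, flags=re.MULTILINE)

# Equivalent compact form: shared prefix, non-capturing group for the endings
compact = re.findall(r"^Fr(?:ance|ench)", text, flags=re.MULTILINE)

print(starts)   # ['France', 'French']
print(compact)  # ['France', 'French']
```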
How do you match the whole words colour and color (case insensitive)?
Answer
In real life, you should only come across the case insensitive variations colour, color, Colour, Color, COLOUR, and COLOR (rather than, say, coLour). So one option would be \b[Cc]olou?r\b|\bCOLOU?R\b. This can, however, quickly get quite complex. An option we’ve not discussed is to take advantage of the / delimiters and add an ignore case flag: so /\bcolou?r\b/i will match all case insensitive variants of the whole words colour and color. (The exact way to set this flag varies between tools and languages.)
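As a minimal sketch, here are both approaches in Python on an invented sentence. Python does not use the /.../i delimiter syntax. The re.IGNORECASE flag plays the same role. The \b word boundaries keep colours and discolored from matching.

```python
import re

text = "Colour, color, COLOUR and coLour, but not colours or discolored."

# Explicit alternatives: only the realistic capitalisations
explicit = re.findall(r"\b[Cc]olou?r\b|\bCOLOU?R\b", text)

# Python's counterpart of the /.../i flag is re.IGNORECASE
ignore_case = re.findall(r"\bcolou?r\b", text, flags=re.IGNORECASE)

print(explicit)     # ['Colour', 'color', 'COLOUR']
print(ignore_case)  # ['Colour', 'color', 'COLOUR', 'coLour']
```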
How would you find the whole words headrest or head rest but not head  rest (that is, with two spaces between head and rest)?
Answer
\bhead ?rest\b. Note that although \bhead\s?rest\b does work, it would also match a single tab or newline character between head and rest. In most real world cases it should, however, be fine.
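A minimal Python sketch can compare the two patterns on a few hand-made samples. One sample contains a tab (\t) to show the difference. Neither pattern accepts two spaces, and the trailing \b rules out headrests.

```python
import re

samples = ["headrest", "head rest", "head  rest", "head\trest", "headrests"]

# An optional literal space: matches headrest and head rest only
space_only = {s: bool(re.search(r"\bhead ?rest\b", s)) for s in samples}

# \s? accepts any single whitespace character, including a tab
any_whitespace = {s: bool(re.search(r"\bhead\s?rest\b", s)) for s in samples}

for s in samples:
    print(repr(s), space_only[s], any_whitespace[s])
```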
How would you find a 4-letter word that ends a string and is preceded by at least one zero?
Answer
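0+[a-z]{4}\b
Note that \b only requires the four letters to end a word; to require them to end the whole string, use 0+[a-z]{4}$ instead. Also, [a-z] matches lowercase letters only.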
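The minimal Python sketch below contrasts the answer's \b with a $ anchor on invented samples. The $ variant is our own comparison, not part of the original answer. Note that [a-z] covers lowercase ASCII letters only.

```python
import re

samples = ["0abcd", "000word", "ref 00data", "00word here", "0abc", "0abcde", "abcd"]

# The answer as given: \b only requires the four letters to end a word
word_end = [bool(re.search(r"0+[a-z]{4}\b", s)) for s in samples]

# Anchoring with $ instead requires them to end the whole string
string_end = [bool(re.search(r"0+[a-z]{4}$", s)) for s in samples]

print(word_end)    # [True, True, True, True, False, False, False]
print(string_end)  # [True, True, True, False, False, False, False]
```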
How do you match any 4-digit string anywhere?
Answer
\d{4}. Note that this matches exactly four digits at a time, but it will also find them inside longer strings of digits.
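This minimal Python sketch runs on a made-up sentence. It shows \d{4} picking the first four digits out of a longer number. The word-bounded version, \b\d{4}\b, is our own addition for contrast: it keeps only stand-alone four-digit numbers.

```python
import re

text = "Call 0161 or 1234567; PIN 42."

# Unbounded: every run of four digits, even inside longer numbers
four_digits = re.findall(r"\d{4}", text)

# With word boundaries: only stand-alone four-digit numbers
standalone = re.findall(r"\b\d{4}\b", text)

print(four_digits)  # ['0161', '1234']
print(standalone)   # ['0161']
```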
How would you match the date format dd-MM-yyyy?
Answer
\b\d{2}-\d{2}-\d{4}\b
In most real world situations, you are likely to want word boundaries here (but it may depend on your data).
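As a minimal Python sketch, the text below is invented so that one "date" sits inside a longer run of digits. Without word boundaries, the pattern pulls 05-11-1978 out of 105-11-19781. With them, only the genuine date matches.

```python
import re

text = "Born 05-11-1978, filed 2015-06-30, ref 105-11-19781."

# Word-bounded: the date must not be glued to other word characters
bounded = re.findall(r"\b\d{2}-\d{2}-\d{4}\b", text)

# Unbounded: also finds a date-shaped piece inside a longer string
unbounded = re.findall(r"\d{2}-\d{2}-\d{4}", text)

print(bounded)    # ['05-11-1978']
print(unbounded)  # ['05-11-1978', '05-11-1978']
```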
How would you match the date format dd-MM-yyyy or dd-MM-yy at the end of a line only?
Answer
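\d{2}-\d{2}-\d{2,4}$
Note that \d{2,4} also accepts a three-digit year, such as 05-11-978; \d{2}-\d{2}-(\d{4}|\d{2})$ is stricter.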
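This minimal Python sketch tests invented one-line strings. It compares the answer's \d{2,4} with a stricter alternation that allows exactly four or exactly two year digits. The stricter pattern is our own suggestion, written with a non-capturing group (?:...).

```python
import re

lines = ["Due 05-11-1978", "Due 05-11-78", "Due 05-11-978", "05-11-1978 was the date"]

# The answer as given: \d{2,4} accepts 2, 3 or 4 digits
loose = [bool(re.search(r"\d{2}-\d{2}-\d{2,4}$", s)) for s in lines]

# Stricter variant: exactly four or exactly two digits before the end
strict = [bool(re.search(r"\d{2}-\d{2}-(?:\d{4}|\d{2})$", s)) for s in lines]

print(loose)   # [True, True, True, False]
print(strict)  # [True, True, False, False]
```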
How would you match publication formats such as British Library : London, 2015 and Manchester University Press: Manchester, 1999?
Answer
.* : .*, \d{4}
You will find that this matches any text you put before British or Manchester. Note, however, that it only matches the first example: the second has no space before the colon. Making that space optional, as in .* ?: .*, \d{4}, matches both. How much further to refine the pattern depends on your real world application.
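The minimal Python sketch below checks both patterns against the two examples and an invented citation with leading text. The optional-space variant, .* ?: .*, \d{4}, is our own refinement. The last line shows that the greedy .* also takes in any text before the publisher's name.

```python
import re

citations = [
    "British Library : London, 2015",
    "Manchester University Press: Manchester, 1999",
    "See: British Library : London, 2015",
]

# The answer as given: requires a space on both sides of the colon
original = [re.search(r".* : .*, \d{4}", c) for c in citations]

# Refinement: the space before the colon is optional
optional_space = [re.search(r".* ?: .*, \d{4}", c) for c in citations]

print([m is not None for m in original])        # [True, False, True]
print([m is not None for m in optional_space])  # [True, True, True]

# The greedy .* also swallows leading text such as "See: "
print(original[2].group(0))
```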
Key Points
Regular expressions answers