
Regular Expressions

Regular expressions are terribly powerful, and terribly complicated. They could easily fill a two-day (or longer) class on their own. In this section we will assume you know something about them already, and just cover the basic syntax and point you to some deeper references.

Ref. WGR Chapter 11, Regular expressions and regexp-based string operations

regexp literals

slashes define a regular expression literally (inline)

foo_matcher = /foo/

The above regexp will match a string containing "foo" anywhere inside it

ooo_matcher = /oo*/

The above regexp will match a string containing "o" or "oo" or "ooo" (and so on) anywhere inside it

a more complicated regexp

    word_exp = /\b[a-z]*\b/i
  • slashes delineate the pattern
  • backslash b means "match a word boundary"
  • brackets delineate a match of a single character (aka "character class")
  • a hyphen z means "all the letters between a and z, inclusive"
  • star means match any number (0 or more) of the previous character (or, as in this case, character class)
  • slash i at the end means "ignore capitalization" (or "case insensitive")

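So the above expression will match any word made up only of the ASCII letters a-z, in any combination of upper- or lowercase. Note that because star allows zero characters, it can also produce an empty match at a word boundary (for example at the start of "123"); use plus instead of star to require at least one letter.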
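To see the pieces working together, here is a minimal sketch that pulls the words out of a sentence. The helper name extract_words is made up for this example, and it uses plus rather than star so that empty matches are skipped.

```ruby
# Hypothetical helper (not from the slides): collect every run of ASCII letters
# that is bounded by word boundaries on both sides.
# Uses + instead of * so that zero-length matches are never returned.
def extract_words(text)
  text.scan(/\b[a-z]+\b/i)
end

p extract_words("The Quick brown fox")  # => ["The", "Quick", "brown", "fox"]
```

String#scan returns every non-overlapping match as an array, which makes it a handy way to try out a pattern.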

regexp operators: equal tilde

equal tilde (=~) returns the index in the string where the first match starts, or nil if there is no match

>> "abcde" =~ /bcd/
=> 1

Note that the return value is truthy if the string is a match, and falsey if it's not, which lets you use it inside conditionals:

if ("abcde" =~ /bcd/)
  puts "yay! it matches!"
end
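One detail worth spelling out: a match at the very start of the string returns 0, and in Ruby 0 is truthy, so the conditional still works. A small sketch (the method name describe_match is just an illustration):

```ruby
# Hypothetical helper: report where a pattern matched, if at all.
def describe_match(text, pattern)
  position = text =~ pattern   # an Integer index, or nil when there is no match
  if position
    "matched at index #{position}"
  else
    "no match"
  end
end

puts describe_match("abcde", /bcd/)  # matched at index 1
puts describe_match("abcde", /abc/)  # matched at index 0 (0 is truthy in Ruby)
puts describe_match("abcde", /xyz/)  # no match
```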

regexp globals

  • After a successful match, some global variables are set
    • $~ is the MatchData object for the match (it prints as the text that matched)
    • $1 is the first substring match
    • $2 is the second substring match
    • etc.
    • substrings are defined with parentheses in the regexp

if "the quick brown fox" =~ /(quick).*(f..)/ then
  puts "The matching string was #{$~}"
  puts "The first substring was #{$1}"
  puts "The second substring was #{$2}"
end

Prints this:

The matching string was quick brown fox
The first substring was quick
The second substring was fox

See Also

regexp operators: bang tilde

bang tilde returns false if the string matches, or true if it doesn't

>> "abcde" !~ /xyz/
=> true
>> "abcde" !~ /bcd/
=> false

some methods that can use regexes

String#split
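split accepts a regexp as the separator, which helps when the delimiter is not exactly the same every time. A minimal sketch, assuming comma-separated input with untidy spacing (split_fields is a made-up name):

```ruby
# Hypothetical helper: split on a comma plus any whitespace around it.
def split_fields(line)
  line.split(/\s*,\s*/)
end

p split_fields("red , green,blue ,  yellow")  # => ["red", "green", "blue", "yellow"]
```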

String#[]
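Passing a regexp inside the square brackets returns the first matching piece of the string, or nil if nothing matches. An optional second argument picks out one parenthesized group. A short sketch with made-up helper names:

```ruby
# Hypothetical helpers showing String#[] with a regexp.

# Returns the first run of digits in the text, or nil if there are none.
def first_number(text)
  text[/\d+/]
end

# Returns whatever follows the "@" (capture group 1), or nil if there is no "@".
def domain_of(email)
  email[/@(.+)\z/, 1]
end

p first_number("order 66 shipped")    # => "66"
p first_number("no digits here")      # => nil
p domain_of("alice@example.com")      # => "example.com"
```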

String#sub and String#gsub

>> "foobar".sub("foo", "baz")
=> "bazbar"
>> "foobar".sub(/foo/, "baz")
=> "bazbar"
>> "foobar".sub(/fo*/, "baz")
=> "bazbar"
>> "fooooooobar".sub(/fo*/, "baz")
=> "bazbar"
>> "fooooooobarfoo".sub(/fo*/, "baz")
=> "bazbarfoo"
>> "fooooooobarfoo".gsub(/fo*/, "baz")
=> "bazbarbaz"

Array#grep

>> "foo bar baz".split.grep(/f../)
=> ["foo"]
>> "foo bar baz".split.grep(/b../)
=> ["bar", "baz"]

MatchData

  • If you need more control you can use the match method
  • It returns a MatchData object which is rather complex
  • See rubydoc for Regexp and MatchData
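As a taste of what MatchData offers, here is a minimal sketch that pulls a date out of a sentence. The parse_date helper, the date format, and the hash keys are all assumptions made for this example.

```ruby
# Hypothetical helper: find a YYYY-MM-DD date and return its pieces.
def parse_date(text)
  m = /(\d{4})-(\d{2})-(\d{2})/.match(text)   # a MatchData object, or nil
  return nil unless m

  {
    whole:  m[0],          # the entire matched text
    year:   m[1],          # first parenthesized group
    month:  m[2],          # second group
    day:    m[3],          # third group
    before: m.pre_match,   # text before the match
    after:  m.post_match   # text after the match
  }
end

result = parse_date("released on 2013-02-24, finally")
puts result[:whole]   # 2013-02-24
puts result[:year]    # 2013
puts result[:before]  # released on
```

Unlike the globals, the MatchData object is an ordinary value, so you can store it, pass it around, and ask it questions later.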

learning more