11 sed
- Explain the basic usage of the sed program for text replacement.
- Understand what the \ “escape character” does and why it is needed.
11.1 Text Replacement
One of the most prominent text-processing utilities is the sed command, which is short for “stream editor”. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).
sed contains several sub-commands, but the main one we will cover is the substitute or s command. The syntax is:
sed 's/pattern/replacement/options'
Where pattern is the word we want to substitute and replacement is the new word we want to use instead. There are also other “options” added at the end of the command, which change the default behaviour of the text substitution. Some of the common options are:
- g: by default sed will only substitute the first match of the pattern in each line. If we use the g option (“global”), then sed will substitute all matching text.
- i: by default sed matches the pattern in a case-sensitive manner. For example ‘A’ and ‘a’ are treated as different. If we use the i option (“case-insensitive”) then sed will treat ‘A’ and ‘a’ as the same.
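As a small illustration of these two options, the sketch below runs the same substitution three times on a line of text passed through a pipe. The input text and variable names are made up for this example, and it assumes GNU sed: the i option is a GNU extension and may not be available in other sed implementations.

```shell
# Made-up input text, used only for illustration
text="hello Hello hello"

# No options: only the first case-sensitive match is replaced
first_only=$(echo "$text" | sed 's/hello/bye/')

# g: every case-sensitive match is replaced
global=$(echo "$text" | sed 's/hello/bye/g')

# g and i together (assumes GNU sed): every match, ignoring case
global_nocase=$(echo "$text" | sed 's/hello/bye/gi')

echo "$first_only"      # bye Hello hello
echo "$global"          # bye Hello bye
echo "$global_nocase"   # bye bye bye
```

Options can be combined by writing them one after the other, as in the last command.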
For example, let’s create a file with some text inside it:
echo "Hello world. How are you world?" > hello.txt
If we do:
sed 's/world/participant/' hello.txt
This is the result:
Hello participant. How are you world?
We can see that the first “world” was replaced with “participant”. This is the default behaviour of sed: only the first match it finds in each line of text is replaced with the new word. We can modify this by using the g option after the last /:
sed 's/world/participant/g' hello.txt
Hello participant. How are you participant?
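The “first match” rule applies to each line separately, which is easier to see with more than one line of input. A minimal sketch, using made-up two-line input sent through a pipe instead of a file (the variable names are only for illustration):

```shell
# Without g: the first "world" on each line is replaced
first_per_line=$(printf 'world one world\nworld two world\n' | sed 's/world/participant/')

# With g: every "world" on each line is replaced
all_matches=$(printf 'world one world\nworld two world\n' | sed 's/world/participant/g')

echo "$first_per_line"
# participant one world
# participant two world

echo "$all_matches"
# participant one participant
# participant two participant
```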
Finding patterns in text can be a very powerful skill to master. In our examples we have been finding a literal word and replacing it with another word. However, we can do more complex text substitutions by using special characters that define a more general pattern. These patterns are known as regular expressions.
For example, in regular expression syntax, the character . stands for “any character”. So the pattern H. would match an “H” followed by any character, and the expression:
sed 's/H./X/g' hello.txt
Results in:
Xllo world. Xw are you world?
Notice how both “He” (at the start of the word “Hello”) and “Ho” (at the start of the word “How”) are replaced with the letter “X”, because both of them match the pattern “H followed by any character” (H.).
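To make “any character” a bit more concrete, here is a small sketch with made-up input. The pattern c.t needs exactly one character between “c” and “t”, and that character can be a letter, a full stop or even a space:

```shell
# Made-up input: "c" and "t" separated by a letter, a full stop, a space, or nothing
text="cat cot c.t c t ct"

# c.t matches "c", then any single character, then "t"
replaced=$(echo "$text" | sed 's/c.t/X/g')

echo "$replaced"   # X X X X ct
```

The final “ct” is left unchanged, because there is no character between the “c” and the “t” for the . to match.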
To learn more see this Regular Expression Cheatsheet.
11.2 The \ Escape Character
You may have asked yourself, if / is used to separate parts of the sed substitute command, then how would we replace the “/” character itself in a piece of text? For example, let’s add a new line of text to our file:
echo "Welcome to this workshop/course." >> hello.txt
Let’s say we wanted to replace “workshop/course” with “tutorial” in this text. If we did:
sed 's/workshop/course/tutorial/' hello.txt
We would get an error:
sed: -e expression #1, char 5: unknown option to `s'
This is because we ended up with too many / characters in the command, and sed uses / to separate the different parts of the command. In this situation we need to tell sed to treat that / not as a special character but as the literal “/” character. To do this, we need to use \ before /, which is called the “escape” character. That will tell sed to treat the / as a normal character rather than a separator of its commands. So:
sed 's/workshop\/course/tutorial/' hello.txt
                ↑
                This / is "escaped" with \ beforehand
This looks a little strange, but the main thing to remember is that \/ will be interpreted as the character “/” rather than the separator of sed’s substitute command.
The output now would be what we wanted:
Hello world. How are you world?
Welcome to this tutorial.
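The escape character works in the same way for other characters that have a special meaning inside the pattern. As a sketch, reusing the example sentence from earlier (the variable names are only for illustration), escaping the . makes it match a literal full stop instead of any character:

```shell
text="Hello world. How are you world?"

# Unescaped: "." matches any character, so both "world." and "world?" match
any_char=$(echo "$text" | sed 's/world./planet/g')

# Escaped: "\." matches only a literal full stop, so only "world." matches
literal_dot=$(echo "$text" | sed 's/world\./planet/g')

echo "$any_char"      # Hello planet How are you planet
echo "$literal_dot"   # Hello planet How are you world?
```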
11.3 Alternative separator: |
Instead of using the escape character, like we did above, sed also lets us use a different character, such as |, to separate the parts of the expression. Our command could have instead been written as:
sed 's|workshop/course|tutorial|' hello.txt
Hello world. How are you world?
Welcome to this tutorial.
This is a little easier to read, as we avoid using the \ escape character in our pattern to be replaced.
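The benefit becomes clearer when the text contains several / characters, as file paths do. The sketch below uses a made-up path and made-up directory names, and writes the same substitution both ways:

```shell
# Made-up file path, used only for illustration
path="/home/user/data/sample.txt"

# With / as the separator, every / in the pattern and replacement must be escaped
escaped=$(echo "$path" | sed 's/\/home\/user/\/scratch/')

# With | as the separator, the same substitution needs no escaping
piped=$(echo "$path" | sed 's|/home/user|/scratch|')

echo "$escaped"   # /scratch/data/sample.txt
echo "$piped"     # /scratch/data/sample.txt
```

Both commands give the same result, but the second one is much easier to read and to type correctly.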
11.4 Removing text
The sed command can be used to remove text from an input. The way to do it is to leave the replacement empty, so that the matched text is replaced with nothing. For example, if we wanted to remove the word “world” from our example file, we could do:
sed 's/ world//g' hello.txt
Or, equivalently, using the | vertical separator:
sed 's| world||g' hello.txt
Hello. How are you?
Welcome to this workshop/course.
A few things to note in our command above:
- We included the “ ” space before “world”, to make sure we also remove it from the text.
- The second part of the sed substitution we left blank, that’s why we have two consecutive // (or ||), to indicate we are replacing “ world” with nothing.
- We made sure to include the g modifier, so that we replace both occurrences of the word “world”.
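The same approach works for removing single characters. A minimal sketch, using a made-up date string, that also shows what happens if the g modifier is left out:

```shell
# Made-up date string, used only for illustration
date_text="2024-03-15"

# With g: every "-" is replaced with nothing
no_dashes=$(echo "$date_text" | sed 's/-//g')

# Without g: only the first "-" is removed
first_only=$(echo "$date_text" | sed 's/-//')

echo "$no_dashes"    # 20240315
echo "$first_only"   # 202403-15
```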
11.5 Exercises
11.6 Summary
- The sed tool can be used for advanced text manipulation. The “substitute” command can be used for text replacement: sed 's/pattern/replacement/options'.
- Common options that can be used with sed include:
  - g for global substitution, rather than just the first match.
  - i for case-insensitive substitution, rather than being case-sensitive.
- To remove part of a text we can leave the “replacement” part of the command empty: sed 's/pattern//g' (this would replace “pattern” with nothing, i.e. removing it).
- While sed is extremely versatile, learning and remembering all of its operations can be challenging. Instead, effective web-searching can often lead us to solutions for not-so-trivial text manipulation problems, without the need to learn all the workings of the tool.