12 SED
- Explain the basic usage of the sed program for text replacement.
- Understand what the \ “escape character” does and why it is needed.
12.1 Text Replacement
One of the most prominent text-processing utilities is the sed command, which is short for “stream editor”. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).
sed contains several sub-commands, but the main one we will cover is the substitute, or s, command. The syntax is:
sed 's/pattern/replacement/options'
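Because sed works on a stream, its input does not have to be a file: it can also come from a pipeline. A minimal sketch, where the piped sentence is made up for illustration:

```shell
# Pipe text into sed instead of giving it a file name;
# sed prints the edited text to the terminal
echo "I like cats" | sed 's/cats/dogs/'
# prints: I like dogs
```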
Here, pattern is the text we want to find and replacement is the new text we want to use instead. Options can be added at the end of the command to change the default behaviour of the substitution. Some of the common options are:
- g: by default, sed only substitutes the first match of the pattern on each line. If we use the g option (“global”), sed substitutes all matching text.
- i: by default, sed matches the pattern in a case-sensitive manner; for example, ‘A’ and ‘a’ are treated as different. If we use the i option (“case-insensitive”), sed treats ‘A’ and ‘a’ as the same.
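To see the two options working together, here is a short sketch on a made-up line of text. It assumes a sed that accepts the i flag (GNU sed does, but not every implementation is guaranteed to); several options can be combined after the last /:

```shell
# Default (case-sensitive): only the lowercase "world" matches
echo "Hello WORLD and world" | sed 's/world/there/g'
# prints: Hello WORLD and there

# g and i together: every match, ignoring case, is replaced
echo "Hello WORLD and world" | sed 's/world/there/gi'
# prints: Hello there and there
```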
For example, let’s create a file with some text inside it:
echo "Hello world. How are you world?" > hello.txt
If we do:
sed 's/world/participant/' hello.txt
This is the result:
Hello participant. How are you world?
We can see that the first “world” was replaced with “participant”. This is the default behaviour of sed: only the first match of the pattern in each line of text is replaced with the new word. We can change this by adding the g option after the last /:
sed 's/world/participant/g' hello.txt
Hello participant. How are you participant?
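The “first match only” rule applies to each line separately, not to the file as a whole. A small sketch with two made-up lines of input:

```shell
# Without g: the first "a" on EACH line is replaced
printf 'a a\na a\n' | sed 's/a/b/'
# prints:
# b a
# b a

# With g: every "a" on every line is replaced
printf 'a a\na a\n' | sed 's/a/b/g'
# prints:
# b b
# b b
```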
Finding patterns in text can be a very powerful skill to master. In our examples we have been finding a literal word and replacing it with another word. However, we can do more complex text substitutions by using special characters that define a more general pattern. These patterns are known as regular expressions.
For example, in regular expression syntax, the character . stands for “any character”. So the pattern H. matches an “H” followed by any character, and the expression:
sed 's/H./X/g' hello.txt
Results in:
Xllo world. Xw are you world?
Notice how both “He” (at the start of the word “Hello”) and “Ho” (at the start of the word “How”) are replaced with the letter “X”, because both of them match the pattern “H followed by any character” (H.).
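“Any character” really does mean any character, including a space. In this sketch on a made-up string, the pattern o. matches both “o ” (an “o” followed by a space) and “or”:

```shell
# o. matches "o " in "Hello world" and "or" in "world"
echo "Hello world" | sed 's/o./0/g'
# prints: Hell0w0ld
```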
To learn more see this Regular Expression Cheatsheet.
12.2 The \ Escape Character
You may have asked yourself: if / is used to separate the parts of the sed substitute command, how would we replace the “/” character itself in a piece of text? For example, let’s add a new line of text to our file:
echo "Welcome to this workshop/course." >> hello.txt
Let’s say we wanted to replace “workshop/course” with “tutorial” in this text. If we did:
sed 's/workshop/course/tutorial/' hello.txt
With GNU sed, we would get an error:
sed: -e expression #1, char 19: unknown option to `s'
This is because we ended up with too many / in the command, and sed uses / to separate the different parts of the command. In this situation we need to tell sed not to treat that / as a special character, but as the literal “/” character. To do this, we put a \, called the “escape” character, before the /. That tells sed to treat the / as a normal character rather than as a separator of its command. So:
sed 's/workshop\/course/tutorial/' hello.txt
                ↑
                This / is "escaped" with \ beforehand
This looks a little strange, but the main thing to remember is that \/ will be interpreted as the character “/” rather than as the separator of sed’s substitute command.
The output now would be what we wanted:
Hello world. How are you world?
Welcome to this tutorial.
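The same \ also works for characters that are special in regular expressions, such as the . we met earlier. A short sketch that pipes in the new line of text directly rather than reading hello.txt:

```shell
# Unescaped: . matches any character, so the first character "W" is replaced
echo "Welcome to this workshop/course." | sed 's/./!/'
# prints: !elcome to this workshop/course.

# Escaped: \. matches only a literal full stop
echo "Welcome to this workshop/course." | sed 's/\./!/'
# prints: Welcome to this workshop/course!
```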
12.3 Alternative separator: |
Instead of using the escape character, as we did above, we can tell sed to use a different separator; a common choice is the character |. Our command could instead have been written as:
sed 's|workshop/course|tutorial|' hello.txt
Hello world. How are you world?
Welcome to this tutorial.
This is a little easier to read, as we avoid using the \ escape character in the pattern to be replaced.
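| is not the only choice: POSIX sed accepts any character other than a backslash or a newline as the separator, and the separator is simply whichever character follows the s. A sketch using # on a made-up file path, which would otherwise need every / escaped:

```shell
# "#" as the separator: the slashes in the path need no escaping
echo "/home/user/data.csv" | sed 's#/home/user#/scratch#'
# prints: /scratch/data.csv

# The same substitution with "/" as the separator needs every "/" escaped
echo "/home/user/data.csv" | sed 's/\/home\/user/\/scratch/'
# prints: /scratch/data.csv
```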
12.4 Removing text
The sed command can also be used to remove text from its input. To do this, we leave the replacement empty. For example, if we wanted to remove the word “world” from our example file, we could do:
sed 's/ world//g' hello.txt
Or, equivalently, using the vertical bar | as the separator:
sed 's| world||g' hello.txt
Hello. How are you?
Welcome to this workshop/course.
A few things to note in our command above:
- We included the space before “world”, to make sure we also remove it from the text.
- We left the second (replacement) part of the sed substitution blank; that’s why we have two consecutive // (or ||), to indicate we are replacing “ world” with nothing.
- We made sure to include the g modifier, so that we replace both occurrences of the word “world”.
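To see why the space matters, compare the result when it is left out. This sketch pipes in the first line of our file rather than reading hello.txt:

```shell
# Removing only "world" leaves the space that preceded it
echo "Hello world. How are you world?" | sed 's/world//g'
# prints: Hello . How are you ?

# Including the leading space removes it as well
echo "Hello world. How are you world?" | sed 's/ world//g'
# prints: Hello. How are you?
```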
12.5 Exercises
12.6 Summary
- The sed tool can be used for advanced text manipulation. The “substitute” command can be used for text replacement: sed 's/pattern/replacement/options'.
- Common options that can be used with sed include:
  - g for global substitution, rather than just the first match.
  - i for case-insensitive substitution, rather than being case-sensitive.
- To remove part of a text we can leave the “replacement” part of the command empty: sed 's/pattern//g' (this would replace “pattern” with nothing, i.e. remove it).
- While sed is extremely versatile, learning and remembering all of its operations can be challenging. Instead, effective web searching can often lead us to solutions for not-so-trivial text manipulation problems, without the need to learn all the workings of the tool.