
Mac OS X: GNU sed - Hex string replacement / replacing new line characters

Recently I was working with a CSV file which contained both Windows and Unix line endings, which made it difficult to work with.

The actual line endings were hex '0A0D', i.e. Windows-style line breaks (strictly, a standard Windows CRLF is '0D0A', a carriage return followed by a line feed), but there were also lone hex '0A' Unix line breaks within one of the columns.

I wanted to get rid of the Unix line breaks and discovered that you can do hex sequence replacement using the GNU version of sed. Unfortunately the Mac ships with the BSD version, which doesn't have this functionality.

The first step was therefore to install the GNU version of sed.

brew install coreutils
brew install gnu-sed --with-default-names

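I wanted to replace my system sed, which is why I went with the '--with-default-names' flag. Without that flag, Homebrew installs GNU sed as 'gsed'. Newer versions of Homebrew no longer accept '--with-default-names'; instead, the formula's install notes explain how to add its 'gnubin' directory to your PATH so that 'sed' runs the GNU version.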
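Because the name GNU sed ends up with depends on how it was installed ('gsed' by default with Homebrew, or plain 'sed' if it shadows the system copy on your PATH), a small check like the sketch below can confirm that a GNU sed is available before running the commands in this post. 'find_gnu_sed' is a hypothetical helper, and the check assumes GNU sed prints 'GNU sed' from '--version', an option BSD sed does not accept.

```shell
# Hypothetical helper: print the name of a GNU sed binary, or fail.
# Prefers 'gsed' (the name Homebrew gives GNU sed when it is not
# installed as plain 'sed'), otherwise checks whether 'sed' is GNU.
find_gnu_sed() {
  if command -v gsed >/dev/null 2>&1; then
    echo gsed
  elif sed --version 2>/dev/null | grep -q 'GNU sed'; then
    echo sed
  else
    echo "GNU sed not found (try: brew install gnu-sed)" >&2
    return 1
  fi
}

SED=$(find_gnu_sed) || exit 1
# GNU sed understands \xHH escapes: \x61 is 'a', so this prints "b"
printf 'a\n' | "$SED" 's/\x61/b/'
```

BSD sed would treat '\x61' differently, so the last line doubles as a quick check that hex escapes are actually supported.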

The following is an example of what the lines in the file look like:

$ echo -e "Hello\x0AMark\x0A\x0D"
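To see which bytes are really in a line like this, it helps to dump it in hex. Below is a minimal sketch using the standard 'od' tool; 'printf' stands in for 'echo -e' only because it behaves the same in every shell, and it writes the same bytes as the 'echo -e' command above, including the trailing newline that 'echo' adds.

```shell
# Same bytes as: echo -e "Hello\x0AMark\x0A\x0D"   (Hello LF Mark LF CR LF)
# -An: omit the offset column; -tx1: print each byte as two-digit hex
printf 'Hello\nMark\n\r\n' | od -An -tx1
```

The dump should end with '0a 0d 0a' (the line feed after 'Mark', the carriage return, and that final newline), while the '0a' between '6f' ('o') and '4d' ('M') is the embedded break we want to remove. The exact column layout differs slightly between GNU and BSD 'od'.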

We want to get rid of the newline between 'Hello' and 'Mark' but leave the other one alone. I adapted one of the commands from this tutorial to look for a '0A' that isn't followed by a '0D':

$ echo -e "Hello\x0AMark\x0A\x0D" | \
  sed 'N;/\x0A[^\x0D]/s/\n/ /'
Hello Mark
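To run the same command over a whole file rather than 'echo' output, something like the following sketch should work. 'squash_embedded_breaks', 'input.csv' and 'output.csv' are placeholder names, and it assumes GNU sed is the 'sed' on your PATH.

```shell
# Hypothetical helper: apply the N-based command from above to a file.
# $1 is the input CSV and $2 is where the cleaned copy is written,
# so the original file is left untouched.
squash_embedded_breaks() {
  sed 'N;/\x0A[^\x0D]/s/\n/ /' "$1" > "$2"
}

# Example (placeholder file names):
# squash_embedded_breaks input.csv output.csv
```

Writing to a separate file keeps the original around for comparison; GNU sed's '-i' flag would edit the file in place instead.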

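Let's go through the parts of the sed command:

  • N appends the next line of input to the pattern space, with a newline (0A) between the two lines
  • /\x0A[^\x0D]/ only runs the following command if the pattern space contains a '0A' followed by any character other than '0D'
  • s/\n/ / replaces the first newline in the pattern space with a space

sed then prints the pattern space and starts again with the following line, so the input is processed two lines at a time.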

Now let's check that it works when there are multiple lines we want to squash:

$ echo -e "Hello\x0AMark\x0A\x0DHello\x0AMichael\x0A\x0D"

$ echo -e "Hello\x0AMark\x0A\x0DHello\x0AMichael\x0A\x0D" | \
  sed 'N;/\x0A[^\x0D]/s/\n/ /'
Hello Mark
Hello Michael

Looks good! The actual file is a bit more nuanced, so I've still got more work to do, but this is a good start.
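One limitation worth knowing about: the 'N'-based command only ever joins lines in pairs, so a field with two embedded line feeds (for example 'a', 'b' and 'c' separated by '0A' before the '0A0D') would still come out split across lines. A sketch of one way around that, assuming a GNU sed recent enough to support '-z' ('--null-data'), is to read the whole file into the pattern space and replace every '0A' not followed by a '0D' in a single global substitution. This is an alternative approach rather than the command used above, and 'squash_all_embedded_breaks' is a made-up name.

```shell
# Hypothetical variant of the command above.
# -z makes GNU sed split input on NUL bytes instead of newlines, so a text
# file with no NULs arrives as one pattern space (held in memory).
# The substitution replaces every 0A that is followed by something other
# than 0D with a space, keeping that following character via \1.
squash_all_embedded_breaks() {
  sed -z 's/\x0A\([^\x0D]\)/ \1/g' "$@"
}

# Example (placeholder file names):
# squash_all_embedded_breaks input.csv > output.csv
```

Because the whole file is held in memory, this suits moderately sized files, and a run of consecutive '0A' bytes would only be partly squashed, so it is a starting point rather than a complete fix.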
