Character Gremlins - Line Endings for Dummies
You’re testing out a new service that needs to run over HTTPS, so you need to get an SSL certificate onto the testing box. Your server is running Linux, but you’re a Windows guy, so you fire up PuTTY, open the certificate in Notepad, and copy and paste the text of the certificate into the SSH session. The certificate isn’t valid. You ask your admin, and they just scp the file over, so now it works. You cat the contents and it looks the same as what you’re viewing in Notepad. What did you do wrong?
The most likely issue involving copy/paste and going between Windows and other operating systems is line endings. It may be obvious that text contains characters like ‘a’ or ‘z’, but it can also contain characters like a space, a tab, or what we call a newline, which is a character that means the line of text is done and a new one should be started.
On Windows, a newline is actually represented by two characters: a carriage return (CR) followed by a line feed (LF). Virtually all other modern operating systems, including Linux and macOS, use only the LF character.
So the problem is that there is an invisible piece of extra data in our certificate. The opposite situation can also happen: LF-only line endings on Windows computers not being properly understood. Luckily, the solution is pretty simple.
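To make that invisible extra data concrete, here is a minimal Python sketch. The two byte strings are made-up stand-ins for the same text saved with each convention, and the helper name extra_bytes is just for illustration.

```python
# Hypothetical sample: the same two lines of text, saved with each convention.
windows_text = b"line one\r\nline two\r\n"  # CR (0x0d) followed by LF (0x0a)
unix_text = b"line one\nline two\n"         # LF (0x0a) only


def extra_bytes(crlf_data: bytes, lf_data: bytes) -> int:
    """Return how many more bytes the CRLF version carries."""
    return len(crlf_data) - len(lf_data)


# One extra CR per line: two lines, two extra bytes.
print(extra_bytes(windows_text, unix_text))  # 2

# A program that splits on LF alone is left with a stray CR on every line.
print(windows_text.split(b"\n"))  # [b'line one\r', b'line two\r', b'']
```

That trailing \r is exactly the kind of thing a strict parser, such as whatever reads your certificate, may refuse to accept.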
Confirming the issue
- The file program can tell you what the content of a file is, including encoding and line endings.
- The hexdump program can show the content of the file in hexadecimal format. Using the -C flag will show hex as well as an ASCII representation (in which CR and LF both appear as a ., with the hex values 0d and 0a).
- Text editors such as Visual Studio Code and Notepad++ also have ways of displaying unprintable characters.
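If none of those tools are handy, the same check is easy to script. The following is a rough sketch, not a replacement for file or hexdump: the function name count_line_endings and the sample bytes are made up for illustration, and in practice you would read the bytes from your own file (for example with pathlib.Path(...).read_bytes()).

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LFs, and lone CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Hypothetical certificate-like text that was pasted from a Windows editor.
pasted = b"-----BEGIN CERTIFICATE-----\r\nMIIB...\r\n-----END CERTIFICATE-----\r\n"
print(count_line_endings(pasted))  # {'CRLF': 3, 'LF': 0, 'CR': 0}
```

A file that reports a mix of CRLF and LF is a strong hint that it has been edited or pasted on more than one platform.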
Fixing the issue
The simplest solution is to use a small program called dos2unix (and its inverse, unix2dos) to convert a file from one line-ending format to the other. It can generally be installed through package managers such as apt under the name dos2unix. The project’s homepage also offers downloadable binaries.
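If you can’t install anything on the box, the basic conversion is simple enough to sketch in a few lines of Python. This is not how dos2unix itself is implemented and it skips that tool’s options and safety checks; the function names to_lf and to_crlf are made up for this example.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF. Lone CR characters are left alone."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF."""
    # Normalize to LF first so existing CRLF pairs are not turned into CR CR LF.
    return to_lf(data).replace(b"\n", b"\r\n")


mixed = b"first\r\nsecond\nthird\r\n"
print(to_lf(mixed))    # b'first\nsecond\nthird\n'
print(to_crlf(mixed))  # b'first\r\nsecond\r\nthird\r\n'
```

Note that both functions work on bytes, not strings. Opening the file in text mode would let Python translate line endings on its own, which hides the very thing you are trying to fix.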
Other Similar Issues
We often refer to these types of files as plain text files, but it’s a little more complicated than that. Textual files still have some sort of encoding, such as ASCII or UTF-8. Each character is represented by a sequence of bytes, which may differ between encodings. If an application is expecting a specific type of encoding and gets a different one, this can cause all kinds of problems. Using the file program can help show the encoding.
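Here is a small Python sketch of what “a different sequence of bytes” means in practice. The sample word and the helper can_decode are chosen purely for illustration, and Latin-1 stands in for any legacy single-byte encoding.

```python
text = "caf\u00e9"  # the word "cafe" with an accented e, one non-ASCII character

utf8_bytes = text.encode("utf-8")      # the accented e becomes two bytes: c3 a9
latin1_bytes = text.encode("latin-1")  # the accented e becomes one byte: e9


def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex(" "))    # 63 61 66 c3 a9
print(latin1_bytes.hex(" "))  # 63 61 66 e9

# Latin-1 bytes are not valid UTF-8, so a UTF-8 reader fails outright.
print(can_decode(latin1_bytes, "utf-8"))  # False

# UTF-8 bytes read as Latin-1 decode "successfully" into the wrong characters.
print(ascii(utf8_bytes.decode("latin-1")))  # 'caf\xc3\xa9'
```

The last case is the sneaky one: nothing raises an error, the text just comes out garbled.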
Some programs replace regular double or single quotes with angled quotes. If you’ve copied and pasted or edited your text in an application like Microsoft Word, Pages or some email clients, you may have quotes that the intended program doesn’t understand.
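If you suspect this has happened, a quick cleanup pass can put the plain quotes back. The sketch below covers only the four most common curly quote characters; the names SMART_QUOTES and straighten_quotes are made up for this example, and a real document may contain other look-alike characters.

```python
# Map the common curly ("smart") quotes back to their plain ASCII versions.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones, leaving everything else alone."""
    return text.translate(str.maketrans(SMART_QUOTES))


# Hypothetical config line that went through a word processor.
pasted = "name = \u201cserver01\u201d"
print(straighten_quotes(pasted))  # name = "server01"
```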
One benefit of using software like Jungle Disk to share files is that you can ensure the exact file makes its way to the destination without any alterations. Other options include copying files via SCP or via source control systems like Git, which have settings for translating to the correct format depending on your operating system. If you’re using FTP, you’ll need to make sure you understand when to use binary mode (copy the file literally) vs ASCII mode (use native line endings for the OS you’re currently on).
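As an illustration of the two FTP modes, here is a sketch using Python’s standard ftplib module. The function names and the file names in the comments are hypothetical, and connecting and logging in to a server are left out.

```python
from ftplib import FTP
from typing import BinaryIO


def upload_exact(ftp: FTP, fh: BinaryIO, remote_name: str) -> None:
    """Binary mode: the bytes are sent exactly as they are in the file."""
    ftp.storbinary(f"STOR {remote_name}", fh)


def upload_as_text(ftp: FTP, fh: BinaryIO, remote_name: str) -> None:
    """ASCII mode: the file is sent line by line, so line endings may change."""
    ftp.storlines(f"STOR {remote_name}", fh)


# Assumed usage, given an already connected and logged-in FTP object `ftp`:
#
#     with open("server.crt", "rb") as fh:   # hypothetical local file
#         upload_exact(ftp, fh, "server.crt")
```

For something like a certificate, where you want the file to arrive untouched, binary mode is the safer choice.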
Line-ending and encoding issues are easy to diagnose and fix, as long as you remember they exist. The next time everything looks correct but a file still isn’t being understood or parsed properly, be sure to check the line endings.