Gmail’s Archaic Export

I’ve been working on webaddressbook’s heretofore unannounced contact synchronization feature recently, and ran into a grindingly bad problem today trying to parse Gmail’s contact export CSV file.

According to Gmail, their format is better than the Outlook format because it allows you to preserve different character encodings.

Gmail CSV (for import into another Gmail account): formats your contacts’ information so you can easily import it into another Gmail account:

This option encodes your CSV file in Unicode/UTF-8.

Well and good. Unfortunately, as the hours of my life wasted just now on this fallacious supposition testify, the claim simply isn’t true. I tried every trick in Python to decode the file as UTF-8, to no avail. It was clear that the file wasn’t UTF-8: its contents looked like binary in the UTF-8-friendly bash shell, and gEdit couldn’t make heads or tails of it. Only OpenOffice Calc could display the contents, and only when the encoding was set to Unicode, not UTF-8.

The first 10 bytes of the file for your perusal:

\xff\xfeN\x00a\x00m\x00e\x00

Now, what encoding would add the zero suffix ‘\x00’ to every ASCII character? Not a smart one like UTF-8, never. I Googled. And I couldn’t believe my eyes.

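The excellent response indicated the encoding was little-endian UTF-16. The leading \xff\xfe is its byte order mark, and each ASCII character takes two bytes, with the zero byte second.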
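
For anyone hitting the same wall, here is a minimal sketch of how the file could be read once you know what it is. It assumes Python 3, and it assumes the export is comma-delimited. The helper names sniff_encoding and read_contacts, the sample contact, and the "E-mail" column are mine, made up for illustration. It checks the first bytes against the byte order marks defined in the standard codecs module, decodes accordingly, and hands the text to the csv module.

```python
import codecs
import csv
import io

# Byte order marks to check, UTF-32 first because its little-endian BOM
# begins with the same two bytes as the UTF-16 one.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw, default="utf-8"):
    """Guess a codec name from a leading BOM; fall back to `default`."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return default


def read_contacts(raw):
    """Decode exported bytes and return the CSV rows as lists of strings."""
    # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM themselves.
    text = raw.decode(sniff_encoding(raw))
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample standing in for the real export file.
sample = codecs.BOM_UTF16_LE + "Name,E-mail\r\nAda,ada@example.com\r\n".encode("utf-16-le")
print(sniff_encoding(sample))
print(read_contacts(sample))
```

With a real export you would read the bytes with open(path, "rb") and pass them to read_contacts. If you already trust that the file is UTF-16, opening it in text mode with encoding="utf-16" and newline="" should do the same job without the sniffing step.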

WHAT YOU SAY???

It seems like the worst possible choice ever, from a portability standpoint. I thought that everyone had already agreed that UTF-8 would be the spiritual successor of ASCII - compact, fully representative, and portable. Not only did the Gmail team choose to disregard this standard, they misrepresented the real format they chose in the end-user documentation!

I hate to be down on Gmail, but really, that sucked. I hope I never have to spend this long buggering around with another Google goof.
