Thursday March 7, 2002
So far on my programming project (I'm calling it 'posterchild'), I spend most of my time worrying about codepages and the escaping of specialized markup or punctuation characters. Separating content from design - or designing the separation of the separation. This is disappointing. I wish I was designing new ways to structure my content and narratives. To provide interesting context for the reader. To make recycling the garbage I've already written easier for anyone. As I tried to mention last week, this is the fault of the current crop of X-related technologies. XML etc. force me to process textual information with tools that are the equivalent of assembly language programming.
Unicode is a pretty dry subject, and until now only software engineers in charge of localization have been forced to deal with it. But I, as a not-so-humble writer, have to wallow in the details of Unicode or XSLT syntax if I want to do something as simple as have curly quotes in my writing, or make sure a tool I'm using to compose a piece isn't inserting characters that will not display on someone else's computer.
Do you think this is a common thing for writers to deal with? Hell no. Most of them don't even have the most recent copy of Word. In the old days you had grunts to rework and misinterpret your type and/or handwriting before publishing. Think about what happened to James Joyce: fuckups who can't grok some cutting-edge shit creating entire industries of critical analysis bent on figuring out the real intentions of the writer.
I don't care about internationalization of my writing. Maybe that will matter some day but not now. I'm just doing it for the punctuation and IN-LINE MARKUP. Oh, the place that "standardization" has got us to in regard to in-line markup is one of the best yet! Most XML proponents claim that I, as a writer (because of some 9-year-old girl who's reading my shit in Braille by the fire in a cabin in the woods), don't really want *bold* text, I want <strong> text. They say that bold is just plain wrong. It has no semantics. What bullshit. They say, "It is the job of the author to indicate the meaning and let the rendering device determine the best way to communicate the meaning of the words." That makes sense on some levels but I just want my texts to be text. If you know how to use it, text is rich enough. I'm a professional after all and I put a lot of time into rearranging words.
They (and I mean the SGML and W3C nerds that spec'd this shit out) build the tools assuming I'm some stupid "knowledge worker" writing a software manual who tries to use Word like it's a typewriter. These standards are reactionary. Because, I admit it, there are a lot of dumb people out there, bastardizing poor little web standards. But have you looked at their web pages? They suck. I can immediately spot your basic weak-ass CSS-driven box-model crap. This is the hand-me-down toolset that the poor artist gets from the business world. Some of it is genuinely soul-sucking and evil. Well, I won't give up just because it's hard. Now that I know how much a computer mangles type and that text files have a codec the same way computer video and audio have codecs, I can't ignore it. Encoding schemes are central tools to a digital artist.
I thought XML was the magic high-level text encoding scheme I was waiting for. It does allow a writer to boost the meaning in their words, layering meta information over a narrative, placing hooks into the stream of words that can be used later for undiscovered purposes. Its descriptive power is one level higher than HTML, and I don't have to litter my writing with programming calls to a web browser. (What a revolution!!) Everybody seems to be considering it the preferred archival format because it is text-based and human-readable. But XML was the asshole that dragged Unicode and XSLT (the most fucked-up code I've seen since my attempts at 16-bit Windows GUI programming in C) in front of my face. I know of no way to work with Unicode except through programming languages. I suppose you could script a text editor to open all your files and then convert the codepages and re-save them as UTF-whatever the hell. But I doubt it would work.
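For what it's worth, the convert-and-re-save step is short when it's done in a programming language instead of a text editor. A minimal sketch, assuming the old files are in Windows-1252 (a guess, nothing above says which codepage) and using a made-up helper name, `reencode`:

```python
def reencode(data: bytes, source: str = "cp1252") -> bytes:
    """Decode bytes from an assumed legacy codepage and re-encode as UTF-8."""
    # Decoding raises UnicodeDecodeError if the data contains a byte
    # that the assumed codepage does not define.
    text = data.decode(source)
    return text.encode("utf-8")


# In cp1252 the curly apostrophe is the single byte 0x92;
# in UTF-8 the same character takes the three bytes E2 80 99.
legacy = b"I\x92ve had it"
converted = reencode(legacy)
print(converted)
```

The catch is the guess itself: a file doesn't record which codepage it was saved in, so a wrong `source` either fails or quietly produces the wrong characters.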
XML has its own encoding problems that are !< Unicode, but the Unicode thing was something that I'd never anticipated. Now I have to escape Unicode strings into browser-understandable general entities and I have to escape HTML markup inside of XML markup and I have to escape XML markup inside of XSLT markup and I have to escape XSLT markup inside of Python code. That's all so I can print the apostrophe in the first word of this sentence. There is no intelligence built into the layers. (I should mention that in the present entry, this is not exactly the case: I've decided to do this entry in UTF-8 but only utilizing "lower 128 ASCII" punctuation marks so one level of those escape sequences is reduced. Whew!) In networking these things take care of themselves: Ethernet and IP and Winsock and HTML use true layer abstraction. The hard stuff is done by driver programmers who manage the transition between layers. But now I'm that driver programmer.
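To make the stacking concrete, here is a small sketch of two of those layers using only Python's standard library. The sample string is invented for illustration, and this is not the posterchild code, just the general shape of the problem:

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape

snippet = "<strong>AT&T</strong>"

# Layer 1: make the markup show up as literal text in an HTML page.
as_html_text = html_escape(snippet, quote=False)

# Layer 2: carry that already-escaped string as character data inside XML.
# Every ampersand produced by layer 1 gets escaped a second time.
as_xml_text = xml_escape(as_html_text)

print(as_html_text)
print(as_xml_text)
```

Each layer only knows its own rules, so the escapes pile up, and whoever writes the glue has to remember how many layers to undo and in what order.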
>>> print 'I’ve had it'.encode("html-utf-8")
I’ve had it
In text processing, the distinctions I make range from the categorical difference between this being a rant or a review, through the semantic difference between a summary paragraph and a body paragraph, down to how many bytes of computer memory are allocated to represent the RIGHT SINGLE QUOTATION MARK. That html-utf-8 codec listed above is obviously doing some bad byte allocation. It should have been I&#8217;ve had it. But what are you going to do? I downloaded it off some ftp server at Xerox PARC and there was NO documentation, and this is the only method I've found for escaping Unicode characters above ASCII 127 into HTML entities.
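The difference between escaping a character and escaping its bytes can be sketched with the standard library's `xmlcharrefreplace` error handler, which exists in current Python and is not the undocumented codec used above. The per-byte version is only a guess at the kind of mistake being complained about, not a reconstruction of that codec:

```python
text = "I\u2019ve had it"  # U+2019 is RIGHT SINGLE QUOTATION MARK

# One numeric reference per character: the code point, not its bytes.
per_character = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# One reference per UTF-8 byte: an assumed illustration of "bad byte allocation".
per_byte = "".join(
    chr(b) if b < 128 else "&#%d;" % b
    for b in text.encode("utf-8")
)

print(per_character)  # I&#8217;ve had it
print(per_byte)       # I&#226;&#128;&#153;ve had it
```

A browser reads the first form as one curly apostrophe and the second as three unrelated characters.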
I guarantee you that nobody is going to do this for me. I don't want to count how many times I've had to remove hard line-breaks from a piece of text because it went through email or some text editor. I don't think most people know that you can do a search and replace on a piece of formatting like an end-of-line in Word but you can. Not to mention the difference between Unix and Windows line breaks.
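Both chores can be scripted. A rough sketch, assuming blank lines mark the real paragraph breaks (the function names here are made up for the example):

```python
import re


def normalize_newlines(text: str) -> str:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def unwrap(text: str) -> str:
    """Join hard-wrapped lines, keeping blank lines as paragraph breaks."""
    text = normalize_newlines(text)
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace inside a paragraph to one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


mangled = "This went through\r\nsomeone's email\r\nclient.\r\n\r\nSecond paragraph."
print(unwrap(mangled))
```

It will also flatten anything that was meant to keep its line breaks, like verse or code, so it's a tool to run by hand on one piece at a time.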
On a previous attempt at redesigning my website I ran into Cascading Style Sheets and Netscape 4.x and it sent me reeling. I vowed to never go near HTML again and never to trust "rendering devices". Maybe this is why there are no abstraction layers between my rants and memory registers. If I trusted the next level down or up it would be fine but when you've got shit like HTML and Netscape to work with....well, fuck that. I trust Ethernet. The guys who designed that (my man Richard Johnson!) made it so what you put in one end of the pipe came out the other. Wow, how novel.