Tuesday, November 4, 2008

Whitespace In Representation and Presentation

DISCLAIMER: This discussion includes certain things which are undoubtedly studied in linguistics, a field in which I have interest, but no formal training and little knowledge. Some of the discussion below may seem trivial, or be just plain wrong. This discussion is also NOT intended as a guide to formatting your documents well. It is just an attempt to organize a few of my thoughts on this subject.

Whenever we share information, regardless of the medium, we invest some effort in giving it a format and a layout. When typing emails, we insert blank lines to separate salutations from paragraphs, and paragraphs from signatures. We insert blank lines between paragraphs to signal that the reader should embark on a new train of thought, with at least some independence from that of the previous paragraph.

For example, you most likely subconsciously accept this passage to be differentiated from the last one in some way. The experienced author may insert this formatting, or "meta-literature", almost as subconsciously as it is interpreted. The analog in verbal speech may be a prolonged pause between sentences, or a change of tone or pitch (although I'm sure this phenomenon has its own name and field of study in linguistics). Similarly, the parentheses in the last sentence arose because I wanted to express something which wasn't related to the topic I am pursuing in the longer term.

These formatting elements arise because of the nature of the communication channel in question. I will assume that written communication is largely one-dimensional on the scale of individual words and sentences, since we read from left to right. It's OK if this assumption isn't completely true, since we can imagine forcing ourselves to read (i.e., with a couple of bookmarks blotting out the surrounding sentences) in a truly left-to-right fashion. Out of this one-dimensional property of written text arises the desire for parentheses, which notify the reader that the enclosed words are a digression of some kind.

On the other hand, we insert the aforementioned blank lines in order to leverage the two-dimensional nature of most reading material. Although we read, to some extent, one-dimensionally, the words are physically laid out on a two-dimensional surface, and blank lines leverage our human proficiency in spatial visualization. The "whitespace" gives us an immediate impression of the interconnectedness of the information laid out on the page. We know, for example, without reading a single word, when we are looking at a list of items sharing some property, because there is a new line after each word or phrase. We automatically categorize the listed things as a collection whose contents have yet to be read.

A general observation is that the more whitespace there is between any two bits of text, the less closely related they are (in some context), i.e.,

  1. Text with NO whitespace separation: Same word.
  2. Text separated by one space: Distinct words.
  3. Text separated by a new line (for a reason other than being out of room on the current line): Distinct paragraphs, or items in a list.
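
As a rough illustration of the list above, here is a minimal Python sketch that recovers a nested structure from whitespace alone. The function name `whitespace_hierarchy` and the sample text are made up for this example, and treating a blank line as a stronger separator than a single new line is my own added assumption, based on the earlier remarks about blank lines between paragraphs.

```python
import re

def whitespace_hierarchy(text):
    """Group text into blocks -> lines -> words using only whitespace cues.

    Assumption (not a rule from the list above): a blank line separates
    blocks, a single new line separates items within a block, and spaces
    separate words within an item.
    """
    # One or more blank lines: the largest separation, so split into blocks.
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        # A new line separates items; runs of spaces separate words.
        [line.split() for line in block.splitlines() if line.strip()]
        for block in blocks
    ]

sample = "Dear reader,\n\nMilk\nEggs\nBread\n\nSee you soon."
structure = whitespace_hierarchy(sample)
for block in structure:
    print(block)
```

Without interpreting any of the words, the middle block comes out as three one-word items, which is the "this is a list" impression described above.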

Because of conventions, certain whitespace (and other formatting) patterns are also instantly recognized. Little bits of text in the corners of a document are traditionally things like the page number, date, etc. The signature of a message is traditionally at the bottom. However, it is the whitespace separation that allows us to distinguish the "edges" of the document from the main textual content. If the text content filled the entire page with no margins, we could not leverage our spatial perceptive capabilities to instantly look up these key "labels" on the document.

If I don't get sidetracked, my plan is to continue this discussion into recent technologies which attempt to separate data from its formatting.

Tuesday, October 28, 2008

First Blog.

This is my first use of a blogging app.