Monday, January 24, 2011

Blog Post #2 - Weinberger Prologue, Ch. 1


Most important concepts

Weinberger uses the example of the Staples Prototype Lab to illustrate the importance of information availability for a user. He also highlights the extreme structural difference between information availability in the physical world and in the digital world. Finally, he introduces the pitch line, or hook, for the rest of the book: that the structure of information availability shapes the content and character of the information itself, and how we use it.
“The physical limitations that silently guide the organization of an office supply store also guide how we organize our businesses, our government, our schools. They have guided—and limited—how we organize knowledge itself. From management structures to encyclopedias, to the courses of study we put our children through, to the way we decide what’s worth believing, we have organized our ideas with principles designed for use in a world limited by the laws of physics.”
I think it's also worth noting that these ideas are limited and organized less by the actual laws of physics than by the perceived order of the physical world, which is shaped as much by cultural values as by observable physical laws.


I've written a semantically informed search engine before, and the concept of a store frontend that puts products in “every different category in which users might conceivably expect to find them” ties into that ideal neatly. Amazon and Netflix both produce recommendations on the basis not only of the shopper's preferences but also of association: a sort of semantic database in which items gain relevance through their associations with other items. A system that knows how users group items together knows, in a way, what an item actually is. A system like this could use a semantic engine like the one I built for searching news stories to associate products with categories and even (with some guidance) invent new categories. If a significant subset of the individuals who purchased a new shower curtain also purchased a curtain rod, curtain rings, a bathmat, soap dishes, etc., then the system could identify this common associative group and bring it to the attention of the people running the storefront. The proprietors could then label this group "new bathroom" or something similar.
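The co-purchase grouping in the shower-curtain example can be sketched as a simple co-occurrence count over purchase baskets. This is a minimal sketch under stated assumptions, not the semantic engine mentioned above: `BASKETS` is made-up data, `co_purchase_group` is a hypothetical helper, and the 50% `min_share` threshold is an arbitrary choice. The code only surfaces a candidate group; naming it stays a human decision, as the post suggests.

```python
from collections import Counter

# Hypothetical purchase history: each basket is the set of items one customer bought.
BASKETS = [
    {"shower curtain", "curtain rod", "curtain rings", "bathmat"},
    {"shower curtain", "curtain rings", "soap dish", "batteries"},
    {"shower curtain", "curtain rod", "bathmat", "soap dish"},
    {"stapler", "printer paper"},
]


def co_purchase_group(baskets, anchor, min_share=0.5):
    """Return items bought alongside `anchor` in at least `min_share`
    of the baskets that contain `anchor`, sorted alphabetically."""
    with_anchor = [basket for basket in baskets if anchor in basket]
    if not with_anchor:
        return []
    # Count how many anchor-containing baskets each other item appears in.
    counts = Counter(item for basket in with_anchor for item in basket if item != anchor)
    total = len(with_anchor)
    # Keep only items that co-occur with the anchor often enough (assumed threshold).
    return sorted(item for item, n in counts.items() if n / total >= min_share)


if __name__ == "__main__":
    group = co_purchase_group(BASKETS, "shower curtain")
    # A human proprietor, not the code, decides what to call the group.
    print({"new bathroom": group})
```

Run as a script, this prints `{'new bathroom': ['bathmat', 'curtain rings', 'curtain rod', 'soap dish']}`. "batteries" appears in only one of the three shower-curtain baskets, so it falls below the threshold.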

I spend a good deal of time trying to keep my current projects organized, putting time into all of them, and still having a life. I review what I've done regularly and try to put a little time into each of these projects at least every 3-4 weeks, but the "order" I've chosen doesn't work very well for me. The projects are too disparate: two versions of Visual Studio, a couple of web servers, two different academic papers, and a creative writing project, plus all of the homework assignments and extracurricular work-related activities that come up. I think the best strategy for reducing this disorder would be to pick one or two projects and just finish them, cutting down the time I spend juggling them.

Chapter 1

Most important concepts

Weinberger loves the concept of miscellaneity perhaps too much; I find the idea of a battle against the miscellaneous to be the most important point in this chapter. Order may not reflect the natural state of the world, but it is the preferred means of human interaction with reality: we as a species advanced beyond small pack units living and dying with the seasons solely by reorganizing both the physical world around us and our perception of it.
The quote below about photos of Aunt Sally makes a valuable point: the weight of the data we produce in modern life exceeds our capacity to organize it. This is especially problematic from the perspective of preservation. How can we save the significant portions of this information if we cannot find it?
“When you have ten, twenty, or thirty thousand photos on your computer, storing a photo of Aunt Sally labeled ‘DSC00165.jpg’ is functionally the same as throwing it out, because you’ll never find it again.”
“The real world, though, limits the amount of additional data we can supply: Staples has to keep the product information labels on the shelves small enough so they won’t obscure the product; a manila folder’s label can’t have more than a few dozen characters on it without becoming illegible; and if previous students have already highlighted every other sentence in your textbook, the marks you make won’t add much information at all.”

Chapter 2

I didn't get to Chapter 2 because I still haven't received the book... The other two books have arrived, naturally.


  1. It is very interesting how you tied the key points of the book to your own personal life. Your experience building a search engine and working on web servers must give you a different, more technical perspective on Weinberger's book. All those projects sound like a lot of work. I have a similar project, building a PHP-based dynamic website, and I understand how hard it is to finish them, but I like your goal of completing at least a couple of them in order to minimize disorder.

  2. I agree with your point about the weight of the data we produce in modern life exceeding our capacity to organize it. I have a folder with pictures from two deployments, and a lot of them are not labeled, so when I want to see pictures from when I was in Baghdad, I can’t find them because I failed to label them correctly. I have to open the folder that holds the pictures from Iraq and browse through them almost one by one to find what I’m looking for. Sometimes I feel like I have thrown them away, since I can’t find what I’m looking for.
    I have to agree with you that Weinberger loves the concept of the miscellaneous and of organization. I agree with his idea that we strive to keep everything organized and work very hard at it. We live in a fast-paced world in which, if we are not organized, time slips away.

  3. Great post, Tom. If others were smart, they'd "borrow" your notes CliffsNotes-style. Also, given you've actually written a semantically based search engine before, you're coming at this from an entirely different perspective than most of us. Feel free to pipe up in class at any point if it seems I'm full of it. I'm all theory on this one, and zero practice.

  4. Semantic search technology really is only as good as the data going into it: if you inform or train the system with a small corpus, it'll perform well with that corpus but not much else. What any engine of that sort needs to know is how the most frequently appearing words relate to each other. It can "learn" that in two ways: by analyzing how often words co-occur (within a sentence, etc.) and by watching how often users actually follow its search results. If it makes an informed guess (by throwing together a list of associations ordered by weight, descending) and the user encourages it by clicking the link, it strengthens the associations it used to make the guess.

    It's a long way from that sort of engine to the actual semi-intelligent stuff that Google's working with/on, though they're (at least in principle) the same. The difference is really the astronomically huge database that Google has to work with and the fact that they have hundreds of actual geniuses working on the problem. =)
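The two learning signals described in the last reply, sentence-level co-occurrence and click feedback, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's actual engine: sentences are split on punctuation, `STOPWORDS` is a hypothetical hand-picked list, every co-occurring pair gets a flat +1.0, and a click reinforces only the query/clicked pair by a hypothetical `boost` of 1.0. `AssociationIndex`, `train`, `related`, and `reinforce` are illustrative names.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stopword list so "the", "a", etc. don't dominate the associations.
STOPWORDS = {"a", "an", "the", "is", "by", "of", "and", "to", "in"}


class AssociationIndex:
    """Word-association weights learned from sentence co-occurrence and user clicks."""

    def __init__(self):
        # Keyed by an alphabetically ordered word pair (a, b) with a < b.
        self.weights = defaultdict(float)

    def train(self, text):
        """Add 1.0 to every pair of distinct words that share a sentence."""
        for sentence in re.split(r"[.!?]+", text):
            words = set(re.findall(r"[a-z']+", sentence.lower())) - STOPWORDS
            for a, b in combinations(sorted(words), 2):
                self.weights[(a, b)] += 1.0

    def related(self, word, k=5):
        """The 'informed guess': associated words ordered by weight, descending."""
        scores = []
        for (a, b), weight in self.weights.items():
            if a == word:
                scores.append((b, weight))
            elif b == word:
                scores.append((a, weight))
        scores.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically
        return scores[:k]

    def reinforce(self, word, clicked, boost=1.0):
        """A click on a suggested result strengthens the association that produced it."""
        a, b = sorted((word, clicked))
        self.weights[(a, b)] += boost


if __name__ == "__main__":
    idx = AssociationIndex()
    idx.train("The curtain rod holds the shower curtain. A bathmat sits by the shower. The rod is steel.")
    print(idx.related("shower"))  # all candidates tied at 1.0 before any feedback
    idx.reinforce("shower", "rod")  # the user clicked the "rod" suggestion
    print(idx.related("shower")[0])  # ('rod', 2.0): the click moved it to the top
```

With a tiny corpus like this, every association is tied, which illustrates the point that the engine is only as good as its data; the click feedback is what breaks the tie.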