Notes From a Journey into Metathought

I’ve been working a lot of hours and watching a lot of TED talks.  I’ve been thinking a lot about thought, and I’m stumbling into some “ideas worth sharing.”

There is and has been a lot of talk about the semantic web.  At the same time, a lot of research is going into self-assembling structures.  Some of this research is into tangible structures that self-assemble, and some is into intangible structures that self-assemble.  The semantic web is an example of the latter.  What I’m beginning to see, though, is that there isn’t a lot of difference between the two.

I’m a Computer Scientist.  A good portion of my work output is abstractions.  In a way, you can think of me as a little machine that just spits out abstractions all day long and then links them together with transformations to form primitive automata.

In thinking about the semantic web and all these great self-replicating/self-assembling structures in the world around us, I found myself wondering — what gives me the ability to form these abstractions?  As soon as I asked the question, I realized it was a bad one as any answer could be philosophically debatable.  So then I thought, how do I form abstractions?

Well, before I can answer that, I need a clear definition of what an abstraction is.

From Princeton’s WordNet –

A concept or idea not associated with any specific instance; “he loved her only in the abstract–not in person”

From Wikipedia –

In computer science, abstraction is a mechanism and practice to reduce and factor out details so that one can focus on a few concepts at a time.

Those both do nicely, but a recent talk I watched inspired me to stop consulting authorities on absolute truths and operate on what I know to be true.  My definition of an abstraction is a concept, either simple or compound, that unifies other more complex concepts in some way.

In the semantic web, the tag is the perfect form of abstraction.

A tag is an identifier that acts as a commonality between multiple pieces of content.  The type, structure, source, and location of the content don’t matter.  It’s content.  It’s somewhere, and it’s linked to other content through one or more of these little identifiers.

Except a tag itself has content.  Its relative uniqueness serves as an identifier for the abstraction, but the content of the tag qualifies the commonality between the content that it annotates.

It’s becoming obvious that my definition of an abstraction relies too much on some common context between me and the reader.  Let’s do a bit better than that.

Imagine a space of unknown or possibly even infinite dimension (okay, that’s kind of hard to do, but just roll with it).  Let’s call this thoughtspace.  Each concept or idea is a vector in thoughtspace.  If two vectors are orthogonal, they share no common component.  If they are not orthogonal there is some commonality.  These vectors are each unique pieces of information, but they each reproduce a part of the other.  That is, they’re not independent from one another.  An abstraction could be defined as the common component between two or more vectors in thoughtspace.  It is the piece of information that is not unique within the set of concepts that it abstracts.

As an aside, orthogonality is measured with a dot product.  If the dot product of two vectors is zero, they are orthogonal.  In thoughtspace we’d probably say that an abstraction between two vectors that are nearly orthogonal is very abstract, while two vectors that are nearly the same would yield an obvious abstraction.  This means that the dot product (if it were possible to compute) could be used as a quantifier for how abstract (far reaching) a given abstraction is.
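To make the aside concrete, here is a minimal sketch in plain Python.  The three-dimensional vectors are made up purely for illustration (real thoughtspace has no known coordinates), and the helper names are my own.  One assumption beyond the text: the raw dot product grows with the length of the vectors, so the sketch also normalizes it (the cosine) to compare how much two concepts overlap regardless of their size.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product divided by both lengths; 0 means orthogonal, 1 means same direction."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def common_component(u, v):
    """Projection of u onto v: the part of u that lies along v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * b for b in v]

# Hypothetical concepts, invented for this example.
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
c = [1.0, 1.0, 0.0]

print(dot(a, b))               # a and b share nothing
print(cosine(a, c))            # a and c partially overlap
print(common_component(c, a))  # the piece of c that a reproduces
```

A cosine near zero would correspond to a far-reaching abstraction, and a cosine near one to an obvious one.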

Here’s the amazing thing that’s happening here.  Tags aren’t actually unique identifiers.  Remember that a tag has content.  That means that the tag itself is a vector in thoughtspace.  For a tag to be truly unique from all other tags it must be orthogonal to all other tags.  In a space of dimension greater than zero, no nonzero vector can be orthogonal to every other vector (for one thing, it always overlaps with scaled copies of itself), and therefore tags aren’t actually unique.  This is very powerful.

Call me a futurist, but I have a feeling that self-assembling abstractions aren’t too far away.  As long as these abstractions don’t pair up with self-assembling transformations to form self-assembling automata that put me out of a job, I’ll be very excited to see it happen!


The Google App Engine

Never before have I been so torn on a new piece of technology.  On one hand, you’ve got access to Google’s amazing infrastructure.  On the other, you’re very much confined in the way you write your app.  No piece of the App Engine highlights my conflict like the datastore.

I come from a broad background.  Relevant to this article, I’ve done a lot of Object Oriented programming, and a lot of simple work on relational databases.  The App Engine datastore has relational features, but it is not a relational database.  I say again, not a relational database.

In creating a data model for a relational database, you think a lot about how best to normalize your model.  That is, you want to be certain that every entity (row) in your database is representative of one, and only one, real-world concept.  You get used to the idea that you can build complex query filters based on normalized data.  You don’t worry so much about the queries you will be performing in modeling your data; instead, you worry about how best to represent the data within a normalized relational model.  You figure that if you model your data properly, any query you can imagine can be run through powerful SQL.

The datastore isn’t so nice.  You don’t necessarily have to account for all of your filtering parameters ahead of time, but it certainly helps.  See, on the datastore, a query can only apply inequality filters (<, <=, >, >=) to a single property.  This means that if you want to range-filter on multiple properties, you need to find a good way to represent them as a single property on which to filter.  Wave bye to normalization.  And those joins you’ve come to find so endlessly useful?  Don’t count on them being there to help you out.

Let me save you some trouble.  First, start off modeling the data like you normally would.  Wherever you run into the query limitations, provide special denormalized properties for the sole purpose of filtering queries.  For instance, if you need to filter on two or more string properties, build a single property that is the concatenation of these strings, then filter on it.  If you need to filter and order, be sure to consider the lexicographic ordering of the property you build.  Consider using the @property decorator or overriding db.Model.put() to build your composite filter properties automatically from your normalized properties.
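Here is a rough sketch of the composite-property idea in plain Python.  It deliberately does not use the real db.Model; the Article class, its author and category fields, the separator, and the helper names are all hypothetical stand-ins, so that the example runs anywhere.  It shows the composite being rebuilt inside put(), and why numbers need zero-padding before they go into a string you intend to sort on.

```python
# Hypothetical separator: a character that should never appear in the parts
# and that sorts before every printable character.
SEPARATOR = "\x00"

def composite_key(*parts):
    """Join normalized string values into one denormalized filter value."""
    return SEPARATOR.join(parts)

def padded(number, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(number).zfill(width)

class Article(object):
    """Plain-Python stand-in for a datastore model (not the real db.Model)."""

    def __init__(self, author, category):
        self.author = author
        self.category = category
        self.author_category = None  # denormalized, exists only for filtering

    def put(self):
        # Rebuild the composite right before saving so it cannot drift
        # away from the normalized properties it is derived from.
        self.author_category = composite_key(self.author, self.category)
        return self

article = Article("alice", "python").put()
print(repr(article.author_category))

# Unpadded numbers sort as text, which puts "10" ahead of "9".
print(sorted(["10", "9"]))
print(sorted([padded(10), padded(9)]))
```

The same padding trick applies to anything else you fold into the composite, such as dates, if you expect to order by it.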

Keep in mind the 1000 entity query limitation.  You will need to use a counter to get around this limitation, preferably a sharded one so that you eliminate write contention.  The idea is that you store the value of the counter in a property and use that property to paginate your queries.  Of course, if you need to paginate filtered queries, you’ll need to create a filter property that is a composite of the counter value and the property/properties you want to filter on.  Again, consider lexicographic ordering if you need sorting.
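As a sketch of what a sharded counter and counter-based pagination might look like, here is an in-memory version in plain Python.  Nothing here touches the real datastore: ShardedCounter, page_range, and the shard count are names and values I made up for illustration.  In a real app each shard would be its own entity, so that concurrent writers rarely collide on the same one.

```python
import random

class ShardedCounter(object):
    """In-memory stand-in: each list slot plays the role of one shard entity."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self):
        # Writers pick a shard at random, spreading writes across shards.
        index = random.randrange(len(self.shards))
        self.shards[index] += 1

    def value(self):
        # Reading the counter means summing every shard.
        return sum(self.shards)

def page_range(page, page_size):
    """Bounds for a query like: seq >= start AND seq < end.

    Assumes entities carry a hypothetical 'seq' property numbered from 1.
    """
    start = page * page_size + 1
    return start, start + page_size

counter = ShardedCounter(num_shards=4)
for _ in range(250):
    counter.increment()

print(counter.value())
print(page_range(0, 1000))
print(page_range(2, 1000))
```

Note that handing out strictly unique sequence numbers from a sharded counter takes more care than this sketch shows, since summing the shards and saving the entity are separate steps.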

If you need transactions, you’ll need entity groups.  If you read from the datastore as a sanity check before inserting/updating some information in the datastore, you need transactions.  The rule is that a transaction can only operate on entities within the same entity group.  If an entity does not have a parent, it is the root of an entity group. Two entities that have no parent are in different entity groups.
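The entity group rule can be sketched without the datastore at all.  In the plain-Python model below, Entity, group_root, and same_entity_group are hypothetical names of my own; the point is only that an entity's group is determined by walking up its parents to the root, and that a transaction is possible only when every entity involved reaches the same root.

```python
class Entity(object):
    """Minimal stand-in for a datastore entity with an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

def group_root(entity):
    """Walk up the parent chain; an entity with no parent is its own root."""
    while entity.parent is not None:
        entity = entity.parent
    return entity

def same_entity_group(*entities):
    """True if every entity shares one root, so one transaction could cover them."""
    roots = set(id(group_root(e)) for e in entities)
    return len(roots) == 1

account = Entity("account")
order = Entity("order", parent=account)
line_item = Entity("line_item", parent=order)
other_account = Entity("other_account")

print(same_entity_group(account, order, line_item))
print(same_entity_group(order, other_account))
```

This is why the choice of parent has to be made up front with your transactions in mind.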

Memcache is your friend.  If you find yourself performing queries on the same entities frequently, you should consider caching them.  If you need any short-term persistence, you should consider memcache.
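The usual pattern here is to check the cache first and only fall back to the datastore on a miss.  The sketch below uses a tiny dictionary-backed FakeCache in place of the real memcache service, and load_profile stands in for whatever query you would otherwise repeat; both are hypothetical and exist only so the example runs on its own.

```python
class FakeCache(object):
    """Dictionary-backed stand-in for a cache with get/set operations."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

def cached_fetch(cache, key, load):
    """Return the cached value for key, calling load() only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value)
    return value

query_log = []

def load_profile():
    # Pretend this is an expensive datastore query.
    query_log.append("query")
    return {"name": "alice"}

cache = FakeCache()
first = cached_fetch(cache, "profile:alice", load_profile)
second = cached_fetch(cache, "profile:alice", load_profile)

print(first == second)
print(len(query_log))
```

Keep in mind that a cache can drop entries at any time, so anything stored this way must be something you can rebuild.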

I know this seems like a lot of work to model your data, but it’s work you’d be performing anyway to achieve the kind of scalability that App Engine has to offer.


The Obligatory First Post

Oh look, a blog!  You must be so curious as to why I’m writing this blog.  You’re not?  Oh, but c’mon.  Let me tell you!

No?  Okay, fine.

Sorry.  This is the obligatory first post.  I’ll have real content soon.  I promise.
