Monday, November 20, 2017

Rose-Tinted Glasses by Any Other Name

I read an interesting blog post the other day.


Although I generally agree with his position, I found the author's style a bit off-putting, because he seemed to just assume that there is a clear black-and-white, right-and-wrong nature to "right" and "left" thinking with regard to scholarship. That seemed a little simplistic to me, but, overall, I thought his point was a good one.


I had heard rumors of blacklisting against conservative-minded professors in academia, and while the author (somewhat tacitly?) acknowledges that, he goes deeper than that. It's easy to see those sorts of statistics and feel frustrated, but I appreciate that Treadgold dug deeper than conspiracies and victimhood, and looked to address root causes.


This statement was a real eye-opener:


The truth is that non-leftists are discriminated against not so much because of their politics (which they can often hide) as because of their failure to do the kind of scholarship that hiring committees want.


As a free-market guy, the statement doesn't bother me so much on the surface, but the implications behind it cause me consternation.


Reading the article and its explanations of the different philosophical approaches to scholarship and historical analysis brought to my mind advertising through contrasts.


Imagine I want to sell something for $3.00, and my friend offers to sell the same thing for $2.00.


Is his price 50% lower or 33% lower?


Well, it could be either. His price is 50% {of his price} lower than mine (as he would do well to emphasize), but it's *only* 33% {of my price} lower (as I might try to point out).
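
For anyone who wants to check my math, here is a quick sketch of the two calculations in Python. The function name `percent_lower` and its `baseline` parameter are just labels I made up for illustration; the only real inputs are the $3.00 and $2.00 prices from above.

```python
def percent_lower(higher: float, lower: float, baseline: float) -> float:
    """Return the gap between two prices as a percentage of the chosen baseline."""
    return (higher - lower) / baseline * 100


my_price = 3.00
friend_price = 2.00

# Same $1.00 gap, measured against two different baselines.
vs_friend = percent_lower(my_price, friend_price, baseline=friend_price)
vs_mine = percent_lower(my_price, friend_price, baseline=my_price)

print(f"{vs_friend:.0f}% of his price")  # 50% of his price
print(f"{vs_mine:.0f}% of my price")     # 33% of my price
```

The gap never changes; only the denominator does, and whoever picks the denominator picks the headline.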


The problem is one of ambiguity, and it's a complication that's baked into the very nature of the situation. When comparing and (more importantly) contrasting two distinct contexts or frames of reference, there is no objectively neutral context to serve as the "standard."


In advertising, is your product or the other guy's the baseline for calculating how much better, brighter, cleaner, etc. yours may be? Well, it probably depends on which one makes for a more favorable percentage.


Scholarship should be more intellectually honest than that, though. Maybe I'm biased, but it seems to me that it's more consistent (and charitable) to read history through those actors' contemporary lenses rather than our own.


I can definitely see interest in juxtaposing Shakespeare, et al. against the politics and mores of our present age, but I think we're doing a tremendous disservice to those historical figures as well as ourselves if our academic pursuits stop there and we only analyze things through the lenses we are already comfortable with.


Friday, November 3, 2017

Making a case for NoSQL

Yesterday at work, some colleagues and I had a discussion on databases (riveting, I’m sure, for all the cubicle dwellers around us), and somehow the topic of NoSQL came up.   I tried to explain NoSQL and to make a case for it, but in the heat of the moment, I had some trouble conceptualizing scenarios in which NoSQL made sense.   I’m a tad introspective, so when I left work that afternoon the thoughts of NoSQL continued to bubble and gurgle in my mind the whole drive home.

In an effort to practice writing a bit more, I thought it would be a good exercise to dump my meanderings...er, gracefully pour those thoughts out into a refreshing pool of insight.  Here goes…

My database background is primarily relational (Relational Database – RDB).   I’m mostly self-taught when it comes to computer skills, and I remember disagreeing with my manager early in my career over whether or not it would be good to normalize a one-to-many relationship.   I was naïve, and I thought it would be easier to just add 10 or so child fields to our primary Access (!) database table.   It wasn’t immediate, but that discussion was part of the epiphany that opened my eyes to relational design.

There is a certain beauty to relational design: 
  • It reduces data footprints (at least if done right) – If we think of a library-type database, it’s a lot more efficient (disk-space-wise) to store an Author ID integer for each of our 3,000,000 books than it would be to store 3,000,000 separate ‘Author First Name’, ‘Author Last Name’, etc. string-based fields.
  • It improves data integrity – In that same database, it would be much easier to keep up with 1 ‘Charles Dickens’ author than it would be to pick out all the various iterations of ‘C. Dickens’, ‘Charle Dickings’, etc. that people might have entered as authors for the various books.

That doesn’t come without a cost, though:
  • De-normalizing data (stitching it back together at query time) requires work – Well-designed schemas are elegant and efficient, but it does take a little effort (for man and machine) to unravel them.  Database servers are very good at that (it’s almost as though they were designed specifically for the purpose of handling data), but it’s not always a trivial thing even for them... and even trivial things take their toll when you’re being asked to do them in bulk.
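
To make the trade-off a little more concrete, here is a minimal sketch of the library example using Python's built-in sqlite3 module. The table and column names (`authors`, `books`, `author_id`, and so on) are my own assumptions for illustration, not anything from a real library system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        author_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
""")

# One 'Charles Dickens' row, no matter how many of his books we shelve.
conn.execute("INSERT INTO authors VALUES (1, 'Charles', 'Dickens')")
conn.executemany(
    "INSERT INTO books (title, author_id) VALUES (?, ?)",
    [("Oliver Twist", 1), ("Great Expectations", 1)],
)

# The cost: showing a book with its author means joining the tables back up.
rows = conn.execute("""
    SELECT b.title, a.first_name || ' ' || a.last_name
    FROM books AS b
    JOIN authors AS a ON a.author_id = b.author_id
    ORDER BY b.title
""").fetchall()
print(rows)
```

Each book row carries only an integer for its author, which is the space and integrity win; the JOIN at the bottom is the work the server has to do every time someone wants the full picture.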

So let’s branch out a bit.   Libraries are great and all (unless you ask Ron Swanson), but video games are more my speed, so I’m going to imagine a shooter game.

There are probably players in the game, so an RDB would likely need a Player table.   There’s also going to be a collection of available weapons, so we probably need a Guns table.

Players will have guns, so we would want a many-to-many relational table for that [PlayerGuns].

Guns also have ammo, but to make things interesting, there may be different types of ammo for each gun (hollow-point, slug vs. pellet shells, etc.), so we also need some tables to handle that ([Ammo], [PlayerGunAmmo]).

Maybe players can also customize their guns.  To keep it simple, the [PlayerGuns] table could just hold a reference to our [GunSkins] table, but stickers are also cool, and each player-gun might have multiple stickers.  So, we also need [Stickers] and [PlayerGunStickers].

Guns are only part of the equation, though, so our players also need some [Gear] (& [PlayerGear]). 

Our database design is starting to get fairly complicated now, but, again, this is what Database Servers are good at…
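
Here is a rough sketch of what a slice of that design might look like, again using Python's sqlite3 module. I've left out the ammo and gear tables to keep it short, and the column names and sample rows (a player called `player_one`, a camo shotgun, two stickers) are purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Player   (PlayerId  INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Guns     (GunId     INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE GunSkins (SkinId    INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Stickers (StickerId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE PlayerGuns (
        PlayerGunId INTEGER PRIMARY KEY,
        PlayerId    INTEGER REFERENCES Player(PlayerId),
        GunId       INTEGER REFERENCES Guns(GunId),
        SkinId      INTEGER REFERENCES GunSkins(SkinId)
    );
    CREATE TABLE PlayerGunStickers (
        PlayerGunId INTEGER REFERENCES PlayerGuns(PlayerGunId),
        StickerId   INTEGER REFERENCES Stickers(StickerId)
    );

    -- Hypothetical sample data: one player, one skinned gun, two stickers.
    INSERT INTO Player   VALUES (1, 'player_one');
    INSERT INTO Guns     VALUES (1, 'Shotgun');
    INSERT INTO GunSkins VALUES (1, 'Camo');
    INSERT INTO Stickers VALUES (1, 'Smiley Face'), (2, 'Skull');
    INSERT INTO PlayerGuns VALUES (1, 1, 1, 1);
    INSERT INTO PlayerGunStickers VALUES (1, 1), (1, 2);
""")

# Five joins just to answer "what does this player's gun look like?"
loadout = conn.execute("""
    SELECT p.Name, g.Name, s.Name, st.Name
    FROM Player AS p
    JOIN PlayerGuns        AS pg  ON pg.PlayerId     = p.PlayerId
    JOIN Guns              AS g   ON g.GunId         = pg.GunId
    JOIN GunSkins          AS s   ON s.SkinId        = pg.SkinId
    JOIN PlayerGunStickers AS pgs ON pgs.PlayerGunId = pg.PlayerGunId
    JOIN Stickers          AS st  ON st.StickerId    = pgs.StickerId
    WHERE p.PlayerId = 1
    ORDER BY st.Name
""").fetchall()

for row in loadout:
    print(row)
```

One gun with two stickers comes back as two rows, so the application still has to fold the result into a single "gun" before it can draw anything.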

I’m going to take a little intermission now, and babble...er, wax poetic about websites for a bit.

Let’s suppose I design this super-cool web page that includes a “real-time” animated clock that ticks in time with the actual…well…time.  Pretty awesome, right?  I’m sure no one’s thought to do anything like that before.  Anyway, the way this thing works is that a user types my URL into their browser address bar, DNS servers track down my web server, which then receives the request and generates a bundle of content in response.  It then ships this content back to the user’s browser, which renders the page.

Somehow (magic!), the page requests a reload every second, and so every second, that same process repeats, and voila!, the user has a pretty cool animated clock.  Internet speeds are good, my packet size is small (that's what she said!), and web-servers are good at serving web-content, so it’s a pretty good user experience.

Word gets out, though, and suddenly everyone is logging on to my page, and before you know it, my web server is having to serve up millions of new pages every second.   Before long, my page performance becomes terrible, and my 15 minutes of fame quickly runs out as everyone grumbles about what an idiot I am.

In this scenario, I could have used JavaScript to update the clock client-side instead of server-side.  Instead of having (potentially) millions of users all asking me (well, my web server) to generate content, that work load can be distributed to each person’s computer.

…There was a point to that sidebar.   One of the big benefits to NoSQL is that it allows the data workload to be distributed in a similar fashion.   Data can still be complex and sense needs to be made of it, but if we can encapsulate it well, then we can let a million devices do some of that work instead of forcing our database server to do it all.

To wrap things up, a relational design for our hypothetical shooter game would be good at keeping the data trim and well-controlled, but hard drive space (cloud or otherwise) is cheap now, and the integrity of gun stickers and ammo types would likely need some application logic involved anyway (at least partially).
 
In our NoSQL scenario, something like PlayerInfo.json can keep up with all that data (and more) in a nice, nested structure, and the data server doesn’t have to fool with connecting the dots.   It’s always stored as a complete package.   
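
As a rough idea of what such a package might hold, here is a sketch in Python. Every field name and value below is my own guess at a plausible shape, not a real game's format.

```python
import json

# A hypothetical PlayerInfo document: everything about one player, nested.
player_info = {
    "name": "player_one",
    "kills": 0,
    "guns": [
        {
            "name": "Shotgun",
            "skin": "Camo",
            "stickers": ["Smiley Face"],
            "ammo": [
                {"type": "slug", "count": 12},
                {"type": "pellet", "count": 30},
            ],
        }
    ],
    "gear": ["Helmet"],
}

# The data server only has to store and return this text as-is.
document = json.dumps(player_info, indent=2)

# Whoever reads it back gets the whole structure without any joins.
restored = json.loads(document)
stickers = restored["guns"][0]["stickers"]
print(stickers)
```

The nesting does the job the junction tables did in the relational version, and the unpacking happens on whichever device asked for the document.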

This particular scenario is also good because there’s not much interaction between the data “packages”.   Maybe my player info tracks my kill-count (but hopefully not my deaths), but even that (which tangentially involves other players) doesn’t have to interact with “their” data.   The application code can increment my kill (or more likely killed) count without having to maintain strict transactional considerations involving outside data.
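
A toy sketch of that idea, with a plain Python dict standing in for the document store. The `store` layout, the player IDs, and the `record_kill` helper are all made up for illustration.

```python
# Stand-in for a document store: one self-contained document per player ID.
store = {
    "player_one": {"kills": 3, "deaths": 7},
    "player_two": {"kills": 5, "deaths": 2},
}


def record_kill(store: dict, killer_id: str, victim_id: str) -> None:
    """Apply two independent single-document updates, one per player."""
    store[killer_id]["kills"] += 1   # touches only the killer's document
    store[victim_id]["deaths"] += 1  # touches only the victim's document


# As usual, I'm on the receiving end.
record_kill(store, killer_id="player_two", victim_id="player_one")
print(store["player_one"])
print(store["player_two"])
```

Neither update needs to read the other player's document, which is what lets each one be handled on its own rather than inside a shared transaction.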


Another benefit (though it can sometimes feel otherwise) is a lack of strict schema definitions.   In a more rigid database, properties are well-defined, which is nice, because you always know what you’re going to get, but if the application is prone to changes, then maintaining a strict schema can be difficult.   If we decide to add a bonus gun for everyone’s birthday, then we need to add a Birthday field to the Player record in our database, but what do we do with all those existing folks who clearly don’t have the (non-existent until now) Birthday value filled in?  Our application code would have to handle that situation anyway, so it’s not overly cumbersome to make it do so without a strict schema in place.
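
Here is a small sketch of how the application code might cope, assuming each player document is a dict with an optional `birthday` field stored as an ISO date string. Both the field name and the `birthday_bonus_due` helper are hypothetical.

```python
from datetime import date


def birthday_bonus_due(player: dict, today: date) -> bool:
    """Return True only if the player has a birthday on file and it is today."""
    raw = player.get("birthday")  # older documents simply lack the key
    if raw is None:
        return False
    born = date.fromisoformat(raw)
    return (born.month, born.day) == (today.month, today.day)


veteran = {"name": "player_one"}                            # created before the feature
newcomer = {"name": "player_two", "birthday": "1990-11-03"}

today = date(2017, 11, 3)
print(birthday_bonus_due(veteran, today))   # False
print(birthday_bonus_due(newcomer, today))  # True
```

Existing players need no migration; they just don't get the bonus gun until a birthday shows up in their document.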

NoSQL isn't a magic bullet (or even an incendiary slug shell with camo skin and a smiley face sticker) for every situation, but it does have its place, and it provides a nice paradigm for distributed systems involving fairly well-isolated data.