Yesterday at work, some colleagues and I had a discussion on
databases (riveting, I’m sure, for all the cubicle dwellers around us), and
somehow the topic of NoSQL came up. I
tried to explain NoSQL and to make a case for it, but in the heat of the
moment, I had some trouble conceptualizing scenarios in which NoSQL made
sense. I’m a tad introspective, so when
I left work that afternoon the thoughts of NoSQL continued to bubble and gurgle
in my mind the whole drive home.
In an effort to practice writing a bit more, I thought it
would be a good exercise to gracefully pour those thoughts out into a
refreshing pool of insight. Here goes…
My database background is primarily relational (Relational
Database – RDB). I’m mostly self-taught
when it comes to computer skills, and I remember disagreeing with my manager
early in my career over whether or not it would be good to normalize a
one-to-many relationship. I was naïve,
and I thought it would be easier to just add 10 or so child fields to our
primary Access (!) database table. It
wasn’t immediate, but that discussion was part of the epiphany that opened my
eyes to relational design.
There is a certain beauty to relational design:
- It reduces data footprints (at least if done right) – If we think of a library-type database, it’s a lot more efficient (disk-space-wise) to store an Author ID integer for each of our 3,000,000 books than it would be to store 3,000,000 separate ‘Author First Name’, ‘Author Last Name’, etc. string-based fields.
- It improves data integrity – In that same database, it would be much easier to keep up with 1 ‘Charles Dickens’ author than it would be to pick out all the various iterations of ‘C. Dickens’, ‘Charle Dickings’, etc. that people might have entered as authors for the various books.
- De-normalizing data requires work – Well-designed schemas are elegant and efficient, but it does take a little effort (for man and machine) to unravel that. Database servers are very good at that (it’s almost as though they were designed specifically for the purpose of handling data), but it’s not always a trivial thing even for them… and even trivial things take their toll when you’re being asked to do them in bulk.
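To make those three points a little more concrete, here is a minimal sketch of the library example using Python's built-in sqlite3 module. The table and column names (authors, books, author_id) are my own assumptions for illustration, not a real schema. It stores the author once, points each book at it by integer ID, and then shows the join needed to put the pieces back together.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        author_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
""")

# One author row, no matter how many books point at it.
conn.execute("INSERT INTO authors VALUES (1, 'Charles', 'Dickens')")
conn.executemany(
    "INSERT INTO books (title, author_id) VALUES (?, ?)",
    [("Oliver Twist", 1), ("Great Expectations", 1), ("Bleak House", 1)],
)

# Putting the data back together for display takes a join;
# this is the "work" from the last bullet.
rows = conn.execute("""
    SELECT b.title, a.first_name || ' ' || a.last_name AS author
    FROM books AS b
    JOIN authors AS a ON a.author_id = b.author_id
    ORDER BY b.title
""").fetchall()

for title, author in rows:
    print(f"{title} by {author}")
```

With three books the join is nothing, of course. The toll only shows up when the server is asked to do it millions of times.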
So let’s branch out a bit.
Libraries are great and all (unless you ask Ron Swanson), but video
games are more my speed, so I’m going to imagine a shooter-game.
There are probably players in the game, so an RDB would
likely need a Player table. There’s also
going to be a collection of available weapons, so we probably need a Guns
table.
Players will have guns, so we would want a many-to-many
relational table for that [PlayerGuns].
Guns also have ammo, but to make things interesting, there may be different
types of ammo for each gun (hollow-point, slug vs. pellet shells, etc.), so we
also need some tables to handle that ([Ammo], [PlayerGunAmmo]).
Maybe players can also customize their guns, so, to
keep it simple, the [PlayerGuns] table simply has a reference to our [GunSkins]
table, but stickers are also cool, so
maybe each player-gun can have multiple stickers. So, we also need [Stickers] and [PlayerGunStickers].
Guns are only part of the equation, though, so our players
also need some [Gear] (& [PlayerGear]).
Our database design is starting to get fairly complicated
now, but, again, this is what Database Servers are good at…
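Here is a rough sketch of a slice of that design, again with Python's sqlite3. I've only included [Player], [Guns], [PlayerGuns], [Stickers], and [PlayerGunStickers]; the column names and sample rows are assumptions I made up for the example. The point is how many tables the server has to stitch together just to answer "what is this player carrying?"

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Player   (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Guns     (gun_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Stickers (sticker_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Many-to-many: which player owns which gun.
    CREATE TABLE PlayerGuns (
        player_gun_id INTEGER PRIMARY KEY,
        player_id     INTEGER NOT NULL REFERENCES Player(player_id),
        gun_id        INTEGER NOT NULL REFERENCES Guns(gun_id)
    );

    -- Many-to-many: which stickers are on which player-gun.
    CREATE TABLE PlayerGunStickers (
        player_gun_id INTEGER NOT NULL REFERENCES PlayerGuns(player_gun_id),
        sticker_id    INTEGER NOT NULL REFERENCES Stickers(sticker_id)
    );

    -- Hypothetical sample data.
    INSERT INTO Player VALUES (1, 'player_one');
    INSERT INTO Guns VALUES (1, 'Shotgun'), (2, 'Pistol');
    INSERT INTO Stickers VALUES (1, 'Smiley'), (2, 'Skull');
    INSERT INTO PlayerGuns VALUES (10, 1, 1), (11, 1, 2);
    INSERT INTO PlayerGunStickers VALUES (10, 1), (10, 2);
""")

# Four tables joined just to list one player's guns and stickers.
# LEFT JOINs keep guns that have no stickers at all.
loadout = conn.execute("""
    SELECT g.name, s.name
    FROM PlayerGuns AS pg
    JOIN Guns AS g ON g.gun_id = pg.gun_id
    LEFT JOIN PlayerGunStickers AS pgs ON pgs.player_gun_id = pg.player_gun_id
    LEFT JOIN Stickers AS s ON s.sticker_id = pgs.sticker_id
    WHERE pg.player_id = ?
    ORDER BY g.name, s.name
""", (1,)).fetchall()

for gun, sticker in loadout:
    print(gun, sticker or "(no sticker)")
```

Add [Ammo], [PlayerGunAmmo], [GunSkins], [Gear], and [PlayerGear] and the query grows accordingly.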
I’m going to take a little intermission now, and
wax poetic about websites for a bit.
Let’s suppose I design this super-cool web page that
includes a “real-time” animated clock that ticks in time with the actual…well…time. Pretty awesome, right? I’m sure no one’s thought to do anything like
that before. Anyway, the way this thing
works is that a user types my URL into their browser address bar, DNS servers
track down my webserver which then receives the request, and generates a bundle
of content in response. It then ships
this content back to the user’s browser which renders the page.
Somehow (magic!), the page requests a reload every second,
and so every second, that same process repeats, and voila!, the user has a
pretty cool animated clock. Internet
speeds are good, my packet size is small (that's what she said!), and web-servers are good at serving
web-content, so it’s a pretty good user experience.
Word gets out, though, and suddenly everyone is logging on
to my page, and before you know it, my web server is having to serve up
millions of new pages every second. Before long, my page performance becomes terrible, and my 15 minutes of fame quickly
runs out as everyone grumbles about what an idiot I am.
In this scenario, I could have used JavaScript to update the
clock client-side instead of server-side.
Instead of having (potentially) millions of users all asking me (well,
my web server) to generate content, that work load can be distributed to each
person’s computer.
…There was a point to that sidebar. One of the big benefits of NoSQL is that it allows
the data workload to be distributed in
a similar fashion. Data can still be
complex and sense needs to be made of it, but if we can encapsulate it well,
then we can let a million devices do some of that work instead of forcing our
database server to do it all.
To wrap things up, a relational design for our hypothetical shooter
game would be good at keeping the data trim and well-controlled, but hard drive
space (cloud or otherwise) is cheap now, and keeping gun stickers and
ammo types consistent would likely need (at least partially) some
application logic anyway.
In our NoSQL scenario, something like PlayerInfo.json can
keep up with all that data (and more) in a nice, nested structure, and the data
server doesn’t have to fool with connecting the dots. It’s always stored as a complete package.
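As a rough idea of what that package might look like, here is a sketch in Python using the standard json module. Every field name and value below is made up for illustration; a real PlayerInfo.json would be shaped by whatever the game actually needs.

```python
import json

# One self-contained document per player: guns, ammo, skins, stickers,
# and gear are all nested inside it instead of spread across tables.
player_info = {
    "player": "player_one",
    "guns": [
        {
            "name": "Shotgun",
            "skin": "Camo",
            "ammo": [{"type": "incendiary slug", "count": 12}],
            "stickers": ["Smiley"],
        },
        {
            "name": "Pistol",
            "skin": None,
            "ammo": [{"type": "hollow-point", "count": 30}],
            "stickers": [],
        },
    ],
    "gear": ["Helmet"],
}

# What the data store would hand back: one blob, no joins.
serialized = json.dumps(player_info, indent=2)

# The client (not the database server) does the unpacking.
restored = json.loads(serialized)
for gun in restored["guns"]:
    stickers = ", ".join(gun["stickers"]) or "no stickers"
    print(f'{gun["name"]}: {stickers}')
```

The trade-off from earlier still applies: "Camo" and "Smiley" are now plain strings repeated in every document that uses them, so nothing but the application keeps them spelled consistently.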
This particular scenario is also good because there’s
not much interaction between the data “packages”. Maybe my player info tracks my kill-count
(but hopefully not my deaths), but even that (which tangentially involves other
players) doesn’t have to interact with “their” data. The application code can increment my kill
(or more likely killed) count without having to maintain strict transactional
guarantees involving outside data.
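A tiny sketch of that idea, with a plain Python dict standing in for a document store (the documents dict, the increment helper, and the field names are all hypothetical): each update touches exactly one player's document and never reads or locks anyone else's.

```python
from typing import Dict

# Stand-in for a document store: one document per player ID.
documents: Dict[str, dict] = {
    "player_one": {"kills": 0, "deaths": 0},
    "player_two": {"kills": 0, "deaths": 0},
}


def increment(player_id: str, field: str) -> None:
    """Bump one counter inside one player's document only."""
    doc = documents[player_id]
    doc[field] = doc.get(field, 0) + 1


# player_two gets me (again). These are two independent writes,
# not one transaction spanning both documents.
increment("player_two", "kills")
increment("player_one", "deaths")

print(documents)
```

If the two counts absolutely had to stay in lockstep, that would be an argument for a transaction, but for a scoreboard a brief mismatch is probably tolerable.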
Another benefit (though it can sometimes feel otherwise) is
a lack of strict schema definitions. In
a more rigid database, properties are well-defined, which is nice, because you
always know what you’re going to get, but if the application is prone to
changes, then maintaining a strict schema can be difficult. If we decide to add a bonus gun for everyone’s
birthday, then we need to add a Birthday field to the Player record in our database, but what do we do with
all those existing folks who clearly don’t have the (non-existent until now) Birthday
value filled in? Our application code
would have to handle that situation anyway, so it’s not overly cumbersome to
make it do so without a strict schema in place.
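Here is roughly what that application-side handling might look like. The "birthday" key, its ISO date format, and the gets_birthday_gun function are assumptions for the sake of the example. Older player documents simply don't have the key, and the code treats that as "no bonus gun" instead of requiring a migration.

```python
from datetime import date
from typing import Optional


def gets_birthday_gun(player: dict, today: date) -> bool:
    """Return True only if the document has a birthday matching today."""
    raw: Optional[str] = player.get("birthday")  # missing key -> None
    if raw is None:
        return False  # older document saved before the field existed
    born = date.fromisoformat(raw)
    return (born.month, born.day) == (today.month, today.day)


old_player = {"player": "veteran"}  # no birthday field at all
new_player = {"player": "rookie", "birthday": "1990-06-15"}
today = date(2024, 6, 15)

print(gets_birthday_gun(old_player, today))
print(gets_birthday_gun(new_player, today))
```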
NoSQL isn't a magic bullet (or even an incendiary slug shell with camo skin and a smiley face sticker) for every situation, but it does have its place, and it provides a nice paradigm for distributed systems involving fairly well-isolated data.