Why use Data Visualisation

According to the World Economic Forum (WEF), the world produces almost 2.6 quintillion bytes of data every day.

The data creation rate increases every day, and the data produced within the last two years amounts to approximately 90% of our total recorded data!

With this ever-increasing amount of data, making sense of it becomes a challenge.

This is where data visualisation helps.

What is data visualisation?

Data visualisation is a way to communicate information using charts and tables.

Data visualisation is a technique widely used in today’s world to communicate insights from data through visual depiction.

Its main goal is to transform large datasets into visual graphics to allow for an effortless understanding of complicated connections or correlations within the data.

Commonly used terms such as information graphics, statistical graphics, and information visualisation refer to the same technique called data visualisation.

Why data visualisation?

People absorb and retain information better when it is portrayed visually. If the data is displayed more visually, such as through visual maps, individuals are 17% more productive and need to use 20% fewer mental resources.

What’s more, teams collaborating on a joint project use 10% fewer mental resources and are 8% more productive when using visualisation tools.

Data visualisation is an indispensable part of most companies across the globe. This is because our brains understand a visual depiction much more quickly than a written description of something.

A graphical representation of information helps us see large datasets clearly and in a cohesive way and comprehend that information significantly faster.

It has completely changed the way data analysis works.

It has helped analysts dig more insights out of data while offering a more imaginative perspective on it.

Data visualisation is used in many walks of life

In the corporate environment, the use of data visualisation is extensive. Dashboards are interactive data visualisations.

Dashboards portray business information to users in a way that is simple for them to absorb. Well-designed dashboards should answer the initial business question and also any follow-on questions.

For example, a dashboard may show sales vs budget. It could also show which products are contributing to those sales, which products are up YoY, which are down, etc.
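As a rough sketch of the numbers behind such a dashboard, the snippet below computes sales vs budget and the year-on-year change per product. The product names, figures, and helper names (pct_change, dashboard_rows) are all made up for illustration.

```ruby
# Hypothetical figures for illustration only; none of these come from a real business.
Product = Struct.new(:name, :sales, :budget, :sales_last_year)

PRODUCTS = [
  Product.new('Widgets', 120_000, 100_000, 110_000),
  Product.new('Gadgets',  80_000,  95_000,  90_000),
  Product.new('Gizmos',   50_000,  50_000,  40_000)
].freeze

# Percentage difference between an actual value and a reference value.
def pct_change(actual, reference)
  ((actual - reference) * 100.0 / reference).round(1)
end

# One summary row per product: the headline answer (sales vs budget)
# plus the follow-on answer (up or down year on year).
def dashboard_rows(products)
  products.map do |p|
    {
      name: p.name,
      vs_budget_pct: pct_change(p.sales, p.budget),
      yoy_pct: pct_change(p.sales, p.sales_last_year)
    }
  end
end

dashboard_rows(PRODUCTS).each do |row|
  trend = row[:yoy_pct] >= 0 ? 'up' : 'down'
  puts format('%-8s %+6.1f%% vs budget, %s %.1f%% YoY',
              row[:name], row[:vs_budget_pct], trend, row[:yoy_pct].abs)
end
```

A real dashboard would read these values from a database and draw them as charts, but the underlying comparisons are the same.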

In day-to-day life, data visualisation is also prevalent. For example, infographics are often used to portray a message or a story in a visual form.

Building good data visualisations

Designing and building good data visualisations is a skill.

To build business dashboards, the developer needs a great understanding of the business and strong technical knowledge. This combination is rare, so it is often more efficient for businesses to turn to a data visualisation consultancy.

Infographics are also difficult to create. Often these are developed by large agencies. Often professional graphic designers design the infographic and give it to a developer to build.

The most important aspect of an infographic to grab attention is the appearance.

This differs from dashboards, where the most important aspect is the information they provide. In the business environment, where time is precious, users expect information in as few clicks as possible.

Therefore dashboards should be kept as simple as possible to ensure the key information is portrayed quickly to the user.

To make your data visualisation effective make sure to display only relevant facts with simple graphics.

Make use of colours that are pleasing to the eye. Also, keep in mind colour blindness and try to avoid a red-green colour scheme in dashboards.

In addition, incorporating interactivity in dashboards improves the user experience. Enabling users to drill down allows them to answer multiple related business questions in one place. Finally, keep your visualisation updated with the latest data to stay relevant to your audience.

What Makes Content Viral? A BuzzFeed Developer Opines

I spend more time than most (and, some would argue, more than is mentally healthy) viewing virally-charged social content. As a front-end developer at BuzzFeed.com “putting together a new model for Internet journalism”, I work off of a local database crammed full of millions (yes millions) of cats, awkward situations, bears in awkward situations, science facts, corgis, a little too much Ryan Gosling, and all manner of oddities and amazingness. I discovered early in my tenure that refreshing this database was a procrastination time-bomb, but also that the tech team here is probably the most productive and happiest because we laugh and discuss these posts all day while cranking out code (it’s scientifically proven!).

With enough viral content under my belt to fill all the mason jars in a hipster’s home, it is inevitable that I think about what these posts and my “wasted time” mean. Why do I care? Why do I share?

There have been many attempts at describing the unique quality of virality (machine learning patterns, ability to generate discussion, igniting an emotional response, social approval), but a recent article (BuzzFeed, of course) on lucid dreaming sparked a revelation in my understanding of what all the fuss is about. As Jung noted, all dream images reveal something about yourself and humanity through the subjective lens of your prior experiences. It occurred to me that what we call viral content is really just a hyper-effective mechanism to expand that lens, and therefore test that we’re not dreaming. Bear with me here.

I propose that what gives us pleasure from these posts (and makes us want to share them) is that they represent or uncover something pleasurable that we could never have conceived with our own minds. By expanding our understanding of reality in a meaningful way, we are transformed, and naturally desire to share that experience with others. Here are some examples: After viewing AMC’s Walking Dead and a few other flicks, it is easy to imagine those half-dead bundles of joy, and therefore it wouldn’t be a quality test in a dream to see a Zombie and think we’re awake. Accordingly, just imagine the reaction to sending a photo of a Zombie to a friend and saying ‘You’ve got to check this out, crazy OMG: It’s dead, BUT ALIVE!’. However, after viewing a post on Peanut Zombies, I’m struck that I could never have imagined something so pleasurably creative in so many ways. Proof I’m not dreaming, consciousness expanded, will never look at peanuts the same again. As another example, take this Ghostbusters movie photo. Boring. But introduce a photo of the actual miniaturized movie set, and again, bam, OMFG. Even when it comes to cats and animals, we’re hardwired to consume Cute, but prefer the cutest of the cute: that which expands our conception of cute. Yes, there is such a thing. If I think that it will also be proof to others that they aren’t dreaming, I’ll share. The stronger I feel this way (subconsciously or consciously), the more likely I am to hit send.

All of this begs the question of why a viral phenotype even occurs within us in the first place. Like most things in life, there’s a reason behind the reaction, and with viral content I believe it’s our species’ mechanism for making sense of the world around us and connecting across generations. By creating and spreading mind-expanding content, humans as a group learn how to adapt, conceptually explain, and revel in this thing we call life and the lives that came before. Take for example the BuzzFeed Time Machine. In a profound statement, navigating through the decades reveals that life in the 1950s was also not easy, going on set behind the scenes was just as much fun in 1960, and cats were just as cute in the 1920s. The actors change, but the movie remains remarkably the same. We’ve simply increased the velocity, upped the dosage, and expanded the scope.

Like any addiction, the bar is continually raised for what content triggers virality, particularly when you view dozens of posts per day like it’s your job (because it is my job…). As your knowledge lens grows, so does an all-consuming demand to expand it more. Welcome to our world today. We’ve entered a ‘that was so last week’ era where ‘novelty’, in the form of what we haven’t seen but could imagine seeing, is no longer enough. Our society now craves a special kind of hyper-concentrated content that can only come from completely outside our minds, and technology has advanced to a point where that urge can be immediately gratified. Cue BuzzFeed’s rapid rise to Internet stardom and the assembly of the smartest team of journalists and writers on the planet as talented guides to comb the net and produce photos, commentary, videos, and music that prove day-in-and-day-out that you’re awake and consistently capable of having your mind deliciously blown. This isn’t a fad, this is what’s next.

Node.js versus Ruby on Rails

Unlike many in the developer community, I started out toying with Node.js, and then dove into Ruby on Rails via the delightful Rails for Zombies followed by Michael Hartl’s Ruby on Rails Tutorial. The result is an unscientific comparison of (Javascript on) Node.js vs Ruby on Rails.

Node.js is not a framework!

First things first, Node.js is a runtime for server-side Javascript that bundles low-level HTTP and system capabilities; it is not a web framework. To even begin this comparison, Node.js requires some friends in the form of Express.js and Mongoose (should MongoDB be your database of choice) to align with Rails, which is written in the Ruby language.

Ruby is like a crotchety old man with a heart of gold

Now that we have that out of the way…next up is the language. Node.js applications are written completely in Javascript, while Rails applications are written in Ruby. In learning both, I find pros and cons for each. On the benefits side, Javascript has an easier syntax to pick up quickly, since you probably know some from front-end coding (as opposed to Ruby’s :VARIABLE => hash-rocket-ness, the fact that method calls don’t require parentheses, VARIABLE do |iterator|, :: namespace resolution, < class inheritance, and @instance variables with sigils that recall Perl). However, Ruby can be more succinct and does not require a forest of brackets to work its magic (coffeescript readers out there, calm yourselves…). Additionally, Ruby (and Rails) comes packed with extremely useful built-in functions surrounding everything from dates (think syntax like 10.minutes.ago) to "string".squeeze and ['array', 'elements'].include?('array') #true. Yes, Javascript can accomplish all of this, but it requires some know-how and a few custom prototyped methods…or something like Underscore.js.
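For anyone who hasn’t met these Ruby-isms yet, here is a small plain-Ruby sketch (no Rails needed) that touches each piece of syntax mentioned above. The Shop, Item, and Product names are invented for the example, and 10.minutes.ago is left out because it relies on Rails’ ActiveSupport rather than core Ruby.

```ruby
# A namespace (module) containing a base class and a subclass.
module Shop
  class Item
    attr_reader :name

    def initialize(name)
      @name = name # "@" marks an instance variable
    end
  end

  # "<" declares inheritance: Product is a subclass of Item.
  class Product < Item
    def label
      "Product: #{name}"
    end
  end
end

# "::" resolves a constant inside a namespace.
PRODUCT = Shop::Product.new('peanut zombie')

# The "hash rocket" (=>) maps keys to values; symbols start with ":".
OPTIONS = { :colour => 'green', :size => 'small' }

# Blocks use do |args| ... end to iterate.
LABELS = []
OPTIONS.each do |key, value|
  LABELS << "#{key}=#{value}"
end

# Parentheses are optional on method calls such as puts.
puts PRODUCT.label
puts LABELS.join(', ')
puts 'bookkeeper'.squeeze # collapses runs of repeated characters
puts ['array', 'elements'].include?('array')
```

Note the comma inside the array literal: without it, Ruby joins the two adjacent string literals into a single string and the include? check fails.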

100% Javascript Rocks, but is ultimately overrated

With the languages out of the way, we can turn our attention to actually building web applications. Node.js shines because it enables a front-to-back Javascript environment, whereas with Rails you’re forced to contend with SQL, Ruby, and Javascript. While seemingly a superior and streamlined experience, after playing around with both, I’d put the value in the ‘overrated’ category. It really isn’t hard to pick up Ruby once you’ve read a number of sample programs and a few blog posts. Additionally, with Ruby, you can almost completely forget about callbacks and the nested soup that Javascript brings to the table because of its non-blocking architecture (which I do realize is a plus for certain performance gains).

Rails gives you the full package

Putting languages and stacks aside, Node.js + Express.js offers routing along with free-form controllers, views, and helpers. These tools pale in comparison to Rails. To make an analogy, Rails is to Node.js + Express.js what Muji is to American Apparel. Both will get you clothing and home accessories, but American Apparel’s clothing looks kind of wonky, doesn’t always have what you need, tries to be a bit too cool for its britches, and doesn’t always play nice with other items in the store. Muji, on the other hand, has obsessively designed products that were seemingly cut from the very same cloth.

To get back to programming…Rails comes with a fulsome ecosystem from models to views, controllers, to data object models (which are mostly hidden btw), with extremely clear and concise interactions that can only occur with crazily OCD vertical integration. Two examples: url_for and image_tag. url_for is a built-in helper that provides the url for a given data object, such as @product, and image_tag pulls in the correct image url from the asset pipeline. While the same is possible with Node.js, it would require some heavy module interaction and helpers. Additionally, in Rails, you just don’t need to worry about your data models. While Mongoose on Node.js helps, it’s still tough to really feel like you’re fully taking advantage of easy access like the Product.find(params[:id]) call that is readily available throughout your Rails application, modules, controllers, and views.

However, it’s important to note here that these integrations do come with a downside. Often I feel like I’m guessing at the ‘correct’ way to do something, always missing out on the most efficient way. Additionally, when something goes wrong, it’s quite a bit more difficult to find out why, since Rails will read into specific variable names, like how you can add _path to new_product and have it magically turn into a function, new_product_path, that returns the appropriate path for the `new` route within the product controller. With Node.js, what you see is what you get, and there is very little magic.
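To take some of the mystery out of that magic, here is a toy sketch of how Ruby can conjure methods like new_product_path from a table of names at runtime. This is emphatically not how Rails implements its routing helpers; the ToyRoutes class and its route table are hypothetical and exist only to show the language feature (define_method) that makes this kind of thing possible.

```ruby
# Toy illustration only: this is NOT Rails' actual implementation.
# It shows how Ruby can define methods from names at runtime, which is
# why a helper such as new_product_path never appears in your own source.
class ToyRoutes
  # Hypothetical route table: helper name prefix => URL path.
  ROUTES = {
    'products'    => '/products',
    'new_product' => '/products/new'
  }.freeze

  ROUTES.each do |name, path|
    # Defines products_path, new_product_path, ... one per table entry.
    define_method("#{name}_path") { path }
  end
end

ROUTER = ToyRoutes.new
puts ROUTER.new_product_path
puts ROUTER.respond_to?(:edit_product_path) # no such entry in the table
```

Because the methods only exist once the table has been read, searching the codebase for "def new_product_path" turns up nothing, which is exactly what makes debugging this style of code harder.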

More people make and improve Rails modules

Next up is the ecosystem. While Node.js has an excellent start with NPM, the dust hasn’t settled on the most popular packages for various needs (the number of contenders is starting to get overwhelming). I often feel like I’m cobbling together a broken vase with superglue. Rails, on the other hand, has one of the most robust (and free) sources for libraries and tools. I love The Ruby Toolbox, which actually shows you the relative popularity of Rails tools broken out by function. Additionally, as I mentioned before, many modules that would have to be hand-picked in Node.js are integrated by default in Rails. Stuff like database connectivity, concatenation, cache-busting, database migrations, data models, routing, command line sandboxing/tasks, and testing, etc… It even comes preset with a favicon.ico! These may seem like trivialities, but they’re all 100% necessary…and it’s nice not to have to wade through error messages from conflicting Node.js modules that don’t want to talk to each other at 12am. Layered on top of the modules are more tutorials and examples than one can typically find with Node.js. Case in point, RailsCasts’ 300+ video tutorials. Be forewarned, though: Rails has been around longer and seems to change faster, so many of the tutorials are out-of-date…or in some cases are workarounds for features added after the article was written…as a result, it is prudent to use Google’s date selector for articles less than 1 year old and start your search with the Rails Guides.

Rails includes a jet pack, standard

Finally, assuming you have equivalent frameworks, database, and modules all set up, Node.js just takes much more time to string everything together (again because of the disparity in 3rd party plugins and lack of any inherent structure). In Rails, putting together a complete RESTful interface takes one command line input for all of the controllers, views, and database models.

$ rails generate scaffold Post name:string title:string content:text #and you're done

In many cases, it’s 1 minute versus 1 hour (which adds up if you have 4-5 database objects to contend with). It’s not necessarily hard work, but very prone to typos and simply a lot to remember. I don’t see myself always using Rails scaffolds, but for simple prototypes, it’s unlikely there’s a faster alternative.
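To give a feel for what that one command saves you from wiring up by hand, here is a plain-Ruby sketch (no Rails required) that lists the seven conventional RESTful routes for a resource such as posts. The Route struct and restful_routes helper are made up for this example; Rails builds the real routes for you, and newer Rails versions also accept PATCH for updates.

```ruby
# Plain-Ruby sketch listing the seven conventional RESTful routes for a
# resource. The names Route and restful_routes are invented for this example.
Route = Struct.new(:verb, :path, :action)

def restful_routes(resource)
  base = "/#{resource}"
  [
    Route.new('GET',    base,               'index'),
    Route.new('GET',    "#{base}/new",      'new'),
    Route.new('POST',   base,               'create'),
    Route.new('GET',    "#{base}/:id",      'show'),
    Route.new('GET',    "#{base}/:id/edit", 'edit'),
    Route.new('PUT',    "#{base}/:id",      'update'),
    Route.new('DELETE', "#{base}/:id",      'destroy')
  ]
end

restful_routes('posts').each do |r|
  puts format('%-6s %-16s posts#%s', r.verb, r.path, r.action)
end
```

In Node.js + Express.js, each of these would be a route and handler you register yourself, plus the matching views and model code.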

In Conclusion…

Node.js is a great set of tools that are hip, growing rapidly with a dedicated userbase, (debatably) high performance, easy(er) to pick up, and better than most other web building apps out there. However, it’s really just a talented high school track star to Ruby on Rails’ Olympian. No offense to Node.js, but who would you rather train with? If you can get over the pain (and annoyance!) of learning Ruby, the choice is clear.  Remember, at least for me, the best tools are the ones that help you launch better and faster.

Avoiding pitfalls when setting up Heroku for Node.js on Ubuntu

Heroku has an excellent introduction to deploying your first node.js project on their PaaS-tastic Cedar stack. However, there are a few key points to keep in mind:

  1. If you’re new to Heroku and focused exclusively on Node.js, then you’ll likely be developing on Ubuntu and need to install the Heroku command-line client. You’ll find instructions on what to type into the console here. However, if you’re using a default Ubuntu install, then you’ll probably need to use the sudo command on these…and you may get an error when typing in the second line

$ curl http://toolbelt.herokuapp.com/apt/release.key | apt-key add -

that says

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2442 100 2442 0 0 2273 0 0:00:01 0:00:01 --:--:-- 5856
gpg: no writable keyring found: eof
gpg: error reading `-': general error
gpg: import from `-' failed: general error

This is an easy one to fix, and is simply a permissions error on the apt-key program. Try this code, and it should work

$ curl http://toolbelt.herokuapp.com/apt/release.key | sudo apt-key add -

  2. Next, after a seemingly successful install of the Heroku command-line client, when typing in

$ heroku login

You might get back

/usr/bin/env: ruby: No such file or directory

This is because the Heroku command-line client requires Ruby to run. Luckily, again, an easy fix…just make sure not to install more than you need (i.e., you definitely don’t want to install Apache or Ruby servers, etc.) if you stumble upon this site. This should be the only installation you need to get going

$ sudo aptitude install ruby build-essential libopenssl-ruby ruby1.8-dev

What are the odds of an internet startup being funded

Every new entrepreneur’s dream is raising millions of dollars from angels, vcs…and maybe even an IPO, but what are the chances of actually landing that cash from institutional investors? Is it a realistic goal or only a pipe dream?

I’ll be using TechCrunch’s CrunchBase database to find an answer to this question (and a few more down the road), so if you’re not familiar with the database swing by and take a look before continuing. Here we go:

CrunchBase contains a total of 71,803 companies of all stripes and sizes, fully maintained and annotated by whoever would like to drop their knowledge. Like Wikipedia, there’s always a chance of incorrect data, but I’ve always found it to be reliable, sourced, and largely accurate (especially in the aggregate). Of that sample, 12,925 companies have funding attributed to their profile. [1] That means roughly 18% of all companies submitted to CrunchBase have received some sort of funding. For reference, that’s about the same odds as landing undergraduate admission to Cornell University (not bad!). [2] But what if we take a look at the funding distribution?

Despite what the media may wish you to believe, not every startup raises $1 Billion and receives an offer from Google. [3] In fact, the number is more around 23 (0.18% of funded companies). [4] Let’s take a look at the data visually: [5]

Funding Bar chart here

There are two main points that jump out:

  1. The majority of aggregate funding occurs below $3 Million. This amounts to just over 50% of companies who receive funding in the $0 to $20 Million range (a.k.a., reality). [6]
  2. Those spikes you see? They’re from investors that tend to make it $rain$ in increments of $1MM (and to a lesser extent $500K). Just one of the quirks that makes data analysis fun.

So what does this all mean for your startup? If you’re already at the top of your class by receiving funding, there’s a good chance you’ll raise less than $3 Million in your company’s lifetime…and if we want to be honest with ourselves, roughly 1 in 4 funded startups will never see more than $1 Million. [7]
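For anyone who wants to check my arithmetic, the two headline percentages can be reproduced from the counts quoted earlier in the post. This is just a quick sketch; the constant names and the percent helper are mine, and the numbers are the ones from the CrunchBase snapshot above.

```ruby
# Counts quoted in the post (CrunchBase snapshot of August 2nd, 2011).
TOTAL_COMPANIES  = 71_803
FUNDED_COMPANIES = 12_925
MEGA_FUNDED      = 23 # companies that raised over $500 Million (footnote 4)

# Share of `part` in `whole`, as a percentage rounded to `digits` places.
def percent(part, whole, digits)
  (part * 100.0 / whole).round(digits)
end

puts "Funded share of all companies: #{percent(FUNDED_COMPANIES, TOTAL_COMPANIES, 1)}%"
puts "Mega-funded share of funded:   #{percent(MEGA_FUNDED, FUNDED_COMPANIES, 2)}%"
```

The first figure comes out at roughly 18% and the second at roughly 0.18%, matching the numbers in the text.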

This post is part of a new series I affectionately call “BubbleCrunch”, which seeks to analyze the mountains of data maintained by TechCrunch’s impressive CrunchBase as of August 2nd, 2011. Have questions? Want me to dig into something? Drop a comment!

  1. I know, I know, not all companies publicly post their funding or have been categorized by the CrunchBase Bots
  2. http://www.thedailybeast.com/galleries/2011/02/23/difficult-colleges.html
  3. although it has been pretty common of late
  4. defined as those receiving over $500 Million
  5. many apologies for the lame charts…part of my BubbleCrunch series will be exploring javascript visualization tools, but unfortunately for now I’m stuck with Excel–which doesn’t even have Helvetica!
  6. this histogram represents around 80% of companies funded, and it includes the aggregate amount of funding since the company was founded. In future posts I’ll adjust for inflation, age of company, time period, blah blah blah, but for now, let’s just enjoy the journey
  7. just like any statistical analysis, it’s easy to draw conclusions that aren’t correct. Obviously just dropping your company on CrunchBase won’t guarantee you a chance of getting funding. Additionally, some startups never need to raise funding or are sold before they need to raise more than $1 Million. Let’s stay optimistic!