Posted by: crudbasher | February 13, 2012

The Year Of Big Data

Increased computing power allows us to take ever larger sets of data and discover connections. This is known as data mining. The Internet is by far the largest pool of data ever created and is growing at an accelerating rate. Now a lot of this data isn’t really usable yet. Video, for example, contains a tremendous amount of data, but it’s not really machine readable yet. This will change soon.

(cc) Thomas Hawk

From what I am reading, it looks like we are reaching a tipping point with data, where businesses are looking to it as a way to boost revenues. Of course, using data isn’t a new thing in business. Nielsen ratings have been used for decades to determine commercial rates. Even so, they were just an approximation. As TV begins to move to the web, companies will be able to determine exactly who is watching, where they are from, and, when you tie it to Facebook, what their interests are.

Connecting pieces of data together is really where the power is. The more data you have, the more useful it is. The whole really becomes more valuable than the sum of its parts.

Will the education community use data effectively? Well, it depends on whether you are talking about Micro or Macro data.

Macro data is large scale trends. Demographics for example are all about Macro data. Standardized tests are about Macro data. We break down data by race, age and economic conditions. This data is rapidly getting better and more complete. So what is this data used for? Funding mostly. In the end, the public school system isn’t about learning, it’s about funding.

The real potential now is with Micro data. This is data about the individual student and how they are doing. It’s about how each child learns, and about how well each teacher teaches. If we started tracking each child, we could match the exact right teacher to each student. Of course, that totally messes up the current school system, so that won’t happen. We would be able to see how each teacher does on average in class and see if they need more help, training or discipline. That won’t happen because of the union. The union cares about only one piece of data: seniority. That’s not intended as a bash on unions. I think they do their job very well, which is to protect teachers’ jobs.

Macro data makes it easier to demand more funding without having to get into specifics so there will be more of it. Micro data creates accountability so it will be avoided. Micro data about the student is useless unless it is tied into some kind of feedback mechanism.

Even so, this ignores one of the big changes of the Internet, which is not just more data, but data transparency. The data tends to find its way to more and more people. Concerned parents will be able to put together their own data to start to analyze the way the schools are performing. Of course this won’t change a thing, because parents and students generally have very little say about how their schools operate.

What this will do is open the door for online, data-driven learning. When an independent, online learning provider (notice I didn’t say school) can offer a tremendous amount of feedback and tracking about the student, I think parents will flock to it. Khan Academy is already doing a lot of this Micro data stuff and people are flocking to it.

The government likes the big picture. Parents like the small picture. Somebody is going to come along and provide this kind of personalized, data driven learning and it will be yet another way schools will be disrupted.


    • Mo Zhou was snapped up by I.B.M. last summer, as a freshly minted Yale M.B.A., to join the technology company’s fast-growing ranks of data consultants. They help businesses make sense of an explosion of data — Web traffic and social network comments, as well as software and sensors that monitor shipments, suppliers and customers — to guide decisions, trim costs and lift sales.
    • To exploit the data flood, America will need many more like her. A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired.
    • The story is similar in fields as varied as science and sports, advertising and public health — a drift toward data-driven discovery and decision-making. “It’s a revolution,” says Gary King, director of Harvard’s Institute for Quantitative Social Science. “We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”
    • What is Big Data? A meme and a marketing term, for sure, but also shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. There is a lot more data, all the time, growing at 50 percent a year, or more than doubling every two years, estimates IDC, a technology research firm. It’s not just more streams of data, but entirely new ones.
    • Data is not only becoming more available but also more understandable to computers. Most of the Big Data surge is data in the wild — unruly stuff like words, images and video on the Web and those streams of sensor data. It is called unstructured data and is not typically grist for traditional databases.


      But the computer tools for gleaning knowledge and insights from the Internet era’s vast trove of unstructured data are fast gaining ground. At the forefront are the rapidly advancing techniques of artificial intelligence like natural-language processing, pattern recognition and machine learning.

    • Data measurement, Professor Brynjolfsson explains, is the modern equivalent of the microscope. Google searches, Facebook posts and Twitter messages, for example, make it possible to measure behavior and sentiment in fine detail and as it happens.
    • Today, social-network research involves mining huge digital data sets of collective behavior online. Among the findings: people whom you know but don’t communicate with often — “weak ties,” in sociology — are the best sources of tips about job openings. They travel in slightly different social worlds than close friends, so they see opportunities you and your best friends do not.
    • Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”


      Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, “one of the most pernicious uses of data.”

Posted from Diigo.



