Big data is more than just large-scale databases -- it means vastly different network topologies, sometimes configured on the fly. EnterpriseNetworkingPlanet takes a look at the near future of networking.
As big data becomes increasingly prevalent in data center and cloud environments, the question of how to manage networks that transfer millions of records at a time is more important to answer than ever.
It's not just a matter of size -- though size certainly does matter when approaching big-data networking solutions -- but also a matter of workflow. Big data environments simply don't behave the way typical data infrastructures do. They can't, given the complexity and speed of the work big-data applications have to perform.
"Traditional" data-analysis architectures assume that data won't be coming from a lot of sources and there will be plenty of time to neatly store that data in the correct table on the correct repository. When looking at networks and applications such as the ones used by Twitter, Facebook and Google, it immediately becomes clear that such an approach would make a "normal" database architecture pop like a single light bulb plugged into a nuclear reactor.
To overcome the hurdles of dealing with massive amounts of data arriving in a very short period of time, big data users have devised a two-pronged approach. First, a large-scale real-time database is implemented, such as BigTable, OpenDremel, MongoDB, or Cassandra. These databases all share the feature of being non-relational: They don't depend on standardized query languages (hence their sobriquet "NoSQL") and they also do not guarantee all of the atomicity, consistency, isolation, and durability (ACID) properties that a relational database enforces on its transactions.
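To make the non-relational idea concrete, here is a minimal sketch of a schema-less store in plain Python. It is a toy, not the API of any of the products named above: the `DocumentStore` class and its `put`, `get`, and `find` methods are hypothetical names invented for illustration. The point is only that records with different fields can sit side by side without a predefined table layout.

```python
class DocumentStore:
    """Toy in-memory, schema-less store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # key -> document (a plain dict with arbitrary fields)

    def put(self, key, doc):
        # No schema check: any set of fields is accepted as-is.
        self._docs[key] = dict(doc)

    def get(self, key):
        # Returns None when the key is absent.
        return self._docs.get(key)

    def find(self, field, value):
        # Linear scan; documents lacking the field are simply skipped.
        return [k for k, d in self._docs.items() if d.get(field) == value]


store = DocumentStore()
store.put("u1", {"name": "alice", "followers": 120})
store.put("u2", {"name": "bob", "location": "Berlin"})  # different fields
store.put("p1", {"author": "alice", "text": "hello"})   # different record type

matches = store.find("name", "alice")
print(matches)
```

A real NoSQL system adds distribution, replication, and persistence on top of this idea, none of which the sketch attempts.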
The other half of the solution is using analytical platforms, such as Hadoop, to do the work of sifting through the huge mass of data and categorizing it properly on the fly.
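The kind of sifting and categorizing described here is commonly expressed in the map/reduce style that Hadoop popularized. The following is a single-process sketch of that pattern in plain Python, not Hadoop's own API. The record layout, the `type` field used for categorizing, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_record(record):
    # Map step: emit a (category, 1) pair for one raw record.
    # Assumed rule: the category is the record's "type" field, else "unknown".
    yield record.get("type", "unknown"), 1


def shuffle(pairs):
    # Shuffle step: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    # Reduce step: collapse each group into a single count.
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_record(record))
    grouped = shuffle(pairs)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


records = [
    {"type": "tweet", "user": "a"},
    {"type": "like", "user": "b"},
    {"type": "tweet", "user": "c"},
    {"user": "d"},  # no "type" field, so it is counted as "unknown"
]

result = run_job(records)
print(result)  # {'tweet': 2, 'like': 1, 'unknown': 1}
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network between them, which is one reason big-data workloads put such different demands on network design.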