A History of Analytics - Part 2 | Enrique Javier Gallego

Archive note: Originally published 16 October 2019 and preserved from an earlier version of this site. Historical references link to current sources where available; unavailable sources remain as unlinked citations.

2000s

In 2001, William S. Cleveland published “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”, a plan “to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field will be called data science”. Cleveland set the proposed new discipline in the context of computer science and contemporary work in data mining: “…the benefit to the data analyst has been limited, because the knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of knowledge bases would produce a powerful force for innovation. This suggests that statisticians should look to computing for knowledge today just as data science looked to mathematics in the past. … department of data science should contain faculty members who devote their careers to advances in computing with data and who form partnership with computer scientists.”

Also in 2001, Leo Breiman published “Statistical Modeling: The Two Cultures”, stating that “There are two cultures in the use of statistical modelling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modelling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modelling on smaller data sets.
If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.”

In April 2002, the Data Science Journal was launched, publishing papers on the management of data and databases in science and technology. The journal covered descriptions of data systems, their publication on the internet, applications and legal issues. It was published by the Committee on Data for Science and Technology (CODATA) of the International Council for Science (ICSU).

In January 2003, the Journal of Data Science was launched: “By ‘Data Science’ we mean almost everything that has something to do with data: Collecting, analyzing, modelling… yet the most important part is its applications--all sorts of applications. This journal is devoted to applications of statistical methods at large…. The Journal of Data Science will provide a platform for all data workers to present their views and exchange ideas”.

In May 2005, Thomas H. Davenport, Don Cohen and Al Jacobson of the Babson College Working Knowledge Research Centre published a report, Competing on Analytics, describing “the emergence of a new form of competition based on the extensive use of analytics, data, and fact-based decision making... Instead of competing on traditional factors, companies are beginning to employ statistical and quantitative analysis and predictive modelling as primary elements of competition.” This research was later published by Davenport in the Harvard Business Review (January 2006) and expanded into a book, Competing on Analytics: The New Science of Winning (March 2007), written with Jeanne G. Harris.

In September 2005, the National Science Board published “Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century”.
One recommendation of the report reads: “The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.” The report defines data scientists as “the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection.”

In 2007, the Research Centre for Dataology and Data Science was established at Fudan University, Shanghai, China. In 2009, two of its researchers, Yangyong Zhu and Yun Xiong, published “Introduction to Dataology and Data Science”, stating that “Different from natural science and social science, Dataology and Data Science takes data in cyberspace as its research object. It is a new science.” The centre hosts annual symposiums on Dataology and Data Science.

In July 2008, JISC published the final report of a study it commissioned to “examine and make recommendations on the role and career development of data scientists and the associated supply of specialist data curation skills to the research community.” The report, titled “The Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs”, defines data scientists as “people who work where the research is carried out--or, in the case of data centre personnel, in close collaboration with the creators of the data--and may be involved in creative enquiry and analysis, enabling others to work with digital data, and developments in data base technology.”

In January 2009, “Harnessing the Power of Digital Data for Science and Society”, a report by the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council, stated that “The nation needs to identify and promote the emergence of new disciplines and specialists expert in addressing the complex and dynamic challenges of digital preservation, sustained access, reuse and repurposing of data. Many disciplines are seeing the emergence of a new type of data science and management expert, accomplished in the computer, information, and data sciences arenas and in another domain science. These individuals are key to the current and future success of the scientific enterprise. However, these individuals often receive little recognition for their contributions and have limited career paths.”

Also in January 2009, Google’s Chief Economist, Hal Varian, told the McKinsey Quarterly: “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?
The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades… Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it… I do think those skills—of being able to access, understand, and communicate the insights you get from data analysis—are going to be extremely important. Managers need to be able to access and understand the data themselves.”

In March 2009, Kirk D. Borne and other astrophysicists submitted to the Astro2010 Decadal Survey a paper titled “The Revolution in Astronomy Education: Data Science for the Masses”: “Training the next generation in the fine art of deriving intelligent understanding from data is needed for the success of sciences, communities, projects, agencies, businesses, and economies. This is true for both specialists (scientists) and non-specialists (everyone else: the public, educators and students, workforce). Specialists must learn and apply new data science research techniques in order to advance our understanding of the Universe. Non-specialists require information literacy skills as productive members of the 21st century workforce, integrating foundational skills for lifelong learning in a world increasingly dominated by data.”

In May 2009, Mike Driscoll wrote in “The Three Sexy Skills of Data Geeks”: “…with the Age of Data upon us, those who can model, munge, and visually communicate data—call us statisticians or data geeks—are a hot commodity.” [Driscoll then followed up with “The Seven Secrets of Successful Data Scientists” in August 2010.]

In June 2009, Nathan Yau wrote in “Rise of the Data Scientist”: “As we've all read by now, Google's chief economist Hal Varian commented in January that the next sexy job in the next 10 years would be statisticians. Obviously, I whole-heartedly agree.
Heck, I'd go a step further and say they're sexy now—mentally and physically. However, if you went on to read the rest of Varian's interview, you'd know that by statisticians, he actually meant it as a general title for someone who is able to extract information from large datasets and then present something of use to non-data experts… [Ben] Fry… argues for an entirely new field that combines the skills and talents from often disjoint areas of expertise… [Computer science, mathematics, statistics, and data mining, graphic design, infovis and human-computer interaction]. And after two years of highlighting visualization on FlowingData, it seems collaborations between the fields are growing more common, but more importantly, computational information design edges closer to reality. We're seeing data scientists—people who can do it all—emerge from the rest of the pack.”

In June 2009, Troy Sadkowsky created a LinkedIn account with the handle “data scientists group” as a companion to his website, datasceintists.com (which later became datascientists.net).