Data science as a term or area emerged in 2001, when William S. Cleveland wrote “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” As technology has become cheaper, quicker and more versatile, Data Science has grown stronger and more popular among businesses.
The origins of artificial intelligence (AI) date back to 1956, when John McCarthy held the first academic conference on the subject. Even before that, Greek myths contained stories of mechanical men designed to mimic our own behaviour. Alan Turing wrote a paper on the notion of machines being able to simulate human beings and do intelligent things, such as play chess.
As our understanding of how our minds work has grown, our understanding and definition of AI have changed. These days the world of AI is concentrating on mimicking human decision making and carrying out tasks the way humans do.
AI is often split into 2 groups:
General AI (also referred to as Wide AI): Systems and/or devices that can handle any task.
Applied AI (also referred to as Narrow AI): Systems and/or devices designed to complete a single task intelligently, e.g. predict customer retention or play chess.
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. With the internet, the Internet of Things and big data, more data is being generated for machines to learn from, enhancing machine learning further.
Within machine learning there are 3 sub-categories:
Supervised: Infers a function (model) from a labelled training data set consisting of an input and an associated output.
Unsupervised: Works with no labelled data. It is self-organising and can be seen as a method of modelling the probability density of the inputs. The most widely known unsupervised method is cluster analysis, which identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data.
Semi-supervised: Falls between supervised and unsupervised learning. The technique makes use of unlabelled data for training, as well as a small amount of labelled data.
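To make the difference between the first two sub-categories concrete, here is a minimal sketch in plain Python. It is an illustration only: the data points, the labels "low" and "high", and the function names predict_nearest and two_means are all made up for this example, and the two techniques shown (1-nearest-neighbour and a basic two-cluster k-means) are just simple stand-ins for a supervised and an unsupervised method.

```python
from math import dist

# Supervised: every training input comes with an associated output (label).
# The points and labels here are invented purely for illustration.
labelled = [
    ((1.0, 1.2), "low"),
    ((0.8, 1.0), "low"),
    ((5.0, 5.1), "high"),
    ((5.2, 4.9), "high"),
]


def predict_nearest(point, training):
    """Return the label of the training input closest to `point` (1-nearest-neighbour)."""
    _, label = min(training, key=lambda pair: dist(pair[0], point))
    return label


# Unsupervised: the same inputs, but with the labels thrown away.
unlabelled = [inputs for inputs, _ in labelled]


def two_means(points, iterations=10):
    """Group points into two clusters with a basic k-means loop (k = 2)."""
    # Deterministic starting centres: the smallest and largest point (tuple ordering).
    centres = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        # Assignment step: each point joins the cluster with the nearest centre.
        for p in points:
            nearest = 0 if dist(p, centres[0]) <= dist(p, centres[1]) else 1
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster (keep it if empty).
        centres = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centre
            for cluster, centre in zip(clusters, centres)
        ]
    return clusters, centres


if __name__ == "__main__":
    print(predict_nearest((4.8, 5.0), labelled))
    print(two_means(unlabelled)[0])
```

The supervised function can only answer with labels it has already seen, whereas the clustering function finds the two groups on its own but has no names for them. A semi-supervised approach would sit between the two, using a few labelled points alongside many unlabelled ones.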
Data science is a multi-disciplinary field that uses scientific methods, advanced analytics, algorithms and systems to extract knowledge, information and insight from structured and unstructured data. Harvard Business Review labelled the profession “the sexiest job of the 21st century”, but as you can see above the theory has been around for a number of years. Oracle SQL has been around for over 40 years, and within it are a number of analytical functions that enable you to do advanced analytics….
Blending ML, AI and Data Science within Oracle
Oracle has several Machine Learning and Advanced Analytics algorithms in a variety of products. Oracle Data Mining (ODM), a component of the Oracle Advanced Analytics Database Option, provides powerful data mining algorithms that enable data analysts to discover insights, make predictions and leverage their Oracle data and investment.
For database applications with very large datasets, the idea is to bring the algorithms to the data. This makes it possible to use high-performance database environments for machine learning. Which algorithm you choose may well be driven by the question:
Do I have any / many labels or not? Can I generate enough labels?
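As a rough sketch of how that question maps onto the three sub-categories of machine learning, the snippet below picks a sub-category from the number of labelled rows. The function name suggest_learning_type and the min_labelled cut-off are hypothetical, chosen only for illustration; what counts as "enough" labels depends on the problem and the algorithm.

```python
def suggest_learning_type(n_labelled, n_total, min_labelled=1000):
    """Suggest a machine learning sub-category from how many rows carry a label.

    `min_labelled` is a hypothetical cut-off for "enough labels",
    not a recommended value.
    """
    if n_total <= 0 or not 0 <= n_labelled <= n_total:
        raise ValueError("expected n_total > 0 and 0 <= n_labelled <= n_total")
    if n_labelled == 0:
        # No labels at all: only unsupervised methods apply.
        return "unsupervised"
    if n_labelled == n_total or n_labelled >= min_labelled:
        # Everything is labelled, or there are enough labels to train on.
        return "supervised"
    # A small number of labels alongside many unlabelled rows.
    return "semi-supervised"
```

For example, suggest_learning_type(50, 10000) points towards semi-supervised learning, unless more labels can be generated to reach the cut-off.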
Many of the algorithms are now available in the Autonomous Data Warehouse (ADW) Machine Learning notebooks. The notebooks enable data exploration and the creation of tables and views, as well as modelling. This new approach enables machine learning without moving the data, leveraging a collaborative environment for all – Data Science.
About the Author
Dr. Abi Giles-Haigh is the Chief Data Science Officer at Vertice. She has over 10 years’ experience working with data, from database management and report writing to advanced analytics. Previously she was part of the Data Analytics Team at the NHSBSA, identifying savings and improving patient care. She is an Oracle ACE Director, UKOUG Technical Speaker of the Year 2016, as well as a Nominated Digital Leader 100. Abi is a technical evangelist in the field of predictive/prescriptive analytics and data mining. She holds a PhD in Computational Modelling and a Bachelor of Science in Computing Science from Newcastle University.