In the early 1990s, as data mining was evolving from toddler to adolescent, we as a community spent a lot of time getting the data ready for the fairly limited tools and computing power of the day. The CRISP-DM that emerged as a result is still valid today in the era of Big Data & Stream Analytics.

As the 90s progressed, the need to standardize the lessons learned into a common methodology became increasingly acute. Two of the leading tool providers of the day, SPSS and Teradata, along with three early-adopter user corporations, Daimler, NCR, and OHRA, convened a Special Interest Group (SIG) in 1996 and, in less than a year, codified what is still today the CRISP-DM: the CRoss Industry Standard Process for Data Mining[1].

CRISP-DM was not actually the first such methodology. SAS Institute had its own version called SEMMA (Sample, Explore, Modify, Model, Assess). Nevertheless, within just a year or two many more practitioners were basing their approach on CRISP-DM.

CRISP-DM Methodology

The CRISP-DM process is described in these six major steps[2]:

Business Understanding
Focuses on understanding the project objectives and requirements from a business perspective. The analyst formulates this knowledge as a data mining problem and develops a preliminary plan.

Data Understanding
Starting with initial data collection, the analyst proceeds with activities to get familiar with the data, identify data quality problems, and discover first insights into the data. In this phase, the analyst might also detect interesting subsets to form hypotheses about hidden information.

Data Preparation
The data preparation phase covers all activities to construct the final dataset from the initial raw data.

Modeling
The analyst evaluates, selects & applies the appropriate modelling techniques. Since some techniques, such as neural nets, have specific requirements regarding the form of the data, there can be a loop back here to data preparation.

Evaluation
The analyst builds & chooses models that appear to have high quality based on the loss functions that were selected. The analyst then tests them to ensure that the models generalise to unseen data. Subsequently, the analyst also validates that the models sufficiently cover all key business issues. The end result is the selection of the champion model(s).

Deployment
Generally this will mean deploying a code representation of the model into an operational system. This also includes mechanisms to score or categorise new unseen data as it arises. The mechanism should use the new information in the solution of the original business problem. Importantly, the code representation must also include all the data prep steps leading up to modelling. This ensures that the model will treat new raw data in the same manner as during model development.

Characteristics of CRISP-DM

I believe CRISP-DM’s longevity in a rapidly changing area stems from a number of characteristics:
From today’s data science perspective this seems like common sense. Data science has moved beyond predictive modeling into recommenders, text, image, and language processing, deep learning, AI, and other project types that may appear to be more non-linear. In fact, all of these projects start with business understanding. All these projects start with data that must be gathered, explored, and prepped in some way. All these projects apply a set of data science algorithms to the problem. And all these projects need to be evaluated for their ability to generalize in the real world. So yes, CRISP-DM provides strong guidance for even the most advanced of today’s data science activities.

Think Insights (October 18, 2022) CRISP-DM – A framework for Data Mining & Analysis. Retrieved from https://thinkinsights.net/data-literacy/crisp-dm/.
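As a small illustration of the Deployment point made earlier, that the code representation must carry the data prep steps along with the model, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the class names (Standardizer, NearestCentroid, ScoringPipeline), the toy churn data, and the choice of standardization plus a nearest-centroid classifier are stand-ins, not part of CRISP-DM itself. The sketch also touches the Evaluation step by checking accuracy on rows held out from fitting.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Standardizer:
    """Data-prep step: learns per-column mean and std on training data only."""
    means: list
    stds: list

    @classmethod
    def fit(cls, rows):
        cols = list(zip(*rows))
        means = [mean(c) for c in cols]
        # Fall back to 1.0 for constant columns so transform never divides by zero.
        stds = [pstdev(c) or 1.0 for c in cols]
        return cls(means, stds)

    def transform(self, row):
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]


@dataclass
class NearestCentroid:
    """Toy model: predicts the label whose training centroid is closest."""
    centroids: dict

    @classmethod
    def fit(cls, rows, labels):
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append(row)
        centroids = {
            label: [mean(col) for col in zip(*members)]
            for label, members in grouped.items()
        }
        return cls(centroids)

    def predict(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


@dataclass
class ScoringPipeline:
    """Deployable unit: the prep step and the model travel together."""
    prep: Standardizer
    model: NearestCentroid

    @classmethod
    def fit(cls, raw_rows, labels):
        prep = Standardizer.fit(raw_rows)
        model = NearestCentroid.fit([prep.transform(r) for r in raw_rows], labels)
        return cls(prep, model)

    def score(self, raw_row):
        # New raw data is prepped exactly as it was during model development.
        return self.model.predict(self.prep.transform(raw_row))


# Hypothetical raw data: [age, monthly_spend] with a churn label.
train_rows = [[25, 200.0], [30, 220.0], [28, 210.0],
              [55, 40.0], [60, 35.0], [58, 50.0]]
train_labels = ["stay", "stay", "stay", "churn", "churn", "churn"]
holdout_rows = [[27, 205.0], [57, 45.0]]
holdout_labels = ["stay", "churn"]

pipeline = ScoringPipeline.fit(train_rows, train_labels)

# Evaluation: accuracy on rows the pipeline never saw during fitting.
hits = sum(pipeline.score(r) == y for r, y in zip(holdout_rows, holdout_labels))
holdout_accuracy = hits / len(holdout_rows)

# Deployment: a new raw record is scored through the same pipeline object.
new_prediction = pipeline.score([59, 42.0])
print(holdout_accuracy, new_prediction)
```

The design choice worth noting is that only the ScoringPipeline object is handed to the operational system, so there is no separate copy of the prep logic that could drift away from what the model was trained on. In practice a library pipeline abstraction would usually play this role.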