Causal Discovery is a relatively recent form of statistical analysis for determining cause-and-effect relationships among the variables in a dataset. The output of such an analysis is a causal structure graph, in which each variable in the data is a node and each discovered causal relationship is an edge between two nodes. Tetrad is an open-source tool that applies Causal Discovery to datasets to produce causal structure graphs.
Our research goal was to apply Causal Discovery to software and systems project data to determine the causes of software and systems effort and schedule. We ran "out of the box" Tetrad analyses on the COCOMO® II and COSYSMO 3.0 calibration datasets, which contain 161 and 68 projects, respectively. In each case, Tetrad found only a few variables that caused effort, perhaps because the datasets contain few projects.
Tetrad can also run multiple searches on variants of the original dataset (bootstrapping), reporting the frequency with which a causal edge occurs between each pair of nodes. We have employed an innovation: inserting noise variables into the dataset and examining their causal structure to determine what edge-frequency cutoff would eliminate randomly arising edges, and then applying that cutoff to produce the resulting causal graph. We explored two variants of this approach.
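The noise-variable cutoff idea can be illustrated with a minimal Python sketch. This is not Tetrad code and not necessarily either of the two variants we explored. It assumes the bootstrap edge frequencies have already been exported into a dictionary, treats edges as undirected for simplicity, and takes the cutoff to be the highest frequency of any edge touching a noise variable. The function names, variable names, and frequency values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Set

# Maps an (undirected) edge, stored as a frozenset of two variable names,
# to the fraction of bootstrap runs in which that edge appeared.
EdgeFreq = Dict[FrozenSet[str], float]


def noise_cutoff(edge_freq: EdgeFreq, noise_vars: Set[str]) -> float:
    """Return the highest frequency of any edge that touches a noise variable.

    Assumption: an edge involving a noise variable can only arise by chance,
    so its frequency estimates how often spurious edges appear.
    """
    noise_freqs = [f for edge, f in edge_freq.items() if edge & noise_vars]
    return max(noise_freqs, default=0.0)


def filter_edges(edge_freq: EdgeFreq, noise_vars: Set[str]) -> EdgeFreq:
    """Keep edges between real variables whose frequency exceeds the cutoff."""
    cutoff = noise_cutoff(edge_freq, noise_vars)
    return {
        edge: f
        for edge, f in edge_freq.items()
        if not (edge & noise_vars) and f > cutoff
    }


# Made-up frequencies for illustration only; these are not study results.
noise_vars = {"Noise1", "Noise2"}
edge_freq = {
    frozenset({"Size", "Effort"}): 0.92,
    frozenset({"Complexity", "Effort"}): 0.40,
    frozenset({"Complexity", "Size"}): 0.20,
    frozenset({"Noise1", "Effort"}): 0.25,
    frozenset({"Noise2", "Size"}): 0.10,
}

kept = filter_edges(edge_freq, noise_vars)
for edge, freq in sorted(kept.items(), key=lambda item: -item[1]):
    print(sorted(edge), freq)
```

In this sketch, an edge whose frequency merely equals the cutoff is dropped. Other choices, such as a percentile of the noise-edge frequencies rather than the maximum, would be equally plausible.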
We show an example causal graph for each of our different approaches. We thereby show the specific causes of effort that we determined, and discuss the results of our investigation of frequency-based analysis.
Presenters
Anandi Hira
Anandi Hira recently completed her PhD under Dr. Barry Boehm at the University of Southern California’s (USC) Computer Science Department. Her research interests include software metrics and their application to project management, software cost estimation, and software process improvement. She was part of the Unified Code Count (UCC) development effort at USC CSSE, and she collected and analyzed the data used to calibrate the COCOMO® II model to include functional size metrics.
James P Alstad
Dr. Jim Alstad recently received his PhD from the University of Southern California; his thesis topic was “COSYSMO 3.0: An Extended, Unified Cost Estimating Model for Systems Engineering”. His thesis was based in part on workshops held at previous ITSWCost Fora. Jim has been working with Anandi and Mike on applying Causal Discovery to cost model calibration data since 2017. Previously, Jim had a 32-year career in software engineering at Boeing Satellite Systems, where he developed satellite flight software on a variety of systems. He holds a patent on a satellite algorithm, “Fast Pair Catalog Access”.
Michael Konrad
Dr. Michael Konrad is a Principal Researcher at the SEI, providing analytic support to various projects using statistics, machine learning, and, more recently, causal learning. Since 2013, Konrad has contributed to research in requirements engineering, software architecture, and complexity measurement. From 1998 to 2013, he contributed to CMMI as manager, chief architect, and configuration manager. He is a coauthor of the CMMI for Development book. Prior to 1998, Konrad contributed to CMM, SDCE, and ISO 15504. He received his Ph.D. in Mathematics from Ohio University in 1978.