Software

artifact_detection

A tool that automatically classifies text into natural language (e.g., English) and non-natural language portions (e.g., stack traces, code snippets, log outputs, file listings, URLs) on a line-by-line basis using natural language processing (NLP).

It is intended as a preprocessing step for NLP approaches on bug reports. The following are available on GitHub and Zenodo: the Python implementation of the machine learning classifier model, basic scripts for automated training set creation from GitHub issue tickets, a sample dataset sourced from 101 Java projects hosted on GitHub, and a scikit-learn transformer that wraps the pretrained model for use as a preprocessing step in a scikit-learn pipeline.
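As a rough illustration of the line-by-line idea, the sketch below shows a scikit-learn-style transformer that drops lines flagged as artifacts. It only follows the fit/transform convention and does not import scikit-learn. The class name ArtifactRemover and the regular-expression heuristic looks_like_artifact are hypothetical stand-ins; the actual tool relies on a trained machine learning model rather than hand-written rules.

```python
import re

# Hypothetical hand-written heuristic standing in for the pretrained classifier.
_ARTIFACT_PATTERNS = [
    re.compile(r"^\s*at\s+[\w.$]+\(.*\)\s*$"),             # Java stack frame
    re.compile(r"^\s*(Exception in thread|Caused by:)"),   # stack trace header
    re.compile(r"^\s*https?://\S+\s*$"),                   # line consisting only of a URL
    re.compile(r"[;{}]\s*$"),                              # code-like line ending
    re.compile(r"^\s*\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}"),   # timestamped log line
]


def looks_like_artifact(line):
    """Return True if the line matches any of the toy artifact patterns."""
    return any(pattern.search(line) for pattern in _ARTIFACT_PATTERNS)


class ArtifactRemover:
    """Transformer-style wrapper: keeps only lines not flagged as artifacts."""

    def fit(self, X, y=None):
        # Nothing to learn in this sketch; a real model would be loaded here.
        return self

    def transform(self, X):
        cleaned = []
        for document in X:
            kept = [
                line for line in document.splitlines()
                if line.strip() and not looks_like_artifact(line)
            ]
            cleaned.append("\n".join(kept))
        return cleaned


report = (
    "The app crashes when I open the settings page.\n"
    "Exception in thread \"main\" java.lang.NullPointerException\n"
    "    at com.example.Settings.open(Settings.java:42)\n"
    "See https://example.com/issue for details.\n"
    "https://example.com/log.txt\n"
)
cleaned_reports = ArtifactRemover().fit([report]).transform([report])
print(cleaned_reports[0])
```

Because the class exposes fit and transform, an object of this shape could be placed as a step in front of a vectorizer in a pipeline, which is the role the source describes for the wrapped pretrained model.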

prunedSlicing

A dynamic slicer for Python 3 programs that implements our pruned slicing extension, as described in our paper “Pruning Boolean Expressions to Shorten Dynamic Slices”.

This proof-of-concept implementation is available on GitHub and Zenodo, together with a comprehensive test suite and the TCAS, QuixBugs, and Refactory datasets used in its evaluation. The tracer is based on Python's ast module, and the current implementation is limited to tracing and slicing single-file Python programs.
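To give a feel for how the ast module supports slicing, here is a minimal sketch that extracts the names each statement defines and uses, then computes a backward slice over straight-line code. It is not the tool's tracer and does not implement the pruning extension: the function names defs_and_uses and backward_slice are hypothetical, and the sketch assumes a program made only of plain top-level assignments and calls, with no branches, loops, or augmented assignments. Under that assumption the statement order coincides with the execution trace.

```python
import ast


def defs_and_uses(source):
    """Return (line, defined names, used names) for each top-level statement."""
    result = []
    for stmt in ast.parse(source).body:
        defined, used = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    defined.add(node.id)
                else:
                    used.add(node.id)
        result.append((stmt.lineno, defined, used))
    return result


def backward_slice(source, criterion_line):
    """Lines that the statement at criterion_line depends on via data flow."""
    relevant = set()   # names whose defining statement is still being sought
    sliced = []
    for lineno, defined, used in reversed(defs_and_uses(source)):
        if lineno > criterion_line:
            continue
        if lineno == criterion_line or defined & relevant:
            sliced.append(lineno)
            relevant -= defined   # these definitions are now accounted for
            relevant |= used      # their inputs become relevant in turn
    return sorted(sliced)


PROGRAM = """a = 1
b = 2
c = a + 3
d = b * 2
print(c)
"""

print(backward_slice(PROGRAM, 5))  # [1, 3, 5]
```

Lines 2 and 4 are left out because neither b nor d flows into the value printed on line 5.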

fault_type_prediction

SVM- and ensemble-based machine learning models for software fault type prediction.

These models classify a given textual bug report into one of four categories: Semantic, Concurrency, Memory, and Other.
The implementation is based on Python 3 and scikit-learn and is available on GitHub and Zenodo, along with manually labeled training sets, documentation, examples, and evaluation scripts.
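One common way to combine several classifiers into an ensemble is hard (majority) voting over their predicted labels. The sketch below illustrates that scheme for the four categories. Whether the published models combine their members this way is not stated here, and the keyword-based base classifiers (keyword_classifier) are toy stand-ins for trained models, not part of the released code.

```python
from collections import Counter

CATEGORIES = ("Semantic", "Concurrency", "Memory", "Other")


def keyword_classifier(keywords, label):
    """Toy base classifier: predicts `label` if any keyword occurs, else "Other"."""
    def classify(report):
        text = report.lower()
        return label if any(keyword in text for keyword in keywords) else "Other"
    return classify


def majority_vote(classifiers, report):
    """Hard voting: return the category predicted by most base classifiers."""
    votes = Counter(classifier(report) for classifier in classifiers)
    # Ties are broken by the order of CATEGORIES (max keeps the first maximum).
    return max(CATEGORIES, key=lambda category: votes[category])


# Hypothetical ensemble members; real members would be trained models.
ensemble = [
    keyword_classifier(["deadlock", "race condition", "thread"], "Concurrency"),
    keyword_classifier(["synchronized", "lock"], "Concurrency"),
    keyword_classifier(["leak", "out of memory"], "Memory"),
]

print(majority_vote(ensemble, "Deadlock when two threads acquire the same lock"))
print(majority_vote(ensemble, "Typo in the README"))
```

The first report receives two Concurrency votes and one Other vote, so it is labeled Concurrency; the second matches no keywords and falls through to Other.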