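# Load the packages used in this example
library(pMineR)      # syntheticDataCreator, dataLoader, careFlowMiner, firstOrderMarkovModel
library(kableExtra)  # kbl, kable_minimal and the %>% pipe
library(DiagrammeR)  # grViz
# Generate some synthetic data
objSDG <- syntheticDataCreator()
EventLog <- objSDG$cohort.RT4(numOfPat = 200)
# Show the first rows as a formatted table
head(EventLog) %>%
  kbl() %>%
  kable_minimal()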
ID | Event | Date | Priority | LINAC | Sex | Age | Hospital | Stop_Reason |
---|---|---|---|---|---|---|---|---|
1 | Medical Visit | 2001-01-06 | 3 | TO3 | Male | 45 | 1 | NA |
1 | Chemotherapy | 2001-01-13 | 3 | TO3 | Male | 45 | 1 | NA |
1 | RT Start | 2001-04-01 | 3 | TO3 | Male | 45 | 1 | NA |
1 | Clinical Suspension | 2001-04-13 | 3 | TO3 | Male | 45 | 1 | 1 |
1 | Suspension | 2001-04-20 | 3 | TO3 | Male | 45 | 1 | 3 |
1 | RT End | 2001-05-08 | 3 | TO3 | Male | 45 | 1 | 3 |
The Care Flow Miner class builds a graph outlining the most frequent paths and allows inferential analysis to be performed on it.
The first step is to create an object of the Care Flow Miner class and load the EventLog.
The Event Log must be loaded into the CFM object using the list returned by the getData method of the dataLoader class. For this reason, we first create a dataLoader (DL) object to perform the loading.
# DataLoader class initialization:
obj.DL <- dataLoader(verbose.mode = FALSE)
obj.DL$load.csv(nomeFile = "../Data/EL_CFM_Demo.csv",
                IDName = "ID",
                EVENTName = "Event",
                dateColumnName = "Date",
                format.column.date = "%Y-%m-%d")
# obj.out is the list we need to load the data into the CFM class
obj.out <- obj.DL$getData()
# CareFlowMiner class initialization:
objCFM <- careFlowMiner()
objCFM$loadDataset(obj.out)
The plotCFGraph() method creates a graph representing the most typical paths in the Event Log.
The graph is built with the Careflow Miner (CFM) algorithm, which extracts the most “frequent” careflows from process data. The CFM algorithm is inspired by sequential pattern mining techniques. To assess the frequency of a particular trace, the algorithm relies on the notion of support. Sequence support is the proportion NS/N, where NS is the number of patients experiencing a specific sequence S of events and N is the total number of patients in the analyzed population. We define “frequent” patterns as those with support above a user-defined threshold.
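The support definition above can be checked by hand on a small example. The following is a minimal sketch in base R, not pMineR code: the toy event log and the helpers starts_with and sequence_support are hypothetical, and a sequence is assumed to match a patient when the patient's trace begins with it.

```r
# Toy event log (hypothetical data, not produced by pMineR)
toy_log <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  Event = c("Visit", "Chemo", "RT Start",
            "Visit", "RT Start",
            "Visit", "Chemo", "RT End",
            "Chemo"),
  stringsAsFactors = FALSE
)

# One ordered vector of events per patient (rows are assumed sorted by date)
traces <- split(toy_log$Event, toy_log$ID)

# TRUE when the trace begins with the given sequence (assumed matching rule)
starts_with <- function(trace, sequence) {
  length(trace) >= length(sequence) &&
    all(trace[seq_along(sequence)] == sequence)
}

# Support = NS / N
sequence_support <- function(traces, sequence) {
  n_s <- sum(vapply(traces, starts_with, logical(1), sequence = sequence))
  n_s / length(traces)
}

support_visit_chemo <- sequence_support(traces, c("Visit", "Chemo"))
print(support_visit_chemo)
```

Here two of the four patients start with Visit followed by Chemo, so the support of that sequence is 0.5.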
The other important parameter of the CFM algorithm is the “maximum length” parameter, which represents a constraint on the maximum number of events included in the careflow.
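To see how the support threshold and the maximum length interact, the sketch below enumerates the paths from the root of a few toy traces and keeps only the frequent ones. It is an illustration in base R under stated assumptions, not the pMineR implementation: the traces, the function frequent_prefixes and its arguments min_support and max_length are all hypothetical names.

```r
# Hypothetical traces: one ordered vector of events per patient
traces <- list(
  p1 = c("Visit", "Chemo", "RT Start", "RT End"),
  p2 = c("Visit", "Chemo", "RT Start"),
  p3 = c("Visit", "RT Start", "RT End"),
  p4 = c("Visit", "Chemo", "RT End")
)

frequent_prefixes <- function(traces, min_support, max_length) {
  # Collect every prefix of each trace, up to max_length events
  prefixes <- unlist(lapply(traces, function(trace) {
    lengths_to_use <- seq_len(min(length(trace), max_length))
    vapply(lengths_to_use,
           function(k) paste(trace[seq_len(k)], collapse = " > "),
           character(1))
  }), use.names = FALSE)
  # A patient contributes each prefix at most once, so the counts are NS
  support <- table(prefixes) / length(traces)
  support <- support[support >= min_support]
  sort(support, decreasing = TRUE)
}

result <- frequent_prefixes(traces, min_support = 0.5, max_length = 3)
print(result)
```

With these inputs only three paths survive: Visit, Visit > Chemo and Visit > Chemo > RT Start. Raising min_support prunes the rarer branches, while lowering max_length cuts the tree closer to the root.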
The plotCFGraph function takes a set of input parameters; the examples below vary depth and abs.threshold.
The plotCFGraph function returns a list. The element of the list that is useful for graph plotting is the “script” element, which, when given as input to the grViz function of the DiagrammeR package, allows for the visualization of the process model.
plot.list<-objCFM$plotCFGraph(depth = Inf, abs.threshold = 1)
grViz(plot.list$script)
plot.list<-objCFM$plotCFGraph(depth = Inf, abs.threshold = 10)
grViz(plot.list$script)
plot.list<-objCFM$plotCFGraph(depth = Inf, abs.threshold = 40)
grViz(plot.list$script)
plot.list<-objCFM$plotCFGraph(depth = 3, abs.threshold = 10)
grViz(plot.list$script)
In the default input configuration of the plotCFGraph() function, the plotted node information includes only the event label and the number of patients that pass through the event.
Each edge is labeled with its support and confidence, calculated as follows:
the support represents the number of patients who transition from a specific starting node to the next node over the total number of patients who pass through that specific starting node;
the confidence represents the number of patients who transition from the starting node to the next node over the total number of patients.
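The two edge ratios can be reproduced by hand on a toy example, following the definitions exactly as stated above. This is a minimal sketch in base R, not pMineR code: the traces and the helper count_path are hypothetical, and a node is assumed to be identified by the path that leads to it from the root.

```r
# Hypothetical traces: one ordered vector of events per patient
traces <- list(
  p1 = c("Visit", "Chemo", "RT Start"),
  p2 = c("Visit", "Chemo"),
  p3 = c("Visit", "RT Start"),
  p4 = c("Chemo", "RT Start"),
  p5 = c("Visit", "Chemo", "RT End")
)

# Number of patients whose trace begins with the given path
count_path <- function(traces, path) {
  sum(vapply(traces, function(trace) {
    length(trace) >= length(path) && all(trace[seq_along(path)] == path)
  }, logical(1)))
}

source_path <- c("Visit")           # starting node
target_path <- c("Visit", "Chemo")  # next node

n_total      <- length(traces)
n_source     <- count_path(traces, source_path)
n_transition <- count_path(traces, target_path)

# Named as in the text: relative to the starting node, then to all patients
edge_support    <- n_transition / n_source
edge_confidence <- n_transition / n_total
print(c(support = edge_support, confidence = edge_confidence))
```

Four patients pass through Visit and three of them continue to Chemo, so the first ratio is 3/4 and the second is 3/5.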
It is possible to enrich the graph with additional information.
Let’s assume, for example, that we want to plot the ID of each node and the median time elapsed from the “root” node. Additionally, we want to color the nodes according to that median time (darker colors represent longer median times).
out.list <- objCFM$plotCFGraph(depth = Inf, abs.threshold = 2,
                               printNodeID = TRUE,
                               show.median.time.from.root = TRUE,
                               heatmap.based.on.median.time = c(10, 50, 100),
                               heatmap.base.color = "Gold")
grViz(out.list$script)
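The idea behind the median time from root and the heatmap cut points can be illustrated outside the package. The sketch below is base R on a hypothetical toy log. It makes two assumptions that may not match pMineR internals: the root is taken as each patient's first event, and the values in heatmap.based.on.median.time are treated as cut points that split medians into color levels. For brevity, times are grouped by event label rather than by graph node.

```r
# Toy event log (hypothetical data)
toy_log <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 3, 3),
  Event = c("Visit", "Chemo", "RT Start",
            "Visit", "Chemo", "RT Start",
            "Visit", "RT Start"),
  Date  = as.Date(c("2001-01-01", "2001-01-11", "2001-04-01",
                    "2001-02-01", "2001-02-21", "2001-04-12",
                    "2001-03-01", "2001-03-06")),
  stringsAsFactors = FALSE
)

# Days elapsed since each patient's first event (assumed to be the root)
first_date <- ave(as.numeric(toy_log$Date), toy_log$ID, FUN = min)
toy_log$days_from_root <- as.numeric(toy_log$Date) - first_date

# Median elapsed time per event label
median_days <- tapply(toy_log$days_from_root, toy_log$Event, median)

# Map each median to a color level: 0 is below the first cut point
breaks <- c(10, 50, 100)
color_level <- findInterval(median_days, breaks)
names(color_level) <- names(median_days)

print(median_days)
print(color_level)
```

A higher level would correspond to a darker shade of the base color.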
Through specific inputs, the plotCFGraph() function can also enrich the graph with the probability of reaching a certain future state. The example below sets predictive.model, predictive.model.outcome and arr.States.color:
out.list <- objCFM$plotCFGraph(depth = Inf, abs.threshold = 2,
                               predictive.model = TRUE,
                               predictive.model.outcome = "Technical Suspension",
                               arr.States.color = c("Technical Suspension" = "Red"))
grViz(out.list$script)
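One plausible reading of the probability of reaching a future state is the share of patients passing through a node who later experience the outcome. The sketch below computes that share in base R on hypothetical traces. The function prob_outcome_after is an illustrative name, and the calculation is an assumption about what the option reports rather than a description of the pMineR code.

```r
# Hypothetical traces: one ordered vector of events per patient
traces <- list(
  p1 = c("Visit", "RT Start", "Technical Suspension", "RT End"),
  p2 = c("Visit", "RT Start", "RT End"),
  p3 = c("Visit", "Chemo", "RT Start", "Technical Suspension"),
  p4 = c("Visit", "RT Start", "RT End")
)

prob_outcome_after <- function(traces, path, outcome) {
  # Keep the patients whose trace begins with `path`
  through <- Filter(function(trace) {
    length(trace) >= length(path) && all(trace[seq_along(path)] == path)
  }, traces)
  if (length(through) == 0) return(NA_real_)
  # Among them, flag those who experience `outcome` after the path
  hits <- vapply(through, function(trace) {
    outcome %in% trace[seq_along(trace) > length(path)]
  }, logical(1))
  mean(hits)
}

p_visit    <- prob_outcome_after(traces, c("Visit"), "Technical Suspension")
p_visit_rt <- prob_outcome_after(traces, c("Visit", "RT Start"),
                                 "Technical Suspension")
print(c(p_visit, p_visit_rt))
```

In this toy cohort, half of the patients who start with Visit later reach a Technical Suspension, against one in three of those who go directly from Visit to RT Start.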
The CFM implementation in pMineR extends the original technique with several features intended to combine the benefits of Process Discovery with those of inferential statistics.
This is accomplished by splitting the population into two sub-cohorts by the value of a specific event attribute. The Care Flow Mining algorithm is then applied to each sub-cohort, thus creating two different outputs. The two resulting CFMs can then be compared on several parameters, as in the examples below.
inf.out1 <- objCFM$plotCFGraphComparison(depth = Inf, abs.threshold = 10,
                                         stratifyFor = "Priority",
                                         stratificationValues = c("1", "4"),
                                         fisher.threshold = 0.005,
                                         kindOfGraph = "dot", nodeShape = "square")
grViz(inf.out1$script)
inf.out2 <- objCFM$plotCFGraphComparison(depth = Inf, abs.threshold = 10,
                                         stratifyFor = "LINAC",
                                         stratificationValues = c("CY", "TO3"),
                                         fisher.threshold = 0.005,
                                         kindOfGraph = "dot", nodeShape = "square")
grViz(inf.out2$script)
inf.out3 <- objCFM$plotCFGraphComparison(depth = Inf, abs.threshold = 10,
                                         stratifyFor = "LINAC",
                                         stratificationValues = c("CY", "TO3"),
                                         checkDurationFromRoot = TRUE,
                                         fisher.threshold = 0.005,
                                         kindOfGraph = "dot", nodeShape = "square")
grViz(inf.out3$script)
inf.out4 <- objCFM$plotCFGraphComparison(depth = Inf, abs.threshold = 10,
                                         stratifyFor = "LINAC",
                                         arr.stratificationValues.A = c("CY", "VE"),
                                         arr.stratificationValues.B = c("TO", "TO3"),
                                         hitsMeansReachAGivenFinalState = TRUE,
                                         fisher.threshold = 0.005,
                                         arr.States.color = c("Technical Suspension" = "Red"),
                                         finalStateForHits = "Technical Suspension",
                                         kindOfGraph = "dot", nodeShape = "square")
grViz(inf.out4$script)
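The fisher.threshold argument suggests that the two sub-cohorts are compared node by node with a Fisher exact test. The sketch below shows what such a comparison could look like for a single node, using fisher.test from base R on hypothetical counts. Treating fisher.threshold as a p-value cutoff is an assumption here, and the package may build its contingency tables differently.

```r
# Hypothetical counts for one node of the graph
hits_a <- 35; total_a <- 50   # sub-cohort A: patients reaching the node
hits_b <- 10; total_b <- 50   # sub-cohort B: patients reaching the node

# 2x2 table: rows are sub-cohorts, columns are reached / not reached
contingency <- matrix(
  c(hits_a, total_a - hits_a,
    hits_b, total_b - hits_b),
  nrow = 2, byrow = TRUE,
  dimnames = list(cohort = c("A", "B"),
                  node   = c("reached", "not reached"))
)

test <- fisher.test(contingency)

# Assumed interpretation: flag the node when the p-value is below the cutoff
fisher_threshold <- 0.005
is_significant <- test$p.value < fisher_threshold

print(contingency)
print(test$p.value)
print(is_significant)
```

With a difference this large (70% against 20% of patients reaching the node), the p-value falls below the 0.005 cutoff and the node would be flagged.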
Here is an example of a use case where the Careflow Miner is not the preferred Process Discovery algorithm. In this example, the initial phase shows high heterogeneity among events. This creates many possible branches, which prevents the final part of the workflow from being merged despite its similarity across branches.
obj.DC<-syntheticDataCreator()
new.DL<-obj.DC$cohort.RT(numOfPat = 150,giveBack = "dataLoader")
new.out<-new.DL$getData()
new.CFM<-careFlowMiner()
new.CFM$loadDataset(new.out)
script<-new.CFM$plotCFGraph(depth = Inf,abs.threshold = 10)
grViz(script$script)
In this case, the model generated by the first-order Markov model (FOMM) algorithm could be a better choice.
objFOMM<-firstOrderMarkovModel()
objFOMM$loadDataset(new.out)
objFOMM$trainModel()
grViz(objFOMM$getModel(kindOfOutput = "grViz"))
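To make the contrast with the CFM concrete, a first-order Markov model only looks at consecutive pairs of events, so similar tails merge into the same states regardless of how each trace started. The sketch below estimates a transition matrix in base R from hypothetical traces. It illustrates the general technique and is not the code behind firstOrderMarkovModel.

```r
# Hypothetical traces: one ordered vector of events per patient
traces <- list(
  p1 = c("Visit", "Chemo", "RT Start", "RT End"),
  p2 = c("Visit", "RT Start", "RT End"),
  p3 = c("Visit", "Chemo", "RT Start", "Suspension", "RT End")
)

# Collect every consecutive (from, to) pair across all traces
pairs <- do.call(rbind, lapply(traces, function(trace) {
  data.frame(from = head(trace, -1), to = tail(trace, -1),
             stringsAsFactors = FALSE)
}))

# Count transitions over a common set of states
states <- sort(unique(unlist(traces)))
counts <- table(factor(pairs$from, levels = states),
                factor(pairs$to,   levels = states))

# Normalize each row into probabilities; rows with no outgoing
# transitions (here "RT End") are left at zero
row_totals <- rowSums(counts)
transition_matrix <- counts / ifelse(row_totals == 0, 1, row_totals)

print(round(transition_matrix, 2))
```

Note that RT Start appears as a single state even though the three patients reach it through different paths, which is the merging behavior the CFM tree cannot provide.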