IFN711 Process Mining Project
IFN711 - Process Mining Project Assignment
Prerequisite: Please note that you should have completed IFN650 before attempting this project.
The purpose of this project is to expose you to real-life event logs and to have you analyse the data using different process mining techniques and tools. The questions asked in this project are of interest to industry. The event log provided to you as part of this project comes from a large multinational company, operating from the Netherlands, in the area of coatings and paints. In the dataset, each purchase order contains one or more line items. For each line item, there are four types of flows in the data:
- 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value entered when the item was created (indicated by both the GR-based IV flag and the Goods Receipt flag set to true).
- 3-way matching, invoice before goods receipt: These purchase items require a goods receipt message but do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flag set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until the goods are received. The unblocking can be done by a user or by a batch process at regular intervals. Invoices should only be cleared if the goods have been received and the value matches both the invoice and the value at creation of the item.
- 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full, or partially until the PO value is consumed), but no separate goods receipt message is required (indicated by both the GR-based IV flag and the Goods Receipt flag set to false).
- Consignment: For these items, there are no invoices at PO level, as invoicing is handled entirely in a separate process. The Goods Receipt flag is set to true and the GR-based IV flag is set to false, and the item type (consignment) tells us that no invoice is expected against the item.
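The flag combinations described in the four flows above can be sketched as a small lookup. This is only an illustration: the function name, the returned labels, and the assumption that the item type literally reads "consignment" are hypothetical and should be checked against the values in your log.

```python
from typing import Optional


def classify_item(gr_based_iv: bool, goods_receipt: bool,
                  item_type: Optional[str] = None) -> str:
    """Map the two flags (and the item type) to one of the four flows."""
    # Assumption: consignment items carry an item type spelled "consignment".
    is_consignment = (item_type or "").strip().lower() == "consignment"

    if goods_receipt and gr_based_iv:
        return "3-way match, invoice after GR"
    if goods_receipt and not gr_based_iv:
        # Consignment has the same flag values; only the item type separates it.
        return "Consignment" if is_consignment else "3-way match, invoice before GR"
    if not goods_receipt and not gr_based_iv:
        return "2-way match"
    # GR-based invoicing without a goods receipt is not one of the four flows.
    return "Unknown"
```

Returning "Unknown" for the one undescribed combination makes unexpected flag values visible instead of silently assigning them to a category.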
However, the complexity of the dataset goes beyond the division into four categories. For each purchase item, there can be multiple goods receipt messages and corresponding invoices, which are subsequently paid. For example, consider the process of paying rent. The purchase document typically contains a single item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices are generated, each with a value equal to 1/12th of the total amount. For other items, the number of goods receipt messages may be much higher.
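The rent example can be worked through as a small check. The sketch below assumes that goods receipts and invoices can be paired by position and uses a made-up annual amount; in the real log the pairing would have to be derived from the events themselves.

```python
from decimal import Decimal


def matches_three_way(item_value, goods_receipts, invoices):
    """True if each goods receipt has an invoice of equal value and the
    receipts add up to the value entered when the item was created."""
    if len(goods_receipts) != len(invoices):
        return False
    # Assumption: the i-th invoice belongs to the i-th goods receipt.
    pairwise_ok = all(gr == inv for gr, inv in zip(goods_receipts, invoices))
    return pairwise_ok and sum(goods_receipts) == item_value


# Hypothetical figures: one rent item, twelve monthly receipts and invoices.
annual_rent = Decimal("12000.00")
monthly = annual_rent / 12
receipts = [monthly] * 12
invoices = [monthly] * 12
rent_is_matched = matches_three_way(annual_rent, receipts, invoices)
```

Using `Decimal` rather than floating-point numbers avoids rounding noise when monetary values are summed and compared for equality.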
Your event log consists of approximately 1.6 million events for purchase orders submitted in 2018. All personal information in the event log has been anonymised. The case ID is a combination of the purchasing document and the purchase item. For each purchase item, the following attributes are recorded.
|Attribute|Description|
|---|---|
|concept:name|A combination of the anonymized purchase document id and the anonymized item id.|
|Purchasing Document|The anonymized purchasing document ID|
|Item|The anonymized item ID|
|Item Type|The type of the item|
|GR-Based Inv. Verif|Flag indicating if GR-based invoicing is required|
|Goods Receipt|Flag indicating if 3-way matching is required|
|Source|The anonymized source system of this item|
|Doc. Category name|The name of the category of the purchasing document|
|Company|The anonymized subsidiary of the company from where the purchase originated|
|Spend classification text|A text explaining the class of purchase item|
|Spend area text|A text explaining the area for the purchase item|
|Sub spend area text|Another text explaining the area for the purchase item|
|Vendor|The anonymized vendor to which the purchase document was sent|
|Name|The anonymized name of the vendor|
|Document Type|The document type|
|Item Category|The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment).|
Table 1: An overview of the attributes in the BPIC 2019 event log.
PART A: DISCO Analysis (30%)
You are to use the event log ‘Project_Log.xes’ for Part A. The data set contains 251,734 cases and 1,595,923 events. Answer the following questions using the DISCO tool. For each question, start from the original log (unless specified otherwise).
Process discovery (5%)
- Compare and contrast two process models (maps): one generated with the sliders set to 100% activities and 100% paths, and the other generated with 100% activities and 50% paths.
- Investigate the case variants detected from the log. Overall, how many case variants are present in the log? Report on the top ten (10) most frequent case variants and their respective frequencies. How much of the log do these ten (10) case variants cover? Explain the implications of a high number of case variants when you try to generate a representative process model.
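To make the notion of variant coverage concrete, the following sketch counts variants outside of DISCO. It assumes the log has already been exported to `(case_id, timestamp, activity)` tuples; the function name and the tuple layout are illustrative and are not part of any tool.

```python
from collections import Counter, defaultdict


def variant_coverage(events, top_n=10):
    """events: iterable of (case_id, timestamp, activity) tuples.

    Returns (number of variants, top-n variants with their case counts,
    fraction of all cases covered by those top-n variants).
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in events:
        traces[case_id].append((timestamp, activity))
    if not traces:
        return 0, [], 0.0

    # A variant is the ordered sequence of activities of a case.
    variants = Counter(
        tuple(activity for _, activity in sorted(trace, key=lambda e: e[0]))
        for trace in traces.values()
    )
    top = variants.most_common(top_n)
    covered = sum(count for _, count in top) / len(traces)
    return len(variants), top, covered
```

A low coverage for the top ten variants is a quick signal that a single process map drawn from all paths is likely to be hard to read.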
Performance analysis (5%)
- Filter the log to include cases of any two item categories (3-way with GR-based invoicing, 3-way without, 2-way, consignment).
Using the resulting filtered log answer the following questions:
- Where are the bottlenecks (the longest mean waiting times) in the process?
- How many cases are completed within two weeks?
- What is the spend area text in cases that finish within a week?
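The two performance measures asked about above can be cross-checked with a short script. This is a sketch under assumptions: events are `(case_id, timestamp, activity)` tuples with `datetime` timestamps, case duration is taken as the time from the first to the last event, and every case in the log is treated as complete.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_case(events):
    """Group events per case and order each trace by timestamp."""
    traces = defaultdict(list)
    for case_id, timestamp, activity in events:
        traces[case_id].append((timestamp, activity))
    for trace in traces.values():
        trace.sort(key=lambda e: e[0])
    return traces


def mean_waiting_times(events):
    """Mean time between each pair of directly following activities."""
    waits = defaultdict(list)
    for trace in group_by_case(events).values():
        for (t1, a1), (t2, a2) in zip(trace, trace[1:]):
            waits[(a1, a2)].append(t2 - t1)
    return {pair: sum(d, timedelta()) / len(d) for pair, d in waits.items()}


def cases_within(events, limit=timedelta(weeks=2)):
    """Number of cases whose first-to-last-event duration is within limit."""
    traces = group_by_case(events)
    return sum(1 for t in traces.values() if t[-1][0] - t[0][0] <= limit)
```

Sorting the result of `mean_waiting_times` by value in descending order lists the candidate bottlenecks first.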
Process comparison (5%)
- Compare process behaviour and process performance of two groups of cases of your choice. Please provide a reasoning for the selection of two groups. Describe your observations.
- Apply any three relevant filters sequentially to the original log. Please explain which filters you applied and why. Show the overview screen with the statistics of the filtered log.
Using the resulting event log answer the questions:
- How many cases are there in the log?
- What is their mean duration?
- Explain the process behaviour.
Improvement recommendations (10%)
- Based on the insights gained from analysing the process in Disco, please provide two process improvement recommendations for compliance and provide justification for your recommendations using the analysis results (including screenshots).
PART B: Process Mining with ProM (30%)
You are to use the event log ‘Project_Log.xes’ for Part B as well. Answer the following questions using different plug-ins available in ProM Lite. For each question, start from the original log (unless specified otherwise).
Process Discovery (5%)
- Mine the log with the Alpha, Heuristics and Inductive Miner algorithms. Show screenshots of the resulting models. Discuss the models in terms of the notation used and the constructs present in each mined model, and compare the similarities and differences between them.
Process Conformance (5%)
- Replay the log on the process models you discovered from Alpha and Inductive Miners.
- Do these models completely fit the log? If not, how many instances fit these models and how many do not?
- Where are the problems for the non-fitting process instances?
- Discuss how well these models describe the process behaviour seen in the log by making use of the trace fitness metrics.
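Once replay has produced a fitness value per trace, the counts asked for above are a simple summary. The sketch below assumes you have copied those values into a mapping from case ID to a fitness value between 0 and 1; it does not call ProM, and the function name and tolerance are illustrative.

```python
def summarise_fitness(trace_fitness, tolerance=1e-9):
    """trace_fitness: mapping of case id -> trace fitness in [0, 1].

    A trace counts as fitting only if its fitness is 1 (within tolerance).
    """
    total = len(trace_fitness)
    if total == 0:
        return {"cases": 0, "fitting": 0, "non_fitting": 0, "mean_fitness": None}
    fitting = sum(1 for f in trace_fitness.values() if f >= 1.0 - tolerance)
    return {
        "cases": total,
        "fitting": fitting,
        "non_fitting": total - fitting,
        "mean_fitness": sum(trace_fitness.values()) / total,
    }
```

Reporting both the number of perfectly fitting cases and the mean trace fitness is useful because a model can have a high average fitness while very few cases replay without any deviation.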
Performance Analysis (10%)
- Conduct the dotted chart analysis using the ‘Analyze using Dotted Chart’ plug-in to analyse the throughput times of cases. Discuss the insights gained from the analysis.
- Identify the bottlenecks in the system using the ‘Replay a log on Petri Net for Performance/Conformance’ plug-in. Please use the model discovered by the Inductive Miner algorithm.
Improvement Recommendations (10%)
- Based on the insights gained from utilising various process mining techniques, please provide two process improvement recommendations for compliance and provide justification for your recommendations using the analysis results (including screenshots).
PART C: Overall Process Mining Analysis (40%)
The organisation that provided this data has raised questions about the compliance of purchase orders. In this part of the project you are expected to use DISCO, ProM, and a third process mining tool (such as Celonis) to answer the following questions.
- Based on the different process models you derive, describe the overall purchase order process of the organisation.
- Which category of purchase documents takes the most time, and why?
- What are the main deviations in the purchase order process for the different item categories?
- What are the main bottlenecks in the entire purchase order process?
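For the question on which category takes the most time, a per-category comparison of case durations is one possible starting point. The sketch below assumes each case has been reduced to a `(category, start, end)` tuple; which attribute serves as the category (for example the document category or the item category) is your choice, and the function name is illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_duration_by_category(cases):
    """cases: iterable of (category, start, end) with datetime values.

    Returns (category, mean duration) pairs, slowest category first.
    """
    durations = defaultdict(list)
    for category, start, end in cases:
        durations[category].append(end - start)
    means = {c: sum(d, timedelta()) / len(d) for c, d in durations.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

The mean is sensitive to a few very long cases, so it is worth comparing it with the median reported by the tools before drawing conclusions.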