IMAT5167 Data Warehouse Design and OLAP
Coursework Brief 2021/22 IMAT5167
|Module name:||Data Warehouse Design and OLAP|
|Module code:||IMAT 5167|
|Title of the Assessment:||Data Mart Design|
|This coursework item is: (delete as appropriate)||Summative|
|This summative coursework will be marked anonymously: (delete as appropriate)||Yes|
|The learning outcomes that are assessed by this coursework are: A comprehensive understanding of the concepts, purposes, architectures, evolution and benefits of DW. A systematic knowledge of how to apply ERD and Star Schema to design databases, DW or data mart. The ability to design appropriate data extraction, transformation and loading strategy and create reasonable queries.|
|This coursework is: (delete as appropriate)||Individual|
|If other or mixed … explain here:|
|This coursework constitutes 50 % of the overall module mark.|
|Date Set: 1st November, 2021|
|Date & Time Due (the deadline):||10/12/2021 at 12.00 noon|
|In accordance with the University Assessment and Feedback Policy (https://www.dmu.ac.uk/about-dmu/quality-management-and-policy/academic-quality/learning-teaching-assessment/assessment-feedback-policy.aspx), your marked coursework and feedback will be available to you on:||17/01/2022|
|You should normally receive feedback on your coursework by no later than 20 working days after the formal hand-in date, provided that you have met the submission deadline If for any reason this is not forthcoming by the due date your module leader will let you know why and when it can be expected. The Associate Professor Student Experience (CEMstudentexperience@dmu.ac.uk) should be informed of any issues relating to the return of marked coursework and feedback.|
|When completed you are required to submit your coursework via: As a Word Document via Turnitin (via the appropriate Turnitin facility set up within the Turnitin folder of IMAT 5167 module’s BB shell) If for any reason Turnitin is not working at the date/time of the submission deadline, please provide it as an attachment to an email to firstname.lastname@example.org by the specified deadline date/time and a subsequent Turnitin submission can be completed as soon as the facility is available again to use. If you need any support or advice on completing this coursework please visit the Student Matters tab on the CEM Blackboard shell.|
|Late submission of coursework policy: Late submissions will be processed in accordance with current University regulations (https://www.dmu.ac.uk/about-dmu/quality-management-and-policy/academic-quality/academic-regulations-assessment-boards/academic-regs-assessment-board-homepage.aspx) which state: “the time period during which a student may submit a piece of work late without authorisation and have the work capped at 40% [50% at PG level] if passed is 14 calendar days. Work submitted unauthorised more than 14 calendar days after the original submission date will receive a mark of 0%. These regulations apply to a student’s first attempt at coursework. Work submitted late without authorisation which constitutes reassessment of a previously failed piece of coursework will always receive a mark of 0%.”|
|Academic Offences and Bad Academic Practices: These include plagiarism, cheating, collusion, copying work and reuse of your own work, poor referencing or the passing off of somebody else’s ideas as your own. If you are in any doubt about what constitutes an academic offence or bad academic practice you must check with your tutor. Further information and details of how DSU can support you, if needed, is available at: http://www.dmu.ac.uk/dmu-students/the-student-gateway/academic-support-office/academic-offences.aspx and http://www.dmu.ac.uk/dmu-students/the-student-gateway/academic-support-office/bad-academic-practice.aspx|
|Tasks to be undertaken: See attached.|
|Deliverables to be submitted for assessment: See attached.|
|How the work will be marked: See attached|
|Module leader/tutor name:||Yingjie Yang|
Should you need any further information or advice please email email@example.com
Midlands Theatre (MT) Company
Midlands Theatre (MT) is a chain of small theatres that are found within the suburbs of several cities and towns within the Midlands, including Leicester and Birmingham. It specialises in high quality theatre productions that may be seen to be insufficiently “mainstream” for the general population. The company purchased its first theatre in January 1982, and since then has seen substantial increases in the number of clients that want to visit a local theatre to see a production that is more alternative to those typically found within the city centre theatres and/or large entertainment complexes. In response to the growing client base, MT sometimes offers a matinee (i.e., afternoon) performance of a popular production in addition to the traditional evening performance time.
Alongside the increase in number of theatre goers comes the issue of managing the additional bookings and ticketing requirements. The Managing Director of MT, Ms. Heritage, developed a computerised booking system for MT. Although the ultimate aim is to enable the on-line booking of tickets for any MT productions at any MT theatres, Ms. Heritage has created a simple booking system first. This first version booking system has automated the existing manual booking processes, which are detailed below.
Information regarding current booking processes at Midlands Theatre (MT):
Each of the theatres in MT’s portfolio has a schedule of productions for a given month in a particular calendar year. For instance, a schedule for a production entitled “Wind Blows”, which ran for one week at the Cropston Theatre during March 2019, is provided in Table 1.
Table 1. Example of one week performance at the Cropston Theatre
|Name of Production||Day and Date||Afternoon||Evening|
|“Wind Blows” Scriptwriters: William Director: Henry||Monday 4/3/2019||No performance||7.30pm|
|Tuesday 6/3/2019||No performance||7.30pm|
|Thursday 7/3/2019||No performance||7.30pm|
|Sunday 10/3/2019||2pm – note earlier start||6.45pm – note earlier start|
ER Diagram for MT’s first version, computerised booking System
Only one production is shown at a theatre at a time, and some of the more popular productions run both afternoon (matinee) and evening performances.
Productions could be at a theatre for several weeks, although the majority of productions run only for a few days over one week. The run of the next production cannot start until the previous production has ended (in other words there is no interleaving of the performance runs of different productions). It is common for the same production to have runs in all of MT’s theatres – one theatre after another – so that the widest number of potential clients for a production is reached.
When a potential theatre goer wants to purchase a ticket for a particular performance of a production at a MT theatre in advance, s/he contacts the central booking service via the website, telephone or an App. Whenever a ticket is purchased, a unique purchase number is allocated, the performance to be attended, the theatre, the client details (such as name, postcode and house number, and telephone), the total amount paid and the method of payment (cash, Paypal, debit card or credit card) and the method of ticket delivery (either by mail/email or by client pick up from the local theatre box office) are recorded. If several tickets are bought at the same time by the same person, each ticket is recorded as if it were bought separately.
Table 2. Further Information about each Entity (Type) on the EER Diagram
|Entity Type||Entity Instance|
|Theatre||A particular MT theatre e.g., Cropston|
|Production||A particular production e.g., How the Wind Blows|
|Performance||A particular performance of a production run, e.g., a performance of Wind Blows at Cropston Theatre on Monday 4/3/19 at 7.30pm|
|Client||A particular client of a theatre e.g., Mr Eric Charles|
|TicketPurchase||A particular ticket bought for a particular performance e.g., Ticket Purchase No. 29, which is for Cropston Theatre for the 9/3/19 performance at 7.30pm of Wind Blows and is bought by Ms Helen w2544.|
Any remaining tickets for a performance are available to buy on the day from the local theatre box office. The same details are recorded, although clearly in this case the ticket will always be collected from the local theatre box office!
A relational database has been created according to the ERD. The tables
are as follows:
- Theatre(Theatre#, Name, Address, MainTel);
- Production(P#, Title, ProductionDirector, PlayAuthor);
- Performance(Per#, P#, Theatre#, pDate, pHour, pMinute, Comments);
- Client(Client#, title, name, address, telNo, e-mail);
- TicketPurchase(Purchase#, Client#, Per#, PaymentMethod, DeliveryMethod, TotalAmount).
Having established the online transaction processing database, Ms. Heritage wants more intelligence information from the available data and she is looking for a potential data warehouse for MT. As the first step in this process, she expects to setup a data mart for ticket sales as the first step. The data mart should satisfy the following analysis requirements:
- Yearly total sale for each theatre.
- All clients who visited MT theatres in at least 4 different months in a year.
- List the titles, production directors and play authors of all products with the highest total sale.
Here, by ticket sale value/spending, we refer to the value of money rather than number of tickets. For this data mart, only the data involved in ticket sales are stored, and any data not involved in at least one ticket sale should not be included.
You can access the OLTP database in my database schema: ops$yyang00. For example, you can access the data for Theatres using the following SQL statement:
select * from ops$yyang00.theatre;
You are to develop a prototype of an ORACLE data mart for ticket sales as part of a potential data warehouse for MT. For this assignment you are required to work individually.
You have been given the MT company scenario, together with an Entity Relationship (ER) Diagram and corresponding tables for the existing relational database.
Task 1: Analyse the given database design and the requirement for a data mart, and identify the dimensions and fact for your data mart. Here for this assignment, only the essential (minimum) dimensions for the required queries need to be included.
Task 2: Design the star schema for the data mart, and identify the corresponding PKs and FKs.
Task 3: According to given data and requirements, determine the relevant attributes and suitable granularity in your data mart.
Task 4: Map your star schema to logical relations.
Task 5: Create the corresponding tables in Oracle using SQL.
Task 6: Identify your source data from the OLTP database and design your data extraction rules. You need to give a detailed mapping and transformation list from the source to the destination.
Task 7: Implement your data extraction, transformation and loading through Oracle SQL. The number of rows extracted into each dimension or fact table in your data mart should be printed from Oracle query.
Task 8: Comment on how your data mart satisfies the requirements of MT. Implement the required data analysis requirements for both the data mart and the original OLTP database. Compare your queries and comments on the advantages of a data mart in analysis operations. Test results from Oracle query should be included.
- Evidence of the successful execution of your queries is required. For large outputs, you need to provide only the last page of outputs with the number of rows in the end.
The deliverable is a report that summarises your work and justifies your design decisions, it includes the following sections
- Dimension selection and fact identification
- Star Schema
- Logical relations (Tables) and granularity
- SQL for table creation and constraints
- Data Sources Mapping (using diagrams)
- SQL for ETL
- SQL for required queries (both Data Mart and the relational model)
- Comparison between Data Mart and relational models
Each part should be associated with a concise explanation, and the execution results from your SQL code should be provided as well. Everything should be put into one MSWord document for your submission.
IMAT5167 DW&OLAP Criteria Marking Grid for DW Design Assignment
|TASK||No work 0-9%||Little attempt 10-19%||Poor attempt 20-29%||Fail attempt 30-39%||Marginal fail 40-49%||Pass 50-59%||Merit 60-69%||Distinction 70-79%||Excellent 80-89%||Outstanding 90-100%|
|Star schema design||No work||Attempted, but No idea on star schema||Poor attempt, inappropriate choice of dimensions and fact||Some effort insufficient granularity, dimensions and fact quality||Serious errors, but some evidence of understanding star schema||Satisfactory but with some significant errors/ limitations||As for Distinction level but with some minor errors/ limitations||All dimensions and the fact identified, correct granularity||Same as previous, and correct multiplicity, no unjustified redundancy||As excellent. Minimum set of dimensions and facts in fact table. Perfect granularity|
|Derivation of logical relations||No work||Attempted but No idea on logical relations||Poor attempt, inappropriate choice of tables & attributes||Some effort but insufficient table/PK/FK quality overall.||Serious errors, but some evidence of understanding logical relations||Satisfactory but with some significant errors/ limitations||As for Distinction level but with some minor errors/ limitations||Correct choice of tables, and attributes, primary & foreign keys correct.||Excellent choice of tables, and attributes, primary & foreign keys||Same as excellent. Clear explanation on the map from star schema to logical relations.|
|Creation of tables with integrity rules||No work||Attempted, but No idea on how to create a table with integrity||Poor tables and attempt at integrity rules||Some effort but tables and integrity is not as reasonable quality||Some tables and associated integrity rules created but serious errors exist||Satisfactory but with some significant errors/ limitations||As for Distinction level but with some minor errors/ limitations||Tables created correctly. Appropriate implementation of integrity rules||All tables created correctly. Appropriate implementation of integrity rules||Same as excellent. Optimised table creation with minimum code and full set of constraints|
|Data source identification, data extraction, transformation and loading||No work||Attempted, but No idea on data source and ETL||Poor data sources, wrong or no transformation||Some effort but data source is not sufficient and transformation has significant errors||Some data sources are identified, and attempts are made for transformation||Satisfactory but with some significant errors/ limitations||As for Distinction level but with some minor errors/ limitations||Data sources are correctly identified, good transformation rules are set and data are loaded correctly||All data sources are correctly identified, efficient transformation rules are set and all data are loaded correctly||Same as excellent. Minimum data sources and optimised ETL code produced, any potential problems discussed|
|Justification of the data mart design and comparison of data mart and OLTP||No work||Attempted, but No idea on data sources and queries||Little or no evidence of justification||Some effort insufficient understanding of data mart and weak justifications provided.||Some incomplete explanation are provided.||Satisfactory but with some significant errors/ limitations||As for Distinction level but with some minor errors/ limitations||Correct justification for requirements with a good comparison between OLTP and DM queries||Excellent justification for all requirements with excellent comparison between most OLTP and DM queries||Same as excellent. More than one solutions are presented and difference discussed.|