Monday, August 5, 2019
Counterculture Analysis: Blackbeard
Zachariah Chiles

Many groups have been established as countercultures throughout the course of history. However, what makes a group a counterculture? Author W. LaVerne Thomas answers this question in his book, defining a counterculture as "a group [that] rejects the major values, norms, and practices of the larger society and replaces them with a new set of cultural patterns" (Thomas). One group that closely fits Thomas's definition is Blackbeard and his pirates. This group rejected the cultural patterns of the British monarchy to live their own cutthroat life of stealing, killing, and raping. To this day pirates are still a significant threat to those who travel international waters, and even to those who live in third world countries.

Before Blackbeard acquired his name, he was known as Edward Teach or Edward Thatch. Not much is known about Thatch's origins. However, it is recorded that he joined the British navy as a privateer during Queen Anne's War and turned to piracy shortly after (Division of Archives and History's Office of State Archaeology). Blackbeard began his pirating in 1713 under Captain Benjamin Hornigold (Ossian). Once Hornigold had given him a smaller ship and a crew of his own to command, Blackbeard captured the French slave ship La Concorde. This esteemed ship would be known to many as the Queen Anne's Revenge: "La Concorde was big, fast, and powerful. With such a vessel, Blackbeard knew his men could cause more havoc…" (Woodard). In 1717, the two pirates were so deadly that the British monarchy offered both Hornigold and Blackbeard currency in exchange for giving up piracy. Hornigold accepted, whereas Blackbeard refused the offer and continued ravaging the Caribbean on his esteemed Queen Anne's Revenge. However, his time came to an end on November 22, 1718, in a fight against a British Royal Navy contingent sent by Governor Alexander Spotswood.
Blackbeard and his crew mainly raided ships for one thing, and that was gold. Everything they did was based upon how much loot they could take, and although he died many years ago, his reputation and name still stand out in the history of piracy.

Both the sociological perspective and the sociological imagination can be used to explain the actions of Blackbeard and his crew. According to author LaVerne Thomas, "The sociological perspective helps you see that all people are social beings. It tells you that your behavior is influenced by social factors and that you have learned your behavior from others" (Thomas). Many heard the stories of Blackbeard and his ferocious crew. Because of this, many adopted his actions, continuing to pirate and adapting Blackbeard's techniques for more efficient plundering. His name alone put fear in the hearts of men, and many who saw that fear wanted to embody it, which inspired them to take up piracy and life on the seas.

C. Wright Mills believes the sociological imagination is "the capacity to range from the most impersonal and remote [topics] to the most intimate features of the human self and to see the relationship between the two" (Thomas). In other words, this describes the insight of how your social environment shapes you, and how you shape your social environment (Thomas). The environment of Blackbeard and his crew most likely included a poor social background and the loss of a loved one. Many who are greedy and kill have grown up in these conditions. They surrounded themselves with murderers and thieves, and thus became murderers and thieves themselves. They shaped their social environment by surrounding others with the same negative behavior, which led new people to join Blackbeard's crew. The more people in his crew, the more people there were to go out and tell the infamous story of Blackbeard, the cutthroat killer.

Ethnocentrism is a large part of any culture.
It is described as "[the] tendency to view one's own culture and group as superior…" (Thomas). Countercultures are subcultures; therefore Blackbeard and his crew are technically a subculture of the larger society, the British monarchy. Blackbeard and his crew saw their own norms as superior to the restricting life in the monarchy, and therefore ethnocentrism formed. The British, who already held ethnocentric views of their own, saw the opposing moral standards set by Blackbeard's newfound subculture and rejected them, making Blackbeard and his crew a counterculture. Many examples show why he and his crew are a counterculture. One is that there was no law against killing on Blackbeard's ship, whereas killing was outlawed in the British monarchy. A similar case is stealing: Blackbeard plundered and stole from other ships for loot, whereas such acts were against the law in the British monarchy.

Cultural relativism can be defined as "the belief that cultures should be judged by their own standards rather than by applying the standards of another culture" (Thomas). Looting, pillaging, and killing are what pirates know. These simple standards cannot be judged from outside the culture's own beliefs without noticing the large moral negativity that follows. Blackbeard and his crew had no moral compass, so their actions should not be judged through the eyes of the British monarchy. From a practical point of view, they were strong and preyed on the weak in order to gain wealth and become stronger in the world. Although they may have known that what they did was morally unacceptable and went against the laws of many larger societies, they followed their own standards and traditions and should not be judged outside of that.

My counterculture, Blackbeard and his crew, has many intriguing norms and standards that oppose those of many societies of that era as well as modern times. However, this does not excuse the actions of Blackbeard and his crew.
Killing, stealing, and plundering all leave large marks on this world. From crushing the economy of a British town to killing the last son of a lonely French mother, cultures that directly affect the larger societies in a negative manner should not exist. Cultures having opposing standards is completely fine, as long as the opposing standards do not actively contradict those of a larger society. Blackbeard and his crew had very free standards; however, the deaths they caused force me to disagree with the philosophy and norms of their counterculture.

References

Division of Archives and History's Office of State Archaeology. Queen Anne's Revenge Project. n.d. 12 3 2017.
Ossian, Rob. The Pirate King. n.d. 12 3 2017.
Thomas, W. LaVerne. Holt Sociology: The Study of Human Relationships. Holt, Rinehart and Winston, 2003.
Woodard, Colin. The Republic of Pirates. New York: Houghton Mifflin Harcourt Publishing Company, 2007.

Customer Segments in Retail Supermarket | Analysis

CHAPTER 1: INTRODUCTION

BACKGROUND

In today's dynamic retail environment, customers are offered a tremendous range of choices, and their loyalty is increasingly transitory due to the severe impact of competitors' actions on existing relationships (Reinartz and Kumar, 2000). This increased competition to satisfy the diverse needs of the customer shifts retailers' traditional focus on production and selling towards customer relationships. In the context of retail supermarkets, this has resulted in large investments in retail information systems that collect shoppers' data in order to understand customer shopping behaviour (Brijs et al., 2001). Several tools and technologies from data warehousing, data mining, and other customer relationship management (CRM) techniques are used to manage and analyse this data.
Data mining in particular, which simply means extracting knowledge from large amounts of data, helps organisations to find the patterns and trends in their customers' data and then to drive improved customer relationships (Rygielski, Wang and Yen, 2002). According to Witten and Frank (2005), data mining techniques such as decision trees (DT), artificial neural networks (ANN), genetic algorithms (GA) and association rules (AR) are commonly used to solve customer-related problems in fields such as engineering, science, finance and business. In the retail supermarket domain, data mining can be applied to identify useful customer behaviour patterns from large amounts of customer and transaction data (Giudici and Passerone, 2002). The discovered information can then be used to support better decision-making in retail marketing.

Data mining techniques have mostly been adopted to make predictions and describe behaviours. During the past decade, there has been an array of significant developments in data mining techniques. Some of these developments have been implemented in customised service (Chen et al., 2005), which is vital for developing customer relationships in retail markets. Therefore, this research aims to provide customised service to distinct customer segments in retail supermarkets by implementing data mining techniques with the help of data mining tools.

Related Work

Researchers have proposed various approaches to mining the sales transaction data of a retail supermarket in order to improve customer relationships. Previously, customer behavioural variables such as Recency-Frequency-Monetary (RFM) variables were associated with demographic variables to predict customer purchase behaviour (Chen et al., 2005). Current research has improved significantly, as Business Intelligence tools and advanced data mining algorithms are used to analyse the data in a much more refined way.
Liao et al. (2008) proposed a methodology based on the Apriori and K-means algorithms to mine customer knowledge from household customers for product and brand extension in retailing. Bottcher et al. (2009) presented an approach to mining changing customer segments in a dynamic market by deriving frequent itemsets as representations of customer segments at different points in time, which are then analysed for changes.

Problem Definition

Effective management of sales transaction data is as important as any other asset for a retail supermarket store. The sales transaction data usually contains a great amount of information distributed across numerous transactions. This study focuses on applying data mining techniques to analyse the sales transaction data of a retail supermarket store and suggests recommendations for providing customised service to defined customer segments. This research specifically uses two data mining techniques, namely clustering and association rule discovery.

The research starts by identifying different customer segments based on their purchase frequencies, in order to find the differences in their purchase behaviour. In the retail supermarket domain, behaviour can be defined in different ways. For example, retailers often distinguish between light, medium and heavy users, or between weekday and weekend customers (Brijs et al., 2001). In this research, the differences will be discovered by identifying frequently purchased items for each customer segment and comparing their combinations. The retailer may use this information to customise the offer towards those segments and also to further examine the underlying relationships between those items for purposes of pricing, product placement or promotions.

AIM AND OBJECTIVES

The aim of this research is to provide customised service to defined customer segments in a retail supermarket by implementing data mining techniques on sales transaction data with the help of data mining tools.
OBJECTIVES

To conduct a critical review of the literature and present the current research within the discipline.
To obtain the customer sales transaction dataset, in order to apply the data mining algorithms.
To select, based on the literature review, the appropriate data mining approach to pre-process the dataset and to implement the algorithms on the pre-processed data.
To analyse the results obtained from the data mining algorithms and propose recommendations for providing customised service.
To draw conclusions, discuss the limitations of this research and suggest areas of future research.

Research Approach

This research follows a quantitative methodology: the dataset is obtained and then analysed with data mining tools. The dataset for analysis was obtained from ABC retail supermarket store, Canada, and was available online (http://www.statsci.org/datasets.html). The data required for this project is selected and loaded into SPSS (Statistical Package for the Social Sciences) and Weka, the data mining tools selected for this research. The data mining algorithms selected for this study are the k-means algorithm for clustering and the Apriori algorithm for association rule mining; the choice of these algorithms is justified in the literature review. These algorithms are implemented on the dataset with SPSS and Weka. The results obtained from these algorithms need to be supported with charts, tables and graphs, which are plotted in Microsoft Excel. Finally, recommendations are made based on the analysis of the results.

Dissertation Outline

This chapter presents the essence of this dissertation, highlighting the aim and objectives of this research. The rest of this dissertation is structured as follows:

Chapter 2 provides a comprehensive literature review of different aspects relating to the research topic under study.
Chapter 3 discusses in detail the research methods and the data analysis techniques followed in order to achieve the aim of this research.
Chapter 4 presents the analysis of the results obtained from the application of data mining algorithms to the data and provides recommendations.
Chapter 5 summarises the entire project, gives insights into the limitations of this research and points out areas of future research.

CHAPTER 2: LITERATURE REVIEW

Introduction

This chapter provides a critical review of the literature addressing the application of data mining in retail supermarkets. It begins with an introduction to data mining, followed by its evolution and applications in today's business world. It then explores the role of data mining in improving customer relationships in retail supermarkets, followed by a discussion of the typical data mining approach. It also discusses the techniques and algorithms applied in this project and the reasons for their choice.

Data Mining: An Introduction

The word mining means extracting something useful or valuable, such as mining gold from the earth (Lappas, 2007). The importance of mining is growing continuously, especially in the business world. Data mining is a process of finding interesting patterns in databases for decision-making. It is one of the fastest growing and most prominent fields, and it can provide a significant advantage to an organisation by exploiting its vast databases (Rygielski, Wang and Yen, 2002). Finding patterns in business data is not new; traditionally, business analysts used a statistical approach. The computer revolution and huge databases ranging from a few gigabytes to terabytes changed this scenario. For example, companies like Wal-Mart store huge amounts of sales transaction data, which can be used to analyse customer buying patterns and make predictions (Bose and Mahapatra, 2001). Data warehousing technology has enabled companies to store huge amounts of data from multiple sources under a unified schema.
Data mining has been considered a business intelligence tool for knowledge discovery (Wang and Wang, 2008). Many people equate data mining with Knowledge Discovery from Data (KDD), but it is actually one part of the larger knowledge discovery process, which describes the steps that must be taken to secure the desired results (Han and Jiawei, 2006). A typical data mining process involves various iterative steps. The first step is the selection of appropriate data from a single database or from multiple source systems, followed by cleaning and preprocessing for consistency. The data is then analysed to find patterns and correlations. This approach complements other data analysis techniques such as statistics and OLAP (On-line Analytical Processing) (Bose and Mahapatra, 2001). Every organisation follows a different data mining and modelling process to achieve its business imperatives.

The Evolution of Data Mining

It all started with the need to store data in computers and improve access to it for decision-making. Today the technology enables users to access and navigate real-time data. At the beginning of the 1960s, data was collected for the purpose of making simple calculations to answer business questions such as the total or average revenue for a specific period of time. In the 1980s and 1990s, the use of data warehouses to store data in a structured format emerged, and policies regarding the format of data to be used in an organisation were implemented (Therling.K, 1998). Data warehouses then became multi-dimensional, which allows stakeholders to drill down and navigate through the data. Nowadays, online analytic tools help to retrieve data in real time, and computers can query data from the past up to the present. In recent years, technologies such as statistics, AI (Artificial Intelligence) and machine learning have been evolving as core areas of the data mining field (Rygielski, Wang and Yen, 2002).
These technologies, combined with relational database systems and data integration, extract potential knowledge from the data.

Data Mining Applications

Data mining can be applied in many fields, depending on the aim of the company. Some of the main areas in today's business world where data mining is applied are as follows (Apte.C. et al, 2002):

Finance
Telecom
Marketing
Web analysis
Insurance
Retail
Medicine

Data mining for CRM in retail supermarkets

Swift (2001) defined CRM as an "Enterprise approach to understanding and influencing customer behaviour through meaningful communications in order to improve customer acquisition, customer retention, customer loyalty, and customer profitability". According to research by the American Management Association, it costs three to five times as much to acquire a new customer as to retain an existing one, and this is especially evident in the services sector (Ennew and Binks, 1996). Therefore it is very important to build a good relationship with existing and new customers rather than simply expanding the customer base. A large number of companies are adopting various tools and strategies to make CRM more effective and to gain an in-depth understanding of their customers. Data mining is a powerful technique that helps companies to mine the patterns, trends and correlations in their large amounts of customer, product, or data, in order to drive improved customer relationships. It is one of the well-known tools used for customer relationship management (CRM) (Giudici and Passerone, 2002). In the context of retail supermarkets, these patterns not only assist retailers in offering high-quality products and service to their customers, but also help them to understand changes in customer needs.

Data mining applications for CRM in retail supermarkets

The use of data mining to improve customer relationships in retail supermarkets is a wide area of research interest.
Depending on the retailer's objective, there are various application areas in which data mining can be applied to enhance customer relationship management. Some of the major data mining applications in retail supermarkets identified from the literature are as follows:

Cross-selling (Brijs et al., 1999; Feng and Tsang, 1999)
Product recommendation (Shih and Liu, 2005; Li et al., 2009)
Customer behaviour modelling (Baydar, 2003; Cadez, 2001)
Shelf space allocation (Chen and Lin, 2007; Chen et al., 2006)
Catalogue segmentation (Ester et al., 2004; Lin and Hong, 2006)
Direct marketing (Bhattacharyya, 1999; Prinzie and Poel, 2005)
Price optimization (Chen et al., 2008; Kitts and Hetherington, 2005)

THE DATA MINING PROCESS

Ivancsy and Vajk (2006) defined the three main stages involved in the data mining process: (i) preprocessing, (ii) pattern discovery, (iii) pattern analysis/interpretation.

Preprocessing

Famili (1997) defined data preprocessing as all the actions taken before the actual data analysis process starts. It is essentially a transformation $T$ that transforms the raw real-world data vectors $X_{ik}$ into a set of new data vectors $Y_{ij}$:

$Y_{ij} = T(X_{ik})$

such that $Y_{ij}$ preserves the valuable information in $X_{ik}$, $Y_{ij}$ eliminates at least one of the problems in $X_{ik}$, and $Y_{ij}$ is more useful than $X_{ik}$. In the above relation, $i = 1, \dots, n$, where $n$ is the number of objects; $j = 1, \dots, m$, where $m$ is the number of features after preprocessing; $k = 1, \dots, l$, where $l$ is the number of attributes/features before preprocessing; and in general, m ? l.

The most common data used for mining purchase behaviour in retail supermarkets is customer and transaction data (Giudici and Passerone, 2002). With a huge collection of customers' sales transaction data available in the databases, it is necessary to pre-process the data and extract the useful information from it.
In the context of retail supermarkets, Pinto et al. (2006) suggested four key tasks in data preprocessing: data selection, data cleaning, data transformation, and data understanding.

The first preprocessing task is data selection. Here the subset of the data on which pattern discovery is to be performed is identified. This task is especially helpful in coping with large amounts of data, because the data is evaluated and categorised into much smaller datasets. The computational requirements for data analysis and manipulation are also greatly reduced when large datasets are preprocessed through data selection techniques such as clustering or vector quantization (Famili, 1997).

The second task is data cleaning, where basic operations include removing noise and handling missing data (Fayyad et al., 1996). Other data quality issues that may complicate data analysis, such as errors and insufficient attributes, are also addressed in data cleaning. In most cases, missing attribute values are replaced by the attribute mean, but traditionally, if more than 20% of attribute values are missing, the entire record is eliminated (Famili, 1997). To handle outliers and noisy data, techniques such as binning (partitioning the sorted attribute values into bins), clustering and regression are applied.

The next preprocessing task is data transformation. Each data mining algorithm requires the data to be present in a mathematically feasible format (Crone et al., 2006). Inaccuracies in the measurement of inputs or incorrect feeding of data to the data mining algorithm could cause various problems. Hence, operations such as normalization, aggregation, generalization and attribute construction are performed. Normalization scales the attribute values into a specific range, whereas aggregation and generalization refer to summarising the data in terms of numeric and nominal attributes.
Attribute construction handles the replacement or addition of new attributes based on the existing attributes (Markov and Larose, 2007).

Once the issues regarding the data are solved and the data is prepared, understanding the nature of the data is useful in many ways. According to Famili (1997), the majority of data analysis tools have some limitations regarding data characteristics; therefore, it is important to recognise these characteristics in order to set up the data analysis process appropriately. He further pointed out that techniques such as visualization and principal component analysis are useful for better understanding the data.

Pattern discovery

Fayyad et al. (1996) stated that the core of the process is the application of specific data mining methods for pattern discovery and extraction. Pattern discovery is the key stage of the process in this research, as it is where the data is mined. Once the data is pre-processed and the irrelevant information is removed, data mining techniques are applied to it to discover patterns. However, it is not the intent of this paper to describe all the available algorithms and techniques. This research focuses on two main data mining methods that help to mine the data and find patterns: clustering and association. The reason for choosing these methods is justified below.

Clustering

Clustering can be defined as a technique for grouping together a set of items having similar characteristics (Kuo et al., 2002). In the retail domain, cluster analysis is a common tool for segmenting customers on the basis of their similarity on a chosen segmentation base or set of bases (Stewart.D.W and Girish.P., 1983). The actual choice of one or a combination of these bases largely depends on the business question under study (Wind, Y., 1978).
Segmentation can be done on the basis of various variables or bases, classified by Wedel and Kamakura (2000) as 1) general or product-specific, and 2) observable or non-observable. General bases for segmentation are independent of products, services or circumstances, whereas product-specific bases are related to the product, the customer or the circumstances. Observable segmentation bases can be measured directly, whereas non-observable bases must be inferred. The combination of classification of segmentation bases is shown below.

Twedt (1967), as cited in Engel et al. (1972), stated that the existence of huge amounts of transaction data in the retail supermarket domain provides a great impetus for segmentation on the basis of purchase frequencies. Segmentation on this basis divides customers into groups according to their intensity of buying a product or products, such as light, medium and heavy buyers. According to Brijs (2002), if customers are classified by their purchase frequency, these segments can then be treated differently in terms of marketing communication (pricing, promotion, product recommendation etc.) to achieve a greater return on investment (ROI) and customer satisfaction. Therefore, in this research clustering is employed to segment the customers into clusters on the basis of their similarity in purchase frequency.

Several algorithms have been proposed in the literature for clustering, such as ISODATA, CLARA, CLARANS, ScaleKM, P-CLUSTER, DBSCAN, Ejcluster, BIRCH and GRIDCLUS (Kanungo et al., 2002). It is not the objective of this research to use all these algorithms. As discussed earlier, the k-means clustering algorithm is used, and its justification is given below.

k-Means Clustering Algorithm

The k-means algorithm has been considered one of the most effective algorithms in producing good clustering results for many practical applications (Alsabti et al., 1998).
The main reason is that, when clustering is done for the purpose of data reduction, the goal is not to find the best partitioning but simply a reasonable consolidation of N data points into k clusters and, if necessary, some efficient way to improve the quality of the initial partitioning (Faber, 1994). The k-means algorithm therefore proves to be very effective in data reduction and produces a good clustering output.

The k-means algorithm groups similar data into clusters, labelled Cluster 0, Cluster 1 and so on up to Cluster n (Kanungo et al., 2002). Given a set of n data points in real d-dimensional space ($\mathbb{R}^d$) and an integer k, the aim is to determine k points in $\mathbb{R}^d$, called the centers, so as to minimize the mean squared distance from each data point to its nearest center. This measure is often called the squared-error distortion (Jain and Dubes, 1988).

The diagram below illustrates the standard k-means algorithm. It shows the results during two iterations in the partitioning of nine two-dimensional data points into two well-separated clusters. Points in cluster 1 are shown in red and points in cluster 2 are shown in black; data points are denoted by open circles and reference points by filled circles. Clusters are indicated by dashed lines. The iteration converges quickly to the correct clustering, even though there was a bad initial choice of reference points.

Lloyd's algorithm is another popular version of k-means clustering. It requires about the same amount of computation for a single pass through all the data points, or a single iteration, as the standard k-means algorithm (Faber, 1994). Lloyd's algorithm is similar to the standard k-means algorithm, except that when the cluster centroids are chosen as reference points in the subsequent partition, the centroids are adjusted both during and after each partition.
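The iteration described above can be sketched in a few lines of Python. This is a minimal illustration and not the SPSS or Weka implementation used in this research: the function names, the random choice of initial reference points and the sample purchase frequencies are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100, seed=0):
    """Batch k-means: alternate nearest-center assignment and centroid update."""
    rng = random.Random(seed)
    # Assumption: the initial reference points are k data points drawn at random.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each center to the centroid of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers, assignment


def distortion(points, centers, assignment):
    """Mean squared distance from each point to its assigned center."""
    total = sum(squared_distance(p, centers[a]) for p, a in zip(points, assignment))
    return total / len(points)


# Hypothetical purchase frequencies (store visits per month), one value per customer.
purchase_frequencies = [(1.0,), (2.0,), (3.0,), (20.0,), (21.0,), (22.0,)]
centers, labels = k_means(purchase_frequencies, k=2)
```

The points are tuples, so the same functions work for data with more than one attribute. The `distortion` helper computes the squared-error distortion that the algorithm tries to minimize.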
Compared with Lloyd's algorithm, the k-means algorithm updates the clusters continuously and requires comparatively fewer iterations, and is therefore considerably faster. This is the key reason for selecting the k-means algorithm: it can group customers with similar purchase frequency into different clusters in fewer iterations. However, Faber (1994) pointed out two major drawbacks of this algorithm. Firstly, it is computationally inefficient for large datasets. Secondly, although the algorithm will always produce the desired number of clusters, the centroids of these clusters may not be particularly representative of the data.

Association Rules

Association rule discovery was proposed to find all rules in basket data in order to analyse how items purchased by customers in a shop are related (Gery and Haddad, 2003). It refers to the discovery of attribute-value associations that frequently occur together within a given data set (Han and Kamber, 2001). It is typically used in market basket analysis to discover rules of the form "x% of customers who buy items A and B also buy item C" (Zaiane, 2001), which is an implication of the form $(A, B) \Rightarrow C$. Some of the key definitions drawn from the literature that characterise the association rule technique are provided below (Agrawal, Imielinski and Swami, 1993).

Itemset (i): A set of items contained in a single transaction (e.g. {milk, sugar, curd}).
Support (s): The percentage of transactions in the data that contain both the items in the antecedent and the items in the consequent of the rule.
Confidence (c): An estimate of the conditional probability of B given A, i.e. $P(B \mid A)$, calculated as $c = s(A \cup B) / s(A)$.

Association rule discovery typically involves a two-phase sequential methodology (Brijs, 2002).

Finding frequent itemsets

The first phase involves looking for so-called frequent itemsets, i.e.
itemsets for which the support in the database equals or exceeds the minimum support threshold set by the user. This is computationally the most complex phase because of the number of possible combinations of items that need to be tested for their support.

Generating association rules

Once all frequent itemsets are known, the discovery of association rules is comparatively straightforward. The general scheme is that, if ABCD and AB are frequent itemsets, then whether the rule $AB \Rightarrow CD$ holds with sufficient confidence can be determined by computing the ratio confidence = s(ABCD) / s(AB). If the confidence of the rule equals or exceeds the minconf threshold set by the user, then it is a valid rule. For an itemset of size k, there are potentially $2^k - 2$ confident rules.

Association rules can help to discover frequently purchased combinations of products within a customer segment and to provide customised service by promoting certain products or product combinations to the defined segments (Brijs et al., 2001). Therefore, in this research, frequent itemsets will be generated for each customer cluster and their combinations compared, in order to identify the differences in purchase behaviour and provide customised service.

Traditionally, support and confidence are used in association rule discovery, but Aggarwal and Yu (1998) criticised this support-confidence framework for the following main reasons. First of all, setting good values for the support and confidence parameters in association rule mining is critical. For example, setting the support threshold too low will lead to the generation of more frequent itemsets, but even if these are statistically significant, their support is usually too low to have a significant influence. On the other hand, setting the support threshold too high increases the probability of finding only trivial relations and of missing some important associations between items.
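To make the support and confidence definitions concrete, the following is a minimal Python sketch. The five transactions, the function names and the item names are hypothetical and chosen only for illustration; they are not taken from the dataset used in this research.

```python
from itertools import combinations

# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"milk", "sugar", "curd"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "curd"},
    {"milk", "sugar", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """s(A u B) / s(A); assumes the antecedent occurs in at least one transaction."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def candidate_rules(itemset):
    """Yield every split of an itemset into a non-empty antecedent and consequent."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


def confident_rules(itemset, transactions, min_confidence):
    """Keep the candidate rules whose confidence reaches the minconf threshold."""
    rules = []
    for antecedent, consequent in candidate_rules(itemset):
        c = confidence(antecedent, consequent, transactions)
        if c >= min_confidence:
            rules.append((antecedent, consequent, c))
    return rules
```

For an itemset of size k, `candidate_rules` yields $2^k - 2$ rules, which matches the count given above: every non-empty proper subset can serve as the antecedent.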
Further, Aggarwal and Yu (1998) and Brin et al. (1998), as cited in Brijs (2003), introduced the lift (also called interest) measure to overcome the disadvantage that confidence does not take the baseline frequency of the consequent into account.

Lift/Interest (l): Lift is computed as the confidence of the rule divided by the support of the right-hand side (RHS). In other words, lift is the ratio of the probability that A and B occur together to the product of the two individual probabilities for A and B: Lift (l) = s(A ∪ B) / (s(A) · s(B)).

In order to perform predictive analysis, it is useful to discover interesting patterns in the given dataset that serve as the base for future trends. The most popular algorithm used for this analysis is the Apriori algorithm (Varde et al., 2004).

Apriori Algorithm

The Apriori algorithm was proposed by Agrawal et al. (1994), as cited in Varde et al. (2004). The algorithm finds frequent items in a given data set using the anti-monotone constraint (Petrucelli et al., 1999, as cited in Varde et al., 2004). It works under the principle that all subsets of a frequent itemset must also be frequent. In other words, if at least one subset of an itemset is not frequent, the itemset itself can never be frequent. This principle simplifies the discovery of frequent itemsets significantly, because some itemsets can be ruled out as infrequent without checking their support against the data. This is the key reason for selecting this algorithm, since the association rules for the items can be discovered more quickly and efficiently.

Given a data set, the problem of association rule mining is to generate all rules whose support and confidence are no lower than a user-specified minimum support and minimum confidence respectively. Candidate sets having k items can be generated by joining large sets having k-1 items and deleting those that contain a subset that is not large (where large refers to support at or above the minimum support).
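The join and prune steps described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the WEKA implementation or the original published pseudocode: the toy transactions and the names `apriori` and `lift` are hypothetical, and support is recomputed by a full scan at every level for simplicity.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"milk", "sugar", "curd"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "curd"},
    {"milk", "sugar", "bread"},
]

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        previous = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: discard candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        # Only the surviving candidates are checked against the data.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

def lift(antecedent, consequent, frequent):
    """Lift = s(A ∪ B) / (s(A) * s(B)), looked up from mined supports."""
    a, b = frozenset(antecedent), frozenset(consequent)
    return frequent[a | b] / (frequent[a] * frequent[b])

frequent_itemsets = apriori(transactions, min_support=0.4)
```

On this toy data, both three-item candidates are pruned before their support is ever counted, because each contains an infrequent pair. That is the saving the anti-monotone principle provides.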
Frequent sets of items with minimum support form the basis for deriving association rules with minimum confidence. For A → B to hold with confidence C, C% of the transactions having A must also have B. Though the algorithm is very efficient in association rule mining, it has certain drawbacks, identified by Margahny and Shakour (2006). After discovering the 4-frequent itemsets, the algorithm needs extra data structures and methods for processing, since further itemsets can be obtained in different ways. The method is fast only when handling small data sets.

There are several tools available for clustering and association rule mining, such as ARMiner, Clementine (SPSS), Enterprise Miner (SAS), Intelligent Miner (IBM) and Decision Series (NeoVista). In this project, WEKA, a collection of machine learning algorithms for data mining tasks, is used to mine association rules, and SPSS Statistics 17.0 is used for clustering. WEKA is open-source software available online and is very efficient in mining large datasets, whereas SPSS Statistics 17.0 is a statistical analysis package available in the Brunel University computer labs.

Pattern Analysis

Pattern analysis means understanding the results obtained by the algorithms and drawing conclusions. This is the last phase in the data mining process, in which the uninteresting rules or patterns in the set found during the pattern discovery phase are filtered out (Cooley et al., 2000). The uninteresting patterns are filtered out by applying appropriate methodologies to the results, leaving the statistically interesting patterns.

SUMMARY

This chapter discussed the concept of data mining, its evolution and its applications in today's business world. It then provided an overview of the role of data mining in retail supermarkets in improving customer relationships, followed by a discussion of the typical data mining approach. It also discussed the techniques and algorithms applied in this project and the reasons for their choice.
The following chapter will explain the research approach followed in this dissertation.

CHAPTER 3 : RESEARCH APPROACH

Introduction

This chapter discusses the research approach employed in this project. It starts with a discussion of the research and literature review methods, followed by the data collection and a justification of the data mining approach applied to the data.

Research Methods

The research approach depends upon the objectives and aim of the study, as it assists the researcher in eliciting appropriate responses. Boyatzis (1998) defines research methods as a taxonomic procedure used for problem solving in which data is first collected based on the research question, hypotheses are stated, data analysis is carried out using appropriate techniques, results are interpreted and conclusions are derived. According to Hussey et al. (1997), research methods can be divided into two types: qualitative and quantitative approaches. Oates (2006) says that quantitative research is based on numeric data or evidence, whereas qualitative research covers all non-numeric data. In this research, a quantitative research methodology is used. A quantitative study makes use of numeric data collected from a group of people interested in the subject area, which is then analysed and interprete