I might not have realized it, but data analysis has been really important in my life. When I was very young, around 4 or 5, my father travelled a lot on business, and on weekday evenings I stayed with my grandparents while waiting for my mother to come home. I had just learnt how to tell the time from the clock. My mum usually got home at 6:30pm, so I knew that if I stood at the front door at 6:25pm, there was a good chance I would soon be giving her a big welcome home. Sometimes, however, I was disappointed because my mother had to work overtime (OT). Today I understand that the OT days could be excluded as exceptions, so the typical time my mother got home was still around 6:30pm. Looking deeper, the OT had its own seasonality: my mum worked in a finance department, and the end of each month was her busy season.
That was my first taste of data analysis. Although I was naive and my approach was simple, I tried to gather what I could understand to identify rules and patterns. Several years later, in middle school, I was assigned to record my classmates' attendance because of the school's strict punctuality rule. At the beginning, I noticed that one group of students was always on the late list. I was never late, not only because I was the one recording it and I value punctuality, but also because I lived far from the school and got up very early each morning to leave a big time buffer for the journey. After some investigation, I discovered that students who lived closer to the school were more likely to be late. It was an interesting finding. I tried to identify the root cause, but each time a student was late, they had a different excuse. My data told me there should be a common cause, but reality was much more complicated. From today's perspective, my sample was also not big enough to support the finding. Years later, when I was working in the PMO department at Bleum, I faced this problem again in a workshop on root cause analysis. We used simple examples to explain the difference between common cause and special cause. A special cause is like the different excuses the students gave when they were caught being late. The common cause might be that they did not hold punctuality as an important value, or that the punishment system was not effective.
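Here is a minimal Python sketch of how the common cause vs. special cause idea is often made concrete, assuming a simple Shewhart-style control chart: limits are set at the mean plus or minus 3 standard deviations of a baseline period, and points outside the limits are flagged as candidate special causes. The late counts are hypothetical illustration values, not my actual attendance records, and the function names are my own.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower, center, upper) limits at mean +/- 3 sample std devs.

    Simplification (assumption): textbook individuals charts usually
    estimate sigma from the average moving range instead of stdev().
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma


def special_cause_points(values, limits):
    """Indices of values outside the limits, i.e. candidate special causes.

    Points inside the limits are treated as common-cause variation.
    """
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily late counts for one class (illustration only).
baseline_days = [3, 4, 2, 3, 5, 4, 3, 2, 4, 3]
new_days = [3, 4, 12, 2]

limits = control_limits(baseline_days)
print(special_cause_points(new_days, limits))
```

A day with 12 late students would stand out as a special cause worth asking about, while the everyday 2 to 5 late students are common-cause variation: no single excuse explains them, and only a change to the system (values, rules, punishment) would move that baseline.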
Yes~ I used data to run my life for a long time. From checking my daily and long-term schedules and managing my personal finances to analyzing why I won or lost at video games and board games, I relied heavily on the data at hand. My professional development helped me understand that data is not everything, but it is the foundation. While working in the PMO at Bleum and at PwC SDC, I collected data from project teams every day. Anyone can tell what the data obviously shows. However, whether the conclusions make sense depends on the completeness and accuracy of the data set, and on choosing the right metrics for the analysis in each situation. Even the same metric with exactly the same data may mean different things for different projects.
As a project data analyst in the PMO, I had the chance to work with large data sets. At the end of each quarter, our team started working on the project performance baseline. We consolidated the current quarter's data from all projects and analyzed it together with the historical data to find patterns, make assessments, identify risks, and set the organizational baseline. We used Excel to process all the data. After I learnt SAS and R during my MBA at the Schulich School of Business, I realized that a data set small enough to be processed in Excel is not large enough to count as big data~~~
R is a very capable tool for analyzing the trend and seasonality of a data set in order to make predictions. It is commonly used in economic analysis and stock price prediction. I chose the Toronto house price index for my R final project, because I was planning to buy a house at that time. The strong upward trend and weak seasonality predicted clearly higher prices. I bought my house right after that semester. And now, the real market data has proven my prediction right and my decision correct~~
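The project itself was done in R and I can't share it, but the basic idea of splitting a series into trend and seasonality can be sketched in a few lines of Python. This is a minimal sketch, assuming a classical additive decomposition (a centered moving average for the trend, averaged offsets per period for the seasonality) and a straight-line extrapolation of the trend. R's built-in `decompose()` performs a classical decomposition in a similar spirit, but this is not the model from my project, and the series below is synthetic, not Toronto house price data. All function names are hypothetical.

```python
def centered_moving_average(series, period):
    """Trend estimate: centered moving average (2 x period MA for even periods)."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # ends stay None: the window doesn't fit there
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # period + 1 values; the two endpoints get half weight
            total = sum(window[1:-1]) + 0.5 * (window[0] + window[-1])
            trend[t] = total / period
    return trend


def additive_decompose(series, period):
    """Split series into (trend, seasonal) assuming y = trend + seasonal + noise."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods of data")
    trend = centered_moving_average(series, period)
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)  # detrended value by season
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    seasonal = [s - offset for s in raw]  # additive seasonal terms sum to zero
    return trend, seasonal


def fit_line(points):
    """Ordinary least-squares line through (x, y) points; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(series, period, steps):
    """Extrapolate the linear trend and add back the seasonal term."""
    trend, seasonal = additive_decompose(series, period)
    points = [(t, tr) for t, tr in enumerate(trend) if tr is not None]
    intercept, slope = fit_line(points)
    n = len(series)
    return [intercept + slope * t + seasonal[t % period] for t in range(n, n + steps)]


# Synthetic quarterly-style series: rising trend plus a small repeating pattern.
pattern = [3, -1, -2, 0]
series = [100 + 2 * t + pattern[t % 4] for t in range(16)]

trend, seasonal = additive_decompose(series, 4)
print([round(s, 2) for s in seasonal])
print([round(v, 2) for v in forecast(series, 4, 4)])
```

When the trend is steep and the seasonal terms are small compared with it, as in this toy series, the forecast is driven almost entirely by the trend line, which is the situation I described for the house price index.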
My first real date with big data was the SAS project in Marketing Metrics. The data set came from a grocery chain and included all the transactions and client information for about 5 years. There were 12 data tables in total, and each table had more than 10 thousand records. The question was simple: where should a new store be located? But the gap between the answer and the data was really big. I didn't know what information was useful, how many tables I should use, or which direction to take. I started playing with the data as a way of learning SAS. As I got more familiar with SAS, I identified two directions I could take: product mix and client demographics. The product mix data was complete, so I quickly grouped the products into categories and visualized them in charts. The results showed that ethnic foods were the top sellers based on 5 years of revenue data. On the client side, however, a lot of data was missing, so I stopped investigating gender, family size and profession and focused on addresses instead, since the postal code was the only complete customer field. Using PRIZM5, I found that the customer base was mostly multicultural, young, and had a preference for digital products. A multicultural customer base and an ethnic food product mix were a great match. I was so happy to find the right direction after wasting only several days on the 3 promotion data tables, which proved to be useless in the end.
I really want to share my project results here with all of you, but I can't because the course material is protected. Instead~ I will start doing some research and analysis of my own and share the results here. That is the main reason I opened this blog.