I am a data scientist for an educational company in Irvine, theDevMasters. Ask Me Anything!

Kate Ta
Aug 12, 2017

Hello everyone, I will be available to answer any questions you have about my job title, data science in particular, statistics, the skills necessary, & some ways that I found are most helpful in becoming a data scientist, especially in the Southern California area. Ask me anything!

Conversation

How has data management changed over the last 5 years?

Aug 12, 9:28AM EDT50

In the last 5 years, not particularly much, but 10 years is a different story. I will explain more.

10 years ago, data storage & the technology behind it were not as advanced as they are now. It used to take a lot of resources & processes to even store 1GB of a wedding video, with crackling video quality & a shot in the dark with the sound. The quality was limited by the space.

Fast forward to 2017: technology has advanced to the stage where I can store 128GB on a microSD card in my phone & film an entire 2-hour concert in 1080p HD, with the sound & video quality limited not by space but by my phone / medium. & the SD card cost me a good $30.

Now multiply that 128GB into TBs of data: an aircraft-hangar-sized room of continuous storage, maintained by the best data engineers that companies can depend on. It now costs companies more to delete the valuable data they collect than to buy more storage for it.

R&D will bring us zettabytes of data storage options & I am quite frankly ready for that change, but not for the impact of the resources it will take to achieve it. So there are some unofficial cons to the changes in data management. More resources are available on these concerns, but none of the concerns are immediate yet.

No doubt about it, data management has tripled, maybe even grown ten-fold, in the past 10 years, for the better.

Aug 12, 2:58PM EDT50

Wow, I had no idea technologies have progressed so much! What do you think will change in the future?

Aug 13, 3:54AM EDT46

How important is historical data versus recent data in learning things about consumer behavior?

Aug 11, 2:27AM EDT43

Firstly, they are not mutually exclusive, so saying one is better than the other is not correct if recent data depends on historical data. (Deep down, philosophically, recent data is actually historical data, wink.) Taking customer behavior into account (and even your own industry's behavior), the best information you can get from collecting & maintaining both equally is which customers' behavior changed (for the better as well as for the worse) & which event triggered the change! Adjust accordingly using data science to either spread the effect for the growth of the company or halt the damage before it gets worse. That's off the top of my head for this question.
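To make the historical-versus-recent comparison concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the customer names, the spend figures, the `behavior_shift` helper, & the 25% threshold are all made up, & a real project would pick a metric & threshold that fit the business.

```python
from statistics import mean

def behavior_shift(historical, recent, threshold=0.25):
    """Flag customers whose recent average differs from their historical
    average by more than `threshold` (a fraction of the historical average).

    Both arguments map a customer id to a list of numeric observations,
    e.g. spend per visit. Names and threshold are illustrative only.
    """
    shifts = {}
    for customer, past in historical.items():
        now = recent.get(customer)
        if not past or not now:
            continue  # both periods are needed to make a comparison
        base = mean(past)
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        change = (mean(now) - base) / base
        if abs(change) > threshold:
            shifts[customer] = round(change, 2)
    return shifts

# Hypothetical data: average spend per visit, older period vs. latest period.
historical = {"ana": [40, 42, 38], "ben": [20, 22, 21], "cy": [60, 58, 62]}
recent = {"ana": [41, 39], "ben": [35, 37], "cy": [30, 28]}

changed = behavior_shift(historical, recent)
print(changed)  # positive = spending more than before, negative = less
```

The flagged customers are only the starting point; the next step is lining up the dates of the change with events (a promotion, a price change, an outage) to see what triggered it.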

Aug 12, 2:46PM EDT33

I guess you're right about the equality ...

Aug 13, 4:26AM EDT21

In what areas of work is data studied and used?

Aug 10, 4:52PM EDT0

I can simply answer your question with a different question: is there any industry that should NOT collect data? Data is information & if there is a place where information is not important, bury me there.

Aug 12, 2:42PM EDT0

How does a human employee add value to a data management operation?

Aug 10, 1:07PM EDT51

Using your experience at the very front & center of where the data is being collected, you will have the most expertise & intuition on how the data should be collected, as well as a grasp of what the data should tell the people who are analyzing it. You might not be the one immediately collecting the data, but you will be the best person to come to when the data doesn't make sense. It can be as simple as asking a cashier why there is always an influx of hot dog buns being sold on the weekends, regardless of the season. Well, based on the cashier's weekend shifts, the local church always hosts a barbeque to promote the community & collect donations; they serve hot dogs at the event. (That is a whole different marketing strategy on its own, wink.) Or it can be as complex as you being the only person who knows the amount of water necessary for this portion of the vineyard, based on past experience with the color of the grapes (no one needs to be told how much the wine industry prospers with dedication). If even the pH level of the water were beyond a certain level, the grapes would not be in pristine enough condition to be the starting point of wine. Data science in agriculture is going to be a big thing, if it wasn't before. No amount of machinery or data will be intelligent enough without human input.

Aug 12, 2:40PM EDT22

So, you think we can't be replaced by machines?

Aug 13, 4:39AM EDT35

Would you consider yourself "gifted"?

Aug 10, 1:24AM EDT22

Yikes, I am humbly average, at the least. My knowledge & passion come from hard work & dedication to the craft & a willingness to let others succeed in this field as well as I have been able to. It doesn't take a remarkable individual to start an AMA & open up their experiences for others to read. So no, I am not gifted.

Aug 12, 2:30PM EDT60

You're very modest

Aug 13, 4:13AM EDT56

What kind of equipment do you need to collect important data?

Aug 10, 12:28AM EDT53

It depends on what this 'important' data is. If it is industry-specific, like healthcare, I'm pretty sure the minimal equipment necessary, like a blood test machine (again, I'm not part of the industry, so that's not the correct term), would be sufficient. If it is real-time data, you need sensors sensitive enough to catch changes in the data & massive computational power to keep registering that data as fast as it comes in; the budget should be as big as the amount of data you expect to collect. This is all machine power, but we should not disregard manpower, either. The US Census, which runs every 10 years, needs qualified individuals to travel to the necessary sample sections to collect enough data to generalize the population. There is a depth to that method of collecting data, too. I could write a whole article/blog on possible scenarios as well as improvements just on the Census process, but... the government is a big foot to be stepping on. (:

Aug 12, 2:27PM EDT59

Do you think that in the future data will be stored in gadgets installed into our bodies?

Aug 13, 5:23AM EDT41

What parts of the data analysis process do you help others with?

Aug 10, 12:05AM EDT53

As a humble data scientist, there is no way my job title would exist if it weren't for all of the data analysts out there, spending their time answering 'What happened?' & 'How did it happen?' based on the data. If anything, a data scientist only heightens the data analysis process by adding the answers to 'What will happen next?' & 'How do we continue on this process?' While it is optimal for data scientists to know how to answer the initial questions, no one does it better than a seasoned data analyst.

Aug 12, 2:21PM EDT56

Haven't you thought about being a data analyst?

Aug 13, 5:33AM EDT70

How does data management enable a company to better serve its customers?

Aug 9, 11:25PM EDT67

The more organized your data is & the more in tune it is with your customers, the more potential you have to serve them according to their specific needs. It can be as simple as logging how likely they are to use a discount beyond a certain purchase amount, so you can send them better coupons for both brand loyalty & bringing in their friends. It can also be as complicated & left-field as logging how often their modem goes down & offering them special pricing on a new one to better their experience. Money, in the vast majority of industries with customers, comes not from new customers but from existing customers. The longer you keep your customers & the more effort you put into keeping them, the steadier your profits will be. & how do you keep these customers consistently happy? By collecting data on them to understand how personalized their experience with your products is. (:

Aug 12, 2:13PM EDT60

Yeah, it's like you should know something about people to understand what they want

Aug 13, 4:40AM EDT62

Is data collection a continuous process for specific departments or uses?

Aug 9, 9:52PM EDT86

I think you hit the bullseye with your observation, but I will explain why. Data collection is indeed an ongoing process & should never be disrupted except for optimization, for quality control, or because the whole purpose has been deemed unnecessary. Time is the biggest & best feature to collect with any type of data, if possible. Predictions are made over time & beyond time, & they require timely results & data to be as accurate as possible. So yes, it is a continuous process, because time is continuous whether we can keep up or not.

Aug 12, 2:07PM EDT21

I think so too

Aug 13, 4:43AM EDT61

What is one benefit of data managing that a regular person may not know about?

Aug 9, 9:13PM EDT47

Why do some people collect their gas receipts in a box? Their 1099 refunds depend on it. Little things like just collecting the paper trail or the virtual places you've been to will guide you towards a better understanding of how you live & possibly how you would like to change for the better. Someone will disagree with me, but I think life can be very well predicted & organized with just a little bit more self-awareness of your actions & your carbon footprint. My motto is that the best decisions are made with data.

Aug 12, 2:02PM EDT44

And a damn good motto this is!

Aug 13, 4:44AM EDT53

How will companies ever catch up to processing the vast amounts of data they are acquiring?

Aug 9, 8:06PM EDT87

That's a big concern a lot of medium-sized companies have, especially those that are aware they are not collecting enough. How? With at least one data engineer & one data scientist, qualified in their skills & persuasive enough to get their company to be open-minded about some changes that will drive the company forward. The more vast the data is, the more data scientists should be brought in, along with a division for expert interpretation of the data.

Aug 12, 1:58PM EDT49

So, the question is only about the number of people, right?

Aug 13, 4:46AM EDT35

What are some practical uses of data collection in your daily life?

Aug 9, 7:33PM EDT30

My FitBit tells me how long I've been sleeping for the past 7 days & whether my intake of water has any effect on when I go to sleep. Little things like your health & the optimization of your personal life can benefit greatly from just a little more self-awareness or tools like FitBit.

Aug 12, 1:54PM EDT61

FitBit? Is it only for that, or does it do something else?

Aug 13, 5:10AM EDT49

How essential is attention to detail to your line of work?

Aug 9, 7:16PM EDT59

Attention to detail is great for the initial analysis, in cleaning the data at the surface level. A better skill to possess is the ability to see information in between the data, to see beyond what is presented to you. This requires both domain knowledge of the data & an understanding of the process by which the data was stored in the first place. For example, someone's birth date tells me more than just their birthday; it also tells me their age. It can be even more complicated than that with data in its rawest form, like CT scans of the brain: domain knowledge of medicine would let me know not only which immediate ridges are abnormal, but also the suspected age of the person based on their brain. Bones, the TV show, really highlights this skill.
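The birth date example is the simplest case of reading information "in between" the data: the age is never stored, but it can be derived. A minimal sketch, where the `age_on` helper & the dates are assumptions picked purely for illustration:

```python
from datetime import date

def age_on(birth_date, on_date):
    """Derive age in whole years on `on_date` from a stored birth date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not happened yet in `on_date`'s year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

# Hypothetical record: only the birth date is stored, the age is derived.
born = date(1990, 8, 13)
print(age_on(born, date(2017, 8, 12)))  # the day before the 27th birthday
print(age_on(born, date(2017, 8, 13)))  # on the 27th birthday
```

Knowing how the field was stored matters here too: a birth date recorded in the wrong format or time zone would quietly shift the derived age.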

Aug 12, 1:52PM EDT18

Oh, is it that series about deduction in criminal investigations?

Aug 13, 4:50AM EDT43

What are some of your experiences with the fields you mentioned here?

Aug 9, 6:45PM EDT79

If you are asking about data science in particular, I had academic training as well as industry training before diving into the consulting positions & projects that I have been fortunate to be included in. The projects themselves were very short but very meaningful, & I use the skills from each of the previous projects in my upcoming projects. (Life is Bayesian.) There is very little I can say before I overstep some of my contracts with clients. I apologise if I did not answer your question.

Aug 12, 1:47PM EDT49

That's okay, thanks for answering

Aug 13, 4:51AM EDT26

Which departments or areas of work require detailed sensor data?

Aug 9, 6:31PM EDT0

Off the top of my head, I will have to elaborate on what I think your definition of sensor data is. Sensors can be interpreted in all sorts of ways: physical or virtual. For physical sensors, I can think of fitness training, meaning body activity & brain wave measures. There are physical sensors in earthquake prediction (I'm from California), & sensors for changes in anything from something as simple as the temperature or moisture of your soil to something as complicated as the pH level in the tanks of your fish farm (which is not the official term, but I am not from that industry). Virtual sensors are the sensors in the space of the internet: a bot that detects when someone tweets a certain way or phrase, or the cybersecurity wall that interprets incoming signals & decides which are harmful & avoidable. Essentially, sensors are for the detection of events, & if I were to pinpoint a department, I would say Research.

Aug 12, 1:43PM EDT0

How much data is enough data?

Aug 9, 6:25PM EDT66

There are statistical calculations that show you the minimum amount of data you need, but in this day & age, with a small amount of data, we can also simulate more to an extent. I have no personal threshold because it also depends on the data science project. Sometimes all that you have is all that you can get.
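One common form of that calculation is a power analysis. The sketch below is an assumption about which calculation fits, not the only option: it uses the textbook normal-approximation formula for comparing the means of two equally sized groups, n per group = 2 * ((z_(1 - alpha/2) + z_power) / d)^2, where d is the standardized effect size you want to be able to detect. The `sample_size_per_group` name & the default values are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate observations needed per group to detect a standardized
    difference in means (`effect_size`) with a two-sided test.

    Uses the normal approximation; an exact t-test calculation gives a
    slightly larger number for small samples.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "medium" effect needs far less data than a "small" one.
print(sample_size_per_group(0.5))
print(sample_size_per_group(0.2))
```

The takeaway is that "enough" depends on how small an effect you need to detect: halving the effect size roughly quadruples the data required.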

Aug 12, 1:36PM EDT64

Is data simulation legal?

Aug 13, 5:06AM EDT46

How long does it take you on average to compile your data?

Aug 9, 6:14PM EDT0

It depends on which takes priority: the client's needs or the statistical needs. A client can come to any data science consultancy with a somewhat half-hearted data collection; we calculate how much more data is necessary for effectiveness & how long it will take, & the client will readjust our timeline if possible. If the data is based on customer feedback or depends on a slow process, such as mail or even clicked responses, then it will take longer to achieve results. To make up for the time spent, we try to get as much quality as possible out of the data collected through effective survey design.

Aug 12, 1:32PM EDT0

How do you calculate how much data is needed?

Aug 12, 3:47PM EDT0

How can data management help an individual?

Aug 9, 6:10PM EDT0

Just as you would (or should) keep your tax files organized & prepared, if you are consistent with your collection of data, managing it will not only ease the pain of sorting through it, but also help you discover whether you are collecting enough data or even the right data. Management of quality & quantity is clear-cut for data science.

Aug 12, 1:26PM EDT0

Ok, thank you

Aug 12, 3:57PM EDT0

Do you have any tips for making your data stand out in presentations?

Aug 9, 4:41PM EDT0

Ah, the age-old question of making your data seem more interactive than normal. I find that the best way to display my data has always been the most interpretable display. The golden rule of thumb in my case: if you come back to the same graph & have no idea what it says or what it was even supposed to represent, it didn't do its job. If I wanted to show relationships, I wouldn't show a bar graph; I would use interactive scatterplots, 3D if I could. Viewers also benefit from labels & a color scheme that is natural to the presentation: green is my go-to color if I am unsure.

Aug 12, 1:24PM EDT0

How can one easily present their data?

Aug 12, 4:01PM EDT0

Does Data Science use statistical tools to reach an outcome or an opinion?

Aug 9, 4:16PM EDT0

This is a great question: the bread & butter of data science is machine learning, which stems naturally from a combination of statistical inference & intelligent programming. I like to think of statistics as the 'why?' & programming as the 'how?'

Aug 12, 1:19PM EDT0

Hm, is data collection a new science? Yes or no?

Aug 12, 4:07PM EDT0