I co-founded a Data science startup and built teams to solve business problems using Machine Learning and Data Visualization. AMA on Data Science Careers and Projects.

Ganes Kesari
Aug 9, 2018

After spending a decade in technology consulting and management, I co-founded a data science startup, Gramener. Over the past 7 years, I've built and scaled the core data science practice areas across machine learning and data visualisation to serve global clients.

Having advised and consulted clients across 10+ domains, I've created analytics roadmaps and led projects on-the-ground. I have hired, trained and groomed hundreds of data science professionals across a variety of roles in this industry.

AMA on entering the data science field, mix of skills needed, growing your career in the industry and working in machine learning - visualisation projects.

Here are a few of my articles in these areas:



Conversation (80)


Where do you see the data science industry going in the next five years and how should business prepare for the upcoming changes?
Aug 13, 12:44PM EDT1

Thanks for the broad question! I will do this crystal-ball gazing at 3 levels: Industry roles, Analytics delivery and Business adoption.

1. What will happen to the industry & roles? Much of the mist around data science as an industry will clear up. Talking about perceptions, remember how hazy every tech company looked in the dotcom frenzy of 2000? Similarly, even companies doing MIS reporting or Google Analytics work are bucketed as AI companies today. In less than 5 years, this frenzy will get sorted out and clear sub-disciplines in data science will evolve. Likewise, data scientists will no longer be seen as a panacea, and more standardized sub-roles and designations will emerge.

2. How will analytics be delivered? Today, the bulk of the value in data science comes from careful, painstaking, manual implementation, whether in data cleaning and wrangling, insight spotting and model engineering, or data story design and interactive implementation. We are seeing glimpses of automation and APIs, but most of these are teasers with little value on the ground as yet. Substantial progress will be made in a few years to make truly automated machine learning and intelligent, self-creating stories possible. And there will continue to be small market gaps in between where custom service providers will thrive.

3. How will businesses adopt data science? Adoption will be ubiquitous and invisible, at least partly in the next 5 years, and more so in a decade. Every role in any organization will have to deal with data and handle it in some way (draw insights, present it, or simply make sense of it and use it). The role of experts will be reduced and refined to handling just the specialized data science work, quite like the way every person today needs to know how to use a laptop, portal or app, or at least know where technology fits in their work. The real success for data science will be in the discipline truly blending in everywhere and becoming invisible.

"We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten." - Bill Gates

Aug 14, 7:13PM EDT0
What are the best programming languages for data science and which one is your favourite?
Aug 13, 2:06AM EDT0

Python is by far the most popular language for data science, and packages like pandas, numpy and scipy, along with more recent ML platforms like TensorFlow, make it very powerful. At Gramener, we've embraced Python right from the initial days.

I got started with data science on R, and it is still my favorite. R has a rich library of packages that covers almost every data handling, visualization, statistical or ML need. Given its limitations with scalability and running deep learning models, I've recently been exploring Python. Here is an article I came across that does a good comparison of the languages.

Aug 13, 7:57PM EDT0
What are your thoughts on data science, machine learning, and deep learning as a driving force for the creation of smart cities?
Aug 12, 12:00PM EDT1

Interesting question! If we define the key traits of a 'smart city' as connectivity, automation, optimization, and contextuality, this cannot be achieved without data as one of the foundational pillars.

With data as the backbone of a smart city, data science will become the cloud (quite literally!) that overhangs every aspect of city life and shapes pretty much everything underneath:

  1. Planning and forecasting using the wealth of techniques that statistics and machine learning have to offer
  2. Efficient logistics by continuous optimization and predictive analytics
  3. Security infrastructure made state of the art through deep learning algorithms that automate the detection of crime from image/video/audio feeds. Deeper algorithms mashing up various data feeds can predict crimes even before they occur, quite like science fiction.
  4. High quality of life is ensured when every service delivered (critical or incidental) plugs into this data backbone and the model layer overhanging in the air to make everything personalized, timely and automated.
  5. ..and so on..

The possibilities are exciting and limitless and I really look forward to some of this playing out very soon.

Aug 13, 8:36PM EDT0
How did you go about ensuring that your application is loaded with best of the breed automated statistical algorithms available?
Aug 11, 9:27PM EDT0

Almost all algorithms are available in every tool and package and they aren't real differentiators. We reused them from freely available packages, rather than recreate them. The common difficulty and hence differentiation is in easing the model implementation process.

We focused on building wrappers around models, which are the adaptations needed for a model to be implemented in a business use case. And for those niche use cases that didn't have ready model implementations, we built them from scratch (e.g. pair-wise correlation of clusters created through hierarchical agglomerative clustering). This direction has been validated by the feedback from client projects.

Aug 13, 8:53PM EDT0
What is your opinion about online courses in Data science? Are there any suggested Data science courses you can recommend to people?
Aug 11, 7:18PM EDT0

Anyone trying to learn data science is spoilt for options, and there are so many good ones out there. Some good starting points are: Basics of Data science by Coursera, Intro to machine learning by Udacity, and there are the authoritative ones from the master, Andrew Ng on Machine learning and Deep learning.

There are also excellent university courses with all videos/lectures opened to the public, such as Jeff Heer's Data Visualization CSE512 at University of Washington. Do explore and get started somewhere. Good luck!

Aug 13, 10:21PM EDT0
What is a decision tree? What are the steps to be followed to make a decision tree?
Aug 11, 7:26AM EDT0

Decision trees are very popular in data analytics since they provide a simple visual way to trace the path to a particular decision. By asking a series of simple questions (with 2 or more answers), one can find out what is likely to happen in a given scenario. Here is a good example by NYT from the past US Elections titled, 512 paths to the White House:

Every statistical/analytical package has ready modules that implement variants of the decision tree algorithms, and here is a tutorial for RPART.
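To make the "series of simple questions" idea concrete, here is a minimal sketch in Python of a hand-built tree and a function that walks it while recording the path taken. The questions, field names and outcomes are all made up for illustration; a real tree would be learned from data by a package such as RPART rather than written by hand.

```python
# A hand-built decision tree: each internal node asks one question,
# each leaf holds a decision. Questions and outcomes are hypothetical.
TREE = {
    "question": "Is the ticket marked urgent?",
    "feature": "urgent",
    "branches": {
        True: {"decision": "escalate"},
        False: {
            "question": "Has the customer waited more than 2 days?",
            "feature": "waited_over_2_days",
            "branches": {
                True: {"decision": "prioritize"},
                False: {"decision": "normal queue"},
            },
        },
    },
}


def decide(tree, record):
    """Walk the tree for one record; return the decision and the path taken."""
    path = []
    node = tree
    while "decision" not in node:
        answer = record[node["feature"]]          # answer to this node's question
        path.append((node["question"], answer))   # remember how we got here
        node = node["branches"][answer]           # follow the matching branch
    return node["decision"], path


decision, path = decide(TREE, {"urgent": False, "waited_over_2_days": True})
```

The returned path is what makes trees easy to explain: it lists every question asked and the answer that led to the final decision.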

Aug 11, 2:24PM EDT0
What is a root cause analysis?
Aug 10, 7:49PM EDT0

Root cause analysis is an attempt to find out why something happened. In enterprises, it is commonly used to investigate why an issue, an unfavorable incident or an escalation happened. In the analytics context, diagnostic analysis is a pattern of discovery that attempts to do the same. An example: root cause analysis to find why a team has missed its SLA for timely closure of customer service tickets.
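As a rough illustration of the SLA example, one common first step is to break the miss rate down by one attribute at a time and see where it concentrates. The sketch below does this in plain Python; the ticket records, field layout and function name are hypothetical, and a real analysis would use far more data and attributes.

```python
from collections import defaultdict

# Hypothetical ticket records: (category, assigned_shift, closed_within_sla)
tickets = [
    ("billing", "day", True),
    ("billing", "night", False),
    ("login", "day", True),
    ("login", "night", False),
    ("billing", "day", True),
    ("login", "night", False),
    ("billing", "night", True),
    ("login", "day", True),
]


def miss_rate_by(records, field_index):
    """Return {value: share of tickets that missed SLA} for one field."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for record in records:
        key = record[field_index]
        totals[key] += 1
        if not record[2]:          # closed_within_sla is False
            misses[key] += 1
    return {key: misses[key] / totals[key] for key in totals}


by_category = miss_rate_by(tickets, 0)
by_shift = miss_rate_by(tickets, 1)
```

In this toy data the miss rate varies much more by shift than by category, which would make the night shift the first place to dig deeper. A breakdown like this gives a lead to investigate, not proof of the cause.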

Aug 11, 2:12PM EDT0
How many levels of data does your application support and why is this a vital feature of the application?
Aug 10, 9:48AM EDT0

Enterprises have data stored in various places and at multiple levels, so if one needs to get a holistic perspective of the business they need to combine all these sources. This is a common requirement in our projects and our platform offers the ability to do this mashup of data.

After the analytics is performed, when the insights are being consumed, we enable access at the right level that a user needs, to get their job done. Users have the option to drill-in and access deeper levels of data, on demand. This keeps the user experience good and helps avoid information overload.

Aug 11, 2:46PM EDT0
What is the present scope of data science and how it would be in the future?
Aug 10, 2:10AM EDT0

Today, every business in the world generates or consumes data, without exceptions. Enterprises are actively using this data not just for their day-to-day operations, but to ensure survival in the market. At the other end, we consumers generate and consume data as well. The economic potential of data science is magnified when you bring these two data streams together (business & consumer, which tend to be highly intertwined). 

In the future, the use of data (for productivity, convenience or commercial gains) will become so pervasive that we won't talk about this as a separate industry, much the way technology as a whole now underpins every discipline.

Aug 11, 2:38PM EDT0
What do you mean when you say that your application offers a systematic and intuitive workflow?
Aug 9, 5:46PM EDT0

We place a lot of emphasis on UX and Information design in our data science applications. Telling an effective data story needs a lot of planning, profiling of users, requirements understanding and creation of a narrative that is systematic, and in line with how a user would expect to use the application.

An intuitive interface should avoid the need for elaborate explanations or extensive user training. While newer forms of information presentation call for some onboarding and handholding of users the first time, we try to make it a breeze on subsequent usage and save time for users on insights discovery. 

These are subtle but important aspects, and I've written in detail about the top 6 reasons for the failure of data visualization projects.

Aug 11, 1:53PM EDT0
What are the prerequisites for starting out in deep learning?
Aug 9, 10:57AM EDT0

It's essential to be good at programming (Python would be your best bet) and to brush up on your basic statistics. It helps to have some exposure to machine learning (supervised learning to start with) and the model engineering process.

While all this should get you started building DL models and demonstrating cool stuff, for getting deeper into practical applications, you'll need some math (linear algebra and calculus). The Coursera specialization in Deep Learning is a recommended resource. Good luck!

Aug 11, 1:20PM EDT0
What are the real applications of deep learning?
Aug 9, 6:02AM EDT0

The potential for deep learning (DL) is huge and it can be applied wherever good volumes of labeled data are available. Four major areas where it is already being used are:

  1. Recommendation systems: Most major social and consumer websites (FB, Amazon, Netflix..) have adopted DL based recommendation systems, though they started with more traditional ones.
  2. Image recognition: Computer vision is a sweet spot for deep learning and all types of image-video recognition leading up to self-driving cars use DL.
  3. Speech recognition: DL powers a lot of conversational systems and recognition modules in them.
  4. Natural language processing: Making sense of human-generated text and detecting intent or anomalies (fake news..) is being given to DL.
Aug 11, 12:56PM EDT0
What is the difference between deep learning and machine learning?
Aug 9, 5:16AM EDT0

Deep learning is a subset of machine learning and there is some overlap between the two. One critical difference is in the amount of data needed. While statistical methods and many techniques of machine learning work fine with very small data (even dozens of rows), deep learning has a huge appetite.

Deep learning may be lower in effectiveness for very small data, but once it is fed with abundant volumes, it peaks in performance. Here is a slide from Andrew Ng that demonstrates this well.

Aug 11, 12:47PM EDT0
What process must one follow in order to train the product to speak in a certain tone?
Aug 9, 1:18AM EDT0

I guess you're talking about software product focus and roadmap to maintain a standard tone/consistency.

The approach we took for our analytics platform was different from traditional product development. We provide consulting and services on top of our analytics platform, and hence we built both in parallel. In our very first client project, we implemented an MVP version of our platform. With every engagement, we contrasted client needs with the market requirements and took a call on what goes into the platform (standardized), while all others became a custom build for the specific client.

This way, our platform has evolved based on direct market needs and has been validated in every implementation. The platform roadmap and continuous prioritization of features helped bring in some direction.

Aug 11, 1:33PM EDT0
What is deep learning? Why is it so popular?
Aug 8, 10:16PM EDT0

Deep learning is a sub-discipline of machine learning. The key intent here is to make algorithms learn the deeper aspects of a problem. For example, to recognize a human face, the algorithm tries to learn features like the type of eyes, shape of eyebrows, structure of nose, jawlines and even other deeper aspects which we humans may not be able to spell out.

The attempt is to make algorithms 'intelligent', perhaps similar to the way a human brain solves the same problem. So, this needs a lot of data and guesswork on what may be useful and work, versus what might end up as a distraction for the model (technically called 'over-fitting'). While we are far away from mimicking our brain, the small progress made looks revolutionary already.

Since deep learning involves learning more fundamental aspects of a problem, it is being used in a lot of practical applications like speech recognition, self-driving cars, understanding images and much more. Here is an answer by Andrew Ng on Quora about deep learning.

Aug 11, 12:31PM EDT0
Which are the most common problems that you are asked to solve?
Aug 8, 12:53AM EDT1

There are 3 broad kinds of questions that clients come to us with:

  1. What happened? This includes questions like, what was unusual in my business last month? Is there an underlying shift in the market I need to know?
  2. Why did it happen? Finding out the reasons or root causes. What caused my profit to decline? Why are customers leaving?
  3. What will happen? This includes the forward-looking questions. What will be my sales next quarter? What product will my customers buy next?

There are hundreds of ways and techniques to apply and solve these questions. But a common expectation is to come up with clear answers that can help drive recommendations and decisive actions.

Aug 11, 12:11PM EDT0
What is the ideal composition of Data Science teams? What traits are most important when choosing a team leader?
Aug 7, 9:14PM EDT0

Every data science team needs a certain mix of skills to identify the best approach, extract insights and communicate them to drive action. The skills needed to achieve this are: domain-business, statistics-machine learning, programming and design. Though all of these are mandatory, the extent of expertise in each depends on the problem. You can read more about the skills and roles in data science teams in my article.

As to the traits of a team leader: a) passion for data, b) curiosity to solve problems, c) empathy for users who will use the solution and d) excellent communication skills to understand and bridge gaps.

Aug 11, 12:02PM EDT0
Anonymous

How does coming from tier 4 university(undergrad) in India with a good profile get someone to even look at their resume for a Data Science role.

Aug 7, 9:00AM EDT0

I've seen many candidates who have made it into good data science jobs in spite of coming from not-so-favorable backgrounds or lower ranked institutions. Thanks to the web, there are several ways to make a mark and turn this into an advantage in one's job search. I can suggest 4 steps to greatly improve one's chances:

  1. Read, write and get conversant with data: It’s critical to gain breadth in analytics, in addition to depth in a few core techniques. Explore avenues like blogs, books, podcasts or videos, and this can make up for lesser exposure or shortcomings from an undergrad school.
  2. Execute your own data projects: It’s insincere to not back passion for data with a personal project, even if rudimentary. One just needs to pick a problem and apply the analytics learnings as a small side project.
  3. Display your wares in public: Put up your code on GitHub, show your technical competence by busting bugs on Stack Overflow. Flaunt your data visualisation skills on the public visualisation portals. Show your thought leadership in data science by penning blogs on Medium. This increases a candidate's visibility to recruiters, and breaks the limitations of a resume.
  4. Get competitive with data scientists: It’s raining data contests across skills, like Kaggle, InformationIsBeautiful and open data competitions. From a skill-building perspective, there is no better way to learn than by applying. This is yet another way to get noticed by potential employers.

 

Finally, attending data science events and networking with professionals in the field is a good way to get introductions to companies, as opposed to submitting on the careers page and waiting indefinitely. With the resources within one's reach this is very much doable and just needs some practice and effort. For more details on showing off one's love for data, take a look at this article.

Hope this helps, and Good luck!

Aug 10, 12:36AM EDT0
What are the most important ethical concerns about the use of machine learning methods?
Aug 7, 5:09AM EDT0

ML use cases are still going mainstream, but the list of ethical concerns around them is already long. In my opinion, the top 3 are:

- Bias in algorithms: An algorithm is as good as its training data. Data from a biased human sample would lead to reinforcement of the same bias in a model-driven world. This can multiply the effects of racism, and other stereotypes, and hence could get scary.

- Privacy concerns: Each of us is a real-time source of data, generating data whether we shift, sit or sleep. With so much data at their disposal, algorithms can tease out uncanny insights, at times even before we become conscious of them. With very little regulation, this can escalate into an alarming privacy issue.

- Fake content: The deep learning techniques getting invented by the day can put fake news creators to shame. These can be remarkably powerful at creating video/image content which could look way more realistic than the original. All the recent buzz around fake news may be reduced to just a teaser before the big bad act.

For more reading, I came across this article that delves into such issues in more detail, and may be worth a look.

Aug 10, 12:05AM EDT0
Which statistical analysis methods are essential to the individuals looking for a job in Data science and Machine learning?
Aug 7, 4:46AM EDT1

I'm glad you asked this question! A firm grip on statistics and fundamentals of data analysis can be invaluable for a career in data science and machine learning. Unfortunately, people often skip or skim through this in a rush to jump into the modeling world.

Answering your question, the selection of a statistical method really depends on the data and scenario. The ones you will encounter most often are the t-test, correlation and ANOVA. And if you consider simple linear regression to be a statistical method that has been borrowed into machine learning, then yes, it would rank at the top of the essential statistical methods.
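For readers who want to see two of these methods spelled out, here is a minimal sketch in plain Python of Pearson correlation and simple linear regression (least squares with one predictor). The function names and the ad spend and sales numbers are made up for illustration; in practice you would use a statistics package rather than hand-rolled code.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_correlation(xs, ys):
    """Pearson's r: covariance scaled by the spread of both variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    intercept = my - slope * mx
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

r = pearson_correlation(ad_spend, sales)
slope, intercept = simple_linear_regression(ad_spend, sales)
```

Because the toy data is perfectly linear, the correlation comes out at 1 and the fitted line recovers the slope of 2 and intercept of 1. Real data will be noisier, which is where the t-test and ANOVA come in to judge whether a difference or relationship is likely to be real.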

Aug 9, 11:29PM EDT0
About #TechAMA

The #TechAMA channel is owned and operated by AMAfeed, LLC.