How to monitor cloud infrastructure — AMA with Thomas Stocking

Jul 16, 2018

Overwhelmed with the task of maintaining your physical infrastructure as networks expand into the cloud and hybrid-cloud systems? Thomas Stocking, VP of Product Strategy at GroundWork, Inc., can answer your questions about how to best monitor your cloud infrastructure and get maximum performance from your entire IT infrastructure - from apps, VMs, servers, and containers, to network and storage devices.


What cloud system would you recommend for a start up that does not have much in terms of data yet?
Jul 22, 11:01AM EDT
Technology keeps changing and sometimes it is hard to keep up for some people. Do you think it needs to slow down a little or is it the people who should pick up their pace?
Jul 22, 1:00AM EDT
How does traditional software licensing apply in the cloud world?
Jul 21, 1:11AM EDT
How is cloud monitoring different than server monitoring?
Jul 20, 4:49PM EDT
How does cloud computing affect budget predictability for CIOs?
Jul 19, 1:57AM EDT
What are some of the factors that companies need to consider when selecting a cloud storage platform?
Jul 19, 12:29AM EDT
What has been the biggest challenge you've witnessed that organizations face as they migrate to cloud computing?
Jul 18, 4:24PM EDT
What are useful free and open-source tools for devops and sysadmin folks?
Jul 18, 7:12AM EDT
What tools, or type of tools do you consider crucial for a successful DevOps strategy?
Jul 18, 7:03AM EDT
How would you describe your career trajectory in the tech industry and how did you reach the position that you are at now?
Jul 17, 11:19PM EDT
What cloud management software are people using today? What are their common uses?
Jul 17, 9:08AM EDT
What are the key features and capabilities needed to monitor modern cloud-based applications and infrastructure?
Jul 16, 2:26AM EDT

This is clearly a moving target, and very much depends on how "native" your cloud app actually is. That being said, there are at least some fundamentals you will need to see:

1) Some way to track throughput

This can be as simple as counts of requests or transactions processed. The details will vary a lot with your use case (do you log requests? transactions? do you use queues?), but at a minimum you should be able to collect that data fairly frequently and graph it, if only for context.

2) Storage monitoring

Just because storage is elastic doesn't mean you shouldn't watch how much is being stored. Simple errors, like forgetting to reset a debug flag on a log, can quickly consume many gigabytes. Amazon RDS on AWS, for example, can tell you how much data you are storing; you should watch for it spiking when you don't expect it.

3) Health checks on microservices

Most microservice frameworks can tell you, with a simple query, whether a service is healthy. In the cloud, that information is often available through the API of the cloud services manager. Your microservices (or meshed services) should be able to check in or be checked, and the monitoring tool should have a way to do that.

4) Backlog thresholds

Somewhat related to 1): there should be a threshold on the backlog of transactions. In other words, don't just watch throughput; watch the backlog. It will tell you that you need more resources sooner than any detailed measurement from deeper in the apps.
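As a rough illustration of points 1) and 4), here is a minimal Python sketch of a check that derives throughput from two samples of a cumulative counter and applies warning/critical thresholds to the backlog. It assumes the Nagios plugin convention (exit codes 0/1/2/3 and "text | perfdata" output), which many monitoring tools accept; the function name, parameters and sample values are hypothetical, not a GroundWork API.

```python
# Nagios-style plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backlog(prev_processed, curr_processed, interval_s, backlog, warn, crit):
    """Return (exit_code, message) for a throughput + backlog check.

    prev_processed / curr_processed: cumulative "transactions processed"
    counters sampled interval_s seconds apart (hypothetical inputs).
    backlog: items currently waiting, e.g. a queue depth.
    warn / crit: backlog thresholds (warn <= crit).
    """
    if interval_s <= 0 or curr_processed < prev_processed:
        # A counter reset or a bad sample means we can't compute a rate.
        return UNKNOWN, "UNKNOWN - cannot compute throughput (counter reset?)"

    rate = (curr_processed - prev_processed) / interval_s  # items per second
    # Rough time to drain the current backlog at the observed rate.
    drain = f"{backlog / rate:.0f}s" if rate > 0 else "never (no throughput)"

    if backlog >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif backlog >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Text before "|" is for humans; after it is Nagios-format perfdata for graphing.
    message = (f"{label} - backlog {backlog}, throughput {rate:.2f}/s, "
               f"drain time {drain} | backlog={backlog};{warn};{crit} rate={rate:.2f}")
    return code, message
```

A real plugin would gather the counter and queue depth from your own app or provider API, print `message`, and exit with `code` (for example via `sys.exit(code)`). Thresholding on backlog rather than on rate alone is the point of item 4: a healthy rate with a growing queue still means you are falling behind.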

I'm probably leaving out a host of things here, but these are the ones I've seen bite my customers and friends.

Of course, I'm talking here about monitoring metrics with time series and events, not about a full observability system. That's another order of complexity (and cost) altogether.
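For the storage point (item 2), one simple approach is to alert on the growth rate rather than the absolute size, which catches the forgotten-debug-flag case early. A minimal sketch, assuming you already collect (timestamp, bytes used) samples from somewhere, such as a provider storage metric (for Amazon RDS, CloudWatch publishes `FreeStorageSpace`, which you would convert to bytes used); the function name, input format and threshold are illustrative assumptions.

```python
def storage_spikes(samples, max_bytes_per_hour):
    """Flag intervals where stored data grew faster than max_bytes_per_hour.

    samples: list of (unix_time_s, bytes_used) tuples in time order
    (hypothetical input format).
    Returns a list of (unix_time_s, bytes_per_hour) for each offending interval.
    """
    spikes = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        rate = (b1 - b0) * 3600 / dt  # growth in bytes per hour
        if rate > max_bytes_per_hour:
            spikes.append((t1, rate))
    return spikes
```

Usage: with samples taken an hour apart showing 10.0 GB, 10.5 GB and then 14.0 GB, and a limit of 1 GB per hour, only the last interval is flagged.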


Jul 17, 9:59PM EDT
What are the differences/advantages/disadvantages between cloud application monitoring vs server monitoring?
Jul 16, 1:30AM EDT

Interesting area. Sorry for the long answer, but it's necessary in this case.

First of all, cloud applications can be of various types and levels. Some are mere ports of legacy apps to the cloud, some are optimized, and some are cloud-native. There's a spectrum, and depending on where your app currently sits on it, your optimal monitoring methods will vary.

For ported apps, you will probably monitor them in much the same way you would in the data center, that is, by looking at log files, open ports or interfaces, APIs, web portals and possibly internal counters such as JMX for queue depths and the like. You will also want to track utilization numbers, like accepted/rejected logins, overall web requests, transaction throughput, and, if available and appropriate, cost and earnings data. These app monitoring methods haven't really changed much over the past few years.

Of course, in a data center you would also monitor the health of the server (CPU, RAM, Disk, processes, etc). This is less important in the cloud, but in the case of a ported app, you will want to look at the health of the virtual instance or VM in a similar way. 

You will also want to look at the hypervisor and see what it has to tell you. VMs are typically over-provisioned (especially for RAM), since the hypervisor can make good use of the shared resources and thus cut down on waste and inefficiency. The hypervisor's memory-management counters can tell you a lot about how your app is performing.
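To make the hypervisor point concrete, here is a minimal sketch that computes how far a host's RAM is overcommitted and lists guests showing signs of actual memory pressure. The per-VM counters (configured, ballooned and hypervisor-swapped memory) are assumptions modeled on what hypervisors commonly expose; real counter names and APIs differ by hypervisor.

```python
from dataclasses import dataclass


@dataclass
class VmMemory:
    name: str
    configured_mib: int  # RAM the guest believes it has
    ballooned_mib: int   # reclaimed by a balloon driver (hypothetical counter)
    swapped_mib: int     # swapped out by the hypervisor (hypothetical counter)


def memory_pressure(host_physical_mib, vms):
    """Return (overcommit_ratio, names of VMs with ballooned or swapped memory)."""
    if host_physical_mib <= 0:
        raise ValueError("host_physical_mib must be positive")
    # Ratio > 1.0 means more RAM is promised to guests than physically exists.
    ratio = sum(vm.configured_mib for vm in vms) / host_physical_mib
    # Ballooning or hypervisor swapping means the overcommit is actually biting.
    pressured = [vm.name for vm in vms if vm.ballooned_mib > 0 or vm.swapped_mib > 0]
    return ratio, pressured
```

An overcommit ratio above 1.0 is normal and expected, as noted above; ballooning or hypervisor swapping on a specific guest is the more useful signal that the app's performance may be suffering.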

If your app is optimized, you may have more need to probe the hypervisor for app-specific performance data, and will likely have less concern for the health of the VM, since optimized and cloud-native apps are less instance-dependent (they are usually built to tolerate faults and scale transparently). The other app monitoring methods still apply, though.

For truly native apps, monitoring becomes both simpler and more complex. Simpler, since you have almost no concern for the server or VM. More complex, since you will be making use of the more advanced features of the cloud provider (queue services, lambdas, hosted databases, query services, etc.). The tools the cloud provider offers in this context are quite valuable, though you may want to persist data for longer periods than they typically allow.

GroundWork has capabilities in all these areas. 

Jul 16, 4:03PM EDT
Does GroundWork include dashboards that provide insights at a glance and allow you to dig deeper as necessary to explore issues?
Jul 15, 9:49PM EDT

We do. We have several dashboards to choose from:

Our own portlet-based dashboards provide summaries of status and performance data, and list the most recent events for each level of summary. These support drill down and remote access, and links to document repositories and run-books. 

SLA dashboards let you create high-level summary views of the status of entire monitored lines of business.

The Event console supports custom workflow and ticketing system integration, and the filtering it supplies can make it useful as a dashboard as well. 

We also support Grafana dashboards for all the performance information we gather as time series, as well as whatever you want to post into the included InfluxDB database. 

There's more: you can also use any integrated open source tool's dashboard, in a portal page of its own.

See for some context.  
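On posting your own data into the included InfluxDB database: here is a minimal sketch of formatting a measurement in InfluxDB line protocol (`measurement,tag=value field=value timestamp`). It assumes the InfluxDB 1.x line protocol rules for escaping and field types; the measurement, tag and field names are hypothetical, and this is not a GroundWork-specific API.

```python
def _escape(text, chars):
    """Backslash-escape each character in chars (line protocol escaping)."""
    for c in chars:
        text = text.replace(c, "\\" + c)
    return text


def _field_value(value):
    """Render a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    # String fields are double-quoted; backslashes and quotes inside are escaped.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Build one line: measurement[,tag=value...] field=value[,...] [timestamp]."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    line = _escape(measurement, ", ")  # measurement: escape commas and spaces
    for key in sorted(tags):  # sorted tag keys are recommended for write performance
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(key, ',= ')}={_field_value(val)}" for key, val in fields.items()
    )
    if ts_ns is not None:
        line += f" {ts_ns}"  # nanoseconds since the epoch (the default precision)
    return line
```

InfluxDB 1.x accepts one such line per point in the body of an HTTP POST to its `/write` endpoint, with the target database given in the `db` query parameter. Once stored, the data can be graphed in Grafana alongside the rest of the time series.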

Jul 16, 3:45PM EDT
Is there any tool that can facilitate alerts for the cloud infrastructure, for both health monitoring and cost monitoring?
Jul 7, 6:22PM EDT

Hi Maria87,

There are several tools, and GroundWork is one. We offer the GroundWork Cloud Hub, which, when combined with NoMa, our notification manager, gives you the ability to monitor health checks and metrics from your cloud provider's API.

While getting true cost estimates is always a challenge, if the API contains cost metrics, these can be monitored exactly like any other metric. In fact, even if you can only get the data using the CLI, you can still monitor it using GroundWork tools like the GDMA (GroundWork Distributed Monitoring Agent). 
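As an illustration of monitoring cost "exactly like any other metric", here is a minimal sketch of a Nagios-style check that projects month-end spend from month-to-date cost and compares it to a budget. How you obtain the month-to-date figure (provider API or parsed CLI output) is left open; the linear projection, function name and default thresholds are assumptions, not GroundWork or cloud-provider features.

```python
import calendar
from datetime import date

# Nagios-style plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def check_projected_spend(month_to_date: float, budget: float, today: date,
                          warn_pct: float = 80.0, crit_pct: float = 100.0):
    """Return (exit_code, message) comparing projected month-end spend to budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Naive linear projection: assumes spend so far is typical of the whole month.
    projected = month_to_date / today.day * days_in_month
    pct = projected / budget * 100

    if pct >= crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif pct >= warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    warn_at, crit_at = budget * warn_pct / 100, budget * crit_pct / 100
    return code, (f"{label} - projected spend {projected:.2f} is {pct:.0f}% of budget "
                  f"{budget:.2f} | projected={projected:.2f};{warn_at:.2f};{crit_at:.2f}")
```

Linear projection is crude (it ignores spiky or end-of-month billing), but it turns a budget into something you can alert on well before the bill arrives.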

Jul 12, 2:47AM EDT
What are some ideas for final year projects based on cloud computing, virtualization, or Linux?
Jul 7, 9:03AM EDT

I think the rest of this year will see a lot more work going into the automation of monitoring deployments.  I'm aware of many projects in this area. 

At GroundWork, we are working in a similar vein: updating the GroundWork Monitor product to fit more use cases, and enabling automation of deployment for cloud and virtual environment monitoring, as well as data persistence. 

Jul 14, 4:41PM EDT
Why is cloud infrastructure so limited in terms of application performance control?
Jul 7, 5:30AM EDT

Well, I think it really depends on how you architect your applications. Are they ported to the cloud, or cloud-native? Do they use lambda code, or run on virtual machines? Or containers? Are they micro-services enabled? How rich are the APIs, and do they interface to the cloud provider APIs? 

I think the issue isn't strictly a limit on application performance control, but the number of choices you have. There are a lot of trade-offs to consider, too. Take containers, for example: they are easy to spin up and down and to scale horizontally, but you need good APIs, a way to manage configurations, authentication and authorization, and other considerations unique to the container technology you choose.

It's never simple when you have so many choices, and so the limits start to creep in. I'm hopeful that some of this will shake out a bit more, and the limits will lift further on the "winners" of that shakeout. 
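On the horizontal-scaling trade-off for containers: the scaling decision itself is often a simple proportional rule. Here is a minimal sketch modeled on the formula the Kubernetes Horizontal Pod Autoscaler documents (desired = ceil(current * current metric / target metric), with no change when the ratio is within a tolerance); the replica bounds and the choice of metric are assumptions you would tune for your own services.

```python
import math


def desired_replicas(current, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling rule: ceil(current * current_metric / target_metric).

    current_metric / target_metric could be average CPU, requests, or backlog
    per replica; the metric and the bounds are up to you.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: avoid flapping
    else:
        desired = math.ceil(current * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

Feeding this from a backlog-per-replica metric ties it back to watching backlog rather than throughput alone.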

Jul 14, 5:36PM EDT
How would you highlight your company’s competitive advantages? What makes it stand out from the crowd?
Jul 5, 5:53AM EDT
What type of marketing has worked best to promote GroundWork?
Jul 5, 4:59AM EDT

Generally, the best marketing is word-of-mouth. We find that allowing a version of GroundWork to be freely downloaded and used (albeit limited in some ways compared to the commercially licensed version) makes users our best spokespeople. 

We have tried a lot of different approaches, and done marketing at many levels. What we have found is that when we combine content that is really useful with software people can put in place themselves, we reach the qualified, ready-to-proceed base that really sustains and grows our business. 

Jul 16, 11:32AM EDT
What are the features that make GroundWork Monitor® the right solution for infrastructure availability and performance issues?
Jul 5, 2:10AM EDT
About #TechAMA

The #TechAMA channel is owned and operated by AMAfeed, LLC.