03 Dec Co-Responsibility in Hybrid IT
Operational Continuity takes a Village
Today’s post departs from my current stream of topics because this subject has been on my mind a lot lately. I apologize for the ambiguous title, but I think it encapsulates what I want to talk about. “Hybrid IT” describes an organization whose business systems and capabilities are deployed across a combination of on-premises data centers, often in separate places, and “cloud” locations that are both geographically and legally separate. The legal separation usually arises because the cloud provider is a separate company with its own charter and articles, and the relationship is mediated by contracts between the entities. These contracts may define the uptime and recovery service levels to be maintained, along with the separate and mutual responsibilities for keeping the technology up and running. These agreements are commonly known as Service Level Agreements, or SLAs.
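To make an uptime commitment concrete, it can help to translate the percentage into a downtime allowance. The sketch below is only an illustration: the function names, the 30-day month and the assumption that failures are independent are mine, not terms from any particular SLA.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an uptime target permits over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


def combined_availability(*availabilities: float) -> float:
    """Availability of a chain of services that must all be up at the same time.

    Assumes the services fail independently, which is a simplification.
    """
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


# A 99.9% target over a 30-day month leaves roughly 43 minutes of downtime.
budget = allowed_downtime_minutes(99.9)

# An on-premises system at 99.9% that depends on a cloud service at 99.5%
# ends up below either individual figure.
chain = combined_availability(0.999, 0.995)
```

The second function is the reason co-responsibility matters in a hybrid portfolio: every additional party in the chain lowers the availability the business actually experiences, even when each party meets its own SLA.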
Business continuity can suffer greatly from simple failures, especially when they occur at critical nodes such as Domain Name Servers, telephony systems and primary storage systems. Authentication problems arising from forgotten policy settings can bring a company to its digital knees. It takes a village to troubleshoot many types of system failures, and the more complex the systems portfolio, the more important it is to formalize the co-responsibility model.
|Understanding Context Cross-Reference|
|responsible Hybrid IT||Tech Target Cloud|
|ambiguity escalation||LinkedIn discussion|
|Service Level Agreement||Joe Hertvik|
i.e. to be responsible to one’s commanding officer
i.e. to be responsible for a mistake
i.e. a responsible adult
(C16: from Latin responsus, from respondere to respond)
- Integration – As an EA, I would like to choose the best platform for the function I am trying to put on the cloud. Different functions may go to different vendors (on their clouds) but for my enterprise, it is crucial that they integrate well.
- Customisation – Even if the software my enterprise chooses is the best of the breed, I may still want to customise it for my organisation. How much flexibility would this arrangement offer?
- Control – What if the cloud platform we use for a particular function is not up to scratch? How easy is it to move away from one vendor to another?
- Management Reporting – Are there any platforms available, again in the cloud possibly, that could source data from disparate applications all over and put together management reports tailored for my enterprise needs?
I am sure there are other challenges too.”
- Level 1 support typically receives calls, creates tickets, and resolves those for which adequate triage scripts enable the help desk generalist to do the work. If access restrictions keep help desk personnel from the access they need, the ticket is assigned to someone who has it. If the problem is beyond the generalist’s ability, the ITSM tool should provide specific guidance on which person or group is qualified and available to handle the ticket. Mature organizations reduce as many common or recurring issues as possible to scripts so that generalists can handle them without escalating to specialists.
- Level 2 support is for issues beyond the ability of the generalists or the existing triage scripts; resolving them requires special skills, special access, or both. Many IT shops have several technical people who are capable of, and assigned to, handling trouble tickets. The more complex the IT systems portfolio, and especially the more diverse the technologies in the mix, the larger this group needs to be to avoid more expensive Level 3 support escalations.
- Level 3 support is for problems that require the gurus. People with deep expertise in specific technologies, often outside experts from system vendors, are the big guns needed to solve the most vexing problems.
Joe Hertvik defines Level 0 and Level 4 support as well. Level 0 is self-service and may include automated password reset, web forms for requesting IT support, and FAQ or knowledge base lookup. Level 0 support requires no Help Desk technician. Level 4 involves hardware and/or software vendors for specialized application support and for maintenance of printers, copiers and other equipment. Level 4 support, sometimes called depot maintenance, is contracted by an organization for specific services, but the people who provide it are not part of the organization. “Generally speaking, the bigger the organization the more stratified these roles” (Joe Hertvik). A good definition of escalation is available on a Washington.edu wiki.
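The tiers described above can be read as a routing rule. Here is a minimal sketch, assuming a ticket carries a handful of yes/no flags. The class, the flag names and the order of the checks are my own illustration, not how ServiceNow or any other ITSM tool models escalation.

```python
from dataclasses import dataclass
from enum import IntEnum


class SupportLevel(IntEnum):
    SELF_SERVICE = 0   # FAQ, web form or automated password reset
    HELP_DESK = 1      # generalist working from a triage script
    TECHNICAL = 2      # special skills or special access
    EXPERT = 3         # deep expertise in a specific technology
    VENDOR = 4         # contracted outside hardware/software support


@dataclass
class Ticket:
    summary: str
    self_service_available: bool = False
    has_triage_script: bool = False
    needs_special_access: bool = False
    needs_deep_expertise: bool = False
    vendor_maintained: bool = False


def route(ticket: Ticket) -> SupportLevel:
    """Return the level a ticket should start at, checking hard constraints first."""
    if ticket.vendor_maintained:
        return SupportLevel.VENDOR
    if ticket.needs_deep_expertise:
        return SupportLevel.EXPERT
    if ticket.self_service_available:
        return SupportLevel.SELF_SERVICE
    if ticket.has_triage_script and not ticket.needs_special_access:
        return SupportLevel.HELP_DESK
    # No script, or the generalist lacks the required access.
    return SupportLevel.TECHNICAL


level = route(Ticket("Shared drive not mounting", has_triage_script=True))
```

Every triage script an organization adds moves another class of ticket from the last branch to the Level 1 branch, which is the maturity effect described in the Level 1 bullet.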
Tools like ServiceNow, Jira, Cherwell and Heat support these capabilities. When a problem can be easily isolated to a specific machine, application, disk or database, troubleshooting can be pretty straightforward. Sometimes, however, the root cause is not apparent and a team is needed to perform repairs and restore service.
Tools like SolarWinds, Splunk and Microsoft SCOM provide ongoing streams of system health information. Systems like these can help monitor network, server and workstation performance, optimize applications and databases, and oversee security. The capabilities range from electronic asset discovery (an automatic way to document the IT portfolio) to predictive intelligence on system failure modes and security breaches. These tools are getting smarter every year, but human experts are still needed to troubleshoot almost all complex problems. Monitoring is an indispensable component of maintaining operational continuity, but to maximize the value of collaboration, visibility needs to be shared with the experts involved, whether they sit in internal or vendor organizations.
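The idea of shared visibility can be sketched in a few lines. This is a toy model, assuming readings and thresholds are plain dictionaries. The metric names and watcher addresses are hypothetical, and nothing here uses the API of any real monitoring product.

```python
def breached(readings: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of metrics whose reading exceeds its threshold."""
    return sorted(
        name for name, value in readings.items()
        if name in thresholds and value > thresholds[name]
    )


def recipients(metric: str, watchers: dict[str, set[str]]) -> list[str]:
    """Everyone, internal or vendor, who should see an alert for this metric."""
    return sorted(watchers.get(metric, set()))


# Hypothetical data: one on-premises metric and one cloud-hosted metric.
readings = {"san_latency_ms": 42.0, "crm_api_errors": 3.0}
thresholds = {"san_latency_ms": 20.0, "crm_api_errors": 10.0}
watchers = {
    "san_latency_ms": {"storage-team@example.com", "support@storage-vendor.example"},
    "crm_api_errors": {"apps-team@example.com", "noc@cloud-vendor.example"},
}

for metric in breached(readings, thresholds):
    notify = recipients(metric, watchers)  # includes the vendor contact
```

The point of the `watchers` mapping is that the vendor's contact sits next to the internal team's, so both see the same alert at the same time.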
With the rapid pace of technology innovation and the frequent penetration of sensitive resources by malicious hackers and other unwanted intruders, many systems need periodic patches and updates. Sometimes the vendors manage these themselves, with automated access to the networks and devices involved. Sometimes this is done with internally managed configuration tools such as MS SCCM, sometimes in ITSM tools such as ServiceNow, and sometimes in monitoring tools such as SolarWinds. The Hybrid IT organization is likely to have multiple sources of configuration information across on-premises and cloud-based capabilities. The ideal scenario is for all this information to be accessible in a single place. This is seldom the case in real life, so the next best thing is for collaborating experts to use a desktop sharing tool such as WebEx, or another communication mechanism, to share critical configuration data (and monitoring data, if available) with other team members in a collaborative troubleshooting session.
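When configuration data does live in several places, even a rough merge can show where the sources disagree. The following is a sketch under stated assumptions: each source has already been exported to a dictionary of asset records, the source names are just labels, and the first source to report an attribute wins. It does not call SCCM, ServiceNow or any other product.

```python
from collections import defaultdict


def merge_config_sources(sources: dict[str, dict[str, dict[str, str]]]):
    """Combine per-source records into one view and list conflicting attributes.

    `sources` maps a source label to {asset_id: {attribute: value}}.
    Returns (merged, conflicts), where each conflict is
    (asset_id, attribute, {source_label: value, ...}).
    """
    merged = defaultdict(dict)    # asset_id -> attribute -> value
    seen_in = defaultdict(dict)   # asset_id -> attribute -> source that supplied it
    conflicts = []
    for source, records in sources.items():
        for asset_id, attrs in records.items():
            for attr, value in attrs.items():
                if attr not in merged[asset_id]:
                    merged[asset_id][attr] = value
                    seen_in[asset_id][attr] = source
                elif merged[asset_id][attr] != value:
                    first = seen_in[asset_id][attr]
                    conflicts.append(
                        (asset_id, attr, {first: merged[asset_id][attr], source: value})
                    )
    return dict(merged), conflicts


# Hypothetical exports from two tools.
sources = {
    "config_tool": {"srv-01": {"os": "Windows Server", "patch_level": "2024-01"}},
    "itsm_tool": {
        "srv-01": {"os": "Windows Server", "patch_level": "2023-11"},
        "vm-07": {"os": "Linux"},
    },
}
merged, conflicts = merge_config_sources(sources)
```

The conflict list is the useful part in a troubleshooting session: it tells the collaborating experts which facts they need to verify before trusting either source.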
Control centers like NASA Mission Control in Houston or Strategic Air Command in Omaha once served as the model for IT support centers. With the experts geographically separated, a room with banks of large monitors is no longer feasible, nor is it needed. Even so, real-time communications and data sharing are often needed to troubleshoot complex problems, or even simple problems in complex environments. That’s when electronic conferencing comes in handy. WebEx, Join.me, Skype, Google Hangouts and other tools provide good solutions for enabling the experts to combine their knowledge and insights to solve complex problems more efficiently.
- Keeping the configuration database up to date, including external vendors’ integration points
- Ensuring monitoring coverage for the entire critical portfolio
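Both of these responsibilities lend themselves to simple, repeatable checks. Here is a minimal sketch, assuming the configuration database can produce a list of critical assets and a last-verified date for each record. The asset names and the 90-day freshness window are hypothetical choices of mine.

```python
from datetime import date, timedelta


def unmonitored_critical(critical_assets, monitored_assets) -> list[str]:
    """Critical assets that no monitoring tool currently covers."""
    return sorted(set(critical_assets) - set(monitored_assets))


def stale_records(last_verified: dict[str, date], today: date,
                  max_age_days: int = 90) -> list[str]:
    """Configuration records not verified within the freshness window."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(asset for asset, verified in last_verified.items()
                  if verified < cutoff)


# Hypothetical inventory, including a vendor integration point.
critical = ["dns-01", "pbx-01", "san-01", "vendor-sso-gateway"]
monitored = ["dns-01", "san-01"]
verified = {
    "dns-01": date(2024, 5, 1),
    "vendor-sso-gateway": date(2023, 9, 15),
}

gaps = unmonitored_critical(critical, monitored)
stale = stale_records(verified, today=date(2024, 6, 1))
```

Running checks like these on a schedule, and sharing the results with the vendors named in them, turns both items from good intentions into something each party can be held to.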