Some data centre managers were never sold on first-generation data centre infrastructure management (DCIM) tools, because those tools were limited in scope and required considerable human intervention. A first-generation tool would display a pre-loaded list of devices and warn that, say, a CRAC unit’s inlet temperature had exceeded a set threshold. The operator then had to work out unaided which equipment the alarm affected, because the tools could not correlate a given physical infrastructure device with the servers that depended on it. Nor could these tools initiate actions to prevent downtime, such as speeding up fans to dissipate a hot spot or moving VMs away from a rack with compromised power or cooling.
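To make the missing correlation step concrete, here is a minimal Python sketch. It assumes a hypothetical topology in which each cooling unit is mapped to the racks it serves and each rack to its servers; the device names, the 27 °C threshold and the `affected_servers` function are illustrative only, not part of any DCIM product.

```python
from dataclasses import dataclass

# Hypothetical topology: racks served by each cooling unit, servers in each rack.
# A real DCIM tool would build these maps from its asset database.
COOLING_MAP = {
    "CRAC-1": ["R01", "R02"],
    "CRAC-2": ["R03"],
}
RACK_SERVERS = {
    "R01": ["web-01", "web-02"],
    "R02": ["db-01"],
    "R03": ["batch-01"],
}


@dataclass
class Reading:
    device: str
    inlet_temp_c: float


def affected_servers(reading, threshold_c, cooling_map, rack_servers):
    """Return the servers downstream of a cooling unit whose inlet
    temperature exceeds the threshold (empty list if it does not)."""
    if reading.inlet_temp_c <= threshold_c:
        return []
    servers = []
    for rack in cooling_map.get(reading.device, []):
        servers.extend(rack_servers.get(rack, []))
    return servers


alarm = Reading("CRAC-1", 29.5)
print(affected_servers(alarm, 27.0, COOLING_MAP, RACK_SERVERS))
# ['web-01', 'web-02', 'db-01']
```

A real tool would also need airflow data, since a cooling unit does not necessarily serve a neat, fixed set of racks.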
Newer management tools are designed to identify and resolve issues with minimal human intervention. Because they depend less on ongoing manual processes, today’s DCIM systems are more likely to deliver the business value they are advertised to provide. They are also broader in scope: by correlating power, cooling and space resources with individual servers, current DCIM tools can proactively inform IT management systems of potential physical infrastructure problems and of how those problems might affect specific IT loads. In a highly virtualized and dynamic cloud environment in particular, this real-time awareness of constantly changing power and cooling capacity is important for safe server placement. These more intelligent tools also let IT tell the lines of business about the consequences of their actions before server provisioning decisions are made. Business decisions that raise energy consumption in the data centre, for example, will increase the organisation’s carbon footprint and, where one applies, its carbon tax. The new tools also make chargebacks for energy consumption possible, which can change how decisions are made by tying energy use to business outcomes.
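At its core, an energy chargeback is simple arithmetic. The sketch below is a minimal illustration, assuming metered IT energy per server, a hypothetical server-to-business-unit mapping, and a facility PUE used to share cooling and distribution overhead in proportion to IT load; the tariff and emission factor are placeholders, not real rates.

```python
def chargeback(server_kwh, owners, pue, tariff_per_kwh, kg_co2_per_kwh):
    """Allocate energy cost and CO2 to business units.

    server_kwh: metered IT energy per server for the billing period (kWh)
    owners:     server -> business unit (hypothetical mapping)
    pue:        power usage effectiveness, used to gross up IT energy so
                facility overhead is shared in proportion to IT load
    """
    totals = {}
    for server, it_kwh in server_kwh.items():
        unit = owners[server]
        facility_kwh = it_kwh * pue
        cost, co2 = totals.get(unit, (0.0, 0.0))
        totals[unit] = (
            cost + facility_kwh * tariff_per_kwh,
            co2 + facility_kwh * kg_co2_per_kwh,
        )
    return totals


usage = {"web-01": 100.0, "db-01": 200.0, "batch-01": 50.0}
owners = {"web-01": "Retail", "db-01": "Retail", "batch-01": "Finance"}
# Placeholder figures: PUE 1.5, 0.20 currency units per kWh, 0.4 kg CO2 per kWh.
for unit, (cost, co2) in chargeback(usage, owners, 1.5, 0.20, 0.4).items():
    print(f"{unit}: cost {cost:.2f}, {co2:.1f} kg CO2")
```

Grossing up by PUE is only one possible allocation policy; an organisation might instead bill IT energy alone and carry overhead centrally.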
Planning tools: understanding the impact of decisions
Newer planning software tools use a graphical user interface to illustrate the current physical state of the data centre and to simulate the effect of future equipment adds, moves, and changes. This capability answers common and important planning questions such as “Where should the next physical or virtual server be placed?” and “What will new equipment do to my redundancy and safety margins?” For example, modern planning tools can predict how a new physical server will affect power and cooling distribution. They also calculate how moves and changes will affect data centre space and power and cooling capacity. Modeling and simulating proposed changes in this way can save time, effort and money compared with simply making the change and hoping for the best.
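The “where should the next server go?” question can be sketched as a what-if check against each rack’s power and cooling limits. This is a minimal illustration, assuming hypothetical per-rack capacity figures and a 20% safety margin; `Rack` and `place_server` are made-up names, and real planning tools model power and cooling distribution in far more detail.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    power_capacity_kw: float    # usable power provisioned to the rack
    power_load_kw: float        # current measured IT load
    cooling_capacity_kw: float  # heat the local cooling can remove


def place_server(racks, server_kw, safety_margin=0.2):
    """Return the rack that keeps the most headroom after adding the server,
    or None if no rack can take it without eating into the safety margin."""
    best, best_headroom = None, None
    for rack in racks:
        power_limit = rack.power_capacity_kw * (1 - safety_margin)
        cooling_limit = rack.cooling_capacity_kw * (1 - safety_margin)
        new_load = rack.power_load_kw + server_kw
        # Almost all IT power ends up as heat, so the same figure is
        # checked against both the power and the cooling limit.
        if new_load > power_limit or new_load > cooling_limit:
            continue
        headroom = min(power_limit, cooling_limit) - new_load
        if best_headroom is None or headroom > best_headroom:
            best, best_headroom = rack.name, headroom
    return best


racks = [
    Rack("R01", 10.0, 7.0, 12.0),
    Rack("R02", 8.0, 3.0, 6.0),
    Rack("R03", 10.0, 2.0, 10.0),
]
print(place_server(racks, 1.5))  # R03
print(place_server(racks, 7.0))  # None
```

Choosing the rack with the most remaining headroom is just one placement policy; packing servers tightly to free whole racks is an equally valid alternative.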
These planning tools also help operators understand the impact of potential failures. Business executives and data centre operators share the goal of maintaining operational integrity even when failures occur. Insight into the impact of potential failures gives business management confidence in the availability of business processes. More importantly, it helps data centre operators prepare for problems, so that recovery times are shorter or the problems are avoided altogether. Simply put, planning tools help maintain business continuity while providing peace of mind.
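A simple form of failure-impact analysis is to ask, for each component, whether the rest of the system could carry the load without it. The sketch below applies that idea to a hypothetical set of UPS modules; it deliberately ignores distribution paths, battery runtime and other real-world factors, and the module names and ratings are made up.

```python
def single_failure_margins(ups_capacity_kw, load_kw):
    """For each UPS module, return the spare capacity (kW) left if that
    module alone failed. A negative value means the remaining modules
    could not carry the load through that failure."""
    total = sum(ups_capacity_kw.values())
    return {name: (total - cap) - load_kw for name, cap in ups_capacity_kw.items()}


modules = {"UPS-A": 100, "UPS-B": 100, "UPS-C": 50}
for name, margin in single_failure_margins(modules, 160).items():
    status = "OK" if margin >= 0 else "LOAD AT RISK"
    print(f"lose {name}: margin {margin} kW -> {status}")
```

Here losing either 100 kW module would leave the 160 kW load 10 kW short, which is exactly the kind of finding an operator would want before a failure rather than after it.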
Operations tools: completing more tasks in less time
New automated workflow tools allow operators to assign work orders, reserve space, track status, and extract an audit trail, giving complete visibility into, and a history of, the change cycle as equipment is installed and removed. These tools facilitate, automate or supplement existing operations processes to help ensure that sufficient power, cooling, and space resources are always available as the data centre changes and evolves. Thanks in part to the tools, this is achieved without the wasteful over-provisioning of infrastructure capacity that has traditionally been the norm.
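Space reservation with an audit trail can be modelled as a small data structure. The sketch below is a hypothetical model, not any vendor’s API: it assumes a 42U rack, ties each reserved U position to a work order, refuses double bookings, and records every successful reservation and release with a UTC timestamp.

```python
from datetime import datetime, timezone


class RackSpace:
    """Tracks U-position reservations in one rack and keeps an audit trail.

    A hypothetical model for illustration, not any vendor's API."""

    def __init__(self, name, total_u=42):
        self.name = name
        self.total_u = total_u
        self.slots = {}   # U position -> work order that holds it
        self.audit = []   # (UTC timestamp, action, work order, details)

    def _log(self, action, work_order, details):
        self.audit.append((datetime.now(timezone.utc), action, work_order, details))

    def reserve(self, work_order, start_u, height_u):
        """Reserve height_u contiguous units starting at start_u for a work order."""
        end_u = start_u + height_u - 1
        if start_u < 1 or end_u > self.total_u:
            raise ValueError(f"U{start_u}-U{end_u} falls outside {self.name}")
        for u in range(start_u, end_u + 1):
            if u in self.slots:
                raise ValueError(f"U{u} is already reserved by {self.slots[u]}")
        for u in range(start_u, end_u + 1):
            self.slots[u] = work_order
        self._log("reserve", work_order, f"U{start_u}-U{end_u}")

    def release(self, work_order):
        """Free every unit held by a work order, e.g. when equipment is removed."""
        freed = [u for u, wo in self.slots.items() if wo == work_order]
        for u in freed:
            del self.slots[u]
        self._log("release", work_order, f"{len(freed)}U freed")


rack = RackSpace("R01")
rack.reserve("WO-1001", 10, 2)
for entry in rack.audit:
    print(entry)
```

Checking every position before writing any of them keeps a rejected reservation from leaving the rack half-booked.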
Today’s planning and implementation software tools allow for improvements to standard operational procedures that help staff get more done in less time. Here are two examples of how this might work. First, consider equipment tracking. In the traditional method of keeping IT room equipment in/out logs, a device is installed or removed and a designated person then records it in a logbook. This procedure is followed for any device the size of a disk or tape cartridge and larger. All drive bays are audited nightly by security, and if drives go missing, security reviews the access logs and server room security footage to see who might have taken them. By contrast, operations software can provide data centre inventory information on a handheld device while the operator is on the data centre floor. An integrated barcode scanner simplifies implementing work orders and identifying equipment. Over a wireless network, server locations are synchronized automatically and device and asset attributes are recorded in detail. Searches can be run by equipment vendor name, model, and type, and the results can be exported in Excel format.
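The inventory search and export described above can be sketched with the standard library alone. This is a minimal illustration with a hypothetical `Asset` record and made-up vendor and model names; it exports CSV, which Excel opens directly, rather than a native .xlsx file (writing .xlsx would need a third-party library such as openpyxl).

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Asset:
    barcode: str      # value read by the handheld scanner
    vendor: str
    model: str
    device_type: str
    rack: str


def search(assets, **criteria):
    """Case-insensitive exact match on Asset fields, e.g. vendor="vendora"."""
    return [
        a for a in assets
        if all(str(getattr(a, key)).lower() == str(value).lower()
               for key, value in criteria.items())
    ]


def export_csv(assets):
    """Serialise assets to CSV text, which spreadsheet tools such as Excel can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Asset)])
    writer.writeheader()
    for asset in assets:
        writer.writerow(asdict(asset))
    return buf.getvalue()


inventory = [
    Asset("BC0001", "VendorA", "X100", "server", "R01"),
    Asset("BC0002", "VendorA", "S20", "switch", "R01"),
    Asset("BC0003", "VendorB", "X200", "server", "R02"),
]
print(export_csv(search(inventory, vendor="vendora", device_type="server")))
```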
Next, consider a scenario in which the data centre operator is trying to determine the overall health of the power and cooling infrastructure. In a traditional data centre, the operator would have to measure and interpret the health of each device individually, keep the measurements in spreadsheets, and aggregate the data manually for reporting. Management tools, by contrast, provide 7×24 centralized device discovery, management, and monitoring. When problems occur, infrastructure alerts and alarms are triggered instantly based on user-defined thresholds and conditions, and reports and graphs can be generated quickly to help diagnose the problem.
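Threshold-based alerting across many devices can be sketched as a single pass over readings. The example below is a minimal illustration that assumes each reading arrives as a (device, metric, value) tuple and that users define warning and critical upper limits per metric; the metric names and limits are illustrative, not recommendations.

```python
from collections import Counter

# Illustrative user-defined upper thresholds: metric -> (warning, critical).
THRESHOLDS = {
    "ups_load_pct": (80.0, 90.0),
    "rack_inlet_temp_c": (27.0, 32.0),
}


def classify(value, warning, critical):
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def health_report(readings, thresholds):
    """readings: iterable of (device, metric, value) from all monitored devices.
    Returns the alerts needing attention and a count of readings per status."""
    alerts, summary = [], Counter()
    for device, metric, value in readings:
        if metric not in thresholds:
            continue  # no rule defined for this metric
        status = classify(value, *thresholds[metric])
        summary[status] += 1
        if status != "ok":
            alerts.append((status, device, metric, value))
    return alerts, summary


readings = [
    ("UPS-1", "ups_load_pct", 85.0),
    ("UPS-2", "ups_load_pct", 60.0),
    ("R01", "rack_inlet_temp_c", 33.0),
]
alerts, summary = health_report(readings, THRESHOLDS)
for alert in alerts:
    print(alert)
print(dict(summary))
```

The summary counter replaces the manual spreadsheet aggregation: one call yields both the list of problems and an overall health picture.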
Analysis tools: identifying operational strengths and weaknesses
The goal of analysis is to arrive at an optimal, or at least realistic, decision based on data. For example, an audit trail can be generated for all changes to assets within the computer room. If a spike in power demand occurs on the same rack at the same time every night, and the spike comes dangerously close to the breaker’s trip threshold, a decision can be made to modify the workflow so that the rack’s consumption peak is reduced.
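Spotting a recurring nightly spike is a grouping problem: bucket samples by rack and hour of day, then count on how many distinct days the load came close to the breaker limit. The sketch below assumes hypothetical inputs: power samples as (timestamp, rack, kW) tuples and each rack’s breaker rating already expressed in kW. The 80% alert fraction and three-day minimum are arbitrary illustrative defaults.

```python
from collections import defaultdict
from datetime import datetime


def recurring_peaks(samples, breaker_kw, alert_fraction=0.8, min_days=3):
    """Find (rack, hour-of-day) pairs where load came within alert_fraction of
    the rack's breaker rating on at least min_days distinct days.

    samples:    iterable of (timestamp, rack, load_kw)
    breaker_kw: rack -> breaker rating expressed in kW (hypothetical input)
    """
    days_near_trip = defaultdict(set)
    for ts, rack, load_kw in samples:
        if load_kw >= alert_fraction * breaker_kw[rack]:
            days_near_trip[(rack, ts.hour)].add(ts.date())
    return {key: len(days) for key, days in days_near_trip.items()
            if len(days) >= min_days}


samples = [
    (datetime(2024, 5, d, 2, 15), "R07", 4.6) for d in (1, 2, 3)
] + [
    (datetime(2024, 5, 1, 14, 0), "R07", 2.1),
    (datetime(2024, 5, 1, 3, 0), "R08", 4.9),  # one-off spike, not recurring
]
print(recurring_peaks(samples, {"R07": 5.0, "R08": 5.0}))  # {('R07', 2): 3}
```

Counting distinct days, rather than raw samples, separates a genuine nightly pattern from a single burst of high readings.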
Analysis of physical infrastructure operational data can also determine the causes of problems (e.g., what is slow, what is costly). Combining analytics with predictive simulation is yet another way the data centre can help generate business value. Performance reports track outages by rack, row, and power distribution zone, for example, so that when servers fail more frequently in one area, the underlying reason can be investigated. Without such a frame of reference, data centre metrics are of limited value to an operator whose aim is to raise efficiency and reduce data centre costs.
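Finding the area where servers fail more often works best when failures are normalized by the number of servers in each zone, so that a large zone does not look worse simply because it is large. This is a minimal sketch under that assumption; the zone names and the “twice the fleet average” rule are hypothetical choices, not an established standard.

```python
from collections import Counter


def outage_hotspots(failures, server_zone, factor=2.0):
    """Flag zones whose failure rate per server exceeds factor x the
    fleet-wide average.

    failures:    one server name per recorded failure
    server_zone: server -> zone (rack, row or power distribution zone)
    """
    servers_per_zone = Counter(server_zone.values())
    failures_per_zone = Counter(server_zone[s] for s in failures)
    fleet_rate = len(failures) / len(server_zone)
    return sorted(
        zone for zone, count in failures_per_zone.items()
        if count / servers_per_zone[zone] > factor * fleet_rate
    )


server_zone = {"a1": "PDU-A", "a2": "PDU-A", "b1": "PDU-B",
               "b2": "PDU-B", "c1": "PDU-C", "c2": "PDU-C"}
failures = ["a1", "a2", "a1", "a2", "b1"]
print(outage_hotspots(failures, server_zone))  # ['PDU-A']
```

A flagged zone is a starting point for investigation (a marginal PDU, a hot aisle), not a diagnosis in itself.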
Common questions that analytics tools can answer include “What do I have in my data centre?”, “When will I run out of power and cooling capacity?”, “Do I have any stranded power, cooling or space capacity?” and “When will the next large infrastructure investment be needed?”
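The “when will I run out of capacity?” question can be approximated by fitting a trend to historical load. The sketch below is a deliberately simple stand-in for what a commercial tool might do: an ordinary least-squares straight line through monthly peak load, extrapolated to the capacity limit. The monthly granularity and the figures in the example are assumptions.

```python
def months_until_full(monthly_peak_kw, capacity_kw):
    """Fit a straight line (ordinary least squares) to monthly peak load and
    return the months from the last observation until the trend reaches
    capacity. Returns None if the trend is flat or falling."""
    n = len(monthly_peak_kw)
    if n < 2:
        raise ValueError("need at least two months of data")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(monthly_peak_kw) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_peak_kw))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    month_full = (capacity_kw - intercept) / slope
    # Clamp at zero: the trend line may already be above capacity.
    return max(0.0, month_full - (n - 1))


print(months_until_full([100, 110, 120, 130], 200))  # 7.0
```

A linear trend ignores step changes such as a planned deployment, so in practice it would be combined with the simulation features described earlier.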
The holistic management capabilities described above, all available today, enable data centre professionals to maximise control over their energy costs and to advise the business on how to use IT assets more effectively. By sharing key data points, historical data and asset tracking information, and by making it possible to charge back users, these newer tools let users act on data centre business intelligence. In short, effective use of today’s data centre infrastructure management software will help make your data centre more reliable and efficient while increasing its overall business value.
(This primer was written by Soeren Brogaard Jensen, VP Enterprise Software – IT Business, Schneider Electric)