
NetWorker to the Cloud


Introduced alongside NetWorker 8.2 SP1 is integration with a new EMC product, CloudBoost.

The purpose of CloudBoost is to allow a NetWorker server to write deduplicated backups from its datazone out to one of a number of different types of cloud (e.g., EMC ECS Storage Service, Google Cloud Storage, Azure Cloud Storage, Amazon S3, etc.) in an efficient form.

The integration point is quite straightforward, designed to simplify the configuration within NetWorker.

A CloudBoost system is a virtual appliance that can be deployed within your VMware vSphere environment. The appliance is an “all in one” system that includes:

  • NetWorker 8.2 SP1 storage node/client software
  • CloudBoost management console
  • CloudBoost discovery service

One of the nifty functions that CloudBoost performs in order to make deduplicated storage to the cloud efficient is a splitting of metadata and actual content. The metadata effectively relates to all the vital information the CloudBoost appliance has to know in order to access content from the object store it places in the selected cloud. While the metadata is backed up to the cloud, all metadata operations will happen against the local copy of the metadata, thereby significantly speeding up access and maintenance operations. (And everything written out to cloud is done so using AES-256 encryption, keeping it safe from prying eyes.)

A CloudBoost appliance can logically address up to 400TB of storage in the cloud. With deduplication ratios of up to 4x (estimated from data analysis performed by EMC), that might equate to as much as 1.6PB of protected source data – and it can be any data that NetWorker has backed up.

Once a CloudBoost appliance has been deployed (consisting of VM provisioning and connection to a supported cloud storage system) and integrated into NetWorker as a storage node with an in-built AFTD, getting long-term data out to the cloud is as simple as executing a clone operation against the required data, with the destination storage node being the CloudBoost storage node. And since the data is written via the CloudBoost embedded NetWorker storage node, recovering from backups that have been sent to the cloud is as simple as executing a recovery and selecting the copy held on the CloudBoost appliance.
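As a rough illustration only (the pool name and saveset ID below are made up, and the clone pool would need to be one whose devices sit on the CloudBoost storage node), a manual clone out to the cloud might look like this from the NetWorker server command line:

nsrclone -b "CloudBoost LTR" -S 4126784073

In practice you’d more likely schedule the clone, but the principle is the same: the destination pool determines that the copy lands on the CloudBoost appliance.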

In other words, once it’s been set up, it’s business as usual for a NetWorker administrator or operator.

To get a thorough understanding of how CloudBoost and NetWorker integrate, I suggest you read the Release Notes and Integration Guide (you’ll need to log into the EMC support website to view those links). Additionally, there’s an excellent overview video you can watch here:

 


Pool size and deduplication


When you start looking into deduplication, one of the things that becomes immediately apparent is … size matters. In particular, the size of your deduplication pool matters.

Deduplication Pool

In this respect, what I’m referring to is the analysis pool for comparison when performing deduplication. If we’re only talking target based deduplication, that’s simple – it’s the size of the bucket you’re writing your backup to. However, the problems with a purely target based deduplication approach to data protection are network congestion and time wasted – a full backup of a 1TB fileserver will still see 1TB of data transferred over the network to have most of its data dropped as being duplicate. That’s an awful lot of packets going to /dev/null, and an awful lot of bandwidth wasted.

For example, consider the following diagram being of a solution using target only deduplication (e.g., VTL only or no Boost API on the hosts):

Dedupe Target Only

In this diagram, consider the outline arrow heads to indicate where deduplication is being evaluated. Thus, if each server had 1TB of storage to be backed up, then each server would send 1TB of storage over to the Data Domain to be backed up, with deduplication performed only at the target end. That’s not how deduplication has to work now, but it’s a reminder of where we were only a relatively short period of time ago.

That’s why source based deduplication (e.g., NetWorker Client Direct with a DDBoost enabled connection, or Data Domain Boost for Enterprise Applications) brings so many efficiencies to a data protection system. While there’ll be a touch more processing performed on the individual clients, that’ll be significantly outweighed by the ofttimes massive reduction in data sent onto the network for ingestion into the deduplication appliance.

So that might look more like:

Source Dedupe

I.e., in this diagram with outline arrow heads indicating location of deduplication activities, we get an immediate change – each of those hosts will still have 1TB of backup to perform, but they’ll evaluate via hashing mechanisms whether or not that data actually needs to be sent to the target appliance.

There’s still efficiencies to be had even here though, which is where the original point about pool size becomes critical. To understand why, let’s look at the diagram a slightly different way:

Source Dedupe Global

In this case, we’ve still got source deduplication, but the merged lines represent something far more important … we’ve got global, source deduplication.

Or to put it a slightly different way:

  • Target deduplication:
    • Client: “Hey, here’s all my data. Check to see what you want to store.”
  • Source deduplication (limited):
    • Client: “Hey, I want to backup <data>. Tell me what I need to send based on what I’ve sent you before.”
  • Source deduplication (global):
    • Client: “Hey, I want to backup <data>. Tell me what I need to send based on anything you’ve ever received before.”

That limited deduplication component may not be limited on a per host basis. Some products might deduplicate on a per host basis, while others might deduplicate based on particular pool sizes – e.g., xTB. But even so, there’s a huge difference between deduplicating against a small comparison set and deduplicating against a large comparison set.

Where that global deduplication pool size comes into play is the commonality of data that exists between hosts within an environment. Consider for instance the minimum recommended size for a Windows 2012 installation – 32GB. Now, assume you might get a 5:1 deduplication ratio on a Windows 2012 server (I literally picked a number out of the air as an example, not a fact) … that’ll mean a target occupied data size of 6.4GB to hold 32GB of data.

But we rarely consider a single server in isolation. Let’s expand this out to encompass 100 x Windows 2012 servers, each at 32GB in size. It’s here we see the importance of a large pool of data for deduplication analysis:

  • If that deduplication analysis were being performed at the per-server level, then realistically we’d be getting 100 x 6.4GB of target data, or 640GB.
  • If the deduplication analysis were being performed against all data previously deduplicated, then we’d assume that same 5:1 deduplication ratio for the first server backup, and then much higher deduplication ratios for each subsequent server backup, as they evaluate against previously stored data. So that might mean 1 x 5:1 + 99 x 20:1 … 164.8GB instead of 640GB or even (if we want to compare against tape) 3,200GB.
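Spelling out the arithmetic behind those two bullets (using the same assumed ratios):

Per-server deduplication:  100 x (32GB / 5)                  = 100 x 6.4GB        = 640GB
Global deduplication:      1 x (32GB / 5) + 99 x (32GB / 20) = 6.4GB + 158.4GB    = 164.8GB
No deduplication (tape):   100 x 32GB                        = 3,200GB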

Throughout this article I’ve been using the term pool, but I’m not referring to NetWorker media pools – everything written to a Data Domain, for example, regardless of what media pool it’s associated with in NetWorker, will be globally deduplicated against everything else on the Data Domain. But this does make a strong case for right-sizing your appliance, and in particular planning for more data to be stored on it than you would for a conventional disk ‘staging’ or ‘landing’ area. The old model – backup to disk, transfer to tape – was premised on having a disk landing zone big enough to accommodate your biggest backup, so long as you could subsequently transfer that to tape before your next backup. (Or some variant thereof.) A common mistake when evaluating deduplication is to think along similar lines. You don’t want storage that’s just big enough to hold a single big backup – you want it big enough to hold many backups, so you can actually see the strategic and operational benefit of deduplication.

The net lesson is a simple one: size matters. The size of the deduplication pool, and what deduplication activities are compared against will make a significantly noticeable impact to how much space is occupied by your data protection activities, how long it takes to perform those activities, and what the impact of those activities are on your LAN or WAN.

Basics – Configuring a reports-only user


Something that’s come up a few times in the last year for me has been a situation where a NetWorker user has wanted to allow someone to access NetWorker Management Console for the purpose of running reports, but not allow them any administrative access to NetWorker.

It turns out it’s very easy to achieve this, and you actually have a couple of options on the level of NetWorker access they’ll get.

Let’s look first at the minimum requirements – defining a reports only user.

To do that, you first go into NetWorker Management Console as an administrative user, and go across to the Setup pane.

You’ll then create a new user account:

New User Account in NMC

Within the Create User dialog, be certain to only select Console User as the role:

NMC new user dialog

At this point, you’ve successfully created a user account that can run NMC reports, but can’t administer the NetWorker server.

However, you’re then faced with a decision. Do you want a reports-only user that can “look but don’t touch”, or do you want a reports-only user that can’t view any of the NetWorker configuration (or at least, anything other than can be ascertained by the reports themselves)?

If you want your reports user to be able to run reports and you’re not fussed about the user being able to view the majority of your NetWorker configuration, you’re done at this point. If however your organisation has a higher security focus, you may need to look at adjusting the basic Users NetWorker user group. If you’re familiar with it, you’ll know this has the following configuration:

NetWorker Users Usergroup

This usergroup in the default configuration allows any user in the NetWorker datazone to:

  • Monitor NetWorker
  • Recover Local Data
  • Backup Local Data

The key there is any user – the users field is set to *@*. Normally you want this to be set to *@*, but if you’re a particularly security focused organisation you might want to tighten this down to only those users and system accounts authorised to perform recoveries. The same principle applies here. Let’s say I didn’t want the reports user to see any of the NetWorker configuration, but I did want any root, system or pmdg user in the environment to still have that basic functionality. I could change the Users usergroup to the following:

Modified NetWorker Users usergroup
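If you prefer the command line, the same sort of change can be sketched in nsradmin – the user entries below are just the ones from this example, so treat the exact values as illustrative rather than prescriptive:

nsradmin> . type: NSR usergroup; name: Users
nsradmin> update users: "user=root", "user=system", "user=pmdg"

(nsradmin will prompt you to confirm the update before applying it.)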

With this usergroup modified, logging in as the reports user will show a very blank NMC monitoring tab:

NMC-monitoring reports user

Similarly, the client list (as an example) will be quite empty too:

NMC-config reports user

Now, it’s worth mentioning there is a key caveat you should consider here – some modules may be designed in anticipation that the executing user for the backup or recovery (usually an application user with sufficient privileges) will at least be a member of the Users usergroup. So if you tighten the security against your reports user to this level, you’ll need to be prepared to increase the steps in your application onboarding processes to ensure those accounts are added to an appropriate usergroup (or a new usergroup).

But in terms of creating a reports user that’s not privileged to control NetWorker, it’s as easy as the steps above.

Tales from the trenches – Tape


I’m back in a position where I’m having a lot of conversations with customers who are looking at infrastructure change.

What I’m finding remarkable in every single one of these conversations is the pervasiveness of Cloud considerations in data protection. I’m not just talking Spanning for your SaaS systems (though that gets people sitting up and taking notice every time), I’m talking about businesses that are looking towards Cloud to deal with something that has been a feature of datacentres for decades: tape.

I’ve mentioned CloudBoost before, and I personally think this is an exciting topic.

iStock Cloud Touch Small

An absolute ‘classic’ model now with NetWorker is to have it coupled with Data Domain systems, with backups duplicated between the Data Domain systems and tape removed – at least for that daily and weekly backup cycle. Data Domain Extended Retention is getting a lot of interest in companies, but without a doubt there are still some people who look at a transition to deduplication as a phased approach: start with short-term backups going to deduplication, and keep those legacy tape libraries around for handling tape-out for monthly backups.

That certainly has appeal for businesses that want to stretch their tape investment out for the longest possible time, especially if they have long-term backups already sitting on tape.

But every time I talk to a company about deploying Data Domain for their backups, before I get to talk about CloudBoost and other functions, I’m getting asked: hey, can we, say, look towards moving our long term backups to Cloud instead of tape at a later date?

You certainly can – CloudBoost is here now, and whether you’re ready to start shifting longer-term compliance style backups out to Cloud now, or in a year or two years time, it’ll be there waiting for you.

Over time (as the surveys have shown), backup to disk has increased in NetWorker environments to over 90% use. The basic assumption for years had been disk will kill tape. People say it every year. What I’m now seeing is Cloud could very well be the enabler for that final death of tape. Many businesses are entirely killing tape already thanks to deduplication: I know of many who are literally pulling their tapes back from their offsite storage vendor and ingesting them back into their Data Domains – particularly those with Extended Retention. But some businesses don’t want to keep all those long-term backups on local disk, deduplicated or not – they want a different economic scale, and they see Cloud as delivering that economy of scale.

I find it fascinating that I’m being so regularly asked by people: can we ditch tape and go to Cloud? That to me is a seismic shift on the remaining users of tape.

[Side note: Sure, you’ll find related links from me below about tape. Just because I wrote something 1-3 years ago I can’t change my opinion :-) ]

Celebrating 25 years of NetWorker


Celebrations

25 years ago, a post was made onto Usenet to announce the immediate availability of a new backup product called NetWorker. Thanks to Google’s archive of Usenet, you can still read the original post here. There’s a lot of delightful snippets in there and I really recommend anyone interested in the history of NetWorker to take a few minutes out to read it. What perhaps caught my eye most was this very future-looking line in the introduction:

And as your network and volume of files expands, NetWorker has the capacity and performance to handle the load.

I think it’s fair to say that with it now being 25 years old, NetWorker has lived up to that claim. More than that, of course – the product did what it promised and is still going strong today, having been under EMC’s stewardship for the last 11 or 12 years. NetWorker was designed from the ground up to be expandable, reliable, fast and – just as importantly – a framework. This to me is where NetWorker has stood out over the years. It’s easy to create a widget that has everything you think a customer might need. The smarter business though creates a widget that has a lot of what a customer might need and is extensible so the customer can increase its usefulness to their own organisation.

Remember that old adage … don’t skate to the puck: skate to where the puck is going. Framework based products are built on that premise.

dat and index

The photo above is a bit of a walk down memory lane. There was a time for me when DAT tapes ruled my working life. They were the core technology we used for backups at my first employer, and it’s sitting on a deconstructed 500MB drive that had been the first index disk for the first backup server I administered. As you can imagine based on that photo, the index disk failed. That taught me the virtue of having data protection for your data protection (and just how hard it was to fight for a pair of 2GB drives for index mirrors back in the mid-late 90s).

Funnily enough, the first backup server I administered was even called mars. If you’ve ever looked at a NetWorker man page, you’ll understand the humour. Here’s an excerpt for instance from the mminfo man page:

man mminfo

NetWorker was the first enterprise backup product I ever used, and I started my long history with it in 1996. I joined a Unix system administration team and was told something along the lines of:

We just installed this new backup product. You can administer it.

I think a lot of backup administrators started that way – particularly back then. Backup was something given to the junior administrator. It was the start of a very long relationship with NetWorker, but I can also look back at it now and say that it was operationally wrong to give backups to the most junior person in the group. There’s nothing wrong with juniors being involved in backups, but to be in charge of their overall operation and configuration? That should be a senior administrator.

These days we’ve certainly seen that data protection attitudes have (for the most part) shifted. Now given the volume and variation of data encountered by an average business, the people most equipped to make decisions around backup/recovery (and all other aspects of data protection) are the senior technical experts in the business. To overload an EMC term, I’ve constantly referred to these people over the last few years as the Data Protection Advocates.

My initial stint as a Unix system administrator had me configuring and operating a variety of NetWorker environments across a diverse number of locations in Australia. In addition to my main NetWorker sites in Newcastle, NSW, I had datazones in Sydney, Perth, and far north Queensland, just to name a few areas. Some of those environments were somewhat inimical to technology. One site in far north Queensland for instance would have to get their DLT jukeboxes swapped out every 6-9 months, and would have to air-blast them before sending them back.

A computer room is just somewhere with less than a centimetre of coal dust in it.

It was in those early days that I became a strong user of the NetWorker command line, simply because the network connections between my office and those remote sites were minimal at best. Running a GUI over those links was practically impossible, so I learnt everything from mminfo to nsradmin and manual recovery via a uasm stream from a partially failed tape.

My first NetWorker server was a Sun system, and it was running NetWorker 4.1. (Or under a rebadging agreement: Solstice Backup.) We had some v3.x clients – we even had some AT&T Unix boxes getting backed up via NFS mounts. So while I’ve not used NetWorker since the very start, I remember the excitement of ‘new fangled’ things such as storage nodes and file type devices. Not advanced file type devices, file type devices. I saw the switches in resource database formats (specifically moving from nsr.res to nsrdb), the changes in media database and index formats, more modules, consolidation in operating systems as the operating systems themselves came and went, and through it all: successful recoveries.

At the end of the day, that’s what NetWorker is all about: being able to recover the data when we need to, and that’s what NetWorker did. There’s no doubt in my mind – I’m aware of some of them myself, including some very large businesses, but I won’t name names – there are many businesses operating today who owe their continued operation to NetWorker facilitating an urgent, mission critical recovery when the chips were down.

But it’s not just me who is celebrating NetWorker being around for 25 years. There’s some great official EMC resources available with stories from other long-term users, examples of prior marketing material and install kits, and an infographic on how far things have come. Take a few minutes out of your schedule and check them out!

Understanding NetWorker licensing options


There are now three types of licensing options you can consider with NetWorker, and it’s helpful to understand the difference between them – particularly if you’re approaching your maintenance/support renewal time.

The three types are:

  • Traditional
  • NetWorker Capacity
  • Data Protection Solutions (DPS) Capacity

Traditional licensing is the ‘classic’ approach where specific features in NetWorker require a license. You start with a base enabler for the NetWorker server (usually Network Edition or Power Edition), then you stack on particular features as required. That might include Data Domain licensing, NetWorker Microsoft Module Licensing (NMM), Standard client licenses, Virtual Client licensing, and so on.

Depending on the number of individual features, you could have to buy multiple instances of particular licenses. For instance, NMM is licensed per server it runs on – so if you have 3 Exchange hosts running a 3-way DAG, and then say, 2 Microsoft SQL databases running on individual servers, you’ll need 5 x NMM licenses.

Traditional licensing is extremely exact in what you get – you get precisely what you pay for, no more – but it’s also a wee bit fiddly and leaves you with no room for changes in your environment. In the world of rapidly changing IT environments, traditional licensing makes it difficult for you to adapt to changing circumstances. If someone needs to run up a few extra SQL databases, or deploy an Oracle database, or provide backups for another 100 virtual machines, or … whatever, you have to acquire each necessary license. Then if those requirements change, you’re potentially left with a bunch of licenses you don’t need any more.

Squirrel

NetWorker Capacity licensing is a way of consolidating all your licenses down to a simple question:

If you do a full backup of everything you need to protect, how many TB does that come to?

That’s it. That’s all there is to it. If that number comes to say, 20TB, then you add around 5% (or whatever you believe your growth rate for a year will come to) and you get yourself NetWorker Capacity Licensed to say, 21TB. At that point you can go and do a full backup every day of every system if you wanted and your licenses are still covered. It’s not about the amount of back-end space you utilise for your backups, but simply the front-end space of a single full backup of everything.

The big advantage – no, the huge advantage – here is that once you’ve got that license, you can pretty much work with any NetWorker feature you want. If someone comes up and says they need to backup an Oracle database but forgot to ask for licensing, you can languidly sit back and say “sure”. If someone comes up and says they’re about to virtualise 100 traditional clients and turn them all into virtual clients, you can languidly sit back and say “sure”. If someone decides the business needs to deploy a dozen storage nodes, one for each security zone in a DMZ, you can languidly sit back … well, you get the picture.

NetWorker capacity licensing grants you flexibility to adjust what you backup based on changes to the business requirements. In a world where IT has to adapt faster and faster, NetWorker capacity licensing gives you the power to change.

And then there’s DPS licensing. This is the awesome-sauce licensing model because it gives you more than just NetWorker. DPS licensing is capacity based as well: if a full backup of everything in your environment you want to protect is 200TB, then you again look at say, 5% for growth and get a DPS capacity license for say, 210TB and then…

…you can back it up with NetWorker

or

…you can back it up with Avamar and NetWorker

or

…you can back it up with NetWorker and Data Domain Enterprise Applications

or

…you can back it up with NetWorker, Avamar, and Data Domain Enterprise Applications

or

…you can back it up with NetWorker, Avamar, and Data Domain Enterprise Applications and report on it using DPA.

DPS licensing for backup is the ultimate flexibility for a data protection environment. If you’re licensed for 200TB and your company acquires another business that’s got a bunch of remote offices, you can just go right ahead and deploy some Avamar Virtual Edition systems to look after those. If you want to have centralised monitoring and reporting for your data protection activities and provide chargeback pricing to business units, you can do it by folding Data Protection Advisor in. If you want to give your DBAs the option of backing up to your Data Domain systems but they want to use their own scheduling system, you can just point them at the Data Domain Boost plugin and let them get to work.

DPS is the Swiss Army Knife of data protection licensing. What’s more, it’s the landing place for new features in the EMC data protection universe as they come out. For instance, EMC CloudBoost and EMC Data Protection Search were both added automatically into the licensing options for existing DPS customers at no extra charge. So suddenly those customers can, if they want, provide a central search system for all their backups, or push long term retention backups out to the Cloud without having to adjust their licensing.

Good (Traditional) – Better (NetWorker Capacity) – Best (DPS Capacity). They’re the three licensing options for NetWorker. And there’s a model that suits everyone.

____

Why the squirrel? I wanted a photo in the article and it’s pretty cute.

The lazy admin


Are you an industriously busy backup administrator, or are you lazy?

Asleep at desk

When I started in IT in 1996, it wasn’t long before I joined a Unix system administration team that had an ethos which has guided me throughout my career:

The best sysadmins are lazy.

Even more so than system administration, this applies to anyone who works in data protection. The best people in data protection are lazy.

Now, there’s two types of lazy:

  • Slothful lazy – What we normally think of when we think of ‘lazy’: people who just don’t really do much.
  • Proactively lazy – People who do as much as they can in advance in order to have more time for the unexpected (or longer term projects).

If you’d previously thought I’d gone nuts suggesting I’ve spent my career trying to be lazy (particularly when colleagues read my blog), you’ll hopefully be having that “ah…ha!” moment realising I’m talking about being proactively lazy. This was something I learnt in 1996 – and almost twenty years down the track I’m pleased to see whole slabs of the industry (particularly infrastructure and data protection) are finally following suit and allowing me to openly talk about the virtues of being lazy.

Remember that embarrassingly enthusiastic dance Steve Ballmer was recorded doing years and years ago at a Microsoft conference while he chanted “Developers! Developers! Developers!”? A proactively lazy data protection administrator chants “Automate! Automate! Automate!” in his or her head throughout the day.

Automation is the key to being operationally lazy yet proactively efficient. It’s also exactly what we see being the focus of DevOps, of cloud service providers, and massive scale converged infrastructure. So what are the key areas for automation? There’s a few:

  • Zero error policies – I’ve been banging the drum about zero error policies for over a decade now. If you want the TL;DR summary, a zero error policy is the process of automating the review of backup results such that the only time you get an alert is when a failure happens. (That also means treating any new “unknown” as a failure/review situation until you’ve included it in the review process.) A minimal scripted sketch follows this list.
  • Service Catalogues and Policies – Service catalogues allow standard offerings that have been well-planned, costed and associated clearly with an architected system. Policies are the functional structures that enact the service catalogue approach and allow you to minimise the effort (and therefore the risk of human error) in configuration.
  • Visual Dashboards – Reports are OK, notifications are useful, but visual dashboards are absolutely the best at providing an “at a glance” view of a system. I may joke about Infographics from time to time, but there’s no questioning we’re a visual species – a lot of information can be pushed into a few simple glyphs or coloured charts*. There’s something to be said for a big tick to indicate everything’s OK, or an equally big X to indicate you need to dig down a little to see what’s not working.
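To make the zero error policy idea concrete, here’s a deliberately minimal sketch – not a production implementation. It lists yesterday’s savesets and flags any whose summary flags include ‘a’ (aborted); the mail recipient and temporary file path are placeholders:

#!/bin/bash
# Minimal zero-error-policy sketch: flag aborted savesets from the last day.
mminfo -q "savetime>=yesterday" -r "client,name,sumflags" 2>/dev/null |
    awk 'NR>1 && $NF ~ /a/' > /tmp/failed_savesets.txt
if [ -s /tmp/failed_savesets.txt ]; then
    mailx -s "Backup failures require review" backup-admins@example.com < /tmp/failed_savesets.txt
fi

Note that this only catches savesets that at least started writing – a client that never ran at all won’t appear in mminfo output, which is exactly why new “unknowns” need to be treated as failures until they’re folded into the review process.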

There’s potentially a lot of work behind achieving that – but there are shortcuts. The fastest way to achieving it is sourcing solutions that have already been built. I still see the not-built-here syndrome plaguing some IT environments, and while sometimes it may have a good rationale, it’s an indication of that perennial problem of companies thinking their use cases are unique. The combination of the business, the specific employees, their specific customers and the market may make each business potentially unique, but the core functional IT requirements (“deploy infrastructure”, “protect data”, “deploy applications”, etc.) are standard challenges. If you can spend 100% of the time building it yourself from the ground up to do exactly what you need, or you can get something that does 80% and all you have to do is extend the last 20%, which is going to be faster? Paraphrasing Isaac Newton:

If I have seen further it is by standing on the shoulders of giants.

As you can see, being lazy properly is hard work – but it’s an inevitable requirement of the pressures businesses now place on IT to be adaptable, flexible and fast. The proactively lazy data protection service provider can step back out of the way of business functions and offer services that are both readily deployable and reliably work, focusing his or her time on automation and real problem solving rather than all that boring repetitive busyness.

Be proudly lazy: it’s the best way to work.


* Although I think we have to be careful about building too many simplified reports around colour without considering the usability to the colour-blind.

Recovery survey


Back in 2012, I ran a survey to gauge some basic details about recovery practices within organisations. (The report from that survey can be downloaded here.)

Recovery survey

It’s been a few years and it seems worthwhile coming back to that topic and seeing how things have changed within NetWorker environments. I’ve asked mostly the same questions as before, but this time I’ve expanded the survey to ask a few extra questions about what you’re recovering as well.

I’d really appreciate if you can take a few minutes to complete the survey using your best estimates. I’ll be running this survey until 31 August and will publish the results by mid-September.


Sampling device performance


Data Protection Advisor is an excellent tool for producing information about your backup environment, but not everyone has it in their environment. So if you’re needing to go back to basics to monitor device performance unattended without DPA in your environment, you need to look at nsradmin.

High Performance

Of course, if you’ve got realtime access to the NetWorker environment you can simply run nsrwatch or NMC. In either of those systems, you’ll see device performance information such as, say:

writing at 154 MB/s, 819 MB

It’s that same information that you can get by running nsradmin. At its most basic, the command will look like the following:

nsradmin> show name:; message:
nsradmin> print type: NSR device

Now, nsradmin itself isn’t intended to be a full scripting language like bash, Perl, PowerShell or even (heaven forbid) the DOS batch processing system. So if you’re going to gather monitoring details about device performance from your NetWorker server, you’ll need to wrap your own local operating system scripting skills around the process.

You start with your nsradmin script. For easy recognition, I always name them with a .nsri extension. I saved mine at /tmp/monitor.nsri, and it looked like the following:

show name:; message:
print type: NSR device

I then created a basic bash script. Now, the thing to be aware of here is that you shouldn’t run this sort of script too regularly. While NetWorker can sustain a lot of interactions with administrators while it’s running without an issue, why add to it by polling too frequently? My general feeling is that polling every 5 minutes is more than enough to get a view of how devices are performing overnight.

If I wanted to monitor for 12 hours with a five minute pause between checks, that would be 12 checks an hour – 144 checks overall. To accomplish this, I’d use a bash script like the following:

#!/bin/bash
for i in `/usr/bin/seq 1 144`
do
        /bin/date
        /usr/sbin/nsradmin -i /tmp/monitor.nsri
        /bin/echo
        /bin/sleep 300
done >> /tmp/monitor.log

You’ll note from the commands above that I’m writing to a file called /tmp/monitor.log, using >> to append to the file each time.

When executed, this will produce output like the following:

Sun Aug 02 10:40:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 94 MB/s, 812 MB";
 
 
Sun Aug 02 10:45:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 22 MB/s, 411 MB";
 
 
Sun Aug 02 10:50:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 38 MB/s, 81 MB";
 
 
Sun Aug 02 10:55:02 AEST 2015
                        name: Clone;
                     message: "writing at 8396 KB/s, 758 MB";
 
                        name: Backup;
                     message: "reading, data ";

There you have it. In actual fact, this was the easy bit. The next challenge you’ll have will be to extract the data from the log file. That’s scriptable too, but I’ll leave that to you.
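If you want a starting point for that extraction, here’s a hedged sketch based purely on the log format shown above (adjust the patterns and output to taste):

#!/bin/bash
# Sketch: pull each timestamp, device name and write-rate line out of the
# monitoring log generated by the loop above.
awk '/^[A-Z][a-z][a-z] [A-Z][a-z][a-z] / { ts = $0 }
     /name:/ { dev = $2 }
     /writing at/ { sub(/^[ \t]+/, ""); print ts " | " dev " " $0 }' /tmp/monitor.log

That produces one line per sample per writing device, with the timestamp and the device message side by side – easy enough to feed into a spreadsheet or a further script.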

Updated checks in nsradmin


A while ago EMC engineering updated the venerable nsradmin utility to include automated checking options, with an initial focus on checks for NetWorker clients. As a NetWorker administrator I would have crawled over hot coals for this functionality, and as an integrator I found myself writing Perl scripts from company to company to do similar checks.

As of NetWorker 8.2.1.6, the checks have been expanded a little, with a few new enhancements:

  • Client check now performs Client/Server time synchronisation checking
  • Client check now does a ping test against configured Data Domains
  • Storage node check has been added.

I currently don’t have a Data Domain in my lab, but I’ll show you what the time synchronisation check looks like at least. As always, for client checks in nsradmin, the command sequence is:

# nsradmin -C query

Where query is a valid NetWorker query targeting clients. In my case in my lab, I used:

# nsradmin -C "NSR client"

The output from this included:

Client Check - Time synchronisation

In the example output, I’ve highlighted the new time synchronisation check. With this included, the nsradmin client check utility expands yet again in usefulness.

Moving on to the Storage Node option, we can now have NetWorker verify connectivity and list the devices associated with each storage node. As you might imagine, the command for this is:

# nsradmin -C "NSR storage node"

The output in my lab resembles the following:

nsradmin - NSR storage node
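Both checks lend themselves to being scheduled and reviewed rather than run ad hoc. A simple hedged sketch (the mail recipient and log path are placeholders) might be:

#!/bin/bash
# Sketch: run the client and storage node checks and mail the combined output.
{
    /bin/date
    nsradmin -C "NSR client"
    nsradmin -C "NSR storage node"
} > /tmp/nsradmin_checks.log 2>&1
mailx -s "NetWorker configuration checks" backup-admins@example.com < /tmp/nsradmin_checks.log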

As I mentioned at the start – these have been added into NetWorker 8.2.1.6. If you’re running an earlier release, service pack or cumulative release than that exact version, you won’t find the new features in your installation.

Melbourne DPUG and VMware Data Protection


Recently a colleague and I initiated the Melbourne Data Protection User Group (DPUG).

Pug in a pile of backup tapes

If you’re interested in joining and participating and based in Melbourne, you can find details for the user group over at Meetup.

Our first presentation was on Wednesday 9 September, and EMC Melbourne were kind enough to provide the office space for the session. That being said, DPUG is not about EMC products – it’s designed to be a vendor neutral community forum to discuss techniques, strategies and best practices relating to data protection.

Starting DPUG was a healthy reminder that data protection is an overloaded term in the IT industry. To those of us who work within data storage and more broadly, IT infrastructure, data protection covers concepts such as backup and recovery, continuous availability, continuous data protection, replication, snapshots and so on. For people who work at the application layer or communication layer though, data protection is almost invariably interpreted to be something like security, data privacy or intrusion detection/threat mitigation. Data protection is a term we share with other areas of the industry. In the end it’s all data protection, but it has two very different areas of focus.

Our first session was about VMware Data Protection. We’re now seeing a very high percentage of virtualisation within most businesses – it’s not uncommon to see 80% or 90% virtualisation now, and many companies are continuing to pursue a strategy of achieving 100% system and infrastructure virtualisation.

In the VMware Data Protection presentation I walked the audience through a history of how the industry overall has protected virtual machines since their inception in the midrange space. First, we started with treating virtual machines like regular hosts – installing agents on each virtual machine and backing it up as if it were no different from a physical host. That provides a high degree of granularity and flexibility, but as we know, virtualisation is about cooperative resource sharing, whereas traditional backups are about minimising the time it takes to get data from the client into the protection storage. There’s not a lot of compatibility between “cooperative resource sharing” and “minimising the time it takes to get data from the client…”, and a poorly designed backup strategy using in-guest backup agents can bring virtual infrastructure to a screaming halt – even today.

The next attempt to provide a comprehensive solution for backing up virtual machines saw businesses installing backup agent software on the hypervisors, and writing custom scripts to snapshot virtual machines prior to copying them to protection storage. This was usually error prone and when you stop to think about how virtual machines are usually just very big files, it meant that a single change within a virtual machine would trigger a new full backup every time. Once technology such as VMotion became available these techniques became difficult if not impossible to maintain – you could not really predict where a virtual machine would be for backups at any given time. What’s more, hypervisors are a bit like NAS appliances – they’re designed to do one thing really well, and you shouldn’t be trying to install third party software on them.

The solution was an API based approach, of course. While different in practice, you can equate the API approach of VMware backups to the NDMP approach of NAS. The virtualisation system provides an integration point for backup software to use, and leveraging that, backup products are able to streamline the data protection process with image level backups and file level recoveries from those image level backups.

This is something that NetWorker for instance has been doing for some time – most recently with VBA. VBA is something I’ve covered a few times over the last twelve months (Current state of Virtual Machine Backups in NetWorker, NetWorker 8.2 and VBA Instant-Access, and Testing and Debugging an Emergency Restore, for instance).

VMware offers its own version of VBA as well so that businesses (particularly smaller ones) can still protect their environments. It used to be split into VDP and VDP/A, but as of vSphere 6 Essentials, those options have been combined into a single (free) VDP. VDP can’t do everything VBA can do – for example, VDP can’t:

  • Perform instant-access to a virtual machine (powering on from Data Domain storage)
  • Perform tape-out
  • Write to storage other than Data Domain or internal storage

As a means of demonstrating some of the advantages of virtual machine image level backups though, VDP is useful, and that’s what I used in the DPUG session earlier this month. And now, after taking the plunge and investing in some screen recording software, I’ve made three of the demos from the DPUG session available for viewing. If you’re using VBA already you’ll be familiar with all of these. However, if you’ve not yet taken the plunge in utilising VBA for your backup environment, check them out – while the demos show the VMware Data Protection Appliance (VDP) in use, they’re equally applicable and in fact it’s the same process for a VBA install in each situation.

Creating and executing a protection policy:

Executing an image level recovery that makes use of changed block tracking:

Executing a file level recovery from an image level backup:

Don’t forget, if you’re in Melbourne and want to participate in DPUG, you’re more than welcome – regardless of whether you use EMC products or not. We want this to be an open group and look forward to seeing a broad spectrum of regular companies, integrators and vendors participating!

Also, if you’re interested in seeing screencasts for NetWorker related topics on this blog, let me know.

LTO-7 and the Streaming Conundrum


The LTO consortium has announced:

The LTO Ultrium format generation 7 specifications are now available for licensing by storage mechanism and media manufacturers.

LTO-7 will feature tape capacities of up to 15TB (compressed) and streaming speeds of up to 750MB/s (compressed). LTO is now working off a 2.5:1 compression ratio – so those numbers are (uncompressed) 6TB and 300MB/s.

Don’t get me wrong – I’m not going to launch into a tape is dead article here. Clearly it’s not dead.

That rocket car is impressive. It hit 1,033KM/h – Mach 9.4* – over a 16KM track. There’s no denying it’s fast. There’s also no denying that you couldn’t just grab it and use it to commute to work. And if you could commute to work using it but there happened to be a small pebble on the track, what would happen?

I do look at LTO increasingly and find myself asking … how relevant is it for average businesses? It’s fast and it has high capacity – and this is increasing with the LTO-7 format. Like the rocket car above though, it’s impressive as long as you only want to go in one direction and you don’t hit any bumps.

Back when Tape Was King, a new tape format meant a general rush on storage refresh towards the new tape technology in order to get optimum speed and capacity for a hungry backup environment. And backup environments are still hungry for capacity and speed, but they’re also hungry for flexibility, something that’s not as well provided by tape. Except in very particular conditions, tape is no longer seen as the optimum first landing zone for backup data – and increasingly, it’s not being seen as the ideal secondary landing zone either. More and more businesses are designing backup strategies around minimising the amount of tape they use in their environment. It’s not in any way unusual now to see backup processes designed to keep at least all of the normal daily/weekly cycles on disk (particularly if it’s deduplication storage) and push only the long-term retention backups out to tape. (Tape is even being edged out there for many businesses, but I’ll leave that as a topic for another time.)

Much of the evolution we’ve seen in backup and recovery functionality has come from developing features around high speed random access of backups. Deduplication, highly granular recoveries, mounting from the backup system and even powering on virtual machines from backup storage all require one thing in common: disk. As we’ve come to expect that functionality in data protection products, the utility of tape for most organisations has likewise decreased significantly. Recoverability and even access-without-recovery has become a far more crucial consideration in a data protection environment than the amount of data you can fit onto a cartridge.

I have no doubt LTO-7 will win high-praise from many. But like the high speed rocket car video above, I don’t think it’s your “daily commute” data protection vehicle. It clearly has purpose, it clearly has backing, and it clearly has utility. As long as you need to go in a very straight line, don’t make any changes in direction and don’t attempt to change your speed too much.

As always, plan your data protection environment around the entire end-to-end data protection process, and the utility of that protected data.


* Oops, Mach 0.94. Thanks, Tony. (That’ll teach me to blindly copy the details from the original video description.)

NetWorker 9: The future of backup


Introduction

When NetWorker 8 was released I said at the time it represented the biggest consolidated set of changes to NetWorker in all the years I’d been working with it. It represented a radical overhaul and established the groundwork for further significant changes with NetWorker 8.1 and NetWorker 8.2.

NetWorker 9 – Leaping Into the Future

NetWorker 9 is not a similarly big set of changes: it’s a bigger set of changes.

There’s a good reason why it’s NetWorker 9. This year we celebrated the 25th birthday of NetWorker, and NetWorker has done an excellent job protecting data in those 25 years, but with the changing datacentre and changing IT environment, it was time for NetWorker to change again.

NetWorker 9 NMC Splash Screen

 

NetWorker 9 NMC Login

The changes are more than cosmetic, of course. (Much, much more.) A while ago I posted of the need for an evolved, modern approach to data protection activities, that being the orientation of said policies and processes around service catalogues. This is something I’ve advocated for years, but it was also something I deliberately hinted at with a view towards what was coming with NetWorker 9.

The way in which we’ve configured backups in NetWorker for the last couple of decades has been much the same. When I started using NetWorker in 1996, it was by configuring groups, retention policies, schedules and clients. That’s changing.

A bright new world – Policies

NetWorker 9 represents a move towards a simpler, more containerised approach to configuration, with an emphasis on the service catalogue approach – and here’s what it looks like:

NetWorker 9 Configuration Engine

The changes in NetWorker 9 are sweeping – classic configuration components such as savegroups, scheduled staging and scheduled cloning are being replaced with a new policy engine that borrows much from the virtual machine protection engine introduced in NetWorker 8.1. This simultaneously makes it easier and faster to maintain data protection configurations, and develop more complex data protection configurations for the modern business. The policy engine is a containerised configuration system that makes it straightforward to identify and modify components of NetWorker configuration, and even have parts of the configuration dynamically adjust as required.

The core configuration process now in NetWorker 9 consists of:

  • A policy, which is a container for workflows
  • One or more workflows, which have:
    • A set of actions and
    • A list of data sources to run those actions against

If you’re upgrading NetWorker from an earlier version, your existing NetWorker configuration will be migrated for you into the new policy engine configuration. I’ll get to that in a little while. Before that though, we need to talk more about the policy engine.

Regardless of whether you’re setting up a brand new NetWorker server or upgrading an existing NetWorker server, you’ll get 5 default policies created for you:

  • Server Protection
  • Bronze
  • Silver
  • Gold
  • Platinum

Each of these policies does distinctly different things. (If you’re migrating, you’ll get some additional policies. More on that in a while.)

NetWorker 9 Protection Window

In this case, the server protection policy consists of two workflows:

  • NMC server backup – Performs a backup of the NetWorker management console database
  • Server backup – Performs a bootstrap backup and a media database expiration

You can see straight away that’s two entirely different things being done within the same policy. In the world of NetWorker 8.x and lower, each Group was effectively an atomic component that did only one particular thing. With policies, you’ve got a container that encapsulates multiple logically similar activities. For instance, let’s look at the difference between the default Bronze policy and the default Silver policy:

NetWorker 9 Bronze Policy

The Bronze policy has two workflows – one for Applications, and one for Filesystem backups. Each workflow does a backup to the Default pool (which of course you can change), and that’s it. By comparison, the Silver policy looks like the following:

NetWorker 9 Silver Policy

You can see the difference immediately – a Silver policy is about backing up, then cloning. The policy engine is geared very much towards a service catalogue design – setup a small number of policies with the required workflows and consolidate your configuration accordingly.
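The policy engine also comes with command line support via the new nsrpolicy utility. As a minimal, hedged sketch only – the policy and workflow names below are assumptions based on the defaults described above, so check your own configuration first – kicking off a workflow on demand looks something like:

nsrpolicy start -p Bronze -w Filesystem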

Oh – and here’s a cool thing about the visual policy engine: you can right-click within the visualisation of the policy and change settings, such as:

NetWorker 9 Right Clicking in Visual Policy

The policy engine is not a like-for-like translation from older versions of NetWorker configuration (though your existing configuration is migrated). For instance, here’s an “Emerald” policy I created on my lab server:

Sample policy with advanced cloning

That policy backs up to the Daily pool and then does something new for NetWorker – clones simultaneously to two different pools – “Site-A Clone” and “Site-B Clone”. There’s also something different about the selection process for what gets backed up. The group here is…

…wait, I need to explain Groups in NetWorker 9. Don’t think they’re like the old NetWorker groups. A group in NetWorker 9 is simply a selection of data sources. That could be a collection of clients, a collection of virtual machines, a collection of NAS systems or a collection of savesets (for cloning/staging). That’s it though: groups don’t start backups, control cloning, etc.

…the group here is a dynamic group. This is a new option for traditional clients. Rather than being an explicit list of clients, a dynamic group is assembled at the time the workflow is executed, based on a list of tags defined in the group list. Any client with a matching tag is automatically included in the backup process. This allows hosts to be moved easily between different policies and workflows just by changing the tags associated with them. (Alternatively, the group might be configured to automatically select every client.)

NetWorker 9 Dynamic Groups

There’s a lot more to the policy engine than just what I’ve covered above, but there’s also a lot more I need to cover, so I’ll stop for now and come back to the new policy engine in more detail in a future blog post.

Policy Migration

Actually, there’s one other thing I’ll mention about policies before I continue, and that’s the policy migration process. When you upgrade a NetWorker server to NetWorker 9, your existing configuration is migrated (and as you might imagine this migration process is something that’s received a lot of attention and testing). For example, consider a “classic” NetWorker environment that consists of a raft of groups. On migration, each group is converted into a workflow of the same name and placed under a new policy called Backup. So a basic group list of say, “Daily Dev Servers”, “Daily Filesystem” and “Monthly Filesystem” will get converted accordingly. Here’s what the group list looks like under v8 (with the default Default group):

NetWorker 8 Group List

Under version 9, this becomes the following policy and workflows:

NetWorker 9 Converted Policy

The workflow visualisation for the groups above converted into policy format is:

NetWorker 9 Converted Policy Workflow Visualisation

(By the way, that “Monthly Filesystem” workflow cloning to the “Default Clone” pool was just a lazy mistake on my part while setting up the test server – not a conversion error.)

I know lots of people tested some fairly hairy configuration migrations. If I recall correctly, the biggest configuration I tested had over 1000 clients defined and around 300 groups, schedules, etc., associated with those clients. I also used a whole bunch of shortcuts and tricks in schedules, and they converted successfully.

The back-end changes

I’ll undoubtedly do some additional blog articles about the NetWorker 9 policy engine, but it’s time to move on to other topics and other changes within NetWorker. I’ll start with some back-end changes to the environment.

Media database

The “WISS” database format has been around for as long as I can recall with NetWorker. It’s served NetWorker well, but it’s also had some limitations. As of version 9, the NetWorker media database format is now SQLite, which gives NetWorker a big boost for performance and parallelisation of media activities. As per the policy engine, this migration happens automatically for you as part of the upgrade process. (Depending on the size of your media database this may take a little while to complete, but the media database is usually fairly small for most organisations.)

NetWorker Management Console (NMC) Database

Previous versions of NetWorker have used the Sybase embedded SQLAnywhere database for NMC. NetWorker version 9 switches the NMC database to PostgreSQL. If you’re wanting to keep your existing NMC database, you’ll need to take some pre-upgrade steps to export the Sybase embedded database content into a format that can be imported into the PostgreSQL database. Be sure to read the upgrade documentation – but you were going to do that anyway, right?

License Server

Other than the options around traditional vs NetWorker capacity vs DPS capacity, NetWorker licensing has remained mostly the same for the entire 19 years I’ve been dealing with it. There was a Legato License Manager introduced some time ago, but it had mainly been pushed as a means of centralising management of traditional licensing across multiple datazones. Since the capacity formats aren’t so bothered about datazone counts, LLM usage has fallen away.

With a lot of customers deploying multiple EMC products and EMC moving towards transformative enterprise licensing models, a move to a new licensing service that can handle licensing for multiple products makes sense. From a day to day basis, the licensing server won’t really change how you interact with NetWorker, but you’ll want to deal with your sales/pre-sales team or your integrator (depending on which way you procure NetWorker licenses) in order to prep for the license changes. It’s not a change to functionality of traditional vs capacity licenses, and it doesn’t signal a move away from traditional licenses either, but it is a much needed change.

Authentication System

NetWorker has by and large used OS provided user-authentication for authorisation. That might be localised on a per-system basis or it might leverage Active Directory/etc. This however left somewhat of a split between authorisation supported by NetWorker Management Console and authorisation supported from the command line. The new authentication system is effectively a single sign-on approach providing integrated authentication between NMC related activities and command line activities.

Restricted Data Zones

Restricted datazones get a few tweaks with NetWorker 9, too. I’ve had very little direct cause to use RDZs myself, so I’ll let the release notes speak for themselves on this front:

  • You can now associate an RDZ resource to an individual resource (for example, to a client, protection policy, protection group, and so on) from the resource itself. As a result, RDZ resources can no longer effect resource associations directly.

  • Non-default resources that were previously associated with the global zone, and therefore unusable by an RDZ, are now shared resources that can be used by an RDZ – although these resources cannot be modified by restricted administrators.

If you’re using RDZs in your environment, be sure to understand the implications of the above changes as part of the upgrade process.

Scaling

With a raft of under-the-hood changes and enhancements, NetWorker servers – already highly scaleable – become even more scaleable. If your NetWorker environment has been getting large enough that you’ve considered deploying additional datazones, now is the time to talk to your local EMC teams to see whether you still need to go down that path. (Chances are you don’t.)

NetWorker Server Platform

There are actually very few environments left where the NetWorker server itself runs on what I’d refer to as “classic” Unix systems – i.e., Solaris, HPUX or AIX. As of NetWorker 9, the NetWorker server processes (and similarly, NMC processes) will now run only on Windows 64-bit or Linux 64-bit systems. This allows a concentration of development, leveraging the substantially (I’d say massively) reduced use of these platforms for better development efficiencies. However, NetWorker client support is still extremely healthy and those platforms are also still fully supported as storage nodes.

From a migration perspective, this is actually relatively easy to handle. EMC for some time has supported cross platform migration, wherein the NetWorker media database, configuration and index (i.e., the NetWorker server) is moved from say, Linux to Windows, Solaris to Linux, Solaris to Windows, etc. If you are one of those sites still using the NetWorker server services on Solaris, HPUX or AIX, you can engage cross platform migration services and transfer across to Windows or Linux. To keep things simple (I’ve done this dozens of times myself over the years), consider even keeping the old server around, renaming it and turning it into a storage node so you don’t really have to change any device connectivity. Then, elevate the backup server to a “director only” mode where it’s not actually doing any client backup itself. All up, this sort of transition can be seamlessly achieved in a very short period of time. In short: it may be a small interruption and change to your processes, but having executed it many times myself in the past, I can honestly say it’s a very small change in the grand scheme of things, and very manageable.

In summary, the options along this front if you’re using a non-Windows/non-Linux NetWorker server are:

  • Do a platform migration of your NetWorker server to Windows or Linux using your current NetWorker version, then upgrade to the new version
  • Stand up a new NetWorker datazone on Windows or Linux and retain the existing one for legacy recoveries, migrating clients across

I’m actually a big fan of the former rather than the latter – I really have done enough platform migrations to know they work well and they allow you to retain everything you’ve been doing. (IMHO the only reason to not do a platform migration is if you have a very short retention period for all of your backups and you want to start with a brand new configuration approach.)

(Cross platform migrations do have to be done by an authorised party – if you’re not sure who near you can do cross platform migrations, reach out to your local EMC team and find out.)

One more thing: with the additional services now running on a NetWorker server, you could need more RAM/CPU in your server. Check out the release notes for some details on this front. Environments that have been sized with room for spare likely won’t need to worry about this at all – but if you’ve got an environment where you’ve got an older piece of hardware running as your NetWorker server, you might need to increase its performance characteristics a little.

[Clarifying point: I’m only talking about the NetWorker server platform. Traditional Unix systems remain fully supported for storage nodes and clients.]

Cloning

NetWorker gets a performance and optimisation boost with cloning. Cloning has previously been a reasonably isolated process compared to regular save or recovery operations. With NetWorker 9, cloning is now a more integrated function, leveraging the in-place recovery technology implemented in NetWorker 8.2 to speed up cloning of synthetic backups.

This has some advantages relating to parallelising clones and limiting the need for additional nsrmmd processes to handle the cloning operation, and introduces scope for exciting changes in future versions of NetWorker, too.

With continuing advances in how you can configure and manage cloning from within NetWorker policies, manual command line driven cloning is becoming less necessary, but if you do still use it you’ll notice some difference in the output. For instance:

[root@sirius ~]# mminfo -q "name=/usr,savetime>=24 hours ago" -r ssid
4278951844
[root@sirius ~]# nsrclone -b "Site-A Clone" -S 4278951844
140988:nsrclone: launching backend job on host sirius.turbamentis.int
140990:nsrclone: Backend started: job Id(160004).
85401:nsrrecopy: Input client or saveset is NULL, information not updated in jobdb
09/30/15 18:48:04.652904 Clone pool size used:4
09/30/15 18:48:04.756405 Init Clone PARAMS: Network constant(73400320) Saveset computation overhead(2000000 microsec) Threshold(600000000 microsec) MIN-Threads(16) MAX-Threads(32)
09/30/15 18:48:04.757495 Adjust Clone param: Total overhead(50541397 microsec) Threshold(12635349 microsec) MIN-threads(1) MAX-Threads(4)
09/30/15 18:48:04.757523 Add New saveset group(0x0x3fe5db0): Group overhead(50541397 microsec) Num ss(1)
129290:nsrrecopy: Successfully established direct file retrieve session for save-set ID '4278951844' with adv_file volume 'Daily.001'.
09/30/15 18:49:30.765647 nsrrecopy exiting
140991:nsrclone: Backend exited: job Id(160004).
 [ORIGINAL REQUESTED SAVESETS]
4278951844;
 [CLONE SUCCESS SAVESETS]
4278951844/1443603606;

Note that while the command line output is a little different, the command line options remain the same, so your scripts can continue to work without change there. However, with enhanced support for concurrent cloning operations you'll likely be able to speed up those scripts … or replace them entirely with new policies.
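If you do keep scripting for now, the long-standing pattern of feeding mminfo output straight into nsrclone still applies – something like the following, where the query and pool name are just examples from my lab rather than a recommendation:

# mminfo -q "pool=Daily,savetime>=24 hours ago" -r ssid | nsrclone -b "Site-A Clone" -S -

The trailing "-S -" tells nsrclone to read the save set IDs from standard input rather than from the command line.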

Performance tuners win too

The performance tuning and optimisation guide has been getting more detailed with each recent version, and the one that accompanies NetWorker 9 is no exception. For example, there's an entirely new section on TCP window size and network latency considerations, with a bunch of examples (and graphs) relating to the impact of latency on backup and cloning operations of varying sizes based on filesystem density. If you're someone who likes to see what tuning and adjustment options there are in NetWorker, you'll definitely want to peruse the new Performance Tuning/Optimisation guide, available with the rest of the reference documentation.

(On that front, NDMP has now been broken out into its own document: the NDMP User Guide. Keep an eye on it if you’re working with NAS systems.)

Additional Features

Block Based Backup (BBB) for Linux

Several Linux operating systems and filesystems now get the option of performing block based backups. This can significantly speed up the backup of large/dense filesystems – even more so than parallel save streams – by actually bypassing the filesystem entirely. It’s been available in Windows backups for a while now, but it’s hopped over the fence to Linux as well. Like the Windows variant, BBB doesn’t require image level recovery – you can do file level recovery from block based backups. If you’ve got really dense filesystems (I’m looking at large scale IMAP servers as a classic example), BBB could increase your backup performance by up to an order of magnitude.

Parallel Save Streams

Parallel Save Streams certainly aren’t forgotten about in NetWorker 9. There are now options to go beyond 4 parallel save streams per saveset for PSS enabled clients, and we’ve seen the introduction of automatic stream reclaiming, which will dynamically increase the number of active streams for a saveset already running in PSS mode to maximise the utilisation of client parallelism settings. (That’s a mouthful. The short: PSS is more intelligent and more reactive to fluctuations in used parallelism on clients.)

ProtectPoint

ProtectPoint is a pretty exciting new technology being rolled out by EMC across its storage arrays, integrating with Data Domain for the back-end storage. To understand what ProtectPoint does, consider a situation where you've got, say, a 100TB Oracle database sitting on a VMAX3 system, and you need to back it up as fast as possible with as little impact on the actual database server as possible. In conventional agent-based backups, it doesn't matter what tricks and techniques you use to mitigate the amount of data flowing from the Oracle server to the backup environment: the Oracle server still has to read the data from the storage system. ProtectPoint is an application-aware and application-integrated system that allows you to seamlessly have the storage array and the Data Domain handle pretty much the entire backup, with the data transfer going directly from the storage array to the Data Domain. Suddenly that entire database-server read load associated with a conventional backup disappears.

NetWorker v9 integrates management of ProtectPoint policies in a very similar way to how NetWorker v8.2 introduced highly advanced NAS snapshot service integration into the data protection management. This further grows NetWorker’s capabilities in orchestrating the overall data protection process in your environment.

(There’s a good overview demo of ProtectPoint over at YouTube.)

NVE

Some people want to be able to stand up and completely control a NetWorker environment themselves, and others want to be able to deploy an appliance, answer a couple of questions, and have a fully functioning backup environment ready for use. NetWorker Virtual Edition (NVE) addresses the needs and desires of the latter. For service providers or businesses deploying remote office protection solutions, NVE will be a boon – and it won’t eat into any operating system licensing costs, as the OS (Linux) is bundled with the virtual machine template file.

Base vs Extended Client Installers

For Unix systems, NetWorker now splits out the client package into two separate installers – the base version and the extended version – lgtoclnt and lgtoxtdclnt respectively. You install the base client on clients that need to get fairly standard filesystem backups. It doesn’t include binaries like mminfo, nsrwatch or nsradmin – they’re now in the extended package. This allows you to keep regular client installs streamlined – particularly useful if you’re a service provider or dealing with larger environments.
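If it helps to visualise the difference, a bare-bones Linux client install now looks something like this – the package file names are placeholders, so use the actual versioned file names from the media you download:

# rpm -ivh lgtoclnt-*.x86_64.rpm

…and only on the hosts where you want the reporting/administration binaries (mminfo, nsradmin, nsrwatch and friends) do you add:

# rpm -ivh lgtoxtdclnt-*.x86_64.rpm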

VBA

There’s been a variety of changes made to the Virtual Backup Appliance (introduced in NetWorker 8.1), but the two I want to particularly single out are the two that users have mentioned most to me over the last 18 months or so:

  • Flash is no longer required for the File Level Recovery (FLR) web interface
  • There’s a command line interface for FLR

If you’ve been leery about using VBA for either of the above reasons, it’s time to jump on the bandwagon and see just how useful it is. Note that in order to achieve command line FLR you’ll need to install the basic NetWorker client package on the relevant hosts – but you need to get a binary from somewhere, so that makes sense.

Module Enhancements

Both the NetWorker Module for Microsoft Applications (NMM) and NetWorker Module for Databases and Applications (NMDA) have received a bunch of updates, including (but not limited to):

  • NMM:
    • Simpler use of VSS.
    • Block based support for HyperV and Exchange – yes, and Exchange. (This speeds up both types of backups considerably.)
    • Federated backups for SharePoint, allowing non-primary databases to be leveraged for the backup process.
    • I love the configuration checker – it makes getting NMM up and running with minimum effort so much easier. It’s been further enhanced in NetWorker 9 to grow its usefulness even more.
    • HyperV support for Partial VSS writer – previously, if you had a single VM fail to back up under HyperV, the backup group running the process would register as a failure. Now the backups will continue and only the VM that fails to back up will be declared a failure. This aligns HyperV backups much more closely to traditional filesystem or VMware style backups.
    • Improved support for Federated backups of HyperV SMB 3 clusters.
    • File Level Recovery GUI for HyperV virtual machine backups.
    • Full integration of policy support for NMM.
  • NMDA:
    • Support for DDBoost over Fibre-Channel for AIX.
    • Full integration of policy support for NMDA.
    • Support for log-only backups for Lotus Notes systems.
    • NetWorker Snapshot Manager support for features like ProtectPoint.
    • Various DB2 enhancements/improvements.
    • Oracle RAC discovery in the NMC configuration wizards.
    • Optional use of a CONFIG_FILE parameter for RMAN scripts so you can put all the NMDA related customisations for RMAN backups into a single file (or small number of files) and keep that file/those files updated rather than having to make changes to individual RMAN scripts.

Policies, Redux

Before I wrap up: just one more thing. With the transition to a policy configuration engine, the nsrpolicy command previously introduced in NetWorker 8.1 to support Virtual Machine Protection Policies has been extensively enhanced to be able to handle all aspects of policy creation, configuration adjustment and policy/workflow execution. This does mean that if you’ve previously used nsradmin or savegrp to handle configuration/group execution processes, you’ll have to adjust some of your scripts accordingly. (It also means I’ll have to work on a new version of the Turbocharged NetWorker Administration Guide.)
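To give a hedged example of the sort of adjustment involved – consult the NetWorker 9 Command Reference Guide for the exact syntax, and note the policy and workflow names here are simply the ones from the migration example earlier – where a script might previously have kicked off a group via savegrp, under NetWorker 9 the equivalent is to start the corresponding workflow within its policy:

# nsrpolicy start -p "Backup" -w "Daily Filesystem"

nsrpolicy also has sub-commands covering policy, workflow and action creation and listing, which is where the bulk of any script rework is likely to land.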

Wrapping Up

I wasn’t joking at the start when I said NetWorker 9 represents the biggest set of changes I’ve ever seen in my 19 years of using NetWorker. What I will say is that these are necessary changes to prepare NetWorker for the rapidly changing datacentre. (Or even the rapidly changing datacenter if you’re so minded.)

This upgrade will require very careful review of the release notes and changed functionality, as well as potentially revisiting any automation scripts you’ve done in the past. (But you can do it.) If you’ve got a heavily scripted environment, my advice is to run up a test NetWorker 9 server and review your scripts against the changes, first evaluating whether you actually need to continue using those scripts, and then if you do, adjusting them accordingly. EMC has also prepared some video training for NetWorker 9 which I’d advise looking into (and equally I’d suggest leveraging your local EMC partner or EMC resources for the upgrade process).

It’s also an excellent time to consider revisiting your overall backup configuration and look for optimisations you can achieve based on the new policy engine and the service-catalogue approach. As I’ve been saying to my colleagues, this is the perfect opportunity to introduce policies that align to service catalogues that more precisely define and meet business requirements. If you’re not ready to do it from day zero, that’s OK – NetWorker will migrate your configuration and you’ll be able to continue to offer your existing backup and recovery services. But if you find the time to re-evaluate your configuration and reset it to a service catalogue approach, you can migrate yourself from being the “backup admin” to being the “data protection architect” within your organisation.

This is a big set of changes in NetWorker, but it’s also very much an exciting and energising set of changes, too.

As you might expect, this won’t be my only blog post on NetWorker 9 – it’s equally an energising time for me and I’m looking forward to diving into a variety of topics in more detail and providing some screen casts and videos of changes, upgrades and improvements.

(And don’t forget to wear your sunglasses: the future’s looking bright.)

Linux Block Based Backups in NetWorker 9


As mentioned in my introductory post about it, NetWorker 9 introduces the option to perform Block Based Backups (BBB) for Linux systems. (This was introduced in NetWorker 8 for Windows, and has actually had its functionality extended for Windows in v9 as well, with the option to now perform BBB for Hyper-V and Exchange systems.)

BBB is a highly efficient mechanism for backing up without worrying about the cost of walking the filesystem. Years ago I showed just how massive a detrimental impact filesystem density can have on the performance of a backup. While the backup product is often blamed for being "slow", the fault sits completely with operating system and filesystem vendors for not having produced structures that scale sufficiently.

BBB gets us past that problem by side-stepping the filesystem and reading directly from the underlying disk or LUN. Instead of walking files, we just have to traverse the blocks. In cases where filesystems are really dense, the cost of walking the filesystem can increase the run-time of the backup by an order of magnitude or more. Taking that out of the picture allows businesses to protect these filesystems much faster than via conventional means.

Since BBB needs to integrate at a reasonably low level within a system structure in order to successfully operate, NetWorker currently supports only the following systems:

  • CentOS 7
  • RedHat Enterprise Linux v5 and higher
  • SLES Linux 11 SP1 and higher

In all cases, you need to be running LVM2 or Veritas Volume Manager (VxVM), and be using ext3 or ext4 filesystems.
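If you want to quickly sanity-check whether a particular mount point qualifies, standard Linux tooling is all you need – using the /testfs filesystem from the demonstration below as an example (output and volume names will obviously vary):

# df -T /testfs
# lvs | grep testfs

df -T confirms the filesystem type (ext3/ext4), and lvs confirms the underlying volume is managed by LVM2.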

To demonstrate the benefits of BBB in Linux, I've set up a test SLES 11 host and used my genfs2 utility on it to generate a really (nastily) dense filesystem. I actually aborted the utility when I had 1,000,000+ files on the filesystem – consuming just 11GB of space:

genfs2 run

I then configured a client resource and policy/workflow to do a conventional backup of the /testfs filesystem – without any form of performance enhancement. From NetWorker's perspective, this resulted in about 8.5GB of backup, and with 1,178,358 files (and directories) in total it took 36 minutes and 37 seconds to back up. (That's actually not too bad, all things considered – but my lab environment was pretty much quiesced other than the test components.)

Conventional Backup Performance

Next, I switched over to parallel savestreams – which has become more capable in NetWorker 9 given NetWorker will now dynamically rebalance remaining backups all the way through to the end of the backup. (Previously the split was effectively static, meaning you could have just one or two savestreams left running by themselves after others had completed. I’ll cover dynamic parallel savestreams in more detail in a later post.)

With dynamic parallel savestreams in play, the backup time dropped by over ten minutes – a total runtime of 23 minutes and 46 seconds:

Dynamic Parallel Savestream Runtime

The next test, of course, involves enabling BBB for the backup. So long as you’ve met the compatibility requirements, this is just a trivial checkbox selection:

Enabling Block Based Backup

With BBB enabled the workflow executed in just 6 minutes and 48 seconds:

Block Based Backup Performance

That’s a substantially shorter runtime – the backups have dropped from over 36 minutes for a single savestream to under 7 minutes using BBB and bypassing the filesystem. While Dynamic Parallel Savestreams did make a substantial difference (shaving almost a third from the backup time), BBB was the undisputed winner for maximising backup performance.

One final point – if you're doing BBB to Data Domain, NetWorker now automatically executes a synthetic full (using the Data Domain virtual synthetic full functionality) at the end of every incremental BBB backup you perform:

Automatic virtual synthetic full

The advantage of this is that recovery from BBB is trivial – just point your recovery process (either command line, or via NMC) at the date you want to recover from, and you have visibility of the entire filesystem at that time. If you’re wondering what FLR from BBB looks like on Linux, by the way, it’s pretty straight forward. Once you identify the saveset (based on date – remember, it’ll contain everything), you can just fire up the recovery utility and get:

BBB FLR
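For reference, there's nothing exotic about the invocation – it's the standard recover utility, pointed at the right client and browse time. The server, client and date below are purely illustrative values from my lab:

# recover -s sirius.turbamentis.int -c test-sles11 -t "09/30/2015"

Once the BBB save set is selected, the backup is presented as a browseable mount point – which is what the second terminal session shown below is navigating.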

Logging in using another terminal session, it’s just a simple case of browsing to the directory indicated above and copying the files/data you want:

BBB FLR directory listing

And there you have it. If you’ve got highly dense Linux filesystems, you might want to give serious thought towards upgrading to NetWorker 9 so you can significantly increase the performance of their backup. NetWorker + Linux + BBB is a winning combination.

Multihost consistent backups in NetWorker 9


NetWorker 9 introduces a new action that can be incorporated into workflows, Check Connectivity. You can use this prior to a backup action to confirm that you have connectivity to a host before starting the backup. Now, you may think this is a little odd, since NetWorker effectively checks connectivity as part of the backup process, but that’s if you’re looking at Check Connectivity on a per-host basis. Used optimally, Check Connectivity allows you to easily streamline the process of confirming that all hosts are available before starting the backup.

This option is important when we consider multi-host applications and services within environments where it’s actually deemed critical that the backup either run for everything or nothing. That way you can’t (in theory) capture logically inconsistent backups of the environment – for example, getting a backup of an application server but not the database that runs in conjunction with it.

In the example policy below I’ve created a simple workflow that does the following:

  • Checks client connectivity
  • If that’s successful:
    • Executes a backup of the hosts in question to the AFTD_Backup pool
    • Clones those backups to the AFTD_Clone pool

Multihost Workflow and Policy

I’ll step through the check connectivity activity so you can see what it looks like:

Check Connectivity Action Screen 1

Check Connectivity Action Screen 2

This is probably the most important option in the check connectivity action: “Succeed only after all clients succeed” – in other words, the action will fail if any of the clients we want to backup can’t be contacted.

Check Connectivity Action Screen 3

Check Connectivity Action Screen 4

It’s a pretty simple action, as you can see.

Zooming in a little on the workflow visualisation, you'll see it in more detail here:

Multihost Workflow Visualisation

By the way, I’m loving the option to edit components of the workflow and actions from the visualisation, i.e.:

Multihost Workflow Visualisation Pool Edit

In order to test and demonstrate the check connectivity action, I configured 6 backup clients:

  • test01
  • test02
  • test03
  • test04
  • test05
  • test06

On the first test, I made sure NetWorker was running on all 6 clients, and the backup/clone actions were permitted to execute after a successful connectivity test:

Multihost Workflow Executing Successfully

Now, after that finished, I shut down the NetWorker services on one of the clients, test06, to see how this would impact the check connectivity action:

Stopping NetWorker on a Client

With NetWorker stopped, the workflow failed as a result of the connectivity check failing for one of the hosts. The high level failure looked like this:

Multihost Workflow Failure

Double-clicking on the check connectivity action in the Monitoring view of NMC showed me the following:

Check Connectivity Error Dialog

To see the messages in more detail I just copied and pasted them into Notepad, which revealed the full details of the connectivity testing:

Multihost Workflow Check Connectivity Results

And there you have it. For sure, I've done this sort of multi-host connectivity testing in the past using NetWorker 8 and 7 (actually, even using NetWorker 6), but it's always required nesting savegroups, where the parent savegroup executes a pre-command to check, via rpcinfo, the availability of each host in the child savegroup before using nsradmin to invoke the child savegroup. It's a somewhat messy approach and requires executing at least some form of backup in the parent savegroup (otherwise NetWorker declares the parent group a failure). The new functionality is simple, straightforward and easily incorporated into a workflow.
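For anyone who never had the pleasure, a stripped-down sketch of that legacy style of pre-command looks something like the following – the host and group names are purely illustrative, and this is a gross simplification of what the real scripts tended to grow into:

#!/bin/sh
for host in test01 test02 test03 test04 test05 test06
do
    rpcinfo -p $host > /dev/null 2>&1 || exit 1
done
savegrp "Multihost Backup" &

(Whether the child group is then invoked via savegrp directly, as above, or by flicking its autostart attribute with nsradmin, is largely a matter of taste.) Compare that – plus the parent/child group juggling around it – with simply dropping a Check Connectivity action at the front of a workflow.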

If you have a requirement in your environment to ensure that either all clients in a group back up or none do, this is an excellent reason to upgrade to NetWorker 9. If you're already on NetWorker 9, keep an eye out for where you can incorporate this into your policies and workflows.


NetWorker 8.2 SP2 is here


While NetWorker 9 went DA at the end of September and is seeing healthy uptake around the world, NetWorker 8.2 is still getting updates.

Released a couple of weeks ago, NetWorker 8.2 SP2 includes a slew of changes. While undoubtedly NetWorker 9 will be seeing the majority of new-feature development henceforth, that’s not to say that NetWorker 8.2 won’t get refinements as required and planned.

Road to Recovery

Some of the changes and updates to NetWorker with the 8.2 service pack 2 release include:

  • Support for JRE 1.8/Java 8 – You can now run the NMC interface using Java 8
  • Reduced name resolution checking – NetWorker still requires name resolution (of some form), as you’d hope to see in an enterprise product, but there’s been a refinement to the times NetWorker will perform name resolution. Thus, if your DNS service is particularly slow or experiencing faults, NetWorker should not be as impacted as before*.
  • Automated checks:
    • Peer Information
    • Storage nodes
    • Usergroup hosts
  • Maximum device limit can be increased to 1024. If you’ve got a big datazone that’s running at or near 750 devices, you can now increase the maximum device count to 1024.
  • DDBoost Library upgrade – NetWorker 8.2 SP2 uses libDDBoost 3.0.3.0.

It’s the automated checks I want to spend a little time on. As a NetWorker admin or implementation consultant I would have killed for these sorts of tests; and indeed, I even wrote some scripts which did some similar things. If you’ve not used them before, you should have a look at some of the tests previously added (here and here).

The new options though continue to enhance that automated checking and will be a boon for any NetWorker administrator.

Storage Node Checking

The first new option I’d like to show you is the option to automatically check the status of all the storage nodes in your environment. This can be performed by executing the command:

# nsradmin -C "NSR Storage Node"

The output from this will obviously be dependent on your environment. In my lab I get output such as the following:

nsradmin -C "NSR Storage Node"

So for each storage node, you get name resolution checking for the storage node (forward and reverse), as well as a dump of the devices attached to the storage node. If you have an environment that has a mesh of Data Domain device mounts, or dynamic drive sharing/library sharing, this will make getting a very quick overview of devices a piece of cake.

Usergroup Host Checking

As a means of more tightly checking security options, you can now query the hosts referenced in NetWorker Usergroups and determine whether they can be correctly resolved and reached. This command is run as follows:

# nsradmin -C "NSR usergroup"

And the output (for my system) looks a bit like the following:

nsradmin -C "NSR usergroup"

(Note that it will check the referenced hosts for each user specified for that host.)

NSR Peer Information

NetWorker uses peer information – generated certificates – to validate that a host which connects to the NetWorker server saying it’s client-X really can prove it’s client-X. (It’s like the difference between asking someone their name and asking them for their driver’s license.) This prevents hosts from attaching to your network, impersonating one of your servers, and then recovering data for that server.

Sometimes, if things radically change on a client, that peer information may get outdated and require refreshing. (Fixing NSR Peer Information Errors remains the most accessed post on my blog.) Now you can use the automated checking routine to have a NetWorker host check all the peer certificates in its local client resource database.

The command in this case has to reference the client daemon, and so becomes:

# nsradmin -p nsrexec -s clientName -C "NSR peer information"

If you run it from the NetWorker server, for the NetWorker server, the clientName in the above becomes the server name, since you're checking the client resource for the server. Note that if you're feeling particularly old-school (like I do all the time with NetWorker), you can replace nsrexec with 390113 in the above as well. This is actually also a good way of checking client connectivity, since we verify any local client certificates by comparing the locally cached certificate against the certificate stored on each client. (Given I'm running this on a lab server, it's reasonable to see some timeouts and errors.)

For my lab, the results look like the following:

NSR Peer Information Check 2 of 2
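And if the check does flag a stale or mismatched certificate, the remediation is the same as it's always been (per that older post referenced above) – delete the offending NSR peer information entry from the host's nsrexecd database so it can be regenerated on the next connection. Roughly, with the host name being illustrative:

# nsradmin -p nsrexec -s clientName
nsradmin> . type: NSR peer information; name: offendingHost
nsradmin> delete
nsradmin> quit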

If you intend to wait a while (or until it hits GA) before you upgrade to NetWorker 9, I’d heartily recommend upgrading to NetWorker 8.2 SP2 if for no other reason than the incredibly useful automated checks that have been introduced.


* If your DNS/name resolution is improperly configured or faulty, I’d suggest it should be dealt with quickly.

Recovering nsrd.info


Regular visitors will have noticed that nsrd.info has been down quite a lot over the last week.

I’m pleased to say it wasn’t a data loss situation, but it was one of those pointed reminders that just because something is in “the cloud” doesn’t mean it’s continuously available.

Computer crashed

In the interests of transparency, here’s what happened:

  • The nsrd.info domain, it turned out, was due for renewal December 2014.
  • I didn’t get the renewal notification. Ordinarily you’d blame the registrar for that, but I’m inclined to believe the issue sits with Apple Mail. (More of that anon.)
  • My registrar did a complimentary one year renewal for me even without charging me, so nsrd.info got extended until December 2015.
  • I did get a renewal notification this year and I'd even scheduled payment, but in the meantime, because it was approaching 12 months out from the original renewal date, whois queries started showing it as having a pendingDelete status.
  • My hosting service monitors whois and, once the pendingDelete status was flagged, stopped hosting the site. Nothing was deleted, just nothing was served.
  • I went through the process of redeeming the domain on 10 November, but it’s taken this long to get processing done and everything back online.

So here’s what this reinforced for me:

  1. It’s a valuable reminder of uptime vs availability, something I’ve always preached: It’s easy in IT to get obsessed about uptime, but the real challenge is achieving availability. The website being hosted was still up the entire time if I went to the private URL for it, but that didn’t mean anything when it came to availability.
  2. You might be able to put your services in public-cloud-like scenarios, but if you can't point your consumers to your service, you don't have a service.
  3. In an age where we all demand cloud-like agility, if it's something out of the ordinary, domain registrars seemingly move like they're wading through treacle and communicating via Morse code. (It took almost 4 business days, three phone calls and numerous emails to effectively process one domain redemption.)
  4. Don’t rely on Apple’s iCloud/MobileMe/.Mac mail for anything that you need to receive.

I want to dwell on the final point for a bit longer: I use Apple products quite a bit because they suit my work-flows. I’m not into (to use the Australian vernacular), pissing competitions about Apple vs Microsoft or Apple vs Android, or anything vs Apple. I use the products and the tools that work best for my work-flow, and that usually ends up to be Apple products. I have an iPad (Pro, now), an Apple Watch, an iMac, a MacBook Pro and even my work laptop is (for the moment) a MacBook Air.

But I’m done – I’m really done with Apple Mail. I’ve used it for years and I’ve noticed odd scenarios over the years where email I’ve been waiting for hasn’t arrived. You see, Apple do public spam filtering (that’s where you see email hitting your Junk folder), and they do silent spam filtering.  That’s where (for whatever reason), some Apple filter will decide that the email you’ve been sent is very likely to be spam and it gets deleted. It doesn’t get thrown into your Junk folder for you to notice later, it gets erased. Based on the fact I keep all of my auto-filed email for a decade and the fact I can’t find my renewal notification last year, that leaves me pointing the finger for the start of this mess at Apple. Especially when, while trying to sort it out, I had half a dozen emails sent from my registrar’s console to my @me.com account only to have them never arrive. It appears Apple thinks my registrar is (mostly) spam.

My registrar may be slow to process domain redemptions, but they’re not (mostly) spam.

A year or so ago I started the process of migrating my email to my own controlled domain. I didn’t want to rely on Google because their notion of privacy and my notion of privacy are radically different, and I was trying to reduce my reliance on Apple because of their silent erasure habit, but the events of the last week have certainly guaranteed I’ll be completing that process.

And, since ultimately it’s still my fault for having not noticed the issue in the first place (regardless of what notifications I got), I’ve got a dozen or more calendar reminders in place before the next time nsrd.info needs to be renewed.

NetWorker Usage Survey for 2015


It’s that time of the year again where I ask my readers and the NetWorker Community to spend a few minutes of their time answering some questions on how NetWorker is used within their environment. So your forbearance is greatly appreciated.

Each year I pose this survey and run it from 1 December through to 31 January the following year – a straight 2 months. This year there are 23 questions – a slight expansion on previous years, as I want to start gathering a bit of data on a couple of other topics – but I still believe this is something that will take no more than 5-10 minutes of your time. In return, you get to contribute to the NetWorker Usage Report that I generate in February each year. And trust me, now that I work in EMC, I'm well aware of how useful product managers find these reports, so this is well and truly your opportunity to provide real feedback as to how you use the product.

Edit, December 2: Product management reached out to me and asked if I could ask about CloudBoost, so I've added a 24th question to the survey.

While the survey is anonymous, it does offer a prompt for you to provide your email address, and for this survey there'll be a prize: a signed copy of my upcoming book on Data Protection. This is a much broader tome of work than my previous book, Enterprise Systems Backup and Recovery, a Corporate Insurance Policy. At the moment the book is still under development, but it will be published next year, and if you want to be in the running for the book, please add your email address to the survey results. Your email address will be kept confidential. (I'll notify the winner individually and then, once the book is published, contact the winner for shipping details.)

For reference, the questions in this year's survey are:

  • What version(s) of NetWorker are you running for your server?
  • How many datazones are you running?
  • How many clients are you backing up across all your datazones?
  • What operating systems are your NetWorker servers running on?
  • What client/storage node operating systems are you using?
  • Are you using deduplication within your environment?
  • If you are using deduplication within your environment, what types are you using?
  • What NetWorker/Data Domain plugins are you currently using?
  • Does your organisation use Open Source Databases?
  • Where do virtual hosts sit within your environment?
  • Do you use backup to disk within your environment?
  • Do you use cloning (traditional or DD Boost controlled replication) in your environment?
  • What is your longest retention period?
  • What is your corporate policy towards encryption?
  • If you use encryption, what type(s) do you use?
  • How long has your business been using NetWorker?
  • What new NetWorker features is your business using?
  • Do you have dedicated backup administrators?
  • How are you backing up virtual machines?
  • What sort of licensing are you using in your environment?
  • Are you still using tape in your environment?
  • What is the (approximate/estimate) size of a full backup of all your systems protected by NetWorker across all datazones? (NB: A single full only.)
  • In what regions does your business operate?
  • Are you using or planning to use CloudBoost in your environment?

NetWorker Scales El Capitan


When Mac OS X 10.11 (El Capitan) came out, I upgraded my personal desktop and laptop to El Capitan. The upgrade went pretty smoothly on both machines, but then I noticed overnight my home lab server reported backup errors for both systems.

When I checked the next day I noticed that NetWorker actually wasn’t installed any more on either system. It seemed odd for NetWorker to be removed as part of the install, but hardly an issue. I found an installer, fired it up and to my surprise found the operating system warning me the installer might cause system problems if I continued.

Doing what I should have done from the start, I set up an OS X virtual machine to test the installation on, and it seemed to go through smoothly until the very end when it reported a failure and backed out of the process. That was when I started digging into some of the changes in El Capitan. Apple, it turns out, is increasing system security by locking down third party access to /bin, /sbin, /usr/bin and /usr/sbin. As NetWorker’s binaries on Unix systems install into /usr/bin and /usr/sbin, this meant NetWorker was no longer allowed to be installed on the system.

El Capitan

Fast forward a bit: NetWorker 8.2 SP2 Cumulative Release 2 (aka 8.2.2.2) was released a week or so ago, including a relocated NetWorker installer for Mac OS X – the binaries are now located in /usr/local/bin and /usr/local/sbin instead. (The same goes for NetWorker 9.) Having run 8.2.2.2 on my home Macs for a couple of weeks, with backup and recovery testing, I can confirm the new location works.
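If you're curious about checking this on your own Macs: the directory lockdown is part of what Apple calls System Integrity Protection, and confirming both it and the new binary location is straightforward (the output below is indicative only):

$ csrutil status
System Integrity Protection status: enabled.
$ which save
/usr/local/bin/save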

If you’ve got Mac OS X systems being upgraded to El Capitan, be sure to download NetWorker 8.2.2.2.

Oh, and don’t forget to fill in the 2015 NetWorker Usage Survey!

2015 – that’s a wrap!


As we approach the end of 2015 I wanted to spend a bit of time reflecting on some of the data protection enhancements we’ve seen over the year. There’s certainly been a lot!

Protection

NetWorker 9

NetWorker 9 was of course a big part of the changes in the data protection landscape in 2015, but it's by no means the only advancement we saw. I covered some of the advances in NetWorker 9 in my initial post about it (NetWorker 9: The Future of Backup), but to summarise just a few of the key new features, we saw:

  • A policy based engine that unites backup, cloning, snapshot management and protection of virtualisation into a single, easy to understand configuration. Data protection activities in NetWorker can be fully aligned to service catalogue requirements, and the easier configuration engine actually extends the power of NetWorker by offering more complex configuration options.
  • Block based backups for Linux filesystems – speeding up backups for highly dense filesystems considerably.
  • Block based backups for Exchange, SQL Server, Hyper-V, and so on – NMM for NetWorker 9 is a block based backup engine. There’s a whole swathe of enhancements in NMM version 9, but the 3-4x backup performance improvement has to be a big win for organisations struggling against existing backup windows.
  • Enhanced snapshot management – I was speaking to a customer only a few days ago about NSM (NetWorker Snapshot Management), and his reaction to NSM was palpable. Wrapping NAS snapshots into an effective and coordinated data protection policy with the backup software orchestrating the whole process from snapshot creation, rollover to backup media and expiration just makes sense as the conventional data storage protection and backup/recovery activities continue to converge.
  • ProtectPoint Integration – I’ll get to ProtectPoint a little further below, but being able to manage ProtectPoint processes in the same way NSM manages file-based snapshots will be a big win as well for those customers who need ProtectPoint.
  • And more! – VBA enhancements (notably the native HTML5 interface and a CLI for Linux), NetWorker Virtual Edition (NVE), dynamic parallel savestreams, NMDA enhancements, restricted datazones and scaleability all got a boost in NetWorker 9.

It’s difficult to summarise everything that came in NetWorker 9 in so few words, so if you’ve not read it yet, be sure to check out my essay-length ‘summary’ of it referenced above.

ProtectPoint

In the world of mission critical databases where impact minimisation on the application host is a must yet backup performance is equally a must, ProtectPoint is an absolute game changer. To quote Alyanna Ilyadis, when it comes to those really important databases within a business,

“Ideally, you’d want the performance of a snapshot, with the functionality of a backup.”

Think about the real bottleneck in a mission critical database backup: the data gets transferred (even best case) via fibre-channel from the storage layer to the application/database layer before being passed across to the data protection storage. Even if you direct-attach data protection storage to the application server, or even if you mount a snapshot of the database at another location, you still have the fundamental requirement to:

  • Read from production storage into a server
  • Write from that server out to protection storage

ProtectPoint cuts the middle-man out of the equation. By integrating storage level snapshots with application layer control, the process effectively becomes:

  • Place database into hot backup mode
  • Trigger snapshot
  • Pull database out of hot backup mode
  • Storage system sends backup data directly to Data Domain – no server involved

That in itself is a good starting point for performance improvement – your database is only in hot backup mode for a few seconds at most. But then the real power of ProtectPoint kicks in. You see, when you first configure ProtectPoint, a block based copy from primary storage to Data Domain storage starts in the background straight away. With Change Block Tracking incorporated into ProtectPoint, the data transfer from primary to protection storage kicks into high gear – only the changes between the last copy and the current state at the time of the snapshot need to be transferred. And the Data Domain handles creation of a virtual synthetic full from each backup – full backups daily at the cost of an incremental. We’re literally seeing backup performance improvements in the order of 20x or more with ProtectPoint.

There’s some great videos explaining what ProtectPoint does and the sorts of problems it solves, and even it integrating into NetWorker 9.

Database and Application Agents

I’ve been in the data protection business for nigh on 20 years, and if there’s one thing that’s remained remarkably consistent throughout that time it’s that many DBAs are unwilling to give up control over the data protection configuration and scheduling for their babies.

It’s actually understandable for many organisations. In some places its entrenched habit, and in those situations you can integrate data protection for databases directly into the backup and recovery software. For other organisations though there’s complex scheduling requirements based on batch jobs, data warehousing activities and so on which can’t possibly be controlled by a regular backup scheduler. Those organisations need to initiate the backup job for a database not at a particular time, but when it’s the right time, and based on the amount of data or the amount of processing, that could be a highly variable time.

The traditional problem with backups for databases and applications being handled outside of the backup product is the chances of the backup data being written to primary storage, which is expensive. It’s normally more than one copy, too. I’d hazard a guess that 3-5 copies is the norm for most database backups when they’re being written to primary storage.

The Database and Application agents for Data Domain allow a business to sidestep all these problems by centralising the backups for mission critical systems onto highly protected, cost effective, deduplicated storage. The plugins work directly with each supported application (Oracle, DB2, Microsoft SQL Server, etc.) and give the DBA full control over managing the scheduling of the backups while ensuring those backups are stored under management of the data protection team. What’s more, primary storage is freed up.

Formerly known as “Data Domain Boost for Enterprise Applications” and “Data Domain Boost for Microsoft Applications”, the Database and Application Agents respectively reached version 2 this year, enabling new options and flexibility for businesses. Don’t just take my word for it though: check out some of the videos about it here and here.

CloudBoost 2.0

CloudBoost version 1 was released last year and I’ve had many conversations with customers interested in leveraging it over time to reduce their reliance on tape for long term retention. You can read my initial overview of CloudBoost here.

2015 saw the release of CloudBoost 2.0. This significantly extends the storage capabilities for CloudBoost, introduces the option for a local cache, and adds the option for a physical appliance for businesses that would prefer to keep their data protection infrastructure physical. (You can see the tech specs for CloudBoost appliances here.)

With version 2, CloudBoost can now scale to 6PB of cloud managed long term retention, and every bit of that data pushed out to a cloud is deduplicated, compressed and encrypted for maximum protection.

Spanning

Cloud is a big topic, and a big topic within that big topic is SaaS – Software as a Service. Businesses of all types are placing core services in the Cloud to be managed by providers such as Microsoft, Google and Salesforce. Office 365 Mail is proving very popular for businesses who need enterprise class email but don’t want to run the services themselves, and Salesforce is probably the most likely mission critical SaaS application you’ll find in use in a business.

So it’s absolutely terrifying to think that SaaS providers don’t really backup your data. They protect their infrastructure from physical faults, and their faults, but their SLAs around data deletion are pretty straight forward: if you deleted it, they can’t tell whether it was intentional or an accident. (And if it was an intentional delete they certainly can’t tell if it was authorised or not.)

Data corruption and data deletion in SaaS applications is far too common an occurrence, and for many businesses sadly it’s only after that happens for the first time that people become aware of what those SLAs do and don’t cover them for.

Enter Spanning. Spanning integrates with the native hooks provided in Salesforce, Google Apps and Office 365 Mail/Calendar to protect the data your business relies on so heavily for day to day operations. The interface is dead simple, the pricing is straight forward, but the peace of mind is priceless. 2015 saw the introduction of Spanning for Office 365, which has already proven hugely popular, and you can see a demo of just how simple it is to use Spanning here.

Avamar 7.2

Avamar got an upgrade this year, too, jumping to version 7.2. Virtualisation got a big boost in Avamar 7.2, with new features including:

  • Support for vSphere 6
  • Scaleable up to 5,000 virtual machines and 100 vCenters
  • Dynamic policies for automatic discovery and protection of virtual machines within subfolders
  • Automatic proxy deployment: This sees Avamar analyse the vCenter environment and recommend where to place virtual machine backup proxies for optimum efficiency. Particularly given the updated scaleability of Avamar for VMware environments, taking the hassle out of proxy placement is going to save administrators a lot of time and guess-work. You can see a demo of it here.
  • Orphan snapshot discovery and remediation
  • HTML5 FLR interface

That wasn’t all though – Avamar 7.2 also introduced:

  • Enhancements to the REST API to cover tenant level reporting
  • Scheduler enhancements – you can now define the start dates for your annual, monthly and weekly backups
  • You can browse replicated data from the source Avamar server in the replica pair
  • Support for DDOS 5.6 and higher
  • Updated platform support including SLES 12, Mac OS X 10.10, Ubuntu 12.04 and 14.04, CentOS 6.5 and 7, Windows 10, VNX2e, Isilon OneFS 7.2, plus a 10Gbe NDMP accelerator

Data Domain 9500

Already the market leader in data protection storage, EMC continued to stride forward with the Data Domain 9500, a veritable beast. Some of the quick specs of the Data Domain 9500 include:

  • Up to 58.7 TB per hour (when backing up using Boost)
  • 864TB usable capacity for active tier, up to 1.7PB usable when an extended retention tier is added. That’s the actual amount of storage; so when deduplication is added that can yield actual protection data storage well into the multiple-PB range. The spec sheet gives some details based on a mixed environment where the data storage might be anywhere from 8.6PB to 86.4PB
  • Support for traditional ES30 shelves and the new DS60 shelves.

Actually it wasn’t just the Data Domain 9500 that was released this year from a DD perspective. We also saw the release of the Data Domain 2200 – the replacement for the SMB/ROBO DD160 appliance. The DD2200 supports more streams and more capacity than the previous entry-level DD160, being able to scale from a 4TB entry point to 24TB raw when expanded to 12 x 2TB drives. In short: it doesn’t matter whether you’re a small business or a huge enterprise: there’s a Data Domain model to suit your requirements.

Data Domain Dense Shelves

The traditional ES30 Data Domain shelves have 15 drives. 2015 also saw the introduction of the DS60 – dense shelves capable of holding sixty disks. With support for 4TB drives, that means a single 5RU Data Domain DS60 shelf can hold as much as 240TB in drives.

The benefits of high density shelves include:

  • Better utilisation of rack space (60 drives in one 5RU shelf vs 60 drives in 4 x 3RU shelves – 12 RU total)
  • More efficient for cooling and power
  • Scale as required – each DS60 takes 4 x 15 drive packs, allowing you to start with just one or two packs and build your way up as your storage requirements expand

DDOS 5.7

Data Domain OS 5.7 was also released this year, and includes features such as:

  • Support for DS60 shelves
  • Support for 4TB drives
  • Support for ES30 shelves with 4TB drives (DD4500+)
  • Storage migration support – migrate those older ES20 style shelves to newer storage while the Data Domain stays online and in use
  • DDBoost over fibre-channel for Solaris
  • NPIV for FC, allowing up to 8 virtual FC ports per physical FC port
  • Active/Active or Active/Passive port failover modes for fibre-channel
  • Dynamic interface groups are now supported for managed file replication and NAT
  • More Secure Multi-Tenancy (SMT) support, including:
    • Tenant-units can be grouped together for a tenant
    • Replication integration:
      • Strict enforcing of replication to ensure source and destination tenant are the same
      • Capacity quota options for destination tenant in a replica context
      • Stream usage controls for replication on a per-tenant basis
    • Configuration wizard support for SMT
    • Hard limits for stream counts per Mtree
    • Physical Capacity Measurement (PCM) providing space utilisation reports for:
      • Files
      • Directories
      • Mtrees
      • Tenants
      • Tenant-units
  • Increased concurrent Mtree counts:
    • 256 Mtrees for Data Domain 9500
    • 128 Mtrees for each of the DD990, DD4200, DD4500 and DD7200
  • Stream count increases – DD9500 can now scale to 1,885 simultaneous incoming streams
  • Enhanced CIFS support
  • Open file replication – great for backups of large databases, etc. This allows the backup to start replicating before it’s even finished.
  • ProtectPoint for XtremIO

Data Protection Suite (DPS) for VMware

DPS for VMware is a new socket-based licensing model for mid-market businesses that are highly virtualized and want an effective enterprise-grade data protection solution. Providing Avamar, Data Protection Advisor and RecoverPoint for Virtual Machines, DPS for VMware is priced based on the number of CPU sockets (not cores) in the environment.

DPS for VMware is ideally suited for organisations that are either 100% virtualised or just have a few remaining machines that are physical. You get the full range of Avamar backup and recovery options, Data Protection Advisor to monitor and report on data protection status, capacity and trends within the environment, and RecoverPoint for a highly efficient journaled replication of critical virtual machines.

…And one minor thing

There was at least one other bit of data protection news this year, and that was me finally joining EMC. I know in the grand scheme of things it’s a pretty minor point, but after years of wanting to work for EMC it felt like I was coming home. I had worked in the system integrator space for almost 15 years and have a great appreciation for the contribution integrators bring to the market. That being said, getting to work from within a company that is so focused on bringing excellent data protection products to the market is an amazing feeling. It’s easy from the outside to think everything is done for profit or shareholder value, but EMC and its employees have a real passion for their products and the change they bring to IT, business and the community as a whole. So you might say that personally, me joining EMC was the biggest data protection news for the year.

In Summary

I’m willing to bet I forgot something in the list above. It’s been a big year for Data Protection at EMC. Every time I’ve turned around there’s been new releases or updates, new features or functions, and new options to ensure that no matter where the data is or how critical the data is to the organisation, EMC has an effective data protection strategy for it. I’m almost feeling a little bit exhausted having come up with the list above!

So I’ll end on a slightly different note (literally). If after a long year working with or thinking about Data Protection you want to chill for five minutes, listen to Kate Miller-Heidke’s cover of “Love is a Stranger”. She’s one of the best artists to emerge from Australia in the last decade. It’s hard to believe she did this cover over two years ago now, but it’s still great listening.

I’ll see you all in 2016! Oh, and don’t forget the survey.
