
NetWorker 8.2 and VBA Instant-Access


One of the great new features in NetWorker 8.2 is the integration of Instant Access, whereby virtual machines backed up with the VBA appliance to Data Domain systems may be instantly accessed from the Data Domain without needing to actually recover them. This allows you to quickly start up a failed service even as you’re migrating the virtual machine to a production datastore, or pull one or two essential files out of the virtual machine without needing to resort to a file level recovery.

To see this in action, I configured a lab virtual machine for backups then did an Instant Access operation on it.

instant-access-01

In the above screen shot, I picked a VM that hadn’t been used for VBA backups previously, test03, and added it to the Data Domain backup policy, DDBackup. I was then able to run the policy to get a brand new backup of the VM:

instant-access-02

Of course, because the virtual machine was a plain CentOS Linux install, much like other Linux VMs that had been backed up, the first full backup was still remarkably quick. Once that was completed, the bulk of the work shifted across to the vSphere Web Client:

instant-access-03

You’ll need to follow your standard enterprise operational practices for logon, obviously. In this case, being a lab server, I’m using the virtual vCenter Appliance, and I logged on as the root user. Next stop, the EBR plugin:

instant-access-04

Once logged in, go to Restore and drill down to the virtual machine backup instance you want to recover:

instant-access-05

With the virtual machine backup instance selected, if the backup target was a Data Domain running the right DDOS (5.4 or higher), you’ll be able to initiate the Instant Access option:

instant-access-06

The Instant Access wizard is pretty straightforward and doesn’t really require much thought, other than what the ‘restored’ virtual machine will be named and where in the cluster it’ll be made available:

instant-access-07

Having nominated the name and location, you can continue on to the final confirmation of the operation:

instant-access-08

When ready, you can click Finish and before you know it, you’ll see this:

instant-access-09

Now, here’s the kicker. By the time you’ve clicked OK and switched back to say, the vSphere Windows client, your VM will likely be waiting for you:

instant-access-10

There it is in the ‘Test Clients’ pool. It really takes almost no time at all: Instant access is not a lie. You can see the temporary datastore that the VBA appliance has provided for the recovery if you go up to your storage resources, too:

instant-access-11

In this case, because the virtual machine I ‘restored’ wasn’t running any services that publish their presence, it was safe to run both virtual machines at the same time, since the ‘restored’ virtual machine gets reconfigured to use DHCP, thus getting a different IP address to the original:

instant-access-12

In the above, the top console is for the original virtual machine, and the bottom console is for the one made available via Instant Access.

At this point, you’ve got a couple of options – you can either pull out the files you want from the virtual machine using normal operating system access techniques, or you can keep the virtual machine running and migrate it to a production datastore. The migration works in the same way as any normal VMware migration runs, so for this case I just powered down the virtual machine and removed it from Inventory:

instant-access-13

Once you’ve done that, your only other task is to drop the temporary datastore so that VBA cleans up after itself. I’ve found the simplest way to do this is to switch back to the Web GUI and start another Instant Access restore of the same virtual machine. This will trigger the following prompt:

instant-access-14

At that point, you can just hit Unmount, then subsequently cancel the operation.

And there you have it – Instant Access. It really is that quick and simple.


Hey, now you’ve finished this article, would you mind quickly filling in the NetWorker Usage Survey if you haven’t already done so? It’ll only take 5 minutes of your time. You can get to the survey here.


 


Client checking with nsradmin


I’ve probably looked at the man page for nsradmin a half dozen times since NetWorker 8.2 came out and never noticed this, but someone in NetWorker product management mentioned it to me, and I’m well and truly kicking myself for having missed it.

You see, nsradmin with 8.2 introduced a configuration checker. It’s not fully functional yet, but the area where it’s functional is probably the most important – at the client level.

nsradmin check

I’ve longed for an option like this – I even wrote a basic tool to do various connectivity checking against clients a long time ago, but it was never as optimal as I’d have liked. This option on the other hand is impressive.

You invoke it by pulling up nsradmin and running:

# nsradmin -C "query"

For instance:

nsradmin -C part 1

nsradmin -C part 2

If you’re a long-term NetWorker administrator, you can’t look at that and not have a “whoa!” moment.

If you’re used to nsradmin, you can see the queries are literally just nsradmin style queries. (If you’re wanting to know more about nsradmin, check out Turbocharged EMC NetWorker, my free eBook.)
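For instance, running the checks against every client in one hit – or, assuming the standard attribute-list query syntax holds for this option, narrowing it to a single client – would look something like this (the client name is purely illustrative):

# nsradmin -C "NSR client"
# nsradmin -C "NSR client; name: test02"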

As a NetWorker geek, I can’t overstate how cool this extension to nsradmin is, nor just how regularly I’ll be incorporating it into my diagnostics processes.

Happy birthday NetWorker 8.2 SP1


Fireworks cluster

The Features

NetWorker 8.2 SP1 became available today, and as is becoming traditional for an SP1 release of NetWorker, the main focus is on rolling up fixes from the cumulative releases – but that doesn’t mean it’s the only focus of the release. Included in the 8.2 SP1 release you’ll find several enhancements, and I’ve outlined them below.

New VBA Options

VBA was such a huge feature introduction that we’ll be seeing important additions slotted into it for the next several releases. For this service pack, a virtual machine that’s been cloned to a secondary site with its own vCenter and VBA appliance can be recovered on the secondary site to a different vCenter even if the primary site becomes unavailable, so long as it’s all within the same NetWorker datazone.

Additionally, VBA now supports dual-homing so that the backup appliance and proxies are provisioned onto a separate network space to the primary interface of the NetWorker server, neatly allowing you to split off traditional vs virtual machine backups.

If you’ve been using VBA with NetWorker, make sure when you upgrade to download the handy upgrade ISOs provided by EMC. These can be used to boot your VBA appliance and perform an in-place upgrade from whatever version you have been running.

NetApp Improvements

I admit I don’t use NetApp much, but support has been added for SnapVault and SnapMirror replication to remote NetApp devices. Check the release notes for more details on that one, since I’d likely do an awful job if I tried to summarise it.

Security Enhancements

NetWorker now supports https when communicating with a cloud server, which is definitely good for security. Further on the security front, you can now configure ssh port forwarding on your NMC client in order to run your NMC session over an ssh-encrypted link. In case you’re wondering, that’ll look like the following:

ssh -L9000:localhost:9000 -L9001:localhost:9001 -L2638:localhost:2638 nmcServer -N
javaws http://localhost:9000/gconsole.jnlp
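(If you’re wondering about the port numbers: 9000 and 9001 are the default NMC web and client service ports, and 2638 is the default NMC database port – if you’ve customised any of those, adjust the forwarding to match.)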

That’ll be good not only for sites that are particularly paranoid about security (the NMC client/server communications process is already encrypted), but it’ll also be great for people working over VPNs. I tried this out today and it worked a treat. (At first I thought it was a little slow, then I realised my partner’s laptop was churning away at home with a CrashPlan backup flooding our link. A later test when the link was clear gave a much more favourable result.)

Logging

I’m a big fan of logging. I love logging, and I think you should keep your logs for as long as you keep your backups – so that if you have problems with any recovery you can go to the logs from the time and see whether there was an undetected issue with the backup. Logging gets a bit of a boost with NetWorker 8.2 SP1 via an option to redirect logs from cluster backups to more favourable locations than the default.

IPv6

Like it or loathe it (the formatting, not the idea, of course), IPv6 is continuing to gain traction. While many organisations will undoubtedly end up connecting to an IPv6 internet while maintaining an IPv4 intranet, added IPv6 support in NetWorker will always be beneficial: first to those organisations that go all-IPv6, and secondly to those that end up working with backups over an IPv6 internet.

In 8.2 SP1, NetWorker adds support for IPv6 NDMP backup and recovery operations, as well as Boost communication with Data Domain systems over IPv6.

Other Data Domain Enhancements

Support for DDBoost over Fibre Channel has been added to HPUX on IA-64, both for client direct and via storage nodes.

Renaming the NetWorker Server

The nsrclientfix utility can now be used to rename a NetWorker server. If you’ve not heard of nsrclientfix, don’t worry – it’s relatively new. I’m aiming to publish an article about it in the near future.

Should I upgrade?

As always, I’m not in a position to answer the question: should I upgrade? It’s going to entirely depend on what your organisation needs to do. That being said, I strongly encourage you to have a test/lab NetWorker environment where you can kick the tyres on the 8.2 SP1 release and make an informed decision as to when you jump up to this release.

For more information on NetWorker 8.2 SP1, including the binaries and all the documentation, make your way across to support.emc.com.

Shared Data Domain Access


If you’ve come from a tape library environment, you’re well aware the only way to share a tape library between multiple NetWorker datazones – i.e., multiple NetWorker servers – is to use library partitioning. Thus, a 1000 slot library might be presented as 1 x 800 slot library and 1 x 200 slot library to the independent servers.

But what happens when you’re using Data Domain? That’s considerably more flexible.

Data Domain

If we look at basic Virtual Tape Libraries (VTLs), the answer is obvious: create as many VTLs as you need and present each one to a different NetWorker server. VTLs though, like their physical counterparts, suffer limitations that can make them impractical to use. You remain hampered by the number of drives provisioned in the library, and (far more importantly with VTL), you still can’t simultaneously write to a virtual tape and read from it. Looking at specific technologies, VTLs are also severely limited in a NetWorker environment when you compare them against Boost functionality. You can’t, for instance, use a Data Domain VTL for instant access or for granular Exchange recoveries. (Check out this post for more details of VTL vs Boost.)

You don’t have to use VTL to share a Data Domain between two or more NetWorker servers. Once you’ve configured Boost on a Data Domain, you can simply add the Data Domain to each NetWorker server you want to use it with, and a new Mtree will be automatically created for each NetWorker server as you create the first device on each.

For instance, this Data Domain system has 3 NetWorker servers communicating with and using it:

sysadmin@squeezebox# ddboost storage-unit show
Name      Pre-Comp (GiB)   Status
-------   --------------   ------
centaur             74.5   RW    
cyclops            181.8   RW    
medusa               6.6   RW    
-------   --------------   ------
 D    : Deleted 
 Q    : Quota Defined
 RO   : Read Only
 RW   : Read Write

One obvious advantage of using the same Data Domain system for multiple NetWorker servers is the deduplication still happens globally across all Mtrees. Thus, if you’ve got a production NetWorker environment and a test NetWorker environment, you can share a Data Domain between them and achieve a high deduplication rate, assuming your test environment is populated with copies of systems from your production environment.

(With the new multi-tenancy support in DDOS 5.5 and higher, you’ve got the additional advantage of using potentially multiple Boost accounts, thereby segregating access between the NetWorker servers. While a NetWorker server won’t have access to an Mtree for another NetWorker server, a shared Boost user will. Independent Boost users won’t.)

There are of course a couple of caveats to this approach. Beyond the need to either use multi-tenancy on the Data Domain or make sure you trust everyone who is simultaneously using it, you’ll need to consider the following:

  • Performance:
    • You’ll want to make sure the systems are mutually compatible for performance requirements. Two large-scale production NetWorker environments might for instance have a greater risk of causing performance challenges than, say, one large-scale production environment and a small-to-medium sized lab environment.
  • Stream count:
    • Almost a subset of performance, but it warrants calling out independently – regardless of whether you’ve got one NetWorker server or fifty NetWorker servers using a Data Domain system, you still have to observe the recommended hard limits on stream counts from the Data Domain administration guides for any unit you’re using. With that in mind, you’ll likely want to use NetWorker’s pool based parallelism settings to establish hard limits on the number of simultaneous savesets NetWorker can write.

Sharing a Data Domain between multiple NetWorker servers won’t be a solution for every environment, but if you’re limited for budget it’s something to keep in mind.

Housing your software


I’ve admitted in the past I’m a bit of a data hoarder. That’s partly come from the reasonably abysmal download speeds you get in Australia unless you’re lucky enough to be on the relatively small fibre-connected portion of the National Broadband Network. (Even now, my average “best” download speed peaks at about 1.8MB/s at home.) Sure, businesses get better download speeds, but overall we’re definitely an Internet-hungry country.

So that tendency to “download once, keep forever” has stuck with me since I first got connected to the ‘net. And that definitely holds true for NetWorker software downloads.

Which begs the question – do you keep all your NetWorker downloads?

I generally advocate allocating a centralised share somewhere within the organisation to hold locally downloaded copies of the software, and to keep it well organised. This serves a few key purposes:

  • For Windows in particular, switching cumulative release versions (e.g., 8.2.0.3 to 8.2.0.4) requires you to uninstall your current version and install the new version. For optimum results, you should always run the uninstall from the original version.
  • Consistency – if you’re rolling out NetWorker to a bunch of new hosts over the course of a couple of months, to keep trouble-shooting consistent you’ll likely want to keep the versions consistent as well.
  • Roll-back – sometimes it becomes necessary to roll back a cumulative release version, minor version or maybe even a service pack. Having that version on hand makes it much easier.
  • Patches – if you happen to have a situation where support issues you a patch binary, you’ll want to make sure you keep those patches somewhere safe rather than relying on remembering they’re installed.
  • Space savings – you may take up a little more disk space by retaining older versions of the software, but usually that’s an order of magnitude or two less than the cumulative effect of a variety of teams and projects retaining their own local copies (…because there’s no central share…)

As a matter of example, I have 40+ iterations of NetWorker 8.x on my home NAS at the moment consuming around 100GB of space. Adding a large number of 7.x releases, that comes to around 180GB. If 180GB is a serious imposition to your storage requirements (particularly if you’ve got tiered NAS storage!), then you’ve likely got other problems within your organisation you need to more urgently address.

Keep (and organise) those NetWorker downloads. If you have been keeping them but they’ve been haphazardly organised until now, spend a little while sorting them out and storing them by version. It may be a little bit of work up-front, but over time the benefits will outweigh the storage requirements.
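As a sketch, a layout no more elaborate than the following (paths and versions purely illustrative) covers all of the purposes above:

/software/NetWorker/
    8.1.2.7/
        linux_x86_64/
        win_x64/
    8.2.0.4/
        linux_x86_64/
        win_x64/
        patches/
    docs/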

Download Status

2014 NetWorker Usage Report


I’m pleased to say I’ve completed and made available the NetWorker usage report for 2014. I’m particularly grateful to everyone who took the time to answer the 20 questions in the survey conducted between December 1, 2014 and January 31, 2015.

The report continues to track trends in NetWorker usage within organisations: deduplication adoption continues to grow, for instance, and Data Domain remains the overwhelmingly preferred method to enable that deduplication.

This was the first survey that asked a basic question I should have been asking for the last several years (!): how big is a full backup for your environment? The results on that question were particularly insightful as to just how large some environments get, and put to rest that FUD you see occasionally from other vendors that NetWorker doesn’t cut the mustard. (It not only cuts it, but it spreads it in an entirely appetising way…)

You can find it on the main NetWorker Hub site, or access it directly here.

VI_backups

I’ve been Elected again


Elect 2015

I’ve been running the NetWorker Blog since 2009, and since it started it’s grown to hundreds of articles in addition to a bunch of reports and some mini (and not so mini) manuals. I’ve been lucky enough to be named part of the EMC Elect Community now for 3 years running since its inception, but I thought it worthwhile spending a few minutes mentioning some of the other EMC Elect I’ve been lucky enough to meet, or whose musings I’ve found particularly interesting over the years.

There were a lot more in EMC Elect 2015 than the above select list, of course. Last year was a bit chaotic for me, between job changes and a few other big personal events. This year, I’m planning on diving into a lot more of what my Elect colleagues (both above, and across the entire spectrum) post about, and you’ll be seeing more links appear to their articles.

Jumping back to me for a moment, I figure this is as good an opportunity as ever to do a quick summary of some of the bigger posts on the NetWorker hub – so here goes:

  • Top 5 Blog Posts:
    • Basics – Fixing NSR Peer Information Errors. A perennial favourite, this has been visited more than twice as often as any other article on the site.
    • Introducing NetWorker 8. Everyone was hungry for information on NetWorker 8 when it launched, and this remains well read even now.
    • Basics – Stopping and Starting NetWorker on the Windows Command Line. I’ve always found wading through the services control panel in Windows to be slower than firing a command prompt and typing a couple of commands. I thought that was because I was a die-hard Unix/Command Line junkie, but it turns out a lot of people want to know this.
    • Basics – Changing Browse/Retention Time. We’ve all done it: accidentally configured a client and left the default browse and retention settings in place, only to realise a month or two later that we need to correct it. Don’t worry, I won’t tell who has looked at this article – we’ve all been in the same boat…
    • NetWorker 8 Advanced File Type Devices. NetWorker 8 saw device handling for AFTDs (and for that matter, DD Boost devices) completely upgraded. This article dove in on the nitty gritty and a lot of people access it still.
  • Manuals and reports you might find interesting:

Thanks for reading my blog over the years, I look forward to many more years to come!

Testing (and debugging) an emergency restore


A few days ago I had some spare time up my sleeve, and I decided to test out the Emergency Restore function in NetWorker VBA/EBR. After all, you never want to test out emergency recovery procedures for the first time in an emergency, so I wanted to be prepared.

If you’ve not seen it, the Emergency Restore panel is accessed from your EBR appliance (https://applianceName:8580/ebr-configure) and looks like the following:

EBR Emergency Restore Panel

The goal of the Emergency Restore function is simple: you have a virtual machine you urgently need to restore, but the vCenter server is also down. Of course, in an ideal scenario, you should never need to use the Emergency Restore function, but ideal and reality don’t always converge with 100% overlap.

In this scenario, to simulate my vCenter server being down, I went into vCenter, selected the ESX server I wanted to recover a virtual machine for (c64), and disconnected it. To all intents and purposes, as far as the ESX server was concerned, vCenter was down – at least, enough to satisfy VBA that I really needed to use the Emergency Restore function.

Once you’ve selected the VM, and the backup of the VM you want to restore, you click the Restore button to get things underway. The first prompt looks like the following:

EBR ESX Connection Prompt

(Yes, my ESX server is named after the Commodore 64. For what it’s worth, my vCenter server is c128 and a smaller ESX server I’ve got configured is plus4.)

Entering the ESX server details and login credentials, you click OK to jump through to the recovery options (including the name of the new virtual machine):

EBR - Recovery Options

After you fill in the new virtual machine name and choose the datastore you want to recover to, it’s as simple as clicking Restore and the ball is rolling. Except…

EBR Emergency Restore Error

After about 5 minutes, it failed, and the error I got was:

Restore failed.

Server could not create a restore task at this time. Please ensure your ESX host is resolvable by your DNS server. In addition, as configuration changes may take a few minutes to become effective, please try again at a later time.

From a cursory inspection, I couldn’t find any reference to the error on the support website, so I initially thought I must have done something wrong. Having re-read the Emergency Restore section of the VMware Integration Guide a few times, I was confident I hadn’t missed anything, so I figured the ESX server might have been taking a few minutes to be sufficiently standalone after the disconnection, and gave it a good ten or fifteen minutes before reattempting, but got the same error.

So I went through and did a bit of digging on the actual EBR server itself, diving into the logs there. I eventually re-ran the recovery while tailing the EBR logs, and noticed it attempting to connect to a Data Domain system I knew was down at the time … and had my aha! moment.

You see, I’d previously backed up the virtual machine to one Data Domain, but when I needed to run some other tests, I changed my configuration and started backing up the virtual infrastructure to another Data Domain. EBR needed both online to complete the recovery, of course!

Once I had the original Data Domain powered up and running, the Emergency Restore went without a single hitch, and I was pleased to see this little message:

Successful submission of restore job

Before too long I was seeing good progress on the restore:

Emergency Restore Progress

And not long after that, I saw the sort of message you always want to see in an emergency recovery:

EBR Emergency Recovery Complete

There you have it – the Emergency Restore function tested well away from any emergency situation, and a bit of debugging while I was at it.

I’m sure you’ll hope you never need to use the Emergency Restore feature within your virtual environment, but knowing it’s there – and knowing how simple the process is – might help you avoid serious problems in an emergency.

 

 


How secure are your backups?


Lock

Backup security is – or rather, should be – a big requirement in any organisation. Sadly, we still see periodic examples of organisations failing to fully comprehend the severity of a backup breach. In a worst case scenario, a backup breach involving physical theft might be the equivalent of someone having permanent and unchecked access to a snapshot of your entire network.

There are two distinct aspects to backup security to consider:

  • Physical
  • Electronic

For each type of backup security, we need to consider two key areas:

  • At rest
  • In transit

This usually leads businesses to start backup security planning with considerations such as:

  • Do we encrypt backup media?
  • Do we use security guards for movement of backup media?
  • Are on-disk backups encrypted?

Oddly enough, there’s a bigger gorilla in the room for backup security that is less often thought of: your backups are only as secure as the quality of your security policies and the adherence to them.

A long time ago in a state far, far away, a colleague was meeting with a system administrator in the offices of an environmental organisation. She needed to ensure the security restrictions for system access could be drastically lowered from the default install criteria. “Everyone here is an anti-authority hippy,” she said (or words to that effect). “If we give them hard passwords they’ll just write them in permanent marker on their monitors.”

The solution was to compromise to a mid-point of security vs ease-of-access.

These days few organisations would yield to their users’ disdain for authority so readily, but it serves to highlight that a system is only as secure as you choose to make it. A backup environment does not sit in isolation – it resides on the hosts it is being used to protect (in some form or another), and it will have a host-based presence within your network at some point. If someone can breach that security and get onto one of those hosts, there’s a good chance a significant aspect of your backup security protocols has been breached as well.

That’s why all backup security has to start at a level outside the backup environment … rather, it requires consideration at all layers. It doesn’t start with the complexity of the password required to access an administrator interface, and nor does it end with enabling data-at-rest encryption. So if you’re reading this thinking your backups are reasonably secure but your organisation only has mediocre access restrictions to get onto the network, you may have closed the gates after the horse has bolted.

Virtualised servers and storage nodes


A little over 5 years ago now, I wrote an article titled, Things not to virtualise: backup servers and storage nodes. It’s long past time to revisit this topic and say that’s no longer a recommendation I’d make.

Restore

At the time I suggested there were two key reasons why you wouldn’t virtualise these systems:

  • Dependencies
  • Performance

The dependencies point related to the potentially thorny situation of needing to recreate a certain level of your virtualised environment before you could commence disaster recovery operations using NetWorker, and the second related to guaranteeing maximum performance for your backup server (and for that matter, storage nodes).

With appropriate planning, I believe neither of these considerations represents a reason to avoid virtualising backup infrastructure any longer. But if you disagree, first consider a few statistics from the 2014 NetWorker Usage Report:

  • 10% of respondents said some of their NetWorker servers were virtualised.
  • 10% of respondents said some of their Storage Nodes were virtualised.
  • 5% of respondents said all of their Storage Nodes were virtualised.
  • 9% of respondents said all of their NetWorker servers were virtualised.

Stepping back to the original data from that report, of the 9% of respondents who said all of their NetWorker servers were virtual, some were small environments, but there were just as many environments with 501+ clients, and some with 5001+ clients backing up 5+PB of data. Similar correlations were applicable for environments where all storage nodes were virtualised.

Clearly size or scale is not an impediment towards virtualised backup infrastructure.

So what’s changed?

There’s a few key things from my perspective that have changed:

  • Substantially reduced reliance on tape
  • Big uptake in Data Domain backup solutions
  • More advanced and mature virtualisation disaster recovery options

Let’s tackle each of those. First, consider tape – getting tape access (physical or virtual) within a virtual machine has always been painful. While VMware still technically supports virtual machine access to tape, it’s fraught with considerations that impact the options available to other virtual machines on the same ESX server. That’s not really a portable option.

At the same time, we’re seeing a big switch away from tape as a primary backup target. The latest NetWorker usage report showed that just 9% of sites weren’t using any form of backup to disk. As soon as tape is removed as a primary backup target, virtualisation becomes a much simpler proposition, for any storage node or backup server.

Second, Data Domain. As soon as you have Data Domain as a primary backup target, your need for big, powerful storage nodes drastically decreases. Client Direct, where the individual clients are tasked with performing data segmentation and sending data directly to an accessible device, practically eliminates storage node requirements in many environments. Rather than being hosts capable of handling the throughput of gigabytes of data a second, a storage node simply becomes the host responsible for giving individual clients a path to write to or read from on the target system. Rather than revisit that here, I’ll point you at an article I wrote in August 2014 – Understanding Client Direct. In case you’re thinking Data Domain is just a single product, keep in mind from the recent usage report that a whopping 78% of respondents said they were using some form of deduplication, and of those respondents, 47% were using Data Domain Boost. In fact, once you take VTL and CIFS/NFS into account, 80% of respondents using deduplication were using Data Domain. (Room, meet gorilla.)

Finally – more advanced virtualisation disaster recovery options. At the time I wrote the previous article, I’d just seen a demo of SRM, but since then it’s matured and datacentres have matured as well. It’s not uncommon for instance to see stretched networks between primary and disaster recovery datacentres … when coupled with SRM, a virtual backup server that fails on one site can be brought up on the other site with the same IP address and hostname within minutes.

Of course, a virtual backup server or storage node may somehow fail in such a way that the replicated version is unusable. But the nature of virtualisation allows a new host to be stood up very quickly (compared to say, a physical server). I’d argue when coupled with backup to disk that isn’t directly inside the virtual machine (and who would do that?) the disaster recovery options are more useful and comprehensive for virtual backup servers and storage nodes than they are for physical versions of the same hosts.

Now dropping back briefly to performance: the advanced functionality in VMware to define guaranteed performance characteristics and resources to virtual machines allows you to ensure that storage nodes and backup servers deliver the performance required.

vCenter clustering and farms of ESX servers also drastically reduce the chance of losing so much of the virtual infrastructure that it must be redeployed prior to commencing a recovery. Of course, that’s a risks vs costs game, but what part of disaster recovery planning isn’t?

So here I am, 5 years later, very openly saying I disagree with 2009-me: now is the time to seriously consider virtualising as much as possible of your backup infrastructure. (Of course, that’s dependent on your underlying infrastructure, but again, what part of disaster recovery planning isn’t dependent on that?)

Basics – Running VMware Protection Policies from the Command Line


If you’ve been adopting VMware Protection Policies via VBA in your environment (like so many businesses have been!), you’ll likely reach a point where you want to be able to run a protection policy from the command line. Two immediate example scenarios would be:

  • Quick start of a policy via remote access*
  • External scheduler control

(* May require remote command line access. You can tell I’m still a Unix fan, right?)

Long-term users of NetWorker will know a group can be initiated from the backup server by using the savegrp command. When EMC introduced VMware Protection Policies, they also introduced a new command, nsrpolicy.

The simplest way to invoke a policy is as follows:

# nsrpolicy -p policyName

For example:

[root@centaur ~]# nsrpolicy -p SqueezeProtect
99528:nsrpolicy: Starting Vmware Protection Policy 'SqueezeProtect'.
97452:nsrpolicy: Starting action 'SqueezeProtect/SqueezeBackup' with command: 'nsrvba_save -s centaur -j 544001 -L incr -p SqueezeProtect -a SqueezeBackup'.
97457:nsrpolicy: Action 'SqueezeProtect/SqueezeBackup's log will be in /nsr/logs/policy/SqueezeProtect/544002.
97461:nsrpolicy: Action 'SqueezeProtect/SqueezeBackup' succeeded.
99529:nsrpolicy: Vmware Protection Policy 'SqueezeProtect' succeeded.

There you go – it’s that easy.
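And if you want an external scheduler to do the invoking – the second scenario mentioned at the start – a crontab entry along the following lines will do the job (the schedule, policy name and binary path are illustrative; adjust them to suit your installation):

0 21 * * * /usr/sbin/nsrpolicy -p SqueezeProtect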

Service catalogues and backups


Service catalogues are sometimes seen as an unwieldy way of introducing order with a substantial risk of introducing red tape. That being said, I’m a big fan of them for backup and recovery systems, and not because of some weird fetish for bureaucracy.

Service Catalogue

As with ITIL, I’m firmly of the opinion that service catalogues get such a bad rap from many IT workers because they’ve experienced a poor implementation at one or two locations they’ve worked. Service catalogues only need to be as formal and/or as complex as the needs of the individual organisation. So that means a small company with, say, 50 employees can likely have a radically simpler service catalogue definition than would, say, a multinational with 50,000 employees.

It’s not uncommon to review the backup environment for an organisation only to find there’s no central theme for backup configuration. This server gets full backups every day with backups retained for a month, that server gets full backups weekly, incrementals the rest of the time and backups kept for six weeks. That other server looks to have a good configuration but it hasn’t been added to an active group. And so on…

While service catalogues don’t guarantee avoiding a mixed-up configuration, they do set a certain base level of order in the same way a standard system build or even a document template does. This works in a number of ways, namely:

  1. It allows backup administrators to have a standard definition of exactly what configuration should be established for any given service catalogue selection
  2. It allows the business group booking the backup function to clearly understand exactly what level of protection they can expect (and hopefully what SLAs are included as well)
  3. It can help in capacity planning
  4. It allows exceptions to be more easily captured

The first item above helps to eliminate human error. Rather than relying on an administrator or operator choosing options at the moment when configuring backups, he or she knows that a particular service catalogue option requires a particular set of configuration items to be put in place.

The second item allows the business to be more confident about what it’s getting. There’s no excuse for believing in platinum-level service when a bronze-level option is chosen, but more importantly, the business unit booking the function can more clearly understand the value of the different levels.

There are two distinct aspects to capacity planning – knowing growth rates, and knowing service requirements. Growth rates are the relatively easy things to capture: a few mminfo reports run regularly, or periodic interrogation of your NMC reports will tell you what your month-on-month growth rates are for backup. What won’t be as immediately visible perhaps is how that growth breaks down between say, production systems and development systems, or high priority systems and low priority systems. Assigning service catalogue units to individual hosts (or applications) will allow a better understanding of the growth rate of the individual sorts of service options you want to provide. Month-on-month you should be able to see how many platinum or production (or whatever names you use) systems you’re adding. Particularly in situations where you’ve got tiered backup targets, this is essential in understanding where you need to add capacity. (In short: knowing your backups are growing at 2TB a month is pointless if you don’t know whether that’s 2TB of backup-to-disk, 2TB of tape, or some mix between the two.)

Finally we get to exceptions – and these are exceptionally important. (Excuse the pun.) Any system that’s designed to be rigorously enforced to the exclusion of any variation at all is just going to be a fount of pain. Key systems might get missed because they’re not compatible with the service catalogue, or just as bad, business units might deploy competing data protection systems to suit their specific needs, drastically increasing operational cost. Therefore, the solution is to have an exceptions system which allows variation to the standard service catalogue items, but in such a way that these variations are clearly:

  • Justified,
  • Documented, and
  • Costed

Ultimately service catalogues for backup and recovery systems (and data protection more broadly) aren’t about imposing rigid rules, but allowing for faster and more accurate deployment of carefully planned protection models. Any sensible business would only consider this as being a valuable and useful approach to IT/business insurance strategies.

Basics – virtual machine names in VBA backups


If you’ve been backing up your virtual machines with VBA, you’ve probably hit that moment when you’ve run an mminfo query and seen output looking like the following:

mminfo_vm_backups_01

As you can see, that’s not the most effective way to see virtual machine names – vm:<id> doesn’t allow you to easily match it back to the virtual machine in question.

However, not all is lost. With VBA backups came a couple of new options. The first one is a “VBA backups” style report, using the command:

# mminfo -k

Using mminfo -k you’ll get a very tailored output focused entirely on your VBA backups, and it’ll resemble the following:

mminfo_vm_backups_02

That’s a really good way of seeing a quick listing of all your VBA-based virtual machine backups, but if you’re wanting a way of reconciling in normal mminfo output, you can also make use of a new mminfo report field, vmname. For example:

mminfo_vm_backups_03

(In the above command I could have used name and vmname in order to reconcile vm:<id> entries to virtual machine names, but elected not to for brevity.)
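As a starting point, a sketch along these lines (the report columns are illustrative) produces a time-sorted listing with virtual machine names included:

# mminfo -avot -r "vmname,name,level,savetime,ssid"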

There you have it – a couple of quick and easy ways of quickly seeing details of your virtual machine backups via mminfo.

 

World backup day misses the point


iStock Flat Earth Blog Size

It’s fair to say I’m a big fan of backup and recovery. So much so that a substantial part of the last 19 years of my career has been devoted to it in some form or another.

Yet here’s the rub: World backup day (March 31) is full of good intentions but has entirely the wrong focus. By that I don’t just mean it should be World Recovery Day (although that would be a nice change); instead, it places emphasis on just one aspect of data protection, and these days there’s no such thing as a data protection strategy that only leverages a single aspect.

Data protection – Information Lifecycle Protection (ILP), as I like to think of it – starts well before the first backup is taken, and extends into a variety of fields: storage, operating systems and virtualisation. You might say, at bare minimum, ILP is comprised of the following:

Components of ILP

(It’s also impossible to have a truly effective Information Lifecycle Protection strategy without also having a data lifecycle management strategy – i.e., be comfortable with archival and pruning of data.)

It would be easy to look at the above diagram and assume it’s all about storage, but there’s more to it than that. Smart companies are starting to focus on their data protection in an application-centric approach. That’s not to suggest decentralisation of data protection, but more decentralised integration with intelligent centralised reporting, capacity management and policy management. For sure, storage is one aspect of what we need to protect, but if you look at an average enterprise now there are whole realms of data protection functions that have made their way up into higher layers – VMware’s SRM, vMotion, etc., are perfect examples of data-protection concepts applied at a higher level to provide more functional protection.

By application-centric approach, I’m not talking about “MSSQL Initiated” or “Oracle Initiated” (though I’ll admit that plays a part in a centralised policy/decentralised integration approach), but more a consideration of how enterprise IT needs to work in an evolving – indeed, evolved – landscape. It’s time in IT we stop thinking about backup and recovery or data protection being about a list of hosts and databases that need protection, and instead think about data protection in terms of business functions and business applications that need to be protected. From the business perspective, the hosts cyclops, medusa and cerberus running the database fipr00 are meaningless – the business wants to know that the financial planning system is being protected. As cloud based approaches to IT take hold and introduce a consumer-based, service-centric view of IT, IT must adjust to think of data protection from a service, application or business function perspective.

Celebrate world backup day by all means, but let’s keep in mind it’s just one quadrant in the information lifecycle protection approach.

Turbocharged EMC NetWorker, v1.1


Today I’m announcing the availability of Turbocharged EMC NetWorker, v1.1. As you can imagine from the version number, this is an incremental update rather than a substantial revision of the previous document. The change log for the updated manual is as follows:

  • Added details on the nsradmin -C option for automated client probes
  • Added details for reporting on VBA backups using mminfo
  • Added details for the dbgcommand utility
  • Moved the index of tables and table of figures to the end of the document
  • Various corrections

Turbocharged EMC NetWorker v1.1 replaces the previous version of the document, and can be downloaded from the same location as before.

And now a small note

When I publish a manual, I make it free on the condition that downloaders supply their names and email addresses. I do this so that if it turns out there’s a need to issue an urgent correction or notification to users I can do so. I haven’t needed to do this yet, and I hope not to, but that’s why I ask for it. From a privacy perspective, I do not use those email addresses for any other purpose, and they have always remained completely quarantined from public/cloud email servers. I also do not make those email addresses available to anyone else (third parties, employers, etc.).

Cheers!

Turbocharged EMC NetWorker v1.1


Upgrade times


We live in a world of activities. Particularly in IT, almost everything we do can at some point be summarised as being an activity. Activities we perform typically fall into one of three organisational types, viz.:

  • Parallel – Activities that can be performed simultaneously
  • Sequential – Activities that must be performed in a particular order
  • Standalone – Activities that can be performed in any order

Additionally, we can say that most activities we perform are either blocking or non-blocking, at least in some context. Setting up a new client in NetWorker for instance is not normally considered to be a blocking activity, unless of course we consider it in the context of a subsequent backup of that same client.

Hourglass

One thing we can definitely consider to be a blocking activity in NetWorker is a server upgrade. These days you have to be a little more conscientious about server upgrades. Whereas before it was not uncommon to encounter environments with multiple storage nodes of varying NetWorker versions (regardless of whether this was a good idea or not), the upgrade process now requires you to ensure all storage nodes are upgraded either before, or at least at the same time as, the NetWorker server. Additionally, if you’re using a dedicated NetWorker Management Console server, you should also upgrade that before the NetWorker server so that your NMC interface is capable of addressing any new functionality introduced in NetWorker*.

Each of those upgrades (NMC, storage nodes and NetWorker server) takes a particular amount of time. If you’re using standard NetWorker authentication, the package upgrade or removal/reinstall process will take mere minutes. If your authentication is integrated with an Active Directory or LDAP service, it’ll be a little bit more complicated.

But (and there’s always a but), the process of upgrading the NetWorker binaries on the server (or uninstalling and re-installing, depending on your operating system) is not the be-all and end-all of the NetWorker upgrade process. There is a sequence of activities you need to perform prior to the upgrade as part of consistency checking and validation that you really shouldn’t ever skip. Depending on the number of clients, backup volumes or savesets you have, those might take some time to complete**.

Looking at the NetWorker 8.2 upgrade guide, the recommended activities you should complete before performing a NetWorker server software upgrade are as follows:

  • nsrim -X
  • nsrck -m
  • nsrck -L6
  • nsrls -m
  • nsrls
  • nsrports
  • savegrp -l full -O groupName ***
  • mminfo -B

Of those commands, the ones that will potentially take the most time to execute are nsrim -X, nsrck -L6 and the savegrp index backup, though the timing difference between just those three commands can be quite substantial. Consider the nsrck -L6 in particular: that’s a complete in-place index rebuild for all clients. If your indices are very large, or your NetWorker index filesystem storage is incapable of providing sufficient IOPS (or a combination of both), the run-time for that activity may be lengthy. Equally, if your indices are large, a savegrp -l full -O to force a full index backup**** may also take quite a while, depending on your backup destination.
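If you want to capture those timings ahead of the real change window, a trivial sketch like the following – using the same date-stamping idiom as the timings further below, with an illustrative group name – will do it:

#!/bin/sh
# Time the three most expensive pre-upgrade maintenance steps.
# "AllClients" is an illustrative group containing every client.
date; nsrim -X; date
date; nsrck -L6; date
date; savegrp -l full -O AllClients; date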

As an example, on a virtual lab server significantly under the minimum performance specifications for a NetWorker server (not to mention virtualised in vSphere which was in turn virtualised in VMware Fusion), I built up an index for a client of approximately 10GB after first creating then repeatedly backing up a filesystem with approximately 3,000,000 files in it. (mminfo reports an nfiles of 3,158,114). After those backups were generated, nsrls told me:

# nsrls centaur
/nsr/index/centaur: 88957985 records requiring 10 GB
/nsr/index/centaur is currently 100% utilized

Moving on to compare the run-times of an nsrck -L1, -L3 and -L6 yielded the following:

# date; nsrck -L1 centaur; date
Wed Apr 22 17:19:51 AEST 2015
nsrck: checking index for 'centaur'
nsrck: /nsr/index/centaur contains 88957985 records occupying 10 GB
nsrck: Completed checking 1 client(s)
Wed Apr 22 17:19:51 AEST 2015
# date; nsrck -L3 centaur; date
Wed Apr 22 17:19:51 AEST 2015
nsrck: checking index for 'centaur'
nsrck: /nsr/index/centaur contains 88957985 records occupying 10 GB
nsrck: Completed checking 1 client(s)
Wed Apr 22 17:19:52 AEST 2015
# date; nsrck -L6 centaur; date
Wed Apr 22 17:19:52 AEST 2015
nsrck: checking index for 'centaur'
nsrck: /nsr/index/centaur contains 88957985 records occupying 10 GB
nsrck: Completed checking 1 client(s)
Wed Apr 22 17:26:26 AEST 2015

Now, these timings are not under any circumstances meant to be typical of how long it might take to perform an nsrck -L6 against a client index of a similar size on a real NetWorker server. But the fact that there’s a jump in the time taken to execute an nsrck -L3 vs an nsrck -L6 should serve to highlight my point: it’s a nontrivial operation depending on the size of the indices (as are nsrim -X and the savegrp -O), and you must know how long these operations are going to take in your environment.

When planning NetWorker upgrades, I think it’s quite important to keep in mind the upgrade process is blocking: once you’ve started it, you really need to complete it before you can do any new backups or recoveries. (While some of those tasks above can be interrupted, you may equally find that an interruption should be followed by starting afresh.) So it becomes very important to know – in advance of when you’re actually going to perform the upgrade – just how long the various pre-upgrade steps are going to take. If not, you may end up in the situation of watching your change window or allowed outage time rapidly shrinking while an operation like nsrck -L6 is still running.

It may be that I’ve made the upgrade process sound a little daunting. In actual fact, it’s not: those of us who have been using NetWorker for close to two decades will recall just how pugnaciously problematic NetWorker v4 and v5 upgrades were with the older database formats*****. However, like all upgrades for critical infrastructure, NetWorker upgrades are going to be optimally hassle-free when you’re well prepared for the activities involved and the amount of time each activity will take.

So, if you’re planning on doing a NetWorker upgrade, make sure you plan the timings for the pre-upgrade maintenance steps … it’ll allow you to accurately predict the amount of time you need and the amount of effort involved.


* Finally of course, if you’re using VBA, you’ll possibly need to upgrade your EBR appliance and proxies, too.

** On the plus side, NetWorker is quite efficient at those maintenance operations compared to a lot of other backup products.

*** The guide doesn’t mention including ‘-l full’. However, if doing anything approaching a major version upgrade (e.g., 8.1 to 8.2), I believe you should do the index backup as a full one.

**** To execute this successfully, you optimally should have a group defined with all clients in it. Otherwise you’ll have to carefully run multiple groups until you’ve captured all clients.

***** For servers, I only go back as far as v4, so I can’t say what upgrades were like with v3 or lower.

Basics – Taking a turn about the filesystem


“Miss Eliza Bennet, let me persuade you to follow my example, and take a turn about the room. — I assure you it is very refreshing after sitting so long in one attitude.”

Jane Austen: Pride and Prejudice.

The NetWorker savegrp command has a lot of different command line options, but one which falls into that useful-for-debugging category for me has always been the -n option. This allows you to invoke the save commands for a group (or a single client in the group) in walk/don’t do mode.

While filesystems have become considerably more capable at self-repair and resilient towards minor corruption, there was a time in the past where you could encounter an operating system crash as a result of attempting to access a particularly corrupt file or part of the filesystem. Backups, of course, want to walk all the filesystems (unless you direct them otherwise), and so being able to see what NetWorker might do during a backup was helpful to diagnose such issues. (Even if it meant one more crash.)

These days, if a host being backed up by NetWorker via a filesystem agent gets a lot of changes during a day, you might simply be interested in seeing just how many files are going to be backed up.

The command is pretty straight forward:

# savegrp -nv [-c client] groupName

For instance, consider the following execution:

[root@orilla ~]# savegrp -nv -c mondas Servers
90528:savegrp: mondas:All level=incr
7236:savegrp: Group will not limit job parallelism
83643:savegrp: mondas:All started
savefs -s orilla -c mondas -g Servers -p -n -l full -R -v
mondas:/ level=incr, vers=pools, p=4
mondas:/d/01 level=incr, vers=pools, p=4
mondas:/boot level=incr, vers=pools, p=4
mondas:/d/backup level=incr, vers=pools, p=4
90491:savegrp: mondas:All succeeded.
83647:savegrp: Servers mondas:All See the file /nsr/logs/sg/Servers/832077 for command output
83643:savegrp: mondas:/ started
save -s orilla -g Servers -n -LL -f - -m mondas -t 1430050510 -o MODIFIED_ASOF_TIME:timeval=1430050506;RENAMED_DIRECTORIES:index_lookup=on;BACKUPTIME:lookup_range=1429877707:1430050510; -l incr -W 78 -N / /
83643:savegrp: mondas:/d/01 started
save -s orilla -g Servers -n -LL -f - -m mondas -t 1430050508 -o MODIFIED_ASOF_TIME:timeval=1430050506;RENAMED_DIRECTORIES:index_lookup=on;BACKUPTIME:lookup_range=1429877710:1430050508; -l incr -W 78 -N /d/01 /d/01
83643:savegrp: mondas:/boot started
save -s orilla -g Servers -n -LL -f - -m mondas -t 1430050507 -o MODIFIED_ASOF_TIME:timeval=1430050506;RENAMED_DIRECTORIES:index_lookup=on;BACKUPTIME:lookup_range=1429877709:1430050507; -l incr -W 78 -N /boot /boot
83643:savegrp: mondas:/d/backup started
save -s orilla -g Servers -n -LL -f - -m mondas -t 1430050509 -o MODIFIED_ASOF_TIME:timeval=1430050506;RENAMED_DIRECTORIES:index_lookup=on;BACKUPTIME:lookup_range=1429877708:1430050509; -l incr -W 78 -N /d/backup /d/backup
77562:savegrp: job (832078) host: mondas savepoint: / had WARNING indication(s) at completion
90491:savegrp: mondas:/ succeeded.
83647:savegrp: Servers mondas:/ See the file /nsr/logs/sg/Servers/832078 for command output
90491:savegrp: mondas:/boot succeeded.
83647:savegrp: Servers mondas:/boot See the file /nsr/logs/sg/Servers/832080 for command output
90491:savegrp: mondas:/d/01 succeeded.
83647:savegrp: Servers mondas:/d/01 See the file /nsr/logs/sg/Servers/832079 for command output
90491:savegrp: mondas:/d/backup succeeded.
83647:savegrp: Servers mondas:/d/backup See the file /nsr/logs/sg/Servers/832081 for command output
83643:savegrp: mondas:index started
save -s orilla -S -g Servers -n -LL -f - -m orilla -V -t 1429878349 -l 9 -W 78 -N index:147f6a46-00000004-5457fce2-5457fce1-0016b3a0-02efe8cc /nsr/index/mondas
128137:savegrp: Group Servers waiting for 1 jobs (0 awaiting restart) to complete.
90491:savegrp: mondas:index succeeded.
83647:savegrp: Servers mondas:index See the file /nsr/logs/sg/Servers/832082 for command output
* mondas:All savefs mondas: succeeded.
* mondas:/ suppressed 2038 bytes of output.
...snip...

You’ll see there the output reaches a point where NetWorker tells you “suppressed X bytes of output”. That’s a protection mechanism for NetWorker to prevent savegroup completion notifications growing to massive sizes. However, because we’ve used the verbose option, the output is captured – it’s just directed to the appropriate log file for the group. In this case, the output tells me I can check out the file /nsr/logs/sg/Servers/832078 to see the details of the root filesystem backup for the client mondas.

Checking that file, I can see what files would have been backed up:

[root@orilla Servers]# more /nsr/logs/sg/Servers/832078
96311:save: Ignoring Parallel savestreams per saveset setting due to incompatible -n/-E option(s)
75146:save: Saving files modified since Sun Apr 26 22:15:06 2015
/var/log/rpmpkgs
/var/log/secure
/var/log/audit/audit.log
/var/log/audit/
/var/log/lastlog
/var/log/cron
/var/log/wtmp
/var/log/maillog
/var/log/
/var/run/utmp
/var/run/
/var/lock/subsys/
/var/lock/
/var/spool/clientmqueue/qft3QI22Tg020700
/var/spool/clientmqueue/dft3QI22Tg020700
/var/spool/clientmqueue/
/var/spool/cups/tmp/
/var/spool/cups/
/var/spool/anacron/cron.daily
/var/spool/anacron/
/var/spool/
...snip...

This command only works for filesystem backups performed by the core NetWorker agent. It’s not compatible, for instance, with a database module or VBA – but regardless, it is the sort of debugging/analysis tool you should be aware of. (Forewarned is forearmed, and forearmed is a lot of arms… Ahem.)

Check out savegrp -n on a client/group when you have time to familiarise yourself with how it works. It’s reasonably straightforward and is a good addition to your NetWorker utility belt.
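If you want to try it, a minimal probe run looks something like this (a hedged example – adjust the client and group names to suit your environment):

# Probe what an incremental of client 'mondas' in group 'Servers' would
# back up, without writing any save sets. -n means don't save, -v gives
# the verbose output discussed above, and -c restricts the run to a
# single client.
$ savegrp -n -v -c mondas Servers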


New role


I’m pleased to say that on Monday I’ll be starting a new role. While I’ve worked closely with EMC for many a year as a partner, and more recently in a subcontracting position, come Monday that’ll all be changing…

…I’m joining EMC. I’ll be working in the Data Protection Solutions group as a sales engineer.

I’ve got to say (and not just because people from EMC will no doubt see this!) that I’m really looking forward to this role. It will allow me, more than ever before, to look holistically at the entire data protection spectrum. While I’ve always had an eye on the bigger picture of data protection, enterprise backup has always been the driving activity I’ve focused on. More than that, EMC is one of only a very small handful of vendors I’ve ever wanted to work for (and one of the others was Legato, so you might say I’m achieving two life goals with just one job) – so I’m going to be revved up from the start.

I’ll be continuing this blog, and with broader exposure to the entire data protection suite I’ll be working with at EMC, expect to see more coverage of those integration points, too.

It’ll be a blast!

Preston de Guise

Files and files and files


A while ago, I gave away a utility I find quite handy in lab and testing situations called genbf. If you’ll recall, it can be used to generate large files which are not susceptible to compression or deduplication. (You can find that utility here.)

At the time I mentioned another utility I use called generate-filesystem. While genbf is designed to produce potentially very large files that don’t yield to compression, generate-filesystem (or genfs2 as I’m now calling it) is designed to create a random filesystem for you. It’s not the same, of course, as taking, say, a snapshot copy of your production fileserver, but if you want a completely isolated lab with some random content to do performance testing against, it’ll do the trick nicely. In fact, I’ve used it (or predecessors of it) multiple times when I’ve blogged about block based backups, filesystem density and parallel save streams.

genfs2

Overall it produces files that don’t yield all that much to compression: a 26GB directory structure with 50,000 files created with it compressed down to just 25GB in a test I ran a short while ago. That’s where genfs2 comes in handy – you can create really dense, non-compressible test filesystems with almost no effort on your part. (Yes, 50,000 files isn’t necessarily dense, but that was just a small run.)
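If you want to check the compressibility of a tree you’ve generated yourself, a quick back-of-the-envelope comparison works (a hedged sketch – the path is just an example):

# Raw on-disk size of the generated tree, in kilobytes.
$ du -sk /d/06/test
# Size in bytes of the same tree as a gzipped tar stream.
$ tar cf - /d/06/test | gzip -c | wc -c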

It is, however, random by default in how many files it creates, and unless you give it an explicit file count limit, it can easily fill a filesystem if you let it run wild. You see, rather than having fixed limits for files and directories at each directory level, it works with upper and lower bounds (which you can override) and chooses a random number each time. It even randomly chooses how deeply it nests directories, again based on upper/lower limits you can override.

Here’s what the usage information for it looks like:

$ ./genfs2.pl -h
Syntax: genfs2.pl [-d minDir] [-D maxDir] [-f minFile] [-F maxFile] [-r minRecurse] [-R maxRecurse] -t target [-s minSize] [-S maxSize] [-l minLength] [-L maxLength] [-C] [-P dCsize] [-T mfc] [-q] [-I]

Creates a randomly populated directory structure for backup/recovery 
and general performance testing. Files created are typically non-
compressible.

All options other than target are optional. Values in parentheses beside
explanations denote defaults that are used if not supplied.

Where:

    -d minDir      Minimum number of directories per layer. (5)
    -D maxDir      Maximum number of directories per layer. (10)
    -f minFile     Minimum number of files per layer. (5)
    -F maxFile     Maximum number of files per layer. (10)
    -r minRecurse  Minimum recursion depth for base directories. (5)
    -R maxRecurse  Maximum recursion depth for base directories. (10)
    -t target      Target where directories are to start being created.
                   Target must already exist. This option MUST be supplied.
    -s minSize     Minimum file size (in bytes). (1 K)
    -S maxSize     Maximum file size (in bytes). (1 MB)
    -l minLength   Minimum filename/dirname length. (5)
    -L maxLength   Maximum filename/dirname length. (15)
    -P dCsize      Pre-generate random data-chunk at least dCsize bytes.
                   Will default to 52428800 bytes.
    -C             Try to provide compressible files.
    -I             Use lorem ipsum filenames.
    -T mfc         Specify maximum number of files that will be created.
                   Does not include directories in count.
    -q             Quiet mode. Only print updates to the file-count.

E.g.:

./genfs2.pl -r 2 -R 32 -s 512 -S 65536 -t /d/06/test

Would generate a random filesystem starting in /d/06/test, with a minimum
recursion depth of 2 and a maximum recursion depth of 32, with a minimum
filesize of 512 bytes and a maximum filesize of 64K.

Areas where this utility can be useful include:

  • …filling a filesystem with something other than /dev/zero
  • …testing anything to do with dense filesystems without needing huge storage space
  • …doing performance comparisons between block based backup and regular backups
  • …doing performance comparisons between parallel save streams and regular backups

This is one of those utilities I wrote once, over a decade ago, and have just done minor tweaks on here and there since. There’s probably a heap of areas where it’s not optimal, but it’s done the trick, and it’s done it fast enough for me. (In other words: don’t judge my programming skills based on the code – I’ve never been tempted to optimise it.) For instance, on a MacBook Pro 13″ writing to a 2TB LaCie Rugged external via Thunderbolt, the following command takes 6 minutes to complete:

$ ./genfs2.pl -T 50000 -t /Volumes/Storage/FSTest -d 5 -D 15 -f 10 -F 30 -q -I
Progress:
        Pre-generating random data chunk. (This may take a while.)
        Generating files. Standby.
         --- 100 files
         --- 200 files
         --- 300 files
         ...
         --- 49700 files
         --- 49800 files
         --- 49900 files
         --- 50000 files

Hit maximum file count (50000).

I don’t mind waiting 6 minutes for 50,000 files occupying 26GB. If you’re wondering what the root directory from this construction looks like, it goes something like this:

$ ls /Volumes/Storage/FSTest/
at-eleifend/
egestas elit nisl.dat
eget.tbz2
facilisis morbi rhoncus.7r
interdum
lacinia-in-rhoncus aliquet varius-nullam-a/
lobortis mi-malesuada aenean/
mi mi netus-habitant-tortor-interdum rhoncus.mov
mi-neque libero risus-euismod ante.gba
non-purus-varius ac.dat
quis-tortor-enim-sed-lorem pellentesque pellentesque/
sapien-in auctor-libero.anr
tincidunt-adipiscing-eleifend.xlm
ut.xls
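If you want to sanity-check a run like this afterwards, something along these lines does the job (a hedged sketch, reusing the path from my test):

# Confirm the file count matches the -T limit that was requested.
$ find /Volumes/Storage/FSTest -type f | wc -l
# Confirm the overall on-disk footprint.
$ du -sh /Volumes/Storage/FSTest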

Looking at the file/directory breakdown on GrandPerspective, you’ll see it’s reasonably evenly scattered:

grand perspective view

Since genfs2 doesn’t do anything with the directory you give it other than add random files to it, you can run it multiple times with different parameters – for instance, you might do an initial run to create 1,000,000 small files, then, if you want a mix of small and large files, execute it a few more times to scatter some much larger random files throughout the directory structure as well.
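A layered pair of runs might look like this (a hedged sketch – the path, counts and sizes are purely illustrative):

# First pass: a large population of small files (1KB to 8KB).
$ ./genfs2.pl -T 1000000 -s 1024 -S 8192 -t /d/06/test -q
# Second pass over the same tree: a handful of large files (100MB to 1GB).
$ ./genfs2.pl -T 200 -s 104857600 -S 1073741824 -t /d/06/test -q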

Now here’s the caution: do not, definitely do not run this on one of your production filesystems, or any filesystem where running out of space might cause a data loss or access failure.

If you’re wanting to give it a spin or make use of it, you can freely download it from here.

One target to rule them all


Introduction

Data Domain

It’s true there are some data types that broadly aren’t suitable for sending to Data Domain – any more than they’re suitable for sending to any other deduplication appliance or system. Large imaging and video files will yield minimal deduplication except over successive backups (assuming static data), and compressed and/or encrypted data aren’t well suited either.

But the majority of data within most organisations is suited for writing to Data Domain systems.

Years ago when EMC purchased Data Domain, I don’t think anyone anticipated just what they had in mind for the appliance. I certainly didn’t – and I’d been involved in the backup industry for probably 15 years at that point. Deduplication had been kicking around for several years, but it hadn’t been mainstreamed to the degree EMC has achieved.

The numbers practically speak for themselves – Data Domain holds an overwhelming lion’s share of the deduplication appliance space – but I’m not going to quote numbers here. I’m going to talk about the architectural vision of Data Domain.

As a target-only appliance, Data Domain represents a considerable advantage to any business that deploys it, but that’s just the tip of the iceberg. The real magic happens when we consider the simple fact that a Data Domain is not a dumb appliance. EMC have chosen to harness the platform to deliver maximum bang for buck for any company that walks down that path.

May the source be with you

Target based deduplication works brilliantly for drastically reducing the total amount of data stored, but it still results in the full data stream being sent. Avamar demonstrates this overwhelmingly – its source based deduplication backup process is remarkably efficient, and a powerfully attractive choice for many businesses, particularly those in the XaaS industry.

Data Domain’s Boost functionality extends its deduplication technology up to the origin of the data. For products like NetWorker, Avamar and VDP/VDPA, this goes right to the source. (For third party products such as NetBackup, it covers the media servers.)

If Boost had stopped at NetWorker and Avamar integration, it would have been a remarkably powerful efficiency hook for many businesses, but there’s more power to be had. The extension of Data Domain Boost to support enterprise applications such as Oracle, SQL Server, SAP and so on provides unparalleled extensibility in the backup space. It also means that businesses which have deployed other backup technologies, but leverage Data Domain deduplication in their data protection strategy, can get direct client deduplication performance for what are often their most mission critical systems and applications.

I’m the first to admit that I’ve spent years trying to convince DBAs to hand over control of their application backups to NetWorker administrators. It’s a discussion I’ve won as much as I’ve lost, but the Data Domain plugins for databases have proven one key lesson: when I’ve ‘lost’ that discussion it’s not been through lack of conviction, but through lack of process. DBAs are all for efficiencies in the backup process, but given the enterprise criticality of databases in so many organisations, much of the push back on backup centralisation has been from a lack of control of the process.

The Boost application plugins get past that by allowing a business to integrate its application backups into centralised backup storage while the DBAs retain highly granular control of the backup process through their agreed and trusted scheduling methods. Backup products offer scheduling, of course, but they’re not meant to be the bee’s knees of scheduling that you’ll find in products devoted solely to that purpose – and that’s what DBAs have mostly resisted giving up. (This, for what it’s worth, is the difference between app-centric aspects of backup and recovery and a decentralised backup ‘system’.)
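To make that concrete, the schedule can stay entirely in the DBA’s hands – a hedged illustration only, with a hypothetical script name and timing:

# DBA-owned crontab entry: the DBA decides when the backup runs, while
# the data still lands on deduplicated, centralised backup storage.
30 22 * * * /home/oracle/scripts/rman_backup.sh >> /home/oracle/logs/rman_backup.log 2>&1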

Here’s where we’re at with Data Domain – it now sits at a nexus in the Data Centre for data protection and nearline archival storage:

May the source be with you

(Yes, it’s even very well suited for archival workloads.)

NetWorker, Avamar, VDP/VDPA, Client Direct, Enterprise Apps – I could go on – Data Domain sits at the centre ready to receive the data you want to send to it.

But that diagram isn’t quite complete. To truly get the maximised efficiency out of Data Domain, the picture really should look more like this:

Protecting the Protection

That’s right – logically, a Data Domain solution will have at least two Data Domains in it, so that whatever you’re protecting via the Data Domain will itself be protected. Now, by itself, Data Domain offers excellent protection for the data you’re storing, and contrary to what most people first think of on this front, RAID-6 storage protection is just the tip of the iceberg. RAID-6 is nice – it protects you from two simultaneous drive failures. On top of that, though, you have the Data Invulnerability Architecture that you’ll hear EMC folks talk about quite regularly – that’s the magic sauce. The Data Domain doesn’t just sit there storing your data: it stores it, checks it, reads it back and checks it again as part of regular verification. (If you want to compare it to tape, imagine having a tape library big enough to store every tape you keep for retention, constantly loading all the tapes and confirming all the data can be read back.)

But we all know in the data protection world that you still need the added protection of a second copy of that data, whether that’s for compliance or for true disaster protection. In terms of efficiency, the best way to get that secondary copy is via the globally deduplicated replication offered between two Data Domains. (For what it’s worth, this is where some companies make the mistake of deploying tape as their secondary copy from an original backup target of Data Domain: what’s the point of deploying efficient deduplication if the first thing you do is rehydrate all the content again?)

Aside: Coming back to encryption and compression

Earlier I said that compressed and encrypted workloads aren’t necessarily suited to Data Domain. That’s true, but that usually reflects an opportunity to revisit the process and thinking behind those workloads.

Compression is typically used in a data streaming activity for data protection because of a requirement to minimise the amount of data going across the network. Boost eliminates that need by doing something better than compression at the client side – deduplication. Deduplication doesn’t just compress the original data; it substantially reduces it by not even bothering to send data that already exists at the target (there’s a small illustration of this after the list below). For instance, if I turn my attention to Oracle, the two most common reasons why DBAs create compressed Oracle backups are:

(a) They’re writing them to primary storage and trying to minimise the footprint, or

(b) They’re writing them to NAS or some other form of network storage, and want to minimise the amount of data sent over busy links.

Both of those are squarely addressed by Data Domain:

  • For (a), the footprint is automatically reduced by writing the backup in uncompressed format to the Data Domain, which handles the deduplication automatically. In fact, it’ll be a lot more space efficient than, say, keeping the three most recent database backups on Tier-1/primary storage.
  • For (b), only unique data is sent over the network, and Boost compresses that data before sending it, so you still end up with a more efficient network transfer than writing a compressed copy.
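And here’s why pre-compressed data defeats deduplication in the first place – a hedged illustration (the filenames are arbitrary): two almost-identical compressible files produce gzip streams that diverge from the point of change onward, so a deduplication appliance finds few, if any, common segments between them.

# Create two highly compressible files differing by one byte at the start.
$ seq 1 1000000 > base.dat
$ cp base.dat copy.dat
$ printf 'x' | dd of=copy.dat bs=1 seek=0 conv=notrunc
# Compress both, then count how many bytes differ between the outputs.
$ gzip -c base.dat > base.gz
$ gzip -c copy.dat > copy.gz
$ cmp -l base.gz copy.gz | wc -l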

Encryption might be considered a trickier subject, but it’s not really. There are two types of encryption a business might require – at rest, and in-flight. Data Domain has supported encryption at rest for quite a long time, and the recent addition of in-flight encryption completes that piece of the puzzle. (The in-flight encryption is integrated in such a way that it still allows for local/source deduplication and the associated pre-send compression, too.)

What all this means

When EMC first acquired Data Domain, they acquired a solid product that had already earned excellent customer trust through high reliability and performance. While both of those qualities have continued to grow (not to mention capacity … have you seen the specs on the Data Domain 9500?), they alone don’t make for a highly extensible product – just a reliable big bucket of storage. The extensibility comes from the vertical integration right up into the application stack, and the horizontal integration across a multitude of use cases.

Last year’s survey results revealed a very high number of NetWorker environments leveraging Data Domain, but if we step back from a single-product focus, what we see is that Data Domain is a strategic investment for the enterprise, able to be utilised across a plethora of scenarios.

So there are two lessons – one for those with Data Domain already, and one for those preparing to jump into deduplication. If you’ve already got Data Domain in your environment, start looking at its integration points and talk to EMC or your supplier about where else it can offer synergies; if you’re looking at deploying, keep in mind that it’s a highly flexible appliance capable of fitting into multiple workloads.

Either way, that’s how you achieve an excellent return on investment.
