Category Archives: July

Volume Clone Split Extremely Slow in clustered Data ONTAP

Problem

My colleague had been dealing with growth on an extremely large volume (60 TB) for some time. After discussions with the business groups, it was agreed to split the volume into two separate volumes. The largest directory identified was 20 TB, which could be moved to its own volume. Discussions then started on the best way to get this job completed quickly.

Possible Solutions

  • robocopy / securecopy the directory to another volume. Past experience says this could be a lot more time-consuming.
  • ndmpcopy the large directory to a new volume. The ndmpcopy session needs to be kept open; if the job fails during the transfer, we have to restart from the beginning. Also, there are no progress updates available.
  • clone the volume, delete the data not required, split the clone. This seemed to be a nice solution.
  • vol move. We didn’t want to copy the entire 60 TB volume and then delete data, so we didn’t consider this solution.

So, we agreed on the third solution (clone, delete, split).

What actually happened

snowy-mgmt::> volume clone split start -vserver snowy -flexclone snowy_vol_001_clone
Warning: Are you sure you want to split clone volume snowy_vol_001_clone in Vserver snowy ?
{y|n}: y
[Job 3325] Job is queued: Split snowy_vol_001_clone.
 
Several hours later:
snowy-mgmt::> volume clone split show
                                Inodes              Blocks
                        --------------------- ---------------------
Vserver   FlexClone      Processed      Total    Scanned    Updated % Complete
--------- ------------- ---------- ---------- ---------- ---------- ----------
snowy snowy_vol_001_clone          55      65562    1532838    1531276          0
 
Two Days later:
snowy-mgmt::> volume clone split show
                                Inodes              Blocks
                        --------------------- ---------------------
Vserver   FlexClone      Processed      Total    Scanned    Updated % Complete
--------- ------------- ---------- ---------- ---------- ---------- ----------
snowy snowy_vol_001_clone         440      65562 1395338437 1217762917          0

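This is a huge problem. After two days, only 440 of 65562 inodes had been processed and the split still showed 0% complete. At this rate, the split operation would never complete in time.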
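
For a sense of scale, the figures in the second output can be extrapolated. The following is only a rough sketch: it assumes about 48 hours had elapsed and that inodes are processed at a constant rate, which is unlikely to hold exactly because inodes differ in size. Get-SplitEstimate is a hypothetical helper written for this illustration, not a NetApp cmdlet.

```powershell
# Hypothetical helper: linear extrapolation of clone split progress.
function Get-SplitEstimate {
    param(
        [Parameter(Mandatory)][long]$InodesProcessed,
        [Parameter(Mandatory)][long]$InodesTotal,
        [Parameter(Mandatory)][double]$ElapsedDays
    )
    # Assumption: the scanner keeps processing inodes at the observed rate
    $ratePerDay = $InodesProcessed / $ElapsedDays
    [pscustomobject]@{
        PercentComplete = [math]::Round(100 * $InodesProcessed / $InodesTotal, 2)
        InodesPerDay    = $ratePerDay
        TotalDays       = [math]::Round($InodesTotal / $ratePerDay)
    }
}

# Values from the "volume clone split show" output taken two days in
$estimate = Get-SplitEstimate -InodesProcessed 440 -InodesTotal 65562 -ElapsedDays 2
$estimate
```

Under these assumptions the split is well under 1% complete, which is why the % Complete column still shows 0, and it would need roughly 298 days in total to finish.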

What we found

We found the problem was with the way clone split works. Data ONTAP uses a background scanner to copy the shared data from the parent volume to the FlexClone volume. The scanner has one active message at any time, and that message processes only one inode, so the split tends to be faster on a volume with fewer inodes. Also, the background scanner runs at a low priority and can take a considerable amount of time to complete. This means that for a large volume with millions of inodes, the split operation will take a huge amount of time.

Workaround

“volume move a clone”

snowy-mgmt::*> vol move start -vserver snowy -volume snowy_vol_001_clone -destination-aggregate snowy01_aggr_01
  (volume move start)
 
Warning: Volume will no longer be a clone volume after the move and any associated space efficiency savings will be lost. Do you want to proceed? {y|n}: y

Benefits of using vol move on a FlexClone:

  • Faster than FlexClone split.
  • Data can be moved to different aggregate or node.

Reference

FAQ – FlexClone split

Error Handling in PowerShell Scripts

Introduction

I have been writing PowerShell scripts to solve various problems efficiently. I already incorporate error handling in my scripts, but I recently refreshed my knowledge and am sharing it here with fellow IT professionals. When running PowerShell cmdlets, you encounter two kinds of errors, terminating and non-terminating:

  • Terminating : These halt the function or operation, e.g. a syntax error or running out of memory. They can be caught and handled.

  • Non Terminating : These allow the function or operation to continue, e.g. file not found, permission issues, or an empty file; execution continues with the next piece of code. They are more difficult to capture.

So how do you capture non-terminating errors in a function?

PowerShell provides various variables and actions to handle errors and exceptions:

  • $ErrorActionPreference : preference variable which applies to all cmdlets in the shell or the script
  • -ErrorAction : common parameter which applies only to the cmdlet on which it is specified
  • $Error : whenever an exception occurs, it is added to the $Error variable. By default the variable holds 256 errors. The $Error variable is an array where the first element is the most recent exception. As new exceptions occur, the new one pushes the others down the list.
  • -ErrorVariable : accepts the name of a variable, and if the command generates an error, it is placed in that variable.
  • Try .. Catch constructs : the Try part contains the command or commands that you think might cause an error. To catch a non-terminating error, you have to set the command’s -ErrorAction to Stop. The Catch part runs if a terminating error occurs within the Try part.

-ErrorAction : Use the ErrorAction parameter to treat non-terminating errors as terminating. Every PowerShell cmdlet supports ErrorAction. PowerShell halts execution on terminating errors. For non-terminating errors, we have the option to tell PowerShell how to handle the situation.

Available Choices

  • SilentlyContinue : error messages are suppressed and execution continues
  • Stop : forces execution to stop, behaves like a terminating error
  • Continue : default option. Errors are displayed and execution continues
  • Inquire : prompts the user for input to decide whether to proceed
  • Ignore : the error message is suppressed and the error is not added to the $Error variable

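function Invoke-SshCmd ($cmd){
    try {
        # -ErrorAction Stop turns a non-terminating error into a terminating one
        Invoke-NcSsh $cmd -ErrorAction Stop | Out-Null
        "The command completed successfully"
    }
    catch {
        # Write-ErrMsg is assumed to be a helper function defined elsewhere in the script
        Write-ErrMsg "The command did not complete successfully"
    }
}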
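
The effect of -ErrorAction can be tried without a NetApp cluster. The sketch below uses Get-Item on a path that should not exist, which raises a non-terminating error. Test-ErrorActionDemo is a hypothetical function name used only for this illustration.

```powershell
function Test-ErrorActionDemo {
    param(
        [Parameter(Mandatory)][string]$Path,
        [System.Management.Automation.ActionPreference]$Action = 'Continue'
    )
    try {
        # A missing path is a non-terminating error unless -ErrorAction is Stop
        Get-Item -Path $Path -ErrorAction $Action | Out-Null
        'try block finished'
    }
    catch {
        'catch block ran'
    }
}

# A path that should not exist: a fresh GUID under the temp directory
$missing = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())

$silent = Test-ErrorActionDemo -Path $missing -Action SilentlyContinue
$stop   = Test-ErrorActionDemo -Path $missing -Action Stop
```

With SilentlyContinue the try block runs to the end and the catch block is skipped; with Stop the same error is routed to the catch block.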

$ErrorActionPreference : It is also possible to treat all errors as terminating using the ErrorActionPreference variable. You can do this either for the script you are working with or for the whole PowerShell session.
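
As a minimal sketch of limiting the preference to one piece of code, the variable can be assigned inside a function so that the caller's setting is left alone. Get-ItemStrict is a hypothetical name, and Get-Item on a missing path again stands in for any cmdlet that raises a non-terminating error.

```powershell
function Get-ItemStrict {
    param([Parameter(Mandatory)][string]$Path)

    # Assigning inside the function creates a local copy, so the change
    # applies here (and to commands called from here) but not to the caller.
    $ErrorActionPreference = 'Stop'
    try {
        Get-Item -Path $Path | Out-Null
        'found'
    }
    catch {
        "caught: $($_.Exception.Message)"
    }
}

$before  = $ErrorActionPreference
$missing = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$result  = Get-ItemStrict -Path $missing
$after   = $ErrorActionPreference
```

No -ErrorAction is needed on the individual cmdlet here, and $before and $after hold the same value because the assignment was local to the function.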

-ErrorVariable : The example below captures the error in the variable “$x”

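function Invoke-SshCmd ($cmd){
    try {
        # Pass the variable name without "$"; errors are collected in $x
        Invoke-NcSsh $cmd -ErrorVariable x -ErrorAction SilentlyContinue | Out-Null
        # SilentlyContinue never triggers catch, so check $x explicitly
        if ($x.Count -gt 0) {
            Write-ErrMsg "The command did not complete successfully : $($x[0].Exception.Message)"
        }
        else {
            "The command completed successfully"
        }
    }
    catch {
        # Only terminating errors reach this block
        Write-ErrMsg "The command did not complete successfully : $($_.Exception.Message)"
    }
}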

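  • $x.InvocationInfo : provides details about the context in which the command was executed
  • $x.Exception : holds the exception object, including the error message string
  • $x.Exception.InnerException : holds the underlying exception, if there is a further underlying problem
  • $x.Exception.Message : the error message text
  • $x.Exception.ItemName : the item that caused the error (only on exception types that define it, such as the one raised for a missing file)
  • $($x.Exception.Message) : subexpression syntax for embedding the error message in a double-quoted string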
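
These properties can be explored on any captured error. The sketch below, which assumes nothing beyond built-in cmdlets, provokes an error with Get-Item on a path that should not exist and collects a few of the fields into an object. The variable names are illustrative only.

```powershell
$missing = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())

# Note: the variable name is given without the leading "$"
Get-Item -Path $missing -ErrorVariable x -ErrorAction SilentlyContinue | Out-Null

if ($x.Count -gt 0) {
    $record  = $x[0]
    $details = [pscustomobject]@{
        Command  = $record.InvocationInfo.MyCommand.Name
        Type     = $record.Exception.GetType().Name
        Message  = $record.Exception.Message
        HasInner = $null -ne $record.Exception.InnerException
    }
    $details | Format-List
}
```

Indexing with $x[0] picks a single error record, which matters when one command produces several errors.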

$Error : The example below captures the error from the default $Error variable

function Invoke-SshCmd ($cmd){
    try {
        Invoke-NcSsh $cmd -ErrorAction Stop | Out-Null
        "The command completed successfully"
    }
    catch {
        # $Error[0] is the most recent error; $( ) is required to expand a property inside a string
        Write-ErrMsg "The command did not complete successfully : $($Error[0].Exception)"
    }
}
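
The ordering of $Error can be checked with a short experiment. This sketch clears the list, triggers two errors on paths that should not exist, and then reads the newest and the previous entry. The path names are made up for the illustration.

```powershell
$tmp    = [System.IO.Path]::GetTempPath()
$first  = Join-Path $tmp ('first-'  + [guid]::NewGuid())
$second = Join-Path $tmp ('second-' + [guid]::NewGuid())

# Start from an empty list so the indexes below are predictable
$Error.Clear()
Get-Item -Path $first  -ErrorAction SilentlyContinue
Get-Item -Path $second -ErrorAction SilentlyContinue

# Index 0 is the most recent error; older errors move down the list
$newest   = $Error[0].Exception.Message
$previous = $Error[1].Exception.Message
```

Even with SilentlyContinue the errors are recorded in $Error; only Ignore keeps them out of the list.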

Query OnCommand Performance Manager (OPM) Database using PowerShell

Introduction

OnCommand Performance Manager (OPM) provides performance monitoring and event root-cause analysis for systems running clustered Data ONTAP software. It is the performance management part of OnCommand Unified Manager. OPM 2.1 is well integrated with Unified Manager 6.4. You can view and analyze events in the Performance Manager UI or view them in the Unified Manager Dashboard.

Performance Manager collects current performance data from all monitored clusters every five minutes (at 5, 10, 15 minutes past the hour, and so on). It analyzes this data to identify performance events and potential issues. It retains 30 days of five-minute historical performance data and 390 days of one-hour historical performance data. This enables you to view very granular performance details for the current month, and general performance trends for up to a year.

Accessing the Database

Using PowerShell, you can query the OPM MySQL database and retrieve information to create performance charts in Microsoft Excel or other tools. In order to access the OPM database, you’ll need a user created with the “Database User” role.

The following databases are available in OPM 2.1:

  • information_schema
  • netapp_model
  • netapp_model_view
  • netapp_performance
  • opm

Of these, the two databases with the most relevant information are “netapp_model_view” and “netapp_performance”. The “netapp_model_view” database has tables that define the objects, and the relationships among the objects, for which performance data is collected, such as aggregates, SVMs, clusters and volumes. The “netapp_performance” database has tables that contain the raw data collected as well as periodic rollups used to quickly generate the graphs OPM presents through its GUI.

Refer to the MySQL function in my previous post on Querying OCUM Database using Powershell to connect to the OPM database.

Understanding Database

OPM assigns each object (node, cluster, LIF, port, aggregate, volume etc.) a unique ID. These IDs are independent of the IDs in the OCUM database. They are stored in tables in the “netapp_model_view” database. You can join the various tables on the object IDs.

Actual performance data is collected and stored in tables in the “netapp_performance” database. All table names have the prefix “sample_”. Each table row contains the OPM object ID for the object (node, cluster, LIF, port, aggregate, volume etc.), the timestamp of the collection and the raw data.

A few useful database queries

The example below queries the database to retrieve performance counters of a node.

Connect to the “netapp_model_view” database and list the objid and name from the node table

MySQL -Query "select objid,name from node" | Format-Table -AutoSize

Connect to the “netapp_performance” database and export cpuBusy, cifsOps and avgLatency from the sample_node table

MySQL -Query "select objid,Date_Format(FROM_UNIXTIME(time/1000), '%Y:%m:%d %H:%i') AS Time,cpuBusy,cifsOps,avgLatency from sample_node where objid=2" | Export-Csv -Path E:\snowy-01.csv -NoTypeInformation

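Once the counters are exported, the CSV can be post-processed in PowerShell. The sketch below is illustrative only: the two rows are made-up sample values in the shape produced by the export query, not real measurements, and in practice you would read the exported file with Import-Csv instead.

```powershell
# Made-up sample rows; replace with: Import-Csv -Path E:\snowy-01.csv
$rows = @'
"objid","Time","cpuBusy","cifsOps","avgLatency"
"2","2016:07:01 10:00","20","100","1.5"
"2","2016:07:01 10:05","40","300","2.5"
'@ | ConvertFrom-Csv

# Parse the 'yyyy:MM:dd HH:mm' strings produced by Date_Format in the query
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$samples = foreach ($row in $rows) {
    [pscustomobject]@{
        Time    = [datetime]::ParseExact($row.Time, 'yyyy:MM:dd HH:mm', $culture)
        CpuBusy = [double]$row.cpuBusy
    }
}

# Average and peak CPU utilisation across the samples
$stats = $samples | Measure-Object -Property CpuBusy -Average -Maximum
```

Converting the Time column to DateTime values makes it straightforward to sort, filter by time window or group by hour before charting.
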
Integrate OnCommand Performance Manager with OnCommand Unified Manager

OnCommand Performance Manager is the performance management component of Unified Manager. Both products are self-contained; however, they can be integrated so that all performance events can be viewed from the Unified Manager Dashboard.

In this post we deploy a new instance of the Performance Manager vApp on an ESXi host and integrate it with a running instance of Unified Manager.

Software components used:

  • OnCommand Unified Manager Version 6.2P1
  • OnCommand Performance Manager Version 2.0.0RC1

Setup Performance Manager

Import the OnCommandPerformanceManager-netapp-2.0.0RC1.ova file.

After the OVA file is imported, you may face issues powering on the vApp. This is because the vApp has CPU and memory reservations set. In a lab environment with limited resources, you can remove the reservations to boot the vApp.

After the vApp boots up, you need to install VMware Tools to proceed further.

As the installation progresses, the setup wizard runs automatically to configure the time zone and networking (static/dynamic), create the maintenance user, generate an SSL certificate and start the Performance Manager services. Once complete, log in to the Performance Manager console and verify the settings (network, DNS, time zone).

Now log in to the Performance Manager web UI and complete the setup wizard. Do not enable AutoSupport for a vApp deployed in a lab environment.

Open the Administration tab (top right corner).

Setup Connection with Unified Manager

To view performance events in the Unified Manager Dashboard, a connection between Performance Manager and Unified Manager must be set up.

Setting up a Connection includes creating a specialized Events Publisher user in the Unified Manager web UI and enabling the Unified Manager server connection in the maintenance console of the Performance Manager server.

  • In the Unified Manager web UI, click Administration -> Manage Users
  • In the Manage Users page, click Add
  • In the Add User dialog box, select Local User for type and Event Publisher for role, and enter the other required information
  • Click Add

Connect Performance Manager to Unified Manager from the vApp maintenance console.

This completes the integration with Unified Manager. You can integrate multiple Performance Manager servers with a single Unified Manager server. When Performance Manager generates performance events, they are passed on to the Unified Manager server and can be viewed on the Unified Manager Dashboard. The admin can therefore monitor one window instead of logging in to multiple Performance Manager web UIs.