*** ARCHITECTURE ***
We can simplify it a bit to three layers. From bottom to top:

  • At the base of it all you have front-end connectivity: network services with NICs for Ethernet and Fibre Channel, TCP/IP offload engines, etc. This layer can be subdivided into two groups: file semantics (NFS, CIFS, HTTP, DAFS) and block/LUN semantics (FCP, iSCSI).
  • Above that sits the OS and its main elements: WAFL (Write Anywhere File Layout), NVRAM, and snapshots. WAFL spreads writes across many disks for speed and reliability. Non-Volatile RAM caches transactions in battery-backed memory before data is written to disk - essentially a write cache. And finally snapshots: a fast and simple file backup method, taken online without interruption, up to 255 per volume.
  • The last thing, on the back end, is what connects to the physical disk shelves: the RAID manager, which talks to the raw storage over Fibre Channel. RAID-DP stores data on data disks and parity on two dedicated parity disks. It is claimed to be faster than other RAID implementations because writes are staged in NVRAM before going to disk.

More on WAFL:

  • WAFL is Write Anywhere File Layout. Technically it is not a single filesystem but a layer that supports multiple filesystems - the "top half." This part also manages permissions: who can do what, who created a file, who can view it. The "bottom half" is physical disk management - which disks are part of which RAID group, arranging data for maximum read and write performance, snapshots, remote mirrors, cloning, de-duplication, thin provisioning, and so on.
  • WAFL is a block-based file system that uses inodes to describe files. It uses 4 KB blocks with no fragments. Each WAFL inode contains 16 block pointers to indicate which blocks belong to the file. For very small files, data is stored in the inode itself in place of the block pointers. For files smaller than 64 KB the 16 block pointers point directly to data blocks; for larger files they point to indirect blocks.
  • WAFL stores meta-data in files. WAFL's three meta-data files are the inode file, which contains the inodes for the file system; the block-map file, which identifies free blocks; and the inode-map file, which identifies free inodes. Keeping meta-data in files allows WAFL to write meta-data blocks anywhere on disk. This is the origin of the name WAFL.
  • Hmmm... The write-anywhere design allows WAFL to operate efficiently with RAID by scheduling multiple writes to the same RAID stripe whenever possible, avoiding the 4-to-1 write penalty that RAID incurs when it updates just one block in a stripe.

REF: http://media.netapp.com/documents/wp_3002.pdf

  • WAFL doesn't need a file system consistency check (fsck) after a crash, so the whole system boots faster. From Data ONTAP 8.0 it supports transparent compression, but no native encryption (the 3rd-party Decru DataFort does that). Wow: snapshot performance is important because WAFL creates a snapshot (consistency point) every few seconds to allow quick recovery after unclean system shutdowns.
  • Snapshots are the most salient feature. They are read-only by definition; FlexClones (from the 7G release) provide writable clones.
  • Another important feature is that it supports both UNIX and Windows directory models (and permissions and ACLs) - and it can even support both on a per-file basis. Also, files can be accessible via both NFS and CIFS at the same time!
  • It automatically fragments data by writing with "temporal locality", and writes metadata together with user data. Temporal locality, an example of locality of reference, means data accessed within a small time window; the other type, spatial locality, means data located close together. So WAFL stores data written/accessed around the same time close together on disk, even if it is logically scattered.
  • Wow, from 7G you can run a command called "reallocate" to defragment, and it can now be scheduled. Before it was "wafl scan reallocate" and only available in advanced privilege mode.


*** INTRO ***

System status commands:
* sysconfig -v <- show hardware. Shows the ONTAP version, system ID for support, serial number, and hardware info including slots and what's in them (motherboard, network cards, Remote Management Controller, FC Host Adapter with the list of ports hooked up to loops, and the list of disks available via each port with raw disk sizes). One of the slots should also have the NVRAM card and show how much cache memory you have.
* sysconfig -r <- shows higher-level info such as volumes and which RAID groups and plexes they are on. You'll also see the RAID structure, i.e. which drive is data and which is parity. Drive IDs are host_adapter.device_id, e.g. 4a.16. It also shows the shelf number and the bay position within it! Neat.
===== SYSCONFIG-R =====
Aggregate hr_aggr (online, raid_dp) (block checksums)
  Plex /hr_aggr/plex0 (online, normal, active)
    RAID group /hr_aggr/plex0/rg0 (normal)
      RAID Disk   Device      HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      ---------   ------      ------------- ---- ---- ---- ----- --------------    --------------
      dparity     3a.20       3a    1   4   FC:B   -  FCAL 10000 34000/69632000    34276/70197544
      parity      3a.33       3a    2   1   FC:B   -  FCAL 10000 34000/69632000    34276/70197544
      data        3a.22       3a    1   6   FC:B   -  FCAL 10000 34000/69632000    34276/70197544
      data        3a.34       3a    2   2   FC:B   -  FCAL 10000 34000/69632000    34276/70197544
      data        3a.23       3a    1   7   FC:B   -  FCAL 10000 34000/69632000    34276/70197544
Volume vol0 (online, raid4) (block checksums)
  Plex /vol0/plex0 (online, normal, active)
    RAID group /vol0/plex0/rg0 (normal)
      RAID Disk   Device      HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      ---------   ------      ------------- ---- ---- ---- ----- --------------    --------------
      parity      3a.16       3a    1   0   FC:B   -  FCAL 10000 34000/69632000    34276/70197544
      data        3a.17       3a    1   1   FC:B   -  FCAL 10000 34000/69632000    34276/70197544
Spare disks
(...)
NOTES: notice that vol0 is on its own dedicated two-disk RAID4 traditional volume (Sporting Index!).
Finally, spare disks and broken disks are listed.
BTW vol0 is the system (root) volume, you know that.
Is "vol status -r" the same as sysconfig -r?
sysconfig -c <- check whether all hardware firmware is at a good version for this Data ONTAP release:
===== SYSCONFIG-C =====
sysconfig: There are no configuration errors.
===== SYSCONFIG-A =====
      NetApp Release 7.0RC3: Wed Nov 17 23:47:10 PST 2004
      System ID: 0101164287 (troy)
      System Serial Number: 110242511 (troy)
      System Rev: A0
      Backplane Part Number: 104-00009
      Backplane Rev: A0
      Backplane Serial Number: 110242511
      slot 0: System Board 2782 MHz (NetApp System Board VIII B0)
                Model Name:         FAS980
                Part Number:        110-00024
                Revision:           B0
                Serial Number:      G7GLH2AE160026
                Firmware release:   4.2.3_i2
                BMC info:           IPMI v1.5, BMC rev 0.20, SDR Version 00.20
                Processors:         4
                Processor ID:       0xf25
                Microcode Version:  0x11
                Memory Size:        8192 MB
      slot 0: 10/100 Ethernet Controller IV
            e0 MAC Address:     00:a0:98:01:38:de (100tx-fd-up)
                Device Type:        Rev 2
                        memory mapped I/O base 0xa0000000, size 0x20000
                        memory mapped I/O base 0xa0020000, size 0x20000
                        I/O base 0xffe0, size 0x20
(...)
      slot 3: FC Host Adapter 3a (Dual-channel, QLogic 2312 (2342) rev. 2, 64-bit, L-port, )
            Firmware rev:     3.2.22
            Host Loop Id:     7     FC Node Name:     2:000:00e08b:121a1a
            Cacheline size:   16    FC Packet size:   2048
            SRAM parity:      Yes   External GBIC:    No
            Link Data Rate:   2 Gbit
                 16: NETAPP   X270_SCHT6036F10 NA05  34.0GB 520B/sect (3JA72TE6000074297C3J)
                 17: NETAPP   X270_SCHT6036F10 NA05  34.0GB 520B/sect (3JA73557000074294NGN)
                 18: NETAPP   X270_SCHT6036F10 NA05  34.0GB 520B/sect (3JA72V8S00007429R0D2)
(...)
===== SYSCONFIG-D =====
Device          HA    SHELF BAY CHAN    Disk Vital Product Information
----------      --------------- -----   ------------------------------
3a.16           3a    1   0     FC:B    3JA72TE6000074297C3J
3a.17           3a    1   1     FC:B    3JA73557000074294NGN
3a.18           3a    1   2     FC:B    3JA72V8S00007429R0D2
(...)


Configuring the box: Three ways to configure the device.

  1. Options configured with the "options" command are automatically added to the registry and in most cases persistent! The registry is kept in /etc/registry, with /etc/registry.default, /etc/registry.lastgood and /etc/registry.bck alongside it. NOTE: one registry option that is not persistent is autosupport.doit, which triggers a test autosupport email and report.
  2. The second way to set options is the "vol options" command, e.g. "vol options vol_name option [value]".
  3. Another set of options is configured by editing files: /etc/rc, /etc/hosts.equiv, /etc/dgateways, /etc/hosts. Config files are only read at boot! The most important is /etc/rc: it sets up network interfaces, NFS exports and other startup commands. You should back it up manually by copying it. I only used the CIFS ETC$ share, but you can also NFS-mount /etc/ to edit it, if you have the NFS license. If you mess up the file and lose access, fix things from the console with ifconfig and exportfs. A small example of each method is below.
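A quick sketch of the three methods side by side (the volume name vol1 below is made up; the commands themselves are the ones mentioned in these notes):
    options autosupport.doit test     <- method 1: "options"; this particular one is not persistent
    options telnet.enable on          <- method 1: a persistent one, lands in the registry
    vol options vol1 nosnap on        <- method 2: per-volume option (vol1 is a made-up volume)
    rdfile /etc/rc                    <- method 3: read a config file from the console
    wrfile /etc/rc                    <- method 3: overwrite it (wrfile replaces the whole file, be careful)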


*** ADMINISTRATION ***
Some main ways to connect and notes on configuration below.
* Telnet: 

    options telnet.enable on
    options telnet.access [*|-|hostname|IP]
    options autologout.telnet.enable [on]
    ...
To log out from a telnet session use Ctrl-D or type "logout telnet"
* RSH: 
    options rsh.enable on
    options rsh.access all/none/*/host=ccitlucas
* FilerView: the web interface. Allows only one session from the web interface.
* Console: the console gives you command history and line editing. Some useful shortcuts:
Ctrl-E cursor to end of line
Ctrl-A to beginning
Ctrl-K del all to the right of the cursor
Ctrl-U del the whole line so that you don't have to hold backspace
Ctrl-P recall the previous command from history (not page up)

Admin users: 
you can create multiple admin users. Commands:
    * useradmin useradd myusername
    * useradmin userdel myusername
    * useradmin userlist <- show all admins
    * passwd <- wow, good old passwd command, but only for the current user. For other users, specify the password when adding them.
NOTE: I have a CIFS license and use domain users for administration.

Trusted hosts: list the hosts that are allowed to administer the filer. If a hacker gets the username/password they still won't be able to log in from anywhere else.
    * options trusted.hosts list_of_up_to_five_with_commas|*|-    <- "-" means no hosts

Admin host: Has access to the root volume over NFS and CIFS.

Autosupport: emails NetApp tech support. "options autosupport.enable on". Needs a mail host with SMTP; the admin host is the default one. Messages are priority tagged: EMERGENCY, ALERT, etc. Triggered automatically by a number of events, such as a failed disk, overheating, reboot, cluster takeover of the partner, etc.
Includes output from a bunch of commands, including options and all the sysconfig variants I've mentioned above.
NOTE: includes /etc/serialnum <- the system serial number
    * options autosupport.enable on; autosupport.mailhost takes up to five SMTP relay hosts (comma separated). A short config sketch is below.
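A minimal setup sketch - the mail host and recipient address are invented, and autosupport.mailhost / autosupport.to are the option names I believe hold the SMTP relays and the extra recipients:
    options autosupport.enable on
    options autosupport.mailhost smtp.example.com          <- up to five hosts, comma separated (made-up host)
    options autosupport.to storage-team@example.com        <- local recipients besides NetApp
    options autosupport.doit test                          <- send a test message now (not persistent)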

*** MANAGING DISKS ***
RAID groups: DP is double parity, minimum 3 drives (one data plus two parity). A group can have 2-28 disks. With too many disks in one group you have a higher chance of a double disk failure within it.

A disk failed? To find out which one use -> sysconfig -r
Hot spares are not assigned to any RAID group (they are global).
If a disk fails the system will operate in degraded mode for only 24h by default, then shut down. Change with:
    * options raid.timeout X <- in hours
Double disk failure = data loss on RAID4. RAID-DP survives two failed disks in a group: the second parity disk holds diagonal parity, it is not just a copy of the first.
Disk sizes: you can mix them, but performance degrades. Parity and hot-spare disks have to be at least as large as the largest data disk in the RAID group.

RAID options:
    * options raid.timeout <- the important degraded-mode shutdown interval (hours)
    * options raid.reconstruct.perf_impact <- how many resources a rebuild may use (low|medium|high)
    * options raid.scrub.enable <- turn on to have the device scrub drives automatically at night (Sunday 1 AM by default); raid.scrub.perf_impact works like the reconstruct one.
Scrubbing looks for parity/checksum errors and fixes them when found. Use "disk scrub start/stop" to run it by hand.
    * options raid.default_raidtype <- used when creating new volumes
    
Disk management options (worked example after the list):
    * disk fail disk_name <- force a failure
    * disk remove disk_name <- remove a SPARE
    * disk swap/unswap <- quiesce the bus for a disk swap. I wonder if this stops everything on that shelf?
    * disk sanitize <- wow, a built-in shredder!
    * disk zero spares <- write zeroes over the spare disks
    * disk replace start/stop <- copy a live file-system disk onto a spare and then swap them (pre-emptive replacement)
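A short console sketch of dealing with a suspect drive; 3a.22 is just an example disk name taken from the output earlier:
    sysconfig -r                                 <- identify the suspect disk and the available spares
    disk fail 3a.22                              <- force-fail it; reconstruction onto a spare starts
    options raid.reconstruct.perf_impact low     <- keep the rebuild gentle during business hours
    disk zero spares                             <- pre-zero spares so the next rebuild/add goes faster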

What is an unqualified disk? NetApp makes you buy drives from them. If you put in an unqualified disk you'll get a delayed, forced shutdown in 72 hours. Important piece of information, isn't it?
Try "man qual_devices". There is a file on the filer, /etc/qual_devices, that contains the qualification info. If you modify it incorrectly by hand you'll halt the filer! It has to be updated when you put a newly qualified drive in, and you can get the update from NetApp's NOW website.

Disk numbers: path_id.device_id. The first part is the SCSI adapter number or FC host adapter name; the second used to be the good old SCSI ID. Two important commands for finding disks and IDs are "sysconfig -r" and "vol status -r".
In 4a.16, "4a" is the FC host adapter in expansion slot 4, port a; "16" is the disk's device (loop) ID - the shelf and bay are shown by sysconfig -r, not encoded in the name.

Aggregates: in older versions you would have /vol0/plex0 and under it RAID groups 0, 1, 2, however many. Now you have aggregates and flexible volumes: an aggregate such as /aggr1/plex0 holds the RAID group(s), and flexvols are carved out of the aggregate rather than owning disks directly.


*** MANAGING VOLUMES ***
A volume is a filesystem. Max 200 volumes. Traditional volume = one-to-one pairing with its own disks; to grow one you had to add disks to it, and you couldn't shrink it.
Flex volume = loosely coupled to the underlying aggregate. Can be grown and shrunk.

Benefits of having many smaller volumes: you get faster backups and restores, individual per-volume options. Also, smaller volumes can be taken offline for maintenance without affecting other volumes.

You can have "foreign volumes" if you move disks holding a volume from another NetApp into this one.

vol0 is called root volume. There is also /vol with /vol/users/lucas. /vol is a virtual root path. /vol/vol0 = /vol0
Update: during one of my interviews we talked about keeping vol0 on dedicated disks instead of on a shared aggregate with production data. Yep, best practice is a two-disk RAID4 (one parity, one data) holding vol0.

Data reliability: two mechanisms are RAID-level checksums and multipath I/O, which eliminates a single point of failure on the path to the shelves.

MAX # of files: each volume has a maximum number of files (inodes). If you grow the volume by adding disks, this number increases. The limit exists because they don't want the inode file for a volume to grow more than necessary. You can NEVER reduce this number!
* df -i /vol/home <- will give you the %
===== DF-I =====
Filesystem               iused      ifree  %iused  Mounted on
/vol/vol0/                7706    1025266      1%  /vol/vol0/
/vol/hr_flex/               99      33677      0%  /vol/hr_flex/
* maxfiles home 2500000 <- raise the limit for the volume "home"
Update: remember from ext2/3/4: each file and directory is an inode. Inodes point at data blocks.

Aggregates: new. Commands are similar to the vol commands. An aggregate is a pool of physical disks (organized into plexes and RAID groups) out of which flexible volumes are carved.
* aggr create aggr_name [options] disk_list <- notice you build from individual drives, not from plexes or RAID groups.
* aggr add aggr_name disks <- add drives to the aggregate
* aggr status aggr_name
* aggr rename aggr_name new_name <- wow, you can do that when in use?
* aggr options aggr_name option value <- ? what are they?
* aggr offline/online/destroy <- you have to take offline before you can destroy. Magneto will remind you :)

Volume commands: same as the aggr commands above! What seemed weird is that vol create can also take a disk list - that's the form for creating a traditional volume.
* vol status volume <- an important one. shows size, options, etc.
* vol restrict volume <- puts the volume into a restricted state (no client access; used e.g. as a vol copy or SnapMirror destination). Hmm... one that's not in aggr.
FLEX commands:
* vol create flex_vol AGGR_NAME size <- aaaa... so aggregates do decouple volumes from the actual physical disks (worked example below)
* vol size flex_vol +/- size
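A worked sketch putting the aggr and vol commands together - aggr1, vol_home and the sizes are made-up values:
    aggr create aggr1 -t raid_dp 5      <- new aggregate from 5 spare disks, RAID-DP
    aggr status aggr1
    vol create vol_home aggr1 200g      <- flexvol carved out of the aggregate
    vol size vol_home +50g              <- grow it later without touching disks
    vol size vol_home -20g              <- or shrink it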

Scrubbing: you can scrub vols and aggrs. It means comparing the parity against the data and fixing mismatches.

Storage health monitor: a daemon that runs continuously and looks for pending disk problems before they turn into failures (S.M.A.R.T.?). Delivers alerts via SNMP, autosupport and syslog. 3 levels: URGENT, NON-URGENT, INFORMATIONAL.

*** HOSTNAME, DNS, NETWORKING ***
/etc/hosts, DNS, NIS, search order set in /etc/nsswitch.conf.

/etc/hosts:
You can edit it using the built-in editor: WOW, a built-in EDITOR!!! You can also edit it in FilerView under Network->Manage Host File.
* rdfile /etc/hosts
127.0.0.1 localhost
[leave one blank line at the end]

NIS: connects to the NIS server and gets hosts and usernames/passwords. Old Sun/UNIX tech (originally Yellow Pages, hence the yp* commands).
NIS commands: "nis info", "options nis.*", "ypgroup", "ypwhich".
also conf in FilerView->Network->Manage DNS and NIS.

DNS:
* options dns.domainname name
^- wow, the above will not be permanent unless you also put it in /etc/rc!! The DNS servers go in /etc/resolv.conf.
* options dns.enable on|off <- also not permanent!
* dns info <- resolver status and state of DNS servers.

Routing:
* netstat -r <- show the routing table
* route add|delete ...
* route -s <- same as netstat -r?
Default gw is set in /etc/rc. You can edit the file with the magical command "wrfile /etc/rc" !!! (careful: wrfile overwrites the whole file). A sample /etc/rc is below.
The routed daemon uses RIP to find routers.
* routed status
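For reference, a minimal /etc/rc along these lines - the addresses and gateway are invented, the hostname reuses the filer from the sysconfig output above:
    hostname troy
    ifconfig e0 192.168.1.50 netmask 255.255.255.0 mediatype 100tx-fd
    route add default 192.168.1.1 1
    routed on
    options dns.domainname example.com
    options dns.enable on
    savecore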

NICS: names e0, e1, etc. unless you have a multi port card (e0a, e0b, e0c, etc.) FDDI uses f0, f1, etc.
ifconfig options: IP, netmask, media type, speed, MTU, flow control, up|down. CHANGES ARE NOT PERMANENT until put in /etc/rc.
Standard MTU for Ethernet is 1500. Jumbo frames are 9000 - six times bigger.
* ifconfig e0 mtusize 9000
Options changed in FilerView are made persistent.

VIFS: virtual interfaces, aka trunks. A VIF is a group of up to four interfaces bundled for throughput and fault tolerance.
Two modes: single and multi. In single mode one interface is active and the others are on standby - pure failover. In multi mode up to four interfaces are active and sending data, load balanced by IP address, by MAC address, or round robin (which may cause out-of-order delivery).
Once created, a VIF is managed using ifconfig.
* vif create single|multi vif_name [-b rr|mac|ip] list_of_nics (example below)
* vif delete vif_name nic <- removes a NIC from the vif; vif destroy vif_name <- deletes the whole vif; vif add vif_name nic <- adds a NIC
* vif status vif_name
* vif stat vif_name
In FilerView Network-> Add VIF
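A small sketch of building and addressing a multi-mode VIF - the name, member interfaces and IP are made up:
    vif create multi vif1 -b ip e0a e0b      <- two-port trunk, IP-based load balancing
    ifconfig vif1 192.168.1.60 netmask 255.255.255.0 up
    vif status vif1
To make it survive a reboot, the same two commands would go into /etc/rc.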

VLANS: logically segmented LAN for a department, floor, etc. Easier administration - you can move a user in the building and assign a new outlet easily. Smaller broadcast domains reduce traffic. More secure if no routing between VLANS.
You can have people from engineering and marketing depts sitting next to each other and they won't be able to connect to each other's PCs!
You "become" a member of a VLAN. Don't use VLAN id 1.
Wow, GVRP (GARP VLAN Registration Protocol) lets stations register their VLAN membership dynamically (no static port<->VLAN list).
* vlan create [-g on|off] e4 2 3 4 <- creates e4-2, e4-3, e4-4; -g turns GVRP on or off (usage sketch after this list)
* vlan modify -g off e4 <- remove VLAN conf from interface e4
* vlan delete -q e4 4 <- deletes e4-4
* vlan add e4 4
* vlan stat e4 4 <- show stats
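Usage sketch for a single tagged interface - VLAN ID 10 and the address are invented:
    vlan create e4 10                        <- creates the tagged interface e4-10
    ifconfig e4-10 10.0.10.5 netmask 255.255.255.0 up
    vlan stat e4 10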

SNMP: you can query the agent on magneto to get info. Traps can also be generated and sent. NetApp's custom MIB is on the NOW site.
* snmp community add ro NAME <- sets up a RO community.
SNMP uses port 161 (UDP); traps are sent to UDP 162.

IPSEC: Internet Protocol Security gives you encryption transparent to applications - data privacy, authenticity, and integrity. Two modes: tunnel and transport; NetApp only does transport mode, for end-to-end connectivity. Two sub-protocols: ESP encrypts and authenticates data, AH only authenticates it. Algorithms: DES, 3DES, SHA, with pre-shared keys, IKE and Diffie-Hellman.
No license required! Use the options command to enable it; the setting is persistent. Only pre-shared keys :( so realistically only inside a corporate LAN. An example policy is below.
No setup in FilerView.
* ipsec policy add -s srcIP -t dstIP -p esp|ah|none -e des|3des|null -a sha1|md5|null -d in|out
* ipsec policy show
* ipsec policy delete all
SA commands: once a policy is in place and some traffic comes that matches the policy criteria, a security association will be formed.
* ipsec sa show
* ipsec sa delete all
* options ip.ipsec.enable on|off
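Filling the flag list above with made-up addresses (my assumption is that you need one policy per direction; the filer here is 10.1.1.50, the client 10.1.1.20):
    options ip.ipsec.enable on
    ipsec policy add -s 10.1.1.50 -t 10.1.1.20 -p esp -e 3des -a sha1 -d out
    ipsec policy add -s 10.1.1.20 -t 10.1.1.50 -p esp -e 3des -a sha1 -d in
    ipsec policy show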

*** NFS ***
EXPORTS in NFS: two ways to export: entries in /etc/exports are persistent but have to be (re)applied with the "exportfs -a" command, and
for temporary exports you use the exportfs command directly.
* /etc/exports format:
Full path including /vol, so
/vol/vol0/pubs -rw=host1:host2,root=host1 <- commas separate options.
/vol/vol1 -rw=host2 <- obvious for host2, all other hosts ro.
/vol/vol0/home <- without the -rw|ro options all hosts can mount RW!!! wow, pretty permissive.
NOTES: you may also be able to use -host1 to exclude a host. Also, as soon as you specify -rw= you exclude all not listed hosts.
You can also export to networks/netmask and NETGROUPS /etc/netgroup

/vol/vol1 -rw=adminhost
/vol/vol1 -ro <- RO for all
/vol/vol1 -access=host1:host2 <- only two and RW
/vol/vol1 -access=host1:host2,ro <- only two hosts can mount at all and only RO
/vol/vol1/pubs -rw=host45 <- even if vol1 is exported for admin host only, this subfolder will only be exported to host45. The longer match rule.
NOTES: is that true? Does adminhost have access to both, or all but pubs? The longest-match rule means /vol/vol1/pubs is governed only by its own line, so pubs is exported to host45 only and adminhost gets no rw there.

* exportfs command:
exportfs -i -o rw=host2 /vol/vol1/pubs/sales <- -o lets you pass options on the command line; -i tells it to ignore the options in /etc/exports for that path and use the -o ones instead.
exportfs -u /vol/vol1 <- unexports a volume. -u for unexport.
exportfs -ua <- ALERT, unexport all!
NOTES: -root=host1 gives root access... so by default root (UID 0) from a client is squashed to anonymous, and only hosts listed in root= keep real root privileges.

Weird:
exportfs -i -o rw=host2:host3 /vol/volnew/sales/january <- supposedly this gives all other hosts RO to that directory!!

All this in FilerView is in NFS->Manage Exports.
UPDATE: I get it... the -access option limits which hosts can mount at all - only those listed after access= can mount. With -rw= alone, all other hosts still get RO by default. And -ro= alone doesn't change much, because by default everyone already has RO.

Update:
what is the most commonly used nfsstat command option for NFS performance troubleshooting? -d
21:00:00:2b:34:26:a6:54 is an example of a WWPN
Minimum number of paths that you can actively load balance across using ONTAP DSM to a single LUN in windows? 2
Which cfmode has a port on the target HBA that only becomes active when a failover occurs? standby
Aggregates may be made of many RAID groups AND cannot be reduced in size!
iSCSI is used in IP SAN environments.
Two commands for getting perf information? nfsstat and sysstat. <- sysstat is covered much later. important command.
Three iSCSI Ethernet topologies: Direct Attach, Dedicated Ethernet, Shared Ethernet.

nfsstat:
NFS troubleshooting: use nfsstat -l <- usage per NFS client mounting the exports. Run nfsstat -l periodically to get a baseline and spot deviations. If you see one client issuing 99% of the operations, with the rest evenly distributed among the other hosts, you may want to investigate that client. (Small sketch below.)
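A baseline-taking sketch; the counter-zeroing step assumes nfsstat -z behaves the way I remember (it resets the statistics):
    nfsstat -z      <- zero the counters (assumption: -z resets stats)
    (wait through a representative busy period)
    nfsstat -l      <- per-client operation counts since the reset
    nfsstat         <- overall per-operation breakdown (getattr, read, write, ...)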

*** CIFS ***
command "cifs setup" starts it all. Command "cifs terminate" stops all sessions and closes open files. You can also specify a single host to disconnect users:
* cifs terminate -t 10 chopinlaptop <- gives Lucas 10 minutes and a warning.
* cifs restart <- connect to the DC and restart CIFS session(?)
showing and managing sessions:
* cifs sessions user|machine|IP <- there is also -s, which I guess shows security information
Can also be done in FileView using CIFS session report.

Managing shares:
* cifs shares <- list shares including ETC$, HOME, C$.
* cifs shares sharename <- list ACLs for the share
* cifs shares -add name /vol/vol1/share -comment 'new share' -maxusers 30
* cifs shares -change share -nomaxusers <- modify share settings
* cifs shares -delete webfinal <- remove
NOTES: share names are not case sensitive.

To modify share-level rights use (examples below):
* cifs access share_name user|group rights <- rights are: No Access, Read, Change, and Full Control
* cifs access -delete share_name user|group <- remove a user or group from the share's ACL
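For instance, reusing the webfinal share from above with an invented domain group:
    cifs shares -add webfinal /vol/vol1/webfinal -comment 'web release share'
    cifs access webfinal CORP\webteam "Full Control"     <- CORP\webteam is a made-up group
    cifs access webfinal everyone Read
    cifs shares webfinal                                 <- confirm the ACL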

*** QTREES ***
A qtree is a special subdirectory at the top of a volume, used to manage options for a subset of the data. Volumes can contain qtrees. A qtree groups files by security style, oplock setting, quota limits, and backup unit. You can back up individual qtrees instead of the whole volume.
Max 254 per volume. You cannot move files in or out of a qtree, just like you can't across partitions - a move across the boundary is really a copy plus a delete from the source.
* qtree status <- show all options for all qtrees in the system
* qtree create /vol/onbase/images <- new one
* qtree security path [unix|ntfs|mixed] <- set the security style
* qtree oplocks /vol/onbase/images enable|disable (worked sequence below)
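Putting those together for the images tree used in the examples above:
    qtree create /vol/onbase/images
    qtree security /vol/onbase/images ntfs       <- NTFS-style permissions only
    qtree oplocks /vol/onbase/images enable
    qtree status onbase                          <- verify security style and oplocks per qtree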
User name mapping (three possible outcomes): there is a file, /etc/usermap.cfg, that maps NT and UNIX account names in pairs. It is processed sequentially and the first match wins. The process goes like this: first the domain user is authenticated and is either rejected, given guest access, or accepted as that user. In the last case we continue down the list: the domain user is looked up in the map file and, if matched with a UNIX user, accesses UNIX-security files as that user. If not matched, it is mapped to the user named by the wafl.default_unix_user option.
Format of usermap.cfg:
IP-qualifier:NT-domain\NTuser direction IP-qualifier:UnixUser <- direction is "==" (both ways), "=>" (NT to UNIX) or "<=" (UNIX to NT).
If the domain username has spaces, enclose it in "". A "" on the UNIX side means that user is denied access.
Examples:
*\root => "" <- means the root user from any domain is not allowed access to UNIX files (security)
"John Doe" == johnd <- simple one-to-one mapping
marketing\Roy => nobody <- map to anonymous
You can use * to map, but there are some tricks.

*** SAN ***
Two protocols used in a SAN are Fibre Channel Protocol (FCP, over Fibre Channel only) and iSCSI (SCSI over TCP/IP). A SAN is a network that transfers data between computers and storage systems at the block level. With FCP there are two ways to attach: direct attached and fabric attached (single or multipath connection). In direct attach there are no switches, just a point-to-point link from the NetApp to the server.

Creating LUNs: hosts are initiators and storage systems are targets. Targets hold the LUNs.
Guidelines for using LUNs:
* create volumes that are 2x the size of the LUNs if you'll use LUN snapshots.
* use the volume only for LUNs, no other files.
* CIFS accessible volumes can't have LUNs
* enable space reservation for the volume
* disable scheduled snapshots (snapshot schedule = off)
* set the create_ucode and convert_ucode volume options explicitly (I believe the usual guidance for LUN volumes is create_ucode = on)
* create a qtree for grouping LUNs of the same type.
Attributes of a LUN: OS type of the host, size (max 2 TB back then), name of the igroup, protocol of the host (iSCSI/FCP), initiator port name (WWPN or IQN), and the LUN ID number used by the igroup to access the LUN.

Initiators are grouped into igroups. An igroup lists the initiators' IQNs (iSCSI) or WWPNs (FCP); mapping a LUN to the igroup with a LUN ID makes it visible to those initiators. Example below.
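A minimal iSCSI sketch - the volume path, igroup name and initiator IQN are all invented for illustration:
    lun create -s 100g -t windows /vol/vol_luns/qt_win/lun0
    igroup create -i -t windows ig_web iqn.1991-05.com.microsoft:webserver01
    lun map /vol/vol_luns/qt_win/lun0 ig_web 0       <- LUN ID 0 as seen by the host
    lun show -m                                      <- verify the mapping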

Using SnapDrive in Windows: It integrates with MMC. I guess this tool lets you take a snapshot and make it into a drive in windows (so map the LUN).

WWNN = World Wide Node Name. On the storage side.
WWPN = World Wide Port Name. On the HBA. With multi-pathing and two port HBAs you'll have two.

*** SNAPSHOTS ***
A snapshot is a read-only, space-efficient copy - it doesn't duplicate the data, it just references the existing blocks. There are even snapshots of aggregates (5% of the aggregate is reserved for them by default); their primary function is to support the system's own recovery mechanisms and they are managed by the device itself. You don't do anything with aggregate snapshots.
Snapshots are taken according to a schedule. They start to consume space only when changes are made: the changed block becomes part of the current filesystem's state, and the old block stays referenced by the snapshot.

Global snapshot commands:
* vol options volumename nosnap on <- you can manually create but no auto creation
* vol options volumename nosnapdir on <- hide the .snapshot(NFS) or ~snapshot(CIFS) directory.
* options cifs.show_snapshot off <- hide it for CIFS clients. Both options have to be set to fully hide the directory.

* snap create volumename snapname
* snap list [volumename]
* snap delete [volumename snapname] <- -a deletes all in a volume
* snap rename ...
* snap reserve volume percentage_value
* snap sched volumename 0 2 6@8,12,16,20 <- the numbers are how many snapshots to keep of each type: 0 = no weekly, 2 = keep two nightly (so state at two days back), 6 = keep six hourly, taken at hours 8, 12, 16 and 20. When a limit is reached the oldest one is deleted.
The default schedule takes hourly snapshots during the day; hourly.0 is the most recent of the six we keep. (Sample listing below.)
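For a volume following that schedule, "snap list" output looks roughly like this (column layout from memory; the percentages and dates are invented):
===== SNAP LIST =====
Volume vol1
  %/used       %/total  date          name
----------  ----------  ------------  --------
  1% ( 1%)    0% ( 0%)  Mar 06 20:00  hourly.0
  2% ( 1%)    1% ( 0%)  Mar 06 16:00  hourly.1
  4% ( 2%)    1% ( 0%)  Mar 06 00:00  nightly.0
(...)
Clients can recover files themselves by copying them out of the .snapshot (NFS) or ~snapshot (CIFS) directory.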


*** Checking Health and Performance ***

Latency - the time wasted while one component waits for another.

Throughput - the amount of data transferred per unit of time, aka data transfer rate.

 * sysconfig: check hardware status. Is a shelf down? Is all hardware showing? -a is I/O device info. -c checks system conf. -d shows the disks on the system. -t shows tapes. -v is the good old show all info. 

 * sysstat 1: refresh every second (default is 15 sec); one line per interval. Shows CPU usage %, per-protocol ops, network and disk kB/s, and cache age. Important thing: disk writes - every ten seconds the filer has to write the NVRAM-cached data to disk (the 10-second consistency point), but if the cache is too small writes will happen more often. The cache age column shows how old the oldest data in the cache is; the higher the better, as it means the cache is holding a lot of old data and not overfilling. (Sample output below.)

 * FilerView->Show System Status. 
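From memory, the default sysstat columns look something like this (the numbers are invented):

===== SYSSTAT =====
 CPU     NFS    CIFS    HTTP     Net kB/s    Disk kB/s     Tape kB/s   Cache
                                  in   out    read  write   read write    age
 12%     482     120       0    1520  2210    3400   1800      0     0      9
  8%     390      85       0    1100  1900     800  12500      0     0      2
(...)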

 

There is a bunch of options that affect CPU load:

 * One is "options raid.reconstruct.perf_impact medium|low|high" 

 * Scrubbing can affect performance too (it runs Sunday night by default). Control it with options raid.scrub.enable on|off.

 * vol.copy.throttle 10

 * wafl.maxdirsize or vol options vol_name maxdirsize


Troubleshooting a shelf:

 * shelfcheck: confirms the connection to the shelf is OK and the HBA is OK; it asks you whether LEDs are on or off.

 * led_on 7b.16: turn on LED light on disk. To use you need the advanced mode! "priv set advanced" and when done "priv set admin"

You can check the logfile to see if there were any problems: rdfile /etc/messages.

 * scsi test: scsitest dev_name <- queries drives about status.


Try sysconfig -r to see if a RAID group spans controllers -> that setup causes about 10% lower write speed.


Other:

 * disk unfail 3.12

 * disk shm_stats <- storage health monitor on the disk


CIFS: cifs stat <- breakdown of all types of CIFS operations.

===== CIFS STAT =====
              reject       24  1%
               mkdir        0  0%
               rmdir        0  0%
                open        0  0%
              create        0  0%
               close      193 10%
             X&close        0  0%
               flush        0  0%
(...)

nbtstat <- if you're having issues with NetBIOS name resolution. cifs testdc <- test the DCs :)

In advanced mode you can use "smb_hist"


*** Reboots ***

Warm or cold. The latter takes much longer and happens, for example, when a panic occurs during the boot process, the system halt command is run, or an instruction fault happens.

A warm reboot skips the RAM test and zeroing, reloading the kernel from disk, the LCD test, and the serial I/O tests.

 * reboot -d REASON <- dump core and reboot; reboot -t 3 <- wait three minutes before rebooting.