Networked Storage Glossary
General
Networked Storage Systems
Appliances - A storage platform
designed to perform a specific task, such as NAS,
routers, virtualization, etc.
Auto Loaders
- A single tape drive with robotics and multiple
tape cartridges.
Libraries
- A large-scale tape device with robotics that
can house multiple tape drives and a large
number of tape cartridges.
Modular
- Building block system where controller/host
connection function is physically separate from
storage, and storage and/or controller function
may be added independently of each other.
Monolithic
- Single box system containing all control and
storage in a single system, where the storage
is integrated with the other components and cannot
be segregated.
NAS/File
- Storage system that connects to a network (typically
Ethernet) that presents storage as network volume
shares (CIFS) or NFS mounted devices.
Purpose-built
- Storage device designed for a specific purpose
or application that includes specialized hardware/software
to provide that function.
Solid State
- RAM (memory)-based disk device.
Utility - Storage capable of
providing quality of service, carrier class availability
and serviceability, on-line expansion or reallocation
of resources, typically used as a back-end to
an IT profit center.
Virtual Tape - A system that
presents itself as a tape device (drive, autoloader,
or library) that actually contains disk drives
and memory (and also tape devices possibly), in
order to provide performance improvement, and
possibly increased connectivity to multiple backup
software environments concurrently.
RAID Storage Systems
RAID (Redundant
Array of Independent Disks) - A disk
subsystem that is used to increase performance
or provide fault tolerance. RAID can also be set
up to provide both functions at the same time.
RAID is a set of two or more ordinary hard disks
and a specialized disk controller that contains
the RAID functionality. Developed initially for
servers and stand-alone disk storage systems,
RAID is increasingly popular in desktop PCs. RAID
can also be implemented via software only, but
with less performance, especially when rebuilding
data after a failure.
RAID improves performance
by disk striping, which interleaves bytes or groups
of bytes across multiple drives, so more than
one disk is reading and writing simultaneously.
Fault tolerance is achieved by mirroring or parity.
Mirroring is 100% duplication of the data on two
drives (RAID 1). Parity (RAID 3 and 5) combines
the data on two drives and stores the result on
a third: a bit from drive 1 is XOR'd with a bit
from drive 2, and the result bit is stored on
drive 3. Because XOR is its own inverse, any one
missing bit can be recomputed from the other two.
A failed drive can be hot swapped with
a new one, and the RAID controller automatically
rebuilds the lost data. In addition, RAID systems
may be built using a spare drive (hot spare) ready
and waiting to be the replacement for a drive
that fails.
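The parity scheme described above can be sketched in a few lines of Python. This is only an illustration of the XOR arithmetic, not a controller implementation; the helper name `xor_blocks` and the sample byte values are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical data blocks held on drive 1 and drive 2.
drive1 = b"\x0f\xa5\x33"
drive2 = b"\xf0\x5a\x33"

# The parity block that would be stored on drive 3.
parity = xor_blocks(drive1, drive2)

# If drive 2 fails, XOR-ing the surviving blocks rebuilds its contents.
rebuilt = xor_blocks(drive1, parity)
assert rebuilt == drive2
```

The same function works for more than two data drives, since XOR-ing all surviving blocks with the parity block always yields the one that is missing.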
RAID systems come in all
sizes from desktop units to floor-standing models.
Any desktop PC can be turned into a RAID system
by adding a RAID controller board and the appropriate
number of IDE or SCSI disks. Stand-alone units
may also include large amounts of cache as well
as redundant power supplies.
In the late 1980s, RAID used
to mean an array of "inexpensive" disks,
being compared to large computer disks or SLEDs
(Single Large Expensive Disk). As hard disks became
cheaper, the RAID Advisory Board changed the name
to mean "independent."
RAID LEVEL 0
Level 0 is disk striping only, which interleaves
data across multiple disks for better performance.
It does not provide safeguards against failure.
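As a rough sketch of how striping spreads data, the function below maps a logical block number to a disk and an offset on that disk. It assumes a simple round-robin layout with one block per stripe unit; the name `locate_block` is hypothetical.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a logical block."""
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need at least one disk and a non-negative block number")
    # Consecutive logical blocks land on consecutive disks, wrapping around.
    return logical_block % num_disks, logical_block // num_disks


# With three disks, logical blocks 0..5 alternate across disks 0, 1 and 2.
for block in range(6):
    disk, offset = locate_block(block, 3)
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Because neighboring blocks sit on different disks, a large sequential read can keep all of the disks busy at once.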
RAID LEVEL 1
Uses disk mirroring, which provides 100% duplication
of data. Offers highest reliability, but doubles
storage cost.
RAID LEVEL 2
Bits (rather than bytes or groups of bytes) are
interleaved across multiple disks. The Connection
Machine used this technique, but this is a rare
method.
RAID LEVEL 3
Data are striped across three or more drives.
Used to achieve the highest data transfer, because
all drives operate in parallel. Parity bits are
stored on a separate, dedicated drive.
RAID LEVEL 4
Similar to Level 3, but manages disks independently
rather than in unison. Not often used.
RAID LEVEL 5
Most widely used. Data are striped across three
or more drives for performance, and parity bits
are used for fault tolerance. The parity blocks
are distributed across all of the drives rather
than kept on a dedicated parity drive.
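One common way to place RAID 5 parity is to rotate it from drive to drive, stripe by stripe. The sketch below builds such a layout as a table; the function name `raid5_layout` and the particular rotation order are assumptions for illustration, and real controllers may rotate differently.

```python
def raid5_layout(num_drives: int, num_stripes: int) -> list[list[str]]:
    """Return one row per stripe; 'P' marks parity, 'Dn' marks data block n."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    rows = []
    for stripe in range(num_stripes):
        # Parity starts on the last drive and moves one drive left per stripe.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        next_block = stripe * (num_drives - 1)
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{next_block}")
                next_block += 1
        rows.append(row)
    return rows


for row in raid5_layout(3, 3):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Each stripe gives up one block to parity, and no single drive ends up holding all of the parity.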
RAID LEVEL 6
Highest reliability, but not widely used. Similar
to RAID 5, but does two different parity computations
or the same computation on overlapping subsets
of the data.
RAID LEVEL 10
A combination of RAID 1 and RAID 0. RAID
0 is used for performance, and RAID 1 is used
for fault tolerance.
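The storage cost of each level can be compared with a small calculation. The sketch below assumes equal-sized drives, treats RAID 1 as a single mirror set, and treats RAID 10 as striped mirrored pairs; the function name `usable_capacity` and the minimum drive counts are assumptions based on common practice rather than on any one product.

```python
def usable_capacity(level: int, num_drives: int, drive_size_gb: float) -> float:
    """Usable capacity in GB for an array of equal-sized drives."""
    if level == 0:
        minimum, data_drives = 2, num_drives          # striping only, no redundancy
    elif level == 1:
        minimum, data_drives = 2, 1                   # every drive holds the same data
    elif level in (3, 4, 5):
        minimum, data_drives = 3, num_drives - 1      # one drive's worth of parity
    elif level == 6:
        minimum, data_drives = 4, num_drives - 2      # two drives' worth of parity
    elif level == 10:
        if num_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        minimum, data_drives = 4, num_drives // 2     # half the drives are mirrors
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return data_drives * drive_size_gb


for level in (0, 5, 6, 10):
    print(f"RAID {level}: {usable_capacity(level, 4, 500)} GB usable from 4 x 500 GB")
```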
Storage Software
Archive - Software or appliance
technologies designed to migrate fixed content
data from primary storage to secondary storage,
held online or offline, or to transportable
media kept for long-term retention and retrieval.
ARM: (Automated Resource
Management) - Storage software solutions
which combine elements of SNM, SRM and Policy
Management in order to provide user-defined SLA
enforcement and auto-provisioning of capacity.
Backup - Software or appliance
technologies designed to enable the migration
of data from primary storage to secondary storage,
in order to enable restoration of primary data
in the event of a system failure.
Cluster - Technology used to
ensure the greatest available uptime of any system.
HA Technology includes redundant hardware and
intelligence that allows hardware and applications
to “fail over”, restart and maintain
proper business functions.
Data Management - A category
of storage management software designed to migrate
data from primary storage to secondary storage
in order to protect primary data sets or to create
a replica copy of primary data.
File System - A system that organizes
files and metadata in a hierarchical structure
and is responsible for managing the physical placement
of blocks that make up the files.
HSM/ADM: (Hierarchical Storage
Management/Automated Data Management) - Solutions
that migrate data within the storage environment
according to user-defined policies.
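A user-defined migration policy of this kind might look like the following sketch. Everything here is hypothetical: the `FileInfo` record, the tier names, and the rule itself (move files that have not been accessed for more than a given number of days) are assumptions chosen to show the idea, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    days_since_access: int
    tier: str = "primary"


def apply_policy(files: list[FileInfo], max_idle_days: int) -> list[str]:
    """Move files idle for longer than the threshold to the secondary tier."""
    migrated = []
    for f in files:
        if f.tier == "primary" and f.days_since_access > max_idle_days:
            f.tier = "secondary"
            migrated.append(f.name)
    return migrated


files = [FileInfo("a.log", 400), FileInfo("b.db", 3)]
print(apply_policy(files, max_idle_days=90))
```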
Policy Management - Enables users
to set specific policies about how storage is
implemented and used. Solutions containing policy
management will automatically enforce those policies
and take specific actions based on user-defined
conditions.
Replication - Software or an
appliance used to make an exact copy of data from
one location to one or more secondary locations.
Security - Hardware and/or software
solutions that are designed specifically to protect
storage assets and resident data from unauthorized
access.
SRM: (Storage Resource
Management) - Storage software solutions
that discover, assess, and report the usage patterns
of storage systems.
SNM: (Storage Network Management) -
Storage software solutions that discover, monitor,
manage, and display the physical elements that
make up the storage network infrastructure.
Virtualization - Storage software
solutions that abstract the physical and logical
storage assets from the host systems.
Storage Networking
Appliances - An integrated system
made up of purpose-built software and hardware
designed for a specific task (or set of tasks)
and sold as a turnkey solution. The concept is
to simplify installation and management, and deliver
high performance.
Directors -
High-end version of storage switches (FC, FICON,
ESCON) that are designed specifically with 32+
ports and are highly available. These usually
have redundant switch cores and are based upon
a bladed architecture.
Intelligent Switching
Platforms - Switches which provide a
platform for intelligent storage services to reside
on, which makes the storage intelligence, such
as replication, volume management, virtualization,
and file serving, part of the switching infrastructure.
These switches are often multiprotocol (FC, IP,
IB).
Routers/Gateways
- A router is a device or, in some cases, software
in a computer, that determines the next network
point to which a packet should be forwarded toward
its destination. A gateway is a network point
that acts as an entrance to another network.
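The next-hop decision a router makes can be illustrated with Python's standard `ipaddress` module. The routing table below is made up for the example (the addresses come from documentation and private ranges), and choosing the most specific matching prefix is one common selection rule, assumed here rather than taken from the definition above.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), "192.0.2.2"),
    (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.3"),
]


def next_hop(destination: str) -> str:
    """Pick the next hop whose network matches with the longest prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


for dest in ("10.1.2.3", "10.9.9.9", "198.51.100.7"):
    print(dest, "->", next_hop(dest))
```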
Switches
- Devices that reside in the network and direct
traffic based on source and destination addresses.
Most common examples are Fibre Channel or Ethernet
switches.
WAN/SAN Extension
- Gateways that connect geographically distributed
SANs via existing WAN infrastructure. These solutions
may use IP, DWDM, or SONET transport technologies
and support protocols that carry FC blocks via
those WAN networks (FCIP, iFCP etc).
Storage Services
Assessment &
Design - A methodical process conducted
by companies that identifies key storage usage
patterns.
Implementation &
Deployment - Installation, setup, migration,
and process change services of storage technologies.
Solution Integration
- The planning and staging step of all
technology elements prior to production deployment.
Testing - "Proof of concept"
testing prior to design selection and/or deployment.
Training
- The staff education and competency development
step that typically follows deployment and includes
knowledge transfer.
Storage Technologies
Array Controllers - The intelligent
component of a disk subsystem where control software
executes.
ATA: (Advanced Technology
Attachment) - The specification for IDE
drives. The specification was published in 1994
as ANSI standard X3.221-1994, titled AT Attachment
Interface for Disk Drives.
Chip - A
set of microminiaturized, electronic circuits
that are designed for use as processors and memory
in computers and countless consumer and industrial
products.
DLT: (Digital Linear
Tape) - A family of tape device and media
technologies developed by Quantum Corporation.
ESCON: (Enterprise
System Connection) - A 200Mb/sec serial
I/O bus used on IBM Corporation’s Enterprise
System 9000 data center computers. Similar to
Fibre Channel in many respects, ESCON is based
on redundant switches to which computers and storage
subsystems connect using serial optical connections.
FC: (Fibre Channel)
- A high-speed transport technology used to build
storage area networks (SANs). Although Fibre Channel
can be used as a general-purpose network carrying
ATM, IP and other protocols, it has been primarily
used for transporting SCSI traffic from servers
to disk arrays. The Fibre Channel Protocol (FCP)
serializes SCSI commands into Fibre Channel frames.
IP, however, is used for in-band SNMP network
management. Fibre Channel not only supports singlemode
and multimode fiber connections, but coaxial cable
and twisted pair as well.
Fibre Channel can be configured
point-to-point, via a switched topology or in
an arbitrated loop (FC-AL) with or without a hub,
which can connect up to 127 nodes.
It supports transmission rates up to 2.12 Gbps
in each direction, and 4.25 Gbps is expected.
Fibre Channel shares its physical layer with Gigabit
Ethernet and uses IBM's 8B/10B encoding method, where
each byte is transmitted as 10 bits. Fibre Channel
provides both connection-oriented and connectionless
services.
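The cost of 8B/10B encoding can be worked out directly: since each byte occupies 10 bits on the wire, the number of bytes delivered per second is the line rate divided by 10. The sketch below treats the 2.12 Gbps figure quoted above as the raw line rate, which is an assumption, and it ignores frame headers and other protocol overhead; the function name is hypothetical.

```python
def payload_bytes_per_second(line_rate_bps: int) -> float:
    """Bytes per second carried by an 8B/10B link running at the given line rate."""
    bits_on_wire_per_byte = 10  # 8 data bits are sent as a 10-bit code
    return line_rate_bps / bits_on_wire_per_byte


line_rate = 2_120_000_000  # 2.12 Gbps, taken as the raw line rate
rate = payload_bytes_per_second(line_rate)
print(f"{rate / 1_000_000:.0f} MB/s in each direction before framing overhead")
```

Put another way, 8 of every 10 transmitted bits are data, so the encoding itself uses 20% of the line rate.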
FCIP: (Fibre Channel
Over Internet Protocol) - An Internet
Protocol (IP)-based storage networking technology
developed by the Internet Engineering Task Force
(IETF). FCIP mechanisms enable the transmission
of Fibre Channel (FC) information by tunneling
data between storage area network (SAN) facilities
over IP networks.
FICON: (Fibre Connectivity)
- IBM Corporation’s implementation of ESCON
over Fibre Channel.
GigE: (Gigabit Ethernet)
- Ethernet that transmits at 1 Gbps. The
twisted-pair copper variant is known as 1000BASE-T.
HBA: (Host Bus Adapter)
- Any hardware bridge between a storage interconnect
and a system’s I/O bus.
HCA/TCA:
(Host Channel Adapter/Target Channel Adapter)
- Host Channel Adapters (HCA) and Target Channel
Adapters (TCA) enable servers and I/O devices
to connect to the InfiniBand fabric, respectively.
These adapters consist of specialized chips
that process the InfiniBand link protocol at wire
speed and without incurring any host overhead.
InfiniBand
- A switched I/O architecture that delivers a
high performance, low latency interconnect for
the data center.
IP: (Internet Protocol)
- A protocol that provides connectionless best-effort
delivery of datagrams across heterogeneous physical
networks.
IP Storage (Internet
Protocol storage) - Using IP and Gigabit
Ethernet to build storage area networks (SANs).
Traditional SANs were developed using the Fibre
Channel transport, because it provided gigabit
speeds compared to 10 and 100 Mbps Ethernet used
to build messaging networks at that time. Fibre
Channel equipment has been costly, and interoperability
between different vendors' switches was not completely
standardized. Since Gigabit Ethernet and IP have
become commonplace, IP storage enables familiar
network protocols to be used, and IP allows SANs
to be extended throughout the world. Network management
software and experienced professionals in IP networks
are also widely available. iSCSI, iFCP and mFCP
are protocols that carry SCSI and Fibre Channel
Protocol traffic over IP networks.
iSCSI: (Internet
Small Computer System Interface) - A
protocol that encapsulates SCSI commands for
transport over TCP/IP networks.
IS-NIC: (Integrated
Storage Network Interface Card) - A NIC
that incorporates TCP/IP offload features
and fully implements the TCP/IP stack.
LTO: (Linear Tape
Open) - A family of open magnetic tape
standards developed by HP, IBM and Seagate that
are licensed to third-party vendors.
NAS: (Network-Attached
Storage) - A file-storage device that
is accessed through a network.
NIC: (Network Interface
Card) - A printed circuit board that
connects a computer or other node to a network.
Also known as a network adapter.
SAN: (Storage Area
Network) - A network that transfers data
between computer systems and storage devices via
peripheral channels such as SCSI or Fibre Channel.
SCSI: (Small Computer
System Interface) - A collection of ANSI
standards and proposed standards that define I/O
buses primarily for connecting storage subsystems
or devices to hosts through host bus adapters.
Snapshot -
A fully usable copy of a defined collection of
data that contains an image of the data as it
appeared at the point in time at which the copy
was initiated. A snapshot may be either a duplicate
or a replicate of the data it represents.
TOE: (TCP/IP Offload
Engine) - A network interface card with onboard
hardware or firmware designed to offload
most or all TCP/IP processing
from the host CPU. The intent is to speed up I/O
by increasing the speed of TCP/IP processes.
Storage Architectures
SAN (Storage Area
Network) - A network of storage disks.
In large enterprises, a SAN connects multiple
servers to a centralized pool of disk storage.
Compared to managing hundreds of servers, each
with their own disks, SANs improve system administration.
By treating all the company's storage as a single
resource, disk maintenance and routine backups
are easier to schedule and control. In some SANs,
the disks themselves can copy data to other disks
for backup without any processing overhead at
the host computers.
The SAN network allows data
transfers between computers and disks at the same
high peripheral channel speeds as when they are
directly attached. Fibre Channel is a driving
force with SANs and is typically used to encapsulate
SCSI commands. SSA and ESCON channels are also
supported.
SANs can be centralized or
distributed. A centralized SAN connects multiple
servers to a collection of disks, whereas a distributed
SAN typically uses one or more Fibre Channel or
SCSI switches to connect nodes within buildings
or campuses. For long distances, SAN traffic is
transferred over ATM, SONET or dark fiber. To
guarantee complete recovery in a disaster, dual,
redundant SANs are deployed, one a mirror of the
other and each in separate locations.
Another SAN option is IP
storage, which enables data transfer via IP over
fast Gigabit Ethernet locally or via the Internet
to anywhere in the world.
Channel Attached Vs. Network Attached
A related storage device is the network attached
storage (NAS) system. The NAS is a disk subsystem
that attaches to the LAN like any server or workstation.
But rather than containing a full-blown operating
system, it typically uses a slim microkernel specialized
for handling only file reads and writes (CIFS/SMB,
NFS, NCP). Adding or removing a NAS box is like
adding or removing any node in a network. In contrast,
the channel-attached storage system of the SAN
must be taken offline to reconfigure it. However,
the NAS is subject to the variable behavior and
overhead of a network that may contain thousands
of users.
SAN-NAS terminology is confusing
(storage area network vs. network attached storage).
NASs are often contrasted with SANs, but are also
included under the "storage network"
umbrella. The major difference is that the channel-attached
SAN extends the peripheral channel to long distances,
whereas the NAS is another file server on the
network.
NAS (Network Attached Storage) -
A specialized file server that connects to the
network. A NAS device contains a slimmed-down
(microkernel) operating system and file system
and processes only I/O requests by supporting
popular file sharing protocols such as NFS (Unix)
and SMB/CIFS (DOS/Windows). Using traditional
LAN protocols such as Ethernet and TCP/IP, the
NAS enables additional storage to be quickly added
by plugging it into a network hub or switch. As
network transmission rates have increased from
Ethernet to Fast Ethernet to Gigabit Ethernet,
NAS devices have come up to speed parity with
direct attached storage devices.
General-purpose computers
with a full-blown operating system such as Windows
or Unix are sometimes labeled as NAS products,
but the true NAS is built from scratch as a dedicated
file I/O device.