Cool Interesting Stuff: introduction

Tuesday, August 11, 2009

Introduction to Data Mining

Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.
Data mining is the analysis of (often large) observational data sets to find
unsuspected relationships and to summarize the data in novel ways that are
both understandable and useful to the data owner.
Data mining is an interdisciplinary field bringing together techniques from
machine learning, pattern recognition, statistics, databases, and visualization to
address the issue of information extraction from large databases.

WHY MINE DATA ?
1. Commercial View :
- Lots of data is being collected and warehoused.
* Web data, e-commerce.
* Purchases at department/grocery stores.
* Bank/Credit Card transactions.
- Computers have become cheaper and more powerful.
- Competitive pressure is strong.
* Provide better, customized services for an edge.
2. Scientific View :
- Data is collected and stored at enormous speeds (GB/hour).
* Remote sensors on a satellite.
* Telescopes scanning the skies.
* Micro arrays generating gene expression data.
* Scientific simulations generating terabytes of data.
- Traditional techniques are infeasible for raw data.
- Data mining may help scientists :
* in classifying and segmenting data.
* in hypothesis formation.

SCOPE OF DATA MINING :
Data mining derives its name from the similarities between searching for valuable business information in a large database — for example, finding linked products in gigabytes of store scanner data — and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
- Automated prediction of trends and behaviors.
- Automated discovery of previously unknown patterns.
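
As a rough illustration of the "linked products" example above, the sketch below counts how often two products appear in the same basket. The basket data and the `pair_counts` helper are made up for this post; real scanner data would be far larger and would normally call for a dedicated association-rule algorithm.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical scanner data: one list of products per checkout.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('bread', 'butter') 3
```

Pairs that co-occur much more often than the rest are candidates for "linked products", although a raw count alone does not show that the link is meaningful.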

Larger databases support both capabilities, and a database can grow in two ways:
* More columns : Analysts must often limit the number of variables they examine when doing hands-on analysis due to time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High performance data mining allows users to explore the full depth of a database, without preselecting a subset of variables.
* More rows : Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.
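
The "more rows" point can be checked with a small simulation. This is only a sketch: the population (purchase amounts with mean 50 and standard deviation 10) is invented, and the estimated standard error of the mean, s / sqrt(n), stands in for "estimation error".

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical population of purchase amounts.
population = [random.gauss(50, 10) for _ in range(100_000)]

small = random.sample(population, 100)     # few rows
large = random.sample(population, 10_000)  # many rows

se_small = standard_error(small)
se_large = standard_error(large)
print(f"n=100:    standard error ~ {se_small:.3f}")
print(f"n=10000:  standard error ~ {se_large:.3f}")
```

Since the standard error falls with the square root of the sample size, a sample 100 times larger should give an error roughly 10 times smaller.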

Tuesday, August 4, 2009

Introduction to Databases

Databases play an important role in almost all areas where computers are used, including business, engineering, medicine, law, education, and library science, to name a few.
A database is a collection of related data, where data means recorded facts. A typical database represents some aspect of the real world and is used for specific purposes by one or more groups of users. Databases are not specific to computers. Examples of non-computerized databases abound: phone books, dictionaries, almanacs, etc. A database has the following implicit properties :
1. A database represents some aspect of the real world.
2. A database is a logically coherent collection of data with some inherent meaning.
3. A database is designed, built, and populated with data for a specific purpose.
4. A database can be of any size and of varying complexity.
5. A database may be generated and maintained manually or it may be computerized.
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the process of defining, constructing, and manipulating databases for different applications.
Defining a database involves specifying the data types, structures, and constraints for the data to be stored in the database.
Constructing a database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database, and generating reports from the data.
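
The three activities can be tried out with Python's built-in `sqlite3` module, using SQLite as one example of a DBMS. The `student` table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Defining: specify data types, structure, and constraints.
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        major TEXT NOT NULL
    )
""")

# Constructing: store the data itself under the DBMS's control.
conn.executemany(
    "INSERT INTO student (id, name, major) VALUES (?, ?, ?)",
    [(1, "Asha", "CS"), (2, "Ben", "Math"), (3, "Chen", "CS")],
)

# Manipulating: update, query, and generate a small report.
conn.execute("UPDATE student SET major = ? WHERE id = ?", ("Physics", 2))

cs_students = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE major = ? ORDER BY name", ("CS",)
    )
]
report = conn.execute(
    "SELECT major, COUNT(*) FROM student GROUP BY major ORDER BY major"
).fetchall()

print(cs_students)  # ['Asha', 'Chen']
print(report)       # [('CS', 2), ('Physics', 1)]
```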

CHARACTERISTICS THAT DISTINGUISH THE DATABASE APPROACH FROM TRADITIONAL FILE-PROCESSING APPLICATIONS :
- Existence of a catalog : It contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information stored in the catalog is called meta-data.
- Program data independence : In traditional file processing, the structure of a file is embedded in the access programs, so any change to the structure of a file may require changing all programs that access that file. By contrast, in the database approach the structure of data files is stored in the DBMS catalog, separately from the access programs. This property is called program data independence.
- Program operation independence : Users can define operations on data as part of database applications. An operation is specified in two parts. The interface of an operation includes the operation name and the data types of its arguments. The implementation of the operation is specified separately and can be changed without affecting the interface. This is called program operation independence.
- Data abstraction : The characteristic that allows program data independence and program operation independence is called data abstraction.
- Support of multiple user views.
- Sharing of data among multiple transactions.
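
To make the catalog idea concrete, here is a small sketch that reads meta-data back from SQLite through Python's `sqlite3` module. SQLite is just one DBMS, and other systems expose their catalogs differently; the `employee` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# sqlite_master is SQLite's catalog table: it describes the stored objects.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(employee)"
    )
]

print(tables)
for name, col_type, not_null in columns:
    print(name, col_type, "NOT NULL" if not_null else "")
```

Because the structure lives in the catalog rather than in the program, code like this can discover a table's columns at run time instead of hard-coding them.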

Main Categories of Database users are :
- Administrators.
- Designers.
- End users.
- System analysts and application programmers.
- DBMS system designers and implementers.
- Tool Developers.
- Operators and maintenance personnel.

Advantages of using Databases :
- Potential for enforcing standards.
- Reduced application development time.
- Flexibility.
- Availability of up-to-date information to all users.
- Economies of scale.

Saturday, July 25, 2009

Introduction to Firewalls

A firewall is a system that prevents unauthorized access to or from a network. Firewalls can be implemented in hardware, in software, or in a combination of both. They are frequently used to prevent unauthorized Internet users from accessing private networks connected to the Internet. All data entering or leaving the private network passes through the firewall, which examines each packet and blocks those that do not meet the specified security criteria.

Firewalls can greatly enhance the security of a host or a network. They can be used to do one or more of the following things:
* To protect and insulate the applications, services and machines of your internal network from unwanted traffic coming in from the public Internet.
* To limit or disable access from hosts of the internal network to services of the public Internet.
* To support network address translation (NAT), which allows your internal network to use private IP addresses and share a single connection to the public Internet (either with a single IP address or by a shared pool of automatically assigned public addresses).

FIREWALL CONCEPTS
There are two basic ways to create firewall rulesets: “inclusive” or “exclusive”. An exclusive firewall allows all traffic through except for the traffic matching the ruleset. An inclusive firewall does the reverse: it allows only traffic matching the ruleset and blocks everything else. An inclusive firewall offers much better control of outgoing traffic, making it a better choice for systems that offer services to the public Internet. It also controls the type of traffic originating from the public Internet that can gain access to your private network. All traffic that does not match the rules is blocked and logged by design.
Inclusive firewalls are generally safer than exclusive firewalls because they significantly reduce the risk of allowing unwanted traffic to pass through them.
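
The difference between the two approaches shows up most clearly with traffic that nobody wrote a rule for. The sketch below is a toy model, not a real firewall: the `Packet` class, the rulesets, and the port numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str
    dst_port: int


# Hypothetical rulesets, keyed by (protocol, destination port).
ALLOW_RULES = {("tcp", 22), ("tcp", 443)}  # inclusive: what may pass
BLOCK_RULES = {("tcp", 23)}                # exclusive: what must not pass

blocked_log = []  # the inclusive firewall logs whatever it blocks


def inclusive_allows(packet):
    """Pass only traffic that matches a rule; block and log the rest."""
    if (packet.protocol, packet.dst_port) in ALLOW_RULES:
        return True
    blocked_log.append(packet)
    return False


def exclusive_allows(packet):
    """Pass all traffic except what matches a rule."""
    return (packet.protocol, packet.dst_port) not in BLOCK_RULES


unexpected = Packet("udp", 9999)  # traffic no rule anticipates
print(inclusive_allows(unexpected))  # False
print(exclusive_allows(unexpected))  # True
```

The unanticipated packet is dropped by the inclusive ruleset and passed by the exclusive one, which is why the inclusive approach is considered the safer default.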

HOW FIREWALLS WORK ?
A firewall, working closely with a router program, examines each network packet to determine whether to forward it toward its destination. A firewall may also include or work with a proxy server that makes network requests on behalf of workstation users. A firewall is often installed on a specially designated computer separate from the rest of the network so that no incoming request can get directly at private network resources.
Firewalls use one or more of three methods to control traffic flowing in and out of the network:
* Packet filtering - Packets are analyzed against a set of filters. Packets that make it through the filters are sent to the requesting system and all others are discarded.
* Proxy service - Information from the Internet is retrieved by the firewall and then sent to the requesting system and vice versa.
* Stateful inspection - It compares certain key parts of the packet to a database of trusted information. Information traveling from inside the firewall to the outside is monitored for specific defining characteristics, then incoming information is compared to these characteristics. If the comparison yields a reasonable match, the information is allowed through. Otherwise it is discarded.
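
Stateful inspection can be sketched as remembering outgoing connections and admitting only incoming packets that look like replies to them. This is a deliberately simplified model: the `StatefulFilter` class is hypothetical, the addresses are documentation examples, and a real firewall would also track protocol state and connection timeouts.

```python
class StatefulFilter:
    """Toy stateful filter keyed on (src, src_port, dst, dst_port)."""

    def __init__(self):
        self._connections = set()

    def record_outgoing(self, src, src_port, dst, dst_port):
        """Remember the defining characteristics of an outgoing packet."""
        self._connections.add((src, src_port, dst, dst_port))

    def allows_incoming(self, src, src_port, dst, dst_port):
        """Allow an incoming packet only if it mirrors a recorded connection."""
        # A reply has source and destination swapped relative to the request.
        return (dst, dst_port, src, src_port) in self._connections


fw = StatefulFilter()

# An internal host opens a connection to an outside web server.
fw.record_outgoing("10.0.0.5", 50000, "198.51.100.7", 443)

# The server's reply matches the recorded connection.
reply_ok = fw.allows_incoming("198.51.100.7", 443, "10.0.0.5", 50000)

# An unsolicited packet from another host matches nothing.
unsolicited_ok = fw.allows_incoming("203.0.113.9", 443, "10.0.0.5", 50000)

print(reply_ok, unsolicited_ok)  # True False
```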
