Archive

Archive for March, 2005

Retrieving web pages with foreign characters

When you make a request to a web page using code such as:

HttpWebRequest httprequest = (HttpWebRequest)WebRequest.Create(requestURI);
HttpWebResponse httpresponse = (HttpWebResponse)httprequest.GetResponse();
Stream responsestream = httpresponse.GetResponseStream();
StreamReader httpstream = new StreamReader(responsestream);
string bodytext = httpstream.ReadToEnd();

You may find that certain characters are missing from the returned string, such as the copyright © character, or accented characters such as é (e acute). To get around this you need to specify Latin-1 encoding (ISO 8859-1) in the StreamReader, thus:

StreamReader httpstream = new StreamReader(responsestream, Encoding.GetEncoding("iso8859-1"));

… had me stumped for ages!
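
Incidentally, hard-coding Latin-1 isn't always necessary: if the server reports a charset in its Content-Type header you can use that, and only fall back to ISO-8859-1 when it doesn't. A minimal sketch (assuming the usual System, System.IO, System.Net and System.Text namespaces):

    HttpWebRequest httprequest = (HttpWebRequest)WebRequest.Create(requestURI);
    HttpWebResponse httpresponse = (HttpWebResponse)httprequest.GetResponse();

    // Default to Latin-1, then prefer whatever charset the server declared.
    Encoding encoding = Encoding.GetEncoding("iso-8859-1");
    if (httpresponse.CharacterSet != null && httpresponse.CharacterSet.Length > 0)
    {
        try
        {
            encoding = Encoding.GetEncoding(httpresponse.CharacterSet);
        }
        catch (ArgumentException)
        {
            // the server reported a charset .NET doesn't recognise; keep the Latin-1 default
        }
    }

    Stream responsestream = httpresponse.GetResponseStream();
    StreamReader httpstream = new StreamReader(responsestream, encoding);
    string bodytext = httpstream.ReadToEnd();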

Categories: Uncategorized

Indexed views in SQL Server 2005

In an attempt to speed up queries against my 11-million-row table "Companies", I decided to dabble with indexed views in SQL Server 2005.

Here’s a transcript of how I got on (warts and all):

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator>sqlcmd
1>
Sqlcmd: Warning: Last operation was terminated by the user by pressing Ctrl-C.

C:\Documents and Settings\Administrator>sqlcmd -S.\SQLEXPRESS -E
1> use dinfo
2> create view VCompanies with schemabinding as
3> select * from companies
4> go
Msg 111, Level 15, State 1, Server OPENMERCHANTACC\SQLEXPRESS, Line 2
'CREATE VIEW' must be the first statement in a query batch.
Msg 156, Level 15, State 1, Server OPENMERCHANTACC\SQLEXPRESS, Line 3
Incorrect syntax near the keyword 'select'.
Sqlcmd: Error : Microsoft OLE DB Provider for SQL Server : One or more errors occurred during processing of command.
1> use dinfo
2> go
Changed database context to 'dinfo'.
1> create view VCompanies with schemabinding as
2> select * from companies
3> go
Msg 1054, Level 15, State 6, Server OPENMERCHANTACC\SQLEXPRESS, Procedure VCompanies, Line 2
Syntax '*' is not allowed in schema-bound objects.
1> create view VCompanies with schemabinding as
2> select id,name,address,city,PostCode,Telephone from companies
3> go
Msg 4512, Level 16, State 3, Server OPENMERCHANTACC\SQLEXPRESS, Procedure VCompanies, Line 2
Cannot schema bind view 'VCompanies' because name 'companies' is invalid for schema binding. Names must be in two-part format and an object cannot reference itself.
1> select id,name,address,city,PostCode,Telephone from dinfo..companies
2> create view VCompanies with schemabinding as
3> select id,name,address,city,PostCode,Telephone from dinfo..companies
4> go
Msg 111, Level 15, State 1, Server OPENMERCHANTACC\SQLEXPRESS, Line 2
'CREATE VIEW' must be the first statement in a query batch.
Msg 156, Level 15, State 1, Server OPENMERCHANTACC\SQLEXPRESS, Line 3
Incorrect syntax near the keyword 'select'.
1> create view VCompanies with schemabinding as
2> select id,name,address,city,PostCode,Telephone from dinfo..companies
3> go
Msg 4512, Level 16, State 3, Server OPENMERCHANTACC\SQLEXPRESS, Procedure VCompanies, Line 2
Cannot schema bind view 'VCompanies' because name 'dinfo..companies' is invalid for schema binding. Names must be in two-part format and an object cannot reference itself.
1> create view VCompanies with schemabinding as
2> select id,name,address,city,PostCode,Telephone from dbo.companies
3> go
1> CREATE UNIQUE CLUSTERED INDEX VDiscountInd ON Vdiscount1 (ProductID)
2> go
Msg 1088, Level 16, State 12, Server OPENMERCHANTACC\SQLEXPRESS, Line 1
Cannot find the object 'Vdiscount1', because it does not exist or you do not have permission.
1> create unique clustered index idxVcompanies on VCompanies (name,address,city,postcode,telephone)
2> go
Msg 1935, Level 16, State 1, Server OPENMERCHANTACC\SQLEXPRESS, Line 1
Cannot create index. Object 'VCompanies' was created with the following SET options off: 'QUOTED_IDENTIFIER'.
Sqlcmd: Error : Microsoft OLE DB Provider for SQL Server : One or more errors occurred during processing of command.
1> SET ANSI_NULLS ON
2> SET ANSI_PADDING ON
3> SET ANSI_WARNINGS ON
4> SET CONCAT_NULL_YIELDS_NULL ON
5> SET NUMERIC_ROUNDABORT OFF
6> SET QUOTED_IDENTIFIER ON
7> SET ARITHABORT ON
8>
9> go
1> create unique clustered index idxVcompanies on VCompanies (name,address,city,postcode,telephone)
2> go
Msg 1935, Level 16, State 1, Server OPENMERCHANTACC\SQLEXPRESS, Line 1
Cannot create index. Object 'VCompanies' was created with the following SET options off: 'QUOTED_IDENTIFIER'.
1> drop view vcompanies
2> go
1> create view VCompanies with schemabinding as
2> select id,name,address,city,PostCode,Telephone from dbo.companies
3> go
1> create unique clustered index idxVcompanies on VCompanies (name,address,city,postcode,telephone)
2> go

Warning! The maximum key length is 900 bytes. The index 'idxVcompanies' has maximum length of 1275 bytes. For some combination of large values, the insert/update operation will fail.
Msg 1105, Level 17, State 2, Server OPENMERCHANTACC\SQLEXPRESS, Line 1
Could not allocate space for object 'SORT temporary run storage: 496092315451392' in database 'dinfo' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
The statement has been terminated.
Sqlcmd: Error : Microsoft OLE DB Provider for SQL Server : One or more errors occurred during processing of command.

alter database dinfo
ADD FILE
(
 NAME = dinfot2,
 FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\dinfo2.ndf',
 SIZE = 5MB,
 MAXSIZE = UNLIMITED,
 FILEGROWTH = 5MB
)
GO

… and re-ran the create index statement.
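
For reference, here is the whole working sequence pulled out of the transcript above into one script. This is a sketch assuming the same dbo.companies table; I have also indexed on id rather than the wide name/address key, on the assumption that id is the table's unique key, which keeps the index key well under the 900-byte limit:

use dinfo
go

-- Indexed views require these SET options at creation time
-- (QUOTED_IDENTIFIER being off is what blocked the index above).
set ansi_nulls on
set ansi_padding on
set ansi_warnings on
set concat_null_yields_null on
set numeric_roundabort off
set quoted_identifier on
set arithabort on
go

-- CREATE VIEW must be the first statement in its batch, SELECT * is not
-- allowed, and the base table must be referenced in two-part form.
create view dbo.VCompanies with schemabinding as
select id, name, address, city, PostCode, Telephone
from dbo.companies
go

-- The unique clustered index is what actually materialises the view.
create unique clustered index idxVCompanies on dbo.VCompanies (id)
go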

Categories: Uncategorized

Recursive fallback not allowed for character \u003F. Parameter name: chars

I just got this error on my website. The site has ASP.NET 2.0 beta 1 installed. It threw a weird exception which looked as if the Global.asax was corrupt, so I deleted the file on the server, and then got this error:

Recursive fallback not allowed for character \u003F. Parameter name: chars

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.ArgumentException: Recursive fallback not allowed for character \u003F. Parameter name: chars

Source Error:

An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Stack Trace:

[ArgumentException: Recursive fallback not allowed for character \u003F. Parameter name: chars]
   System.Text.EncoderFallbackBuffer.ThrowLastCharRecursive(Int32 charRecursive) +127
   System.Text.EncoderFallbackBuffer.InternalFallback(Char ch, Char*& chars) +259
   System.Text.SBCSCodePageEncoding.GetByteCount(Char* chars, Int32 count, EncoderNLS encoder) +1632932
   System.Text.EncodingNLS.GetByteCount(Char[] chars, Int32 index, Int32 count) +95
   System.Text.Encoding.GetBytes(Char[] chars, Int32 index, Int32 count) +26
   System.Text.Encoding.GetBytes(String s) +35
   System.Diagnostics.EnvironmentBlock.ToByteArray(StringDictionary sd, Boolean unicode) +442
   System.CodeDom.Compiler.Executor.ExecWaitWithCaptureUnimpersonated(SafeUserTokenHandle userToken, String cmd, String currentDir, TempFileCollection tempFiles, String& outputName, String& errorName, String trueCmdLine) +902
   System.CodeDom.Compiler.Executor.ExecWaitWithCapture(SafeUserTokenHandle userToken, String cmd, String currentDir, TempFileCollection tempFiles, String& outputName, String& errorName, String trueCmdLine) +65
   System.CodeDom.Compiler.CodeCompiler.Compile(CompilerParameters options, String compilerDirectory, String compilerExe, String arguments, String& outputFile, Int32& nativeReturnValue, String trueArgs) +443
   System.CodeDom.Compiler.CodeCompiler.FromFileBatch(CompilerParameters options, String[] fileNames) +532
   System.CodeDom.Compiler.CodeCompiler.System.CodeDom.Compiler.ICodeCompiler.CompileAssemblyFromFileBatch(CompilerParameters options, String[] fileNames) +230
   System.Web.Compilation.AssemblyBuilder.Compile() +522
   System.Web.Compilation.BuildProvidersCompiler.PerformBuild() +228
   System.Web.Compilation.BuildManager.CompileWebFile(String virtualPath) +234
   System.Web.Compilation.BuildManager.GetVPathBuildResultInternal(String virtualPath, Boolean noBuild, Boolean allowCrossApp, Boolean allowBuildInPrecompile) +580
   System.Web.Compilation.BuildManager.GetVPathBuildResultWithNoAssert(HttpContext context, String virtualPath, Boolean noBuild, Boolean allowCrossApp, Boolean allowBuildInPrecompile) +234
   System.Web.UI.PageHandlerFactory.GetHandler(HttpContext context, String requestType, String virtualPath, String path) +73
   System.Web.HttpApplication.MapHttpHandler(HttpContext context, String requestType, String path, String pathTranslated, Boolean useAppConfig) +702
   System.Web.MapHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +131
   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +162


Any ideas, anybody?

(Follow up: I restarted the server, and it resolved itself…)

Categories: Uncategorized

Shrinking a 6 GB database

After spending the guts of a month, I compiled a 6-million-row database containing the name, address and phone number of every unique company in Europe (well, the UK, Germany, Belgium, France, Italy and Luxembourg). This excludes branches of the same company, which amount to 130 million records, which I didn’t store.

The resultant database was a 2.3 GB MDF and a 3.9 GB LDF (log), stored on my local PC. In order to transfer this to my server it realistically needs to be below 1 GB, or else it would take days to upload over 256K ADSL.

The first step is to shrink the log file, using this:

backup log dinfo with truncate_only

dbcc shrinkfile(dinfo, 100)

Unfortunately, I found that this did not work. Instead, I logged into Enterprise Manager, selected the database from the list, chose Shrink Database, and selected 0% and "move pages to beginning of file". This reduced the database fragmentation, shrank the transaction log down to 500 KB, and brought the main MDF down to just over 2.2 GB.
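
Incidentally, one possible reason the shrinkfile attempt appeared to do nothing is that it names the data file rather than the log: the log file's logical name is usually dinfo_log rather than dinfo. A sketch of the T-SQL equivalent of what the Enterprise Manager dialog runs, assuming those default logical file names:

backup log dinfo with truncate_only
go
-- shrink the whole database, leaving 0% free space and moving pages to the start of the files
dbcc shrinkdatabase (dinfo, 0)
go
-- or shrink just the log file, by its logical name, down to 100 MB
dbcc shrinkfile (dinfo_log, 100)
go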

After zipping and uploading the database, I then attempted to attach it with:

EXEC sp_attach_db @dbname = 'dinfo',
     @filename1 = 'C:\Program Files\Microsoft SQL Server\MSSQL\Data\dinfo.mdf',
     @filename2 = 'C:\Program Files\Microsoft SQL Server\MSSQL\Data\dinfo_log.ldf'

Which gave the error:

Server: Msg 1827, Level 16, State 2, Line 1
CREATE/ALTER DATABASE failed because the resulting cumulative database size would exceed your licensed limit of 2048 MB per database.

However, I also have SQL Server 2005 (Yukon) installed on my server, which doesn’t have MSDE’s size limit (although it is still a beta). So I ran the following command:

C:\Documents and Settings\Administrator>sqlcmd -S.\SQLEXPRESS -E
1> use generalpurpose
2> go
Changed database context to 'generalPurpose'.
1> exec sp_attach_db @dbname='dinfo', @filename1 = 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\dinfo.mdf', @filename2 = 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\dinfo_log.ldf'
2> go
Converting database 'dinfo' from version 539 to the current version 587.
Database 'dinfo' running the upgrade step from version 539 to version 551.
Database 'dinfo' running the upgrade step from version 551 to version 552.
Database 'dinfo' running the upgrade step from version 552 to version 553.
Database 'dinfo' running the upgrade step from version 553 to version 554.
Database 'dinfo' running the upgrade step from version 554 to version 586.
Database 'dinfo' running the upgrade step from version 586 to version 587.

Et voilà, my 6 GB database is running off SQL Server 2005!


Categories: Uncategorized

Traditional storage media

A brief history of Hard Drive Technology

In 1950, Engineering Research Associates of Minneapolis, USA,  built the first commercial magnetic drum storage unit for the U.S. Navy, the ‘ERA 110’.  It could store one million bits of data and retrieve a word in 5 thousandths of a second.

In 1956 IBM invented the first computer disk storage system, the 305 RAMAC (Random Access Method of Accounting and Control).  This system could store five MBytes.  It had fifty 24-inch diameter disks!

By 1961 IBM had invented the first disk drive with air bearing heads and in 1963 they introduced the removable disk pack drive.

In 1970 the eight inch floppy disk drive was introduced by IBM.  My first floppy drives were made by Shugart who was one of the "dirty dozen" who left IBM to start their own companies.  In 1981 two Shugart 8 inch floppy drives with enclosure and power supply cost me about $350.00.  They were for my second computer.  My first computer had no drives at all.

In 1973 IBM shipped the model 3340 Winchester sealed hard disk drive, the predecessor of all current hard disk drives.  The 3340 had two spindles each with a capacity of 30 MBytes, and the term "30/30 Winchester" was thus coined.


In 1980, Seagate Technology introduced the first hard disk drive for microcomputers, the ST506.  It was a full height (twice as high as most current 5 1/4" drives) 5 1/4" drive, with a stepper motor, and held 5 Mbytes.  My first hard disk drive was an ST506.  I cannot remember exactly how much it cost, but it plus its enclosure, etc. was well over a thousand dollars.  It took me three years to fill the drive.  Also in 1980, Philips introduced the first optical laser drive.  In the early 80’s, the first 5 1/4" hard disks with voice coil actuators (more on this later) started shipping in volume, but stepper motor drives continued in production into the early 1990’s.  In 1981, Sony shipped the first 3 1/2" floppy drives.

In 1983 Rodime made the first 3.5 inch rigid disk drive.  The first CD-ROM drives were shipped in 1984, and "Grolier’s Electronic Encyclopaedia" followed in 1985.  The 3 1/2" IDE drive started its existence as a drive on a plug-in expansion board, or "hard card."  The hard card included the drive on the controller which, in turn, evolved into the Integrated Device Electronics (IDE) hard disk drive, where the controller became incorporated into the printed circuit on the bottom of the hard disk drive.  Quantum made the first hard card in 1985.

In 1986 the first 3 1/2" hard disks with voice coil actuators were introduced by Conner in volume, but half-height (1.6") and full height 5 1/4" drives persisted for several years.  In 1988 Conner introduced the first one inch high 3 1/2" hard disk drives.  In the same year PrairieTek shipped the first 2 1/2" hard disks.

In 1997 Seagate introduced the first 7,200 RPM, Ultra ATA hard disk drive for desktop computers. Later milestones for IDE DMA, ATA/33, and ATA/66 were:

  • 1994 DMA, Mode 2 at 16.6 MB/s
  • 1997 Ultra ATA/33 at 33.3 MB/s
  • 1999 Ultra ATA/66 at 66.6 MB/s

In 2000 IBM tripled the capacity of the world’s smallest hard disk drive.  The drive held one gigabyte on a disk the size of an American quarter.  By comparison, the world’s first gigabyte-capacity disk drive, the IBM 3380, introduced in 1980, was the size of a refrigerator, weighed 550 pounds (about 250 kg), and had a price tag of $40,000.

Types of Hard Disk Technology

RAID – Redundant Array of Inexpensive Drives

RAID technology stores your data on more than one drive to make sure nothing gets lost and to allow recovery of data from failed disk drives without shutting the system down.  RAID uses an assembly of disk drives, known as a disk array, that operates as one logical storage unit.  An advantage of RAID is that, because it is a logical scheme rather than a particular device, it can be implemented on any storage system with random data access, such as magnetic hard drives, optical storage, magnetic tape, etc.  When the data transfer rate is an issue, though, the fastest SCSI hard drives are typically used.

RAID allows for the immediate availability of data and, depending on the RAID level, recovery of lost data.  The various RAID levels (0, 1, 5 and 10) provide differing degrees of redundancy.  Each of these levels uses a different ‘topology’ and method of backing up the data.  The more powerful the level, the more redundant and recoverable the data will be and, not surprisingly, the more expensive it will be to implement.
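
To make the redundancy idea concrete, RAID 5 style parity is essentially a bitwise XOR across the data drives. The sketch below (plain C#, not tied to any particular controller or product) shows how a single failed drive can be rebuilt from the parity block and the surviving drives.

    // Parity is the XOR of the data blocks, so any one missing block
    // can be rebuilt by XOR-ing the parity with the surviving blocks.
    byte[] drive1 = { 0x12, 0x34, 0x56 };
    byte[] drive2 = { 0xAB, 0xCD, 0xEF };
    byte[] drive3 = { 0x01, 0x02, 0x03 };

    byte[] parity = new byte[drive1.Length];
    for (int i = 0; i < parity.Length; i++)
        parity[i] = (byte)(drive1[i] ^ drive2[i] ^ drive3[i]);

    // Suppose drive2 fails: its contents are parity XOR the surviving drives.
    byte[] rebuilt = new byte[parity.Length];
    for (int i = 0; i < parity.Length; i++)
        rebuilt[i] = (byte)(parity[i] ^ drive1[i] ^ drive3[i]);  // equals drive2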

Small Computer Systems Interface

SCSI (pronounced scuzzy) is the acronym for Small Computer System Interface.  SCSI is a high performance peripheral interface that can independently distribute data among peripherals attached to the PC. It is generally regarded as more suitable for high-end computer systems which require maximum possible performance than for the home user.

SCSI provides for higher data transfer rates and less CPU load than ATA (which we will see soon), but has higher cost and complexity.  SCSI is now available in several variations, including Fast SCSI, Fast Wide SCSI, Ultra SCSI, Wide Ultra SCSI, Ultra2 SCSI and Wide Ultra2 SCSI.  Each of these evolutions provides significant improvements in bus speed, bus width and the number of devices that can be connected.

S/ATA : Serial / Advanced Technology Attachment

ATA was the standard bus interface on the original IBM AT computer, and is the official ANSI (American National Standards Institute) standard term. It is also known as IDE (Integrated Drive Electronics).

Most modern motherboards come, as standard, with two 40-pin ATA connectors, each capable of supporting two devices (one master and one slave).

Serial ATA is the next-generation internal storage evolution and is designed to replace parallel ATA technology.  Serial ATA was introduced at 150 Mbytes/sec, but is planned to support up to 600 Mbytes/sec as it evolves over a 10-year development roadmap.

SATA and parallel ATA drives are more cost effective than SCSI technology, but currently not as fast.  They are therefore mainly used in domestic PCs and standard office workstations.

How a Hard Drive Works

A hard disk uses round, flat disks called platters, coated on both sides with a special media material designed to store information in the form of magnetic patterns. The platters are mounted by cutting a hole in the center and stacking them onto a spindle. The platters rotate at high speed, driven by a special spindle motor connected to the spindle. Special electromagnetic read/write devices called heads are mounted onto sliders and used to either record information onto the disk or read information from it. The sliders are mounted onto arms, all of which are mechanically connected into a single assembly and positioned over the surface of the disk by a device called an actuator. A logic board controls the activity of the other components and communicates with the rest of the PC.

Each surface of each platter on the disk can hold tens of billions of individual bits of data. These are organized into larger "chunks" for convenience, and to allow for easier and faster access to information. Each platter has two heads, one on the top of the platter and one on the bottom, so a hard disk with three platters (normally) has six surfaces and six total heads. Each platter has its information recorded in concentric circles called tracks. Each track is further broken down into smaller pieces called sectors, each of which holds 512 bytes of information.
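
To put those chunks into numbers, a drive's capacity is simply the product of its geometry. A small sketch (the cylinder/head/sector figures below are the classic ATA addressing limit, used purely as an illustration):

    // capacity = cylinders x heads x sectors-per-track x 512 bytes per sector
    long cylinders = 16383, heads = 16, sectorsPerTrack = 63, bytesPerSector = 512;
    long capacity = cylinders * heads * sectorsPerTrack * bytesPerSector;
    Console.WriteLine(capacity);  // 8,455,200,768 bytes - roughly 8.4 GB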


The entire hard disk must be manufactured to a high degree of precision due to the extreme miniaturization of the components, and the importance of the hard disk’s role in the PC. The main part of the disk is isolated from outside air to ensure that no contaminants get onto the platters, which could cause damage to the read/write heads.


Conclusions

The first PC hard disks had a capacity of 10 megabytes and a cost of over $100 per MB. Modern hard disks have capacities approaching 300 gigabytes and dropped below US$1 per gigabyte last year. That is roughly a 30,000-fold increase in capacity in just under 20 years, or around 67% cumulative improvement per year. At the same time, the speed of the hard disk and its interfaces has increased dramatically as well.

The typical Western consumer now generates some 100 gigabytes of data during his or her lifetime, including medical, educational, insurance, and credit-history data.  Hard drive costs have never been so economically viable, allowing the purchase of such significant scales of storage capacity at extremely affordable prices.  With the parallel improvements in reliability and access speeds, the acquisition of hardware with which to implement ambitious ‘life caching’ projects utilising a multitude of formats (video, audio etc) is now, more than ever, a reality which can and will be realised.  

Categories: Uncategorized

The right and wrong way to clone an object in C#

The Wrong way…

    // The "wrong" way: getting a fresh copy of the Flight array by
    // deserialising a BinaryFormatter stream stored in a DataTable column.
    Flight[] flights = new Flight[0];
    BinaryFormatter myBinary = new BinaryFormatter();
    MemoryStream flightStream = (MemoryStream)dtSearched.Rows[i]["Flights"];
    flightStream.Seek(0, SeekOrigin.Begin);
    flights = (Flight[])myBinary.Deserialize(flightStream);

The right way…

public class A : ICloneable {
   int x;
   public object Clone() {
       return MemberwiseClone();
   }
}
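
A small usage sketch: MemberwiseClone produces a shallow copy, so value-type fields such as x are duplicated, while any reference-type fields (if A had them) would still point at the same underlying objects in both instances.

    A original = new A();
    A copy = (A)original.Clone();
    Console.WriteLine(object.ReferenceEquals(original, copy));  // False - two distinct objects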

Categories: Uncategorized

Literature review on memory in cognitive computing

It seems intuitive that our memory operates in at least two modes: a rapid-store/rapid-recall area, which is located in our frontal lobes, and a second mode, used for long-term memories, which offers relatively slow store and slow recall.

The paper “Fleeting memories” by Daphne Bavelier et al. investigates the performance characteristics of our rapid-recall memory. Investigations into RSVP (rapid serial visual presentation), repetition blindness, and scene perception are discussed in this paper. It examines the results of various experiments into our ability to comprehend and remember pictures of objects and scenes, written words, and sentences when the visual stimuli are presented sequentially at rates of up to ten items per second. The paper highlights our remarkably developed abilities to understand and remember the contents of very briefly presented material.

Our memories can be further subdivided into spatial and non-spatial memories. A spatial memory is one which involves visualization in three dimensions, whereas a non-spatial memory is more abstract and can be used to hold less tangible information. In the paper “Cortical Representations of Personally Familiar Objects and Places: Functional Organization of the Human Posterior Cingulate Cortex” by Motoaki Sugiura et al., the contributors investigate the role of the posterior cingulate cortex and how this area of the brain distinguishes between spatial and non-spatial memories. In MRI (Magnetic Resonance Imaging) experiments, the distinction between the recognition of familiar objects as opposed to familiar places is clearly illustrated.

Another paper, “Sequential Memory: A Putative Neural and Synaptic Dynamical Mechanism” by Prof. T. Rolls and Gustavo Deco, demonstrates, in a simulated neural network model, a means by which sequences can be stored as memories with relative ease, without the need for successive repetition. In experiments, it was shown that a group of memories could be chained together to provide a strong recollection, as long as the sequence was logical. This perhaps points towards how certain people can recall long sequences of numbers by imagining a journey through a familiar environment.

A paper named “Separation of the Systems for Colour and Spatial Manipulation in Working Memory Revealed by a Dual-task Procedure” demonstrates how memories containing both spatial and colour information can be manipulated in our minds in parallel. In one experiment, however, when the subject was distracted by a secondary task that required much of their attention, the manipulation of the memory was adversely affected, although the subject could still maintain the memory of the object in their mind.

In a paper named “Prioritization of information in biological memory” by Christine Liu et al., the authors examine the methods our brains use to prioritize recall of everyday events. High-priority memory provides quick access to important information, slower access to less important information, and we simply forget unimportant details, such as what we had for lunch two months ago. In a digital sense, if we were to store the Internet on a single computer, we couldn’t fit it all there, but we would start with the most useful and informative websites and leave out websites such as “What I think of my neighbour’s cat” by Joe Nobody in Alabama.

This idea is somewhat expanded in a paper named “Associative memories” by Juric Olsen et al., who examine, by way of neural network models, how memories can be reinforced by groups of related memories. The author cites how the human brain has evolved from the hunter-gatherer days, when it was important to remember and distinguish between poisonous berries and fruit. In those days there would be little need to remember a set of thirteen-digit numbers in order to speak to family members. Thus, our image generalization and storage mechanism is highly evolved, and, although it is simple for our brains to associate a symbolic representation with one of these stored generalised images, our brains find it very difficult to prioritize symbolic information which is not associated with an image, sound, or smell. His experimentation with spiking neural networks supported this hypothesis.

In the paper “Conversations with Neil’s brain” by William H. Calvin, the author pays special attention to human “cache” memory. Human cache memory is where rapid-access information is stored temporarily. In this memory, one can remember a phone number long enough to dial it, but not if distracted by another activity, such as talking or writing. This cache memory is investigated in “Conversations with Neil’s brain”, where a patient with a damaged temporal lobe retained the use of his cache memory and could recall long-term memories, but could no longer store new memories.

“Information generalization within finite-bounded neural space” is a paper by Evjen Rammer. The topic of information generalization is explored within selected clinical trials and neural network models. The paper and related experiments show how our brains generalize information in order to save neuron space. For instance, when you think of a horse, you have a general image in your head, complete with head, tail, four legs and so on. One can imagine the horse with or without a saddle, or in a variety of colours, but these are optional attributes. This is in stark contrast to how a computer would store images of horses: a computer would have an index of a grey horse, a brown horse, a Shetland pony and so on, whereas a human has only one image, general yet accurate enough that a horse can be instantly recognised when seen.


Categories: Uncategorized

An Essay on Peer to Peer data storage

Report on data storage
Part 2. Distributed storage.

When storing data on the order of thousands of terabytes, it becomes increasingly difficult to hold the information on a single device. Distributed storage solves this problem by storing the data on multiple devices managed by different computers. Single computers in a distributed storage system may fail without rendering the entire storage system inoperable, which leads to enhanced durability and redundancy within the storage system.

When a client requests data from a distributed storage system, the system must be able to locate the distributed device (node) that holds the data that the client requested. There are three different techniques that can be used to achieve this: ad-hoc peer-to-peer (P2P), pure P2P, and indexed P2P.

Ad-hoc peer-to-peer is where the client knows which node within a distributed storage system contains the data that it requires. A good example of ad-hoc P2P is the WHOIS system.

The WHOIS system consists of about 100 or so computers which are managed independently. It is possible for a client to query the WHOIS system with a request such as “WHOIS google.com” and the system will return the name and address of the company which registered the domain name “google.com”. Each country (top-level domain) manages its own WHOIS server, and some redundancy is provided where larger WHOIS servers contain duplicate data of country-specific WHOIS servers. For example, RIPE.NET contains WHOIS information for the whole of Europe, but so too do regional WHOIS servers in France, Germany, Holland etc.
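
At the protocol level such an ad-hoc lookup is very simple: the client itself picks the node and talks to it directly. A minimal C# sketch (the server name here is just an illustrative choice; any WHOIS server listening on TCP port 43 behaves the same way):

    // The client chooses the node (a WHOIS server) and queries it directly over TCP port 43.
    using System;
    using System.IO;
    using System.Net.Sockets;

    class WhoisExample
    {
        static void Main()
        {
            using (TcpClient client = new TcpClient("whois.internic.net", 43))  // illustrative server name
            using (NetworkStream stream = client.GetStream())
            using (StreamWriter writer = new StreamWriter(stream))
            using (StreamReader reader = new StreamReader(stream))
            {
                writer.WriteLine("google.com");           // the WHOIS protocol is a single line of text
                writer.Flush();
                Console.WriteLine(reader.ReadToEnd());    // registrant details come back as plain text
            }
        }
    }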

The disadvantage of the WHOIS system, and ad-hoc P2P systems in general, is that regional differences can emerge: the US format for WHOIS data is different from the European format, and so forth. Also, if one server goes down, WHOIS information for that country may be lost. Furthermore, the client must know which WHOIS server to connect to. One advantage of the system is that it is quick and easy to make changes to data stored on the WHOIS system. This is not the case with pure P2P or unmanaged indexed P2P.

Pure P2P is used where a client knows a node within a distributed storage system, but not necessarily the node that holds the data that it is requesting.

A good example of Pure P2P is the DNS system. The DNS system consists of millions of interlinked DNS servers worldwide. It is possible for a client to query the DNS system with a request such as “resolve google.com” and the system will return the IP address for the “google.com” website. Each ISP manages their own DNS server, and many layers of redundancy are provided, as upstream DNS servers routinely exchange “routing advertisements”. Each DNS server knows the location of at least one other DNS server, and thus a request to any DNS server can be referred up the chain to the DNS server which holds the data the client requested.
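
From the client’s point of view the whole system is hidden behind a single call: you ask whichever DNS server the machine is configured to use, and the servers resolve the name among themselves. A small sketch:

    // Ask the locally configured DNS server to resolve a name; the chain of
    // DNS servers behind it is invisible to the client.
    using System;
    using System.Net;

    class DnsExample
    {
        static void Main()
        {
            IPHostEntry entry = Dns.GetHostEntry("google.com");
            foreach (IPAddress address in entry.AddressList)
                Console.WriteLine(address);   // the IP address(es) the DNS system returned
        }
    }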

The disadvantage of the DNS system, and pure P2P systems in general, is that the topology of the system is not optimized to be ‘flat’, and each node in the system may have to query 30 other nodes in order to find an authoritative response to a client request. Furthermore, due to the ‘stringy’ topology of the DNS system and the periodicity of the routing advertisements, a change to a DNS record may take up to 48 hours to propagate through the internet. The main advantage of the pure P2P system is that there is no single point of failure, and even if multiple DNS servers go down, the system can continue to operate correctly.

Indexed P2P is a more recent invention, and is used effectively by file-sharing networks such as WinMX, Kazaa and Napster. It is also used for large load-balanced multi-server websites, such as Google. It differs from pure P2P in that there is a single set of index servers which contain an index of the location of all other nodes in the network. Indexed P2P comes in two different forms: managed and unmanaged.

Unmanaged indexed P2P is where the nodes within the distributed storage network are managed by private individuals. Such a system is used by music-sharing networks such as WinMX. When a client connects to the WinMX network and looks for a file such as “U2-vertigo.mp3”, the index servers return a list of IP addresses of peer servers which hold this file. The client may then download the file from one of the peer servers.

The disadvantage of unmanaged indexed P2P is that the content is not held by a single company, and thus the network is at the mercy of whatever data each individual wishes to host. This may lead to users hosting copyrighted material, pornography, or viruses. Furthermore, since the index servers form the basis of the network, if the index servers fail, then the network is useless. Unmanaged servers run by private individuals typically have low-bandwidth, non-dedicated connections, making the network slower than managed servers.

Managed indexed P2P is where the nodes within the distributed storage network are managed by a single company. Such a system is used by large multi-server networks such as Google, and it works in a similar way to unmanaged indexed P2P. When a client makes a request to Google looking for “University of Ulster”, Google passes this request on to an array of index servers, which pass the request on to database servers containing information on the “University of Ulster”. The database servers (nodes) return results on the pages found, a snippet of text from within each page referring to the searched text, and a weighting determining the order in which each result should appear in the list.

The disadvantage of managed P2P is that the system is extremely expensive to implement, as it requires several clustered servers. In the case of Google, they use over 10,000 networked desktop-grade servers. The advantage of managed P2P is that the company can control the content contained in its index. Also, since the bandwidth within and between their data-centres is very high, the result of any query can be returned in seconds.


Categories: Uncategorized

Site wide caching in ASP.NET 1.x

ASP.NET supports a feature known as caching. This is where the output of the page is stored in memory for a short period of time until it is requested again, providing a performance boost, especially in IIS 6/Windows 2003 with its kernel-mode HTTP.sys implementation.

Implementing this in ASP.NET 1.x would generally mean adding the page directive

<%@ OutputCache Duration="5" VaryByParam="none" %>

to each page, which could be a big job if it is being fitted retrospectively to a large website.

However, it can be added to the Global.asax thus:

 protected void Application_BeginRequest(Object sender, EventArgs e)
 {
  // Cache every response on the server for 60 seconds.
  HttpContext context = HttpContext.Current;
  HttpResponse resp = context.Response;
  resp.Cache.SetExpires(DateTime.Now.AddSeconds(60));
  resp.Cache.SetCacheability(HttpCacheability.Server);
 }

In ASP.NET 2.0, this can instead be configured in the web.config; a sketch of that approach follows.
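
For reference, a sketch of the ASP.NET 2.0 approach using a cache profile defined once in web.config; each page then references the profile by name rather than repeating the duration (the profile name "SiteWide" is just an example):

<system.web>
  <caching>
    <outputCacheSettings>
      <outputCacheProfiles>
        <add name="SiteWide" duration="60" varyByParam="none" />
      </outputCacheProfiles>
    </outputCacheSettings>
  </caching>
</system.web>

and in each page:

<%@ OutputCache CacheProfile="SiteWide" %>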


Categories: Uncategorized

JavaTiger.com, the next version of J2SE!

Java Tiger, J2SE 5.0

Java 2 Platform Standard Edition 5.0 was launched by Sun on September 30th, 2004. J2SE 5.0 (known internally as J2SE 1.5.0, or informally as ‘Tiger’) has a number of key developments and new features, among them some changes to the fundamentals of the Java language.

This major feature release brings long-awaited advantages to the J2SE development community and some key business benefits to those involved in J2SE development. This website highlights the key features of Java 5.0 and provides an overview of the benefits that can be gained through adopting it; a couple of the new language features are sketched in code after the list below.

  • Enhanced for loop
  • Enumerated types
  • Static import and autoboxing
  • Support for C-style formatted input/output
  • Variable arguments (varargs)
  • Concurrency utilities
  • Simpler RMI (Remote Method Invocation) interface generation
  • JVM (Java Virtual Machine) monitoring and management
  • New Java look and feel
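
Two of these, the enhanced for loop and autoboxing (together with generics), look like this in practice. A small illustrative sketch:

    // Enhanced for loop plus autoboxing/unboxing with a generic collection.
    import java.util.ArrayList;
    import java.util.List;

    public class TigerDemo {
        public static void main(String[] args) {
            List<Integer> numbers = new ArrayList<Integer>();
            numbers.add(42);            // autoboxing: the int is wrapped to an Integer automatically
            numbers.add(7);

            int total = 0;
            for (int n : numbers) {     // enhanced for loop, auto-unboxing back to int
                total += n;
            }
            System.out.println(total);  // prints 49
        }
    }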

Further reading…

For more information on Java 1.5 Tiger, you may find Java 1.5 Tiger: A Developer’s Notebook by D. Flanagan and B. McLaughlin, from O’Reilly, of interest. Other websites on Java Tiger can be found at Links-for-Java.

Categories: Uncategorized