
Network News Transfer Protocol
1.0 Introduction
NNTP

Definition: NNTP (Network News Transfer Protocol) is the predominant protocol used by computer clients and servers for managing the notes posted on Usenet newsgroups. NNTP replaced the original Usenet protocol, UNIX-to-UNIX Copy Protocol (UUCP), some time ago. NNTP servers manage the global network of collected Usenet newsgroups and include the server at your Internet access provider. An NNTP client is included as part of Netscape, Internet Explorer, Opera, and other Web browsers, or you may use a separate client program called a newsreader.
NNTP (Network News Transfer Protocol) is the protocol used by Usenet news servers and clients (readers).
Usenet is a huge shared message system which is used on the Internet.
Usenet consists of newsgroups such as comp.sys.ibm.pc.hardware.video and rec.collecting.sport.hockey.
The NNTP protocol exists on the Application Layer of the OSI Model. NNTP utilizes TCP port 119; NNTP with SSL utilizes TCP port 563.
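Below is a minimal sketch of that port-119 exchange, using only Python's standard library socket module. The host name is a made-up placeholder; a real client would normally use a higher-level library (Python's nntplib, for instance, where it is still available) rather than issuing raw commands.

import socket

HOST = "news.example.com"   # hypothetical server name
PORT = 119                  # plain NNTP; NNTP over SSL uses port 563

def send_command(reader, sock, command):
    """Send one NNTP command and return the single-line status response."""
    sock.sendall((command + "\r\n").encode("ascii"))
    return reader.readline().decode("ascii", "replace").rstrip()

with socket.create_connection((HOST, PORT), timeout=30) as sock:
    reader = sock.makefile("rb")
    print(reader.readline().decode("ascii", "replace").rstrip())  # server greeting, e.g. "200 ..."
    print(send_command(reader, sock, "MODE READER"))                              # ask for reader mode
    print(send_command(reader, sock, "GROUP comp.sys.ibm.pc.hardware.video"))     # select a newsgroup
    print(send_command(reader, sock, "QUIT"))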
About news servers and NNTP
This explains what an NNTP server is, and why if you haven't got one you can't just use someone else's.
How news is distributed
Internet news (or "Usenet news") is distributed using a completely different protocol from either electronic mail or normal W3 "HTTP" servers. The Network News Transfer Protocol "NNTP" has the effect of broadcasting every message to (basically) every site, in contrast to email protocols, which send messages to specific sites, and HTTP, which only transfers the information on demand by the reader.
An NNTP server is set up by its system manager to know about some (at least one) nearby servers. With these servers, there is an arrangement that they will pass news to each other. Sometimes this arrangement is limited to certain news groups. Articles can be passed in both directions, and the servers compare article message-id headers to see whether they have any new news for each other.
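As an illustration of that message-id comparison, here is a toy Python sketch (not a real server): one peer offers each article by its Message-ID, IHAVE-style, and the other only asks for the ones it has not already stored.

class PeerServer:
    def __init__(self, name):
        self.name = name
        self.articles = {}          # message-id -> article body

    def offer(self, message_id):
        """Answer an IHAVE offer: True means 'send it', False means 'already have it'."""
        return message_id not in self.articles

    def accept(self, message_id, body):
        self.articles[message_id] = body

def feed(sender, receiver):
    """Push every article the receiver does not already hold (one transfer per article per site)."""
    sent = 0
    for message_id, body in sender.articles.items():
        if receiver.offer(message_id):      # IHAVE <message-id>
            receiver.accept(message_id, body)
            sent += 1
    return sent

a, b = PeerServer("site-a"), PeerServer("site-b")
a.accept("<1@site-a>", "first article")
a.accept("<2@site-a>", "second article")
b.accept("<1@site-a>", "first article")     # already arrived via another feed
print(feed(a, b))                           # 1: only the unseen article crosses the link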
Running an NNTP server
An NNTP server manager decides which parts of the news group hierarchy he will take, and finds another server administrator who is willing to feed news to him: his "news feed". Many administrators for example exclude the "alt.*" groups. The decision to take NNTP news involves allocating a lot of disk space to keep all the articles until they expire (typically in a few weeks). Some organizations feel it is an inappropriate use of their machines to store articles on strange topics. This is why, even if you do have a news server, you may not have access to some groups.
Sometimes there are conditions attached to the news feed, which forbid, for example, commercial use being made of the data, or it being passed on by any way other than NNTP. NNTP is much more efficient than HTTP for the case of articles which are going to be very widely read, because an article is only transferred once onto each site. Then, when someone reads it, the WWW client only has to retrieve it from the local server and not from the server where it started. NNTP servers will only allow local clients to access them directly, as allowing everyone in the world to access the same NNTP server would destroy this efficiency, and could lead to disastrous loads on the net and on that server. This is why you can't use just anyone's NNTP server. We get a lot of queries asking whether people can use ours. Sorry, you can't.
It may be that on your site there is already an NNTP server. Smart sites just give it the alias "news". If anyone on your site is reading Internet news, then you could find out how they have configured their news reader.
If you don't have an NNTP server on site, then someone is going to have to install it and arrange for a news feed from somewhere close.
When you have found one, then configure your WWW client to use it.
An alternative
If you really can't, for some reason, run an NNTP server on your site, you may find a friendly nearby site which will run an HTTP/NNTP gateway for you. This means running the cern_httpd with a small configuration file which allows anyone (or rather, anyone in your domain) to use it as a proxy for news: URLs. That is easy. Then, you configure your client to use that proxy for news. That should be easy too. The disadvantage is that you can't currently post news through the proxy.
Exception
There are some specialist news groups which are set up locally, and which it is not worth distributing via the general system. In this case, the administrators, instead of feeding the news into the NNTP network, just enable clients to connect from anywhere.
This doesn't scale very well, and in fact it would be more appropriate for them to run an HTTP server (or run an NNTP server and a server such as cern_httpd running as an HTTP/NNTP gateway). However, for these specialist groups, a URL of the form
nntp://host.dom.ain/news.group
can be used to point a client directly at the group on that particular server.
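For illustration, a small Python snippet using the standard urllib.parse module could pull the host and group name out of such a URL before opening a connection; the host and group below are just the placeholders from the example URL above.

from urllib.parse import urlsplit

url = "nntp://host.dom.ain/news.group"
parts = urlsplit(url)

host = parts.hostname            # "host.dom.ain"
port = parts.port or 119         # default NNTP port when none is given
group = parts.path.lstrip("/")   # "news.group"

print(host, port, group)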
Advocate takes a look at the historical trends in popularity among different distributed and centralized protocols, from NNTP through HTTP and on to newer systems such as Napster and the ill-fated Gnutella. I argue that, in spite of the trend towards centralization in the Web, distributed protocols have some life left in them.
The NNTP protocol, when it was first published, was quite a striking advance in Internet protocols. News articles propagated in an entirely distributed, decentralized fashion. The design of the protocol saved bandwidth when there were lots of people at a site reading the same news articles, and also tolerated failure of individual nodes. With these advantages, NNTP became one of the most popular Internet protocols (along with email and FTP), and for a time was the mechanism of choice for participating in Internet "communities."
Now, 14 years later, NNTP is a protocol in serious trouble. Its complete openness and lack of centralized control made it vulnerable to spam, abuse, and other nastiness. While I used to "read news" almost every day, the quality has sunk to the point where it's just not worth it.
NNTP's popularity has been largely displaced by Web-based message boards. These systems have many serious disadvantages compared with NNTP, including: much poorer utilization of bandwidth, the forced use of clunky "one size fits all" HTML-based interfaces, and vulnerability to the failure or compromise of the individual sites that host the content. Nonetheless, people do find them significantly more useful on balance.
In spite of the mass migration from Usenet to the Web, NNTP does maintain a few strongholds, especially pr0n, mp3z, and warez. In addition, a number of new protocols with peer-to-peer transmission of files are coming down the pike, of which Napster is surely the most popular.
A lot of people seem to be working on variations of Napster, including the Gnutella project, started by a couple of employees at Nullsoft (the people who make WinAmp) and promptly shut down by their corporate overlords at AOL.
Since the issues are disparate, we'll look at them one at a time, by category.
Bandwidth
One of the main technical issues of all these protocols is bandwidth. In the classical setup, your school has an NNTP server that talks over the outside network to a few other NNTP servers. Within the school, the clients talk to the news server over a very fast local network. NNTP doesn't have any concept of loading files remotely only on demand, so the total bandwidth tradeoff depends on the average number of people accessing any one file (average, in this case, being mean weighted by file size). When this number is above 1, you win. When it's a lot more than 1, you win big.
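A back-of-the-envelope Python sketch makes that tradeoff concrete. The article sizes and reader counts below are invented; the two functions simply count bytes crossing the outside link under the two schemes.

def outside_bytes_nntp(articles):
    """articles: list of (size_in_bytes, local_readers). A feed pushes each article once."""
    return sum(size for size, _readers in articles)

def outside_bytes_direct(articles):
    """Per-reader fetches from a remote server: each article crosses once per reader."""
    return sum(size * readers for size, readers in articles)

articles = [(50_000, 12), (200_000, 3), (10_000, 0)]   # hypothetical sizes and reader counts
print(outside_bytes_nntp(articles))     # 260000
print(outside_bytes_direct(articles))   # 1200000

# Size-weighted mean readers per article, the number the paragraph describes:
weighted_mean_readers = outside_bytes_direct(articles) / outside_bytes_nntp(articles)
print(round(weighted_mean_readers, 1))  # about 4.6, well above 1, so the feed wins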
With the Web, you basically load files directly from the central server to the clients (this is certainly how mp3.com works). In some cases, especially when there's a local network with a lot more load than the connection to the Internet really can support, it makes sense to add a caching proxy server such as Squid. Web caching has its own set of issues, though, and basically doesn't work well unless the server cooperates.
What's possible, of course, is a hybrid that is optimized for the heterogeneous networks common in schools and companies. Basically, you need the protocol to be sensitive to the relative capacities of the networks, and try to share files between multiple clients inside the local network, rather than duplicating their transfer from the outside. This is one of the goals of Gnutella, and it's easy to imagine that it will continue to be an area of active work on the part of distributed protocols.
Complexity
Let's face it, centralized systems are easier to manage and deploy than distributed ones. In a distributed protocol, you have to worry about consistency of namespaces, make sure the propagation algorithm works properly, and deal with things like partition of the network, failure of remote peers, and so on. Many of these problems pretty much go away in a centralized system.
Further, to really take advantage of the added robustness possible in a fully decentralized system, you need client support to browse different servers and select the one with the best availability. This is quite a bit harder than just doing a DNS lookup on the server's domain name, then connecting on a socket. Yet, the added complexity shouldn't be overwhelming. NNTP, after all, has lots of clients and servers by now.
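A hedged sketch of what that extra client support might look like: probe a list of candidate servers (the host names below are invented placeholders) and keep the quickest one that accepts a connection, rather than relying on a single DNS name.

import socket
import time

CANDIDATES = ["news1.example.org", "news2.example.org", "news3.example.org"]

def pick_server(hosts, port=119, timeout=5.0):
    """Return (host, connect_time) for the quickest reachable server, or None if none answer."""
    best = None
    for host in hosts:
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                elapsed = time.monotonic() - start
                if best is None or elapsed < best[1]:
                    best = (host, elapsed)
        except OSError:
            continue            # unreachable peer: just move on to the next candidate
    return best

print(pick_server(CANDIDATES))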
Control
Here, I think, is the crux of the distinction between centralized and distributed protocols. In a centralized system, there is a single point of control for things like controlling access, blocking and removing spam, etc. In a distributed system, this kind of control is difficult or impossible. The lack of controllability is both a good thing and a bad one. While nobody likes spam and other forms of abuse, the anarchic nature of the Internet is one of its more appealing features. In particular, decentralized systems seem to be particularly resistant to censorship, both blatant and the more subtle forms resulting from economic pressures.
From a censorship point of view, content lies on a spectrum from official propaganda and corporate-sponsored messages to flatly illegal stuff, with a lot of the interesting stuff in between. Thus, it's not surprising to see that a lot of the less "official" stuff, such as copyrighted music, pr0n, and warez, gravitates to the more decentralized forms, while e-commerce takes place entirely on centralized servers.
Note that censorship and resource utilization have been linked for a long time. Schools all over the world are now banning Napster because of the intense network utilization. Back in the good old days, the protocol of choice for warez and similar stuff was FSP, which had the major property that it degraded gracefully under load, simply throttling the transfer speed rather than killing the network.
What next?
The success of Napster is fueling a renaissance in distributed protocols for file distribution. While a lot of the development is currently ad hoc, it should be possible to learn from the successes and failures of systems which have gone before, and systematically design new stuff that works pretty well.
In the 14 years since NNTP was specified, a number of techniques have come to light which can help fix some of its limitations. These include:
- Protocols such as rsync and xdelta for synchronizing remote systems.
- The use of hashes to define a collision-free global namespace.
- Public key cryptography, particularly digital signatures for authentication.
- Systems such as PolicyMaker and KeyNote for implementing policies.
- A ton of academic research on special problems within distributed systems.
Further, there are a bunch of exciting new things that might just nail the spam and abuse problems that seem to be endemic to distributed communications. This includes the existing work from people such as SpamCop and NoCeM as well as the trust metric work being pioneered on this very website.
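To make two of those items concrete, here is a small Python sketch: hashlib (standard library) provides a hash-derived global name, and the third-party cryptography package (an assumption on my part; any signature scheme would do) provides a digital signature over an article.

import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

article = b"Subject: test\n\nHello, distributed world.\n"

# 1. Hash-based global name: any peer that computes the same digest is talking
#    about the same article, with no central naming authority involved.
global_name = hashlib.sha256(article).hexdigest()
print(global_name)

# 2. Digital signature for authentication: readers verify with the poster's
#    public key instead of trusting whichever server relayed the article.
author_key = Ed25519PrivateKey.generate()
signature = author_key.sign(article)
author_key.public_key().verify(signature, article)   # raises InvalidSignature if tampered with
print("signature verified")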
Advocate modestly predicts a renaissance in distributed protocols. The next few years seem like a very exciting time for new work in this area.
SMTP did it right, posted 22 Mar 2000 at 06:25 UTC by aaronl (Master)
Some distributed protocols just don't work well. One of these is Napster, whose centralized servers aren't even linked. SMTP is one of the best distributed protocols I have ever seen. The mail server relays an email to its destination, which is determined by a DNS lookup on the hostname that forms the latter part of the e-mail address. The only centralized part of the system is DNS.
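That routing step boils down to a DNS query on the domain part of the address (in practice its MX records). A small sketch using the third-party dnspython package, with a placeholder address:

import dns.resolver   # pip install dnspython

address = "someone@example.com"
domain = address.rsplit("@", 1)[1]

answers = dns.resolver.resolve(domain, "MX")
for record in sorted(answers, key=lambda r: r.preference):
    print(record.preference, record.exchange)   # the lowest preference value is tried first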
Jabber seems to be using the same method as SMTP (user@host; server is on host), which will make it the only instant messaging system that is not centrally controlled, AFAIK. In systems such as Napster, AIM, and ICQ, only clients are released and users are expected to use the company's servers for their communication. If people go and start smaller servers with these protocols, it all breaks down, because there is no mechanism for linking these servers together, especially to the main servers where many people are signed up (~30*10^6 in AOL's case IIRC). For this reason I am really looking forward to using both the Jabber server and client.
The internet in itself is a whole lot more distributed than people could have imagined. Steve Levy, in a 1990 MacWorld article, proposed a One Net monopoly that would eliminate the problem of people using different online services and therefore not being able to communicate. With the internet, the only centralized management is DNS, and it is actually optional (http://208.163.51.55 will cause the routers to use their routing tables to find a path to my computer - they won't have to look it up in a central directory). You could call the web a huge distributed network of web servers. This is actually a huge step over what came before, where content was put up on a huge, centralized online service like AOL or CompuServe and the content was hosted and even censored by them. You had to pay a lot to get a "keyword" (AFAIK), and you couldn't shop around for prices because the only way to reach AOL's subscribers was to pay AOL. Because of the advances made with the Internet in general and the Web, I think the rise of distributed networks and protocols actually happened with the widespread adoption of the Internet.
Distributed vs. Decentralized, posted 22 Mar 2000 at 19:10 UTC by nelsonminar (Master)
An interesting distinction to make in system design is the difference between "distributed" and "decentralized". It's useful to reserve the word "distributed" to talk about the fact of moving bits from place to place. Pretty much any system on the Internet is distributed; the question is how they're distributed. Some systems on the Internet are fully decentralized - the Web is the premier one. Some systems are centralized, such as a single Web discussion board. In between are hierarchical systems: DNS falls in this category, where there is a single tree of authority but plenty of caching along the way. NNTP and the current Internet backbone architecture both fall in a different category. Neither system is fully decentralized: there's still a strong tree shape to the network, where leaf sites get feeds from upstream. However, neither system is fully hierarchical either: at the highest levels, traffic is mutually peered and shared between sites; there is no root authority like InterNIC.
Each type of design - centralized, hierarchical, semi-hierarchical, or fully decentralized - has its advantages. Centralized is the easiest to understand, but the least scalable and the least fault tolerant. Hierarchical has done well - the success of DNS over the last 20 years is nothing short of phenomenal. But hierarchical implies a root monopoly, and we've seen those disappear over time with things like the current Internet route peering architecture. The thing that's less clear is fully decentralized systems. It works very well on the Web, but only because we have full text search engines to knit things together. I think the most exciting area of future Internet research lies in this regime. The payoff could be huge, building truly scalable and self-healing systems. But the complexity is very difficult to manage.
Anarchic vs. Authoritarian
The World Wide Web wasn't the only contender for a successful decentralized hypertext system. The HyperGratz system from Austria was (is?) another, and for a while was more widely deployed.
HyperGratz had a distribution and cache mechanism that was vastly more efficient than the WWW. It also had the idea that to run a server, you simply filled in a form and applied to someone in Austria to be added, and they'd tell you where your content fit into a global hierarchy. This sort of bureaucratic centralization is probably as "un-American" as you can get. I've portrayed it a little brutally to try and make clear how it might sound in North America. The political advantage of the WWW is that anyone can set up a server right away, with no need to interact with anyone. You can do that with Netnews too, both with NNTP and with the older uucp transport and B-news distribution methods. The technical advantage is simplicity.
Another distinguishing factor is the document life cycle. Usenet articles last anywhere from hours to weeks; web pages last from hours to years, or to years after they are out of date. AIM messages last seconds, and unless someone saves a log, IRC messages last a few minutes, or the length of your scrollbar.
Bandwidth is less important in IRC than minimizing interruption of service, for example (which is why the minimum spanning tree routing is inappropriate), whereas a two hour interruption in a Usenet feed might not be noticed. The technology and the politics have to work together, and have to be appropriate for the content, the users and the way the content is used.
More protocols should have fundamental privacy features built in, and by that I don't mean crypto, I mean lack of information. One of the most useful features about Usenet is its utter lack of authentication. What good is an anonymous remailer if the server even has the ability to keep logs of who sent what where? If the information exists (or can exist) then someone somewhere someday will get a court order that will trump any promises of privacy. You can't subpoena Usenet. Things like Napster can't succeed unless they protect their users with technical features, and not mere words. Napster is maybe not the best example of this, since everyone knows that its primary use is for copyright violations, but our civil liberties are usually defended by people with unpopular views. Defending the pornographers and racists, and yes, even the warez kids, is how we defend ourselves.
One of the important areas of concern between distributed and centrally managed systems is, as was mentioned by jwz, anonymity and privacy versus control and responsibility. Within any centrally managed system, it is a trivial exercise to implement safeguards to limit abuse. The drawback to that is the inherent limit this would impose on privacy. The users have little choice beyond trusting that the central authority will not do "Bad Things" with the information they track. Unfortunately, there are numerous examples of companies that will sell all the personal information they can find, because someone is willing to buy and subsequently abuse it.
The picture within a truly distributed system is actually somewhat worse (IMHO). With no controls on user activity because of complete anonymity, there is no longer a limit on the irresponsibility of the users. I might choose to steal a "respectable" online identity and post bogus stories to the effect that VALinux is about to report losses triple analysts' estimates, just to see what happens to the stock price. In a completely anonymous internet, who could stop me? Only I could, assuming I have some sense of ethics that identifies such behavior as unacceptable.