Fantastic idea, the main problem is it is distributed at the moment (unless I missed something).
The main idea behind the Majestic 12 project, though, is not to compete with or take on the search giants. The idea is to use the power of distributed computing, using the spare PC capacity and Internet connections of volunteers to index web pages (similar to the way SETI@home uses the idle time of computers to search for extraterrestrials).
But given that only sixty volunteers have so far indexed over 7 billion documents, should the number of volunteers double, the size of the Majestic 12 index will in no time be challenging the databases of the major search engines.
It's worth checking out the Majestic 12 website at http://www.majestic12.co.uk/
* Comprehensive UK Web Directory List . eCommerce software UK
* BossCart.com can build you a.
Register your domain names at Velnet
::
Add Eco sites to The Green Directory free of charge.
Use LBS Free PHP Directory Script . Web Hosting Blog
Fantastic idea, the main problem is it is distributed at the moment (unless I missed something).
But being distributed is not in itself a problem, as long as there is a way to retrieve results when searchers look for information on the engine.
What I think the main problem could be is if many of the servers that hold the database are offline at the same time.
I couldn't see any way to retrieve results, which is why I said what I did.
I would imagine that the servers would be classed into normal servers and superhubs. The initial request would go through a superhub, which would load balance it.
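To make that guess a bit more concrete, here is a minimal sketch of what a superhub might do. It is only an illustration, not Majestic 12's actual design: the `SuperHub` class, the `node-a`/`node-b`/`node-c` names and the simple round-robin policy are all assumptions. It also touches on the earlier worry about servers being offline, since the hub simply skips any node that is down.

```python
from itertools import cycle


class SuperHub:
    """Hypothetical superhub that round-robins queries across online servers."""

    def __init__(self, servers):
        # servers: dict mapping a server name to True (online) or False (offline)
        self.servers = dict(servers)
        self._order = cycle(sorted(self.servers))

    def pick(self):
        # Try each server at most once per call, skipping any that are offline.
        for _ in range(len(self.servers)):
            name = next(self._order)
            if self.servers[name]:
                return name
        return None  # every server is offline, so the query cannot be answered


# Made-up example: node-b is offline, so requests alternate between the other two.
hub = SuperHub({"node-a": True, "node-b": False, "node-c": True})
picks = [hub.pick() for _ in range(4)]
```

If every normal server were offline at once, `pick()` would return `None`, which is exactly the failure case mentioned above.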
I see your point, OWG. I don't quite know what their configuration is like, but it's searchable; you can try it here: http://majestic12.kicks-ass.org:8888/
Nice idea. If I owned that site I would aim at offering the best content search, not the largest number of URLs crawled. How about designing a search engine without spam pages?
To do that you have to define spam, which is what Google are trying to do.
You see, many people complain about all the Viagra, Vicodin, casino and mortgage spam sites, BUT if I carry out a search for 'buy vicodin' I have as much right to get a page about Vicodin as the next man has to get one about Tom & Jerry.
The whole problem is in defining spam, which is why Google uses a very complicated algorithm: reportedly a base 5 sliding log algorithm with 100 elements, which would in fact allow it to produce 5 BILLION combinations of the same algo!
I know, but I would use the spare PCs and Internet connections of volunteers not only to index the URLs, but also to determine what's spam and what's not.
Defining spam is difficult. I think it's almost impossible to design a search engine without some spam.
HOW? This is the problem: HOW do you define spam? If Nike spam the SEs for the word Nike, do you ban them? People searching for Nike deserve to find Nike.
By the way, Nike did this, and Google DID ban them (for a day).
It is not a processing problem; it is an algorithmic one.
Google 'hope' to reduce the effects of spamming with their new TrustRank system.