Ryze - Business Networking
Minding Your Own Business
The Minding Your Own Business Network is not currently active and cannot accept new posts
Indexed Pages (740 views)
May 01, 2006 11:11 pm | Indexed Pages

Audrey Okaneko

I've only just learned how to check whether my site is indexed. I think this means whether the search engines know I exist?

My question is: how do my other pages get indexed? Right now, only my front page is indexed. I checked a few of my friends' sites, and 60 pages are indexed. How will the rest of mine become indexed, or what steps do I need to take?

Audrey
Beginner Scrapbooking


May 01, 2006 11:42 pm | re: Indexed Pages

Andrew Barnes
It is always best to try to work 'with' the search engines.
They like to have a .txt file which goes by a variety of names, depending on the engine.
One of my sites currently has both robots.txt and urllist.txt.
Full instructions are given at the URL submission site for Google, certainly, and probably most others.

http://www.google.com/addurl/?continue=/addurl

As you'll read there, all the file needs to be is a list of all the pages you would like to have spidered when the bot visits.
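For illustration, a urllist.txt is simply one full URL per line; the domain and paths below are hypothetical:

```
http://www.example.com/
http://www.example.com/about.html
http://www.example.com/products.html
```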


Metta.
AndyE
"The Honest Marketer".
http://www.etribes.com/britweb


May 03, 2006 4:57 pm | re: Indexed Pages

Grace Judson
Audrey,

My understanding of the Robots text file is that it is a good way to keep search engines from indexing things you don't want them to have access to. For instance, it can keep them away from any email template files you may have for form handlers, so that spammers can't get hold of your email address. (Not that it works 100%, but every little bit helps!)

The search engine spiders (the programs that index sites) want main pages to have links to the rest of the site. If the main page doesn't have those links, then the spider can't crawl your site (sorry if you don't like the spider imagery!) because it can't find your other pages.

If you have a clear navigation system, with links to most of your second-level pages on the home page, and links going no more than another two or three pages deep, then the search engines can easily find and index all your pages.

What I mean is: if your home page is at level 1, and it links to five pages, and one or two of those pages have links to another two or three each, that would be three levels deep. And the search engine indexer (the spider) can easily find all those pages.
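That level-counting can be sketched as a breadth-first walk over a link graph; the page names below are made up for illustration:

```python
from collections import deque

# Hypothetical site structure: each page maps to the pages it links to.
links = {
    "home": ["about", "products", "blog", "contact", "faq"],
    "products": ["widget-a", "widget-b"],
    "blog": ["post-1", "post-2"],
}

def crawl_depths(start):
    """Breadth-first walk of the link graph, recording each page's level."""
    depths = {start: 1}           # the home page counts as level 1
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

depths = crawl_depths("home")
print(max(depths.values()))   # deepest page sits at level 3
```

A spider works much the same way: any page not reachable through some chain of links from the home page never shows up in `depths` at all.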

If you have a complex site, having a complete site map that's linked from the home page makes search engines very happy!

Note that there may be some pages you don't care if they're indexed or not. For instance, my site has some "thanks for signing up" pages that have very little content and I don't really want them to be indexed - and I don't have direct links to them anywhere; they get called from my form handlers.

Hope that helps! Feel free to send me a message if anything needs clarification.

Grace


May 03, 2006 5:50 pm | re: re: Indexed Pages

Reg Charie
Putting up a robots.txt file is a poor security measure if you are using it to disallow robots access to certain folders/files.

Since robots.txt is in the clear, anyone can bring it up and see what folders/files you consider private.
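To illustrate the point: robots.txt is served as plain text, so anyone, crawler or human, can fetch and read it. Here is a sketch that parses a hypothetical file with Python's standard-library parser, the same way a well-behaved spider would:

```python
from urllib import robotparser

# A hypothetical robots.txt -- note that the "private" paths
# are listed in plain sight for anyone who requests the file.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /email-templates/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "/index.html"))           # allowed
print(rp.can_fetch("*", "/private/secret.html"))  # disallowed
```

Well-behaved bots honour these rules, but nothing stops a curious visitor (or a badly behaved bot) from reading the same file and heading straight for the listed paths.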

Reg


May 03, 2006 6:31 pm | re: Indexed Pages

Andrew Barnes
Grace,

I stand ready to be corrected, but my understanding of the robots.txt file was that it served as a kind of 'road map' for the spiders.

Yes, I am aware that you can effectively 'block' a spider by including the appropriate code, but getting back to the 'road map' analogy:

If the spider knows what is on the 'map', it will look to find all of the 'roads'.

Now, for this to happen, you are quite correct, the roads must lead into each other, or be connected - 'links' - just as a road with no other 'leading into it' can't be accessed.

My point is that having the 'map' makes the spider do it a lot more quickly.


Metta.
AndyE
"The Honest Marketer".
http://www.etribes.com/britweb


May 03, 2006 7:09 pm | re: re: Indexed Pages

Reg Charie

"My point is, that having the 'map' makes the spider do it a lot more quickly."

Not really Andy, if your navigation structure is well done.
If your primary navigation links properly to sub-pages from the home and other pages, and those pages link to their sub-pages, there is no need for a robots.txt.

Traditionally they are used more to disallow than to allow or to point the search engines to specific paths.
Here is the stock robots.txt for e107, a CMS.
User-agent: *
Disallow: /e107_admin/
Disallow: /e107_plugins/
Disallow: /e107_languages/
Disallow: /e107_images/
Disallow: /e107_themes/
Disallow: /e107_files/
Disallow: /e107_docs/
Disallow: /e107_handlers/
Disallow: /e107_install/

Reg


May 03, 2006 8:31 pm | re: Indexed Pages

Andrew Barnes
Thanks Reg,

I am learning.

Why, then, does Google provide the guidance it does (see earlier link), which read to me as though it was in my favour, and the spider's, to provide a robots.txt file?

I know that when I construct my pages (only my own; I don't build for others, I outsource) all the links are clear. I normally have about 10-15 pages, with the architecture constructed to allow expansion as required.
I use a JavaScript menu, with the main pages linked in text at the foot of the page, and secondary menus, when needed, equally provided on lower levels.
All help welcome. It seems Google has led me astray! ;-}


Metta.
AndyE
"The Honest Marketer".
http://www.etribes.com/britweb


May 03, 2006 9:01 pm | re: re: Indexed Pages

Reg Charie
Ahh, Google can do that.
Sometimes, depending on the JS used, a navigation system like that can have problems giving clear direction to the spider.

If anything other than primary navigation is to be used, I would consider a Google Sitemap.

It would be a lot less time consuming to set up than manually putting all your links into a text file.
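As a sketch, a Google Sitemap of that era was an XML file listing your URLs; the namespace below is the 0.84 schema Google was using at the time, and the URLs are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-05-03</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/products.html</loc>
  </url>
</urlset>
```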

Reg


May 08, 2006 7:30 pm | re: re: re: Indexed Pages

Grace Judson
It's also worth remembering that a small but significant number of people don't allow JavaScript to run in their browsers, for various reasons.

I'm trying to find the material I based my robots.txt on, but so far I'm not succeeding; I may have dumped it after I got things set up (I've been known to go on cleanup frenzies that I later regretted!).

In the meantime, yes, Reg, you're quite right that in some ways it doesn't make sense to put the robots.txt "disallow" statements in there, since the file is in the clear, as you put it. But if you list a separate DIRECTORY and disallow that, and then put all the files you want to keep un-viewed in that directory, then anyone who wants to look at those files would have to be able to guess their names. (Mea culpa; I haven't done that - YET - on my site, I admit!)
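A sketch of that approach, with a hypothetical directory name; the file reveals only the directory, never the names of the files inside it:

```
User-agent: *
Disallow: /hidden/
```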

It's a choice we make whether to bother with this, whether we think it helps or think it just causes more trouble than it's worth!

