
Thread: robots.txt



  1. #1
    wahid's Avatar
    Senior Member

    Status
    Offline
    Join Date
    Oct 2003
    Posts
    75

    Question robots.txt

    Peace...

    Hi All...
    Could you please tell me what the:
    1. use;
    2. advantage;
    3. disadvantage;
    of having a file or page called robots.txt on one's site is, if 1-3 above are different? Perhaps they are all the same...huh?

    I have a feeling that bots are looking for that file or page.

    Thanx for all responses and your time.

    Peace
    Wahid

  2. #2
    HTML's Avatar
    Administrator

    Status
    Offline
    Join Date
    Aug 2000
    Posts
    3,445

    Quote Originally Posted by wahid

    I have a feeling that bots are looking for that file or page.
    They sure are! Robots.txt is a simple text file which many spiders search for. This file tells them what to spider and what to leave alone. Keep in mind that there are many rogue spiders that do not follow the rules that you place in this file.

    Creating your own robots.txt is very simple, so simple that I think the best way to show you how is to just show you my file at http://www.ahfb2000.com/robots.txt
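
    As a rough illustration of how a well-behaved spider reads such a file, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the /private/ folder and the spider name are made up for illustration; they are not the contents of the file at the URL above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not the real ahfb2000.com file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a polite spider would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)
# A polite spider asks before fetching each URL.
print(rules.can_fetch("ExampleSpider", "/index.html"))         # True
print(rules.can_fetch("ExampleSpider", "/private/page.html"))  # False
```

    Nothing enforces these answers. A rogue spider simply never asks, which is why the file only works for bots that choose to obey it.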

    Dave

  3. #3
    wahid's Avatar
    Senior Member


    Thumbs up Thanx a bunch Dave...WOW

    Peace

    I had not a clue....at all...wow!!!
    Thanx for the URL, link...I am able to see what such a file looks like....

    OK here is the zillion dollar question, or er two.

    1. Why do you disallow all those guys?
    2. Are others unnamed allowed?
    3. Are the disallowed guys RogueBOTS or TerroristBOTS?
    4. What do these badBOTS do, as opposed to the good BOTS

    I notice that you have some pages, that are named specifically, and disallow seems to be the general instruction.

    5. Is there an issue where these guys may spend a long time on these pages?
    6. Does that use up bandwidth?
    7. Does that slow your pages or something....?

    Sorry for all the questions...but it just sends me thinking, when I see all the guys disallowed, and the specified pages, that seem to be off limits.

    Of course you do not have to answer all those of my (jaw-dropping) questions...
    I guess I now have to google robots.txt.

    Thanx a bunch Dave, really great to be on this board, so much to learn, and it is so easy...LOL, after you explain that is....

    Sorry I do not count so well on the two questions.

    Peace
    Wahid

  4. #4
    HTML's Avatar
    Administrator

    1. Why do you disallow all those guys? - Most are email harvesters: they spider sites searching for email strings for spam lists. Most of these do not obey the robots.txt file, so I take the extra precaution of making certain that no members' emails are shown to other users. The forum uses a form and only shows the user name.

    2. Are others unnamed allowed? -for now

    3. Are the disallowed guys RogueBOTS or TerroristBOTS? - Folks may disallow for a variety of reasons. I ask Google not to spider the folders which make up the backend of ahfb. Not that they could get in, but this way they do not waste resources trying, which lets them spend more time where I want them. Some bots are disallowed because they are downloadable apps which are usually used to steal others' content. As stated above, a bunch are email harvesters.

    4. What do these badBOTS do, as opposed to the good BOTS -pretty much covered above

    I notice that you have some pages, that are named specifically, and disallow seems to be the general instruction.

    5. Is there an issue where these guys may spend a long time on these pages?
    - see item 3

    6. Does that use up bandwidth? - All spiders will use bandwidth (transfer).

    7. Does that slow your pages or something....? -I do not follow the question
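
    The two kinds of rule described in the answers above (shutting named bots out entirely, and keeping everyone else out of backend folders) can be sketched as follows. The bot name and folder names are hypothetical placeholders, not entries from the real file, and the helper make_robots_txt is invented for this example.

```python
from urllib.robotparser import RobotFileParser

def make_robots_txt(blocked_bots, blocked_dirs):
    """Build robots.txt text: named bots get nothing, everyone else skips some folders."""
    lines = []
    for bot in blocked_bots:
        # One group per unwanted bot, disallowing the whole site.
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    lines.append("User-agent: *")
    if blocked_dirs:
        lines += [f"Disallow: {folder}" for folder in blocked_dirs]
    else:
        lines.append("Disallow:")  # an empty Disallow means nothing is off limits
    return "\n".join(lines) + "\n"

# Hypothetical names, for illustration only.
text = make_robots_txt(["ExampleEmailHarvester"], ["/admin/", "/includes/"])
print(text)

rules = RobotFileParser()
rules.parse(text.splitlines())
print(rules.can_fetch("ExampleEmailHarvester", "/index.html"))  # False
print(rules.can_fetch("Googlebot", "/index.html"))              # True
print(rules.can_fetch("Googlebot", "/admin/login.php"))         # False
```

    Bots that are not named fall through to the "User-agent: *" group, which matches the "for now" answer to question 2.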

    Dave

  5. #5
    wahid's Avatar
    Senior Member


    Thumbs up I am advised and Informed, thanx

    Peace

    H/Lo Dave

    Thanx for your response...as per caption.
    My last question was unclear...sorry
    It was really asking whether a website's pages become slower to access.

    That however has been answered by your response that says that all spiders will use bandwidth(transfer).

    What follows naturally is whether one can:
    1. calibrate/determine the number of bots visiting a site on any given day/time...
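
    One rough way to estimate this is to count requests in the web server's access log, grouped by user agent. The sketch below assumes the log uses the common "combined" format and treats any client that requested /robots.txt as a bot. Both are assumptions, and the log lines are invented sample data.

```python
import re
from collections import Counter

# Hypothetical log lines in the widely used "combined" format; real logs may differ.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Mar/2004:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [10/Mar/2004:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [10/Mar/2004:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (ExampleBrowser)"',
    '192.0.2.1 - - [10/Mar/2004:10:02:00 +0000] "GET /forum/ HTTP/1.1" 200 9000 "-" "ExampleBot/1.0"',
]

# Captures the requested path and the user agent (the last quoted field).
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_bot_requests(lines):
    """Return request counts for user agents that fetched /robots.txt at least once."""
    hits = Counter()
    bots = set()
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent")
        hits[agent] += 1
        if match.group("path") == "/robots.txt":
            bots.add(agent)
    return {agent: count for agent, count in hits.items() if agent in bots}

print(count_bot_requests(SAMPLE_LOG))  # {'ExampleBot/1.0': 3}
```

    This heuristic only sees the polite bots. A harvester that never asks for robots.txt would be counted as an ordinary visitor.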

    However...
    Here is one final issue...
    Does the number of visits requested by a web page have any relation to how frequently the bots visit....

    For example please take a look if you will at this...I will place it in:
    Code:
     <META content="3 days" name=revisit-after>
    <META content=ALL name=ROBOTS>
    <META content=INDEX name=ROBOTS>
    <META content=FOLLOW name=ROBOTS>
    2. How important are those "3 days", as opposed to any other number of days, for bot visits?

    3. And how does:
    Code:
    <META content=INDEX name=ROBOTS>
    <META content=FOLLOW name=ROBOTS>
    figure in your page specification....
    In the foregoing all bots seem invited.....
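
    To see what a spider could actually read from tags like these, here is a minimal sketch that collects the robots directives from a page using Python's standard html.parser module. The class name RobotsMetaParser is invented for this example, and real crawlers may interpret the tags differently.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Only name=robots counts; revisit-after is a different, separate tag.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]

# The tags from the post above, unchanged.
PAGE = """<META content="3 days" name=revisit-after>
<META content=ALL name=ROBOTS>
<META content=INDEX name=ROBOTS>
<META content=FOLLOW name=ROBOTS>"""

parser = RobotsMetaParser()
parser.feed(PAGE)
parser.close()
print(parser.directives)  # ['all', 'index', 'follow']
```

    All three directives are permissive, so together they ask for nothing beyond what a spider would normally do with a page that has no robots tags at all.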

    Not even sure whether these are necessary questions, but I thought of them while reading your response, and figured it may be a good idea to share it...

    Thanx again...
    Peace be unto you

  6. #6
    HTML's Avatar
    Administrator

    1 - There should be no reason to block the search engine spiders: the more bandwidth they use (read: pages they visit), the more traffic for you. The bad bots will disregard the file anyway.

    2 - The revisit-after tag is useless; the major search engines ignore it and crawl on their own schedule.

    3 - I think one is better off not using those meta tags at all but using a robots.txt instead.

    Dave

  7. #7
    wahid's Avatar
    Senior Member


    Thumbs up Thanx a bunch Dave

    Peace..
    H/Lo Dave

    Thanx for those quick responses, they are invaluable and timely bits of information...I am advised...and will act accordingly...

    Peace
    Wahid

  8. #8
    m4tr1x_w01f's Avatar
    New User

    Status
    Offline
    Join Date
    Feb 2004
    Location
    within the Matrix !!
    Posts
    13
    I have vaguely heard about these robots etc, but never knew anything about it.

    So, that said, how exactly do I use robots.txt, and how do you embed it into the source code etc.?

  9. #9
    haystack's Avatar

    Status
    Offline
    Join Date
    Feb 2004
    Location
    Minneapolis, MN
    Posts
    13
    m4tr1x_w01f, just create a text file called robots.txt and upload it to the root directory of your site, as Dave did with his. It doesn't have to be embedded in the source code of any pages because search engines know where to look for it.
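
    To make "where to look" concrete: whatever page a spider starts from, it asks for /robots.txt at the top of the same host. A minimal sketch, assuming a made-up page address on example.com and a helper name invented for this example:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical page address, for illustration only.
print(robots_txt_url("http://www.example.com/forum/thread.html?page=2"))
# http://www.example.com/robots.txt
```

    A copy placed in a subfolder, such as /forum/robots.txt, would not be found this way, which is why the file belongs in the root directory.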

  10. #10
    RWD's Avatar
    New User

    Status
    Offline
    Join Date
    Mar 2004
    Location
    Scotland
    Posts
    8
    This may seem a silly question, but where on the website would you place this file?

    EDIT: I didn't notice the post above, which has answered my question.


