robots.txt (spiders and crawlers)
This is a discussion on robots.txt (spiders and crawlers) within the Open Talk forums, part of the General Information category.
(#1)
Uber Poster; Posts: 3,382; Join Date: Feb 2005; Location: Houston, Texas; Real First Name: James
10-31-2006, 08:17 PM
Is anyone here using a robots.txt file to keep crawlers, spiders, and such from snooping around your sites?
If so, do you use these directives?
# go away
User-agent: *
Disallow: /
I currently just have this:
# No robot will spider the domain
User-agent: *
Disallow: /
I want to keep as many crawlers away from my site as possible.
James
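For reference, here is roughly how a well-behaved crawler reads the file in the first post. This is a minimal sketch in JavaScript, not a full parser: the function names `parseRobots` and `isAllowed` are made up for illustration, and it only handles `User-agent` and `Disallow` with plain prefix matching (no `Allow` lines, no wildcards).

```javascript
// Parse robots.txt text into groups of { agents: [...], disallow: [...] }.
// Simplified on purpose: only User-agent and Disallow are understood.
function parseRobots(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    if (!line) continue;
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value means "nothing is disallowed".
      if (field === "disallow" && current && value) {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

// Return true if a polite crawler with this user agent may fetch the path.
function isAllowed(groups, userAgent, path) {
  const ua = userAgent.toLowerCase();
  const specific = groups.find((g) =>
    g.agents.some((a) => a && a !== "*" && ua.includes(a))
  );
  const group = specific || groups.find((g) => g.agents.includes("*"));
  if (!group) return true; // no matching rules: everything is allowed
  return !group.disallow.some((prefix) => path.startsWith(prefix));
}

const robotsTxt = "# go away\nUser-agent: *\nDisallow: /\n";
const rules = parseRobots(robotsTxt);
console.log(isAllowed(rules, "ExampleBot/1.0", "/gallery/")); // false
```

With the file from the first post, every path is off limits to any crawler that chooses to honor it. Nothing in the file can force a crawler to run a check like this.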
(#2)
Uber Poster; Posts: 2,327; Join Date: Jan 2005; Location: Texarkana, Texas; Real First Name: Clint
10-31-2006, 10:10 PM
Hmmm... been getting a lot of "investment" mail lately... maybe I should add this to the contact page metadata...
--------------------------- RAW - Because I am smarter than my camera!
Website: ClintSmithPhoto.com
(#3)
Forum Master; Posts: 1,620; Join Date: Dec 2004; Location: Austin, Texas; Real First Name: Ed
10-31-2006, 10:52 PM
Only problem with that is, they already have your email, so it might stop new crawlers, but you're still gonna get spam. I get ~200-300 a day on my domain now.
(#4)
Uber Poster; Posts: 3,382; Join Date: Feb 2005; Location: Houston, Texas; Real First Name: James
10-31-2006, 10:57 PM
I never list my email address anywhere on the web. Well, I used to, but I removed them years ago due to spam and deleted those email addresses. I never displayed my email address anywhere until now: I just put it on my new site, but I want to keep all the crawlers away so my site does not get traffic without me giving out the link.
James
(#5)
Senior Member; Posts: 478; Join Date: Feb 2005; Location: Houston, Texas; Real First Name: Harry
11-01-2006, 08:53 AM
Me too on the "investment" emails. For those of us who are web-challenged, can you give us a short how-to lesson?
---------------------------
Everything has its beauty but not everyone sees it - Confucius
(#6)
Bit herder; Posts: 3,265; Join Date: Jan 2005; Location: Austin, Tx; Real First Name: Gordon
11-01-2006, 08:57 AM
I just use disposable addresses for any web accounts. Makes life easier.
(#7)
Forum Master; Posts: 1,954; Join Date: Jul 2005; Location: Colleyville, TX; Real First Name: Jim
11-01-2006, 09:00 AM
I use JavaScript to display my email link.
Still works for customers, fools the harvesters....
Code:
<script type="text/javascript">
// Assemble the address from pieces so it never appears whole in the page source.
var user = "jim"; // not "name": a global var called name would overwrite window.name
var atSign = "@";
var virtualDomain = "jimmarch";
var dotCom = ".com";
var address = user + atSign + virtualDomain + dotCom;
// Quote the href value so the generated link is well-formed HTML.
document.write('<a href="mailto:' + address + '">email me</a>');
</script>
Last edited by Detonate; 11-01-2006 at 09:02 AM.
(#8)
You Can't Be Serious!!; Posts: 13,314; Join Date: Mar 2005; Location: DFW, Texas; Real First Name: Brad (duh)
11-01-2006, 10:53 AM
It isn't going to stop a spam crawler... robots.txt is only honored by well-behaved, friendly crawlers, and spam bots are neither.
--------------------------- Brad Barton, Grand Prairie, TX (DFW). Honest critiques always welcomed. An artist is not paid for his labor, but for his vision. -- James Whistler, Painter, 1834-1903
(#9)
Forum Master; Posts: 1,064; Join Date: Jul 2005; Location: Houston, Texas; Real First Name: Mark
11-01-2006, 11:41 AM
It won't stop dictionary attacks either. That's the job of your email provider. One other thing to do is make sure you don't have a catch-all account like "nobody" on your domain.
Unfortunately, once you use your email address anywhere, it can become available to spammers, mostly through friends or business associates who have your address in an address book on a computer that has been, is, or will be infected by a virus or trojan.
Spam campaigns have almost tripled within the last month and have caused quite a number of problems for email administrators.
The problem at Yahoo caused many of their email servers to be unresponsive, to the point that some accounts, like mine, were not valid within their own networks for a period of time.
Yesterday I had to reactivate my Yahoo account for the groups that I subscribe to.
Another problem with email servers is the number of SMTP vulnerabilities that have been publicized lately, which in some instances cause the mail service to be shut down.
Another thing for you web designers: make sure that any contact forms you write for your own web site or your clients' sites are secured so that spammers can't take them over and spam the world through them.
This appears to be one of the more recent vulnerabilities. Many web hosting companies and developers haven't had to deal with it in the past, but it is causing problems now.
We've had to strip out the contact forms on over 50 different web sites on our servers to stop this from happening.
---------------------------
"To be ignorant of one's ignorance is the malady of the ignorant." Amos Bronson Alcott
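The post above does not say how the contact forms were being taken over. One common route, assumed here for illustration, is email header injection: a spammer puts line breaks into a field such as the subject so that extra headers like Bcc end up in the outgoing message. A minimal sketch of a server-side check in JavaScript follows; `validateContactForm` and its field names are hypothetical, and a real form should also send only to a fixed recipient address.

```javascript
// Hypothetical validator for a contact form submission.
// Returns a list of problems; an empty list means the input looks safe to mail.
function validateContactForm({ name, email, subject, message }) {
  const errors = [];
  const hasLineBreak = (s) => /[\r\n]/.test(s);

  // These fields end up in mail headers, so they must be single-line.
  for (const [field, value] of Object.entries({ name, email, subject })) {
    if (typeof value !== "string" || value.length === 0) {
      errors.push(`${field} is required`);
    } else if (hasLineBreak(value)) {
      errors.push(`${field} must not contain line breaks`);
    }
  }

  // Deliberately loose check: one address, no spaces, commas, or semicolons.
  if (typeof email === "string" && !/^[^\s@,;]+@[^\s@,;]+$/.test(email)) {
    errors.push("email must be a single address");
  }

  // The message body may span several lines; it only has to be present.
  if (typeof message !== "string" || message.length === 0) {
    errors.push("message is required");
  }
  return errors;
}

// An injection attempt: the subject tries to smuggle in a Bcc header.
const attempt = validateContactForm({
  name: "Visitor",
  email: "visitor@example.com",
  subject: "Hello\r\nBcc: victim@example.com",
  message: "Buy now",
});
console.log(attempt);
```

Rejecting the submission outright, rather than trying to clean it up, keeps the check easy to reason about.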
(#10)
Senior Member; Posts: 328; Join Date: Jun 2006; Location: Houston
11-01-2006, 12:22 PM
robots.txt is a request; it is not enforceable. If you include a blanket disallow, all the good guys won't crawl you and all the bad guys still will, so it's useless for avoiding spam harvesting and the like.
By the way, the two entries you show are the same. Did you mean that?
Last edited by simon; 11-01-2006 at 12:24 PM.
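Since robots.txt is only a request, keeping a gallery private takes something the server itself enforces. One option, not mentioned in the thread and shown here only as a sketch, is HTTP Basic authentication. The names `isAuthorized` and `handleRequest` and the hard-coded credentials are placeholders, and this should only be used over HTTPS.

```javascript
// Hypothetical helper: does the Authorization header carry the expected
// HTTP Basic credentials? Plain === comparison is used for brevity and is
// not timing-safe.
function isAuthorized(authHeader, expectedUser, expectedPass) {
  if (typeof authHeader !== "string") return false;
  const match = /^Basic\s+(\S+)$/i.exec(authHeader);
  if (!match) return false;
  const decoded = Buffer.from(match[1], "base64").toString("utf8");
  const sep = decoded.indexOf(":");
  if (sep === -1) return false;
  const user = decoded.slice(0, sep);
  const pass = decoded.slice(sep + 1); // a password may itself contain ":"
  return user === expectedUser && pass === expectedPass;
}

// Request handler for Node's built-in http module.
// Anything without valid credentials gets a 401, crawler or not.
function handleRequest(req, res) {
  if (!isAuthorized(req.headers.authorization, "james", "change-me")) {
    res.writeHead(401, { "WWW-Authenticate": 'Basic realm="gallery"' });
    res.end("Authentication required");
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Welcome to the gallery");
}

// To try it locally (left commented out so the snippet does not start a server):
// require("http").createServer(handleRequest).listen(8080);
```

Unlike a Disallow line, this applies equally to polite and impolite bots, at the cost of handing out a password along with the link.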
(#11)
Forum Regular; Posts: 515; Join Date: Feb 2006; Location: Austin, TX; Real First Name: Alex
11-01-2006, 12:29 PM
The JavaScript solution is excellent. Or leave your email address off your site and use a contact form instead.
I encourage robots and spiders, since they catalog my site in some of the lesser-known search engines.
--------------------------- A creative man is motivated by the desire to achieve, not by the desire to beat others.
(#12)
Uber Poster; Posts: 3,382; Join Date: Feb 2005; Location: Houston, Texas; Real First Name: James
11-01-2006, 12:50 PM
Thanks all, there is some good help here.
I do NOT want my site indexed by search engines. I know some people want that, but not me. I will be adding some scripts and such to help reduce the number of crawlers hitting my site.
Yesterday I had over 800 hits on my new gallery and 57 users online at one time, and only my wife and I have the link for it. I'm waiting on the web stats to update so I can see what's hitting it.
I did use a disposable email forwarder address on my site, so if I start getting spam I can change it with no worries and delete the forwarder.
James
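If the host exposes the raw access log, there is no need to wait for the stats page to see what is hitting the gallery. The sketch below tallies requests by user agent in JavaScript. It assumes the log is in the common "combined" format, where each line ends with the quoted referer followed by the quoted user agent; the function name `tallyUserAgents` and the sample lines are made up for illustration.

```javascript
// Count requests per user agent in an access log.
// Assumes "combined" log format: ... "referer" "user-agent" at the end of each line.
function tallyUserAgents(logText) {
  const counts = new Map();
  const pattern = /"[^"]*" "([^"]*)"\s*$/;
  for (const line of logText.split(/\r?\n/)) {
    const match = pattern.exec(line);
    if (!match) continue; // skip blank or differently formatted lines
    const agent = match[1];
    counts.set(agent, (counts.get(agent) || 0) + 1);
  }
  // Busiest user agents first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample lines, not real log data.
const sampleLog = [
  '192.0.2.1 - - [31/Oct/2006:20:17:00 -0600] "GET /gallery/ HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
  '192.0.2.2 - - [31/Oct/2006:20:17:05 -0600] "GET /gallery/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
  '192.0.2.1 - - [31/Oct/2006:20:17:09 -0600] "GET /gallery/1.jpg HTTP/1.1" 200 90210 "-" "ExampleBot/1.0"',
].join("\n");

for (const [agent, hits] of tallyUserAgents(sampleLog)) {
  console.log(hits, agent);
}
```

Keep in mind that the user agent string is supplied by the client, so a badly behaved bot can claim to be an ordinary browser.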
(#13)
Senior Member; Posts: 328; Join Date: Jun 2006; Location: Houston
11-03-2006, 04:10 PM
Quote (Originally Posted by JamesB):
thanks all, there is some good help here.
I do NOT want my site indexed by search engines.

You can't really stop it entirely, but you can minimize the impact by the way you structure the site. Are you trying to avoid email harvesting, or are you trying to avoid load on the server/bandwidth?
(#14)
Uber Poster; Posts: 3,382; Join Date: Feb 2005; Location: Houston, Texas; Real First Name: James
11-03-2006, 04:29 PM
Quote (Originally Posted by simon):
You can't really stop it entirely, but you can minimize the impact by the way you structure the site. Are you trying to avoid email harvesting, or are you trying to avoid load on the server/bandwidth?

I know I cannot stop all the crawlers, but I sure want to reduce the number that scan my site.
I mostly want to avoid the server/bandwidth issue. I am not all that concerned with email harvesting, as I'm using a disposable email address that I can delete or change at will.
James
(#15)
Forum Master; Posts: 1,064; Join Date: Jul 2005; Location: Houston, Texas; Real First Name: Mark
11-03-2006, 10:31 PM
why are you putting an email address on it at all?
Copyright ©2004 - 2011, Abel Longoria - www.Pixtus.com