robots.txt (spiders and crawlers)

(#1) JamesB - 10-31-2006, 08:17 PM


Is anyone here using a robots.txt file to keep crawlers, spiders, and such from snooping around your sites?

If so, do you use this rule?

# go away
User-agent: *
Disallow: /

I currently just have this:

# No robot will spider the domain
User-agent: *
Disallow: /

I want to keep as many crawlers away from my site as possible.

James

(#2) Clint_Smith - 10-31-2006, 10:10 PM


Hmmm... I've been getting a lot of "investment" mail lately. Maybe I should add this to the contact page's meta data.

---------------------------
RAW - Because I am smarter than my camera!

Website: ClintSmithPhoto.com

(#3) Nocturnus - 10-31-2006, 10:52 PM


The only problem with that is that they already have your email, so it might stop new crawlers, but you're still going to get spam. I get roughly 200-300 a day on my domain now.

---------------------------
Ed Fay
http://www.photo-mojo.net

(#4) JamesB - 10-31-2006, 10:57 PM


I have never listed my email address anywhere on the web. Well, I used to, but I removed the addresses years ago because of spam and deleted those accounts. Until now I never displayed my email address anywhere. I just put one on my new site, but I want to keep all the crawlers away so my site does not generate traffic without me giving out the link.

James

(#5) haraki74 - 11-01-2006, 08:53 AM


Me too on the "investment" emails. For those of us who are web-challenged, can you give us a short how-to lesson?

---------------------------
Everything has its beauty but not everyone sees it - Confucius

(#6) Gordon - 11-01-2006, 08:57 AM


I just use disposable addresses for any web accounts. Makes life easier.

---------------------------
--
ghost town graveyard

(#7) Detonate - 11-01-2006, 09:00 AM


I use JavaScript to display my email link.

It still works for customers and fools the harvesters.

Code:
<script type="text/javascript">
<!-- Hide the script from older browsers that do not support JavaScript

// Keep the address in pieces so it never appears whole in the page source.
var user = "jim";
var atsign = "@";
var virtual_domain = "jimmarch";
var dotcom = ".com";

// Assemble the pieces at load time and write a quoted mailto link.
document.write('<a href="mailto:' + user + atsign +
    virtual_domain + dotcom + '">email me</a>');

// -->
</script>
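
The same idea can be wrapped in a small function so it is easy to reuse on several pages. This is only a sketch: `buildMailLink` and the `contact` element id are made-up names, not part of the snippet above, and a harvester that actually runs JavaScript would still see the assembled address.

```javascript
// Assemble a mailto link from separate pieces so the full address never
// appears as one literal string in the page source.
// buildMailLink is a hypothetical helper name used for illustration.
function buildMailLink(user, domain, tld, label) {
  var address = user + "@" + domain + tld;
  return '<a href="mailto:' + address + '">' + label + "</a>";
}

// In a browser, fill a placeholder element. The id "contact" is an
// assumption; use whatever element your page provides.
if (typeof document !== "undefined") {
  var slot = document.getElementById("contact");
  if (slot) {
    slot.innerHTML = buildMailLink("jim", "jimmarch", ".com", "email me");
  }
}
```

Filling an existing element instead of calling document.write keeps the script usable after the page has finished loading.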

---------------------------
Jim March
JimMarch.com | SCUBA | MM | Facebook | flickr

(#8) brad - 11-01-2006, 10:53 AM


It isn't going to stop a spam crawler. robots.txt is for well-behaved, friendly crawlers, and spam bots are not.

---------------------------
Brad Barton, Grand Prairie, TX (DFW) Twitter -- Blog -- Headshots -- Portraits
Honest critiques always welcomed.
An artist is not paid for his labor, but for his vision. -- James Whistler, Painter, 1834-1903

(#9) markfh - 11-01-2006, 11:41 AM


It won't stop dictionary attacks either. That's the job of your email provider. One other thing to do is make sure you don't have a catch-all account like "nobody" on your domain.

Unfortunately, once you use your email address anywhere it can become available to spammers, mostly through friends or business associates who have your address in an address book on a computer that has been, is, or will be infected by a virus or trojan.

Spam campaigns have almost tripled within the last month and have caused quite a number of problems for email administrators.

The problem at Yahoo left many of their email servers unresponsive, to the point that some accounts, like mine, were not valid within their own networks for a period of time.

Yesterday I had to reactivate my Yahoo account for the groups I subscribe to.

Another problem for email servers is the number of SMTP vulnerabilities publicized lately, which in some instances cause the service to be shut down.

Another area for you web designers: make sure that any contact forms you write for your own site or clients' sites are secured so spammers cannot take them over and spam the world through them.

This appears to be one of the more recent vulnerabilities. Many web hosting companies and developers haven't had to deal with it in the past, but it is causing problems now.

We've had to strip out the contact forms on over 50 different web sites on our servers to stop this from happening.
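
One common way contact forms get hijacked is header injection, where a line break smuggled into a field such as the sender address lets the attacker add their own Bcc or To headers. Whether that is the exact attack seen on these servers is an assumption. Below is a minimal sketch of the check, with hypothetical names (`validateContactForm` and the field names). It is written in JavaScript for illustration, but the same test belongs in whatever server-side code actually sends the mail.

```javascript
// Return true if the value contains a carriage return or line feed.
// In a mail header, those characters start a new header line.
function hasLineBreak(value) {
  return /[\r\n]/.test(value);
}

// Check the single-line fields of a contact form before building a message.
// The field names here are assumptions; adjust them to match your form.
function validateContactForm(fields) {
  var singleLine = ["name", "email", "subject"];
  for (var i = 0; i < singleLine.length; i++) {
    var value = String(fields[singleLine[i]] || "");
    if (hasLineBreak(value)) {
      return { ok: false, reason: "line break in " + singleLine[i] };
    }
  }
  return { ok: true, reason: "" };
}
```

The message body is left out on purpose, since it legitimately contains line breaks. Only the values that end up in mail headers need this check.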

---------------------------
"To be ignorant of one's ignorance is the malady of the ignorant." Amos Bronson Alcott

(#10) simon - 11-01-2006, 12:22 PM


robots.txt is a request; it is not enforceable. If you include a blanket disallow, the good guys won't crawl you and the bad guys still will, so it's useless for stopping spam harvesting and the like.

By the way, the two entries you show are the same. Did you mean that?
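
To make the "request" point concrete, here is roughly what a polite crawler does with the file. This is a simplified sketch with made-up function names (`disallowedForAll`, `mayFetch`). It only reads the `User-agent: *` group and ignores Allow lines, wildcards, and agent-specific groups. The check runs entirely inside the crawler, so a bot that skips it sees your pages like anyone else.

```javascript
// Collect the Disallow prefixes that apply to every crawler
// (the "User-agent: *" group) from the text of a robots.txt file.
function disallowedForAll(robotsTxt) {
  var rules = [];
  var inStarGroup = false;
  var lines = robotsTxt.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].replace(/#.*$/, "").trim(); // drop comments
    if (line === "") continue;
    var colon = line.indexOf(":");
    if (colon === -1) continue;
    var field = line.slice(0, colon).trim().toLowerCase();
    var value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      inStarGroup = (value === "*");
    } else if (field === "disallow" && inStarGroup && value !== "") {
      rules.push(value);
    }
  }
  return rules;
}

// A well-behaved crawler asks this before each request; nothing forces it to.
function mayFetch(path, rules) {
  for (var i = 0; i < rules.length; i++) {
    if (path.indexOf(rules[i]) === 0) return false;
  }
  return true;
}
```

With the blanket rule from the first post, `disallowedForAll` returns a single prefix, `/`, and `mayFetch` refuses every path. That only matters to crawlers that bother to call it.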

(#11) AlexMorse - 11-01-2006, 12:29 PM


The JavaScript solution is excellent. Or don't put your email address on your site at all and use a contact form instead.

I encourage robots and spiders, since they catalog my site in some of the lesser-known search engines.

---------------------------
A creative man is motivated by the desire to achieve, not by the desire to beat others.

(#12) JamesB - 11-01-2006, 12:50 PM


Thanks all, there is some good help here.
I do NOT want my site indexed by search engines. I know some people want that, but not me. I will be adding some scripts and such to help reduce the number of crawlers hitting my site.

Yesterday I had over 800 hits on my new gallery and 57 users online at one time, and only my wife and I have the link. I'm waiting on the web stats to update so I can see what's hitting it.

I did use a disposable email forwarder address on my site, so if I start getting spam I can change it with no worries and delete the forwarder.
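
If the host gives access to the raw access log, there is no need to wait for the stats page. A rough sketch of a tally by user agent follows. It assumes the Apache "combined" log format, where the user agent is the last double-quoted field on each line. The function names are made up, and reading the log file into an array of lines is left out.

```javascript
// Count hits per user agent. Each entry in logLines is one line of an
// access log in the assumed "combined" format.
function countUserAgents(logLines) {
  var counts = Object.create(null); // no inherited keys to collide with
  for (var i = 0; i < logLines.length; i++) {
    var match = /"([^"]*)"\s*$/.exec(logLines[i]);
    if (!match) continue; // skip lines that do not fit the assumed format
    var agent = match[1];
    counts[agent] = (counts[agent] || 0) + 1;
  }
  return counts;
}

// Turn the tally into [agent, hits] pairs, busiest first.
function topAgents(counts) {
  return Object.keys(counts)
    .map(function (agent) { return [agent, counts[agent]]; })
    .sort(function (a, b) { return b[1] - a[1]; });
}
```

Crawlers that identify themselves honestly show up by name at the top of the list. Ones that fake a browser user agent will not, so treat the result as a first look.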

James

(#13) simon - 11-03-2006, 04:10 PM


Quote:
Originally Posted by JamesB
Thanks all, there is some good help here.
I do NOT want my site indexed by search engines.

You can't really stop it entirely, but you can minimize the impact by the way you structure the site. Are you trying to avoid email harvesting, or are you trying to avoid load on the server/bandwidth?

(#14) JamesB - 11-03-2006, 04:29 PM


Quote:
Originally Posted by simon
You can't really stop it entirely, but you can minimize the impact by the way you structure the site. Are you trying to avoid email harvesting, or are you trying to avoid load on the server/bandwidth?

I know I cannot stop all the crawlers, but I sure want to reduce the number that scan my site.

I mostly want to avoid the server/bandwidth issue. I am not all that concerned with email harvesting, as I'm using a disposable email address that I can delete or change at will.

James

(#15) markfh - 11-03-2006, 10:31 PM


Why are you putting an email address on it at all?

Copyright ©2004 - 2011, Abel Longoria - www.Pixtus.com