Spidering Hacks [Paperback]

3.69/5 (104 ratings by Goodreads)

Morbus Iff

Format: Paperback / softback, 420 pages, height x width x depth: 233x153x25 mm, index
Series: Hacks Ser.
Publication date: 02-Dec-2003
Publisher: O'Reilly Media
ISBN-10: 0596005776
ISBN-13: 9780596005771

Other books on this topic:

Internet guides & online services
Network security

Paperback
Price: €28.81*
* This is the final price, i.e., no further discounts apply.
Standard price: €33.90
Save 15%
Delivery takes 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may be delayed.
Delivery time: 4-6 weeks


Permanent link: https://www.kriso.lv/db/9780596005771.html


Provides techniques for creating spiders and scrapers that retrieve information from Web sites and data sources.

With this crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when one has gone too far: what's acceptable and unacceptable), readers learn how to collect media files and data from databases; how to interpret and understand the data and repurpose it for use in other applications; and even build authorized interfaces to integrate the data into their own content.

The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you.

Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented: you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you.
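The book's own examples use Perl and LWP. As a rough illustration of what "scraping raw data" from a page involves, here is a minimal sketch using Python's standard library instead (a swapped-in language, not the book's code). It pulls the `href` targets out of `<a>` tags and resolves relative links against the page's own address, so the results can be stored or followed later. The function name `extract_links`, the sample HTML, and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link in `html`, made absolute relative to `base_url`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical page content; a real spider would download it first
    sample = (
        "<html><body>"
        '<a href="/news/today.html">Today</a> '
        '<a href="archive/2003.html">Archive</a> '
        '<a href="https://example.org/">Elsewhere</a>'
        "</body></html>"
    )
    for link in extract_links(sample, "https://example.com/index.html"):
        print(link)
```

Using a real HTML parser rather than pattern matching keeps the sketch tolerant of attribute order and quoting. The book's table of contents covers both approaches (HTML::TreeBuilder, HTML::TokeParser, and regular expressions).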

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:

- Aggregate and associate data from disparate locations, then store and manipulate the data as you like
- Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
- Integrate third-party data into your own applications or web sites
- Make your own site easier to scrape and more usable to others
- Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day
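On the ethics side ("how to know when you've gone too far"), one concrete and widely used convention is a site's robots.txt file. The book's table of contents includes a hack on respecting it. The following is a minimal sketch using Python's `urllib.robotparser` as a stand-in for the book's Perl tooling. The robots.txt rules, the user-agent string `ExampleSpider/0.1`, and the URLs are hypothetical. A real spider would fetch the site's own robots.txt (for example with `set_url()` and `read()`) before crawling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, parsed from a string so the
# sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def make_checker(robots_txt):
    """Build a RobotFileParser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, user_agent, url):
    """True if the parsed rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = make_checker(ROBOTS_TXT)
    agent = "ExampleSpider/0.1"  # hypothetical user-agent string
    for url in ("https://example.com/index.html",
                "https://example.com/private/data.html"):
        verdict = "allowed" if allowed(rp, agent, url) else "disallowed"
        print(url, "->", verdict)
    # Crawl-delay is advisory: the spider must do its own pacing
    print("crawl delay:", rp.crawl_delay(agent))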

Like the other books in O'Reilly's popular Hacks series, Spidering Hacks brings you 100 industrial-strength tips and tools from the experts to help you master this technology. If you're interested in data retrieval of any type, this book provides a wealth of data for finding a wealth of data.


Table of contents:

Credits
Preface

Walking Softly
- A Crash Course in Spidering and Scraping
- Best Practices for You and Your Spider
- Anatomy of an HTML Page
- Registering Your Spider
- Preempting Discovery
- Keeping Your Spider Out of Sticky Situations
- Finding the Patterns of Identifiers

Assembling a Toolbox
- Perl Modules
- Resources You May Find Helpful
- Installing Perl Modules
- Simply Fetching with LWP::Simple
- More Involved Requests with LWP::UserAgent
- Adding HTTP Headers to Your Request
- Posting Form Data with LWP
- Authentication, Cookies, and Proxies
- Handling Relative and Absolute URLs
- Secured Access and Browser Attributes
- Respecting Your Scrapee's Bandwidth
- Respecting robots.txt
- Adding Progress Bars to Your Scripts
- Scraping with HTML::TreeBuilder
- Parsing with HTML::TokeParser
- WWW::Mechanize 101
- Scraping with WWW::Mechanize
- In Praise of Regular Expressions
- Painless RSS with Template::Extract
- A Quick Introduction to XPath
- Downloading with curl and wget
- More Advanced wget Techniques
- Using Pipes to Chain Commands
- Running Multiple Utilities at Once
- Utilizing the Web Scraping Proxy
- Being Warned When Things Go Wrong
- Being Adaptive to Site Redesigns

Collecting Media Files
- Detective Case Study: Newgrounds
- Detective Case Study: iFilm
- Downloading Movies from the Library of Congress
- Downloading Images from Webshots
- Downloading Comics with dailystrips
- Archiving Your Favorite Webcams
- News Wallpaper for Your Site
- Saving Only POP3 Email Attachments
- Downloading MP3s from a Playlist
- Downloading from Usenet with nget

Gleaning Data from Databases
- Archiving Yahoo! Groups Messages with yahoo2mbox
- Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
- Gleaning Buzz from Yahoo!
- Spidering the Yahoo! Catalog
- Tracking Additions to Yahoo!
- Scattersearch with Yahoo! and Google
- Yahoo! Directory Mindshare in Google
- Weblog-Free Google Results
- Spidering, Google, and Multiple Domains
- Scraping Amazon.com Product Reviews
- Receive an Email Alert for Newly Added Amazon.com Reviews
- Scraping Amazon.com Customer Advice
- Publishing Amazon.com Associates Statistics
- Sorting Amazon.com Recommendations by Rating
- Related Amazon.com Products with Alexa
- Scraping Alexa's Competitive Data with Java
- Finding Album Information with FreeDB and Amazon.com
- Expanding Your Musical Tastes
- Saving Daily Horoscopes to Your iPod
- Graphing Data with RRDTOOL
- Stocking Up on Financial Quotes
- Super Author Searching
- Mapping O'Reilly Best Sellers to Library Popularity
- Using All Consuming to Get Book Lists
- Tracking Packages with FedEx
- Checking Blogs for New Comments
- Aggregating RSS and Posting Changes
- Using the Link Cosmos of Technorati
- Finding Related RSS Feeds
- Automatically Finding Blogs of Interest
- Scraping TV Listings
- What's Your Visitor's Weather Like?
- Trendspotting with Geotargeting
- Getting the Best Travel Route by Train
- Geographic Distance and Back Again
- Super Word Lookup
- Word Associations with Lexical Freenet
- Reformatting Bugtraq Reports
- Keeping Tabs on the Web via Email
- Publish IE's Favorites to Your Web Site
- Spidering GameStop.com Game Prices
- Bargain Hunting with PHP
- Aggregating Multiple Search Engine Results
- Robot Karaoke
- Searching the Better Business Bureau
- Searching for Health Inspections
- Filtering for the Naughties

Maintaining Your Collections
- Using cron to Automate Tasks
- Scheduling Tasks Without cron
- Mirroring Web Sites with wget and rsync
- Accumulating Search Results Over Time

Giving Back to the World
- Using XML::RSS to Repurpose Data
- Placing RSS Headlines on Your Site
- Making Your Resources Scrapable with Regular Expressions
- Making Your Resources Scrapable with a REST Interface
- Making Your Resources Scrapable with XML-RPC
- Creating an IM Interface
- Going Beyond the Book

Index

Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine, he'd love to give you a Fry Pan of Intellect upside the head. Politely, of course. And with love.

Tara Calishain is the creator of the site ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.
