Archive for category Internet

Quote Database Screen Scraper

I wanted to make fortune(6) cookie files out of the major internet quote databases. So I came up with a Bash script that uses lynx and sed to do it. It includes throttling by default because it’s not nice to suck up all the site’s bandwidth.

#!/bin/bash
# Scrapes a qdb and dumps output to a file
# and tokenizes it into a fortune cookie file
#
LYNX="/usr/bin/lynx"
DEFSLEEP="30"
PNUM_DELIM="%pnum%"

# parse command line args
# look for -s to specify sleep_int period
# look for -p to specify number of pages
# treat everything else like a URL
# %pnum% is page number delimiter
sleep_int=$DEFSLEEP
pages=1	# fall back to a single page when -p is not given
URLS=()
ARGV=("$@")
for (( thisarg = 0; thisarg < ${#ARGV[*]}; thisarg++ ))
do
	arg="${ARGV[$thisarg]}"
	if [ "$arg" = "-s" ]
	then
		thisarg=$(( $thisarg + 1 ))
		sleep_int=${ARGV[$thisarg]}
	elif [ "$arg" = "-p" ]
	then
		thisarg=$(( $thisarg + 1 ))
		pages=${ARGV[$thisarg]}
	else
		URLS=("${URLS[@]}" "$arg")
	fi
done

LYNXOPTS="-accept_all_cookies -assume_charset=iso-8859-1 -nolist -dump"

# debug
echo ${URLS[@]}
echo "Sleep is $sleep_int"

for url in "${URLS[@]}"
do
	name=`echo $url | cut -d / -f 3`
	mkdir "$name"

	for (( page = 1; page <= $pages; page++ ))
	do
		n_url=`echo $url | sed -e "s/$PNUM_DELIM/$page/g"`
		# debug
		echo "$LYNX $LYNXOPTS $n_url > $name/page_$page"
		# quote the URL so characters like ? and & are not glob-expanded by the shell
		$LYNX $LYNXOPTS "$n_url" > "$name/page_$page"
		sleep $sleep_int
	done

	# postprocess scraped text files
	cd "$name"

	# remove quote IDs and put in fortune delimiters
	for file in *
	do
		sed -e "s/^\s*#.*$/%/g" < "$file" >> "$name.txt"
	done

	# remove page navigation links
	sed -e "s/^.*[1-9][0-9]* of [1-9][0-9]*.*$//g" < "$name.txt" > "$name"

	# make the database
	strfile "$name"
	rm "$name.txt"

	# go back up so the next URL's directory is not nested inside this one
	cd ..

done
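
To see what the two post-processing sed expressions do without hitting a real site, here is a minimal sketch that runs them over a made-up page dump. The sample text and the to_fortune function name are my own inventions for illustration; real lynx output will vary from one quote database to the next. Like the script, it relies on GNU sed for \s.

```shell
#!/bin/bash
# Sketch only: the same two sed expressions as the script, wrapped in a
# hypothetical helper that reads a page dump on stdin.
to_fortune() {
	sed -e 's/^\s*#.*$/%/g' \
	    -e 's/^.*[1-9][0-9]* of [1-9][0-9]*.*$//g'
}

# Made-up sample of what a dumped quote page might look like.
sample='   #1234 +(56)- [X]
<alice> hello
<bob> hi
   #1235 +(7)- [X]
<carol> brb
   1 of 20'

printf '%s\n' "$sample" | to_fortune
```

Each quote ID line turns into a lone %, which is the delimiter strfile looks for by default, and the "1 of 20" navigation line is blanked out.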

Quick ‘n dirty anagrammer using the Internet Anagram Server

I was bored one day and noticed that the an(6) program wasn’t in Red Hat’s repos. So I wrote a Perl script to talk to the Internet Anagram Server’s HTTP interface:

#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Client;
use URI::Escape;

my $client   = HTTP::Client->new();
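my $phrase   = join(" ", @ARGV);
my $string   = uri_escape($phrase);
my $document = $client->get("http://www.wordsmith.org/anagram/anagram.cgi?anagram=" . $string . "&t=1000&a=n");
# search for the phrase as typed; the URL-escaped form (%20 for spaces) is unlikely to appear in the page body
my $start    = index($document, "Anagrams for: $phrase</h3>");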
my $end      = index($document, '<bottomlinks>');
my $relevant = substr($document, $start, ($end - $start));
# [^>]* stops at the first closing bracket, so text between two tags on one line survives
$relevant    =~ s/<[^>]*>//g;
print $relevant;
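
One thing to watch when stripping tags with a pattern like <.*> is that the match is greedy: on a line with several tags it runs from the first < to the last >, taking the text in between with it. Here is a small sketch of the difference, in shell with sed rather than Perl. The function names and the sample line are made up for illustration.

```shell
#!/bin/bash
# Sketch only: both helper names are hypothetical.

# Greedy: matches from the first "<" to the last ">" on the line.
strip_tags_greedy() { sed -e 's/<.*>//g'; }

# Bounded: each match stops at the nearest ">".
strip_tags() { sed -e 's/<[^>]*>//g'; }

line='<b>listen</b> = <i>silent</i>'

printf '%s\n' "$line" | strip_tags_greedy
printf '%s\n' "$line" | strip_tags
```

The greedy version leaves an empty line here because the whole line sits between the first < and the last >, while the bounded version keeps the words.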


Ruby is rolling on Rails

So I’ve decided to learn Ruby on Rails.  I’ve so far gotten a new generic template up and running on my server.  Don’t bother trying to hit this URL from your browser – it won’t work outside my LAN.

Ruby on Rails landing page


Name my photoblog

I’m making my own photoblog.  And I need to come up with a name.  For some reason Pete’s Pixels is the best I can come up with.  But I haven’t a clue who Pete is.  😛  So come up with a name for my photoblog.  🙂


Elitism is not helpful

I’ve been reading the photographic fora lately for advice and such.  I posted a question about the relative image quality (IQ) of older FD lenses with a converter versus newer EF lenses.  A pompous asshole answered.

He said, “Why are you trying to use old FD lenses?  They’re crap on EOS cameras.”  I said, “I have them and can’t afford EF lenses.  Look at my options for a 50mm:  (1) my current FD 50mm f/1.4 with adaptor, (2) the EF 50mm f/1.8 for $120, (3) the EF 50mm f/1.4 USM for $360, or (4) the EF 50mm f/1.2L for $900.  I could maybe afford the $120 lens if I had to but is there really that much IQ difference?”  He said, “Option one is crap because of the adaptor.  Option two is crap because the lens has a plastic barrel and no USM.  Option 3 is crap because the AF quits working after 3 months.  Clearly option 4 is the right answer.”  I said, “Right but $900 is not in my price range.”  He said, “Then why are you in photography?  Try stamp collecting – it’s cheaper.”

Dismissive assholes like this really don’t help me to want to take more pictures.  I was looking for advice on whether or not I should be concerned about using FD lenses on my EOS camera from an IQ perspective, not whether or not I was worthy to own a camera.  😦


Parallelism in news

This was on msn.com this morning:

News:  Teen Death, France Shooting, Santorum

It’s always a bad day when Teen Death, France Shootings, and Santorum all happen.


Why I love Regretsy

Why I Love Regretsy


This seems like the best way to bake

I love baking, but it’s too f-ing hot outside. So, I’ll just live vicariously through this girl.


OMG! That’s my work!

You should all know by now that I spend a good deal of time working on my embroidery. I learn about my craft through looking at other people’s work. My favorite sites include Craftster, Feeling  Stitching, and most of all, mrxstitch. They have cool stuff, and this. Notice anything? That’s right, I made that. That’s my work. And it’s on mrxstitch. FUCKING AWESOME! If you’re too lazy to click the link, this is what I made.


I <3 Google Devs

Google Devs Are Awesome

I was using Picasa 3 Beta for Linux.  Rather than having the label for the checkbox say “Conserve Bandwidth,” the devs went with “Don’t eat all my bandwidth.”  This is why I love the Google devs.  🙂
