Archive for category Funny

Quote Database Screen Scraper

I wanted to make fortune(6) cookie files out of the major Internet quote databases, so I came up with a Bash script that uses lynx and sed to do it. It throttles itself by default (30 seconds between page fetches, adjustable with -s) because it’s not nice to suck up all of a site’s bandwidth.

#!/bin/bash
# Scrapes a qdb and dumps output to a file
# and tokenizes it into a fortune cookie file
#
LYNX="/usr/bin/lynx"
DEFSLEEP="30"
PNUM_DELIM="%pnum%"

# parse command line args
# look for -s to specify sleep_int period
# look for -p to specify number of pages
# treat everything else like a URL
# %pnum% is page number delimiter
sleep_int=$DEFSLEEP
pages=1		# assumed default: fetch a single page when -p is not given
URLS=()
ARGV=("$@")
for (( thisarg = 0; thisarg < ${#ARGV[*]}; thisarg++ ))
do
	arg="${ARGV[$thisarg]}"
	if [ "$arg" = "-s" ]
	then
		thisarg=$(( $thisarg + 1 ))
		sleep_int=${ARGV[$thisarg]}
	elif [ "$arg" = "-p" ]
	then
		thisarg=$(( $thisarg + 1 ))
		pages=${ARGV[$thisarg]}
	else
		URLS=("${URLS[@]}" "$arg")
	fi
done

LYNXOPTS="-accept_all_cookies -assume_charset=iso-8859-1 -nolist -dump"

# debug
echo "${URLS[@]}"
echo "Sleep is $sleep_int"

for url in "${URLS[@]}"		# every URL, not just the first one
do
	name=$(echo "$url" | cut -d / -f 3)	# host part of the URL
	mkdir -p "$name"

	for (( page = 1; page <= $pages; page++ ))
	do
		n_url=$(echo "$url" | sed -e "s/$PNUM_DELIM/$page/g")
		# debug
		echo "$LYNX $LYNXOPTS $n_url > $name/page_$page"
		# $LYNXOPTS is left unquoted on purpose so it splits into separate options
		$LYNX $LYNXOPTS "$n_url" > "$name/page_$page"
		sleep "$sleep_int"
	done

	# postprocess scraped text files in a subshell, so the script
	# is back in the starting directory for the next URL
	(
		cd "$name" || exit 1

		# remove quote IDs and put in fortune delimiters
		for file in page_*
		do
			sed -e 's/^[[:space:]]*#.*$/%/' < "$file" >> "$name.txt"
		done

		# remove page navigation links
		sed -e 's/^.*[1-9][0-9]* of [1-9][0-9]*.*$//' < "$name.txt" > "$name"

		# make the database
		strfile "$name"
		rm "$name.txt"
	)

done
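Here’s a quick way to check what the two sed passes in the postprocessing step do without hitting a real site. This is a sketch: the sample text is made up to look roughly like a lynx -dump of a qdb page, and `tokenize` is just a hypothetical wrapper around the same two sed expressions the script uses.

```bash
#!/bin/bash
# Hypothetical sample of lynx -dump output from a qdb page (made up).
sample='#101 +(42)- [X]
<Alice> hello
<Bob> hi
#102 +(7)- [X]
<Carol> brb
Page 1 of 50'

# Same two passes as the script's postprocessing step.
tokenize() {
	# pass 1: lines starting with a quote ID ("#101 ...") become fortune delimiters
	# pass 2: lines that look like page navigation ("1 of 50") become blank lines
	sed -e 's/^[[:space:]]*#.*$/%/' | sed -e 's/^.*[1-9][0-9]* of [1-9][0-9]*.*$//'
}

printf '%s\n' "$sample" | tokenize
```

With that sample, the two quotes come out separated by % lines and the “Page 1 of 50” line comes out blank. It also makes one weakness obvious: the navigation pattern isn’t tied to anything navigation-specific, so a quote line like “<Dan> I ate 2 of 3 pies” gets blanked too.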

Gift Ideas ;)

If anyone is thinking of spending a few thousand dollars on a gift for me (XD), this page has the lenses that will fit my camera body.  Any lens on that page would be a welcome addition to my gear except the following:

  • EF 8-15mm f/4L Fisheye USM
  • EF-S 18-55mm f/3.5-5.6 IS II (came with my camera body)
  • EF-S 18-135mm f/3.5-5.6 IS STM
  • EF 15mm f/2.8 Fisheye
  • EF 40mm f/2.8 STM
  • Any of the TS-E lenses.

Self-critical job ads

That’s Bull!

This job description seems to have some self-esteem issues.  After every skill set mentioned, it denies that you actually need that.  “Proficient with CP firewalls.”  “Bull!”  Job ads that are overly self-critical make me sad.

(Yes, I know that it’s just the parsing engine misreading “&bull;,” the HTML bullet entity, but it’s funnier to think the ad is calling bullshit on itself.  :P)

Geeky haiku

No, not the BeOS reimplementation of the same name.  The poetry form:

I lie here alone
The darkness all around me
Damn insomnia.

That explains the bounce messages

I have two accounts on my server.  One for admin work and one for Lodge stuff.  The admin one is “imbrius” and doesn’t have access to the lakelodge side.  The Lodge one is “jarmstrong” and has all the Lodge privileges.  jarmstrong@lakemasoniccenter.org is supposed to forward mail to imbrius@lakemasoniccenter.org.  That’s not happening.  Let’s find out why:

green (kitesfear) $ sudo cat ~jarmstrong/.procmailrc
[sudo] password for imbrius:
:0
! jarmstrong@lakemasoniccenter.org
green (kitesfear) $ hostname -d
lakemasoniccenter.org

Rather nice of it to forward mail to itself. Thankfully, Sendmail refuses to process that and sends a bounce instead. I’ve heard stories of the bad old days when Sendmail wasn’t that clever and went into Sorcerer’s Apprentice mode whenever an external processor like Procmail created an infinite forwarding loop. I’m told it happened a lot with SMTP-to-NNTP gateways.
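If you want to catch this kind of thing before the bounces start, a check along these lines could work. It’s a sketch under a few assumptions: `check_self_forward` is a made-up helper, not part of procmail; it only looks at simple `!` forwarding action lines; and it needs Bash 4 or later for the `${var,,}` lowercasing. Presumably the recipe above should read `! imbrius@lakemasoniccenter.org` instead.

```bash
#!/bin/bash
# Hypothetical helper (not part of procmail): read a procmailrc on stdin
# and report forwarding actions ("! address ...") that point back at
# USER@DOMAIN, i.e. the mailbox the recipe is delivering for.
# Returns 0 if a self-forward was found, 1 otherwise.
check_self_forward() {
	local self="${1,,}@${2,,}"	# compare addresses case-insensitively
	local re='^[[:space:]]*![[:space:]]*(.*)$'
	local line target found=1
	local -a targets
	# "|| [[ -n $line ]]" also handles a last line with no trailing newline
	while IFS= read -r line || [[ -n $line ]]; do
		if [[ $line =~ $re ]]; then
			# a forwarding action may list several addresses
			read -ra targets <<< "${BASH_REMATCH[1]}"
			for target in "${targets[@]}"; do
				if [[ ${target,,} == "$self" ]]; then
					echo "loop: $1 forwards to itself ($target)"
					found=0
				fi
			done
		fi
	done
	return "$found"
}
```

Against the file above, something like `sudo cat ~jarmstrong/.procmailrc | check_self_forward jarmstrong "$(hostname -d)"` should print a loop warning for the `! jarmstrong@lakemasoniccenter.org` line.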

Happy Discoflux!!

Today is Pungenday, the 50th day of Discord in the YOLD 3178
Celebrate Discoflux

“Y-M-C-A!” *bonk* Wrong disco, idiot.

Happy May 1!

Warning: NSFW

The dangers of neglected scripts

Here’s the scene:  A server is faithfully humming along doing its job.  One day, you institute a password expiration policy and it forces you to change your password.  Afterward, you get the following entry in your Logwatch report every day:

 --------------------- Connections (secure-log) Begin ------------------------ 

 **Unmatched Entries**
 unix_chkpwd[16436]: password check failed for user (imbrius)
 vsftpd: pam_tally(vsftpd:auth): user imbrius (500) tally 2128, deny 3

 ---------------------- Connections (secure-log) End -------------------------

My guess is that I have a script somewhere that uses my account to automate some process over FTP.  I’d further guess that it’s one of my signature block fetchers.  I have them running on UglyDuckling, Marlene, dropship… I’ll have to check all of them.
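For the hunt itself, a recursive grep for the account name is a reasonable first pass. This is a sketch: `find_account_refs` is a hypothetical helper, it relies on GNU grep’s `-I` flag, and it will only find scripts that spell the username out literally.

```bash
#!/bin/bash
# Hypothetical sweep: list files under the given paths that mention an
# account name, e.g. a cron job or fetcher script with a hard-coded FTP login.
# Usage: find_account_refs USER PATH...
find_account_refs() {
	local user="$1"
	shift
	# -r recurse, -l print file names only, -w whole-word match,
	# -I skip binary files; errors from unreadable paths are discarded
	grep -rlwI -- "$user" "$@" 2>/dev/null
}
```

Something like `find_account_refs imbrius /etc/crontab /etc/cron.d ~/bin` on each box would be a start. The paths are only examples, and reading other users’ crontabs under /var/spool/cron needs root.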

Moral of the story – keep track of your scripts.  XD

Things that are insanely awesome

#1 Exhaust flappers

Exhaust Flapper

Exhaust flappers are just flat caps that sit on top of exhaust outlets and are attached to the pipes by hinges.  The idea is that when there’s sufficient gas pressure from the engine pushing exhaust out, the flap lifts to allow the gases to escape.  When there isn’t, the flap closes under its own weight, preventing rain from getting into the exhaust pipes and causing rust and other havoc.  My favorite time is when the engine is idling and the flap bobs up and down going “tink! tink!” every time it shuts.  :3

fixing ddate(1)

I needed to fix a date formatting problem wherein the ddate(1) command in Linux would print dates like the “11st” or “12nd” of a given season. I downloaded the source tarball from here and edited the only file in it, ddate.2.0.c. I changed line 97 from this:
Read the rest of this entry »
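The actual change to ddate.2.0.c isn’t reproduced here. As a sketch of the rule any fix has to implement (assuming the bug is the usual one of choosing the suffix from the last digit alone), here’s the logic in Bash; `ordinal` is a made-up name, not a function from the ddate source.

```bash
#!/bin/bash
# Ordinal suffix for a day number: 11, 12 and 13 always take "th";
# otherwise the last digit picks st/nd/rd/th.
ordinal() {
	local n=$1
	case $(( n % 100 )) in
		11|12|13) echo "${n}th"; return ;;
	esac
	case $(( n % 10 )) in
		1) echo "${n}st" ;;
		2) echo "${n}nd" ;;
		3) echo "${n}rd" ;;
		*) echo "${n}th" ;;
	esac
}

# Discordian seasons run from day 1 to day 73
for d in 1 2 3 11 12 13 21 22 73; do
	ordinal "$d"
done
```

The loop prints 1st, 2nd, 3rd, 11th, 12th, 13th, 21st, 22nd and 73rd, one per line. Checking the teens before looking at the last digit is the whole trick.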
