
bowerbird's eye view

Public Journal
flyin' high with bowerbird
   
Monday, March 6, 2006
10:27:31 PM PST

a .pdf with excellent e-book design


with so many examples of bad .pdf design out there,
it gives me great pleasure to be able to point to one
where the design is well-done, on wonderful poetry:
>   http://www.poetrysuperhighway.com/ToHellWithRL.pdf

-bowerbird


Written by bowerbird Permalink | Blog about this entry
This entry has 0 comments: Add your own

Friday, February 24, 2006
1:46:39 PM PST

numbers so big they seem unbelievable


a sharp eye has pointed out to me
that _one_ billion dollars will digitize
_one_hundred_million_documents
at $10 each, not a "mere" 100,000,
which is what i had reported earlier.
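
(a quick way to check that arithmetic, as a minimal python sketch. the function name is made up for illustration, and the $10-per-document figure is simply the one used above:)

```python
def items_digitized(budget_dollars, cost_per_item_dollars=10):
    """Return how many documents a budget covers at a flat per-item cost."""
    return budget_dollars // cost_per_item_dollars


# one billion dollars at $10 each
print(items_digitized(1_000_000_000))  # 100000000

# the earlier, mistaken figure of 100,000 documents is what $1 million buys
print(items_digitized(1_000_000))  # 100000
```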

sort of puts into full perspective the
$100+ billion (and rising daily) that
we've spent over in iraq, doesn't it?

-bowerbird



9:11:28 AM PST

far too little, far too late


in november 2005, the librarian of congress,
james billington, announced plans to create
an online collection of rare books and more,
for a "world digital library", thanks to a grant
of $3 million from google.  billington called it
"the most ambitious international effort ever
to digitally copy items of artistic, historical
and literary significance".

while i think this is a wonderful idea, it is also
far too little, and far too late.

had i been the librarian of congress, i would've
started digitizing books some 25 years ago, and
i would've been a loud and vociferous proponent
for the laying of fiber-optic cable nationwide too.

take a look at the _billions_of_dollars_ that are
evaporating on the desert floors of the mideast
-- dozens and dozens of billions of dollars! --
and compare that with the fact that mr. billington
is crowing about "the most ambitious effort ever",
thanks to a _gift_ of _$3_million_ from _google_...

i suggest that mr. billington's malfeasance should
be seen as rivaling that of mr. brown from fema in
the katrina disaster, perhaps even surpassing it...

with a federal budget that runs into the _trillions_,
an allocation of _one_billion_dollars_ every _year_
is the _least_ that we should judge as "acceptable"
for the purpose of digitizing our cultural heritage.

$1 billion per year would pay for digitizing
_100_million_items_, at an average of $10 each
-- google's project involves about 30 million items --
and that investment would pay us _huge_ benefits,
compared to the many billions evaporating in iraq.
(as if we need to make such a horrific comparison.)

it's time for us to demand that our "government" now
_take_responsibility_ for moving us to a digital future.
and the first thing we should do is sack billington's ass.

as it is, a $3 million project, nationwide, is _laughable_.  
thank god for google, or we wouldn't even have _that_!

-bowerbird



Wednesday, February 22, 2006
9:42:03 AM PST

too ironic for words


http://talk.talis.com/archives/2006/02/introducing_the.html

ha ha!  library 2.0 bloggers doing a "meeting" on the phone!

and then they turned the results into an mp3!  too ironic!

i tell you, these guys are _funny!_  but ok, i'll bite:
here's a request for a _written_transcript_...  ha ha!

-bowerbird



Wednesday, February 1, 2006
4:07:49 PM PST

set up your own wiki for free!


pbwiki.com lets you set up a wiki for free.  very cool.

>   http://PBwiki.com

their slogan is:
>   "PBwiki makes creating a wiki as easy as
>   making a peanut butter sandwich."

actually, since i would have to wash dishes, and
go buy a loaf of bread, and some peanut butter,
before i could make a peanut butter sandwich,
it was actually _easier_ for me to set up my wiki.

they give you plenty of webspace too,
and i was able to double my allowance
just by mentioning them here, which i'm
quite happy to do, to spread the word...

-bowerbird



Friday, January 27, 2006
12:01:24 PM PST

on viewer-apps and the p.g. library


a friend said:
>   in my thinking about e-readers
>   and capabilities, I thought that
>   a primary requirement ought to be
>   that they could handle a PG text
>   and display it tolerably well.

in general, i agree...

the problem is the p.g. e-texts have
maddeningly inconsistent formatting.

i set out originally to handle their library.
whenever i encountered an inconsistency,
i would puzzle some logic to resolve it and
then continue programming my viewer-app.

at some point, i realized that most of my time
and energy had gone toward the resolution of
senseless inconsistencies, and not app features.

most programmers won't have the _inclination_
to slog through the writing of routines to resolve
inconsistencies, even if they have the _ability_ --
and i've come to learn that most people do _not_
have such an ability.  indeed, many people tell me
that the inconsistencies simply cannot be resolved.
(which, since i've resolved them, sometimes using
the most "obvious" means, just makes me laugh...)

but if p.g. had just followed a few _simple_ rules,
faithfully, it would've made a world of difference.

so i approached the project gutenberg people and
explained the problem and asked them please to
try to eliminate the meaningless inconsistencies,
and do some consistency checks on their e-texts.
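
(the post doesn't list which inconsistencies were at issue, so the following is only a hypothetical sketch of what an automated "consistency check" on a plain-text e-text might look like. the three rules here -- no tabs, no trailing whitespace, a maximum line length -- are invented examples, not project gutenberg's rules and not the author's routines:)

```python
def check_consistency(text, max_line_length=75):
    """Return (line_number, problem) pairs for a few illustrative rules.

    The rules are hypothetical examples of "simple rules, followed
    faithfully"; a real checker would encode whatever the library agreed on.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        if len(line) > max_line_length:
            problems.append((number, "line too long"))
    return problems


sample = "chapter one\nalice was tired \n\tdown the hole\n"
for number, problem in check_consistency(sample):
    print(number, problem)
```

the point of a check like this is that it runs before an e-text is posted, so a viewer-app never has to guess.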

basically, i got kicked in the face.

they didn't want to hear any constructive criticism,
even in regard to catapulting the value of the library.

i tried -- in vain -- to give them a longer perspective,
arguing that the inconsistencies were driving away the
programmers that could leverage the library with apps.

but they just would not listen...  so i gave up on them...

and then i gave up on the p.g. library...

i had routines to handle their _current_ inconsistencies,
but if they were just going to generate more and more,
i didn't care to step in after them to clean up their work.

so i went off and "invented" z.m.l., the simple set of rules
that a book author can follow to create a powerful e-book.

eventually i'll get around to applying all my "fuzzy" routines
to do a mass conversion of the p.g. library into z.m.l. form,
and will then "shadow" their library in z.m.l., but i'm gonna
let them waste a whole bunch of their time first doing x.m.l.

eventually they will collapse under the complex work-flows.
ironically, it will be about that time that the world comes to
grasp the elegance of the zen-markup plain-text format...

-bowerbird



Thursday, January 26, 2006
11:48:12 AM PST

train your brain!


train your brain!

>   http://www.cabel.name/2006/01/on-brain-training.html

pay attention to the hardware design too...

-bowerbird



Monday, January 23, 2006
8:33:58 PM PST

once again, o'reilly has a clue, and guts too...

> http://radar.oreilly.com/archives/2006/01/the_long_snout.html

this article introduces "the aardvark model"
-- the aardvark being an animal that has
a "long snout" in addition to its "long tail".

thank heaven for the pioneers...

-bowerbird


Sunday, January 8, 2006
1:03:51 PM PST

first computer transcription of a book


if you're wondering, or know someone who is,
the first computer transcription of a book
-- the first e-book -- was done by michael hart,
who digitized "alice in wonderland".

the rumor is that he did it on a keypunch machine!

lots of people had the _idea_ of e-books,
but michael gets the nod for moving it
to a _reality_ by, d'oh, actually _doing_it_.

the first _document_ michael ever digitized was
the declaration of independence.  michael did that
on the fourth of july, 1971, and sent it out as widely
as he could at the time (to a few hundred people),
and "crashed the network" because it wasn't able to
handle the load.  we've come a long way since then.

michael's work -- in the form of project gutenberg --
has moved on to become the most well-known
cyberspace library, thanks to his deft handling of
a large number of volunteers along the way, with
many of them contributing just a book or two, but
with some dedicating large portions of their lives...

in the early days of cyberspace, with computer-based
communication happening on "bulletin board systems",
project gutenberg e-texts were the first indication that
the classics of literature -- even things like shakespeare
and the bible -- could flourish in this nerd-based world.

this far-flung collective effort was helped along
considerably more when charles franks, in 2000,
created "distributed proofreaders", a brilliant
web-based mechanism that lets people proof
as little as one single page out of a book,
until all the pages in that book are proofed,
and the results are assembled into a whole.
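
(the page-at-a-time idea can be sketched in a few lines of python. this is only an illustration of the general mechanism described above, not how the distributed proofreaders site is actually built; the function name and the dictionary-of-pages layout are assumptions:)

```python
def assemble_book(total_pages, proofed_pages):
    """Join proofed pages into a whole book once every page is done.

    proofed_pages maps a page number (starting at 1) to its proofed text.
    Returns (book_text, missing); book_text is None while pages are missing.
    """
    missing = [n for n in range(1, total_pages + 1) if n not in proofed_pages]
    if missing:
        return None, missing
    book = "\n".join(proofed_pages[n] for n in range(1, total_pages + 1))
    return book, []


# volunteers can turn in pages in any order, one at a time
pages = {3: "page three text", 1: "page one text"}
print(assemble_book(3, pages))  # page 2 is still waiting for a proofer
pages[2] = "page two text"
print(assemble_book(3, pages)[0])
```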

this meant a person didn't have to digitize
a whole book -- a rather substantial task to
do by yourself -- but could proof as little as
"a page a day", the slogan that charles used,
and the cumulative effort snowballed greatly.

tens of thousands of people have registered
over at distributed proofreaders, and as of
this date, they've finished nearly 8,000 books
for project gutenberg, thus constituting almost
half the current 17,000+ project gutenberg e-texts.
as with project gutenberg, there are now _many_
people who are devoting significant amounts of time
and energy volunteering for distributed proofreaders.

so big props for michael hart and charles franks,
for their contributions in 1971 and 2000, toward
making the dream of a cyberspace library come true!

i'm thankful as well for the efforts of google, with their
deep pockets, and the open content alliance, with their
commitment to universal access to knowledge, but i
take great comfort in the fact that, even before them,
individual human beings decided that we could create
a cyberspace library through our own d-i-y efforts, and
went out and started doing it...

-bowerbird

http://www.gutenberg.org for project gutenberg
http://www.pgdp.net for distributed proofreaders



Wednesday, October 19, 2005
1:43:19 PM PDT

perfection or bust


a common observation about errors in electronic-books is that:
>   I am not sure I have ever read a paper-book
>   in which I didn't find errors, so 100% correctness
>   in electronic-books seems to be out of the question

100% correctness is certainly not "out of the question",
not for our electronic-texts.  not at all.  on the contrary,
it's really the only worthwhile goal to have, long-term...

in this day and age, with our tools,
it's not even an unreasonable aim...

given enough eyeballs, all errors are shallow.
and we have lots and lots of eyeballs, a ton...

and even better than that, we have _computers_...


>   I recently engaged a professional proofreader
>   to try to find the 1% worst Project Gutenberg eBooks
>   and bring them up to 99.95% accuracy, and was
>   totally surprized to find that even when trying this,
>   the accuracy level was still above 99.95%,
>   even presuming we found only half the errors.

i'm not knocking all those 9's.  as you'll see below,
i think they are something to crow about.  but still,
what those 9's scream at me is "it's not perfect yet!"

if you can find an error, you can fix it.  so 100% it must be.

we're not talking about a paper-copy here, a printing flub,
a horse that's out of the barn so no use in closing the door,
nothing you can do about it, what's done is done, accept it.

we are talking about a living breathing document that can
-- and therefore _must_ -- be fixed if we spot a problem...


>   the real question is how much more good we can do
>   in moving from 99.95% to 99.995% as compared to
>   simply doing more books with the same effort.

using a proofreader -- even a professional one --
isn't the way to find and remove remaining errors.
if you use that methodology, it _will_ be difficult to
move to perfection.  and it _won't_ be cost-effective.
so yes, you are right that no, you should not do that.

you need to change your ways to solve the problem.

"continuous proofreading" by the general public is
one of the means to march the e-texts to perfection.

in a system of "continuous proofreading", the scans
are made available side-by-side with the text itself,
so _any_ person can (dis)confirm their congruence.

when distributed proofreaders and project gutenberg
finally get _serious_ about obtaining error-free e-texts,
we will know it because you will begin posting the scans.
it's time to accept brewster's offer to host scans for you...

the other road to perfection is doing additional digitizations
and comparing the results to current ones to find differences.

i laid out the first general round of this argument a while back
>   http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-10-03,3
and will continue with it in messages i'll soon be posting _here_.

the value of this use of a comparison of double-digitizations
has been well-proven, in "double-keying" of keypunched data...
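
(as a minimal sketch of the comparison step, here is one way to line up two independent digitizations and list the places where they disagree, using difflib from python's standard library. it assumes both versions are plain text and compares them word by word; the function name is made up, and this is not the author's own tool:)

```python
import difflib


def find_disagreements(text_a, text_b):
    """Return (from_a, from_b) pairs wherever two transcriptions differ.

    Any pair listed is a spot where at least one version has an error,
    so a person only needs to check those spots against the scan.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    disagreements = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append(
                (" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2]))
            )
    return disagreements


first = "alice was beginning to get very tired"
second = "alice was begining to get very tired"
print(find_disagreements(first, second))  # [('beginning', 'begining')]
```

note the limit of the method: an error that both digitizations happen to share will not show up as a difference.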

and it takes some elbow-grease, sure, what doesn't?, but
it's certainly doable, and it's not even all that hard, really.

and i say this on the basis of some very close observations.
we have _already_ moved some three e-texts to perfection
(or close enough -- as we're never really sure if we're there):
"my antonia", "books and culture", and "alice in wonderland".
(appended is the exact present status of these three e-texts.)

and thanks to the very good job that p.g./d.p. volunteers have
done on the vast majority of the e-texts in the p.g. library --
where the best estimate is that there are only about 50 errors
in the average e-text, with 80% of those being on punctuation
-- we have an excellent corpus of work to pursue this avenue.
again, michael, thanks for making e-books real, so long ago...

as our methodology becomes more and more automated
-- which is the thrust of that future message i will post --
it will become trivial to march every text toward perfection,
and we'll wonder why we ever thought it was a hard task...

(by "trivial", i mean it will take not more than a few hours;
and before many years pass, it will be a matter of minutes.)

-bowerbird

p.s.  further down are the reports of each e-text's accuracy.
first though, a quick review of the general protocol operative:

a person earns a reputation for declarations of accuracy by
declaring accuracy on a text and having it "stick to the wall",
i.e. having nobody else come forward to claim that errors exist.
if you say it's perfect, and nobody can spot an error, you win.
only the first person to make the declaration (correctly) "wins".

however, if you declare that a text is accurate, and somebody
challenges your declaration, and you stand by your declaration,
then one of you must lose your reputation in an ensuing battle.
the battle is short, to the point, like the show of hands in poker:
the person who claimed that there are errors must reveal them.
if the errors are indeed there, then the declaration of accuracy
was made erroneously, and the declarer must sacrifice reputation
to the person who successfully located the presence of the errors.

in my treatments below, i make no _declarations_of_accuracy_,
i discuss the matter in a direct way, but make no _certifications_.
likewise, nobody else has made any _declarations_of_accuracy_
on these e-texts, either, so i'm not "challenging" anyone here,
or engaging anyone in battle.  where i say i have found errors,
i will indeed immediately challenge anyone who would make a
declaration of accuracy on that e-text in its current state.  but,
again, it is not the case that anyone _has_ called these e-texts
error-free quite yet, let alone formally declared their accuracy.

returning to the matter of _earning_a_reputation_ in this arena,
alternatively, you can declare there are errors in a certain e-text.
it's not necessary to say what the errors are, just that they exist.
if that e-text is one that some person had believed to be perfect,
your error-report gives them notice that a rechecking is needed.

i repeat, it's unnecessary to say exactly what or where the error is.
you can, of course, and then the entity maintaining the e-text will
go and make the corrections, if warranted, and that's end-of-story.
if they wanna give you credit (they should), they can.  or if not, not.
in other words, this is just "friendly notice" and it doesn't involve a
transfer of any reputation from the entity who maintains that e-text.

but if you report an error and _don't_ tell 'em what/where the error is,
and they find it themselves (because they did a rechecking stimulated
by your statement that an error was present) and they fix it themselves,
they don't have to give you any credit for that, and they probably won't.
again, no reputation-transfer results from such an interaction sequence.

but if you challenge the accuracy of a text, and the responsible party
does their rechecking, and they're convinced the e-text _is_ perfect
and that perhaps you're just "bluffing" them with your report of error,
they can "question" it, and we're in that "battle" situation i discussed,
where one of you walks away without a big chunk of your reputation...
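
(the "battle" rule above can be written down as a small python sketch. the names, the numeric reputation scores, and the size of the stake are all hypothetical. the rules as stated only say that the declarer sacrifices reputation to a challenger whose errors are really there; that the declarer gains the stake when a challenge fails is an assumption made here for symmetry:)

```python
def resolve_battle(declarer, challenger, confirmed_errors, reputation, stake=1):
    """Settle a challenged declaration of accuracy and return the winner.

    confirmed_errors holds the errors the challenger revealed that were
    confirmed to be present in the e-text. An empty list means the
    challenge failed.
    """
    if confirmed_errors:
        winner, loser = challenger, declarer
    else:
        winner, loser = declarer, challenger
    # reputation moves from the loser to the winner
    reputation[loser] -= stake
    reputation[winner] += stake
    return winner


scores = {"declarer": 10, "challenger": 10}
print(resolve_battle("declarer", "challenger", ["typo on page 3"], scores))
print(scores)  # {'declarer': 9, 'challenger': 11}
```

a "friendly notice" error-report never reaches this function at all, since no declaration is being contested and no reputation changes hands.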

those are the overall ground-rules, for declarations and challenges.
so now let's go on to the general discussion about these 3 e-texts...

one note is that i have read _none_ of these e-books for meaning
in their current digitized state, which is fairly much a requirement
for me before i'm going to issue any "declarations of accuracy"...
so please just slather that on top, as a generic point on each one.

>   http://www.openreader.com/myantonia
my antonia -- well, i know of at least two errors in noring's version
but this e-text is otherwise close to its perfection.  thus, when my
two errors are found by other people, i will do my own digitization
and use it to compare with the other versions, after which time we
could say with great confidence that we hold an error-free e-text.
even now, we're talking 2 characters in 450,000.  highly accurate.
i don't know how many 9's that adds up to, but it's close to 100.0%.
i believe we are already there, as i would be _shocked_ if there are
three _other_ errors remaining.  still, one more check ensures that.
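
(for the curious, the "how many 9's" question is easy to settle with a little python. this is a minimal sketch using the figures quoted above, 2 errors in 450,000 characters; the function names are made up for illustration:)

```python
import math


def accuracy_percent(errors, total_characters):
    """Character-level accuracy as a percentage."""
    return 100.0 * (1 - errors / total_characters)


def count_nines(errors, total_characters):
    """Count the leading 9's in the accuracy (e.g. 99.9% has three)."""
    if errors == 0:
        return math.inf  # a perfect text has nothing but 9's
    return math.floor(-math.log10(errors / total_characters))


print(f"{accuracy_percent(2, 450_000):.5f}%")  # 99.99956%
print(count_nines(2, 450_000))  # 5
```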

>   http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.zml
alice in wonderland -- i just found two errors recently, when i had
believed for some time this e-text had been thoroughly cleaned
-- i noticed one error, and search turned up another just like it --
which makes me want to give it another peace-of-mind once-over.
but again, this has had many eyeballs on it; it must be pretty good.
if there are any errors, they _must_ be very minor.  (my two were.)
again, i would be shocked if there are three other errors here...
only 150,000 characters in this one, but that'd still be a lot of 9's...

>   http://www.ibiblio.org/ebooks/Mabie/
books and culture -- since d.p. and jose have now been reconciled,
i would say this e-text is now perfect.  and yes, i'll be willing to do a
digitization of it myself, independently, to confirm total perfection.
but my guess is that that confirmation will come out because i find
my version _matches_ the currently-perfect version that jose made
(to which the p.g. version has now been conformed), _not_ because
my digitization uncovered some common error shared by those two.
one error here would surprise me, two would shock, three would stun.
(four is bolt the door nobody leaves until we figure out why and fix it.)
again, not that big of a text, at 210,000 characters, but 100% is 100%.

so there...  3 e-texts that have been discussed around the watercooler,
with all of them painfully close to perfection if they ain't already there...

all this goes to show that 100% accuracy is _certainly_ within our reach.

it's easy to move texts to perfection.  all it takes is one dedicated person;
sometimes just one interested-maybe-engrossed reader is all you need.

-bowerbird
