Monthly Archives: July 2010

What can the eSlick do?

Before buying my eSlick e-reader, I looked around and saw many reviews that described its functions, but none of them answered the question: what can you actually do with it? Can you read academic papers? How about ePub books?

Of course, the specs say it supports PDF and other formats, but they say nothing about what reading these formats is actually like.

Now, after using an eSlick for a few weeks I can attempt to answer this question.

There’s only one thing that the eSlick does well: display PDF documents that are sized for its display. Most of the stuff you’ll want to read will be in another format (or created for a bigger page size) but that’s not a problem if you can convert it to the required format.

Here’s how I read different kinds of texts using the eSlick:

PDFs created for a regular (about A4) page size

Since the eSlick’s display is smaller than that of a regular page, you can’t view these files directly. There are 3 ways that you can handle them.

(1) Zoom out: Not useful for prolonged reading since the text looks tiny.

(2) Normal size, but move around with the arrow keys: Very annoying, unusable.

(3) Reflow: In reflow mode the eSlick rearranges the text and adapts it to its display. This doesn’t always work; sometimes the text comes out in the wrong order. When it does work, it works pretty nicely. It’s good enough that I’m now reading an entire book in reflow mode. There’s one bug: if a paragraph is split between two pages, the second part isn’t shown, so I have to turn off reflow mode, finish the paragraph, and then turn reflow back on. However, this procedure is easy enough that it doesn’t hurt the reading experience very much.

Two-column academic articles

Reflow doesn’t work for the articles I’ve tested; the text comes out in the wrong order. However, by a wonderful coincidence, each column in this type of text is about the same width as the eSlick display. A program called PaperCrop renders the PDF and then automatically segments the resulting image into columns. The result can be saved as a PDF file that can be read on the eSlick. I’ve tested PaperCrop on one article and it worked perfectly, detecting columns and diagrams correctly.

EDIT: PaperCrop also knows how to reflow one-column articles! Just choose the “reflow” preset.

ePub files

This is a popular eBook format. You can download free books from the Project Gutenberg site in this format. The eSlick supports it natively (in newer firmware versions) but not very well. It doesn’t even support text styles such as bold and italics. Also, it doesn’t display Hebrew text (and probably not other non-Latin scripts either). The solution is to convert the ePub to PDF. I’ve used an online converter successfully on Hebrew books. (On that site, set the target ebook reader to Kindle, as it has the same display size.) However, it failed to convert some of the books, so I’m looking for other ways to do it.

An open source command line tool called epub2pdf looks promising for this purpose, but I couldn’t get it to work.

Web pages

This is the procedure I’m using to create eSlick-compatible PDFs from interesting web pages. First, use Readability to remove all the supporting extras (sidebars, footers and so on), leaving only the content. (In the Readability configuration, set the font to small and the margins to extra-narrow.) Next, print the page to PDF using the Foxit PDF Creator. I’m doing this from Internet Explorer, since the PDF Creator stopped working for me from Firefox after the first time I used it. (It doesn’t look like very high-quality software.)

Even though the PDF Creator is bundled with the eSlick, it doesn’t have the eSlick listed as a recognized target in its settings, so you have to set a custom page size. I’m using 13.2cm x 9.9cm. (Derived from the display aspect ratio and an estimate of the display width, as I couldn’t find the height and width specified anywhere).
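The two numbers above are in a 4:3 ratio, so if your own estimate of one side differs, the other side follows from it. Here is a minimal sketch of that arithmetic. The 4:3 ratio is inferred from the 13.2cm x 9.9cm figures rather than taken from any published spec, and the helper name is made up for illustration.

```ruby
# Hypothetical helper: derive the short side of a custom page size from the
# long side and an aspect ratio. The 4:3 default is an assumption inferred
# from the 13.2cm x 9.9cm figures, not a documented eSlick value.
def short_side_cm(long_side_cm, ratio_long = 4, ratio_short = 3)
  # Round to one decimal place, which is as precise as an estimate deserves.
  (long_side_cm * ratio_short.to_f / ratio_long).round(1)
end

puts short_side_cm(13.2)
```

With the estimate used in this post the result is the 9.9 quoted above; plug in a different long side if you measure your display differently.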

String literals with embedded nulls in Boost

Boost’s string library makes global string replace in C++ easy:

// needs #include <boost/algorithm/string.hpp>, <iostream> and <string>
std::string str = "$greeting, world!";
boost::replace_all(str, "$greeting", "Hello");
std::cout << str << "\n"; // prints "Hello, world!"

Suppose however that you want the search string to contain a null character. Why would you want such a thing? Consider escaping strings for safe inclusion in some contexts:

std::string str2 = get_string_from_remote_source();
boost::replace_all(str2, "\0", "(nul)");
// do something with str2 that depends on it not to contain null chars

(Remember that, unlike C strings, it’s perfectly valid for C++ standard library strings to contain NUL characters.)

Alas, the code above doesn’t work; the call to replace_all() doesn’t do anything. It turns out that when you give Boost a string literal, it uses strlen() to get the string’s length. Since strlen() works on C-style null-terminated strings, it stops at the first null character it sees.

Why did I expect Boost to behave differently? In C++, string literals are arrays of chars. With the help of some template magic, the Boost library could know the literal’s length at compile time. It wouldn’t need to rely on functions like strlen() to compute the length, so it could handle arbitrary string literals, including ones with embedded nulls.

After some thinking and googling, it became clear why Boost doesn’t work this way, or at least why that isn’t the only reasonable behavior. The reason is that Boost cannot tell the difference between string literals and other character arrays. Consider this case:

char search[80];
strcpy(search, "foo");
boost::replace_all(str, search, "bar");

We probably wouldn’t want replace_all() to look for the whole 80 character long string which the input array happens to contain, but only for the part initialized with a null-terminated string. Actually, this seems to be some sort of gray area. When a zero character appears inside a string literal, it certainly means that the programmer intended the character to be a part of the string. But when it appears inside another character array, it may or may not mark the end of the string.

We need a way to tell Boost that we want a char array treated as an array rather than as a null-terminated string. To do this, wrap the array in a call to boost::as_array. For example:

char nullchar[] = {'\0'};
boost::replace_all(str2, boost::as_array(nullchar), "(nul)");

In fact, you can also pass a string literal to as_array, but remember that the corresponding array contains an (additional) terminating null character. So, returning to the original problem, for a string containing solely one null character, use boost::as_array(""). Don’t use boost::as_array("\0"), as the latter contains two characters: the explicit null plus the terminating one.

CA2W found

Here’s a problem that was driving me crazy for a while. I got the following error trying to use ATL’s CA2W class in a C++ program:

error C3861: 'CA2W': identifier not found

Usually problems like this occur when you don’t include the correct header file. But in this case, I did include it (atlconv.h), exactly like the documentation says.

I made sure that CA2W is really defined in the header. I checked other things, but the result was always that the compiler should have seen the definition. Still the nefarious error message appeared. So why did the compiler pretend not to know this symbol? Finally I saw this line at the beginning of atlconv.h:

namespace ATL
{

D’oh!

So CA2W did exist, but only in the ATL namespace. I had never noticed this before: it turns out that, by default, projects created with Visual C++ include atlbase.h, which does “using namespace ATL”. Including this file solved the problem.

The documentation says nothing about this, of course. I couldn’t find any mention of the word “namespace” in there.

Incidentally, it looks like we’ll be seeing this problem a lot in the future, since the default has changed in Visual Studio 2010.

A TCP proxy in Ruby

A TCP proxy (or a tunnel, or a bridge) is a program that listens at a certain network address for connections. Whenever a connection is made to that address, the program connects to another predefined network address and starts transferring data between the two ends.

The reason I wanted a TCP proxy is this: I needed to run a program on a virtual machine. This program needs Internet access, but I couldn’t get the VM’s Internet access to work; it could only connect to programs on the host machine, i.e. my computer. I thus used a TCP proxy on the host machine to connect the VM to the outside world. (The fact that the program running on the VM needed to access only one predefined network address simplified things greatly.)

Below is a Ruby script I used, made from bits of example code that I found on the Web. I tested it with Ruby 1.8.6 on Windows 7.

Several notes regarding the script:

Preventing threads from disappearing

The script is designed to exit with a stack trace on any exception; more extensive error handling would be overkill for a quick script. The problem is that in Ruby, by default, a thread that raises an exception exits silently, which caused me quite a headache before I figured it out. This is fixed by setting Thread.abort_on_exception to true.
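To see the difference in isolation, here is a minimal sketch that is separate from the proxy script. The helper name is made up for illustration, and the report_on_exception line only matters on newer Ruby versions, which print a message when a thread dies; it is guarded so older interpreters skip it.

```ruby
# Hypothetical helper: returns true if an exception raised inside a worker
# thread interrupts the main thread, false if the worker just dies quietly.
def worker_failure_reaches_main?(abort_on_exception)
  previous = Thread.abort_on_exception
  Thread.abort_on_exception = abort_on_exception
  begin
    worker = Thread.new do
      # Silence the per-thread death report where the setting exists.
      if Thread.current.respond_to?(:report_on_exception=)
        Thread.current.report_on_exception = false
      end
      raise "worker failed"
    end
    # Wait for the worker to finish without calling join, because join
    # would re-raise the worker's exception regardless of the setting.
    sleep 0.05 while worker.alive?
    sleep 0.2
    false
  rescue RuntimeError
    true
  ensure
    Thread.abort_on_exception = previous
  end
end

puts worker_failure_reaches_main?(false)
puts worker_failure_reaches_main?(true)
```

The first call should report false, because the worker's exception never leaves its thread, and the second should report true, because the exception is re-raised in the main thread. In the proxy script nothing rescues it there, so the whole program stops with a stack trace, which is the behavior wanted here.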

Exiting with Ctrl-C

It’s nice to be able to exit the script by pressing Ctrl-C. On Windows, Ruby doesn’t handle Ctrl-C keypresses inside socket.accept (and apparently during other blocking calls). To fix this, we need a special thread that spends most of its life sleeping but wakes up once a second. Each time it wakes, Ruby gets a chance to process the keypress and exit.

The script

require 'socket'

if ARGV.length < 1
    $stderr.puts "Usage: #{$0} remoteHost:remotePort [ localPort [ localHost ] ]"
    exit 1
end

$remoteHost, $remotePort = ARGV.shift.split(":")
puts "target address: #{$remoteHost}:#{$remotePort}"
localPort = ARGV.shift || $remotePort
localHost = ARGV.shift

$blockSize = 1024

server = TCPServer.open(localHost, localPort)

port = server.addr[1]
addrs = server.addr[2..-1].uniq

puts "*** listening on #{addrs.collect{|a|"#{a}:#{port}"}.join(' ')}"

# abort on exceptions, otherwise threads will be silently killed in case
# of unhandled exceptions
Thread.abort_on_exception = true

# have a thread just to process Ctrl-C events on Windows
# (although Ctrl-Break always works)
Thread.new { loop { sleep 1 } }

def connThread(local)
    port, name = local.peeraddr[1..2]
    puts "*** receiving from #{name}:#{port}"

    # open connection to remote server
    remote = TCPSocket.new($remoteHost, $remotePort)
    
    # start reading from both ends
    loop do
        ready = select([local, remote], nil, nil)
        if ready[0].include? local
            # local -> remote
            data = local.recv($blockSize)
            if data.empty?
                puts "local end closed connection"
                break
            end
            remote.write(data)
        end
        if ready[0].include? remote
            # remote -> local
            data = remote.recv($blockSize)
            if data.empty?
                puts "remote end closed connection"
                break
            end
            local.write(data)
        end
    end
    
    local.close
    remote.close
    
    puts "*** done with #{name}:#{port}"
end

loop do
    # whenever server.accept returns a new connection, start
    # a handler thread for that connection
    Thread.start(server.accept) { |local| connThread(local) }
end

PS

When I started writing this script I got a cryptic error message if I didn’t add a “require 'rubygems'” line at the beginning. However, I can’t reproduce the problem now. In fact, the browser history doesn’t show all the googling I did to find the solution, and I’m beginning to think that I hallucinated it all.