Building in Public: Deploy a PHP application with Kamal 2, part 4

This is the fourth and final part of a series about deploying a non-Rails application with Kamal 2. Read the previous article here. Follow this journey at https://github.com/jjeffers/sherldoc.

I’m almost out of the maze. I can hear road noise from here.

Installing PDFBOX

I checked the sherldoc README.md for steps I may have missed. There is a comment about making sure that PDFBOX is installed. I had assumed this was handled by the Dockerfile RUN commands, but it is not.

The Apache PDFBOX project has a release archive with links to the version I need. I add the following Kamal pre-build hook:

#!/usr/bin/env bash
# Download the PDFBOX jar into the resources directory unless it is already there.
RESOURCES_DIR=resources
PDFBOX_JARFILE=pdfbox-3.0.2.jar

echo "Checking for $RESOURCES_DIR/$PDFBOX_JARFILE first..."
if ! [[ -f "$RESOURCES_DIR/$PDFBOX_JARFILE" ]]; then
    wget --directory-prefix="$RESOURCES_DIR" "https://archive.apache.org/dist/pdfbox/3.0.2/$PDFBOX_JARFILE"
fi

With that in place, I initiate another deployment. I’ll check the application state next.

Enabling SSL and Testing

The sherldoc README.md offers a suggested command line test:

curl -X POST -F file=@resources/sample1.pdf -F 'checks={"ensure_missing":
["perpetuity","prohibited", "free software"],
"ensure_existing":
["GNU", "license", "idaho"]}
' https://localhost:8088/api/scan

I am also eager to enable SSL for the application endpoint. It would be convenient to refer to a public hostname (sherldoc.planzerollc.com) rather than an IP address.

I already have Cloudflare SSL enabled as suggested in the Kamal guide. I add a new A record for the subdomain. While I’m waiting for the DNS updates to propagate, I amend the kamal-proxy settings to enable SSL connections:

...
proxy:
  ssl: true
  host: sherldoc.planzerollc.com
  ...

With SSL enabled, let’s test sherldoc using that subdomain:

curl -X POST -F file=@resources/sample1.pdf -F 'checks={"ensure_missing":
["perpetuity","prohibited", "free software"],
"ensure_existing":
["GNU", "license", "idaho"]}
' https://sherldoc.planzerollc.com/api/scan

The results are less than overwhelming:

{"output":{"found":{"pages":[],"words":[]},"missing":["GNU","license","idaho"]}}

The application experienced a rapid unscheduled process degeneration.

Debugging the deployment

Is this result right? I don’t think so. Opening sample1.pdf, I can see the word “GNU” appears at least once. So how do I determine where the application is failing?

I could jump into debugging the application locally, but I am not a PHP expert. I do think I see a way to add log messages, which might be a quick way to triangulate the issue.

I need to examine logs on the application server, but checking the container logs by hand can be tedious. Kamal provides a shorthand, kamal app logs, which produces:

...
2024-10-26T19:58:11.771616634Z {"level":"info","ts":1729972691.7715068,"msg":"FrankenPHP started 🐘","php_version":"8.3.11","num_threads":1}
2024-10-26T19:58:11.773111333Z {"level":"info","ts":1729972691.773031,"logger":"http.log","msg":"server running","name":"php","protocols":["h1","h2","h3"]}
2024-10-26T19:58:11.773771455Z {"level":"info","ts":1729972691.7731733,"msg":"Caddy serving PHP app on :80"}
2024-10-26T19:58:11.775121346Z {"level":"info","ts":1729972691.7750409,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc00068c280"}
2024-10-26T19:58:11.782998428Z {"level":"info","ts":1729972691.7828696,"logger":"tls","msg":"cleaning storage unit","storage":"FileStorage:/root/.local/share/caddy"}
2024-10-26T19:58:11.783513553Z {"level":"info","ts":1729972691.7834342,"logger":"tls","msg":"finished cleaning storage units"}
2024-10-26T19:58:12.287121200Z 
2024-10-26T19:58:12.287172904Z   VITE v5.4.10  ready in 538 ms
2024-10-26T19:58:12.287176658Z 
2024-10-26T19:58:12.287876431Z   ➜  Local:   http://localhost:5173/
2024-10-26T19:58:12.290738129Z   ➜  Network: http://172.18.0.8:5173/
2024-10-26T19:58:12.400477692Z 
2024-10-26T19:58:12.400520511Z   LARAVEL v11.19.0  plugin v1.0.5
2024-10-26T19:58:12.400817577Z 
2024-10-26T19:58:12.401177066Z   ➜  APP_URL: http://localhost
...

Note that this only shows messages written to STDOUT by the entrypoint process, the FrankenPHP server.

I would prefer to see the application logs too. To get them, the Laravel application needs to send its messages to STDOUT as well. Messages routed to STDOUT will show up alongside the FrankenPHP web server messages.

I modify config/logging.php:

'stack' => [
    'driver' => 'stack',
    'channels' => explode(',', env('LOG_STACK', 'stdout')),
    'ignore_exceptions' => false,
],
...
'stdout' => [
    'driver' => 'monolog',
    'handler' => StreamHandler::class,
    'with' => [
        'stream' => 'php://stdout',
    ],
],

Next, I modify the application to see if the PDF text is captured. I’m not sure where the fault will be, so I liberally add debug messages.

public function getTextFromPage($pathToPdf, int $page = 1)
    {
        $java = config('pdfbox.java_path');
        Log::debug("path to pdf:");
        Log::debug($pathToPdf);
        Log::debug("pdfbox java path");
        Log::debug($java);
        $pdfbox = config('pdfbox.pdfbox_jar_path');
        Log::debug("pdfbox jar path is:");
        Log::debug($pdfbox);
        $process = new Process([$java, '-jar', $pdfbox, 'export:text', '-i', $pathToPdf, '-startPage='.$page,'-endPage='.$page, '-console']);
        $process->run();
        $output = $process->getOutput();
        Log::debug("pdbox output was:");
        Log::debug($output);
        $strip = 'The encoding parameter is ignored when writing to the console.';
        return trim(str_replace($strip, '', $output));
    }

Then we redeploy and try to scan a document again.

Same result, but did our messages get logged?

2024-10-26T19:58:41.360636711Z [2024-10-26 19:58:41] local.DEBUG: path to pdf:  
2024-10-26T19:58:41.361082418Z [2024-10-26 19:58:41] local.DEBUG: /app/storage/app/8060-1729972721.3383.pdf  
2024-10-26T19:58:41.361227842Z [2024-10-26 19:58:41] local.DEBUG: pdfbox jar path is:  
2024-10-26T19:58:41.361476450Z [2024-10-26 19:58:41] local.DEBUG: /app/resources/pdfbox-app-3.0.2.jar  
2024-10-26T19:58:41.377582238Z [2024-10-26 19:58:41] local.DEBUG: pdbox output was:  
2024-10-26T19:58:41.377615486Z [2024-10-26 19:58:41] local.DEBUG:   
2024-10-26T19:58:41.377632340Z [2024-10-26 19:58:41] local.DEBUG: page text:  
2024-10-26T19:58:41.377635202Z [2024-10-26 19:58:41] local.DEBUG: array (
2024-10-26T19:58:41.377637318Z ) 
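
The Laravel lines are interleaved with the FrankenPHP and Vite output, so a small filter can help when scanning for them. Here is a minimal sketch, assuming the log format shown above. The name laravel_debug_lines is a hypothetical helper of my own, not part of Kamal or Laravel.

```shell
# Hypothetical helper: read container log lines on stdin, keep only the
# Laravel debug entries, and drop the leading container timestamp.
laravel_debug_lines() {
    # Keep lines tagged local.DEBUG, then strip the first
    # whitespace-delimited field (the container timestamp).
    grep 'local\.DEBUG' | sed -E 's/^[^ ]+ //'
}
```

Because it reads standard input, it could be used as kamal app logs | laravel_debug_lines. Continuation lines of multi-line messages, like the closing parenthesis of the array dump above, are dropped because they do not carry the local.DEBUG tag.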

Closing in on the problem

Logs will save the day.

It appears the PDFBOX process isn’t returning any text. Using the shell alias, I run the command manually:

root@159:/app# java -jar resources/pdfbox-3.0.2.jar export-text -i resources/sample1.pdf -startPage=1 -endPage=2
no main manifest attribute, in resources/pdfbox-3.0.2.jar
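
The empty output in the logs hid this error because my debug messages only captured the process's standard output. A minimal sketch of a wrapper that also surfaces the exit status and standard error of a command follows. The name run_and_report is a hypothetical helper of my own, not something from sherldoc or Kamal.

```shell
# Hypothetical helper: run a command, pass its stdout through unchanged,
# then report its exit status and whatever it wrote to stderr.
run_and_report() {
    err_file=$(mktemp)
    status=0
    # Capture stderr in a temp file; remember the exit status on failure.
    "$@" 2>"$err_file" || status=$?
    echo "exit status: $status"
    echo "stderr: $(cat "$err_file")"
    rm -f "$err_file"
    return "$status"
}
```

Prefixing the java command with run_and_report in the container shell should show a non-zero exit status and the error text even when the command prints nothing to standard output.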

That’s odd! Wait a minute… something’s not right.

I double-check the README.md and see that the PDFBOX jar is not correct! It needs to be pdfbox-app-3.0.2.jar, not pdfbox-3.0.2.jar.

I amend the pre-build hook:

PDFBOX_JARFILE=pdfbox-app-3.0.2.jar
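
A guard in the pre-build hook might have caught the wrong jar before deployment. The "no main manifest attribute" error means the jar's manifest has no Main-Class entry, so one option is to check for that entry. This is only a sketch: has_main_class and jar_is_runnable are hypothetical names of my own, and jar_is_runnable assumes unzip is installed on the build machine.

```shell
# Hypothetical helper: read a jar manifest on stdin and succeed only if
# it declares a Main-Class, which "java -jar" requires.
has_main_class() {
    grep -q '^Main-Class:'
}

# Hypothetical helper: print the manifest from the jar given as $1 and
# check it. Assumes unzip is available; a missing jar or manifest fails.
jar_is_runnable() {
    unzip -p "$1" META-INF/MANIFEST.MF 2>/dev/null | has_main_class
}
```

In the hook, a line such as jar_is_runnable "$RESOURCES_DIR/$PDFBOX_JARFILE" || exit 1 could stop the build when the downloaded jar cannot be launched with java -jar.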

Then I redeploy and retest:

curl -X POST -F file=@resources/sample1.pdf -F 'checks={"ensure_missing":
["perpetuity","prohibited", "free software"],
"ensure_existing":
["GNU", "license", "idaho"]}
' https://sherldoc.planzerollc.com/api/scan
{"output":{"found":{"pages":{"1":["free software"],"6":["perpetuity"],"10":["free software"],"11":["free software"]},"words":{"free software":{"1":4,"10":3,"11":3},"perpetuity":{"6":1}}},"missing":["idaho"]}}

This is the output I was expecting! We did it! I’ll call this one done for now.

I hope you have enjoyed my quest to use a Rails-oriented deployment tool in an unexpected way. Despite the stumbles and bruises, I deployed a PHP application using Kamal. I learned new things along the way, but there’s a lot more to discover.

We did it! The quest is complete!

If you have questions or comments about what you have read so far, please email me at [email protected]. I look forward to hearing from you.

Introduction to Programming Course Comparison

There are many courses available to help you learn how to program.

Here is a list of several available online, in no particular order. None of these courses assume you have previous programming experience.

DISCLAIMER: I receive no compensation for these summaries.

Automate the Boring Stuff (Udemy)

Cost: $10 for the Udemy video course if you go through the link at https://automatetheboringstuff.com/, otherwise $50.

Time to complete: 9.5 hours

Summary: Comprehensive review of Python through video lectures. The author says the video course covers most of the same ground as the book, but the book’s probably a great alternative if you prefer that medium.

Free Programming Basics Course (Ministry of Test)

Cost: Free

Time to complete: a few hours.

Summary: No frills survey of programming concepts and tools. Course material is delivered by web content. There is no feedback or interaction with an instructor.

Programming for Everybody (Getting Started with Python) (Coursera)

Cost: Free 7-day trial, $49/mo after trial ends.

Time to complete: 6 weeks, 2-4 hours/week.

Summary: Long-distance, entry-level college course. Python is the language used to illustrate concepts with videos, web content, and proprietary courseware. Assignments are graded as pass/fail by an auto-grader process.

Programming Foundations with Python (Udacity)

Cost: Free

Time to complete: 6 weeks.

Summary: Self-paced low-level programming course with video instruction, proprietary courseware, and discussion forums. Also features quizzes and forum interaction for feedback. “Nanodegree” offered for completion of a curriculum group. Favors lecture format with worked examples by the instructor over interactive application of Python by the student.

Master Fundamentals of Programming for Beginners (Udemy)

Cost: $194.00

Time to complete: 13 hours.

Summary: Comprehensive programming course that introduces C and Python. Relies on video lectures and a “Q&A” feature to review and search questions submitted by other students. Little opportunity to write programs and get feedback.

Try Python (Code School)

Cost: $29/mo, but some courses free

Time to complete: 2-3 hours

Summary: Self-paced entry level programming course focused on Python basics. Features videos, slide downloads, proprietary courseware, and an interactive Python emulator.

Learn Python (Codecademy)

Cost: Free, optional upgrades ($19/mo and $199) for access to technical support, more lessons, and additional material.

Time to complete: 10 hours

Summary: Self-paced entry level programming course focused on Python basics. Features web content, proprietary courseware, and an interactive Python emulator.

Ruby in Twenty Minutes (ruby-lang.org)

Cost: Free

Time to complete: 20 minutes

Summary: Very quick “up and running” tutorial. Assumes you have already installed Ruby and are comfortable with the command line. Nothing fancy here – just enough to whet the appetite for the language.

What is Programming? (Khan Academy)

Cost: Free

Time to complete: less than an hour

Summary: Similar to other Khan Academy lessons, this one shows you the theory behind programs and then begins to dig into some JavaScript to manipulate images in an interactive emulator. A non-threatening introduction before getting into the deep end of the pool.

Introduction to Computer Science and Programming using Python (edX/MIT)

Cost: Free. Optional certificate for $49.00, accredited tuition rate of $300. Textbook (available from amazon.com) is not included in the cost.

Time to complete: 9 weeks, 15 hours/week.

Summary: Self-paced college-level course featuring introductory computer science concepts. Designed for students not majoring in CS or EE degree programs. Features lectures, interactive assignments, problem sets, and quizzes. A certificate of completion is available (see Cost section). Credit hours available for qualified students.