Sunday, October 13, 2024

Combining Scanned Images into a PDF with OCR on Mac OS X

This guide will walk you through the process of combining multiple scanned images of book pages into a single PDF file on Mac OS X and using OCR (Optical Character Recognition) to convert the text into searchable, selectable electronic text.

1. Installing Required Tools

First, we need to install a few tools to handle image and PDF processing, as well as perform OCR. The following tools are required:

  • ImageMagick: For image format conversion and processing.
  • ocrmypdf: For performing OCR on PDF files.
  • Tesseract: OCR engine used for text recognition.
  • Poppler: A suite of PDF utilities, including pdfunite for merging PDFs.

Install these tools using Homebrew:

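brew install imagemagick    # Install ImageMagick
brew install ocrmypdf       # Install OCRmyPDF
brew install tesseract      # Install the Tesseract OCR engine
brew install tesseract-lang # Install Tesseract language packs (chi_tra, chi_tra_vert, ...)
brew install poppler        # Install Poppler (provides pdfunite)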

2. Merging Images into a PDF

If your scans are saved as images (e.g., JPG), you can use ImageMagick to combine them into a single PDF document, so that the OCR step that follows only has to process one file.

To combine all .jpg images in the current directory into a single 001.pdf file:

magick *.jpg 001.pdf

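This command uses ImageMagick's magick tool to merge all .jpg files in the directory into one PDF file named 001.pdf. The pages appear in the order in which the shell expands *.jpg, which is alphabetical, so name the images with zero-padded numbers (e.g., 001.jpg, 002.jpg) to keep the pages in sequence.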
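Because the shell sorts *.jpg alphabetically, scan-10.jpg lands before scan-2.jpg unless the numbers are zero-padded. The Python sketch below orders the files by the numbers in their names and then runs the same magick command. It assumes the page number is the numeric part of each filename; the helper names natural_key, magick_command, and merge_images are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def natural_key(name: str) -> list:
    """Split a filename into text and integer chunks so 'p2' sorts before 'p10'."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def magick_command(images: list[str], output: str) -> list[str]:
    """Build the argv for `magick <images...> <output>` with pages in numeric order."""
    return ["magick", *sorted(images, key=natural_key), output]


def merge_images(directory: str = ".", output: str = "001.pdf") -> None:
    """Combine every .jpg in `directory` into one PDF (requires `magick` on PATH)."""
    images = [p.name for p in Path(directory).glob("*.jpg")]
    subprocess.run(magick_command(images, output), cwd=directory, check=True)
```

For example, merge_images(".") in a folder holding scan-1.jpg through scan-12.jpg writes 001.pdf with the pages in numeric rather than alphabetical order.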

3. Running OCR on the PDF

Next, we use the ocrmypdf tool to perform OCR on the generated PDF file and save the output as a new file. In this step, we specify the OCR languages as English (eng), Traditional Chinese (chi_tra), and vertically typeset Traditional Chinese (chi_tra_vert).

Run the following command:

ocrmypdf -l eng+chi_tra+chi_tra_vert 001.pdf 001a.pdf

This command applies OCR to 001.pdf and saves the result as 001a.pdf. The -l option joins language codes with +, so eng+chi_tra+chi_tra_vert tells Tesseract to recognize English, horizontal Traditional Chinese, and vertical Traditional Chinese text.
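When a book was scanned in several batches (001.pdf, 002.pdf, ...), the OCR step can be repeated for each file. The following sketch simply builds and runs the same ocrmypdf command shown above through subprocess, using this guide's 001.pdf to 001a.pdf naming. The function names are hypothetical, and it assumes ocrmypdf is on your PATH.

```python
import subprocess
from pathlib import Path

# Language codes from the command above; chi_tra_vert comes with tesseract-lang.
LANGS = ("eng", "chi_tra", "chi_tra_vert")


def ocr_output_name(pdf: str) -> str:
    """Follow this guide's naming convention: 001.pdf -> 001a.pdf."""
    path = Path(pdf)
    return str(path.with_name(f"{path.stem}a{path.suffix}"))


def ocrmypdf_command(src: str, langs: tuple[str, ...] = LANGS) -> list[str]:
    """Build the argv for `ocrmypdf -l eng+chi_tra+chi_tra_vert src dst`."""
    return ["ocrmypdf", "-l", "+".join(langs), src, ocr_output_name(src)]


def ocr_all(pdfs: list[str]) -> list[str]:
    """Run ocrmypdf on each input in turn and return the output filenames."""
    outputs = []
    for pdf in pdfs:
        cmd = ocrmypdf_command(pdf)
        subprocess.run(cmd, check=True)  # stop at the first failing file
        outputs.append(cmd[-1])
    return outputs
```

Calling ocr_all(["001.pdf", "002.pdf", "003.pdf"]) would produce 001a.pdf, 002a.pdf, and 003a.pdf, ready for the merge step.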

4. Merging Multiple OCR Processed PDFs

If you have multiple PDF files that need to be combined into a single final PDF, the pdfunite tool can do this easily. To merge the processed PDF files into a single output named combined_pdf.pdf, use the following command:

pdfunite 001a.pdf 002a.pdf 003a.pdf combined_pdf.pdf

This command merges 001a.pdf, 002a.pdf, and 003a.pdf, in that order, into a single combined_pdf.pdf file. pdfunite always treats its last argument as the output file.
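Since pdfunite takes the output as its last argument, leaving out the output name would make the last input the destination. A small sketch, assuming the OCR results follow the 001a.pdf, 002a.pdf, ... naming used above, collects them in name order and refuses to use an input as the output. The helpers are hypothetical, and the *a.pdf pattern also picks up any other file whose name ends in a.pdf.

```python
import subprocess
from pathlib import Path


def pdfunite_command(inputs: list[str], output: str = "combined_pdf.pdf") -> list[str]:
    """Build the argv for pdfunite: input files first, output file last."""
    if output in inputs:
        raise ValueError("the output file must not also be one of the inputs")
    return ["pdfunite", *sorted(inputs), output]


def unite_ocr_results(directory: str = ".") -> None:
    """Merge every *a.pdf in `directory` in name order (requires Poppler)."""
    inputs = [p.name for p in Path(directory).glob("*a.pdf")]
    subprocess.run(pdfunite_command(inputs), cwd=directory, check=True)
```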

Summary

By following these steps, you can:

  1. Merge scanned images into a PDF file.
  2. Run OCR with the Tesseract engine through ocrmypdf.
  3. Combine multiple PDF files into a final merged PDF.

This allows you to convert scanned images of book pages into a searchable, selectable, and copyable PDF document.

Wednesday, October 9, 2024

Business Model for AI Music Generation Services

This document outlines a comprehensive business model for an AI music generation service. As the demand for personalized and unique music continues to rise, leveraging artificial intelligence to create music can provide a competitive edge in the music industry. This model will explore various revenue streams, target markets, and operational strategies to ensure the service's success.


Target Market

  1. Content Creators: YouTubers, podcasters, and streamers looking for royalty-free music to enhance their content.

  2. Advertising Agencies: Companies needing custom jingles or background music for advertisements.

  3. Game Developers: Indie and large-scale game developers seeking unique soundtracks for their games.

  4. Film and TV Production: Producers looking for original scores or soundtracks for their projects.

  5. Musicians and Bands: Artists wanting to experiment with new sounds or collaborate with AI-generated music.

Revenue Streams

  1. Subscription Model: Offer tiered subscription plans that provide users with access to different levels of music generation capabilities, including varying numbers of tracks per month, quality of music, and customization options.

  2. Pay-Per-Track: Allow users to purchase individual tracks or licenses for specific uses, catering to those who may not want a subscription.

  3. Custom Commissions: Provide a service where users can request custom music tailored to their specific needs, charging a premium for this personalized service.

  4. Partnerships and Collaborations: Collaborate with platforms like YouTube, Twitch, and gaming companies to offer integrated music solutions, potentially sharing revenue from users who utilize the service.

  5. Merchandising: Create and sell merchandise related to the AI-generated music, such as vinyl records, digital albums, or branded products.


Operational Strategy

  1. Technology Development: Invest in developing a robust AI music generation algorithm that can create high-quality, diverse music across various genres.

  2. User-Friendly Interface: Design an intuitive platform that allows users to easily generate and customize music tracks, ensuring a seamless user experience.

  3. Marketing and Outreach: Utilize social media, influencer partnerships, and targeted advertising to reach potential users in the identified target markets.

  4. Community Building: Foster a community of users who can share their experiences, provide feedback, and showcase how they use the generated music, enhancing user engagement and loyalty.

  5. Legal Considerations: Ensure compliance with copyright laws and provide clear licensing agreements to protect both the service and its users.



Conclusion

The AI music generation service presents a unique opportunity to tap into the growing demand for personalized music solutions. By targeting diverse markets, implementing multiple revenue streams, and focusing on technology and user experience, this business model can establish a strong foothold in the music industry. With the right strategies in place, the service can not only thrive but also change the way music is created and consumed.

Sunday, October 6, 2024

Drawing a Scatter Plot with a Chinese Title Using the Python Package matplotlib on Mac OS X


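  • Imports the required packages: seaborn, matplotlib.pyplot, and matplotlib.font_manager.
  • Sets the font and font size: the FontProperties class takes the path of the font file and the font size.
  • Loads the tips dataset: the sns.load_dataset function loads seaborn's tips dataset.
  • Sets the seaborn style: the sns.set() function applies seaborn's default style.
  • Draws the scatter plot: the sns.scatterplot function draws the scatter plot.
  • Sets the chart title: the plt.title function sets the title, using the Chinese font and size specified above.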
    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.font_manager as fm
    
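    # Set the font file and font size (replace YOURNAME with your macOS user name)
    font = fm.FontProperties(fname='/Users/YOURNAME/Library/Fonts/jf-openhuninn-2.0.ttf', size=12)
    
    tips = sns.load_dataset("tips")  # seaborn sample data, downloaded on first use
    sns.set()                        # apply seaborn's default style
    sns.scatterplot(x="total_bill", y="tip", data=tips)
    plt.title('餐廳小費關係圖', fontproperties=font)  # title: "Restaurant tips relationship"
    plt.show()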
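Passing fontproperties only changes the title; axis labels, tick labels, and legends still use the default font and would show empty boxes for Chinese text. As a sketch, assuming matplotlib 3.2 or later (which added fontManager.addfont), the font file can instead be registered once and placed first in the sans-serif list. The helper name use_font_globally is hypothetical.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager as fm


def use_font_globally(font_path: str) -> str:
    """Register a font file and put it first in the sans-serif fallback list.

    Returns the family name matplotlib reads from the file, so text drawn
    without an explicit fontproperties (labels, ticks, legends) uses it too.
    """
    fm.fontManager.addfont(font_path)  # make the file known to matplotlib
    family = fm.FontProperties(fname=font_path).get_name()
    plt.rcParams["font.family"] = "sans-serif"
    plt.rcParams["font.sans-serif"] = [family] + list(plt.rcParams["font.sans-serif"])
    # Many CJK fonts lack the Unicode minus sign (U+2212); use an ASCII hyphen instead.
    plt.rcParams["axes.unicode_minus"] = False
    return family
```

Call use_font_globally('/Users/YOURNAME/Library/Fonts/jf-openhuninn-2.0.ttf') after sns.set(), because seaborn's style settings overwrite the font.sans-serif list; plt.title('餐廳小費關係圖') then renders without passing fontproperties.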
    
Thursday, February 11, 2021

Installing Python 3 Libraries on an Apple M1 Mac

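    # Upgrade pip itself first
    python3 -m pip install --upgrade pip
    
    # Run pip under Rosetta 2 as an x86_64 process so it installs x86_64 builds of the packages.
    # This only works if the python3 binary includes an x86_64 slice (e.g. a universal2 build).
    arch -x86_64 $(which python3) -m pip install numpy scipy matplotlib dbr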
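To confirm which architecture a Python process is running as (and therefore which builds pip will pick), the sketch below reads platform.machine(), which reports arm64 for a native process on Apple silicon and x86_64 under Rosetta 2. Packages installed with the command above contain x86_64 binaries, so scripts that import them generally need to be started the same way, e.g. arch -x86_64 python3 script.py. The helper names are hypothetical.

```python
import platform
import sys


def interpreter_arch() -> str:
    """Return the CPU architecture this Python process is running as.

    On Apple silicon: 'arm64' for a native process, 'x86_64' under Rosetta 2.
    """
    return platform.machine()


def report() -> str:
    """Describe which interpreter binary is running and as which architecture."""
    return f"{sys.executable} is running as {interpreter_arch()}"
```

For example, arch -x86_64 $(which python3) -c "import platform; print(platform.machine())" should print x86_64 on an M1 Mac.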
    

Sunday, November 10, 2019

Example: Batch-Renaming Image Files with a bash Script

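    #!/bin/bash
    # Rename every file matching img-1* to 01.tif, 03.tif, 05.tif, ... (odd numbers).
    # For 02.tif, 04.tif, ... (even numbers), use the commented-out rename line instead.
    
    files="img-1*"
    start=1
    
    for i in $files   # unquoted on purpose: the shell expands the glob, sorted by name
    do
      echo "$i"
      rename=$((start * 2 - 1))   # odd:  1, 3, 5, ...
      #rename=$((start * 2))      # even: 2, 4, 6, ...
      start=$((start + 1))
      echo "$rename"
      refilename=$(printf "%02u" "$rename")   # zero-pad to two digits: 1 -> 01
      mv "$i" "$refilename.tif"
    done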
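The same numbering can be expressed in Python, which also makes it easy to preview the renames before touching any files. This is a sketch, not the author's script: plan_renames and apply_renames are hypothetical names, files are ordered with Python's sorted() (code-point order, which can differ slightly from the shell's locale-based order), and apply_renames refuses to overwrite an existing NN.tif.

```python
from pathlib import Path


def plan_renames(names: list[str], even: bool = False) -> list[tuple[str, str]]:
    """Pair each file (in name order) with its new NN.tif name.

    Mirrors the bash loop: the k-th file (k = 1, 2, ...) becomes 2k-1 (odd)
    or 2k (even), zero-padded to two digits.
    """
    plan = []
    for k, name in enumerate(sorted(names), start=1):
        number = 2 * k if even else 2 * k - 1
        plan.append((name, f"{number:02d}.tif"))
    return plan


def apply_renames(directory: str, pattern: str = "img-1*", even: bool = False) -> None:
    """Rename the files matching `pattern` in `directory` according to plan_renames."""
    folder = Path(directory)
    names = [p.name for p in folder.glob(pattern)]
    for old, new in plan_renames(names, even):
        target = folder / new
        if target.exists():
            raise FileExistsError(target)  # unlike plain mv, never overwrite
        (folder / old).rename(target)
```

Printing plan_renames(...) first shows exactly what apply_renames would do.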
    

Using the IK Analysis Plugin with Elasticsearch on Ubuntu

    Download the IK Analysis Plugin (version 7.4.2 in this example; the plugin version must match your Elasticsearch version) and an extended Chinese dictionary file:
    sudo mkdir /usr/share/elasticsearch/plugins/ik/
    cd /usr/share/elasticsearch/plugins/ik/   # plain cd: it is a shell built-in, so "sudo cd" fails
    sudo wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
    sudo unzip elasticsearch-analysis-ik-7.4.2.zip
    sudo rm elasticsearch-analysis-ik-7.4.2.zip
    cd /usr/share/elasticsearch/plugins/ik/config
    sudo wget https://github.com/samejack/sc-dictionary/raw/master/main.txt
    sudo mv main.dic main.dic.old
    sudo mv main.txt main.dic
    

    Configure the extended dictionary by editing the IKAnalyzer.cfg.xml configuration file:
    cd /usr/share/elasticsearch/plugins/ik/config
    sudo vi IKAnalyzer.cfg.xml
    
    Add main.dic to the ext_dict entry:
     <entry key="ext_dict">main.dic</entry>
    
    Restart Elasticsearch:
    sudo service elasticsearch restart
    

    If everything went well, give IK a quick test:
    curl -XGET http://localhost:9200/_analyze -H 'Content-Type:application/json' -d'
    {
      "text":"後悔莫及的人家",
      "analyzer": "ik_smart"
    }'
    
    You should get a result like this:
    {
      "tokens" : [
        {
          "token" : "後悔莫及",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "CN_WORD",
          "position" : 0
        },
        {
          "token" : "的",
          "start_offset" : 4,
          "end_offset" : 5,
          "type" : "CN_CHAR",
          "position" : 1
        },
        {
          "token" : "人家",
          "start_offset" : 5,
          "end_offset" : 7,
          "type" : "CN_WORD",
          "position" : 2
        }
      ]
    }
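The same _analyze request can be sent from Python with only the standard library. This sketch assumes a node at http://localhost:9200, as in the curl examples; analyze, token_strings, and offsets_consistent are hypothetical helper names. The offset check relies on the start_offset and end_offset values indexing into the original text, as in the response above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node, as in the curl examples


def analyze(text: str, analyzer: str = "ik_smart", base_url: str = ES_URL) -> dict:
    """POST to the _analyze API and return the decoded JSON response."""
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def token_strings(response: dict) -> list[str]:
    """Pull just the token text out of an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def offsets_consistent(text: str, response: dict) -> bool:
    """Check that each token equals text[start_offset:end_offset]."""
    return all(text[t["start_offset"]:t["end_offset"]] == t["token"]
               for t in response.get("tokens", []))
```

With Elasticsearch running, token_strings(analyze("後悔莫及的人家")) should list the same tokens as the response above.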
    

    Next, create a new index called test and try IK tokenization on it:
    curl -XPUT http://localhost:9200/test
    
    curl -XPOST http://localhost:9200/test/_mapping -H 'Content-Type:application/json' -d'
    {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }'
    
    curl -XPOST http://localhost:9200/test/_create/1 -H 'Content-Type:application/json' -d'
    {"content":"曾經有一份真摯的感情放在我面前.我沒有珍惜.等到失去的時候才後悔莫及,塵世間最痛苦的事莫過於此.你的劍在我的咽喉上割下去吧!不要再猶豫了!如果上天能夠給我一個再來一次的機會,我一定會對那個女孩子說三個字\"我愛你\",如果非要在這份愛上加一個期限的話,我希望是一萬年。"}'
    
    curl -XPOST http://localhost:9200/test/_search -H 'Content-Type:application/json' -d'
    {
      "query": {"match": {"content": "如果"}},
      "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
          "content" : {}
        }
      }
    }'
    
    The result should look like this:
    {
      "took" : 292,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.40037507,
        "hits" : [
          {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.40037507,
            "_source" : {
              "content" : """曾經有一份真摯的感情放在我面前.我沒有珍惜.等到失去的時候才後悔莫及,塵世間最痛苦的事莫過於此.你的劍在我的咽喉上割下去吧!不要再猶豫了!如果上天能夠給我一個再來一次的機會,我一定會對那個女孩子說三個字"我愛你",如果非要在這份愛上加一個期限的話,我希望是一萬年。"""
            },
            "highlight" : {
              "content" : [
                """<tag1>如果</tag1>上天能夠給我一個再來一次的機會,我一定會對那個女孩子說三個字"我愛你",<tag1>如果</tag1>非要在這份愛上加一個期限的話,我希望是一萬年。"""
              ]
            }
          }
        ]
      }
    }
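The search with highlighting can likewise be built and read from Python. This is a sketch assuming the test index and a node at http://localhost:9200, as above; highlight_query, search, and highlights are hypothetical helper names, and only the <tag1> marker pair seen in the result above is used.

```python
import json
import urllib.request


def highlight_query(field: str, term: str) -> dict:
    """Build a match query on `field` that wraps hits in <tag1>...</tag1>."""
    return {
        "query": {"match": {field: term}},
        "highlight": {
            "pre_tags": ["<tag1>"],
            "post_tags": ["</tag1>"],
            "fields": {field: {}},
        },
    }


def search(index: str, body: dict, base_url: str = "http://localhost:9200") -> dict:
    """POST `body` to <index>/_search and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def highlights(response: dict, field: str) -> list[str]:
    """Collect every highlighted fragment for `field` across all hits."""
    fragments = []
    for hit in response.get("hits", {}).get("hits", []):
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments
```

For example, highlights(search("test", highlight_query("content", "如果")), "content") should return the single fragment shown in the result above.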