Integrate machine learning


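With Amplify, you can use the @predictions directive to identify text in an image, label an image, translate text, and synthesize speech from text.

Note: The @predictions directive requires an S3 storage bucket, either configured with amplify add storage or, when using CDK, supplied through the predictionsBucket property.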

Identify text in an image

To set up text recognition on an image, use the identifyText action in the @predictions directive.

type Query {
  recognizeTextFromImage: String @predictions(actions: [identifyText])
}

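In the GraphQL query, you pass the S3 key of the image. The directive currently works only with objects in the public/ folder of the S3 bucket, and it automatically prepends the public/ prefix to the key input. In the example below, public/myimage.jpg is used as the input.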

query RecognizeTextFromImage {
  recognizeTextFromImage(input: { identifyText: { key: "myimage.jpg" } })
}
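
The key handling described above can be sketched in JavaScript. The helpers below (toPredictionsKey, resolvedObjectPath, buildIdentifyTextInput) are hypothetical and not part of Amplify; they only illustrate that the key is passed without the public/ prefix, because the directive adds it.

```javascript
// Hypothetical helpers (not part of Amplify) illustrating the key-to-object mapping.
const PUBLIC_PREFIX = 'public/';

// The directive prepends "public/" itself, so strip it from a full object path
// to avoid sending a doubled prefix.
function toPredictionsKey(objectPath) {
  return objectPath.startsWith(PUBLIC_PREFIX)
    ? objectPath.slice(PUBLIC_PREFIX.length)
    : objectPath;
}

// The S3 object that ends up being read for a given key input.
function resolvedObjectPath(key) {
  return PUBLIC_PREFIX + key;
}

// Builds the input object in the shape used by the query above.
function buildIdentifyTextInput(objectPath) {
  return { identifyText: { key: toPredictionsKey(objectPath) } };
}
```

A path outside public/ passes through unchanged and would then resolve under public/, where the object most likely does not exist, so callers may want to reject such paths up front.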

Identify labels in an image

To set up label recognition on an image, use the identifyLabels action in the @predictions directive.

type Query {
  recognizeLabelsFromImage: [String] @predictions(actions: [identifyLabels])
}

The query below returns a list of identified labels. For the full list of supported labels, see Detecting labels in the Amazon Rekognition documentation.

query RecognizeLabelsFromImage {
  recognizeLabelsFromImage(input: { identifyLabels: { key: "myimage.jpg" } })
}
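
Because the field returns a list of strings, a client will often just check whether a particular label is present. The sketch below assumes a hypothetical helper, hasLabel, and sample label values that are illustrative only and not taken from a real response.

```javascript
// Hypothetical helper: case-insensitive membership test on the returned label list.
// `labels` stands in for response.data.recognizeLabelsFromImage.
function hasLabel(labels, wanted) {
  const target = wanted.trim().toLowerCase();
  return (labels || []).some(
    (label) => typeof label === 'string' && label.toLowerCase() === target
  );
}

// Illustrative values only.
const sampleLabels = ['Dog', 'Pet', 'Animal'];
const containsDog = hasLabel(sampleLabels, 'dog');
```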

Translate text

To set up text translation, use the translateText action in the @predictions directive.

type Query {
  translate: String @predictions(actions: [translateText])
}

The query below returns the translated string. Set sourceLanguage and targetLanguage to one of the supported language codes, and pass the text to translate in the text parameter.

query TranslateText {
  translate(
    input: {
      translateText: {
        sourceLanguage: "en"
        targetLanguage: "de"
        text: "Translate me"
      }
    }
  )
}
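
When the values come from user input rather than literals, the same input can be assembled in code and passed as GraphQL variables. This is a minimal sketch: buildTranslateVariables is a hypothetical helper, and its checks are assumptions about sensible client-side validation, not rules enforced by the directive.

```javascript
// Hypothetical helper: builds the variables object for the translate query.
function buildTranslateVariables(text, sourceLanguage, targetLanguage) {
  if (typeof text !== 'string' || text.length === 0) {
    throw new Error('text must be a non-empty string');
  }
  if (!sourceLanguage || !targetLanguage) {
    throw new Error('sourceLanguage and targetLanguage are required');
  }
  if (sourceLanguage === targetLanguage) {
    throw new Error('sourceLanguage and targetLanguage must differ');
  }
  // Same shape as the inline input in the query above.
  return { input: { translateText: { sourceLanguage, targetLanguage, text } } };
}

const translateVariables = buildTranslateVariables('Translate me', 'en', 'de');
```

The resulting object can be supplied as the variables option of client.graphql, in the same way the React example later in this page passes its input.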

Synthesize speech from text

To set up text-to-speech synthesis, use the convertTextToSpeech action in the @predictions directive.

type Query {
  textToSpeech: String @predictions(actions: [convertTextToSpeech])
}

The query below returns a presigned URL to the synthesized speech. Set voiceID to one of the supported voice IDs, and pass the text to synthesize in the text parameter.

query ConvertTextToSpeech {
  textToSpeech(
    input: {
      convertTextToSpeech: {
        voiceID: "Nicole"
        text: "Hello from AWS Amplify!"
      }
    }
  )
}

Combining Predictions actions

You can also combine multiple Predictions actions into a sequence. The following action sequences are supported:

identifyText -> translateText -> convertTextToSpeech
identifyLabels -> translateText -> convertTextToSpeech
translateText -> convertTextToSpeech
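
A schema author could check an action list against the sequences above before deploying. The sketch below assumes the list is exhaustive for multi-action sequences and that any single known action is valid on its own; isSupportedActionList is a hypothetical helper, not an Amplify API.

```javascript
// Multi-action sequences listed above (assumed exhaustive for this sketch).
const SUPPORTED_SEQUENCES = [
  ['identifyText', 'translateText', 'convertTextToSpeech'],
  ['identifyLabels', 'translateText', 'convertTextToSpeech'],
  ['translateText', 'convertTextToSpeech']
];

// The four actions the directive accepts.
const KNOWN_ACTIONS = [
  'identifyText',
  'identifyLabels',
  'convertTextToSpeech',
  'translateText'
];

// Hypothetical helper: true for a single known action or an exact listed sequence.
function isSupportedActionList(actions) {
  if (!Array.isArray(actions) || actions.length === 0) return false;
  if (actions.length === 1) return KNOWN_ACTIONS.includes(actions[0]);
  return SUPPORTED_SEQUENCES.some(
    (sequence) =>
      sequence.length === actions.length &&
      sequence.every((action, index) => action === actions[index])
  );
}
```

Order matters in this check: translateText followed by convertTextToSpeech passes, while the reverse order does not.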

In the example below, speakTranslatedImageText identifies text in an image, translates it into another language, and then converts the translated text to speech.

type Query {
  speakTranslatedImageText: String
    @predictions(actions: [identifyText, translateText, convertTextToSpeech])
}

An example of that query looks like this:

query SpeakTranslatedImageText {
  speakTranslatedImageText(
    input: {
      identifyText: { key: "myimage.jpg" }
      translateText: { sourceLanguage: "en", targetLanguage: "es" }
      convertTextToSpeech: { voiceID: "Conchita" }
    }
  )
}

A code example using the JS library is shown below:

import React, { useState } from 'react';
import { Amplify } from 'aws-amplify';
import { uploadData, getUrl } from 'aws-amplify/storage';
import { generateClient } from 'aws-amplify/api';
import config from './amplifyconfiguration.json';
import { speakTranslatedImageText } from './graphql/queries';

/* Configure Exports */
Amplify.configure(config);

const client = generateClient();

function SpeakTranslatedImage() {
  const [src, setSrc] = useState(''); // URL of the synthesized speech
  const [img, setImg] = useState(''); // URL of the uploaded image

  // Upload the selected file to S3, then run the predictions query on it.
  function putS3Image(event) {
    const file = event.target.files[0];
    uploadData({
      key: file.name,
      data: file
    })
      .result.then(async (result) => {
        setSrc(await speakTranslatedImageTextOP(result.key));
        setImg((await getUrl({ key: result.key })).url.toString());
      })
      .catch((err) => console.log(err));
  }

  return (
    <div className="Text">
      <div>
        <h3>Upload Image</h3>
        <input
          type="file"
          accept="image/jpeg"
          onChange={(event) => {
            putS3Image(event);
          }}
        />
        <br />
        {img && <img src={img}></img>}
        {src && (
          <div>
            <audio id="audioPlayback" controls>
              <source id="audioSource" type="audio/mp3" src={src} />
            </audio>
          </div>
        )}
      </div>
    </div>
  );
}

// Run identifyText -> translateText -> convertTextToSpeech for the given S3 key.
async function speakTranslatedImageTextOP(key) {
  const inputObj = {
    translateText: {
      sourceLanguage: 'en',
      targetLanguage: 'es'
    },
    identifyText: { key },
    convertTextToSpeech: { voiceID: 'Conchita' }
  };
  const response = await client.graphql({
    query: speakTranslatedImageText,
    variables: { input: inputObj }
  });
  return response.data.speakTranslatedImageText;
}

function App() {
  return (
    <div className="App">
      <h1>Speak Translated Image</h1>
      <SpeakTranslatedImage />
    </div>
  );
}

export default App;

How it works

Definition of the @predictions directive:

directive @predictions(actions: [PredictionsActions!]!) on FIELD_DEFINITION
enum PredictionsActions {
  identifyText # uses Amazon Rekognition to detect text
  identifyLabels # uses Amazon Rekognition to detect labels
  convertTextToSpeech # uses Amazon Polly in a lambda to output a presigned url to synthesized speech
  translateText # uses Amazon Translate to translate text from source to target language
}

The @predictions directive relies on the following AWS services, which must be supported in the region where the AppSync API is deployed:

identifyText uses Amazon Rekognition
identifyLabels uses Amazon Rekognition
convertTextToSpeech uses Amazon Polly
translateText uses Amazon Translate
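
The mapping above can be turned into a small lookup that reports which services a given action list depends on, which may help when checking regional availability. This is an illustrative sketch; requiredServices is a hypothetical helper and the service names are plain labels, not SDK identifiers.

```javascript
// Action-to-service mapping, taken from the list above.
const SERVICE_BY_ACTION = new Map([
  ['identifyText', 'Amazon Rekognition'],
  ['identifyLabels', 'Amazon Rekognition'],
  ['convertTextToSpeech', 'Amazon Polly'],
  ['translateText', 'Amazon Translate']
]);

// Hypothetical helper: distinct services needed by an action list, in first-use order.
function requiredServices(actions) {
  const services = new Set();
  for (const action of actions) {
    const service = SERVICE_BY_ACTION.get(action);
    if (!service) {
      throw new Error(`Unknown Predictions action: ${action}`);
    }
    services.add(service);
  }
  return [...services];
}

const servicesForSequence = requiredServices([
  'identifyText',
  'translateText',
  'convertTextToSpeech'
]);
```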

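@predictions creates the resources needed to communicate with Amazon Rekognition, Translate, and Polly. For each action, the following are created:

An IAM policy for each service (for example, an Amazon Rekognition detectText policy)
An AppSync VTL function
An AppSync data source

Finally, a pipeline resolver is created for the query or field. The pipeline resolver is composed of the AppSync functions defined by the action list provided in the directive.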
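
The composition idea can be modeled in plain JavaScript. This is a conceptual sketch only: the real AppSync functions are VTL templates backed by data sources, and the simplification that each step receives only the previous step's result is an assumption made for illustration. createPipeline and the stub functions are hypothetical.

```javascript
// Conceptual model: build one callable from an ordered action list.
function createPipeline(actions, functionsByAction) {
  const steps = actions.map((action) => {
    const registered = Object.prototype.hasOwnProperty.call(functionsByAction, action);
    if (!registered || typeof functionsByAction[action] !== 'function') {
      throw new Error(`No function registered for action: ${action}`);
    }
    return functionsByAction[action];
  });
  // Run the steps in action-list order, feeding each result into the next step.
  return (input) => steps.reduce((previous, step) => step(previous), input);
}

// Stub functions standing in for the service calls (made-up behavior).
const stubFunctions = {
  identifyText: (key) => `text from ${key}`,
  translateText: (text) => `translated(${text})`,
  convertTextToSpeech: (text) => `url-for(${text})`
};

const speakPipeline = createPipeline(
  ['identifyText', 'translateText', 'convertTextToSpeech'],
  stubFunctions
);
const pipelineResult = speakPipeline('myimage.jpg');
```

With these stubs, pipelineResult shows the nesting order of the three steps, mirroring the order of the actions in the directive.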
