\documentclass[12pt,reqno]{article}
\usepackage[usenames]{color}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{amscd}
\usepackage[colorlinks=true,
linkcolor=webgreen,
filecolor=webbrown,
citecolor=webgreen]{hyperref}
\definecolor{webgreen}{rgb}{0,.5,0}
\definecolor{webbrown}{rgb}{.6,0,0}
\usepackage{color}
\usepackage{fullpage}
\usepackage{float}
\usepackage{psfig}
\usepackage{graphics,amsmath,amssymb}
\usepackage{amsthm}
\usepackage{amsfonts}
\usepackage{latexsym}
\usepackage{epsf}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{.1in}
\setlength{\evensidemargin}{.1in}
\setlength{\topmargin}{-.5in}
\setlength{\textheight}{8.9in}
\newcommand{\seqnum}[1]{\href{http://www.research.att.com/cgi-bin/access.cgi/as/~njas/sequences/eisA.cgi?Anum=#1}{\underline{#1}}}
\begin{document}
\begin{center}
\epsfxsize=4in
\leavevmode\epsffile{logo129.eps}
\end{center}
\begin{center}
\vskip 1cm{\LARGE\bf
The Minimal Density of a Letter in an \\
\vskip .1in
Infinite Ternary Square-Free Word
is $\frac{883}{3215}$
}
\vskip 1cm
\large
Andrey Khalyavin \\
Mech. \& Math. Department\\
Moscow State University\\
119992 Moscow \\
Russia\\
\href{mailto:halyavin@land.ru}{\tt halyavin@land.ru} \\
\end{center}
\vskip .2 in
\begin{abstract}
The problem of determining the minimal density of a letter in an infinite
ternary square-free word was investigated by
Tarannikov and Ochem.
In this paper we solve this problem, and prove
that the minimal density is equal to $\frac{883}{3215}$.
\end{abstract}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{conjecture}[theorem]{Conjecture}
%\newcommand{\mod}{\mathop{\rm mod}\nolimits}
%\def\squareforqed{\hbox{\rlap{$\sqcap$}$\sqcup$}}
%\def\qed{\ifmmode\squareforqed\else{\unskip\nobreak\hfil
%\penalty50\hskip1em\null\nobreak\hfil\squareforqed
%\parfillskip=0pt\finalhyphendemerits=0\endgraf}\fi}
\section{Introduction}
A word is called {\it square-free} if it cannot be written in the form $axxb$
for two words $a$, $b$ and a nonempty word $x$. In this paper we study the
minimal density of a letter $a$ in an infinite square-free word over the
alphabet $\{a,b,c\}$. The {\it density} of a letter $a$ in a finite word $w$ is
$\frac{n(w)}{|w|}$ where $n(w)$ is the number of letters $a$ in $w$ and
$|w|$ is the length of the word. The {\it density} of a letter $a$ in an
infinite word is $\liminf\limits_{l\rightarrow\infty}\frac{n(w_l)}{|w_l|}$
where $w_l$ is the prefix of the word $w$ of length $l$. Tarannikov
\cite{taran} found the minimal density of a letter $a$ with up to $4$ tight
digits after the decimal point. Ochem \cite{ochem} improved this result,
proving that the minimal density of a letter $a$ belongs to the interval
$[\frac{1000}{3641},\frac{883}{3215}]$. In this article we completely solve
this problem and prove that the minimal density of a letter $a$ is equal to
$\frac{883}{3215}$. We use huge computer calculations in our proof. You can
find sources of the programs at
{\tt http://mech.math.msu.su/department/dm/dmmc/PUBL/words.zip}.
\section{Main result}
Our aim is to prove that the minimal density is at least
$\rho=\frac{883}{3215}$. Let us construct $4$ special words $b_1,\dots,b_4$ of
length $3215$ that contain $883$ letters $a$ each (they are probably the same
words used by Ochem \cite{ochem} in order to construct an infinite word
with density $\rho$). We call these words the {\it basic words}.The files
with these basic words can be found at the URL address above. We proved the
next theorem by a computation using the technique of backtracking.
\begin{theorem}
In any square-free word of length 90000 there exist either
\begin{itemize}\item[1)]a prefix with density greater than or equal
than $\rho$,
\item[2)]two adjacent basic words (a subword $b_ib_j$).
\end{itemize}
\label{sqfree1}
\end{theorem}
So, either an infinite square-free word can be split into words with densities
greater than or equal than $\rho$ or there exist two basic words one after another. In the
first case the density of an infinite word is greater than or equal than $\rho$ because the
lengths of the words in the splitting are bounded by 90000. So we can assume
that an infinite square-free word begins with a word $b_ib_j$.
When our program that proves Theorem \ref{sqfree1} finds two adjacent basic
words in the current word $w$, it saves the pair $(n(w),|w|)$ and backtracks.
Let us denote by $W$ the set of all such current words. Then we delete equal
pairs $(n(w),|w|)$ from the list and obtain $80$ distinct pairs $(n(w),|w|)$.
Let us denote by $L$ the set of these pairs. Then for all $l<200000$ we
calculate the maximal number $p(l)$ such that after the concatenation of any
words $w\in W$ and $v$ such that $n(v)\ge p(l),|v|=l$, the word $vw$ has the
density greater than or equal than $\rho$. So,
$p(l)=\max_{(n(w),|w|)\in L}[\rho(l+|w|)]-n(w)+1$. Our second program proves
the next theorem.
\begin{theorem}
Let $u$ be an infinite square-free word that begins with a word
$b_ib_j$. Then $u$ contains one of the following prefixes:
\begin{itemize}\item[1)] a prefix $w$ such that $2\cdot 3215<|w|<200000$
and $n(w)\ge p(|w|)$,
\item[2)] a prefix $w$ such that $4\cdot 3215\le |w|<200000, n(w)\ge\rho|w|$,
that ends with two adjacent basic words.
\end{itemize}
\label{sqfree2}
\end{theorem}
Now, we are prepared to prove
\begin{theorem}
An infinite square-free word that begins with a word $b_ib_j$ has density not
less than $\rho$.
\label{sqfree3}
\end{theorem}
{\it Proof.}
Let us consider an infinite square-free word $u$ that begins with a word
$b_ib_j$. Let us show that $u$ must either have density greater than or equal than $\rho$ or
start with one of the following prefixes:
\begin{itemize}\item[1)] the prefix $ww_1w_2\dots w_nv$ such that
$|w|<200000,|w_i|<200000,2\cdot3215<|v|<200000,
n(w_i)\ge \rho|w_i|, n(wv)\ge \rho|wv|$, which ends with two adjacent basic
words.
\item[2)] the prefix $w$ such that $4 \cdot 3215\le |w|<200000,
n(w)\ge\rho|w|$, which ends with two adjacent basic words.
\end{itemize}
Let us apply Theorem \ref{sqfree2} to our infinite square-free word. If the
second case of Theorem \ref{sqfree2} holds, we have the second case in
the above sentence. Otherwise we apply Theorem \ref{sqfree1} to the word $u$
without the prefix $w$. We obtain either A) a subword $v$ that ends with two
adjacent basic words or B) a subword $w_1$ with the density greater than or equal than
$\rho$. In the case A) we have the second case in our sentence since
$n(w)\ge p(|w|)$. In the case B) we apply Theorem \ref{sqfree1} to the word $u$
without the prefix $ww_1$, and so on. If the second case of Theorem
\ref{sqfree1} never occurs then $u=ww_1w_2\dots$ where lengths of words $w$ and
$w_i$ are bounded. Therefore the density of $u$ is greater than or equal than $\rho$. If the
second case of Theorem \ref{sqfree1} occurs on the $n$th step, we obtain the
prefix $ww_1\dots w_{n-1}v$ such that $n(w)\ge p(|w|), v\in W$ and so
$n(wv)\ge \rho |wv|$.
Therefore our word $u$ can be split into subwords $w$ and $ww_1w_2\dots w_nv$ with
densities greater than or equal than $\rho$ and bounded lengths of words $w$,$w_i$ and $v$.
Therefore the word $u$ also has the density greater than or equal than $\rho$.\qed
From Theorem \ref{sqfree1} and Theorem \ref{sqfree3} we obtain immediately our
main Theorem:
\begin{theorem}
The density of a letter $a$ in an infinite square-free word is greater than or equal than
$\rho$.
\label{sqfree5}
\end{theorem}
Using the result from Ochem \cite{ochem} and Theorem \ref{sqfree5} we obtain
that the minimal density of an infinite square-free word is equal to $\rho$.
\section{Optimization of calculations}
We have spent about 12 hours of calculations (on a 2GHz CPU) in order to prove
Theorems \ref{sqfree1} and \ref{sqfree2}.
In order to speed up calculations we used a special algorithm for checking the
square-free property.
The program proving the theorems above uses a backtracking algorithm. So we must
check on every step if the current word $S$ ends with a word of the form $ww$.
Although the algorithm described below uses $O(n^2)$ time in the worst case
for checking the square-free property, on average it uses sublinear time.
First, we choose $m$. In our experiments the optimal $m$ was about 100. Then we
calculate the exact and inexact hash
for all positions from the $m$th character to the
end of the current string. Let us denote by $S_i$ the number corresponding to
the $i$th symbol ($S_i=0$ for symbol $a$, $S_i=1$ for symbol $b$, $S_i=2$ for
symbol $c$). Let us define the {\it inexact hash} for the $j$th position as
$h_j=S_j+S_{j-1}p+S_{j-2}p^2+\cdots+S_{j-m+1}p^{m-1} (\mod P)$ where $p=5$ and
$P$ is the size of the hash table (we use $P=100003$ for Theorem \ref{sqfree1}
and $P=200003$ for Theorem \ref{sqfree2}). If there do not exist positions
before the $j$th with the exact hash $h_j$ then we define the exact hash of the
$j$th position equal to the inexact hash. If such positions exist, we check the
last $m$ symbols before the $j$th position. If the last $m$ symbols before the
$j$th position coincide with the last $m$ symbols before $k$th position with
the exact hash $h_j$, we define the exact hash for the $j$th position equal to
$h_j$. In the opposite case we check all positions that have the exact hash
equal to $h_j+1$ and repeat the previous procedure. Then we check positions
with the exact hash equal to $h_j+2$ and so on. We continue to increase the
hash value until we have found a position with the same last $m$ characters
or hash value that does not correspond to any position in the string. We
define the exact hash of the $j$th position equal to the last checked hash
value. In order to check hash values quickly we use the table where for each
hash value the last position with the same exact hash is written. Also for each
position in the string we store the nearest position before it with the same exact
hash. Thus, all positions with the same exact hash are linked to the list.
Hence, the exact hash of the position uniquely
defines the last $m$ symbols before it and
we can quickly update it after deleting or inserting a symbol to the end of the
current string.
In order to check if the current word $S$ ends with a word of the form $ww$, we
check possible lengths (of word $w$) less than $m$ directly. Then we search all
positions with exact hash equal to the exact hash of last position. Then we
compare strings for these positions. We speed up this procedure by $m$ times
using exact hash values.
\section{Acknowledgments}
The author is grateful to his scientific supervisor Prof. Yuriy Tarannikov,
Dr. Roman Kolpakov and the anonymous referee for the attention to this work and
helpful remarks.
\begin{thebibliography}{1}
\bibitem{taran}
Y. Tarannikov. The minimal density of a letter in an infinite ternary
square-free word is 0.2746... J. Integer Sequences 5(2): Article 02.2.2
(2002).
\bibitem{ochem}
P. Ochem. Letter frequency in infinite repetition-free words.
Proceedings of Words 2005, Montreal, Canada, September 13-17 (2005), 335-340.
\end{thebibliography}
\bigskip
\hrule
\bigskip
\noindent 2000 {\it Mathematics Subject Classification}: Primary 11B05.
\noindent \emph{Keywords:} combinatorics on words; square-free word;
factorial languages; minimal density.
\bigskip
\hrule
\bigskip
\noindent (Concerned with sequence \seqnum{A006156}.)
\bigskip
\hrule
\bigskip
\vspace*{+.1in}
\noindent
Received June 19 2006;
revised versions received May 5 2007; June 12 2007.
Published in {\it Journal of Integer Sequences}, June 14 2007.
\bigskip
\hrule
\bigskip
\noindent
Return to
\htmladdnormallink{Journal of Integer Sequences home page}{http://www.cs.uwaterloo.ca/journals/JIS/}.
\vskip .1in
\end{document}